feat: Add stale process monitor, users route, landing page, archive old scripts

- Add backend stale process monitoring API (/api/stale-processes)
- Add users management route
- Add frontend landing page and stale process monitor UI on /scraper-tools
- Move old development scripts to backend/archive/
- Update frontend build with new features

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Kelly
2025-12-05 04:07:31 -07:00
parent d2d44d2aeb
commit d91c55a344
3115 changed files with 5755 additions and 719 deletions

@@ -0,0 +1,180 @@
# Session Summary - Age Gate Bypass Implementation
## Problem
Brands weren't populating when clicking "Scrape Store" for Curaleaf dispensaries. Root cause: category discovery was finding 0 categories because age verification gates were blocking access to the Dutchie menu.
## What Was Accomplished
### 1. Created Universal Age Gate Bypass System
**File:** `/home/kelly/dutchie-menus/backend/src/utils/age-gate.ts`
Three main functions:
- `setAgeGateCookies(page, url, state)` - Sets age gate bypass cookies BEFORE navigation
- `bypassAgeGate(page, state)` - Attempts to bypass age gate AFTER page load using multiple methods
- `detectStateFromUrl(url)` - Auto-detects state from URL patterns
**Key Features:**
- Multiple bypass strategies: custom dropdowns, standard selects, buttons, state cards
- Enhanced React event dispatching (mousedown, mouseup, click, change, input)
- Proper failure detection - checks final URL to verify success
- Cookie-based approach as primary method
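As a rough illustration of the URL-based state detection, the sketch below pulls a state out of a store URL. It is not the code in `age-gate.ts`: the name `detectStateFromUrlSketch`, the `STATE_BY_CODE` map, and the choice to return a full state name are all assumptions made for this example.

```typescript
// Hypothetical sketch only; the real detectStateFromUrl may work differently.
// Assumption: the state appears as a two-letter token in the URL path,
// as in ".../stores/curaleaf-az-48th-street-med".
const STATE_BY_CODE = new Map<string, string>([
  ["az", "Arizona"],
  ["fl", "Florida"],
  ["nv", "Nevada"],
]);

function detectStateFromUrlSketch(url: string): string | null {
  const path = new URL(url).pathname.toLowerCase();
  // Split on "-" and "/" so "curaleaf-az-48th-street-med" yields the token "az".
  for (const token of path.split(/[-/]/)) {
    const state = STATE_BY_CODE.get(token);
    if (state !== undefined) return state;
  }
  return null;
}

console.log(
  detectStateFromUrlSketch("https://curaleaf.com/stores/curaleaf-az-48th-street-med")
);
```

The map is kept deliberately short: state codes such as "in" or "or" are also ordinary words in slugs, so a full 50-state table would likely need a stricter pattern than simple token matching.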
### 2. Updated Category Discovery
**File:** `/home/kelly/dutchie-menus/backend/src/services/category-discovery.ts`
Changes at lines 120-129:
```typescript
// Set age gate bypass cookies BEFORE navigation
const state = detectStateFromUrl(baseUrl);
await setAgeGateCookies(page, baseUrl, state);
await page.goto(baseUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
// If age gate still appears, try to bypass it
await bypassAgeGate(page, state);
```
### 3. Updated Scraper
**File:** `/home/kelly/dutchie-menus/backend/src/services/scraper.ts`
Changes at lines 379-392 (plus the import at line 10):
- Imports setAgeGateCookies (line 10)
- Sets cookies before navigation (line 381)
- Attempts bypass if age gate still appears (line 392)
### 4. Test Scripts Created
- **`test-improved-age-gate.ts`** - Tests cookie-based bypass approach
- **`capture-age-gate-cookies.ts`** - Opens visible browser to manually capture real cookies (requires X11)
- **`debug-age-gate-detailed.ts`** - Visible browser debugging
- **`debug-after-state-select.ts`** - Checks state after selecting Arizona
## Current Status
### ✅ Working
1. Age gate detection properly identifies gates
2. Cookie setting function works (cookies are set correctly)
3. Multiple bypass methods attempt to click through gates
4. Failure detection now accurately reports when bypass fails
5. Category discovery successfully created 10 categories for Curaleaf store (ID 18)
### ❌ Not Working
**Curaleaf's Specific Age Gate:**
- URL: `https://curaleaf.com/age-gate?returnurl=...`
- Uses React-based custom dropdown (shadcn/radix UI)
- Ignores cookies we set
- Doesn't respond to automated clicks
- Doesn't trigger navigation after state selection
**Why It Fails:**
- Curaleaf's React implementation likely checks for:
  - Real user interaction patterns
  - Additional signals beyond cookies (session storage, local storage)
  - Specific event sequences that automation doesn't replicate
- Automated clicks don't trigger the React state changes needed
## Test Results
```bash
# Latest test - Cookie approach
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx test-improved-age-gate.ts
```
**Result:** ❌ FAILED
- Cookies set successfully
- Page still redirects to `/age-gate`
- Click automation finds elements but navigation doesn't occur
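The failure check described above (verify the final URL rather than trusting that a click succeeded) can be sketched as follows. The helper name `isStuckOnAgeGate` is hypothetical; the `/age-gate` path prefix is taken from the redirect observed in this test.

```typescript
// Hypothetical helper: reports whether the browser ended up on the age gate.
// Assumption: the gate always lives under the "/age-gate" path, as seen in
// the Curaleaf redirect (https://curaleaf.com/age-gate?returnurl=...).
function isStuckOnAgeGate(finalUrl: string): boolean {
  const { pathname } = new URL(finalUrl);
  return pathname.toLowerCase().startsWith("/age-gate");
}

// Usage sketch: after page.goto(...) and bypassAgeGate(...), pass page.url().
console.log(isStuckOnAgeGate("https://curaleaf.com/age-gate?returnurl=%2Fstores%2Fexample"));
```

Checking the parsed pathname instead of searching the whole URL string avoids a false positive when `age-gate` only appears inside a query parameter.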
## Database State
**Categories created:** 10 categories for store_id 18 (Curaleaf - 48th Street)
- Categories are in the database and ready
- Dutchie menu detection works
- Category URLs are properly formatted
**Brands:** Not tested yet (can't scrape without bypassing age gate)
## Files Modified
1. `/home/kelly/dutchie-menus/backend/src/utils/age-gate.ts` - Created
2. `/home/kelly/dutchie-menus/backend/src/services/category-discovery.ts` - Updated
3. `/home/kelly/dutchie-menus/backend/src/services/scraper.ts` - Updated
4. `/home/kelly/dutchie-menus/backend/curaleaf-cookies.json` - Test cookies (don't work)
## Next Steps / Options
### Option 1: Test with Different Store
Find a Dutchie dispensary with a simpler age gate that responds to automation.
### Option 2: Get Real Cookies
Run `capture-age-gate-cookies.ts` on a machine with display/X11:
```bash
# On machine with display
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx capture-age-gate-cookies.ts
# Manually complete age gate in browser
# Press ENTER to capture real cookies
# Copy cookies to production server
```
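Once real cookies have been captured, they would need to be loaded back before navigation. A minimal sketch, assuming the file holds the JSON array written by `capture-age-gate-cookies.ts`; the helper name `loadUsableCookies` and the `SavedCookie` type are made up for this example, and there is no evidence yet that real cookies will get past Curaleaf's gate.

```typescript
// Minimal shape of the fields this sketch relies on. Puppeteer's cookie
// objects carry more properties; `expires` is in seconds since the UNIX
// epoch, with -1 for session cookies.
interface SavedCookie {
  name: string;
  value: string;
  domain: string;
  expires: number;
}

// Hypothetical helper: parse the saved JSON and drop cookies that have expired.
function loadUsableCookies(json: string, nowSeconds: number): SavedCookie[] {
  const cookies = JSON.parse(json) as SavedCookie[];
  return cookies.filter((c) => c.expires === -1 || c.expires > nowSeconds);
}

// Usage sketch (inside the scraper, before page.goto):
//   const json = readFileSync("curaleaf-cookies.json", "utf8");
//   await page.setCookie(...loadUsableCookies(json, Date.now() / 1000));
```

Filtering out expired entries first makes it easier to tell "the cookies are stale" apart from "the site ignores cookies" when a run still lands on the age gate.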
### Option 3: Switch to Playwright
Playwright may handle React apps better than Puppeteer. Consider migrating the age gate bypass to use Playwright.
### Option 4: Use Proxy with Session Persistence
Use a rotating proxy service that maintains session state between requests.
### Option 5: Focus on Other Stores First
Skip Curaleaf for now and implement scraping for dispensaries without complex age gates, then circle back.
## How to Continue
1. **Check if categories exist:**
```sql
SELECT * FROM categories WHERE store_id = 18;
```
2. **Test scraping (will fail at age gate):**
```bash
# Via UI: Click "Scrape Store" button for Curaleaf
# OR via test script
```
3. **Try different store:**
- Find another Dutchie dispensary
- Add it to the stores table
- Run category discovery
- Test scraping
## Important Notes
- All cannabis dispensary sites will have age gates (as you correctly noted)
- The bypass infrastructure is in place and should work for simpler gates, but that has not been verified yet (see Option 1)
- Curaleaf specifically appears to use React patterns that resist automation
- Categories ARE created successfully despite the age gate (using fallback detection)
- The scraper should work once we can bypass the age gate; product and brand scraping are untested until then
## Key Insight
Category discovery succeeded because it checks the page source to detect Dutchie, then uses predefined categories. Product scraping requires actually viewing the product listings, which can't happen while stuck at the age gate.
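The fallback described here can be sketched roughly as below. Everything in it is an assumption for illustration: the function names, the `"dutchie"` marker string, the category slugs, and the `/products/<slug>` URL layout are not taken from `category-discovery.ts`.

```typescript
// Hypothetical sketch of the fallback; names, the marker string, the category
// list, and the URL layout are assumptions, not the project's actual values.
const PREDEFINED_CATEGORY_SLUGS = [
  "flower",
  "pre-rolls",
  "vape-pens",
  "concentrates",
  "edibles",
  "topicals",
];

// Detects a Dutchie-backed menu by looking for the vendor name in the raw HTML.
function looksLikeDutchieMenu(pageSource: string): boolean {
  return pageSource.toLowerCase().includes("dutchie");
}

// Returns one URL per predefined category, or nothing if Dutchie is not detected.
function fallbackCategoryUrls(baseUrl: string, pageSource: string): string[] {
  if (!looksLikeDutchieMenu(pageSource)) return [];
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return PREDEFINED_CATEGORY_SLUGS.map((slug) => `${root}/products/${slug}`);
}

console.log(
  fallbackCategoryUrls("https://example.com/stores/demo/", "<script src='https://dutchie.com/embed.js'>")
);
```

This only needs the page source, which is why it can succeed while the browser is still parked on the age gate; listing products needs the rendered menu itself.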
## Commands to Resume
```bash
# Start backend (if not running)
cd /home/kelly/dutchie-menus/backend
PORT=3012 DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npm run dev
# Check categories
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx -e "
import { pool } from './src/db/migrate';
const result = await pool.query('SELECT * FROM categories WHERE store_id = 18');
console.log(result.rows);
process.exit(0);
"
# Test age gate bypass
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx test-improved-age-gate.ts
```
## What to Tell Claude When Resuming
"Continue working on the age gate bypass for Curaleaf. We have categories created but can't scrape products because the age gate blocks us. The cookie approach didn't work. Options: try a different store, get real cookies, or switch to Playwright."

@@ -0,0 +1,71 @@
const { Pool } = require('pg');
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
const azStores = [
{ slug: 'curaleaf-az-48th-street-med', name: 'Curaleaf - 48th Street (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-83rd-dispensary-med', name: 'Curaleaf - 83rd Ave (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-bell-med', name: 'Curaleaf - Bell (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-camelback-med', name: 'Curaleaf - Camelback (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-central-med', name: 'Curaleaf - Central (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-gilbert-med', name: 'Curaleaf - Gilbert (Medical)', city: 'Gilbert' },
{ slug: 'curaleaf-az-glendale-east', name: 'Curaleaf - Glendale East', city: 'Glendale' },
{ slug: 'curaleaf-az-glendale-east-the-kind-relief-med', name: 'Curaleaf - Glendale East Kind Relief (Medical)', city: 'Glendale' },
{ slug: 'curaleaf-az-glendale-med', name: 'Curaleaf - Glendale (Medical)', city: 'Glendale' },
{ slug: 'curaleaf-az-midtown-med', name: 'Curaleaf - Midtown (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-peoria-med', name: 'Curaleaf - Peoria (Medical)', city: 'Peoria' },
{ slug: 'curaleaf-az-phoenix-med', name: 'Curaleaf - Phoenix Airport (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-queen-creek', name: 'Curaleaf - Queen Creek', city: 'Queen Creek' },
{ slug: 'curaleaf-az-queen-creek-whoa-qc-inc-med', name: 'Curaleaf - Queen Creek WHOA (Medical)', city: 'Queen Creek' },
{ slug: 'curaleaf-az-scottsdale-natural-remedy-patient-center-med', name: 'Curaleaf - Scottsdale Natural Remedy (Medical)', city: 'Scottsdale' },
{ slug: 'curaleaf-az-sedona-med', name: 'Curaleaf - Sedona (Medical)', city: 'Sedona' },
{ slug: 'curaleaf-az-tucson-med', name: 'Curaleaf - Tucson (Medical)', city: 'Tucson' },
{ slug: 'curaleaf-az-youngtown-med', name: 'Curaleaf - Youngtown (Medical)', city: 'Youngtown' }
];
async function addStores() {
const client = await pool.connect();
try {
for (const store of azStores) {
const dutchieUrl = `https://curaleaf.com/stores/${store.slug}`;
// Skip sandbox stores
if (store.slug.includes('sandbox')) continue;
const result = await client.query(`
INSERT INTO stores (
name,
slug,
dutchie_url,
active,
scrape_enabled,
logo_url
)
VALUES ($1, $2, $3, $4, $5, $6)
ON CONFLICT (slug) DO UPDATE
SET name = $1, dutchie_url = $3
RETURNING id, name
`, [
store.name,
store.slug,
dutchieUrl,
true,
true,
'https://curaleaf.com/favicon.ico' // Using favicon as logo for now
]);
console.log(`✅ Added: ${result.rows[0].name} (ID: ${result.rows[0].id})`);
}
console.log(`\n🎉 Successfully added ${azStores.length} Curaleaf Arizona stores!`);
} catch (error) {
console.error('❌ Error adding stores:', error);
} finally {
client.release();
await pool.end();
}
}
addStores();

@@ -0,0 +1,71 @@
const { Pool } = require('pg');
const pool = new Pool({
connectionString: 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'
});
const azStores = [
{ slug: 'curaleaf-az-48th-street-med', name: 'Curaleaf - 48th Street (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-83rd-dispensary-med', name: 'Curaleaf - 83rd Ave (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-bell-med', name: 'Curaleaf - Bell (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-camelback-med', name: 'Curaleaf - Camelback (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-central-med', name: 'Curaleaf - Central (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-gilbert-med', name: 'Curaleaf - Gilbert (Medical)', city: 'Gilbert' },
{ slug: 'curaleaf-az-glendale-east', name: 'Curaleaf - Glendale East', city: 'Glendale' },
{ slug: 'curaleaf-az-glendale-east-the-kind-relief-med', name: 'Curaleaf - Glendale East Kind Relief (Medical)', city: 'Glendale' },
{ slug: 'curaleaf-az-glendale-med', name: 'Curaleaf - Glendale (Medical)', city: 'Glendale' },
{ slug: 'curaleaf-az-midtown-med', name: 'Curaleaf - Midtown (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-peoria-med', name: 'Curaleaf - Peoria (Medical)', city: 'Peoria' },
{ slug: 'curaleaf-az-phoenix-med', name: 'Curaleaf - Phoenix Airport (Medical)', city: 'Phoenix' },
{ slug: 'curaleaf-az-queen-creek', name: 'Curaleaf - Queen Creek', city: 'Queen Creek' },
{ slug: 'curaleaf-az-queen-creek-whoa-qc-inc-med', name: 'Curaleaf - Queen Creek WHOA (Medical)', city: 'Queen Creek' },
{ slug: 'curaleaf-az-scottsdale-natural-remedy-patient-center-med', name: 'Curaleaf - Scottsdale Natural Remedy (Medical)', city: 'Scottsdale' },
{ slug: 'curaleaf-az-sedona-med', name: 'Curaleaf - Sedona (Medical)', city: 'Sedona' },
{ slug: 'curaleaf-az-tucson-med', name: 'Curaleaf - Tucson (Medical)', city: 'Tucson' },
{ slug: 'curaleaf-az-youngtown-med', name: 'Curaleaf - Youngtown (Medical)', city: 'Youngtown' }
];
async function addStores() {
const client = await pool.connect();
try {
for (const store of azStores) {
const dutchieUrl = `https://curaleaf.com/stores/${store.slug}`;
// Skip sandbox stores
if (store.slug.includes('sandbox')) continue;
const result = await client.query(`
INSERT INTO stores (
name,
slug,
dutchie_url,
active,
scrape_enabled,
logo_url
)
VALUES ($1, $2, $3, $4, $5, $6)
ON CONFLICT (slug) DO UPDATE
SET name = $1, dutchie_url = $3, logo_url = $6
RETURNING id, name
`, [
store.name,
store.slug,
dutchieUrl,
true,
true,
'https://curaleaf.com/favicon.ico' // Using favicon as logo for now
]);
console.log(`✅ Added: ${result.rows[0].name} (ID: ${result.rows[0].id})`);
}
console.log(`\n🎉 Successfully added ${azStores.length} Curaleaf Arizona stores!`);
} catch (error) {
console.error('❌ Error adding stores:', error);
} finally {
client.release();
await pool.end();
}
}
addStores();

@@ -0,0 +1,22 @@
import { pool } from './src/db/migrate.js';
async function main() {
console.log('Adding discount columns to products table...');
try {
await pool.query(`
ALTER TABLE products
ADD COLUMN IF NOT EXISTS discount_type VARCHAR(50),
ADD COLUMN IF NOT EXISTS discount_value VARCHAR(100);
`);
console.log('✅ Successfully added discount_type and discount_value columns');
} catch (error: any) {
console.error('❌ Error adding columns:', error.message);
throw error;
} finally {
await pool.end();
}
}
main().catch(console.error);

@@ -0,0 +1,82 @@
import { pool } from './src/db/migrate';
async function addGeoFields() {
console.log('🗺️ Adding geo-location fields...\n');
try {
await pool.query(`
ALTER TABLE stores
ADD COLUMN IF NOT EXISTS latitude DECIMAL(10, 8),
ADD COLUMN IF NOT EXISTS longitude DECIMAL(11, 8),
ADD COLUMN IF NOT EXISTS region VARCHAR(100),
ADD COLUMN IF NOT EXISTS market_area VARCHAR(255),
ADD COLUMN IF NOT EXISTS timezone VARCHAR(50)
`);
console.log('✅ Added geo fields to stores table');
// Create indexes for geo queries
await pool.query(`
CREATE INDEX IF NOT EXISTS idx_stores_location ON stores(latitude, longitude) WHERE latitude IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_stores_city_state ON stores(city, state);
CREATE INDEX IF NOT EXISTS idx_stores_region ON stores(region);
CREATE INDEX IF NOT EXISTS idx_stores_market ON stores(market_area);
`);
console.log('✅ Created geo indexes');
// Create location-based views
await pool.query(`
CREATE OR REPLACE VIEW stores_by_region AS
SELECT
region,
state,
COUNT(*) as store_count,
COUNT(DISTINCT city) as cities,
array_agg(DISTINCT name ORDER BY name) as store_names
FROM stores
WHERE active = true
GROUP BY region, state
ORDER BY store_count DESC
`);
await pool.query(`
CREATE OR REPLACE VIEW market_coverage AS
SELECT
city,
state,
zip,
COUNT(*) as dispensaries,
array_agg(name ORDER BY name) as store_names,
COUNT(DISTINCT id) as unique_stores
FROM stores
WHERE active = true
GROUP BY city, state, zip
ORDER BY dispensaries DESC
`);
console.log('✅ Created location views');
console.log('\n✅ Geo-location setup complete!');
console.log('\n📊 Available location views:');
console.log(' - stores_by_region: Stores grouped by region/state');
console.log(' - market_coverage: Dispensary density by city');
console.log('\n💡 Your database now supports:');
console.log(' ✅ Lead Generation (contact info + locations)');
console.log(' ✅ Market Research (pricing + inventory data)');
console.log(' ✅ Investment Planning (market coverage + trends)');
console.log(' ✅ Retail Partner Discovery (store directory)');
console.log(' ✅ Geo-targeted Campaigns (lat/long + regions)');
console.log(' ✅ Trend Analysis (price history + timestamps)');
console.log(' ✅ Directory/App Creation (full store catalog)');
console.log(' ✅ Delivery Optimization (locations + addresses)');
} catch (error) {
console.error('❌ Error:', error);
} finally {
await pool.end();
}
}
addGeoFields();

@@ -0,0 +1,215 @@
import { pool } from './src/db/migrate';
async function addPriceHistory() {
console.log('💰 Adding price history tracking...\n');
const client = await pool.connect();
try {
await client.query('BEGIN');
// Step 1: Create price_history table
console.log('1. Creating price_history table...');
await client.query(`
CREATE TABLE IF NOT EXISTS price_history (
id SERIAL PRIMARY KEY,
product_id INTEGER REFERENCES products(id) ON DELETE CASCADE,
store_id INTEGER REFERENCES stores(id) ON DELETE CASCADE,
brand_id INTEGER REFERENCES brands(id) ON DELETE SET NULL,
category_id INTEGER REFERENCES categories(id) ON DELETE SET NULL,
product_name VARCHAR(500),
price DECIMAL(10, 2),
sale_price DECIMAL(10, 2),
original_price DECIMAL(10, 2),
discount_percentage DECIMAL(5, 2),
discount_amount DECIMAL(10, 2),
in_stock BOOLEAN DEFAULT true,
is_special BOOLEAN DEFAULT false,
recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
`);
// Step 2: Add indexes for fast price queries
console.log('2. Creating indexes for price queries...');
await client.query(`
CREATE INDEX IF NOT EXISTS idx_price_history_product ON price_history(product_id, recorded_at DESC);
CREATE INDEX IF NOT EXISTS idx_price_history_store ON price_history(store_id, recorded_at DESC);
CREATE INDEX IF NOT EXISTS idx_price_history_brand ON price_history(brand_id, recorded_at DESC);
CREATE INDEX IF NOT EXISTS idx_price_history_date ON price_history(recorded_at DESC);
CREATE INDEX IF NOT EXISTS idx_price_history_price_change ON price_history(product_id, price, recorded_at DESC);
`);
// Step 3: Create function to log price changes
console.log('3. Creating price change trigger function...');
await client.query(`
CREATE OR REPLACE FUNCTION log_price_change()
RETURNS TRIGGER AS $$
BEGIN
-- Only log if price actually changed or product just appeared
IF (TG_OP = 'INSERT') OR
(OLD.price IS DISTINCT FROM NEW.price) OR
(OLD.sale_price IS DISTINCT FROM NEW.sale_price) OR
(OLD.discount_percentage IS DISTINCT FROM NEW.discount_percentage) OR
(OLD.in_stock IS DISTINCT FROM NEW.in_stock) THEN
INSERT INTO price_history (
product_id, store_id, brand_id, category_id, product_name,
price, sale_price, original_price, discount_percentage, discount_amount,
in_stock, is_special, recorded_at
) VALUES (
NEW.id, NEW.store_id, NEW.brand_id, NEW.category_id, NEW.name,
NEW.price, NEW.sale_price, NEW.original_price,
NEW.discount_percentage, NEW.discount_amount,
NEW.in_stock, NEW.is_special, CURRENT_TIMESTAMP
);
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
`);
// Step 4: Create trigger on products table
console.log('4. Creating trigger on products table...');
await client.query(`
DROP TRIGGER IF EXISTS price_change_trigger ON products;
CREATE TRIGGER price_change_trigger
AFTER INSERT OR UPDATE ON products
FOR EACH ROW
EXECUTE FUNCTION log_price_change();
`);
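// Step 5: Populate initial price history from existing products.
// Note: price_history as created above has no unique constraint other than its
// serial id, so the ON CONFLICT DO NOTHING below will not prevent duplicate
// rows if this script is run more than once.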
console.log('5. Populating initial price history...');
await client.query(`
INSERT INTO price_history (
product_id, store_id, brand_id, category_id, product_name,
price, sale_price, original_price, discount_percentage, discount_amount,
in_stock, is_special, recorded_at
)
SELECT
id, store_id, brand_id, category_id, name,
price, sale_price, original_price, discount_percentage, discount_amount,
in_stock, is_special, COALESCE(first_seen_at, created_at)
FROM products
WHERE price IS NOT NULL
ON CONFLICT DO NOTHING
`);
const countResult = await client.query('SELECT COUNT(*) FROM price_history');
console.log(`${countResult.rows[0].count} price records created`);
// Step 6: Create helpful views for price monitoring
console.log('6. Creating price monitoring views...');
// Price changes view
await client.query(`
CREATE OR REPLACE VIEW price_changes AS
WITH price_with_previous AS (
SELECT
ph.*,
LAG(ph.price) OVER (PARTITION BY ph.product_id ORDER BY ph.recorded_at) as previous_price,
LAG(ph.recorded_at) OVER (PARTITION BY ph.product_id ORDER BY ph.recorded_at) as previous_date
FROM price_history ph
)
SELECT
pwp.product_id,
pwp.product_name,
s.name as store_name,
b.name as brand_name,
pwp.previous_price,
pwp.price as current_price,
pwp.price - pwp.previous_price as price_change,
ROUND(((pwp.price - pwp.previous_price) / NULLIF(pwp.previous_price, 0) * 100)::numeric, 2) as price_change_percent,
pwp.previous_date,
pwp.recorded_at as current_date
FROM price_with_previous pwp
JOIN stores s ON pwp.store_id = s.id
LEFT JOIN brands b ON pwp.brand_id = b.id
WHERE pwp.previous_price IS NOT NULL
AND pwp.price IS DISTINCT FROM pwp.previous_price
ORDER BY pwp.recorded_at DESC
`);
// Current prices view
await client.query(`
CREATE OR REPLACE VIEW current_prices AS
SELECT DISTINCT ON (product_id)
ph.product_id,
ph.product_name,
s.name as store_name,
b.name as brand_name,
c.name as category_name,
ph.price,
ph.sale_price,
ph.discount_percentage,
ph.in_stock,
ph.is_special,
ph.recorded_at as last_updated
FROM price_history ph
JOIN stores s ON ph.store_id = s.id
LEFT JOIN brands b ON ph.brand_id = b.id
LEFT JOIN categories c ON ph.category_id = c.id
ORDER BY ph.product_id, ph.recorded_at DESC
`);
// Price trends view (last 30 days)
await client.query(`
CREATE OR REPLACE VIEW price_trends_30d AS
WITH price_data AS (
SELECT
ph.product_id,
ph.product_name,
s.name as store_name,
ph.price,
ph.recorded_at,
ROW_NUMBER() OVER (PARTITION BY ph.product_id ORDER BY ph.recorded_at ASC) as first_row,
ROW_NUMBER() OVER (PARTITION BY ph.product_id ORDER BY ph.recorded_at DESC) as last_row
FROM price_history ph
JOIN stores s ON ph.store_id = s.id
WHERE ph.recorded_at >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
product_id,
product_name,
store_name,
COUNT(*) as price_points,
MIN(price) as min_price,
MAX(price) as max_price,
ROUND(AVG(price)::numeric, 2) as avg_price,
MAX(CASE WHEN first_row = 1 THEN price END) as starting_price,
MAX(CASE WHEN last_row = 1 THEN price END) as current_price
FROM price_data
GROUP BY product_id, product_name, store_name
`);
await client.query('COMMIT');
console.log('\n✅ Price history tracking enabled!');
console.log('\n📊 Available price monitoring views:');
console.log(' - price_changes: See all price increases/decreases');
console.log(' - current_prices: Latest price for each product');
console.log(' - price_trends_30d: Price trends over last 30 days');
console.log('\n💡 Example queries:');
console.log(' -- Recent price increases:');
console.log(' SELECT * FROM price_changes WHERE price_change > 0 ORDER BY "current_date" DESC LIMIT 10;');
console.log(' ');
console.log(' -- Products on sale:');
console.log(' SELECT * FROM current_prices WHERE sale_price IS NOT NULL;');
console.log(' ');
console.log(' -- Biggest price drops:');
console.log(' SELECT * FROM price_changes WHERE price_change < 0 ORDER BY price_change ASC LIMIT 10;');
} catch (error) {
await client.query('ROLLBACK');
console.error('❌ Error:', error);
throw error;
} finally {
client.release();
await pool.end();
}
}
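// The error is already logged inside addPriceHistory; just set a failing exit code.
addPriceHistory().catch(() => {
process.exitCode = 1;
});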

@@ -0,0 +1,45 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function addLocationColumns() {
try {
console.log('Adding location columns to proxies table...\n');
await pool.query(`
ALTER TABLE proxies
ADD COLUMN IF NOT EXISTS city VARCHAR(100),
ADD COLUMN IF NOT EXISTS state VARCHAR(100),
ADD COLUMN IF NOT EXISTS country VARCHAR(100),
ADD COLUMN IF NOT EXISTS country_code VARCHAR(10),
ADD COLUMN IF NOT EXISTS latitude DECIMAL(10, 8),
ADD COLUMN IF NOT EXISTS longitude DECIMAL(11, 8)
`);
console.log('✅ Location columns added successfully\n');
// Show updated schema
const result = await pool.query(`
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'proxies'
ORDER BY ordinal_position
`);
console.log('Updated proxies table schema:');
console.log('─'.repeat(60));
result.rows.forEach(row => {
console.log(` ${row.column_name}: ${row.data_type}`);
});
console.log('─'.repeat(60));
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
addLocationColumns();

@@ -0,0 +1,74 @@
import { pool } from './src/db/migrate';
const solFlowerStores = [
{
name: 'Sol Flower - Sun City',
slug: 'sol-flower-sun-city',
dutchie_url: 'https://dutchie.com/dispensary/sol-flower-dispensary',
},
{
name: 'Sol Flower - South Tucson',
slug: 'sol-flower-south-tucson',
dutchie_url: 'https://dutchie.com/dispensary/sol-flower-dispensary-south-tucson',
},
{
name: 'Sol Flower - North Tucson',
slug: 'sol-flower-north-tucson',
dutchie_url: 'https://dutchie.com/dispensary/sol-flower-dispensary-north-tucson',
},
{
name: 'Sol Flower - McClintock (Tempe)',
slug: 'sol-flower-mcclintock',
dutchie_url: 'https://dutchie.com/dispensary/sol-flower-dispensary-mcclintock',
},
{
name: 'Sol Flower - Deer Valley (Phoenix)',
slug: 'sol-flower-deer-valley',
dutchie_url: 'https://dutchie.com/dispensary/sol-flower-dispensary-deer-valley',
},
];
async function addSolFlowerStores() {
console.log('🌻 Adding Sol Flower stores to database...\n');
try {
for (const store of solFlowerStores) {
// Check if store already exists
const existing = await pool.query(
'SELECT id FROM stores WHERE slug = $1',
[store.slug]
);
if (existing.rows.length > 0) {
console.log(`⏭️ Skipping ${store.name} - already exists (ID: ${existing.rows[0].id})`);
continue;
}
// Insert store
const result = await pool.query(
`INSERT INTO stores (name, slug, dutchie_url, active, scrape_enabled, logo_url)
VALUES ($1, $2, $3, true, true, $4)
RETURNING id`,
[store.name, store.slug, store.dutchie_url, 'https://dutchie.com/favicon.ico']
);
console.log(`✅ Added ${store.name} (ID: ${result.rows[0].id})`);
}
console.log('\n✅ All Sol Flower stores added successfully!');
// Show all stores
console.log('\n📊 All stores in database:');
const allStores = await pool.query(
'SELECT id, name, dutchie_url FROM stores ORDER BY id'
);
console.table(allStores.rows);
} catch (error) {
console.error('❌ Error:', error);
} finally {
await pool.end();
}
}
addSolFlowerStores();

@@ -0,0 +1,90 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function addTestBrands() {
try {
// Update the store slug to match the local URL
console.log('Updating store slug...');
await pool.query(
`UPDATE stores
SET slug = $1,
dutchie_url = $2,
updated_at = NOW()
WHERE slug = $3`,
[
'curaleaf-az-48th-street',
'https://curaleaf.com/stores/curaleaf-dispensary-48th-street',
'curaleaf-az-48th-street-med'
]
);
console.log('✓ Store slug updated\n');
// Get the store ID
const storeResult = await pool.query(
'SELECT id FROM stores WHERE slug = $1',
['curaleaf-az-48th-street']
);
if (storeResult.rows.length === 0) {
console.log('Store not found!');
return;
}
const storeId = storeResult.rows[0].id;
// Sample products with brands commonly found at dispensaries
const testProducts = [
{ name: 'Select Elite Live Resin Cartridge - Clementine', brand: 'Select', price: 45.00, category: 'vape-pens', thc: 82.5 },
{ name: 'Curaleaf Flower - Blue Dream', brand: 'Curaleaf', price: 35.00, category: 'flower', thc: 22.0 },
{ name: 'Grassroots RSO Syringe', brand: 'Grassroots', price: 40.00, category: 'concentrates', thc: 75.0 },
{ name: 'Stiiizy Pod - Skywalker OG', brand: 'Stiiizy', price: 50.00, category: 'vape-pens', thc: 85.0 },
{ name: 'Cookies Flower - Gary Payton', brand: 'Cookies', price: 55.00, category: 'flower', thc: 28.0 },
{ name: 'Raw Garden Live Resin - Wedding Cake', brand: 'Raw Garden', price: 48.00, category: 'concentrates', thc: 80.5 },
{ name: 'Jeeter Pre-Roll - Zkittlez', brand: 'Jeeter', price: 12.00, category: 'pre-rolls', thc: 24.0 },
{ name: 'Kiva Camino Gummies - Wild Cherry', brand: 'Kiva', price: 20.00, category: 'edibles', thc: 5.0 },
{ name: 'Wyld Gummies - Raspberry', brand: 'Wyld', price: 18.00, category: 'edibles', thc: 10.0 },
{ name: 'Papa & Barkley Releaf Balm', brand: 'Papa & Barkley', price: 45.00, category: 'topicals', thc: 3.0 },
{ name: 'Brass Knuckles Cartridge - Gorilla Glue', brand: 'Brass Knuckles', price: 42.00, category: 'vape-pens', thc: 83.0 },
{ name: 'Heavy Hitters Ultra Extract - Sour Diesel', brand: 'Heavy Hitters', price: 55.00, category: 'concentrates', thc: 90.0 },
{ name: 'Cresco Liquid Live Resin - Pineapple Express', brand: 'Cresco', price: 50.00, category: 'vape-pens', thc: 87.0 },
{ name: 'Verano Pre-Roll - Mag Landrace', brand: 'Verano', price: 15.00, category: 'pre-rolls', thc: 26.0 },
{ name: 'Select Nano Gummies - Watermelon', brand: 'Select', price: 22.00, category: 'edibles', thc: 10.0 }
];
console.log(`Inserting ${testProducts.length} test products with brands...`);
console.log('─'.repeat(80));
for (const product of testProducts) {
await pool.query(
`INSERT INTO products (
store_id, name, brand, price, thc_percentage,
dutchie_url, in_stock
)
VALUES ($1, $2, $3, $4, $5, $6, true)`,
[
storeId,
product.name,
product.brand,
product.price,
product.thc,
`https://curaleaf.com/stores/curaleaf-dispensary-48th-street/product/${product.name.toLowerCase().replace(/\s+/g, '-')}`
]
);
console.log(`${product.brand} - ${product.name}`);
}
console.log('─'.repeat(80));
console.log(`\n✅ Added ${testProducts.length} test products with brands to the store\n`);
console.log(`View at: http://localhost:5174/stores/az/curaleaf/curaleaf-az-48th-street\n`);
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
addTestBrands();

@@ -0,0 +1,23 @@
const { Pool } = require('pg');
const pool = new Pool({
connectionString: process.env.DATABASE_URL || 'postgresql://kelly:kelly@localhost:5432/hub'
});
(async () => {
try {
await pool.query(`
UPDATE proxy_test_jobs
SET status = 'cancelled',
completed_at = CURRENT_TIMESTAMP,
updated_at = CURRENT_TIMESTAMP
WHERE id = 2
`);
console.log('✅ Cancelled job ID 2');
process.exit(0);
} catch (error) {
console.error('Error:', error);
process.exit(1);
}
})();

@@ -0,0 +1,53 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { writeFileSync } from 'fs';
puppeteer.use(StealthPlugin());
async function captureAgeGateCookies() {
const browser = await puppeteer.launch({
headless: false, // Visible browser so you can complete age gate manually
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled'
]
});
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
console.log('\n===========================================');
console.log('INSTRUCTIONS:');
console.log('1. A browser window will open');
console.log('2. Complete the age gate manually');
console.log('3. Wait until you see the store page load');
console.log('4. Press ENTER in this terminal when done');
console.log('===========================================\n');
await page.goto('https://curaleaf.com/stores/curaleaf-az-48th-street');
// Wait for user to complete age gate manually
await new Promise((resolve) => {
process.stdin.once('data', () => resolve(null));
});
// Get cookies after age gate
const cookies = await page.cookies();
console.log('\nCaptured cookies:', JSON.stringify(cookies, null, 2));
// Save cookies to file
writeFileSync(
'/home/kelly/dutchie-menus/backend/curaleaf-cookies.json',
JSON.stringify(cookies, null, 2)
);
console.log('\n✅ Cookies saved to curaleaf-cookies.json');
console.log('Current URL:', page.url());
await browser.close();
process.exit(0);
}
captureAgeGateCookies();

@@ -0,0 +1,27 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function check() {
try {
const result = await pool.query("SELECT id, name, slug, dutchie_url FROM stores WHERE slug LIKE '%48th%'");
console.log('Stores with "48th" in slug:');
console.log('─'.repeat(80));
result.rows.forEach(store => {
console.log(`ID: ${store.id}`);
console.log(`Name: ${store.name}`);
console.log(`Slug: ${store.slug}`);
console.log(`URL: ${store.dutchie_url}`);
console.log('─'.repeat(80));
});
console.log(`Total: ${result.rowCount}`);
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
check();

@@ -0,0 +1,23 @@
import { pool } from './src/db/migrate.js';
async function main() {
const result = await pool.query(`
SELECT id, brand_name, status, products_scraped
FROM brand_jobs
WHERE dispensary_id = 112
AND status = 'completed'
ORDER BY products_scraped DESC
LIMIT 10
`);
console.log('\nCompleted Brand Jobs:');
console.log('='.repeat(80));
result.rows.forEach((row: any) => {
console.log(`${row.id}: ${row.brand_name} - ${row.products_scraped} products`);
});
console.log('='.repeat(80) + '\n');
await pool.end();
}
main().catch(console.error);

@@ -0,0 +1,22 @@
import { pool } from './src/db/migrate.js';
async function checkBrandNames() {
const result = await pool.query(`
SELECT brand_slug, brand_name
FROM brand_scrape_jobs
WHERE dispensary_id = 112
ORDER BY id
LIMIT 20
`);
console.log('\nBrand Names in Database:');
console.log('='.repeat(60));
result.rows.forEach((row, idx) => {
console.log(`${idx + 1}. slug: ${row.brand_slug}`);
console.log(` name: ${row.brand_name}`);
});
await pool.end();
}
checkBrandNames().catch(console.error);

@@ -0,0 +1,17 @@
import { pool } from './src/db/migrate.js';
async function checkTable() {
const result = await pool.query(`
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'brands'
ORDER BY ordinal_position
`);
console.log('Brands table structure:');
console.table(result.rows);
await pool.end();
}
checkTable().catch(console.error);

@@ -0,0 +1,55 @@
import pg from 'pg';
const { Pool } = pg;
const pool = new Pool({
connectionString: process.env.DATABASE_URL
});
async function checkBrands() {
try {
// Get dispensary info
const dispensaryResult = await pool.query(
"SELECT id, name FROM dispensaries WHERE dutchie_slug = 'AZ-Deeply-Rooted'"
);
if (dispensaryResult.rows.length === 0) {
console.log('Dispensary not found');
return;
}
const dispensary = dispensaryResult.rows[0];
console.log(`Dispensary: ${dispensary.name} (ID: ${dispensary.id})`);
// Get brand count
const brandCountResult = await pool.query(
`SELECT COUNT(DISTINCT brand) as brand_count
FROM products
WHERE dispensary_id = $1 AND brand IS NOT NULL AND brand != ''`,
[dispensary.id]
);
console.log(`\nTotal distinct brands: ${brandCountResult.rows[0].brand_count}`);
// List all brands
const brandsResult = await pool.query(
`SELECT brand, COUNT(*) as product_count
FROM products
WHERE dispensary_id = $1 AND brand IS NOT NULL AND brand != ''
GROUP BY brand
ORDER BY brand`,
[dispensary.id]
);
console.log(`\nBrands found:`);
brandsResult.rows.forEach(row => {
console.log(` - ${row.brand} (${row.product_count} products)`);
});
} catch (error) {
console.error('Error:', error);
} finally {
await pool.end();
}
}
checkBrands();

@@ -0,0 +1,27 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function checkDB() {
try {
const result = await pool.query(`
SELECT COUNT(*) as total,
COUNT(*) FILTER (WHERE active = true) as active
FROM proxies
`);
console.log('Proxies:', result.rows[0]);
const stores = await pool.query('SELECT slug FROM stores LIMIT 5');
console.log('Sample stores:', stores.rows.map(r => r.slug));
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
checkDB();

@@ -0,0 +1,74 @@
import pg from 'pg';
const { Pool } = pg;
const pool = new Pool({
connectionString: process.env.DATABASE_URL
});
async function check() {
try {
// Get dispensary info - try different column names
const dispensaryResult = await pool.query(
"SELECT * FROM dispensaries WHERE name ILIKE '%Deeply Rooted%' LIMIT 1"
);
if (dispensaryResult.rows.length === 0) {
console.log('Dispensary not found. Listing all dispensaries:');
const all = await pool.query("SELECT id, name FROM dispensaries LIMIT 10");
all.rows.forEach(d => console.log(` ID ${d.id}: ${d.name}`));
return;
}
const dispensary = dispensaryResult.rows[0];
console.log(`Dispensary: ${dispensary.name} (ID: ${dispensary.id})`);
console.log(`Columns:`, Object.keys(dispensary));
// Get product count
const productCountResult = await pool.query(
`SELECT COUNT(*) as total_products FROM products WHERE dispensary_id = $1`,
[dispensary.id]
);
console.log(`\nTotal products: ${productCountResult.rows[0].total_products}`);
// Get brand count and list
const brandCountResult = await pool.query(
`SELECT COUNT(DISTINCT brand) as brand_count
FROM products
WHERE dispensary_id = $1 AND brand IS NOT NULL AND brand != ''`,
[dispensary.id]
);
console.log(`Total distinct brands: ${brandCountResult.rows[0].brand_count}`);
// List all brands
const brandsResult = await pool.query(
`SELECT brand, COUNT(*) as product_count
FROM products
WHERE dispensary_id = $1 AND brand IS NOT NULL AND brand != ''
GROUP BY brand
ORDER BY product_count DESC`,
[dispensary.id]
);
console.log(`\nBrands with products:`);
brandsResult.rows.forEach(row => {
console.log(` - ${row.brand} (${row.product_count} products)`);
});
// Count products without brands
const noBrandResult = await pool.query(
`SELECT COUNT(*) as no_brand_count
FROM products
WHERE dispensary_id = $1 AND (brand IS NULL OR brand = '')`,
[dispensary.id]
);
console.log(`\nProducts without brand: ${noBrandResult.rows[0].no_brand_count}`);
} catch (error) {
console.error('Error:', error);
} finally {
await pool.end();
}
}
check();

@@ -0,0 +1,33 @@
import pkg from 'pg';
const { Pool } = pkg;
const pool = new Pool({
connectionString: process.env.DATABASE_URL
});
async function main() {
const result = await pool.query(`
SELECT
COUNT(*) as total_products,
COUNT(CASE WHEN discount_type IS NOT NULL AND discount_value IS NOT NULL THEN 1 END) as products_with_discounts
FROM products
`);
console.log('Product Count:');
console.log(result.rows[0]);
// Get a sample of products with discounts
const sample = await pool.query(`
SELECT name, brand, regular_price, sale_price, discount_type, discount_value
FROM products
WHERE discount_type IS NOT NULL AND discount_value IS NOT NULL
LIMIT 5
`);
console.log('\nSample Products with Discounts:');
console.log(sample.rows);
await pool.end();
}
main().catch(console.error);

@@ -0,0 +1,25 @@
import { pool } from './src/db/migrate.js';
async function main() {
const result = await pool.query(`
SELECT
COUNT(*) as total,
COUNT(regular_price) as with_prices,
COUNT(*) - COUNT(regular_price) as without_prices
FROM products
WHERE dispensary_id = 112
`);
const stats = result.rows[0];
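// pg returns COUNT(*) as a string; guard against zero products so we never print "NaN%"
const total = parseInt(stats.total, 10);
const pct = total > 0 ? ((parseInt(stats.with_prices, 10) / total) * 100).toFixed(1) : '0.0';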
console.log('\nENRICHMENT PROGRESS:');
console.log(` Total products: ${stats.total}`);
console.log(` With prices: ${stats.with_prices} (${pct}%)`);
console.log(` Without prices: ${stats.without_prices}`);
console.log('');
await pool.end();
}
main().catch(console.error);

@@ -0,0 +1,89 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';
async function checkForPrices() {
const proxy = await getRandomProxy();
if (!proxy) {
console.log('No proxy available');
process.exit(1);
}
const browser = await firefox.launch({ headless: true });
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
});
const page = await context.newPage();
const brandUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/brands/alien-labs';
console.log(`Loading: ${brandUrl}`);
await page.goto(brandUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForTimeout(5000);
// Check for any dollar signs on the entire page
const pageText = await page.evaluate(() => document.body.textContent);
const hasDollarSigns = pageText?.includes('$');
console.log('\n' + '='.repeat(80));
console.log('PRICE AVAILABILITY CHECK:');
console.log('='.repeat(80));
console.log(`\nPage contains '$' symbol: ${hasDollarSigns ? 'YES' : 'NO'}`);
if (hasDollarSigns) {
// Find all text containing dollar signs
const priceElements = await page.evaluate(() => {
const walker = document.createTreeWalker(
document.body,
NodeFilter.SHOW_TEXT,
null
);
const results: string[] = [];
let node;
while (node = walker.nextNode()) {
const text = node.textContent?.trim();
if (text && text.includes('$')) {
results.push(text);
}
}
return results.slice(0, 10); // First 10 instances
});
console.log('\nText containing "$":');
priceElements.forEach((text, idx) => {
console.log(` ${idx + 1}. ${text.substring(0, 100)}`);
});
}
// Check specifically within product cards
const productCardPrices = await page.evaluate(() => {
const cards = Array.from(document.querySelectorAll('a[href*="/product/"]'));
return cards.slice(0, 5).map(card => ({
text: card.textContent?.substring(0, 200),
hasDollar: card.textContent?.includes('$') || false
}));
});
console.log('\nFirst 5 Product Cards:');
productCardPrices.forEach((card, idx) => {
console.log(`\n Card ${idx + 1}:`);
console.log(` Has $: ${card.hasDollar}`);
console.log(` Text: ${card.text}`);
});
console.log('\n' + '='.repeat(80));
await browser.close();
process.exit(0);
}
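// Exit explicitly on failure; otherwise the still-open browser keeps the process alive
checkForPrices().catch((error) => {
console.error(error);
process.exit(1);
});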

@@ -0,0 +1,33 @@
const { Pool } = require('pg');
const pool = new Pool({
connectionString: process.env.DATABASE_URL || 'postgresql://kelly:kelly@localhost:5432/hub'
});
(async () => {
try {
const result = await pool.query(`
SELECT id, status, total_proxies, tested_proxies, passed_proxies, failed_proxies,
created_at, started_at, completed_at
FROM proxy_test_jobs
ORDER BY created_at DESC
LIMIT 5
`);
console.log('\n📊 Recent Proxy Test Jobs:');
console.log('='.repeat(80));
result.rows.forEach(job => {
console.log(`\nJob ID: ${job.id}`);
console.log(`Status: ${job.status}`);
console.log(`Progress: ${job.tested_proxies}/${job.total_proxies} (${job.passed_proxies} passed, ${job.failed_proxies} failed)`);
console.log(`Created: ${job.created_at}`);
console.log(`Started: ${job.started_at || 'N/A'}`);
console.log(`Completed: ${job.completed_at || 'N/A'}`);
});
process.exit(0);
} catch (error) {
console.error('Error:', error);
process.exit(1);
}
})();

@@ -0,0 +1,40 @@
import pg from 'pg';
const client = new pg.Client({
connectionString: process.env.DATABASE_URL,
});
async function checkJobs() {
await client.connect();
const statusRes = await client.query(`
SELECT status, COUNT(*) as count
FROM brand_scrape_jobs
WHERE dispensary_id = 112
GROUP BY status
ORDER BY status
`);
console.log('\n📊 Job Status Summary:');
console.log('====================');
statusRes.rows.forEach(row => {
console.log(`${row.status}: ${row.count}`);
});
const activeRes = await client.query(`
SELECT worker_id, COUNT(*) as count
FROM brand_scrape_jobs
WHERE dispensary_id = 112 AND status = 'in_progress'
GROUP BY worker_id
`);
console.log('\n👷 Active Workers:');
console.log('==================');
activeRes.rows.forEach(row => {
console.log(`${row.worker_id}: ${row.count} jobs`);
});
await client.end();
}
checkJobs();

@@ -0,0 +1,110 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { Pool } from 'pg';
puppeteer.use(StealthPlugin());
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function check() {
let browser;
try {
const proxyResult = await pool.query(`SELECT host, port, protocol FROM proxies ORDER BY RANDOM() LIMIT 1`);
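const proxy = proxyResult.rows[0];
// Bail out early if the proxies table is empty instead of failing on proxy.protocol
if (!proxy) {
console.log('No proxy available');
return;
}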
browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox', `--proxy-server=${proxy.protocol}://${proxy.host}:${proxy.port}`]
});
const page = await browser.newPage();
await page.setUserAgent('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
console.log('🔍 CHECKING FOR DATA LEAKS\n');
await page.goto('https://curaleaf.com/stores/curaleaf-dispensary-phoenix-airport/brands', {
waitUntil: 'networkidle2',
timeout: 60000
});
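// page.waitForTimeout() was removed in newer Puppeteer releases; a plain timer works in every version
await new Promise((resolve) => setTimeout(resolve, 5000));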
// Check what browser exposes
const browserData = await page.evaluate(() => ({
// Automation detection
webdriver: navigator.webdriver,
hasHeadlessUA: /headless/i.test(navigator.userAgent),
// User agent
userAgent: navigator.userAgent,
// Chrome detection
hasChrome: typeof (window as any).chrome !== 'undefined',
chromeKeys: (window as any).chrome ? Object.keys((window as any).chrome) : [],
// Permissions
permissions: navigator.permissions ? 'exists' : 'missing',
// Languages
languages: navigator.languages,
language: navigator.language,
// Plugins
pluginCount: navigator.plugins.length,
// Platform
platform: navigator.platform,
// Screen
screenWidth: screen.width,
screenHeight: screen.height,
// JavaScript working?
jsWorking: true,
// Page content
title: document.title,
bodyLength: document.body.innerHTML.length,
hasReactRoot: document.getElementById('__next') !== null,
scriptTags: document.querySelectorAll('script').length
}));
console.log('📋 BROWSER FINGERPRINT:');
console.log('─'.repeat(60));
console.log('navigator.webdriver:', browserData.webdriver, browserData.webdriver ? '❌ LEAKED!' : '✅');
console.log('navigator.userAgent:', browserData.userAgent);
console.log('Has "headless" in UA:', browserData.hasHeadlessUA, browserData.hasHeadlessUA ? '❌' : '✅');
console.log('window.chrome exists:', browserData.hasChrome, browserData.hasChrome ? '✅' : '❌ SUSPICIOUS');
console.log('Chrome keys:', browserData.chromeKeys.join(', '));
console.log('Languages:', browserData.languages);
console.log('Platform:', browserData.platform);
console.log('Plugins:', browserData.pluginCount);
console.log('\n📄 PAGE STATE:');
console.log('─'.repeat(60));
console.log('JavaScript executing:', browserData.jsWorking ? '✅ YES' : '❌ NO');
console.log('Page title:', `"${browserData.title}"`);
console.log('Body HTML size:', browserData.bodyLength, 'chars');
console.log('React root exists:', browserData.hasReactRoot ? '✅' : '❌');
console.log('Script tags:', browserData.scriptTags);
if (browserData.bodyLength < 1000) {
console.log('\n⚠️ PROBLEM: Body too small! JS likely failed to load/execute');
}
if (!browserData.title) {
console.log('⚠️ PROBLEM: No page title! Page didn\'t render');
}
} catch (error: any) {
console.error('❌', error.message);
} finally {
if (browser) await browser.close();
await pool.end();
}
}
check();

@@ -0,0 +1,72 @@
import { pool } from './src/db/migrate.js';
async function checkProductData() {
// Get a few recently saved products
const result = await pool.query(`
SELECT
slug,
name,
brand,
variant,
regular_price,
sale_price,
thc_percentage,
cbd_percentage,
strain_type,
in_stock,
stock_status,
image_url
FROM products
WHERE dispensary_id = 112
AND brand IN ('(the) Essence', 'Abundant Organics', 'AAchieve', 'Alien Labs')
ORDER BY updated_at DESC
LIMIT 10
`);
console.log('\n📊 Recently Saved Products:');
console.log('='.repeat(100));
result.rows.forEach((row, idx) => {
console.log(`\n${idx + 1}. ${row.name} (${row.brand})`);
console.log(` Variant: ${row.variant || 'N/A'}`);
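// Only add the "$" / "%" decoration when a value exists (avoids printing "$N/A" or "N/A%")
console.log(` Regular Price: ${row.regular_price != null ? `$${row.regular_price}` : 'N/A'}`);
console.log(` Sale Price: ${row.sale_price != null ? `$${row.sale_price}` : 'N/A'}`);
console.log(` THC %: ${row.thc_percentage != null ? `${row.thc_percentage}%` : 'N/A'}`);
console.log(` CBD %: ${row.cbd_percentage != null ? `${row.cbd_percentage}%` : 'N/A'}`);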
console.log(` Strain: ${row.strain_type || 'N/A'}`);
console.log(` Stock: ${row.stock_status || (row.in_stock ? 'In stock' : 'Out of stock')}`);
console.log(` Image: ${row.image_url ? '✓' : 'N/A'}`);
});
console.log('\n' + '='.repeat(100));
// Count how many products have complete data
const stats = await pool.query(`
SELECT
COUNT(*) as total,
COUNT(regular_price) as has_price,
COUNT(thc_percentage) as has_thc,
COUNT(cbd_percentage) as has_cbd,
COUNT(variant) as has_variant,
COUNT(strain_type) as has_strain,
COUNT(image_url) as has_image
FROM products
WHERE dispensary_id = 112
AND brand IN ('(the) Essence', 'Abundant Organics', 'AAchieve', 'Alien Labs')
`);
const stat = stats.rows[0];
// pg returns COUNT(*) as a string; convert once and avoid "NaN%" when no rows match
const total = Number(stat.total);
const pct = (count: string) => (total > 0 ? Math.round((Number(count) / total) * 100) : 0);
console.log('\n📈 Data Completeness for Recently Scraped Brands:');
console.log(` Total products: ${stat.total}`);
console.log(` Has price: ${stat.has_price} (${pct(stat.has_price)}%)`);
console.log(` Has THC%: ${stat.has_thc} (${pct(stat.has_thc)}%)`);
console.log(` Has CBD%: ${stat.has_cbd} (${pct(stat.has_cbd)}%)`);
console.log(` Has variant: ${stat.has_variant} (${pct(stat.has_variant)}%)`);
console.log(` Has strain type: ${stat.has_strain} (${pct(stat.has_strain)}%)`);
console.log(` Has image: ${stat.has_image} (${pct(stat.has_image)}%)`);
console.log('');
await pool.end();
}
checkProductData().catch(console.error);

@@ -0,0 +1,75 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';
async function checkProductDetailPage() {
const proxy = await getRandomProxy();
if (!proxy) {
console.log('No proxy available');
process.exit(1);
}
const browser = await firefox.launch({ headless: true });
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
});
const page = await context.newPage();
// Load a product detail page
const productUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/product/alien-labs-cured-resin-cart-dark-web';
console.log(`Loading: ${productUrl}`);
await page.goto(productUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForTimeout(5000);
// Extract all data from the product detail page
const productData = await page.evaluate(() => {
const pageText = document.body.textContent || '';
// Check for prices
const priceElements = Array.from(document.querySelectorAll('*')).filter(el => {
const text = el.textContent?.trim() || '';
return text.match(/\$\d+/) && el.children.length === 0; // Leaf nodes only
});
// Check for stock information
const stockElements = Array.from(document.querySelectorAll('*')).filter(el => {
const text = el.textContent?.toLowerCase() || '';
// 'stock' already covers both "in stock" and "out of stock"
return (text.includes('stock') || text.includes('available')) && el.children.length === 0;
});
return {
hasPrice: pageText.includes('$'),
priceText: priceElements.slice(0, 5).map(el => el.textContent?.trim()),
stockText: stockElements.slice(0, 5).map(el => el.textContent?.trim()),
pageTextSample: pageText.substring(0, 500)
};
});
console.log('\n' + '='.repeat(80));
console.log('PRODUCT DETAIL PAGE DATA:');
console.log('='.repeat(80));
console.log('\nHas "$" symbol:', productData.hasPrice);
console.log('\nPrice elements found:');
productData.priceText.forEach((text, idx) => {
console.log(` ${idx + 1}. ${text}`);
});
console.log('\nStock elements found:');
productData.stockText.forEach((text, idx) => {
console.log(` ${idx + 1}. ${text}`);
});
console.log('\nPage text sample:');
console.log(productData.pageTextSample);
console.log('\n' + '='.repeat(80));
await browser.close();
process.exit(0);
}
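// Exit explicitly on failure; otherwise the still-open browser keeps the process alive
checkProductDetailPage().catch((error) => {
console.error(error);
process.exit(1);
});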

@@ -0,0 +1,56 @@
import { pool } from './src/db/migrate.js';
async function checkProductPrices() {
const result = await pool.query(`
SELECT
id,
name,
brand,
regular_price,
sale_price,
in_stock,
stock_status
FROM products
WHERE dispensary_id = 112
ORDER BY brand, name
LIMIT 50
`);
console.log('\n' + '='.repeat(100));
console.log('PRODUCTS WITH PRICES');
console.log('='.repeat(100) + '\n');
result.rows.forEach((row, idx) => {
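// pg returns NUMERIC/DECIMAL values as strings by default, so convert before calling toFixed()
const regularPrice = row.regular_price ? `$${Number(row.regular_price).toFixed(2)}` : 'N/A';
const salePrice = row.sale_price ? `$${Number(row.sale_price).toFixed(2)}` : 'N/A';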
const stock = row.in_stock ? (row.stock_status || 'In Stock') : 'Out of Stock';
console.log(`${idx + 1}. ${row.brand} - ${row.name.substring(0, 50)}`);
console.log(` Price: ${regularPrice} | Sale: ${salePrice} | Stock: ${stock}`);
console.log('');
});
console.log('='.repeat(100) + '\n');
// Summary stats
const stats = await pool.query(`
SELECT
COUNT(*) as total_products,
COUNT(regular_price) as products_with_price,
COUNT(sale_price) as products_with_sale,
COUNT(CASE WHEN in_stock THEN 1 END) as in_stock_count
FROM products
WHERE dispensary_id = 112
`);
console.log('SUMMARY:');
console.log(` Total products: ${stats.rows[0].total_products}`);
console.log(` Products with regular price: ${stats.rows[0].products_with_price}`);
console.log(` Products with sale price: ${stats.rows[0].products_with_sale}`);
console.log(` Products in stock: ${stats.rows[0].in_stock_count}`);
console.log('\n');
await pool.end();
}
checkProductPrices().catch(console.error);

@@ -0,0 +1,47 @@
import { pool } from './src/db/migrate.js';
async function main() {
const result = await pool.query(`
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'products'
ORDER BY ordinal_position;
`);
console.log('Products table columns:');
result.rows.forEach(row => {
console.log(` ${row.column_name}: ${row.data_type} (${row.is_nullable === 'YES' ? 'nullable' : 'NOT NULL'})`);
});
const constraints = await pool.query(`
SELECT constraint_name, constraint_type
FROM information_schema.table_constraints
WHERE table_name = 'products';
`);
console.log('\nProducts table constraints:');
constraints.rows.forEach(row => {
console.log(` ${row.constraint_name}: ${row.constraint_type}`);
});
// Get unique constraints details
const uniqueConstraints = await pool.query(`
SELECT
tc.constraint_name,
kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
WHERE tc.table_name = 'products'
AND tc.constraint_type IN ('PRIMARY KEY', 'UNIQUE');
`);
console.log('\nUnique/Primary key constraints:');
uniqueConstraints.rows.forEach(row => {
console.log(` ${row.constraint_name}: ${row.column_name}`);
});
await pool.end();
}
main().catch(console.error);

@@ -0,0 +1,29 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function checkProducts() {
try {
console.log('Products table columns:');
const productsColumns = await pool.query(`
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'products'
ORDER BY ordinal_position
`);
productsColumns.rows.forEach(r => console.log(` - ${r.column_name}: ${r.data_type}`));
console.log('\nSample products:');
const products = await pool.query('SELECT * FROM products LIMIT 3');
console.log(JSON.stringify(products.rows, null, 2));
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
checkProducts();

@@ -0,0 +1,31 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function checkProxies() {
try {
const result = await pool.query(`
SELECT
COUNT(*) as total,
COUNT(*) FILTER (WHERE active = true) as active,
COUNT(*) FILTER (WHERE state = 'Arizona') as arizona
FROM proxies
`);
console.log('Proxy Stats:');
console.log('─'.repeat(40));
console.log(`Total: ${result.rows[0].total}`);
console.log(`Active: ${result.rows[0].active}`);
console.log(`Arizona: ${result.rows[0].arizona}`);
console.log('─'.repeat(40));
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
checkProxies();

@@ -0,0 +1,36 @@
const { Pool } = require('pg');
const pool = new Pool({
connectionString: process.env.DATABASE_URL || 'postgresql://kelly:kelly@localhost:5432/hub'
});
(async () => {
try {
const stats = await pool.query(`
SELECT
COUNT(*) as total,
COUNT(*) FILTER (WHERE active = true) as active,
COUNT(*) FILTER (WHERE active = false) as inactive,
COUNT(*) FILTER (WHERE test_result = 'success') as passed,
COUNT(*) FILTER (WHERE test_result = 'failed') as failed,
COUNT(*) FILTER (WHERE test_result IS NULL) as untested
FROM proxies
`);
const s = stats.rows[0];
console.log('\n📊 Proxy Statistics:');
console.log('='.repeat(60));
console.log(`Total Proxies: ${s.total}`);
console.log(`Active: ${s.active} (passing tests)`);
console.log(`Inactive: ${s.inactive} (failed tests)`);
console.log(`Test Results:`);
console.log(` ✅ Passed: ${s.passed}`);
console.log(` ❌ Failed: ${s.failed}`);
console.log(` ⚪ Untested: ${s.untested}`);
process.exit(0);
} catch (error) {
console.error('Error:', error);
process.exit(1);
}
})();

@@ -0,0 +1,60 @@
const { Pool } = require('pg');
const pool = new Pool({
connectionString: process.env.DATABASE_URL || 'postgresql://kelly:kelly@localhost:5432/hub'
});
(async () => {
try {
// Check for categories that have been scraped
const historyResult = await pool.query(`
SELECT
s.id as store_id,
s.name as store_name,
c.id as category_id,
c.name as category_name,
c.last_scraped_at,
(
SELECT COUNT(*)
FROM products p
WHERE p.store_id = s.id
AND p.category_id = c.id
) as product_count
FROM stores s
LEFT JOIN categories c ON c.store_id = s.id
WHERE c.last_scraped_at IS NOT NULL
ORDER BY c.last_scraped_at DESC
LIMIT 10
`);
console.log('\n📊 Scraper History:');
console.log('='.repeat(80));
if (historyResult.rows.length === 0) {
console.log('No scraper history found. No categories have been scraped yet.');
} else {
historyResult.rows.forEach(row => {
console.log(`\nStore: ${row.store_name} (ID: ${row.store_id})`);
console.log(`Category: ${row.category_name} (ID: ${row.category_id})`);
console.log(`Last Scraped: ${row.last_scraped_at}`);
console.log(`Products: ${row.product_count}`);
});
}
// Check total categories
const totalCategoriesResult = await pool.query(`
SELECT COUNT(*) as total FROM categories
`);
console.log(`\n\nTotal Categories: ${totalCategoriesResult.rows[0].total}`);
// Check categories with last_scraped_at
const scrapedCategoriesResult = await pool.query(`
SELECT COUNT(*) as scraped FROM categories WHERE last_scraped_at IS NOT NULL
`);
console.log(`Categories Scraped: ${scrapedCategoriesResult.rows[0].scraped}`);
process.exit(0);
} catch (error) {
console.error('Error:', error);
process.exit(1);
}
})();

@@ -0,0 +1,32 @@
import { pool } from './src/db/migrate.js';
async function main() {
const result = await pool.query(`
SELECT name, brand, regular_price, sale_price, in_stock, stock_status
FROM products
WHERE dispensary_id = 112
AND brand = 'Select'
ORDER BY name
LIMIT 10
`);
console.log('\n' + '='.repeat(100));
console.log('SELECT BRAND PRODUCTS WITH PRICES (NEW ONE-PASS APPROACH)');
console.log('='.repeat(100) + '\n');
result.rows.forEach((row, idx) => {
const regularPrice = row.regular_price ? `$${parseFloat(row.regular_price).toFixed(2)}` : 'N/A';
const salePrice = row.sale_price ? `$${parseFloat(row.sale_price).toFixed(2)}` : 'N/A';
const stock = row.in_stock ? (row.stock_status || 'In Stock') : 'Out of Stock';
console.log(`${idx + 1}. ${row.name}`);
console.log(` Price: ${regularPrice} | Sale: ${salePrice} | Stock: ${stock}`);
console.log('');
});
console.log('='.repeat(100) + '\n');
await pool.end();
}
main().catch(console.error);

@@ -0,0 +1,161 @@
import { firefox } from 'playwright';
import { pool } from './src/db/migrate.js';
import { getRandomProxy } from './src/utils/proxyManager.js';
async function checkProduct() {
const proxy = await getRandomProxy();
if (!proxy) {
console.log('No proxy available');
process.exit(1);
}
console.log(`Using proxy: ${proxy.server}`);
const browser = await firefox.launch({
headless: true,
firefoxUserPrefs: {
'geo.enabled': true,
}
});
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
geolocation: { latitude: 33.4484, longitude: -112.0740 },
permissions: ['geolocation'],
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
});
const page = await context.newPage();
try {
console.log('Loading product page...');
const url = process.argv[2] || 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/product/abundant-organics-flower-mylar-abundant-horizon';
await page.goto(url, {
waitUntil: 'domcontentloaded',
timeout: 30000
});
await page.waitForTimeout(5000);
const productData = await page.evaluate(() => {
const data: any = { fields: {} };
const allText = document.body.textContent || '';
// 1. BASIC INFO
const nameEl = document.querySelector('h1');
data.fields.name = nameEl?.textContent?.trim() || null;
// 2. CATEGORY - look for breadcrumbs or category links
const breadcrumbs = Array.from(document.querySelectorAll('[class*="breadcrumb"] a, nav a'));
data.fields.category = breadcrumbs.map(b => b.textContent?.trim()).filter(Boolean);
// 3. BRAND
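// ':has-text()' is a Playwright selector extension, not standard CSS. This code runs inside
// page.evaluate(), where document.querySelector() throws on it, so only valid selectors are listed.
const brandSelectors = ['[class*="brand"]', '[data-testid*="brand"]'];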
for (const sel of brandSelectors) {
try {
const el = document.querySelector(sel);
if (el && el.textContent && !el.textContent.includes('Brand:')) {
data.fields.brand = el.textContent.trim();
break;
}
} catch {}
}
// 4. PRICES
const priceMatches = allText.match(/\$(\d+\.?\d*)/g);
data.fields.prices = priceMatches || [];
// 5. THC/CBD CONTENT
const thcMatch = allText.match(/THC[:\s]*(\d+\.?\d*)\s*%/i);
const cbdMatch = allText.match(/CBD[:\s]*(\d+\.?\d*)\s*%/i);
data.fields.thc = thcMatch ? parseFloat(thcMatch[1]) : null;
data.fields.cbd = cbdMatch ? parseFloat(cbdMatch[1]) : null;
// 6. STRAIN TYPE
if (allText.match(/\bindica\b/i)) data.fields.strainType = 'Indica';
else if (allText.match(/\bsativa\b/i)) data.fields.strainType = 'Sativa';
else if (allText.match(/\bhybrid\b/i)) data.fields.strainType = 'Hybrid';
// 7. WEIGHT/SIZE OPTIONS
const weights = allText.matchAll(/(\d+\.?\d*\s*(?:g|oz|mg|ml|gram|ounce))/gi);
data.fields.weights = Array.from(weights).map(m => m[1].trim());
// 8. DESCRIPTION
const descSelectors = ['[class*="description"]', '[class*="Description"]', 'p[class*="product"]'];
for (const sel of descSelectors) {
const el = document.querySelector(sel);
if (el?.textContent && el.textContent.length > 20) {
data.fields.description = el.textContent.trim().substring(0, 500);
break;
}
}
// 9. EFFECTS
const effectNames = ['Relaxed', 'Happy', 'Euphoric', 'Uplifted', 'Creative', 'Energetic', 'Focused', 'Calm', 'Sleepy', 'Hungry'];
data.fields.effects = effectNames.filter(e => allText.match(new RegExp(`\\b${e}\\b`, 'i')));
// 10. TERPENES
const terpeneNames = ['Myrcene', 'Limonene', 'Caryophyllene', 'Pinene', 'Linalool', 'Humulene'];
data.fields.terpenes = terpeneNames.filter(t => allText.match(new RegExp(`\\b${t}\\b`, 'i')));
// 11. FLAVORS
const flavorNames = ['Sweet', 'Citrus', 'Earthy', 'Pine', 'Berry', 'Diesel', 'Sour', 'Floral', 'Spicy'];
data.fields.flavors = flavorNames.filter(f => allText.match(new RegExp(`\\b${f}\\b`, 'i')));
// 12. SPECIAL INFO
data.fields.hasSpecialText = allText.includes('Special') || allText.includes('Sale') || allText.includes('Deal');
const endsMatch = allText.match(/(?:ends?|expires?)\s+(?:in\s+)?(\d+)\s+(min|hour|day)/i);
data.fields.specialEndsIn = endsMatch ? `${endsMatch[1]} ${endsMatch[2]}` : null;
// 13. IMAGE URLS
const images = Array.from(document.querySelectorAll('img[src*="dutchie"]'));
data.fields.imageUrls = images.map(img => (img as HTMLImageElement).src).filter(Boolean);
// 14. ALL VISIBLE TEXT (for debugging)
data.allVisibleText = allText.substring(0, 1000);
// 15. STRUCTURED DATA FROM SCRIPTS
const scripts = Array.from(document.querySelectorAll('script'));
data.structuredData = {};
for (const script of scripts) {
const content = script.textContent || '';
const idMatch = content.match(/"id":"([a-f0-9-]+)"/);
if (idMatch && idMatch[1].length > 10) {
data.structuredData.productId = idMatch[1];
}
const variantMatch = content.match(/"variantId":"([^"]+)"/);
if (variantMatch) {
data.structuredData.variantId = variantMatch[1];
}
const categoryMatch = content.match(/"category":"([^"]+)"/);
if (categoryMatch) {
data.structuredData.category = categoryMatch[1];
}
}
return data;
});
console.log('\n=== PRODUCT DATA (Time: ' + new Date().toISOString() + ') ===');
console.log(JSON.stringify(productData, null, 2));
await browser.close();
await pool.end();
} catch (error) {
console.error('Error:', error);
await browser.close();
await pool.end();
process.exit(1);
}
}
checkProduct();

View File

@@ -0,0 +1,49 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function checkStore() {
try {
const store = await pool.query(`
SELECT id, name, slug FROM stores WHERE slug = 'curaleaf-az-48th-street'
`);
if (store.rows.length > 0) {
console.log('Store found:', store.rows[0]);
// Check if it has products
const products = await pool.query(`
SELECT COUNT(*) as total, COUNT(DISTINCT brand) as brands
FROM products WHERE store_id = $1
`, [store.rows[0].id]);
console.log('Store products:', products.rows[0]);
// Check distinct brands
const brands = await pool.query(`
SELECT DISTINCT brand FROM products
WHERE store_id = $1 AND brand IS NOT NULL
ORDER BY brand
`, [store.rows[0].id]);
console.log('\nCurrent brands:', brands.rows.map(r => r.brand));
} else {
console.log('Store not found');
// Show available stores
const stores = await pool.query(`
SELECT slug FROM stores WHERE slug LIKE '%48th%'
`);
console.log('Stores with 48th:', stores.rows.map(r => r.slug));
}
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
checkStore();

View File

@@ -0,0 +1,24 @@
import { pool } from './src/db/migrate';
async function checkStores() {
try {
const result = await pool.query(`
SELECT id, name, slug, dutchie_url
FROM stores
WHERE name ILIKE '%sol flower%'
ORDER BY name
`);
console.log(`Found ${result.rows.length} Sol Flower stores:\n`);
result.rows.forEach(store => {
console.log(`ID ${store.id}: ${store.name}`);
console.log(` URL: ${store.dutchie_url}\n`);
});
} catch (error) {
console.error('Error:', error);
} finally {
await pool.end();
}
}
checkStores();

View File

@@ -0,0 +1,35 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function checkTables() {
try {
const result = await pool.query(`
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
ORDER BY table_name
`);
console.log('Tables in database:');
result.rows.forEach(r => console.log(' -', r.table_name));
// Check stores table structure
console.log('\nStores table columns:');
const storesColumns = await pool.query(`
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'stores'
`);
storesColumns.rows.forEach(r => console.log(` - ${r.column_name}: ${r.data_type}`));
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
checkTables();

View File

@@ -0,0 +1,105 @@
import { chromium } from 'playwright';
// Despite the name, this is a desktop Chrome user agent string, not Googlebot's
const GOOGLE_UA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
async function main() {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext({
userAgent: GOOGLE_UA
});
const page = await context.newPage();
try {
console.log('Loading menu page...');
await page.goto('https://best.treez.io/onlinemenu/?customerType=ADULT', {
waitUntil: 'networkidle',
timeout: 30000
});
await page.waitForTimeout(3000);
// Look for category navigation elements
console.log('\n=== Checking for category filters/tabs ===\n');
// Check for common category selectors
const categorySelectors = [
'nav a',
'nav button',
'[role="tab"]',
'[class*="category"]',
'[class*="filter"]',
'[class*="nav"]',
'.menu-category',
'.category-filter',
'.product-category'
];
for (const selector of categorySelectors) {
const elements = await page.locator(selector).all();
if (elements.length > 0) {
console.log(`\nFound ${elements.length} elements matching "${selector}":`);
for (let i = 0; i < Math.min(10, elements.length); i++) {
const text = await elements[i].textContent();
const href = await elements[i].getAttribute('href');
const className = await elements[i].getAttribute('class');
console.log(` ${i + 1}. Text: "${text?.trim()}" | Class: "${className}" | Href: "${href}"`);
}
}
}
// Check the main navigation
console.log('\n=== Main Navigation Structure ===\n');
const navElements = await page.locator('nav, [role="navigation"]').all();
console.log(`Found ${navElements.length} navigation elements`);
for (let i = 0; i < navElements.length; i++) {
const navHtml = await navElements[i].innerHTML();
console.log(`\nNavigation ${i + 1}:`);
console.log(navHtml.substring(0, 500)); // First 500 chars
console.log('...');
}
// Check for dropdowns or select elements
console.log('\n=== Checking for dropdowns ===\n');
const selects = await page.locator('select').all();
console.log(`Found ${selects.length} select elements`);
for (let i = 0; i < selects.length; i++) {
const options = await selects[i].locator('option').all();
console.log(`\nSelect ${i + 1} has ${options.length} options:`);
for (let j = 0; j < Math.min(10, options.length); j++) {
const text = await options[j].textContent();
const value = await options[j].getAttribute('value');
console.log(` - "${text}" (value: ${value})`);
}
}
// Look for any clickable category buttons
console.log('\n=== Checking for category buttons ===\n');
const buttons = await page.locator('button').all();
console.log(`Found ${buttons.length} total buttons`);
const categoryButtons = [];
for (const button of buttons) {
const text = await button.textContent();
const className = await button.getAttribute('class');
if (text && (text.includes('Flower') || text.includes('Edible') || text.includes('Vape') ||
text.includes('Concentrate') || text.includes('Pre-Roll') || text.includes('All'))) {
categoryButtons.push({ text: text.trim(), class: className });
}
}
console.log(`Found ${categoryButtons.length} potential category buttons:`);
categoryButtons.forEach((btn, i) => {
console.log(` ${i + 1}. "${btn.text}" (class: ${btn.class})`);
});
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await browser.close();
}
}
main().catch(console.error);

View File

@@ -0,0 +1,52 @@
import { pool } from './src/db/migrate.js';
async function main() {
try {
// Count total products and unique brands
const stats = await pool.query(`
SELECT
COUNT(*) as total_products,
COUNT(DISTINCT brand) as unique_brands
FROM products
WHERE dispensary_id = 149
`);
console.log('Stats:', stats.rows[0]);
// Get sample products to verify brand extraction
const samples = await pool.query(`
SELECT brand, name, variant, dutchie_url
FROM products
WHERE dispensary_id = 149
ORDER BY RANDOM()
LIMIT 10
`);
console.log('\nSample products:');
samples.rows.forEach(row => {
console.log(`Brand: "${row.brand}" | Name: "${row.name}" | Variant: "${row.variant}"`);
});
// Get brand distribution
const brands = await pool.query(`
SELECT brand, COUNT(*) as count
FROM products
WHERE dispensary_id = 149
GROUP BY brand
ORDER BY count DESC
LIMIT 15
`);
console.log('\nTop brands:');
brands.rows.forEach(row => {
console.log(`${row.brand}: ${row.count} products`);
});
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
main().catch(console.error);

View File

@@ -0,0 +1,70 @@
import { chromium } from 'playwright';
// Despite the name, this is a desktop Chrome user agent string, not Googlebot's
const GOOGLE_UA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
async function main() {
console.log('Checking BEST Dispensary Treez menu for pagination...');
const browser = await chromium.launch({ headless: false });
const context = await browser.newContext({ userAgent: GOOGLE_UA });
const page = await context.newPage();
try {
console.log('Loading menu page...');
await page.goto('https://best.treez.io/onlinemenu/?customerType=ADULT', {
waitUntil: 'networkidle',
timeout: 30000
});
await page.waitForTimeout(3000);
// Check initial count
const initialItems = await page.locator('.menu-item').all();
console.log(`Initial menu items found: ${initialItems.length}`);
// Check for pagination controls
const paginationButtons = await page.locator('button:has-text("Next"), button:has-text("Load More"), .pagination, [class*="page"], [class*="Pagination"]').all();
console.log(`Pagination controls found: ${paginationButtons.length}`);
// Check page height and scroll
const scrollHeight = await page.evaluate(() => document.body.scrollHeight);
console.log(`Page scroll height: ${scrollHeight}px`);
// Try scrolling to bottom
console.log('Scrolling to bottom...');
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
await page.waitForTimeout(2000);
// Check if more items loaded after scroll
const afterScrollItems = await page.locator('.menu-item').all();
console.log(`Menu items after scroll: ${afterScrollItems.length}`);
// Check for categories/filters
const categories = await page.locator('[class*="category"], [class*="filter"], nav a, .nav-link').all();
console.log(`Category/filter links found: ${categories.length}`);
if (categories.length > 0) {
console.log('\nCategory links:');
for (let i = 0; i < Math.min(categories.length, 10); i++) {
const text = await categories[i].textContent();
const href = await categories[i].getAttribute('href');
console.log(` - ${text?.trim()} (${href})`);
}
}
// Take a screenshot
await page.screenshot({ path: '/tmp/treez-menu-check.png', fullPage: true });
console.log('\nScreenshot saved to /tmp/treez-menu-check.png');
// Get page HTML to analyze structure
const html = await page.content();
console.log(`\nPage HTML length: ${html.length} characters`);
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await browser.close();
}
}
main().catch(console.error);

View File

@@ -0,0 +1,215 @@
import { pool } from './src/db/migrate';
async function migrateCompleteSchema() {
console.log('🔧 Migrating to complete normalized schema...\n');
const client = await pool.connect();
try {
await client.query('BEGIN');
// Step 1: Add brand_id to products table
console.log('1. Adding brand_id column to products...');
await client.query(`
ALTER TABLE products
ADD COLUMN IF NOT EXISTS brand_id INTEGER REFERENCES brands(id) ON DELETE SET NULL
`);
// Step 2: Ensure brands.name has UNIQUE constraint
console.log('2. Ensuring brands.name has UNIQUE constraint...');
await client.query(`
ALTER TABLE brands DROP CONSTRAINT IF EXISTS brands_name_key;
ALTER TABLE brands ADD CONSTRAINT brands_name_key UNIQUE (name);
`);
// Step 3: Migrate existing brand text to brands table and update FKs
console.log('3. Migrating existing brand data...');
await client.query(`
-- Insert unique brands from products into brands table
INSERT INTO brands (name)
SELECT DISTINCT brand
FROM products
WHERE brand IS NOT NULL AND brand != ''
ON CONFLICT (name) DO NOTHING
`);
// Update products to use brand_id
await client.query(`
UPDATE products p
SET brand_id = b.id
FROM brands b
WHERE p.brand = b.name
AND p.brand IS NOT NULL
AND p.brand != ''
`);
// Step 4: Create product_brands junction table for historical tracking
console.log('4. Creating product_brands tracking table...');
await client.query(`
CREATE TABLE IF NOT EXISTS product_brands (
id SERIAL PRIMARY KEY,
product_id INTEGER REFERENCES products(id) ON DELETE CASCADE,
brand_id INTEGER REFERENCES brands(id) ON DELETE CASCADE,
first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(product_id, brand_id)
)
`);
// Populate product_brands from current data
await client.query(`
INSERT INTO product_brands (product_id, brand_id)
SELECT id, brand_id
FROM products
WHERE brand_id IS NOT NULL
ON CONFLICT (product_id, brand_id) DO NOTHING
`);
// Step 5: Add store contact information and address fields
console.log('5. Adding store contact info and address fields...');
await client.query(`
ALTER TABLE stores
ADD COLUMN IF NOT EXISTS address TEXT,
ADD COLUMN IF NOT EXISTS city VARCHAR(255),
ADD COLUMN IF NOT EXISTS state VARCHAR(50),
ADD COLUMN IF NOT EXISTS zip VARCHAR(20),
ADD COLUMN IF NOT EXISTS phone VARCHAR(50),
ADD COLUMN IF NOT EXISTS website TEXT,
ADD COLUMN IF NOT EXISTS email VARCHAR(255)
`);
// Step 6: Add product discount tracking
console.log('6. Adding product discount fields...');
await client.query(`
ALTER TABLE products
ADD COLUMN IF NOT EXISTS discount_percentage DECIMAL(5, 2),
ADD COLUMN IF NOT EXISTS discount_amount DECIMAL(10, 2),
ADD COLUMN IF NOT EXISTS sale_price DECIMAL(10, 2)
`);
// Step 7: Add missing timestamp columns
console.log('7. Ensuring all timestamp columns exist...');
// Products timestamps
await client.query(`
ALTER TABLE products
ADD COLUMN IF NOT EXISTS first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
ADD COLUMN IF NOT EXISTS last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
`);
// Categories timestamps
await client.query(`
ALTER TABLE categories
ADD COLUMN IF NOT EXISTS first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
ADD COLUMN IF NOT EXISTS last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
`);
// Step 8: Add indexes for reporting queries
console.log('8. Creating indexes for fast reporting...');
await client.query(`
-- Brand exposure queries (which stores carry which brands)
CREATE INDEX IF NOT EXISTS idx_store_brands_brand_active ON store_brands(brand_id, active);
CREATE INDEX IF NOT EXISTS idx_store_brands_store_active ON store_brands(store_id, active);
CREATE INDEX IF NOT EXISTS idx_store_brands_dates ON store_brands(first_seen_at, last_seen_at);
-- Product queries by store and brand
CREATE INDEX IF NOT EXISTS idx_products_store_brand ON products(store_id, brand_id);
CREATE INDEX IF NOT EXISTS idx_products_brand_stock ON products(brand_id, in_stock);
CREATE INDEX IF NOT EXISTS idx_products_dates ON products(first_seen_at, last_seen_at);
-- Category queries
CREATE INDEX IF NOT EXISTS idx_categories_store ON categories(store_id, scrape_enabled);
-- Specials queries
CREATE INDEX IF NOT EXISTS idx_products_specials ON products(store_id, is_special) WHERE is_special = true;
`);
// Step 9: Create helper views for common queries
console.log('9. Creating reporting views...');
// Brand exposure view
await client.query(`
CREATE OR REPLACE VIEW brand_exposure AS
SELECT
b.id as brand_id,
b.name as brand_name,
COUNT(DISTINCT sb.store_id) as store_count,
COUNT(DISTINCT CASE WHEN sb.active THEN sb.store_id END) as active_store_count,
MIN(sb.first_seen_at) as first_seen,
MAX(sb.last_seen_at) as last_seen
FROM brands b
LEFT JOIN store_brands sb ON b.id = sb.brand_id
GROUP BY b.id, b.name
ORDER BY active_store_count DESC, brand_name
`);
// Brand timeline view (track adds/drops)
await client.query(`
CREATE OR REPLACE VIEW brand_timeline AS
SELECT
sb.id,
b.name as brand_name,
s.name as store_name,
sb.first_seen_at as added_on,
CASE
WHEN sb.active THEN NULL
ELSE sb.last_seen_at
END as dropped_on,
sb.active as currently_active
FROM store_brands sb
JOIN brands b ON sb.brand_id = b.id
JOIN stores s ON sb.store_id = s.id
ORDER BY sb.first_seen_at DESC
`);
// Product inventory view
await client.query(`
CREATE OR REPLACE VIEW product_inventory AS
SELECT
p.id,
p.name as product_name,
b.name as brand_name,
s.name as store_name,
c.name as category_name,
p.price,
p.in_stock,
p.is_special,
p.first_seen_at,
p.last_seen_at
FROM products p
JOIN stores s ON p.store_id = s.id
LEFT JOIN brands b ON p.brand_id = b.id
LEFT JOIN categories c ON p.category_id = c.id
ORDER BY p.last_seen_at DESC
`);
await client.query('COMMIT');
console.log('\n✅ Schema migration complete!');
console.log('\n📊 Available reporting views:');
console.log(' - brand_exposure: See how many stores carry each brand');
console.log(' - brand_timeline: Track when brands were added/dropped');
console.log(' - product_inventory: Full product catalog with store/brand info');
console.log('\n💡 Example queries:');
console.log(' -- Brands by exposure:');
console.log(' SELECT * FROM brand_exposure ORDER BY active_store_count DESC;');
console.log(' ');
console.log(' -- Recently dropped brands:');
console.log(' SELECT * FROM brand_timeline WHERE dropped_on IS NOT NULL ORDER BY dropped_on DESC;');
console.log(' ');
console.log(' -- Products by brand:');
console.log(' SELECT * FROM product_inventory WHERE brand_name = \'Sol Flower\';');
} catch (error) {
await client.query('ROLLBACK');
console.error('❌ Migration failed:', error);
throw error;
} finally {
client.release();
await pool.end();
}
}
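// The error is already logged and rethrown above; exit non-zero instead of leaving an unhandled rejection
migrateCompleteSchema().catch(() => process.exit(1));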

View File

@@ -0,0 +1,13 @@
import { pool } from './src/db/migrate.js';
async function countProducts() {
try {
const result = await pool.query(
`SELECT COUNT(*) as total FROM products WHERE dispensary_id = 112`
);
console.log(`Total products for Deeply Rooted: ${result.rows[0].total}`);
} finally {
// Close the pool even if the query fails
await pool.end();
}
}
countProducts().catch(console.error);

View File

@@ -0,0 +1,32 @@
import { pool } from './src/db/migrate';
async function createAZDHSTable() {
console.log('🗄️ Creating azdhs_list table...\n');
await pool.query(`
CREATE TABLE IF NOT EXISTS azdhs_list (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
company_name VARCHAR(255),
slug VARCHAR(255),
address VARCHAR(500),
city VARCHAR(100),
state VARCHAR(2) DEFAULT 'AZ',
zip VARCHAR(10),
phone VARCHAR(20),
email VARCHAR(255),
status_line TEXT,
azdhs_url TEXT,
latitude DECIMAL(10, 8),
longitude DECIMAL(11, 8),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
`);
console.log('✅ Table created successfully!');
await pool.end();
}
createAZDHSTable();

View File

@@ -0,0 +1,81 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'
});
async function createBrandsTable() {
try {
console.log('Creating brands table...');
await pool.query(`
CREATE TABLE IF NOT EXISTS brands (
id SERIAL PRIMARY KEY,
store_id INTEGER NOT NULL REFERENCES stores(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
`);
console.log('✅ Brands table created');
// Create index for faster queries
await pool.query(`
CREATE INDEX IF NOT EXISTS idx_brands_store_id ON brands(store_id)
`);
console.log('✅ Index created on store_id');
// Create unique index so each store has at most one row per brand name
await pool.query(`
CREATE UNIQUE INDEX IF NOT EXISTS idx_brands_store_name ON brands(store_id, name)
`);
console.log('✅ Unique index created on (store_id, name)');
console.log('\nCreating specials table...');
await pool.query(`
CREATE TABLE IF NOT EXISTS specials (
id SERIAL PRIMARY KEY,
store_id INTEGER NOT NULL REFERENCES stores(id) ON DELETE CASCADE,
product_id INTEGER REFERENCES products(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
description TEXT,
discount_amount NUMERIC(10, 2),
discount_percentage NUMERIC(5, 2),
special_price NUMERIC(10, 2),
original_price NUMERIC(10, 2),
valid_date DATE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
`);
console.log('✅ Specials table created');
// Create composite index for fast date-based queries
await pool.query(`
CREATE INDEX IF NOT EXISTS idx_specials_store_date ON specials(store_id, valid_date DESC)
`);
console.log('✅ Index created on (store_id, valid_date)');
// Create index on product_id for joins
await pool.query(`
CREATE INDEX IF NOT EXISTS idx_specials_product_id ON specials(product_id)
`);
console.log('✅ Index created on product_id');
console.log('\n🎉 All tables and indexes created successfully!');
} catch (error: any) {
console.error('❌ Error:', error.message);
} finally {
await pool.end();
}
}
createBrandsTable();

View File

@@ -0,0 +1,61 @@
import { pool } from './src/db/migrate';
async function createBrandsTables() {
console.log('📦 Creating brands tracking tables...\n');
try {
// Brands table - stores unique brands across all stores
await pool.query(`
CREATE TABLE IF NOT EXISTS brands (
id SERIAL PRIMARY KEY,
name VARCHAR(255) UNIQUE NOT NULL,
logo_url TEXT,
first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
`);
console.log('✅ Created brands table');
// Store-Brand relationship - tracks which brands are at which stores
await pool.query(`
CREATE TABLE IF NOT EXISTS store_brands (
id SERIAL PRIMARY KEY,
store_id INTEGER REFERENCES stores(id) ON DELETE CASCADE,
brand_id INTEGER REFERENCES brands(id) ON DELETE CASCADE,
first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
active BOOLEAN DEFAULT true,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(store_id, brand_id)
)
`);
console.log('✅ Created store_brands table');
// Add indexes for performance
await pool.query(`
CREATE INDEX IF NOT EXISTS idx_store_brands_store_id ON store_brands(store_id);
CREATE INDEX IF NOT EXISTS idx_store_brands_brand_id ON store_brands(brand_id);
CREATE INDEX IF NOT EXISTS idx_store_brands_active ON store_brands(active);
CREATE INDEX IF NOT EXISTS idx_brands_name ON brands(name);
`);
console.log('✅ Created indexes');
console.log('\n✅ Brands tables created successfully!');
console.log('\nTable structure:');
console.log(' brands: Stores unique brand names and logos');
console.log(' store_brands: Tracks which brands are at which stores with timestamps');
console.log('\nReports you can run:');
console.log(' - Brand exposure: How many stores carry each brand');
console.log(' - Brand timeline: When brands were added/removed from stores');
console.log(' - Store changes: Which brands were added/dropped at a store');
} catch (error) {
console.error('❌ Error:', error);
} finally {
await pool.end();
}
}
createBrandsTables();

View File

@@ -0,0 +1,38 @@
const { Pool } = require('pg');
const pool = new Pool({
connectionString: process.env.DATABASE_URL
});
(async () => {
try {
await pool.query(`
CREATE TABLE IF NOT EXISTS proxy_test_jobs (
id SERIAL PRIMARY KEY,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
total_proxies INTEGER NOT NULL DEFAULT 0,
tested_proxies INTEGER NOT NULL DEFAULT 0,
passed_proxies INTEGER NOT NULL DEFAULT 0,
failed_proxies INTEGER NOT NULL DEFAULT 0,
started_at TIMESTAMP,
completed_at TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
`);
await pool.query(`
CREATE INDEX IF NOT EXISTS idx_proxy_test_jobs_status ON proxy_test_jobs(status);
`);
await pool.query(`
CREATE INDEX IF NOT EXISTS idx_proxy_test_jobs_created_at ON proxy_test_jobs(created_at DESC);
`);
console.log('✅ Table created successfully');
process.exit(0);
} catch (error) {
console.error('❌ Error:', error);
process.exit(1);
}
})();

View File

@@ -0,0 +1,22 @@
[
{
"name": "age_gate_passed",
"value": "true",
"domain": ".curaleaf.com",
"path": "/",
"expires": 9999999999,
"httpOnly": false,
"secure": false,
"sameSite": "Lax"
},
{
"name": "selected_state",
"value": "Arizona",
"domain": ".curaleaf.com",
"path": "/",
"expires": 9999999999,
"httpOnly": false,
"secure": false,
"sameSite": "Lax"
}
]

View File

@@ -0,0 +1,83 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { Browser, Page } from 'puppeteer';
puppeteer.use(StealthPlugin());
async function debugAfterStateSelect() {
let browser: Browser | null = null;
try {
const url = 'https://curaleaf.com/stores/curaleaf-az-48th-street';
browser = await puppeteer.launch({
headless: 'new',
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled'
]
});
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
console.log('Loading page...');
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForTimeout(5000);
// Click dropdown and select Arizona
const stateButton = await page.$('button#state');
if (stateButton) {
console.log('Clicking state button...');
await stateButton.click();
await page.waitForTimeout(800);
console.log('Clicking Arizona...');
await page.evaluate(() => {
const options = Array.from(document.querySelectorAll('[role="option"]'));
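// Trim before comparing so surrounding whitespace in the option text does not break the match
const arizona = options.find(el => el.textContent?.trim().toLowerCase() === 'arizona');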
if (arizona instanceof HTMLElement) {
arizona.click();
}
});
await page.waitForTimeout(1000);
console.log('\n=== AFTER selecting Arizona ===');
// Check what buttons are now visible
const elementsAfter = await page.evaluate(() => {
return {
buttons: Array.from(document.querySelectorAll('button')).map(b => ({
text: b.textContent?.trim(),
classes: b.className,
id: b.id,
visible: b.offsetParent !== null
})),
links: Array.from(document.querySelectorAll('a')).filter(a => a.offsetParent !== null).map(a => ({
text: a.textContent?.trim(),
href: a.href
})),
// Match whole words only; a plain includes('age') also hits "page", "image", etc.
hasAgeQuestion: /\b(?:21|age)\b/i.test(document.body.textContent ?? '')
};
});
console.log('\nVisible buttons:', JSON.stringify(elementsAfter.buttons.filter(b => b.visible), null, 2));
console.log('\nVisible links:', JSON.stringify(elementsAfter.links, null, 2));
console.log('\nHas age question:', elementsAfter.hasAgeQuestion);
}
await browser.close();
process.exit(0);
} catch (error) {
console.error('Error:', error);
if (browser) await browser.close();
process.exit(1);
}
}
debugAfterStateSelect();

View File

@@ -0,0 +1,96 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { Browser, Page } from 'puppeteer';
puppeteer.use(StealthPlugin());
async function debugDetailedAgeGate() {
let browser: Browser | null = null;
try {
const url = 'https://curaleaf.com/stores/curaleaf-az-48th-street';
browser = await puppeteer.launch({
headless: false, // Run with visible browser
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled'
]
});
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
console.log('Loading page...');
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForTimeout(5000);
console.log(`\nCurrent URL: ${page.url()}`);
// Check for dropdown button
console.log('\nLooking for dropdown button #state...');
const stateButton = await page.$('button#state');
console.log('State button found:', !!stateButton);
if (stateButton) {
console.log('Clicking state button...');
await stateButton.click();
await page.waitForTimeout(1000);
console.log('\nLooking for dropdown options after clicking...');
const options = await page.evaluate(() => {
// Look for any elements that appeared after clicking
const allElements = Array.from(document.querySelectorAll('[role="option"], [class*="option"], [class*="Option"], li'));
return allElements.slice(0, 20).map(el => ({
text: el.textContent?.trim(),
tag: el.tagName,
role: el.getAttribute('role'),
classes: el.className
}));
});
console.log('Found options:', JSON.stringify(options, null, 2));
// Try to click Arizona
console.log('\nTrying to click Arizona option...');
const clicked = await page.evaluate(() => {
const allElements = Array.from(document.querySelectorAll('[role="option"], [class*="option"], [class*="Option"], li, div, span'));
const arizonaEl = allElements.find(el => el.textContent?.toLowerCase().includes('arizona'));
if (arizonaEl instanceof HTMLElement) {
console.log('Found Arizona element:', arizonaEl.textContent);
arizonaEl.click();
return true;
}
return false;
});
console.log('Arizona clicked:', clicked);
if (clicked) {
console.log('Waiting for navigation...');
try {
await page.waitForNavigation({ timeout: 10000 });
console.log('Navigation successful!');
} catch (e) {
console.log('Navigation timeout');
}
}
}
console.log(`\nFinal URL: ${page.url()}`);
console.log('\nPress Ctrl+C to close browser...');
// Keep browser open for inspection
await new Promise(() => {});
} catch (error) {
console.error('Error:', error);
if (browser) await browser.close();
process.exit(1);
}
}
debugDetailedAgeGate();

View File

@@ -0,0 +1,78 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { Browser, Page } from 'puppeteer';
puppeteer.use(StealthPlugin());
async function debugAgeGateElements() {
let browser: Browser | null = null;
try {
const url = 'https://curaleaf.com/stores/curaleaf-az-48th-street';
browser = await puppeteer.launch({
headless: 'new',
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled'
]
});
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
console.log('Loading page...');
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForTimeout(5000); // Wait for React to render
const elements = await page.evaluate(() => {
return {
buttons: Array.from(document.querySelectorAll('button')).map(b => ({
text: b.textContent?.trim(),
classes: b.className,
id: b.id
})),
links: Array.from(document.querySelectorAll('a')).map(a => ({
text: a.textContent?.trim(),
href: a.href,
classes: a.className
})),
divs: Array.from(document.querySelectorAll('div[role="button"], div[onclick], [class*="card"], [class*="Card"], [class*="state"], [class*="State"]')).slice(0, 20).map(d => ({
text: d.textContent?.trim().substring(0, 100),
classes: d.className,
role: d.getAttribute('role')
}))
};
});
console.log('\n=== BUTTONS ===');
elements.buttons.forEach((b, i) => {
console.log(`${i + 1}. "${b.text}" [${b.classes}] #${b.id}`);
});
console.log('\n=== LINKS ===');
elements.links.slice(0, 10).forEach((a, i) => {
console.log(`${i + 1}. "${a.text}" -> ${a.href}`);
});
console.log('\n=== DIVS/CARDS ===');
elements.divs.forEach((d, i) => {
console.log(`${i + 1}. "${d.text}" [${d.classes}] role=${d.role}`);
});
await browser.close();
process.exit(0);
} catch (error) {
console.error('Error:', error);
if (browser) await browser.close();
process.exit(1);
}
}
debugAgeGateElements();

View File

@@ -0,0 +1,88 @@
import { chromium } from 'playwright-extra';
import stealth from 'puppeteer-extra-plugin-stealth';
import { pool } from './src/db/migrate';
chromium.use(stealth());
async function debugAZDHSPage() {
console.log('🔍 Debugging AZDHS page structure...\n');
const browser = await chromium.launch({
headless: false,
});
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
});
const page = await context.newPage();
try {
console.log('📄 Loading page...');
await page.goto('https://azcarecheck.azdhs.gov/s/?facilityId=001t000000L0TApAAN', {
waitUntil: 'domcontentloaded',
timeout: 60000
});
console.log('⏳ Waiting 30 seconds for you to scroll and load all dispensaries...\n');
await page.waitForTimeout(30000);
console.log('🔍 Analyzing page structure...\n');
const debug = await page.evaluate(() => {
// Get all unique tag names
const allElements = document.querySelectorAll('*');
const tagCounts: any = {};
const classSamples: string[] = [];
allElements.forEach(el => {
const tag = el.tagName.toLowerCase();
tagCounts[tag] = (tagCounts[tag] || 0) + 1;
// Sample some classes
if (el.className && typeof el.className === 'string' && el.className.length > 0 && classSamples.length < 50) {
classSamples.push(el.className.substring(0, 80));
}
});
// Look for elements with text that might be dispensary names
const textElements: any[] = [];
allElements.forEach(el => {
const text = el.textContent?.trim() || '';
if (text.length > 10 && text.length < 200 && el.children.length < 5) {
textElements.push({
tag: el.tagName.toLowerCase(),
class: el.className ? el.className.substring(0, 50) : '',
text: text.substring(0, 100)
});
}
});
return {
totalElements: allElements.length,
tagCounts: Object.entries(tagCounts).sort((a: any, b: any) => b[1] - a[1]).slice(0, 20),
classSamples: classSamples.slice(0, 20),
textElementsSample: textElements.slice(0, 10)
};
});
console.log('📊 Page Structure Analysis:');
console.log(`\nTotal elements: ${debug.totalElements}`);
console.log('\nTop 20 element types:');
console.table(debug.tagCounts);
console.log('\nSample classes:');
debug.classSamples.forEach((c: string, i: number) => console.log(` ${i + 1}. ${c}`));
console.log('\nSample text elements (potential dispensary names):');
console.table(debug.textElementsSample);
} catch (error) {
console.error(`❌ Error: ${error}`);
} finally {
console.log('\n👉 Browser will stay open for 30 seconds so you can inspect...');
await page.waitForTimeout(30000);
await browser.close();
await pool.end();
}
}
debugAZDHSPage();


@@ -0,0 +1,79 @@
import { firefox } from 'playwright';
import { pool } from './src/db/migrate.js';
import { getRandomProxy } from './src/utils/proxyManager.js';
const dispensaryId = 112;
async function main() {
const dispensaryResult = await pool.query(
"SELECT id, name, menu_url FROM dispensaries WHERE id = $1",
[dispensaryId]
);
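// Fail with a clear message instead of a TypeError when the row is missing
if (dispensaryResult.rows.length === 0) {
await pool.end();
throw new Error(`Dispensary ${dispensaryId} not found`);
}
const menuUrl = dispensaryResult.rows[0].menu_url;
const proxy = await getRandomProxy();
// getRandomProxy() can return nothing when no proxies are configured
if (!proxy) {
await pool.end();
throw new Error('No proxy available');
}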
const browser = await firefox.launch({ headless: true });
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
});
const page = await context.newPage();
const brandsUrl = `${menuUrl}/brands`;
console.log(`Loading: ${brandsUrl}`);
await page.goto(brandsUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForSelector('a[href*="/brands/"]', { timeout: 45000 });
await page.waitForTimeout(3000);
// Get the HTML structure of the first 10 brand links
const brandStructures = await page.evaluate(() => {
const brandLinks = Array.from(document.querySelectorAll('a[href*="/brands/"]')).slice(0, 10);
return brandLinks.map(link => {
const href = link.getAttribute('href') || '';
const slug = href.split('/brands/')[1]?.replace(/\/$/, '') || '';
return {
slug,
innerHTML: (link as HTMLElement).innerHTML.substring(0, 300),
textContent: link.textContent?.trim(),
childElementCount: link.childElementCount,
children: Array.from(link.children).map(child => ({
tag: child.tagName.toLowerCase(),
class: child.className,
text: child.textContent?.trim()
}))
};
});
});
console.log('\n' + '='.repeat(80));
console.log('BRAND LINK STRUCTURES:');
console.log('='.repeat(80));
brandStructures.forEach((brand, idx) => {
console.log(`\n${idx + 1}. slug: ${brand.slug}`);
console.log(` textContent: "${brand.textContent}"`);
console.log(` childElementCount: ${brand.childElementCount}`);
console.log(` children:`);
brand.children.forEach((child, childIdx) => {
console.log(` ${childIdx + 1}. <${child.tag}> class="${child.class}"`);
console.log(` text: "${child.text}"`);
});
console.log(` innerHTML: ${brand.innerHTML.substring(0, 200)}`);
});
console.log('\n' + '='.repeat(80));
await browser.close();
await pool.end();
}
main().catch(console.error);
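The slug extraction inside `page.evaluate` above can be checked without launching a browser if it is pulled into a pure function. A minimal sketch, assuming a hypothetical helper named `brandSlugFromHref` (not part of this commit) that also drops a query string or fragment, which the inline version does not do:

```typescript
// Hypothetical helper, not part of the archived script.
function brandSlugFromHref(href: string): string {
  // The segment that follows the first "/brands/", or "" when it is absent.
  const afterBrands = href.split('/brands/')[1] ?? '';
  // Drop a query string or fragment (an addition over the inline version).
  const [path = ''] = afterBrands.split(/[?#]/);
  // Remove a single trailing slash, as the inline version does.
  return path.replace(/\/$/, '');
}

console.log(brandSlugFromHref('/embedded-menu/AZ-Deeply-Rooted/brands/alien-labs/'));
```

Inside `page.evaluate` the function would still need to be inlined or passed as a string, since the callback runs in the browser context.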


@@ -0,0 +1,77 @@
import { chromium } from 'playwright';
async function debugCuraleafButtons() {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext({
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
viewport: { width: 1280, height: 720 }
});
const page = await context.newPage();
await page.goto('https://curaleaf.com/stores/curaleaf-dispensary-48th-street');
await page.waitForTimeout(2000);
console.log('\n=== BEFORE STATE SELECTION ===\n');
// Get all buttons
const buttonsBefore = await page.locator('button, [role="button"], a').evaluateAll(elements => {
return elements.map(el => ({
tag: el.tagName,
text: el.textContent?.trim().substring(0, 50),
id: el.id,
class: el.className,
visible: (el as HTMLElement).offsetParent !== null
})).filter(b => b.visible);
});
console.log('Buttons before state selection:');
buttonsBefore.forEach((b, i) => console.log(`${i + 1}. ${b.tag} - "${b.text}" [id: ${b.id}]`));
// Click state dropdown
const stateButton = page.locator('button#state').first();
await stateButton.click();
await page.waitForTimeout(1000);
// Click Arizona
const arizona = page.locator('[role="option"]').filter({ hasText: /^Arizona$/i }).first();
await arizona.click();
await page.waitForTimeout(2000);
console.log('\n=== AFTER STATE SELECTION ===\n');
const buttonsAfter = await page.locator('button, [role="button"], a').evaluateAll(elements => {
return elements.map(el => ({
tag: el.tagName,
text: el.textContent?.trim().substring(0, 50),
id: el.id,
class: el.className,
type: el.getAttribute('type'),
visible: (el as HTMLElement).offsetParent !== null
})).filter(b => b.visible);
});
console.log('Buttons after state selection:');
buttonsAfter.forEach((b, i) => console.log(`${i + 1}. ${b.tag} - "${b.text}" [id: ${b.id}] [type: ${b.type}]`));
// Check for any form elements
const forms = await page.locator('form').count();
console.log(`\nForms on page: ${forms}`);
if (forms > 0) {
const formActions = await page.locator('form').evaluateAll(forms => {
return forms.map(f => ({
action: (f as HTMLFormElement).action,
method: (f as HTMLFormElement).method
}));
});
console.log('Form details:', formActions);
}
await page.screenshot({ path: '/tmp/curaleaf-debug-after-state.png', fullPage: true });
console.log('\n📸 Screenshot: /tmp/curaleaf-debug-after-state.png');
await browser.close();
}
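// Exit explicitly on failure so a still-open browser does not keep the process alive
debugCuraleafButtons().catch((error) => {
console.error('Error:', error);
process.exit(1);
});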


@@ -0,0 +1,103 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { bypassAgeGate, detectStateFromUrl } from './src/utils/age-gate';
import { Browser } from 'puppeteer';
puppeteer.use(StealthPlugin());
async function debugDutchieDetection() {
let browser: Browser | null = null;
try {
const url = 'https://curaleaf.com/stores/curaleaf-az-48th-street';
browser = await puppeteer.launch({
headless: 'new',
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled'
]
});
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
console.log('Loading page...');
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForTimeout(3000);
console.log('Bypassing age gate...');
const state = detectStateFromUrl(url);
await bypassAgeGate(page, state);
console.log('\nWaiting 5 more seconds for Dutchie menu to load...');
await page.waitForTimeout(5000);
console.log('\nChecking for Dutchie markers...');
const dutchieInfo = await page.evaluate(() => {
// Check window.reactEnv
const hasReactEnv = !!(window as any).reactEnv;
const reactEnvDetails = hasReactEnv ? JSON.stringify((window as any).reactEnv, null, 2) : null;
// Check HTML content
const htmlContent = document.documentElement.innerHTML;
const hasAdminDutchie = htmlContent.includes('admin.dutchie.com');
const hasApiDutchie = htmlContent.includes('api.dutchie.com');
const hasEmbeddedMenu = htmlContent.includes('embedded-menu');
const hasReactEnvInHtml = htmlContent.includes('window.reactEnv');
// Check for Dutchie-specific elements
const hasProductListItems = document.querySelectorAll('[data-testid="product-list-item"]').length;
const hasDutchieScript = !!document.querySelector('script[src*="dutchie"]');
// Check meta tags
const metaTags = Array.from(document.querySelectorAll('meta')).map(m => ({
name: m.getAttribute('name'),
content: m.getAttribute('content'),
property: m.getAttribute('property')
})).filter(m => m.name || m.property);
return {
hasReactEnv,
reactEnvDetails,
hasAdminDutchie,
hasApiDutchie,
hasEmbeddedMenu,
hasReactEnvInHtml,
hasProductListItems,
hasDutchieScript,
metaTags,
pageTitle: document.title,
url: window.location.href
};
});
console.log('\n=== Dutchie Detection Results ===');
console.log('window.reactEnv exists:', dutchieInfo.hasReactEnv);
if (dutchieInfo.reactEnvDetails) {
console.log('window.reactEnv contents:', dutchieInfo.reactEnvDetails);
}
console.log('Has admin.dutchie.com in HTML:', dutchieInfo.hasAdminDutchie);
console.log('Has api.dutchie.com in HTML:', dutchieInfo.hasApiDutchie);
console.log('Has "embedded-menu" in HTML:', dutchieInfo.hasEmbeddedMenu);
console.log('Has "window.reactEnv" in HTML:', dutchieInfo.hasReactEnvInHtml);
console.log('Product list items found:', dutchieInfo.hasProductListItems);
console.log('Has Dutchie script tag:', dutchieInfo.hasDutchieScript);
console.log('Page title:', dutchieInfo.pageTitle);
console.log('Current URL:', dutchieInfo.url);
await browser.close();
process.exit(0);
} catch (error) {
console.error('Error:', error);
if (browser) await browser.close();
process.exit(1);
}
}
debugDutchieDetection();
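The HTML-substring checks above lend themselves to a small pure helper that can be exercised on saved HTML. A minimal sketch, assuming hypothetical names `DUTCHIE_MARKERS` and `findDutchieMarkers` (neither exists in this commit); the marker strings are the ones the script already searches for:

```typescript
// Marker strings taken from the checks in debugDutchieDetection above.
const DUTCHIE_MARKERS = [
  'admin.dutchie.com',
  'api.dutchie.com',
  'embedded-menu',
  'window.reactEnv',
] as const;

// Hypothetical helper: returns the markers that occur in a raw HTML string.
function findDutchieMarkers(html: string): string[] {
  return DUTCHIE_MARKERS.filter((marker) => html.includes(marker));
}

const sample = '<script src="https://api.dutchie.com/x.js"></script>';
console.log(findDutchieMarkers(sample));
```

An empty result only means none of these substrings appeared in the HTML that was passed in; it does not prove that a site is not backed by Dutchie.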


@@ -0,0 +1,134 @@
import { createStealthBrowser, createStealthContext, waitForPageLoad, isCloudflareChallenge, waitForCloudflareChallenge } from './src/utils/stealthBrowser';
import { getRandomProxy } from './src/utils/proxyManager';
import { pool } from './src/db/migrate';
import * as fs from 'fs/promises';
async function debugDutchieSelectors() {
console.log('🔍 Debugging Dutchie page structure...\n');
const url = 'https://dutchie.com/dispensary/sol-flower-dispensary';
// Get proxy
const proxy = await getRandomProxy();
console.log(`Using proxy: ${proxy?.server || 'none'}\n`);
const browser = await createStealthBrowser({ proxy: proxy || undefined, headless: true });
try {
const context = await createStealthContext(browser, { state: 'Arizona' });
const page = await context.newPage();
console.log(`Loading: ${url}`);
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
// Check for Cloudflare
if (await isCloudflareChallenge(page)) {
console.log('🛡️ Cloudflare detected, waiting...');
await waitForCloudflareChallenge(page, 60000);
}
await waitForPageLoad(page);
// Wait for content
await page.waitForTimeout(5000);
console.log('\n📸 Taking screenshot...');
await page.screenshot({ path: '/tmp/dutchie-page.png', fullPage: true });
console.log('💾 Saving HTML...');
const html = await page.content();
await fs.writeFile('/tmp/dutchie-page.html', html);
console.log('\n🔎 Looking for common React/product patterns...\n');
// Try to find product containers by various methods
const patterns = [
// React data attributes
'a[href*="/product/"]',
'[data-testid*="product"]',
'[data-cy*="product"]',
'[data-test*="product"]',
// Common class patterns
'[class*="ProductCard"]',
'[class*="product-card"]',
'[class*="Product_"]',
'[class*="MenuItem"]',
'[class*="menu-item"]',
// Semantic HTML
'article',
'[role="article"]',
'[role="listitem"]',
// Link patterns
'a[href*="/menu/"]',
'a[href*="/products/"]',
'a[href*="/item/"]',
];
for (const selector of patterns) {
const count = await page.locator(selector).count();
if (count > 0) {
console.log(`${selector}: ${count} elements`);
// Get details of first element
try {
const first = page.locator(selector).first();
const html = await first.evaluate(el => el.outerHTML.substring(0, 500));
const classes = await first.getAttribute('class');
const testId = await first.getAttribute('data-testid');
console.log(` Classes: ${classes || 'none'}`);
console.log(` Data-testid: ${testId || 'none'}`);
console.log(` HTML preview: ${html}...`);
console.log('');
} catch (e) {
console.log(` (Could not get element details)`);
}
}
}
// Try to extract actual product links
console.log('\n🔗 Looking for product links...\n');
const links = await page.locator('a[href*="/product/"], a[href*="/menu/"], a[href*="/item/"]').all();
if (links.length > 0) {
console.log(`Found ${links.length} potential product links:`);
for (let i = 0; i < Math.min(5, links.length); i++) {
const href = await links[i].getAttribute('href');
const text = await links[i].textContent();
console.log(` ${i + 1}. ${href}`);
console.log(` Text: ${text?.substring(0, 100)}`);
}
}
// Check page title and URL
console.log(`\n📄 Page title: ${await page.title()}`);
console.log(`📍 Final URL: ${page.url()}`);
// Try to find the main content container
console.log('\n🎯 Looking for main content container...\n');
const mainPatterns = ['main', '[role="main"]', '#root', '#app', '[id*="app"]'];
for (const selector of mainPatterns) {
const count = await page.locator(selector).count();
if (count > 0) {
console.log(`${selector}: found`);
const classes = await page.locator(selector).first().getAttribute('class');
console.log(` Classes: ${classes || 'none'}`);
}
}
console.log('\n✅ Debug complete!');
console.log('📸 Screenshot saved to: /tmp/dutchie-page.png');
console.log('💾 HTML saved to: /tmp/dutchie-page.html');
} catch (error) {
console.error('❌ Error:', error);
} finally {
await browser.close();
await pool.end();
}
}
debugDutchieSelectors();


@@ -0,0 +1,171 @@
import { chromium } from 'playwright';
import { pool } from './src/db/migrate';
import { getRandomProxy } from './src/utils/proxyManager';
import * as fs from 'fs';
async function debugGoogleScraper() {
console.log('🔍 Debugging Google scraper with proxy\n');
// Get a proxy
const proxy = await getRandomProxy();
if (!proxy) {
console.log('❌ No proxies available');
await pool.end();
return;
}
console.log(`🔌 Using proxy: ${proxy.server}\n`);
const browser = await chromium.launch({
headless: false, // Run in visible mode
args: ['--disable-blink-features=AutomationControlled']
});
const contextOptions: any = {
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
viewport: { width: 1920, height: 1080 },
locale: 'en-US',
timezoneId: 'America/Phoenix',
geolocation: { latitude: 33.4484, longitude: -112.0740 },
permissions: ['geolocation'],
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
};
const context = await browser.newContext(contextOptions);
// Add stealth
await context.addInitScript(() => {
Object.defineProperty(navigator, 'webdriver', { get: () => false });
(window as any).chrome = { runtime: {} };
});
const page = await context.newPage();
try {
// Test with the "All Greens Dispensary" example
const testAddress = '1035 W Main St, Quartzsite, AZ 85346';
const searchQuery = `${testAddress} dispensary`;
const searchUrl = `https://www.google.com/search?q=${encodeURIComponent(searchQuery)}`;
console.log(`🔍 Testing search: ${searchQuery}`);
console.log(`📍 URL: ${searchUrl}\n`);
await page.goto(searchUrl, { waitUntil: 'networkidle', timeout: 30000 });
await page.waitForTimeout(3000);
// Take screenshot
await page.screenshot({ path: '/tmp/google-search-debug.png', fullPage: true });
console.log('📸 Screenshot saved to /tmp/google-search-debug.png\n');
// Get the full HTML
const html = await page.content();
fs.writeFileSync('/tmp/google-search-debug.html', html);
console.log('💾 HTML saved to /tmp/google-search-debug.html\n');
// Try to find any text that looks like "All Greens"
const pageText = await page.evaluate(() => document.body.innerText);
const hasAllGreens = pageText.toLowerCase().includes('all greens');
console.log(`🔍 Page contains "All Greens": ${hasAllGreens}\n`);
if (hasAllGreens) {
console.log('✅ Google found the business!\n');
// Let's try to find where the name appears in the DOM
const nameInfo = await page.evaluate(() => {
const results: any[] = [];
const walker = document.createTreeWalker(
document.body,
NodeFilter.SHOW_TEXT,
null
);
let node;
while (node = walker.nextNode()) {
const text = node.textContent?.trim() || '';
if (text.toLowerCase().includes('all greens')) {
const element = node.parentElement;
results.push({
text: text,
tagName: element?.tagName,
className: element?.className,
id: element?.id,
dataAttrs: Array.from(element?.attributes || [])
.filter(attr => attr.name.startsWith('data-'))
.map(attr => `${attr.name}="${attr.value}"`)
});
}
}
return results;
});
console.log('📍 Found "All Greens" in these elements:');
console.log(JSON.stringify(nameInfo, null, 2));
}
// Try current selectors
console.log('\n🧪 Testing current selectors:\n');
const nameSelectors = [
'[data-attrid="title"]',
'h2[data-attrid="title"]',
'.SPZz6b h2',
'h3.LC20lb',
'.kp-header .SPZz6b'
];
for (const selector of nameSelectors) {
const element = await page.$(selector);
if (element) {
const text = await element.textContent();
console.log(`${selector}: "${text?.trim()}"`);
} else {
console.log(`${selector}: not found`);
}
}
// Look for website links
console.log('\n🔗 Looking for website links:\n');
const links = await page.evaluate(() => {
const allLinks = Array.from(document.querySelectorAll('a[href]'));
return allLinks
.filter(a => {
const href = (a as HTMLAnchorElement).href;
return href &&
!href.includes('google.com') &&
!href.includes('youtube.com') &&
!href.includes('facebook.com');
})
.slice(0, 10)
.map(a => ({
href: (a as HTMLAnchorElement).href,
text: a.textContent?.trim().substring(0, 50),
className: a.className
}));
});
console.log('First 10 non-Google links:');
console.log(JSON.stringify(links, null, 2));
// Look for phone numbers
console.log('\n📞 Looking for phone numbers:\n');
const phoneMatches = pageText.match(/\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/g);
if (phoneMatches) {
console.log('Found phone numbers:', phoneMatches);
} else {
console.log('No phone numbers found in page text');
}
console.log('\n⏸ Browser will stay open for 30 seconds for manual inspection...');
await page.waitForTimeout(30000);
} finally {
await browser.close();
await pool.end();
}
}
debugGoogleScraper().catch(console.error);
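The phone-number search above prints raw matches, so the same number can show up several times in different formats. A minimal sketch of normalizing and de-duplicating them, assuming a hypothetical helper `extractPhoneNumbers` (not part of this commit) and reusing the script's own pattern:

```typescript
// Same pattern the script applies to the page text.
const US_PHONE_PATTERN = /\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/g;

// Hypothetical helper: returns unique phone numbers as digit-only strings.
function extractPhoneNumbers(text: string): string[] {
  const matches = text.match(US_PHONE_PATTERN) ?? [];
  // Strip punctuation and spaces so differently formatted copies compare equal.
  const digitsOnly = matches.map((match) => match.replace(/\D/g, ''));
  return Array.from(new Set(digitsOnly));
}

console.log(extractPhoneNumbers('Call (602) 555-0100 or 602.555.0100'));
```

The pattern is a loose heuristic and will also match other 10-digit runs, so the output is a list of candidates rather than confirmed phone numbers.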


@@ -0,0 +1,56 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';
async function debugProductCard() {
const proxy = await getRandomProxy();
if (!proxy) {
console.log('No proxy available');
process.exit(1);
}
const browser = await firefox.launch({ headless: true });
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
});
const page = await context.newPage();
const brandUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/brands/alien-labs';
console.log(`Loading: ${brandUrl}`);
await page.goto(brandUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForTimeout(3000);
// Get the first product card's full text content
const cardData = await page.evaluate(() => {
const card = document.querySelector('a[href*="/product/"]');
if (!card) return null;
return {
href: card.getAttribute('href'),
innerHTML: card.innerHTML.substring(0, 2000),
textContent: card.textContent?.substring(0, 1000)
};
});
console.log('\n' + '='.repeat(80));
console.log('FIRST PRODUCT CARD DATA:');
console.log('='.repeat(80));
console.log('\nHREF:', cardData?.href);
console.log('\nTEXT CONTENT:');
console.log(cardData?.textContent);
console.log('\nHTML (first 2000 chars):');
console.log(cardData?.innerHTML);
console.log('='.repeat(80));
await browser.close();
process.exit(0);
}
debugProductCard().catch(console.error);


@@ -0,0 +1,97 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';
import { pool } from './src/db/migrate.js';
async function debugPage() {
const proxy = await getRandomProxy();
if (!proxy) {
console.log('No proxy available');
process.exit(1);
}
const browser = await firefox.launch({
headless: true,
firefoxUserPrefs: { 'geo.enabled': true }
});
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
geolocation: { latitude: 33.4484, longitude: -112.0740 },
permissions: ['geolocation'],
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
});
const page = await context.newPage();
try {
console.log('Loading page...');
await page.goto('https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/products/', {
waitUntil: 'domcontentloaded',
timeout: 60000
});
await page.waitForTimeout(5000);
// Take screenshot
await page.screenshot({ path: '/tmp/products-page.png' });
console.log('Screenshot saved to /tmp/products-page.png');
// Get HTML sample
const html = await page.content();
console.log('\n=== PAGE TITLE ===');
console.log(await page.title());
console.log('\n=== SEARCHING FOR PRODUCT ELEMENTS ===');
// Try different selectors
const tests = [
'a[href*="/product/"]',
'[class*="Product"]',
'[class*="product"]',
'[class*="card"]',
'[class*="Card"]',
'[data-testid*="product"]',
'article',
'[role="article"]',
];
for (const selector of tests) {
const count = await page.locator(selector).count();
console.log(`${selector.padEnd(35)}${count} elements`);
}
// Get all links
console.log('\n=== ALL LINKS WITH "product" IN HREF ===');
const productLinks = await page.evaluate(() => {
return Array.from(document.querySelectorAll('a'))
.filter(a => a.href.includes('/product/'))
.map(a => ({
href: a.href,
text: a.textContent?.trim().substring(0, 100),
classes: a.className
}))
.slice(0, 10);
});
console.table(productLinks);
// Get sample HTML of body
console.log('\n=== SAMPLE HTML (first 2000 chars) ===');
const bodyHtml = await page.evaluate(() => document.body.innerHTML);
console.log(bodyHtml.substring(0, 2000));
} catch (error) {
console.error('Error:', error);
process.exitCode = 1;
} finally {
// Clean up once, on both the success and the failure path
await browser.close();
await pool.end();
}
}
debugPage();


@@ -0,0 +1,113 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';
async function main() {
const productUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/product/easy-tiger-live-rosin-aio-pete-s-peach';
console.log('🔍 Investigating Product with Sale Price...\n');
console.log(`URL: ${productUrl}\n`);
const proxyConfig = await getRandomProxy();
if (!proxyConfig) {
throw new Error('No proxy available');
}
console.log(`🔐 Using proxy: ${proxyConfig.server}\n`);
const browser = await firefox.launch({
headless: true,
proxy: proxyConfig
});
const context = await browser.newContext({
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0'
});
const page = await context.newPage();
try {
console.log('📄 Loading product page...');
await page.goto(productUrl, {
waitUntil: 'domcontentloaded',
timeout: 60000
});
await page.waitForTimeout(3000);
console.log('✅ Page loaded\n');
// Get the full page HTML for inspection
const html = await page.content();
// Look for price-related elements
const priceData = await page.evaluate(() => {
// Try JSON-LD structured data
const scripts = Array.from(document.querySelectorAll('script[type="application/ld+json"]'));
let jsonLdData: any = null;
for (const script of scripts) {
try {
const data = JSON.parse(script.textContent || '');
if (data['@type'] === 'Product') {
jsonLdData = data;
break;
}
} catch (e) {}
}
// Look for price elements in various ways
const priceElements = Array.from(document.querySelectorAll('[class*="price"], [class*="Price"]'));
const priceTexts = priceElements.map(el => ({
className: el.className,
textContent: el.textContent?.trim().substring(0, 100)
}));
// Get all text containing dollar signs
const pageText = document.body.textContent || '';
const priceMatches = pageText.match(/\$\d+\.?\d*/g);
// Look for strikethrough prices (often used for original price when there's a sale)
const strikethroughElements = Array.from(document.querySelectorAll('s, del, [style*="line-through"]'));
const strikethroughPrices = strikethroughElements.map(el => el.textContent?.trim());
// Look for elements with "sale", "special", "discount" in class names
const saleElements = Array.from(document.querySelectorAll('[class*="sale"], [class*="Sale"], [class*="special"], [class*="Special"], [class*="discount"], [class*="Discount"]'));
const saleTexts = saleElements.map(el => ({
className: el.className,
textContent: el.textContent?.trim().substring(0, 100)
}));
return {
jsonLdData,
priceElements: priceTexts.slice(0, 10),
priceMatches: priceMatches?.slice(0, 20) || [],
strikethroughPrices: strikethroughPrices.slice(0, 5),
saleElements: saleTexts.slice(0, 10)
};
});
console.log('💰 Price Data Found:');
console.log(JSON.stringify(priceData, null, 2));
// Take a screenshot for visual reference
await page.screenshot({ path: '/tmp/sale-price-product.png', fullPage: true });
console.log('\n📸 Screenshot saved to /tmp/sale-price-product.png');
// Save a snippet of the HTML around price elements
const priceHtmlSnippet = await page.evaluate(() => {
const priceElements = Array.from(document.querySelectorAll('[class*="price"], [class*="Price"]'));
if (priceElements.length > 0) {
return priceElements.slice(0, 3).map(el => el.outerHTML).join('\n\n');
}
return 'No price elements found';
});
console.log('\n📝 Price HTML Snippet:');
console.log(priceHtmlSnippet);
} catch (error: any) {
console.error('❌ Error:', error.message);
} finally {
await browser.close();
}
}
main().catch(console.error);
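The script above collects strikethrough texts and price texts separately but does not combine them. A minimal sketch of one way to do that, assuming the heuristic stated in the script's comment (a struck-through price is the original price) plus the further assumption that a lower displayed price is the sale price. `parsePrice`, `resolvePrices`, and `PriceInfo` are hypothetical names, and the actual Dutchie markup may not follow this pattern:

```typescript
interface PriceInfo {
  regular: number | null;
  sale: number | null;
}

// Pull the first dollar amount out of a text fragment, e.g. "$45.00" -> 45.
function parsePrice(text: string): number | null {
  const match = text.match(/\$(\d+(?:\.\d{1,2})?)/);
  return match ? Number(match[1]) : null;
}

// Hypothetical helper combining the two lists gathered in page.evaluate.
function resolvePrices(strikethroughTexts: string[], priceTexts: string[]): PriceInfo {
  const isNumber = (value: number | null): value is number => value !== null;
  const struck = strikethroughTexts.map((t) => parsePrice(t)).filter(isNumber)[0] ?? null;
  const displayed = priceTexts.map((t) => parsePrice(t)).filter(isNumber);
  if (struck === null) {
    // No strikethrough price: treat the first displayed price as the regular price.
    return { regular: displayed[0] ?? null, sale: null };
  }
  // Assumption: any displayed price below the struck-through one is a sale price.
  const lower = displayed.filter((price) => price < struck);
  return { regular: struck, sale: lower.length > 0 ? Math.min(...lower) : null };
}

console.log(resolvePrices(['$45.00'], ['$45.00', '$36.00']));
```

Products with several weight options would show several price pairs, which this sketch does not try to separate.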


@@ -0,0 +1,91 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { Pool } from 'pg';
import fs from 'fs';
puppeteer.use(StealthPlugin());
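const pool = new Pool({
// DATABASE_URL is an assumed environment variable name; falls back to the local dev database
connectionString: process.env.DATABASE_URL ?? 'postgresql://sail:password@localhost:5432/dutchie_menus'
});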
async function debug() {
let browser;
try {
// Get proxy
const proxyResult = await pool.query(`SELECT host, port, protocol FROM proxies ORDER BY RANDOM() LIMIT 1`);
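const proxy = proxyResult.rows[0];
// An empty proxies table would otherwise surface as a TypeError on the next line
if (!proxy) {
throw new Error('No proxies found in the proxies table');
}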
const proxyUrl = `${proxy.protocol}://${proxy.host}:${proxy.port}`;
console.log('🔌 Proxy:', proxyUrl);
browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox', `--proxy-server=${proxyUrl}`]
});
const page = await browser.newPage();
// Set Googlebot UA
await page.setUserAgent('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
// Log all requests being made
page.on('request', request => {
console.log('\n📤 REQUEST:', request.method(), request.url());
console.log(' Headers:', JSON.stringify(request.headers(), null, 2));
});
// Log all responses
page.on('response', response => {
console.log('\n📥 RESPONSE:', response.status(), response.url());
console.log(' Headers:', JSON.stringify(response.headers(), null, 2));
});
const url = 'https://curaleaf.com/stores/curaleaf-dispensary-phoenix-airport/brands';
console.log('\n🌐 Going to:', url);
await page.goto(url, { waitUntil: 'networkidle2', timeout: 60000 });
await page.waitForTimeout(3000);
// Get what the browser sees
const pageData = await page.evaluate(() => ({
title: document.title,
url: window.location.href,
userAgent: navigator.userAgent,
bodyHTML: document.body.innerHTML,
bodyText: document.body.innerText
}));
console.log('\n📄 PAGE DATA:');
console.log('Title:', pageData.title);
console.log('URL:', pageData.url);
console.log('User Agent (browser sees):', pageData.userAgent);
console.log('Body HTML length:', pageData.bodyHTML.length, 'chars');
console.log('Body text length:', pageData.bodyText.length, 'chars');
// Save HTML to file
fs.writeFileSync('/tmp/page.html', pageData.bodyHTML);
console.log('\n💾 Saved HTML to /tmp/page.html');
// Save screenshot
await page.screenshot({ path: '/tmp/screenshot.png', fullPage: true });
console.log('📸 Saved screenshot to /tmp/screenshot.png');
// Show first 500 chars of HTML
console.log('\n📝 First 500 chars of HTML:');
console.log(pageData.bodyHTML.substring(0, 500));
// Show first 500 chars of text
console.log('\n📝 First 500 chars of text:');
console.log(pageData.bodyText.substring(0, 500));
} catch (error: any) {
console.error('❌ Error:', error.message);
} finally {
if (browser) await browser.close();
await pool.end();
}
}
debug();


@@ -0,0 +1,68 @@
import { chromium } from 'playwright';
async function debugSolFlower() {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext();
const page = await context.newPage();
try {
// Set age gate bypass cookies
await context.addCookies([
{
name: 'age_verified',
value: 'true',
domain: '.dutchie.com',
path: '/',
},
{
name: 'initial_location',
value: JSON.stringify({ state: 'Arizona' }),
domain: '.dutchie.com',
path: '/',
},
]);
console.log('🌐 Loading Sol Flower Sun City shop page...');
await page.goto('https://dutchie.com/dispensary/sol-flower-dispensary/shop', {
waitUntil: 'networkidle',
});
console.log('📸 Taking screenshot...');
await page.screenshot({ path: '/tmp/sol-flower-shop.png', fullPage: true });
// Try to find products with various selectors
console.log('\n🔍 Looking for products with different selectors:');
const selectors = [
'a[href*="/product/"]',
'[data-testid="product-card"]',
'[data-testid="product"]',
'.product-card',
'.ProductCard',
'article',
'[role="article"]',
];
for (const selector of selectors) {
const count = await page.locator(selector).count();
console.log(` ${selector}: ${count} elements`);
}
// Get the page HTML to inspect
console.log('\n📄 Page title:', await page.title());
// Check if there's any text indicating no products
const bodyText = await page.locator('body').textContent();
if (bodyText?.includes('No products') || bodyText?.includes('no items')) {
console.log('⚠️ Page indicates no products available');
}
console.log('\n✅ Screenshot saved to /tmp/sol-flower-shop.png');
} catch (error) {
console.error('❌ Error:', error);
} finally {
await browser.close();
}
}
debugSolFlower();


@@ -0,0 +1,151 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';
async function main() {
const menuUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted';
const specialsUrl = `${menuUrl}/specials`;
console.log('🔍 Investigating Specials Page...\n');
console.log(`URL: ${specialsUrl}\n`);
const proxyConfig = await getRandomProxy();
if (!proxyConfig) {
throw new Error('No proxy available');
}
console.log(`🔐 Using proxy: ${proxyConfig.server}\n`);
const browser = await firefox.launch({
headless: true,
proxy: proxyConfig
});
const context = await browser.newContext({
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0'
});
const page = await context.newPage();
try {
console.log('📄 Loading specials page...');
await page.goto(specialsUrl, {
waitUntil: 'domcontentloaded',
timeout: 60000
});
await page.waitForTimeout(3000);
// Scroll to load content
for (let i = 0; i < 10; i++) {
await page.evaluate(() => window.scrollBy(0, window.innerHeight));
await page.waitForTimeout(1000);
}
console.log('✅ Page loaded\n');
// Check for JSON-LD data
const jsonLdData = await page.evaluate(() => {
const scripts = Array.from(document.querySelectorAll('script[type="application/ld+json"]'));
return scripts.map(script => {
try {
return JSON.parse(script.textContent || '');
} catch (e) {
return null;
}
}).filter(Boolean);
});
if (jsonLdData.length > 0) {
console.log('📋 JSON-LD Data Found:');
console.log(JSON.stringify(jsonLdData, null, 2));
console.log('\n');
}
// Look for product cards
const productCards = await page.evaluate(() => {
const cards = Array.from(document.querySelectorAll('a[href*="/product/"]'));
return cards.slice(0, 5).map(card => ({
href: card.getAttribute('href'),
text: card.textContent?.trim().substring(0, 100)
}));
});
console.log('🛍️ Product cards sampled (up to 5):', productCards.length);
if (productCards.length > 0) {
console.log('First 5 products:');
productCards.forEach((card, idx) => {
console.log(` ${idx + 1}. ${card.href}`);
console.log(` Text: ${card.text}\n`);
});
}
// Look for special indicators
const specialData = await page.evaluate(() => {
const pageText = document.body.textContent || '';
// Look for common special-related keywords
const hasDiscount = pageText.toLowerCase().includes('discount');
const hasSale = pageText.toLowerCase().includes('sale');
// Match "off" as a whole word so words like "coffee" or "office" do not count
const hasOff = /\boff\b/i.test(pageText);
const hasDeal = pageText.toLowerCase().includes('deal');
const hasPromo = pageText.toLowerCase().includes('promo');
// Look for percentage or dollar off indicators
const percentMatches = pageText.match(/(\d+)%\s*off/gi);
const dollarMatches = pageText.match(/\$(\d+)\s*off/gi);
// Try to find any special tags or badges
const badges = Array.from(document.querySelectorAll('[class*="badge"], [class*="tag"], [class*="special"], [class*="sale"], [class*="discount"]'));
const badgeTexts = badges.map(b => b.textContent?.trim()).filter(Boolean).slice(0, 10);
return {
keywords: {
hasDiscount,
hasSale,
hasOff,
hasDeal,
hasPromo
},
percentMatches: percentMatches || [],
dollarMatches: dollarMatches || [],
badgeTexts,
totalBadges: badges.length
};
});
console.log('\n🏷 Special Indicators:');
console.log(JSON.stringify(specialData, null, 2));
// Get page title and any heading text
const pageInfo = await page.evaluate(() => {
const title = document.title;
const h1 = document.querySelector('h1')?.textContent?.trim();
const h2s = Array.from(document.querySelectorAll('h2')).map(h => h.textContent?.trim()).slice(0, 3);
return { title, h1, h2s };
});
console.log('\n📰 Page Info:');
console.log(JSON.stringify(pageInfo, null, 2));
// Check if there are any price elements visible
const priceInfo = await page.evaluate(() => {
const pageText = document.body.textContent || '';
const priceMatches = pageText.match(/\$(\d+\.?\d*)/g);
return {
pricesFound: priceMatches?.length || 0,
samplePrices: priceMatches?.slice(0, 10) || []
};
});
console.log('\n💰 Price Info:');
console.log(JSON.stringify(priceInfo, null, 2));
} catch (error: any) {
console.error('❌ Error:', error.message);
} finally {
await browser.close();
}
}
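// Illustrative sketch (not part of the original script): the "N% off" detection above,
// pulled out as a pure function so it can be checked without a browser.
// `extractPercentOff` is a hypothetical name introduced here for illustration only.
function extractPercentOff(text: string): number[] {
  const pattern = /(\d+)%\s*off/gi;
  const values: number[] = [];
  let match: RegExpExecArray | null;
  // exec() with the g flag advances through the string one match at a time
  while ((match = pattern.exec(text)) !== null) {
    values.push(parseInt(match[1], 10));
  }
  return values;
}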
main().catch(console.error);


@@ -0,0 +1,17 @@
import { pool } from './src/db/migrate.js';
async function main() {
try {
const result = await pool.query(`
DELETE FROM products WHERE dispensary_id = 149
`);
console.log(`Deleted ${result.rowCount} products from dispensary 149`);
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
main().catch(console.error);


@@ -0,0 +1,71 @@
import { chromium } from 'playwright';
import { bypassAgeGatePlaywright } from './src/utils/age-gate-playwright';
async function diagnoseCuraleafPage() {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext({
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
viewport: { width: 1280, height: 720 }
});
const page = await context.newPage();
try {
console.log('Loading Curaleaf page...');
await page.goto('https://curaleaf.com/stores/curaleaf-dispensary-48th-street', {
waitUntil: 'domcontentloaded'
});
await page.waitForTimeout(2000);
console.log('Bypassing age gate...');
const bypassed = await bypassAgeGatePlaywright(page, 'Arizona');
if (!bypassed) {
console.log('❌ Failed to bypass age gate');
// No explicit close here: the finally block below closes the browser
return;
}
console.log('✅ Age gate bypassed!');
console.log(`Current URL: ${page.url()}`);
await page.waitForTimeout(5000);
// Check page title
const title = await page.title();
console.log(`\nPage title: ${title}`);
// Check for "menu" or "shop" links
const menuLinks = await page.locator('a:has-text("menu"), a:has-text("shop"), a:has-text("order")').count();
console.log(`\nMenu/Shop links found: ${menuLinks}`);
if (menuLinks > 0) {
const links = await page.locator('a:has-text("menu"), a:has-text("shop"), a:has-text("order")').all();
console.log('\nMenu links:');
for (const link of links.slice(0, 5)) {
const text = await link.textContent();
const href = await link.getAttribute('href');
console.log(` - ${text}: ${href}`);
}
}
// Check body text
const bodyText = await page.textContent('body') || '';
console.log(`\nBody text length: ${bodyText.length} characters`);
console.log(`Contains "menu": ${bodyText.toLowerCase().includes('menu')}`);
console.log(`Contains "shop": ${bodyText.toLowerCase().includes('shop')}`);
console.log(`Contains "product": ${bodyText.toLowerCase().includes('product')}`);
console.log(`Contains "dutchie": ${bodyText.toLowerCase().includes('dutchie')}`);
// Take screenshot
await page.screenshot({ path: '/tmp/curaleaf-diagnosed.png', fullPage: true });
console.log('\n📸 Screenshot saved: /tmp/curaleaf-diagnosed.png');
} catch (error) {
console.error('Error:', error);
} finally {
await browser.close();
}
}
diagnoseCuraleafPage();


@@ -0,0 +1,319 @@
import { firefox } from 'playwright';
import { pool } from './src/db/migrate';
import { getRandomProxy } from './src/utils/proxyManager';
interface DispensaryEnrichment {
id: number;
azdhs_name: string;
address: string;
city: string;
state: string;
zip: string;
dba_name?: string;
website?: string;
google_phone?: string;
google_rating?: number;
google_review_count?: number;
confidence: 'high' | 'medium' | 'low';
notes?: string;
}
async function enrichFromGoogleMaps() {
console.log('🦊 Enriching AZDHS dispensaries from Google Maps using Firefox\n');
// Get a proxy
const proxy = await getRandomProxy();
if (!proxy) {
console.log('❌ No proxies available');
await pool.end();
return;
}
console.log(`🔌 Using proxy: ${proxy.server}\n`);
const browser = await firefox.launch({
headless: true,
firefoxUserPrefs: {
'geo.enabled': true,
'geo.provider.use_corelocation': true,
'geo.prompt.testing': true,
'geo.prompt.testing.allow': true,
}
});
const contextOptions: any = {
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
geolocation: { latitude: 33.4484, longitude: -112.0740 }, // Phoenix, AZ
permissions: ['geolocation'],
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
};
const context = await browser.newContext(contextOptions);
const page = await context.newPage();
try {
// Get all dispensaries that don't have website yet
const result = await pool.query(`
SELECT id, slug, name, address, city, state, zip, phone, website, dba_name
FROM dispensaries
WHERE website IS NULL OR website = ''
ORDER BY id
LIMIT 50
`);
const dispensaries = result.rows;
console.log(`📋 Found ${dispensaries.length} dispensaries to enrich\n`);
let changesCreated = 0;
let failed = 0;
let skipped = 0;
for (const disp of dispensaries) {
console.log(`\n🔍 Processing: ${disp.name}`);
console.log(` Address: ${disp.address}, ${disp.city}, ${disp.state} ${disp.zip}`);
try {
// Search Google Maps with dispensary name + address for better results
const searchQuery = `${disp.name} ${disp.address}, ${disp.city}, ${disp.state} ${disp.zip}`;
const encodedQuery = encodeURIComponent(searchQuery);
const url = `https://www.google.com/maps/search/${encodedQuery}`;
console.log(` 📍 Searching Maps: ${searchQuery}`);
await page.goto(url, {
waitUntil: 'domcontentloaded',
timeout: 30000
});
// Wait for results
await page.waitForTimeout(3000);
// Extract business data from the first result
const businessData = await page.evaluate(() => {
const data: any = {};
// Try to find the place name from the side panel
const nameSelectors = [
'h1[class*="fontHeadline"]',
'h1.DUwDvf',
'[data-item-id*="name"] h1'
];
for (const selector of nameSelectors) {
const el = document.querySelector(selector);
if (el?.textContent) {
data.name = el.textContent.trim();
break;
}
}
// Try to find website
const websiteSelectors = [
'a[data-item-id="authority"]',
'a[data-tooltip="Open website"]',
'a[aria-label*="Website"]'
];
for (const selector of websiteSelectors) {
const el = document.querySelector(selector) as HTMLAnchorElement;
if (el?.href && !el.href.includes('google.com')) {
data.website = el.href;
break;
}
}
// Try to find phone
const phoneSelectors = [
'button[data-item-id*="phone"]',
'button[aria-label*="Phone"]',
'[data-tooltip*="Copy phone number"]'
];
for (const selector of phoneSelectors) {
const el = document.querySelector(selector);
if (el?.textContent) {
const phoneMatch = el.textContent.match(/\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/);
if (phoneMatch) {
data.phone = phoneMatch[0];
break;
}
}
}
// Try to find rating
const ratingEl = document.querySelector('[role="img"][aria-label*="stars"]');
if (ratingEl) {
const label = ratingEl.getAttribute('aria-label');
const match = label?.match(/(\d+\.?\d*)\s*stars?/);
if (match) {
data.rating = parseFloat(match[1]);
}
}
// Try to find review count
const reviewEl = document.querySelector('[aria-label*="reviews"]');
if (reviewEl) {
const label = reviewEl.getAttribute('aria-label');
const match = label?.match(/([\d,]+)\s*reviews?/);
if (match) {
data.reviewCount = parseInt(match[1].replace(/,/g, ''), 10);
}
}
return data;
});
console.log(` Found data:`, businessData);
// Determine confidence level
let confidence: 'high' | 'medium' | 'low' = 'low';
if (businessData.name && businessData.website && businessData.phone) {
confidence = 'high';
} else if (businessData.name && (businessData.website || businessData.phone)) {
confidence = 'medium';
}
// Track if any changes were made for this dispensary
let changesMadeForDispensary = 0;
// Create change records for each field that has new data
if (businessData.name && businessData.name !== disp.dba_name) {
await pool.query(`
INSERT INTO dispensary_changes (
dispensary_id, field_name, old_value, new_value,
confidence_score, source, change_notes
) VALUES ($1, $2, $3, $4, $5, $6, $7)
`, [
disp.id,
'dba_name',
disp.dba_name || null,
businessData.name,
confidence,
'google_maps',
`Found via Google Maps search for "${disp.name}"`
]);
console.log(` 📝 Created change record for DBA name`);
changesMadeForDispensary++;
}
if (businessData.website && businessData.website !== disp.website) {
await pool.query(`
INSERT INTO dispensary_changes (
dispensary_id, field_name, old_value, new_value,
confidence_score, source, change_notes, requires_recrawl
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
`, [
disp.id,
'website',
disp.website || null,
businessData.website,
confidence,
'google_maps',
`Found via Google Maps search for "${disp.name}"`,
true
]);
console.log(` 📝 Created change record for website (requires recrawl)`);
changesMadeForDispensary++;
}
if (businessData.phone && businessData.phone !== disp.phone) {
await pool.query(`
INSERT INTO dispensary_changes (
dispensary_id, field_name, old_value, new_value,
confidence_score, source, change_notes
) VALUES ($1, $2, $3, $4, $5, $6, $7)
`, [
disp.id,
'phone',
disp.phone || null,
businessData.phone,
confidence,
'google_maps',
`Found via Google Maps search for "${disp.name}"`
]);
console.log(` 📝 Created change record for phone`);
changesMadeForDispensary++;
}
if (businessData.rating) {
await pool.query(`
INSERT INTO dispensary_changes (
dispensary_id, field_name, old_value, new_value,
confidence_score, source, change_notes
) VALUES ($1, $2, $3, $4, $5, $6, $7)
`, [
disp.id,
'google_rating',
null,
businessData.rating.toString(),
confidence,
'google_maps',
`Google rating from Maps search`
]);
console.log(` 📝 Created change record for Google rating`);
changesMadeForDispensary++;
}
if (businessData.reviewCount) {
await pool.query(`
INSERT INTO dispensary_changes (
dispensary_id, field_name, old_value, new_value,
confidence_score, source, change_notes
) VALUES ($1, $2, $3, $4, $5, $6, $7)
`, [
disp.id,
'google_review_count',
null,
businessData.reviewCount.toString(),
confidence,
'google_maps',
`Google review count from Maps search`
]);
console.log(` 📝 Created change record for Google review count`);
changesMadeForDispensary++;
}
if (changesMadeForDispensary > 0) {
console.log(` ✅ Created ${changesMadeForDispensary} change record(s) for review (${confidence} confidence)`);
changesCreated += changesMadeForDispensary;
} else {
console.log(` ⏭️ No new data found`);
skipped++;
}
} catch (error) {
console.log(` ❌ Error: ${error}`);
failed++;
}
// Rate limiting - wait between requests
await page.waitForTimeout(3000 + Math.random() * 2000);
}
console.log('\n' + '='.repeat(80));
console.log(`\n📊 Summary:`);
console.log(` 📝 Change records created: ${changesCreated}`);
console.log(` ⏭️ Skipped (no new data): ${skipped}`);
console.log(` ❌ Failed: ${failed}`);
console.log(`\n💡 Visit the Change Approval page to review and approve these changes.`);
} finally {
await browser.close();
await pool.end();
}
}
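// Illustrative sketch (not part of the original script): the confidence rule used inside
// the loop above, written as a standalone function so it can be unit-tested.
// `scoreConfidence` and `ScrapedBusiness` are hypothetical names introduced for illustration.
interface ScrapedBusiness {
  name?: string;
  website?: string;
  phone?: string;
}
function scoreConfidence(data: ScrapedBusiness): 'high' | 'medium' | 'low' {
  // High: name, website and phone were all found
  if (data.name && data.website && data.phone) return 'high';
  // Medium: name plus at least one contact field
  if (data.name && (data.website || data.phone)) return 'medium';
  return 'low';
}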
async function main() {
try {
await enrichFromGoogleMaps();
} catch (error) {
console.error('Fatal error:', error);
process.exit(1);
}
}
main();


@@ -0,0 +1,284 @@
import { chromium } from 'playwright';
import { pool } from './src/db/migrate';
import { getStateProxy, getRandomProxy } from './src/utils/proxyManager';
interface DispensaryEnrichment {
id: number;
azdhs_name: string;
address: string;
city: string;
state: string;
zip: string;
dba_name?: string;
website?: string;
google_phone?: string;
google_rating?: number;
google_review_count?: number;
confidence: 'high' | 'medium' | 'low';
notes?: string;
}
async function enrichDispensariesFromGoogle() {
console.log('🔍 Starting Google enrichment for AZDHS dispensaries\n');
// Get an Arizona proxy if available, otherwise any proxy
let proxy = await getStateProxy('Arizona');
if (!proxy) {
console.log('⚠️ No Arizona proxy available, trying any US proxy...');
proxy = await getRandomProxy();
}
if (!proxy) {
console.log('❌ No proxies available. Please add proxies to the database.');
await pool.end();
return;
}
console.log(`🔌 Using proxy: ${proxy.server}\n`);
const browser = await chromium.launch({
headless: true,
args: [
'--disable-blink-features=AutomationControlled',
]
});
const contextOptions: any = {
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
viewport: { width: 1920, height: 1080 },
locale: 'en-US',
timezoneId: 'America/Phoenix',
geolocation: { latitude: 33.4484, longitude: -112.0740 }, // Phoenix, AZ
permissions: ['geolocation'],
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
};
const context = await browser.newContext(contextOptions);
// Add stealth techniques
await context.addInitScript(() => {
// Remove webdriver flag
Object.defineProperty(navigator, 'webdriver', { get: () => false });
// Chrome runtime
(window as any).chrome = { runtime: {} };
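// Permissions: bind the original query so it keeps its `this`
// (calling it unbound throws "Illegal invocation" in Chromium)
const originalQuery = window.navigator.permissions.query.bind(window.navigator.permissions);
window.navigator.permissions.query = (parameters: any) => (
parameters.name === 'notifications' ?
Promise.resolve({ state: Notification.permission } as PermissionStatus) :
originalQuery(parameters)
);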
});
const page = await context.newPage();
try {
// Get all dispensaries that don't have website yet
const result = await pool.query(`
SELECT id, name, address, city, state, zip, phone
FROM azdhs_list
WHERE website IS NULL OR website = ''
ORDER BY id
LIMIT 2
`);
const dispensaries = result.rows;
console.log(`📋 Found ${dispensaries.length} dispensaries to enrich\n`);
let enriched = 0;
let failed = 0;
const needsReview: DispensaryEnrichment[] = [];
for (const disp of dispensaries) {
console.log(`\n🔍 Processing: ${disp.name}`);
console.log(` Address: ${disp.address}, ${disp.city}, ${disp.state} ${disp.zip}`);
try {
// Search Google for the address + dispensary
const searchQuery = `${disp.address}, ${disp.city}, ${disp.state} ${disp.zip} dispensary`;
const searchUrl = `https://www.google.com/search?q=${encodeURIComponent(searchQuery)}`;
console.log(` Searching: ${searchQuery}`);
await page.goto(searchUrl, { waitUntil: 'domcontentloaded', timeout: 15000 });
await page.waitForTimeout(2000);
// Try to extract Google Business info
const businessData = await page.evaluate(() => {
const data: any = {};
// Try to find business name
const nameSelectors = [
'[data-attrid="title"]',
'h2[data-attrid="title"]',
'.SPZz6b h2',
'h3.LC20lb',
'.kp-header .SPZz6b'
];
for (const selector of nameSelectors) {
const el = document.querySelector(selector);
if (el?.textContent) {
data.name = el.textContent.trim();
break;
}
}
// Try to find website
const websiteSelectors = [
'a[data-dtype="d3ph"]',
'.yuRUbf a',
'a.ab_button[href^="http"]'
];
for (const selector of websiteSelectors) {
const el = document.querySelector(selector) as HTMLAnchorElement;
if (el?.href && !el.href.includes('google.com')) {
data.website = el.href;
break;
}
}
// Try to find phone
const phoneSelectors = [
'[data-dtype="d3ph"]',
'span[data-dtype="d3ph"]',
'.LrzXr.zdqRlf'
];
for (const selector of phoneSelectors) {
const el = document.querySelector(selector);
if (el?.textContent && /\d{3}.*\d{3}.*\d{4}/.test(el.textContent)) {
data.phone = el.textContent.trim();
break;
}
}
// Try to find rating
const ratingEl = document.querySelector('.Aq14fc');
if (ratingEl?.textContent) {
const match = ratingEl.textContent.match(/(\d+\.?\d*)/);
if (match) data.rating = parseFloat(match[1]);
}
// Try to find review count (may contain thousands separators, e.g. "1,234")
const reviewEl = document.querySelector('.hqzQac span');
if (reviewEl?.textContent) {
const match = reviewEl.textContent.match(/(\d[\d,]*)/);
if (match) data.reviewCount = parseInt(match[1].replace(/,/g, ''), 10);
}
return data;
});
console.log(` Found data:`, businessData);
// Determine confidence level
let confidence: 'high' | 'medium' | 'low' = 'low';
if (businessData.name && businessData.website && businessData.phone) {
confidence = 'high';
} else if (businessData.name && (businessData.website || businessData.phone)) {
confidence = 'medium';
}
const enrichment: DispensaryEnrichment = {
id: disp.id,
azdhs_name: disp.name,
address: disp.address,
city: disp.city,
state: disp.state,
zip: disp.zip,
dba_name: businessData.name,
website: businessData.website,
google_phone: businessData.phone,
google_rating: businessData.rating,
google_review_count: businessData.reviewCount,
confidence
};
if (confidence === 'high') {
// Auto-update high confidence matches
await pool.query(`
UPDATE azdhs_list
SET
dba_name = $1,
website = $2,
google_rating = $3,
google_review_count = $4,
updated_at = CURRENT_TIMESTAMP
WHERE id = $5
`, [
businessData.name,
businessData.website,
businessData.rating,
businessData.reviewCount,
disp.id
]);
console.log(` ✅ Updated (high confidence)`);
enriched++;
} else {
// Flag for manual review
needsReview.push(enrichment);
console.log(` ⚠️ Needs review (${confidence} confidence)`);
}
} catch (error) {
console.log(` ❌ Error: ${error}`);
failed++;
}
// Rate limiting - wait between requests
await page.waitForTimeout(3000 + Math.random() * 2000);
}
console.log('\n' + '='.repeat(80));
console.log(`\n📊 Summary:`);
console.log(` ✅ Enriched: ${enriched}`);
console.log(` ⚠️ Needs review: ${needsReview.length}`);
console.log(` ❌ Failed: ${failed}`);
if (needsReview.length > 0) {
console.log('\n📋 Dispensaries needing manual review:\n');
console.table(needsReview.map(d => ({
ID: d.id,
'AZDHS Name': d.azdhs_name.substring(0, 30),
'Google Name': d.dba_name?.substring(0, 30) || '-',
Website: d.website ? 'Yes' : 'No',
Phone: d.google_phone ? 'Yes' : 'No',
Confidence: d.confidence
})));
}
} finally {
await browser.close();
await pool.end();
}
}
// Add missing columns if they don't exist
async function setupDatabase() {
await pool.query(`
ALTER TABLE azdhs_list
ADD COLUMN IF NOT EXISTS dba_name VARCHAR(255),
ADD COLUMN IF NOT EXISTS google_rating DECIMAL(2,1),
ADD COLUMN IF NOT EXISTS google_review_count INTEGER
`);
}
async function main() {
try {
await setupDatabase();
await enrichDispensariesFromGoogle();
} catch (error) {
console.error('Fatal error:', error);
process.exit(1);
}
}
main();


@@ -0,0 +1,218 @@
import { firefox } from 'playwright';
import { pool } from './src/db/migrate.js';
import { getRandomProxy } from './src/utils/proxyManager.js';
const workerNum = process.argv[2] || `P${Date.now().toString().slice(-4)}`;
const dispensaryId = parseInt(process.argv[3] || '112', 10);
const batchSize = 10; // Process 10 products per batch
interface Product {
id: number;
slug: string;
name: string;
brand: string;
dutchie_url: string;
}
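// Highest product id handed out so far. Products whose page yields no price keep
// regular_price NULL, so without this cursor the same batch would be returned forever.
let lastSeenProductId = 0;
async function getProductsNeedingPrices(limit: number): Promise<Product[]> {
const result = await pool.query(`
SELECT id, slug, name, brand, dutchie_url
FROM products
WHERE dispensary_id = $1
AND regular_price IS NULL
AND dutchie_url IS NOT NULL
AND id > $3
ORDER BY id
LIMIT $2
`, [dispensaryId, limit, lastSeenProductId]);
if (result.rows.length > 0) {
lastSeenProductId = result.rows[result.rows.length - 1].id;
}
return result.rows;
}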
async function extractPriceFromPage(page: any, productUrl: string): Promise<{
regularPrice?: number;
salePrice?: number;
}> {
try {
console.log(`[${workerNum}] Loading: ${productUrl}`);
await page.goto(productUrl, {
waitUntil: 'domcontentloaded',
timeout: 30000
});
await page.waitForTimeout(2000);
// Extract price data from the page
const priceData = await page.evaluate(() => {
// Try JSON-LD structured data first
const scripts = Array.from(document.querySelectorAll('script[type="application/ld+json"]'));
for (const script of scripts) {
try {
const data = JSON.parse(script.textContent || '');
if (data['@type'] === 'Product' && data.offers) {
return {
regularPrice: parseFloat(data.offers.price) || undefined,
salePrice: undefined
};
}
} catch (e) {
// Continue to next script
}
}
// Fallback: extract from page text
const pageText = document.body.textContent || '';
// Look for price patterns like $30.00, $40.00
const priceMatches = pageText.match(/\$(\d+\.?\d*)/g);
if (priceMatches && priceMatches.length > 0) {
const prices = priceMatches.map(p => parseFloat(p.replace('$', '')));
// If we find multiple prices, treat the lower of the first two as the sale price and the higher as the regular price
if (prices.length >= 2) {
return {
salePrice: Math.min(prices[0], prices[1]),
regularPrice: Math.max(prices[0], prices[1])
};
} else if (prices.length === 1) {
return {
regularPrice: prices[0],
salePrice: undefined
};
}
}
return { regularPrice: undefined, salePrice: undefined };
});
return priceData;
} catch (error: any) {
console.log(`[${workerNum}] ⚠️ Error loading page: ${error.message}`);
return { regularPrice: undefined, salePrice: undefined };
}
}
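// Illustrative sketch (not part of the original script): the page-text fallback above,
// pulled out into a pure function so the heuristic can be tested without a browser.
// `pickPricesFromText` is a hypothetical name. It mirrors the min/max rule used in
// extractPriceFromPage and shares its assumption that the first two dollar amounts
// in the text belong to the product.
function pickPricesFromText(text: string): { regularPrice?: number; salePrice?: number } {
  const matches = text.match(/\$(\d+\.?\d*)/g) ?? [];
  const prices = matches.map(m => parseFloat(m.replace('$', '')));
  if (prices.length >= 2) {
    return {
      salePrice: Math.min(prices[0], prices[1]),
      regularPrice: Math.max(prices[0], prices[1])
    };
  }
  if (prices.length === 1) {
    return { regularPrice: prices[0] };
  }
  return {};
}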
async function updateProductPrice(
productId: number,
regularPrice?: number,
salePrice?: number
): Promise<void> {
await pool.query(`
UPDATE products
SET regular_price = $1,
sale_price = $2,
updated_at = CURRENT_TIMESTAMP
WHERE id = $3
`, [regularPrice || null, salePrice || null, productId]);
}
async function main() {
console.log(`\n${'='.repeat(70)}`);
console.log(`💰 PRICE ENRICHMENT WORKER - ${workerNum}`);
console.log(` Dispensary ID: ${dispensaryId}`);
console.log(` Batch Size: ${batchSize} products`);
console.log(`${'='.repeat(70)}\n`);
// Get dispensary info
const dispensaryResult = await pool.query(
"SELECT id, name, menu_url FROM dispensaries WHERE id = $1",
[dispensaryId]
);
if (dispensaryResult.rows.length === 0) {
console.error(`[${workerNum}] ❌ Dispensary ID ${dispensaryId} not found`);
process.exit(1);
}
console.log(`[${workerNum}] ✅ Dispensary: ${dispensaryResult.rows[0].name}\n`);
// Get proxy
const proxy = await getRandomProxy();
if (!proxy) {
console.log(`[${workerNum}] ❌ No proxy available`);
process.exit(1);
}
console.log(`[${workerNum}] 🔐 Using proxy: ${proxy.server}\n`);
// Launch browser
const browser = await firefox.launch({ headless: true });
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
// Use a Firefox user agent so it matches the Firefox engine launched above
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
});
const page = await context.newPage();
let totalProcessed = 0;
let totalWithPrices = 0;
let totalNoPrices = 0;
let batchNum = 0;
// Keep processing batches
while (true) {
const products = await getProductsNeedingPrices(batchSize);
if (products.length === 0) {
console.log(`[${workerNum}] No more products need price enrichment`);
break;
}
batchNum++;
console.log(`[${workerNum}] ${'─'.repeat(70)}`);
console.log(`[${workerNum}] 📦 BATCH #${batchNum}: Processing ${products.length} products`);
console.log(`[${workerNum}] ${'─'.repeat(70)}\n`);
for (let i = 0; i < products.length; i++) {
const product = products[i];
console.log(`[${workerNum}] [${i + 1}/${products.length}] ${product.brand} - ${product.name.substring(0, 40)}`);
const { regularPrice, salePrice } = await extractPriceFromPage(page, product.dutchie_url);
await updateProductPrice(product.id, regularPrice, salePrice);
totalProcessed++;
if (regularPrice || salePrice) {
totalWithPrices++;
const priceStr = salePrice
? `Sale: $${salePrice.toFixed(2)} (Reg: $${regularPrice?.toFixed(2) || 'N/A'})`
: `Price: $${regularPrice?.toFixed(2)}`;
console.log(`[${workerNum}] ✅ ${priceStr}`);
} else {
totalNoPrices++;
console.log(`[${workerNum}] ⚠️ No price found`);
}
// Small delay between products
await page.waitForTimeout(500);
}
console.log(`\n[${workerNum}] ✅ Batch #${batchNum} complete\n`);
// Delay between batches
await page.waitForTimeout(2000);
}
console.log(`\n[${workerNum}] ${'='.repeat(70)}`);
console.log(`[${workerNum}] ✅ PRICE ENRICHMENT COMPLETE`);
console.log(`[${workerNum}] Products processed: ${totalProcessed}`);
console.log(`[${workerNum}] Products with prices: ${totalWithPrices}`);
console.log(`[${workerNum}] Products without prices: ${totalNoPrices}`);
console.log(`[${workerNum}] ${'='.repeat(70)}\n`);
await browser.close();
await pool.end();
}
main().catch(console.error);


@@ -0,0 +1,178 @@
import { firefox } from 'playwright';
import { pool } from './src/db/migrate';
import { getRandomProxy } from './src/utils/proxyManager';
async function enrichSingleDispensary() {
const address = '1115 Circulo Mercado';
const city = 'Rio Rico';
const state = 'AZ';
const zip = '85648';
console.log(`🦊 Enriching: ${address}, ${city}, ${state} ${zip}\n`);
const proxy = await getRandomProxy();
if (!proxy) {
console.log('❌ No proxies available');
await pool.end();
return;
}
console.log(`🔌 Using proxy: ${proxy.server}\n`);
const browser = await firefox.launch({
headless: false,
firefoxUserPrefs: {
'geo.enabled': true,
'geo.provider.use_corelocation': true,
'geo.prompt.testing': true,
'geo.prompt.testing.allow': true,
}
});
const contextOptions: any = {
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
geolocation: { latitude: 33.4484, longitude: -112.0740 },
permissions: ['geolocation'],
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
};
const context = await browser.newContext(contextOptions);
const page = await context.newPage();
try {
// Search Google Maps
const searchQuery = `dispensary ${address}, ${city}, ${state} ${zip}`;
const encodedQuery = encodeURIComponent(searchQuery);
const url = `https://www.google.com/maps/search/${encodedQuery}`;
console.log(`📍 Searching Maps: ${searchQuery}`);
await page.goto(url, {
waitUntil: 'domcontentloaded',
timeout: 30000
});
// Wait for results
await page.waitForTimeout(5000);
// Extract business data
const businessData = await page.evaluate(() => {
const data: any = {};
// Try to find the place name
const nameSelectors = [
'h1[class*="fontHeadline"]',
'h1.DUwDvf',
'[data-item-id*="name"] h1'
];
for (const selector of nameSelectors) {
const el = document.querySelector(selector);
if (el?.textContent) {
data.name = el.textContent.trim();
break;
}
}
// Try to find website
const websiteSelectors = [
'a[data-item-id="authority"]',
'a[data-tooltip="Open website"]',
'a[aria-label*="Website"]'
];
for (const selector of websiteSelectors) {
const el = document.querySelector(selector) as HTMLAnchorElement;
if (el?.href && !el.href.includes('google.com')) {
data.website = el.href;
break;
}
}
// Try to find phone
const phoneSelectors = [
'button[data-item-id*="phone"]',
'button[aria-label*="Phone"]',
'[data-tooltip*="Copy phone number"]'
];
for (const selector of phoneSelectors) {
const el = document.querySelector(selector);
if (el?.textContent) {
const phoneMatch = el.textContent.match(/\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/);
if (phoneMatch) {
data.phone = phoneMatch[0];
break;
}
}
}
// Try to find rating
const ratingEl = document.querySelector('[role="img"][aria-label*="stars"]');
if (ratingEl) {
const label = ratingEl.getAttribute('aria-label');
const match = label?.match(/(\d+\.?\d*)\s*stars?/);
if (match) {
data.rating = parseFloat(match[1]);
}
}
// Try to find review count
const reviewEl = document.querySelector('[aria-label*="reviews"]');
if (reviewEl) {
const label = reviewEl.getAttribute('aria-label');
const match = label?.match(/([\d,]+)\s*reviews?/);
if (match) {
data.reviewCount = parseInt(match[1].replace(/,/g, ''), 10);
}
}
return data;
});
console.log(`\n✅ Found data:`, businessData);
// Update dutchie database
if (businessData.name) {
await pool.query(`
UPDATE azdhs_list
SET
dba_name = $1,
website = $2,
phone = $3,
google_rating = $4,
google_review_count = $5,
updated_at = CURRENT_TIMESTAMP
WHERE address = $6 AND city = $7
`, [
businessData.name,
businessData.website,
businessData.phone?.replace(/\D/g, ''), // strip everything except digits
businessData.rating,
businessData.reviewCount,
address,
city
]);
console.log(`\n✅ Updated database!`);
} else {
console.log(`\n❌ No business name found`);
}
// Keep browser open for 10 seconds so you can see the results
console.log(`\n⏳ Keeping browser open for 10 seconds...`);
await page.waitForTimeout(10000);
} catch (error) {
console.log(`❌ Error: ${error}`);
} finally {
await browser.close();
await pool.end();
}
}
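// Illustrative sketch (not part of the original script): the phone clean-up intended by
// the database update above (keep digits only), as a small standalone helper.
// `normalizePhone` is a hypothetical name introduced here for illustration only.
function normalizePhone(phone?: string): string | undefined {
  // \D matches any non-digit character; an undefined input is passed through unchanged
  return phone?.replace(/\D/g, '');
}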
enrichSingleDispensary();


@@ -0,0 +1,75 @@
import { chromium } from 'playwright';
import { bypassAgeGatePlaywright } from './src/utils/age-gate-playwright';
async function exploreCuraleafMenu() {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext({
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
viewport: { width: 1280, height: 720 }
});
const page = await context.newPage();
try {
console.log('Loading Curaleaf and bypassing age gate...\n');
await page.goto('https://curaleaf.com/stores/curaleaf-dispensary-48th-street');
await page.waitForTimeout(2000);
await bypassAgeGatePlaywright(page, 'Arizona');
await page.waitForTimeout(3000);
console.log('Current URL:', page.url());
console.log('\nLooking for menu navigation...\n');
// Look for "View Menu", "Shop Now", "Order Online" type links
const menuSelectors = [
'a:has-text("View Menu")',
'a:has-text("Shop Now")',
'a:has-text("Order Online")',
'a:has-text("Browse Menu")',
'button:has-text("View Menu")',
'button:has-text("Shop Now")'
];
for (const selector of menuSelectors) {
const count = await page.locator(selector).count();
if (count > 0) {
console.log(`Found: ${selector} (${count} elements)`);
const element = page.locator(selector).first();
const href = await element.getAttribute('href').catch(() => null);
const text = await element.textContent();
console.log(` Text: ${text}`);
console.log(` Link: ${href}`);
if (href) {
console.log(`\n✅ Found menu link! Clicking to see where it goes...`);
await element.click();
await page.waitForTimeout(5000);
console.log(` New URL: ${page.url()}\n`);
break;
}
}
}
// Check if the URL changed
const currentUrl = page.url();
if (currentUrl.includes('menu') || currentUrl.includes('shop')) {
console.log(`✅ Navigated to menu page: ${currentUrl}\n`);
// Now look for products
await page.waitForTimeout(3000);
const productCount = await page.locator('[data-testid^="product"], .product-card, [class*="ProductCard"]').count();
console.log(`Products found: ${productCount}`);
}
// Take final screenshot
await page.screenshot({ path: '/tmp/curaleaf-menu-exploration.png', fullPage: true });
console.log('\n📸 Screenshot saved: /tmp/curaleaf-menu-exploration.png');
} catch (error) {
console.error('Error:', error);
} finally {
await browser.close();
}
}
exploreCuraleafMenu();

@@ -0,0 +1,61 @@
import { chromium } from 'playwright';
import { bypassAgeGatePlaywright } from './src/utils/age-gate-playwright';
async function findDutchieMenu() {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext({
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
viewport: { width: 1280, height: 720 }
});
const page = await context.newPage();
try {
console.log('Loading Curaleaf page and bypassing age gate...\n');
await page.goto('https://curaleaf.com/stores/curaleaf-dispensary-48th-street');
await page.waitForTimeout(2000);
await bypassAgeGatePlaywright(page, 'Arizona');
await page.waitForTimeout(5000);
console.log('Looking for Dutchie menu...\n');
// Check for iframes
const frames = page.frames();
console.log(`Total frames on page: ${frames.length}`);
for (let i = 0; i < frames.length; i++) {
const frame = frames[i];
const url = frame.url();
console.log(`Frame ${i}: ${url}`);
if (url.includes('dutchie')) {
console.log(` ✅ This is the Dutchie menu!`);
// Try to find the actual menu URL
const menuUrl = url;
console.log(`\n📍 Dutchie Menu URL: ${menuUrl}\n`);
// We should scrape this URL directly instead of the Curaleaf page
console.log('💡 Strategy: Scrape the Dutchie iframe URL directly');
console.log(` Example: ${menuUrl.split('?')[0]}/shop/flower\n`);
}
}
// Also check for links to Dutchie
const dutchieLinks = await page.locator('a[href*="dutchie"]').all();
console.log(`\nDutchie links found: ${dutchieLinks.length}`);
for (const link of dutchieLinks.slice(0, 3)) {
const href = await link.getAttribute('href');
const text = await link.textContent();
console.log(` - ${text}: ${href}`);
}
} catch (error) {
console.error('Error:', error);
} finally {
await browser.close();
}
}
findDutchieMenu();

@@ -0,0 +1,90 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';
async function findPriceLocation() {
const proxy = await getRandomProxy();
if (!proxy) {
console.log('No proxy available');
process.exit(1);
}
const browser = await firefox.launch({ headless: true });
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
});
const page = await context.newPage();
const brandUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/brands/alien-labs';
console.log(`Loading: ${brandUrl}`);
await page.goto(brandUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForTimeout(5000);
// Find where the prices are in the DOM
const priceLocations = await page.evaluate(() => {
const findPriceElements = (root: Element) => {
const walker = document.createTreeWalker(
root,
NodeFilter.SHOW_ELEMENT,
null
);
const results: Array<{
tag: string;
classes: string;
text: string;
isProductCard: boolean;
parentInfo: string;
}> = [];
let node: Node | null;
while (node = walker.nextNode()) {
const element = node as Element;
const text = element.textContent?.trim() || '';
if (text.includes('$') && text.match(/\$\d+/)) {
const isProductCard = element.closest('a[href*="/product/"]') !== null;
const parent = element.parentElement;
results.push({
tag: element.tagName.toLowerCase(),
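// className is an SVGAnimatedString on SVG elements, so read the attribute instead
classes: element.getAttribute('class') ?? '',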
text: text.substring(0, 150),
isProductCard,
parentInfo: parent ? `${parent.tagName.toLowerCase()}.${parent.className}` : 'none'
});
}
}
return results.slice(0, 15);
};
return findPriceElements(document.body);
});
console.log('\n' + '='.repeat(80));
console.log('PRICE ELEMENT LOCATIONS:');
console.log('='.repeat(80));
priceLocations.forEach((loc, idx) => {
console.log(`\n${idx + 1}. <${loc.tag}> ${loc.classes ? `class="${loc.classes}"` : ''}`);
console.log(` In product card: ${loc.isProductCard ? 'YES' : 'NO'}`);
console.log(` Parent: ${loc.parentInfo}`);
console.log(` Text: ${loc.text}`);
});
console.log('\n' + '='.repeat(80));
await browser.close();
process.exit(0);
}
findPriceLocation().catch(console.error);

@@ -0,0 +1,82 @@
import { chromium } from 'playwright';
import { bypassAgeGatePlaywright } from './src/utils/age-gate-playwright';
async function findProducts() {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext({
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
viewport: { width: 1280, height: 720 }
});
const page = await context.newPage();
try {
console.log('Loading and bypassing age gate...\n');
await page.goto('https://curaleaf.com/stores/curaleaf-dispensary-48th-street');
await page.waitForTimeout(2000);
await bypassAgeGatePlaywright(page, 'Arizona');
await page.waitForTimeout(5000);
console.log('Scrolling page...\n');
await page.evaluate(async () => {
for (let i = 0; i < 3; i++) {
window.scrollBy(0, 1000);
await new Promise(r => setTimeout(r, 500));
}
});
await page.waitForTimeout(2000);
// Try various product selectors
const selectors = [
'[data-testid^="product"]',
'.product',
'[class*="Product"]',
'[class*="product"]',
'article',
'[role="article"]',
'.card',
'[class*="Card"]',
'[class*="item"]',
'[class*="Item"]'
];
console.log('Trying different product selectors:\n');
for (const selector of selectors) {
const count = await page.locator(selector).count();
if (count > 0) {
console.log(`${selector}: ${count} elements found`);
// Get first few elements
const elements = await page.locator(selector).all();
for (let i = 0; i < Math.min(3, elements.length); i++) {
const text = await elements[i].textContent();
const classes = await elements[i].getAttribute('class');
console.log(` ${i + 1}. Classes: ${classes}`);
console.log(` Text (first 100 chars): ${text?.trim().substring(0, 100)}`);
}
console.log('');
}
}
// Check for buttons/links that might load products
const menuButtons = await page.locator('button:has-text("menu"), button:has-text("shop"), a:has-text("View Menu")').count();
console.log(`\nMenu/Shop buttons: ${menuButtons}`);
if (menuButtons > 0) {
const buttons = await page.locator('button:has-text("menu"), button:has-text("shop"), a:has-text("View Menu")').all();
console.log('Menu buttons found:');
for (const btn of buttons.slice(0, 3)) {
const text = await btn.textContent();
console.log(` - ${text?.trim()}`);
}
}
} catch (error) {
console.error('Error:', error);
} finally {
await browser.close();
}
}
findProducts();

@@ -0,0 +1,58 @@
import { pool } from './src/db/migrate.js';
async function fixBrandNames() {
console.log('\n' + '='.repeat(60));
console.log('FIXING BRAND NAMES WITH DUPLICATED LETTERS');
console.log('='.repeat(60) + '\n');
// Get all brands with potential duplication (where first letter is duplicated)
const result = await pool.query(`
SELECT id, brand_slug, brand_name
FROM brand_scrape_jobs
WHERE dispensary_id = 112
AND LENGTH(brand_name) > 1
AND SUBSTRING(brand_name, 1, 1) = SUBSTRING(brand_name, 2, 1)
ORDER BY brand_name
`);
console.log(`Found ${result.rows.length} brands with potential duplication:\n`);
let fixed = 0;
let skipped = 0;
for (const row of result.rows) {
const originalName = row.brand_name;
// Remove the first letter
const fixedName = originalName.substring(1);
console.log(`${row.id}. "${originalName}" → "${fixedName}"`);
// Update the database
await pool.query(`
UPDATE brand_scrape_jobs
SET brand_name = $1, updated_at = NOW()
WHERE id = $2
`, [fixedName, row.id]);
// Also update products table if brand was already scraped
const updateResult = await pool.query(`
UPDATE products
SET brand = $1, updated_at = CURRENT_TIMESTAMP
WHERE dispensary_id = 112 AND brand = $2
`, [fixedName, originalName]);
if (updateResult.rowCount && updateResult.rowCount > 0) {
console.log(` ✓ Updated ${updateResult.rowCount} products`);
}
fixed++;
}
console.log('\n' + '='.repeat(60));
console.log(`✅ FIXED ${fixed} BRAND NAMES`);
console.log('='.repeat(60) + '\n');
await pool.end();
}
fixBrandNames().catch(console.error);

@@ -0,0 +1,52 @@
import { pool } from './src/db/migrate';
async function fixBrandsTable() {
console.log('🔧 Fixing brands table structure...\n');
try {
// Drop old tables
await pool.query('DROP TABLE IF EXISTS store_brands CASCADE');
await pool.query('DROP TABLE IF EXISTS product_brands CASCADE');
await pool.query('DROP TABLE IF EXISTS brands CASCADE');
console.log('✅ Dropped old tables');
// Create brands table with correct structure
await pool.query(`
CREATE TABLE brands (
id SERIAL PRIMARY KEY,
name VARCHAR(255) UNIQUE NOT NULL,
logo_url TEXT,
first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
`);
console.log('✅ Created brands table');
// Create store_brands junction table
await pool.query(`
CREATE TABLE store_brands (
id SERIAL PRIMARY KEY,
store_id INTEGER REFERENCES stores(id) ON DELETE CASCADE,
brand_id INTEGER REFERENCES brands(id) ON DELETE CASCADE,
first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
active BOOLEAN DEFAULT true,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(store_id, brand_id)
)
`);
console.log('✅ Created store_brands table');
console.log('\n✅ Brands tables fixed!');
} catch (error) {
console.error('❌ Error:', error);
} finally {
await pool.end();
}
}
fixBrandsTable();

@@ -0,0 +1,48 @@
import { pool } from './src/db/migrate';
import { dutchieTemplate } from './src/scrapers/templates/dutchie';
async function fixCategoryUrls() {
console.log('🔧 Fixing category URLs for Dutchie stores\n');
try {
// Get all categories with /shop/ in the URL
const result = await pool.query(`
SELECT id, store_id, name, slug, dutchie_url
FROM categories
WHERE dutchie_url LIKE '%/shop/%'
ORDER BY store_id, id
`);
console.log(`Found ${result.rows.length} categories to fix:\n`);
for (const category of result.rows) {
console.log(`Category: ${category.name}`);
console.log(` Old URL: ${category.dutchie_url}`);
// Extract base URL (everything before /shop/)
const baseUrl = category.dutchie_url.split('/shop/')[0];
// Use template to build correct URL
const newUrl = dutchieTemplate.buildCategoryUrl(baseUrl, category.name);
console.log(` New URL: ${newUrl}`);
// Update the category
await pool.query(`
UPDATE categories
SET dutchie_url = $1
WHERE id = $2
`, [newUrl, category.id]);
console.log(` ✅ Updated\n`);
}
console.log(`\n✅ All category URLs fixed!`);
} catch (error) {
console.error('❌ Error:', error);
} finally {
await pool.end();
}
}
fixCategoryUrls();

@@ -0,0 +1,72 @@
import { Pool } from 'pg';
// Docker database connection
const pool = new Pool({
connectionString: 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'
});
const updates = [
{ id: 18, url: 'https://dutchie.com/dispensary/curaleaf-dispensary-48th-street' },
{ id: 19, url: 'https://dutchie.com/dispensary/curaleaf-83rd-ave' },
{ id: 20, url: 'https://dutchie.com/dispensary/curaleaf-bell-road' },
{ id: 21, url: 'https://dutchie.com/dispensary/curaleaf-camelback' },
{ id: 22, url: 'https://dutchie.com/dispensary/curaleaf-central' },
{ id: 23, url: 'https://dutchie.com/dispensary/curaleaf-gilbert' },
{ id: 24, url: 'https://dutchie.com/dispensary/curaleaf-glendale-east' },
{ id: 25, url: 'https://dutchie.com/dispensary/curaleaf-glendale-east-kind-relief' },
{ id: 26, url: 'https://dutchie.com/dispensary/curaleaf-glendale' },
{ id: 27, url: 'https://dutchie.com/dispensary/curaleaf-dispensary-midtown' },
{ id: 28, url: 'https://dutchie.com/dispensary/curaleaf-dispensary-peoria' },
{ id: 29, url: 'https://dutchie.com/dispensary/curaleaf-phoenix' },
{ id: 30, url: 'https://dutchie.com/dispensary/curaleaf-queen-creek' },
{ id: 31, url: 'https://dutchie.com/dispensary/curaleaf-queen-creek-whoa' },
{ id: 32, url: 'https://dutchie.com/dispensary/curaleaf-dispensary-scottsdale' },
{ id: 33, url: 'https://dutchie.com/dispensary/curaleaf-dispensary-sedona' },
{ id: 34, url: 'https://dutchie.com/dispensary/curaleaf-tucson' },
{ id: 35, url: 'https://dutchie.com/dispensary/curaleaf-youngtown' },
];
const solFlowerStores = [
{ name: 'Sol Flower - Sun City', slug: 'sol-flower-sun-city', url: 'https://dutchie.com/dispensary/sol-flower-dispensary' },
{ name: 'Sol Flower - South Tucson', slug: 'sol-flower-south-tucson', url: 'https://dutchie.com/dispensary/sol-flower-dispensary-south-tucson' },
{ name: 'Sol Flower - North Tucson', slug: 'sol-flower-north-tucson', url: 'https://dutchie.com/dispensary/sol-flower-dispensary-north-tucson' },
{ name: 'Sol Flower - McClintock (Tempe)', slug: 'sol-flower-mcclintock', url: 'https://dutchie.com/dispensary/sol-flower-dispensary-mcclintock' },
{ name: 'Sol Flower - Deer Valley (Phoenix)', slug: 'sol-flower-deer-valley', url: 'https://dutchie.com/dispensary/sol-flower-dispensary-deer-valley' },
];
async function fixDatabase() {
console.log('🔧 Fixing Docker database...\n');
try {
// Update Curaleaf URLs
console.log('Updating Curaleaf URLs to Dutchie...');
for (const update of updates) {
await pool.query('UPDATE stores SET dutchie_url = $1 WHERE id = $2', [update.url, update.id]);
console.log(`✅ Updated store ID ${update.id}`);
}
// Add Sol Flower stores
console.log('\nAdding Sol Flower stores...');
for (const store of solFlowerStores) {
const result = await pool.query(
`INSERT INTO stores (name, slug, dutchie_url, active, scrape_enabled, logo_url)
VALUES ($1, $2, $3, true, true, $4)
ON CONFLICT (slug) DO UPDATE SET dutchie_url = $3
RETURNING id`,
[store.name, store.slug, store.url, 'https://dutchie.com/favicon.ico']
);
console.log(`✅ Added or updated ${store.name} (ID: ${result.rows[0].id})`);
}
console.log('\n✅ Database fixed! Showing all stores:');
const all = await pool.query('SELECT id, name, dutchie_url FROM stores ORDER BY id');
console.table(all.rows);
} catch (error) {
console.error('❌ Error:', error);
} finally {
await pool.end();
}
}
fixDatabase();

@@ -0,0 +1,38 @@
const { Pool } = require('pg');
const pool = new Pool({
connectionString: process.env.DATABASE_URL || 'postgresql://kelly:kelly@localhost:5432/hub'
});
(async () => {
try {
// Check for stuck jobs
const result = await pool.query(`
SELECT id, status, total_proxies, tested_proxies, started_at
FROM proxy_test_jobs
WHERE status IN ('pending', 'running')
ORDER BY created_at DESC
`);
console.log('Found stuck jobs:', result.rows);
// Mark them as cancelled
if (result.rows.length > 0) {
await pool.query(`
UPDATE proxy_test_jobs
SET status = 'cancelled',
completed_at = CURRENT_TIMESTAMP,
updated_at = CURRENT_TIMESTAMP
WHERE status IN ('pending', 'running')
`);
console.log('✅ Cleaned up', result.rows.length, 'stuck jobs');
} else {
console.log('No stuck jobs found');
}
process.exit(0);
} catch (error) {
console.error('❌ Error:', error);
process.exit(1);
}
})();

@@ -0,0 +1,33 @@
import { pool } from './src/db/migrate.js';
async function main() {
const result = await pool.query(`
SELECT
id,
name,
slug,
website,
menu_url,
scraper_template,
scraper_config,
address,
city,
state,
zip,
phone,
email
FROM dispensaries
WHERE slug = 'best-dispensary'
`);
if (result.rows.length === 0) {
console.log('BEST Dispensary not found');
} else {
console.log('BEST Dispensary Details:');
console.log(JSON.stringify(result.rows[0], null, 2));
}
await pool.end();
}
main().catch(console.error);

@@ -0,0 +1,23 @@
import { pool } from './src/db/migrate.js';
async function getDispensaryIds() {
const result = await pool.query(
`SELECT id, name, slug
FROM dispensaries
WHERE dutchie_url IS NOT NULL
ORDER BY id
LIMIT 30`
);
console.log('Dispensary IDs available for scraping:');
console.log('=====================================');
result.rows.forEach(row => {
console.log(`${row.id}: ${row.name} (${row.slug})`);
});
console.log(`\nShowing ${result.rows.length} dispensaries (query is limited to 30)`);
await pool.end();
}
getDispensaryIds();

@@ -0,0 +1,76 @@
import { readFileSync } from 'fs';
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function importProxies() {
try {
console.log('📥 Reading proxy list from file...\n');
const proxyFile = '/home/kelly/Downloads/proxyscrape_premium_http_proxies.txt';
const fileContent = readFileSync(proxyFile, 'utf-8');
const lines = fileContent.trim().split('\n');
console.log(`Found ${lines.length} proxies in file\n`);
let imported = 0;
let duplicates = 0;
let errors = 0;
for (const line of lines) {
const trimmed = line.trim();
if (!trimmed) continue;
const [host, portStr] = trimmed.split(':');
const port = parseInt(portStr, 10);
if (!host || !port) {
console.log(`❌ Invalid format: ${trimmed}`);
errors++;
continue;
}
try {
// Insert proxy without testing (set active = false initially)
const result = await pool.query(`
INSERT INTO proxies (host, port, protocol, active)
VALUES ($1, $2, 'http', false)
ON CONFLICT (host, port, protocol) DO NOTHING
RETURNING id
`, [host, port]);
if (result.rows.length > 0) {
imported++;
if (imported % 100 === 0) {
console.log(`📥 Imported ${imported} proxies...`);
}
} else {
duplicates++;
}
} catch (error: any) {
console.log(`❌ Error importing ${host}:${port}: ${error.message}`);
errors++;
}
}
console.log('\n✅ Import complete!');
console.log('─'.repeat(60));
console.log(`Imported: ${imported}`);
console.log(`Duplicates: ${duplicates}`);
console.log(`Errors: ${errors}`);
console.log('─'.repeat(60));
// Show final count
const countResult = await pool.query('SELECT COUNT(*) as total FROM proxies');
console.log(`\n📊 Total proxies in database: ${countResult.rows[0].total}\n`);
} catch (error: any) {
console.error('❌ Error:', error.message);
} finally {
await pool.end();
}
}
importProxies();

@@ -0,0 +1,47 @@
import { chromium } from 'playwright';
const GOOGLE_UA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
async function main() {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext({
userAgent: GOOGLE_UA
});
const page = await context.newPage();
try {
console.log('Loading menu page...');
await page.goto('https://best.treez.io/onlinemenu/?customerType=ADULT', {
waitUntil: 'networkidle',
timeout: 30000
});
await page.waitForTimeout(3000);
// Get first 3 menu items and inspect their HTML structure
const menuItems = await page.locator('.menu-item').all();
console.log(`\nFound ${menuItems.length} menu items. Inspecting first 3:\n`);
for (let i = 0; i < Math.min(3, menuItems.length); i++) {
const item = menuItems[i];
const html = await item.innerHTML();
const text = await item.textContent();
console.log(`\n========== ITEM ${i + 1} ==========`);
console.log('HTML:');
console.log(html);
console.log('\nText Content:');
console.log(text);
console.log('='.repeat(40));
}
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await browser.close();
}
}
main().catch(console.error);

@@ -0,0 +1,50 @@
#!/bin/bash
# Launch 25 parallel brand scrapers for Deeply Rooted (dispensary ID 112)
# 90 brands total:
# - Workers 1-15: 4 brands each (brands 0-59)
# - Workers 16-25: 3 brands each (brands 60-89)
DISPENSARY_ID=112
DB_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus"
echo "🚀 Launching 25 parallel brand scrapers..."
echo "============================================================"
# Workers 1-15: 4 brands each
for i in {1..15}; do
START=$(( (i-1) * 4 ))
END=$(( START + 3 ))
echo "Starting Worker $i: brands $START-$END"
DATABASE_URL="$DB_URL" npx tsx scrape-parallel-brands.ts $DISPENSARY_ID $START $END "W$i" 2>&1 | tee /tmp/scraper-w$i.log &
done
# Workers 16-25: 3 brands each
for i in {16..25}; do
START=$(( 60 + (i-16) * 3 ))
END=$(( START + 2 ))
echo "Starting Worker $i: brands $START-$END"
DATABASE_URL="$DB_URL" npx tsx scrape-parallel-brands.ts $DISPENSARY_ID $START $END "W$i" 2>&1 | tee /tmp/scraper-w$i.log &
done
echo "============================================================"
echo "✅ All 25 workers launched!"
echo ""
echo "Monitor progress with:"
echo " tail -f /tmp/scraper-w1.log (or w2, w3, etc.)"
echo " ps aux | grep scrape-parallel"
echo ""
echo "View all logs:"
echo " tail -f /tmp/scraper-w*.log"
echo ""
echo "Brand allocation:"
echo " Workers 1-15: 4 brands each (brands 0-59)"
echo " Workers 16-25: 3 brands each (brands 60-89)"
echo " Total: 90 brands across 25 workers"
echo "============================================================"
# Wait for all background jobs to finish
wait
echo ""
echo "✅ All 25 workers completed!"

@@ -0,0 +1,22 @@
import { pool } from './src/db/migrate.js';
async function main() {
const result = await pool.query(`
SELECT id, name, slug, menu_url, scraper_template
FROM dispensaries
ORDER BY name
`);
console.log('Available Dispensaries:');
result.rows.forEach((row, idx) => {
console.log(`${idx + 1}. ${row.name} (ID: ${row.id})`);
console.log(` Slug: ${row.slug}`);
console.log(` Menu URL: ${row.menu_url || 'N/A'}`);
console.log(` Template: ${row.scraper_template || 'N/A'}`);
console.log('');
});
await pool.end();
}
main().catch(console.error);

@@ -0,0 +1,27 @@
import { Pool } from 'pg';
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function listStores() {
try {
const result = await pool.query('SELECT id, name, slug, dutchie_url FROM stores ORDER BY id LIMIT 10');
console.log('\nStores in database:');
console.log('─'.repeat(80));
result.rows.forEach(store => {
console.log(`ID: ${store.id} | Slug: ${store.slug}`);
console.log(`Name: ${store.name}`);
console.log(`URL: ${store.dutchie_url}`);
console.log('─'.repeat(80));
});
console.log(`Showing ${result.rows.length} stores (query is limited to 10)\n`);
} catch (error: any) {
console.error('Error:', error.message);
} finally {
await pool.end();
}
}
listStores();

@@ -0,0 +1,83 @@
import { Pool } from 'pg';
async function migrateProxyLocations() {
const sailPool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
const dutchiePool = new Pool({
connectionString: 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'
});
try {
console.log('🔄 Migrating proxy location data from SAIL → DUTCHIE\n');
// Get all proxies with location data from SAIL
const result = await sailPool.query(`
SELECT host, port, city, state, country, country_code, active
FROM proxies
WHERE city IS NOT NULL
ORDER BY host, port
`);
console.log(`📋 Found ${result.rows.length} proxies with location data in SAIL\n`);
let updated = 0;
let notFound = 0;
for (const proxy of result.rows) {
// Update matching proxy in DUTCHIE database
const updateResult = await dutchiePool.query(`
UPDATE proxies
SET
city = $1,
state = $2,
country = $3,
country_code = $4,
active = $5
WHERE host = $6 AND port = $7
RETURNING id
`, [
proxy.city,
proxy.state,
proxy.country,
proxy.country_code,
proxy.active,
proxy.host,
proxy.port
]);
// rowCount is typed as number | null in pg, so guard against null before comparing
if ((updateResult.rowCount ?? 0) > 0) {
updated++;
if (updated % 100 === 0) {
console.log(` ✅ Updated ${updated}/${result.rows.length}...`);
}
} else {
notFound++;
console.log(` ⚠️ Proxy not found in DUTCHIE: ${proxy.host}:${proxy.port}`);
}
}
console.log(`\n📊 Results:`);
console.log(` ✅ Updated: ${updated}`);
console.log(` ⚠️ Not found: ${notFound}`);
// Verify final counts
const dutchieCheck = await dutchiePool.query(`
SELECT
COUNT(*) as total,
COUNT(*) FILTER (WHERE city IS NOT NULL) as with_location
FROM proxies
`);
console.log(`\n✅ DUTCHIE database now has:`);
console.log(` Total proxies: ${dutchieCheck.rows[0].total}`);
console.log(` With location data: ${dutchieCheck.rows[0].with_location}`);
} finally {
await sailPool.end();
await dutchiePool.end();
}
}
migrateProxyLocations().catch(console.error);

@@ -0,0 +1,204 @@
import { pool } from './src/db/migrate';
import * as fs from 'fs';
async function parseAZDHSCopiedData() {
console.log('📋 Parsing manually copied AZDHS data...\n');
const fileContent = fs.readFileSync('/home/kelly/Documents/azdhs dispos', 'utf-8');
const lines = fileContent.split('\n').map(l => l.trim()).filter(l => l.length > 0);
const dispensaries: any[] = [];
let i = 0;
while (i < lines.length) {
// Skip if we hit "Get Details" without processing (edge case)
if (lines[i] === 'Get Details') {
i++;
continue;
}
let name = lines[i];
let companyName = '';
let statusLine = '';
let address = '';
let linesConsumed = 0;
// Check if next line is "Get Details" (edge case: end of data)
if (i + 1 < lines.length && lines[i + 1] === 'Get Details') {
i += 2;
continue;
}
// Two possible patterns:
// Pattern 1 (5 lines): Name, Company, Status, Address, Get Details
// Pattern 2 (4 lines): Name, Status, Address, Get Details (company name = dispensary name)
// Check if line i+1 contains "Operating" (status line)
const nextLine = lines[i + 1];
if (nextLine && nextLine.includes('Operating')) {
// Pattern 2: 4-line format
companyName = name; // Company name same as dispensary name
statusLine = lines[i + 1];
address = lines[i + 2];
const getDetails = lines[i + 3];
if (getDetails !== 'Get Details') {
console.log(`⚠️ Skipping malformed 4-line record at line ${i}: ${name}`);
i++;
continue;
}
linesConsumed = 4;
} else {
// Pattern 1: 5-line format
companyName = lines[i + 1];
statusLine = lines[i + 2];
address = lines[i + 3];
const getDetails = lines[i + 4];
if (getDetails !== 'Get Details') {
console.log(`⚠️ Skipping malformed 5-line record at line ${i}: ${name}`);
i++;
continue;
}
linesConsumed = 5;
}
// Parse phone from status line
let phone = '';
const phoneMatch = statusLine.match(/(\(\d{3}\)\s*\d{3}-\d{4}|\d{3}-\d{3}-\d{4}|\d{10})/);
if (phoneMatch) {
phone = phoneMatch[1].replace(/\D/g, ''); // Remove all non-digits
}
// Parse address components
// Format: "123 Street Name, City, AZ 85001"
let street = '', city = '', state = 'AZ', zip = '';
const addressParts = address.split(',').map(p => p.trim());
if (addressParts.length >= 3) {
street = addressParts.slice(0, -2).join(', '); // Everything before city
city = addressParts[addressParts.length - 2]; // Second to last
// Last part should be "AZ 85001"
const stateZip = addressParts[addressParts.length - 1];
const stateZipMatch = stateZip.match(/([A-Z]{2})\s+(\d{5})/);
if (stateZipMatch) {
state = stateZipMatch[1];
zip = stateZipMatch[2];
}
} else if (addressParts.length === 2) {
street = addressParts[0];
const cityStateZip = addressParts[1];
// Try to extract "City AZ 85001" from the second part (no comma remains after the split)
const match = cityStateZip.match(/([^,]+),?\s+([A-Z]{2})\s+(\d{5})/);
if (match) {
city = match[1].trim();
state = match[2];
zip = match[3];
}
} else {
// Single part address - try best effort
street = address;
const zipMatch = address.match(/\b(\d{5})\b/);
if (zipMatch) zip = zipMatch[1];
const cityMatch = address.match(/,\s*([A-Za-z\s]+),\s*AZ/);
if (cityMatch) city = cityMatch[1].trim();
}
dispensaries.push({
name,
companyName,
phone,
street,
city,
state,
zip,
statusLine
});
// Move to next record
i += linesConsumed;
}
console.log(`✅ Parsed ${dispensaries.length} dispensaries\n`);
if (dispensaries.length > 0) {
console.log('📋 Sample of first 5:');
console.table(dispensaries.slice(0, 5).map(d => ({
name: d.name.substring(0, 30),
city: d.city,
phone: d.phone,
zip: d.zip
})));
}
// Save to database
console.log('\n💾 Saving to azdhs_list table...\n');
let savedCount = 0;
let updatedCount = 0;
let skippedCount = 0;
for (const disp of dispensaries) {
if (!disp.name || disp.name.length < 3) {
skippedCount++;
continue;
}
try {
// Check if exists by name + address + state (to handle multiple locations with same name)
const existing = await pool.query(
'SELECT id FROM azdhs_list WHERE LOWER(name) = LOWER($1) AND LOWER(address) = LOWER($2) AND state = $3',
[disp.name, disp.street, disp.state]
);
const slug = disp.name.toLowerCase().replace(/[^a-z0-9]+/g, '-');
const azdhsUrl = `https://azcarecheck.azdhs.gov/s/?name=${encodeURIComponent(disp.name)}`;
if (existing.rows.length > 0) {
await pool.query(`
UPDATE azdhs_list SET
company_name = COALESCE(NULLIF($1, ''), company_name),
address = COALESCE(NULLIF($2, ''), address),
city = COALESCE(NULLIF($3, ''), city),
zip = COALESCE(NULLIF($4, ''), zip),
phone = COALESCE(NULLIF($5, ''), phone),
status_line = COALESCE(NULLIF($6, ''), status_line),
updated_at = CURRENT_TIMESTAMP
WHERE id = $7
`, [disp.companyName, disp.street, disp.city, disp.zip, disp.phone, disp.statusLine, existing.rows[0].id]);
updatedCount++;
} else {
await pool.query(`
INSERT INTO azdhs_list (
name, company_name, slug, address, city, state, zip, phone, status_line, azdhs_url,
created_at, updated_at
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
`, [disp.name, disp.companyName, slug, disp.street, disp.city, disp.state, disp.zip, disp.phone, disp.statusLine, azdhsUrl]);
savedCount++;
}
} catch (error) {
console.error(`Error saving ${disp.name}: ${error}`);
skippedCount++;
}
}
console.log(`\n✅ Saved ${savedCount} new AZDHS dispensaries`);
console.log(`✅ Updated ${updatedCount} existing AZDHS dispensaries`);
if (skippedCount > 0) console.log(`⚠️ Skipped ${skippedCount} entries`);
// Show total in azdhs_list
const total = await pool.query(`SELECT COUNT(*) as total FROM azdhs_list`);
console.log(`\n🎯 Total in azdhs_list table: ${total.rows[0].total}`);
// Show total in stores (for comparison)
const storesTotal = await pool.query(`SELECT COUNT(*) as total FROM stores WHERE state = 'AZ'`);
console.log(`🎯 Total in stores table (AZ): ${storesTotal.rows[0].total}`);
await pool.end();
}
parseAZDHSCopiedData();

@@ -0,0 +1,181 @@
import { firefox } from 'playwright';
import { pool } from './src/db/migrate.js';
import { getRandomProxy } from './src/utils/proxyManager.js';
const dispensaryId = parseInt(process.argv[2] || '112', 10);
interface Brand {
slug: string;
name: string;
url: string;
}
async function scrapeBrandsList(menuUrl: string, context: any, page: any): Promise<Brand[]> {
try {
const brandsUrl = `${menuUrl}/brands`;
console.log(`📄 Loading brands page: ${brandsUrl}`);
await page.goto(brandsUrl, {
waitUntil: 'domcontentloaded',
timeout: 60000
});
console.log(`⏳ Waiting for brands to render...`);
await page.waitForSelector('a[href*="/brands/"]', { timeout: 45000 });
console.log(`✅ Brands appeared!`);
await page.waitForTimeout(3000);
// Scroll to load all brands
console.log(`📜 Scrolling to load all brands...`);
let previousHeight = 0;
let scrollAttempts = 0;
const maxScrolls = 10;
while (scrollAttempts < maxScrolls) {
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
await page.waitForTimeout(1500);
const currentHeight = await page.evaluate(() => document.body.scrollHeight);
if (currentHeight === previousHeight) break;
previousHeight = currentHeight;
scrollAttempts++;
}
// Extract all brand links
const brands = await page.evaluate(() => {
const brandLinks = Array.from(document.querySelectorAll('a[href*="/brands/"]'));
const extracted = brandLinks.map(link => {
const href = link.getAttribute('href') || '';
const slug = href.split('/brands/')[1]?.replace(/\/$/, '') || '';
// Get brand name from the ContentWrapper div to avoid placeholder letter duplication
const contentWrapper = link.querySelector('[class*="ContentWrapper"]');
const name = contentWrapper?.textContent?.trim() || link.textContent?.trim() || slug;
return {
slug,
name,
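// Resolve relative hrefs against the current origin so the URL is always absolute
url: href.startsWith('http') ? href : new URL(href, window.location.origin).href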
};
});
// Filter out duplicates and invalid entries
const seen = new Set();
const unique = extracted.filter(b => {
if (!b.slug || !b.name || seen.has(b.slug)) return false;
seen.add(b.slug);
return true;
});
return unique;
});
console.log(`✅ Found ${brands.length} total brands`);
return brands;
} catch (error: any) {
console.error(`❌ Error scraping brands list:`, error.message);
return [];
}
}
async function main() {
console.log(`\n${'='.repeat(60)}`);
console.log(`🏭 POPULATING JOB QUEUE`);
console.log(` Dispensary ID: ${dispensaryId}`);
console.log(`${'='.repeat(60)}\n`);
// Get dispensary info
const dispensaryResult = await pool.query(
"SELECT id, name, menu_url FROM dispensaries WHERE id = $1",
[dispensaryId]
);
if (dispensaryResult.rows.length === 0) {
console.error(`❌ Dispensary ID ${dispensaryId} not found`);
process.exit(1);
}
const menuUrl = dispensaryResult.rows[0].menu_url;
console.log(`✅ Dispensary: ${dispensaryResult.rows[0].name}`);
console.log(` Menu URL: ${menuUrl}\n`);
// Get proxy
const proxy = await getRandomProxy();
if (!proxy) {
console.log(`❌ No proxy available`);
process.exit(1);
}
console.log(`🔐 Using proxy: ${proxy.server}\n`);
// Launch browser
const browser = await firefox.launch({ headless: true });
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
proxy: {
server: proxy.server,
username: proxy.username,
password: proxy.password
}
});
const page = await context.newPage();
// Get all brands
const allBrands = await scrapeBrandsList(menuUrl, context, page);
if (allBrands.length === 0) {
console.log(`❌ No brands found`);
await browser.close();
process.exit(1);
}
console.log(`\n📋 Found ${allBrands.length} brands. Populating job queue...\n`);
// Insert jobs into database
let inserted = 0;
let skipped = 0;
for (const brand of allBrands) {
try {
// ON CONFLICT DO NOTHING reports rowCount 0 when the job already exists,
// so use the INSERT result itself to tell new jobs from skipped ones
const result = await pool.query(`
INSERT INTO brand_scrape_jobs (dispensary_id, brand_slug, brand_name, status)
VALUES ($1, $2, $3, 'pending')
ON CONFLICT (dispensary_id, brand_slug) DO NOTHING
`, [dispensaryId, brand.slug, brand.name]);
if ((result.rowCount ?? 0) > 0) {
inserted++;
if (inserted % 10 === 0) {
console.log(` Inserted ${inserted}/${allBrands.length} jobs...`);
}
} else {
skipped++;
}
} catch (error: any) {
console.error(`❌ Error inserting job for ${brand.name}:`, error.message);
}
}
console.log(`\n${'='.repeat(60)}`);
console.log(`✅ JOB QUEUE POPULATED`);
console.log(` Total brands: ${allBrands.length}`);
console.log(` Jobs inserted: ${inserted}`);
console.log(` Jobs skipped (already exist): ${skipped}`);
console.log(`${'='.repeat(60)}\n`);
await browser.close();
await pool.end();
}
main().catch(console.error);

View File

@@ -0,0 +1,155 @@
import { pool } from './src/db/migrate';
import { logger } from './src/services/logger';
interface GeoLocation {
status: string;
country: string;
countryCode: string;
region: string;
regionName: string;
city: string;
lat: number;
lon: number;
query: string;
}
/**
* Fetch geolocation data from ip-api.com (free, 45 req/min)
*/
async function fetchGeoLocation(ip: string): Promise<GeoLocation | null> {
try {
const response = await fetch(`http://ip-api.com/json/${ip}?fields=status,country,countryCode,region,regionName,city,lat,lon,query`);
const data = await response.json();
if (data.status === 'success') {
return data as GeoLocation;
}
logger.warn('geo', `Failed to lookup ${ip}: ${data.message || 'Unknown error'}`);
return null;
} catch (error) {
logger.error('geo', `Error fetching geolocation for ${ip}: ${error}`);
return null;
}
}
/**
* Fetch geolocation data in batches (up to 100 IPs at once)
*/
async function fetchGeoLocationBatch(ips: string[]): Promise<Map<string, GeoLocation>> {
try {
const response = await fetch('http://ip-api.com/batch?fields=status,country,countryCode,region,regionName,city,lat,lon,query', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify(ips),
});
const data = await response.json() as GeoLocation[];
const results = new Map<string, GeoLocation>();
for (const item of data) {
if (item.status === 'success') {
results.set(item.query, item);
}
}
return results;
} catch (error) {
logger.error('geo', `Error fetching batch geolocation: ${error}`);
return new Map();
}
}
/**
* Sleep for specified milliseconds
*/
function sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
async function populateProxyLocations() {
console.log('🌍 Populating proxy geolocation data...\n');
try {
// Get all proxies that don't have location data
const result = await pool.query(
'SELECT id, host FROM proxies WHERE city IS NULL OR country IS NULL ORDER BY id'
);
const proxies = result.rows;
console.log(`Found ${proxies.length} proxies without geolocation data\n`);
if (proxies.length === 0) {
console.log('✅ All proxies already have geolocation data!');
// The finally block below closes the pool; ending it here too would throw
return;
}
// Process in batches of 100 (API limit)
const batchSize = 100;
let processed = 0;
let updated = 0;
for (let i = 0; i < proxies.length; i += batchSize) {
const batch = proxies.slice(i, i + batchSize);
const ips = batch.map(p => p.host);
console.log(`📍 Processing batch ${Math.floor(i / batchSize) + 1}/${Math.ceil(proxies.length / batchSize)} (${ips.length} IPs)...`);
const geoData = await fetchGeoLocationBatch(ips);
// Update database for each successful lookup
for (const proxy of batch) {
const geo = geoData.get(proxy.host);
if (geo) {
await pool.query(
`UPDATE proxies
SET city = $1, state = $2, country = $3, country_code = $4, latitude = $5, longitude = $6, updated_at = NOW()
WHERE id = $7`,
[geo.city, geo.regionName, geo.country, geo.countryCode, geo.lat, geo.lon, proxy.id]
);
updated++;
console.log(`${proxy.host} -> ${geo.city}, ${geo.regionName}, ${geo.country}`);
} else {
console.log(`${proxy.host} -> Lookup failed`);
}
processed++;
}
// Rate limiting: ip-api.com's batch endpoint allows 15 requests/min, so wait 4.5 seconds between batches
if (i + batchSize < proxies.length) {
console.log('⏳ Waiting 4.5s to respect rate limits...\n');
await sleep(4500);
}
}
console.log(`\n✅ Completed!`);
console.log(` Processed: ${processed} proxies`);
console.log(` Updated: ${updated} proxies`);
console.log(` Failed: ${processed - updated} proxies\n`);
// Show location distribution
const locationStats = await pool.query(`
SELECT country, state, city, COUNT(*) as count
FROM proxies
WHERE country IS NOT NULL
GROUP BY country, state, city
ORDER BY count DESC
LIMIT 20
`);
console.log('📊 Top 20 proxy locations:');
console.table(locationStats.rows);
} catch (error) {
console.error('❌ Error:', error);
} finally {
await pool.end();
}
}
populateProxyLocations();

View File

@@ -0,0 +1,165 @@
import * as fs from 'fs';
async function previewImport() {
console.log('📋 Preview of AZDHS import data...\n');
const fileContent = fs.readFileSync('/home/kelly/Documents/azdhs dispos', 'utf-8');
const lines = fileContent.split('\n').map(l => l.trim()).filter(l => l.length > 0);
const dispensaries: any[] = [];
let i = 0;
while (i < lines.length) {
if (lines[i] === 'Get Details') {
i++;
continue;
}
if (i + 1 < lines.length && lines[i + 1] === 'Get Details') {
i += 2;
continue;
}
let name = lines[i];
let companyName = '';
let statusLine = '';
let address = '';
let linesConsumed = 0;
const nextLine = lines[i + 1] || '';
if (nextLine.includes('Operating')) {
companyName = name;
statusLine = lines[i + 1];
address = lines[i + 2];
const getDetails = lines[i + 3];
if (getDetails !== 'Get Details') {
i++;
continue;
}
linesConsumed = 4;
} else {
companyName = lines[i + 1];
statusLine = lines[i + 2];
address = lines[i + 3];
const getDetails = lines[i + 4];
if (getDetails !== 'Get Details') {
i++;
continue;
}
linesConsumed = 5;
}
// Parse phone from status line
let phone = '';
const phoneMatch = statusLine.match(/(\(\d{3}\)\s*\d{3}-\d{4}|\d{3}-\d{3}-\d{4}|\d{10})/);
if (phoneMatch) {
phone = phoneMatch[1].replace(/\D/g, '');
}
// Parse operating status and entity type
const statusParts = statusLine.split('·').map(p => p.trim());
const operatingStatus = statusParts[0] || '';
const entityType = statusParts[1] || '';
// Parse address components
let street = '', city = '', state = 'AZ', zip = '';
const addressParts = address.split(',').map(p => p.trim());
if (addressParts.length >= 3) {
street = addressParts.slice(0, -2).join(', ');
city = addressParts[addressParts.length - 2];
const stateZip = addressParts[addressParts.length - 1];
const stateZipMatch = stateZip.match(/([A-Z]{2})\s+(\d{5})/);
if (stateZipMatch) {
state = stateZipMatch[1];
zip = stateZipMatch[2];
}
} else if (addressParts.length === 2) {
street = addressParts[0];
const cityStateZip = addressParts[1];
const match = cityStateZip.match(/([^,]+),?\s+([A-Z]{2})\s+(\d{5})/);
if (match) {
city = match[1].trim();
state = match[2];
zip = match[3];
}
} else {
street = address;
const zipMatch = address.match(/\b(\d{5})\b/);
if (zipMatch) zip = zipMatch[1];
const cityMatch = address.match(/,\s*([A-Za-z\s]+),\s*AZ/);
if (cityMatch) city = cityMatch[1].trim();
}
dispensaries.push({
name,
companyName,
phone,
street,
city,
state,
zip,
operatingStatus,
entityType
});
i += linesConsumed;
}
console.log(`✅ Found ${dispensaries.length} dispensaries\n`);
// Show first 10
console.log('📋 First 10 dispensaries:\n');
console.table(dispensaries.slice(0, 10).map(d => ({
name: d.name.substring(0, 30),
company: d.companyName.substring(0, 30),
city: d.city,
phone: d.phone,
status: d.operatingStatus,
type: d.entityType.substring(0, 20)
})));
// Show last 10
console.log('\n📋 Last 10 dispensaries:\n');
console.table(dispensaries.slice(-10).map(d => ({
name: d.name.substring(0, 30),
company: d.companyName.substring(0, 30),
city: d.city,
phone: d.phone,
status: d.operatingStatus,
type: d.entityType.substring(0, 20)
})));
// Show counts by city
const cityCounts: any = {};
dispensaries.forEach(d => {
cityCounts[d.city] = (cityCounts[d.city] || 0) + 1;
});
console.log('\n📊 Dispensaries by city (top 10):\n');
const sortedCities = Object.entries(cityCounts)
.sort((a: any, b: any) => b[1] - a[1])
.slice(0, 10);
console.table(sortedCities.map(([city, count]) => ({ city, count })));
// Show entity types
const entityTypes: any = {};
dispensaries.forEach(d => {
entityTypes[d.entityType] = (entityTypes[d.entityType] || 0) + 1;
});
console.log('\n📊 Dispensaries by entity type:\n');
console.table(Object.entries(entityTypes).map(([type, count]) => ({ type, count })));
// Show operating status
const statuses: any = {};
dispensaries.forEach(d => {
statuses[d.operatingStatus] = (statuses[d.operatingStatus] || 0) + 1;
});
console.log('\n📊 Dispensaries by operating status:\n');
console.table(Object.entries(statuses).map(([status, count]) => ({ status, count })));
}
previewImport();

View File

@@ -0,0 +1,29 @@
import { pool } from './src/db/migrate.js';
async function resetJobs() {
const result = await pool.query(`
UPDATE brand_scrape_jobs
SET status = 'pending',
worker_id = NULL,
started_at = NULL,
completed_at = NULL,
products_found = 0,
products_saved = 0,
error_message = NULL,
retry_count = 0
WHERE dispensary_id = 112
`);
console.log(`✅ Reset ${result.rowCount} jobs to pending`);
const count = await pool.query(
'SELECT COUNT(*) FROM brand_scrape_jobs WHERE dispensary_id = 112 AND status = $1',
['pending']
);
console.log(`📊 Total pending jobs: ${count.rows[0].count}`);
await pool.end();
}
resetJobs();

View File

@@ -0,0 +1,43 @@
import { pool } from './src/db/migrate.js';
async function main() {
// Check job status
const statusResult = await pool.query(`
SELECT status, COUNT(*) as count
FROM brand_scrape_jobs
WHERE dispensary_id = 112
GROUP BY status
ORDER BY status
`);
console.log('\nJob Status Distribution:');
statusResult.rows.forEach(row => {
console.log(` ${row.status}: ${row.count}`);
});
// Reset one completed job to pending
const resetResult = await pool.query(`
UPDATE brand_scrape_jobs
SET status = 'pending',
worker_id = NULL,
started_at = NULL,
completed_at = NULL,
products_found = NULL,
products_saved = NULL,
updated_at = NOW()
WHERE dispensary_id = 112
AND status = 'completed'
AND brand_slug = 'select'
RETURNING id, brand_name, status
`);
if (resetResult.rows.length > 0) {
console.log(`\nReset job: ${resetResult.rows[0].brand_name} (ID: ${resetResult.rows[0].id})`);
} else {
console.log('\nNo job reset (Select brand not found in completed jobs)');
}
await pool.end();
}
main().catch(console.error);

View File

@@ -0,0 +1,45 @@
const { Pool } = require('pg');
const pool = new Pool({
connectionString: process.env.DATABASE_URL || 'postgresql://kelly:kelly@localhost:5432/hub'
});
(async () => {
try {
console.log('🔄 Running location migration...');
await pool.query(`
ALTER TABLE proxies
ADD COLUMN IF NOT EXISTS city VARCHAR(100),
ADD COLUMN IF NOT EXISTS state VARCHAR(100),
ADD COLUMN IF NOT EXISTS country VARCHAR(100),
ADD COLUMN IF NOT EXISTS country_code VARCHAR(2),
ADD COLUMN IF NOT EXISTS location_updated_at TIMESTAMP
`);
console.log('✅ Added columns to proxies table');
await pool.query(`
CREATE INDEX IF NOT EXISTS idx_proxies_location ON proxies(country_code, state, city)
`);
console.log('✅ Created location index');
await pool.query(`
ALTER TABLE failed_proxies
ADD COLUMN IF NOT EXISTS city VARCHAR(100),
ADD COLUMN IF NOT EXISTS state VARCHAR(100),
ADD COLUMN IF NOT EXISTS country VARCHAR(100),
ADD COLUMN IF NOT EXISTS country_code VARCHAR(2),
ADD COLUMN IF NOT EXISTS location_updated_at TIMESTAMP
`);
console.log('✅ Added columns to failed_proxies table');
console.log('✅ Migration complete!');
process.exit(0);
} catch (error) {
console.error('❌ Migration failed:', error);
process.exit(1);
}
})();

View File

@@ -0,0 +1,28 @@
const { Pool } = require('pg');
const fs = require('fs');
const path = require('path');
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function runMigration() {
const client = await pool.connect();
try {
const migrationPath = '/home/kelly/dutchie-menus/backend/migrations/010_store_logo.sql';
const sql = fs.readFileSync(migrationPath, 'utf8');
console.log('Running migration: 010_store_logo.sql');
await client.query(sql);
console.log('✅ Migration completed successfully');
} catch (error) {
console.error('❌ Migration failed:', error);
} finally {
client.release();
await pool.end();
}
}
runMigration();

View File

@@ -0,0 +1,18 @@
import { pool } from './src/db/migrate.js';
import fs from 'fs';
async function runMigration() {
try {
const sql = fs.readFileSync('/home/kelly/dutchie-menus/backend/migrations/add-stock-columns.sql', 'utf8');
console.log('Running migration: add-stock-columns.sql');
await pool.query(sql);
console.log('✅ Migration complete - stock_quantity and stock_status columns added');
await pool.end();
} catch (error: any) {
console.error('❌ Migration failed:', error.message);
await pool.end();
process.exit(1);
}
}
runMigration();

View File

@@ -0,0 +1,18 @@
import { pool } from './src/db/migrate.js';
import fs from 'fs';
async function runMigration() {
try {
const sql = fs.readFileSync('/home/kelly/dutchie-menus/backend/migrations/create-scrape-jobs.sql', 'utf8');
console.log('Running migration: create-scrape-jobs.sql');
await pool.query(sql);
console.log('✅ Migration complete - brand_scrape_jobs table created');
await pool.end();
} catch (error: any) {
console.error('❌ Migration failed:', error.message);
await pool.end();
process.exit(1);
}
}
runMigration();

File diff suppressed because it is too large

View File

@@ -0,0 +1,147 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { Pool } from 'pg';
puppeteer.use(StealthPlugin());
const pool = new Pool({
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});
async function main() {
let browser;
try {
console.log('STEP 2: Getting random proxy from pool...');
const proxyResult = await pool.query(`
SELECT host, port, protocol FROM proxies
ORDER BY RANDOM() LIMIT 1
`);
const proxy = proxyResult.rows[0];
console.log(`✅ Selected proxy: ${proxy.host}:${proxy.port}\n`);
console.log('STEP 3: Launching browser with proxy + anti-fingerprint...');
const proxyUrl = `${proxy.protocol}://${proxy.host}:${proxy.port}`;
browser = await puppeteer.launch({
headless: true,
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
`--proxy-server=${proxyUrl}`,
'--disable-blink-features=AutomationControlled'
]
});
const page = await browser.newPage();
// Set Googlebot user-agent
await page.setUserAgent('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
console.log('✅ Set UA to Googlebot\n');
// Anti-fingerprint: spoof timezone, geolocation, remove webdriver
await page.evaluateOnNewDocument(() => {
// Timezone (Arizona)
Object.defineProperty(Intl.DateTimeFormat.prototype, 'resolvedOptions', {
value: function() { return { timeZone: 'America/Phoenix' }; }
});
// Geolocation (Phoenix)
Object.defineProperty(navigator, 'geolocation', {
get: () => ({
getCurrentPosition: (success: any) => {
setTimeout(() => success({
coords: { latitude: 33.4484, longitude: -112.0740, accuracy: 100 }
}), 100);
}
})
});
// Remove webdriver
Object.defineProperty(navigator, 'webdriver', { get: () => false });
});
console.log('✅ Fingerprint spoofed (timezone=Arizona, geo=Phoenix, webdriver=hidden)\n');
console.log('STEP 4: Navigating to Curaleaf Phoenix Airport brands page...');
const url = 'https://curaleaf.com/stores/curaleaf-dispensary-phoenix-airport/brands';
console.log(`URL: ${url}\n`);
await page.goto(url, { waitUntil: 'networkidle2', timeout: 60000 });
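// Fixed delay via setTimeout; page.waitForTimeout was removed in newer Puppeteer releases
await new Promise(resolve => setTimeout(resolve, 5000));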
console.log('STEP 5: Scraping brand data from page...');
// Get page info for debugging
const pageInfo = await page.evaluate(() => ({
title: document.title,
url: window.location.href,
bodyLength: document.body.innerHTML.length
}));
console.log(`Page title: "${pageInfo.title}"`);
console.log(`Current URL: ${pageInfo.url}`);
console.log(`Body HTML length: ${pageInfo.bodyLength} chars\n`);
// Scrape brands
const brands = await page.evaluate(() => {
// Try multiple selectors
const selectors = [
'[data-testid*="brand"]',
'[class*="Brand"]',
'[class*="brand"]',
'a[href*="/brand/"]',
'.brand-card',
'.brand-item'
];
const found = new Set<string>();
selectors.forEach(selector => {
document.querySelectorAll(selector).forEach(el => {
const text = el.textContent?.trim();
if (text && text.length > 0 && text.length < 50) {
found.add(text);
}
});
});
return Array.from(found);
});
console.log(`✅ Found ${brands.length} brands:\n`);
brands.forEach((b, i) => console.log(` ${i + 1}. ${b}`));
if (brands.length === 0) {
console.log('\n⚠ No brands found. Possible reasons:');
console.log(' - IP/proxy is blocked');
console.log(' - Page requires different selectors');
console.log(' - Brands load asynchronously');
return;
}
console.log('\n\nSTEP 6: Saving brands to database...');
let saved = 0;
for (const brand of brands) {
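try {
const result = await pool.query(`
INSERT INTO products (store_id, name, brand, dutchie_url, in_stock)
VALUES (1, $1, $2, $3, true)
ON CONFLICT (store_id, name, brand) DO NOTHING
`, [`${brand} Product`, brand, url]);
// Count only rows actually inserted; conflicts report rowCount 0
saved += result.rowCount ?? 0;
} catch (e: any) {
console.error(`❌ Error saving brand ${brand}:`, e.message);
}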
}
console.log(`✅ Saved ${saved} brands to database\n`);
} catch (error: any) {
console.error('❌ ERROR:', error.message);
} finally {
if (browser) await browser.close();
await pool.end();
}
}
main();

View File

@@ -0,0 +1,212 @@
import { chromium } from 'playwright-extra';
import stealth from 'puppeteer-extra-plugin-stealth';
import { pool } from './src/db/migrate';
chromium.use(stealth());
async function scrapeAZDHSAPI() {
console.log('🏛️ Scraping AZDHS via API interception...\n');
const browser = await chromium.launch({
headless: false,
});
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
});
const page = await context.newPage();
// Capture ALL API responses
const allResponses: any[] = [];
page.on('response', async (response) => {
const url = response.url();
const contentType = response.headers()['content-type'] || '';
// Only capture JSON responses from azcarecheck domain
if (url.includes('azcarecheck.azdhs.gov') && contentType.includes('json')) {
try {
const json = await response.json();
console.log(`📡 Captured JSON from: ${url.substring(0, 80)}...`);
allResponses.push({
url,
data: json,
status: response.status()
});
} catch (e) {
// Not valid JSON or couldn't parse
}
}
});
try {
console.log('📄 Loading AZDHS page...');
await page.goto('https://azcarecheck.azdhs.gov/s/?facilityId=001t000000L0TApAAN', {
waitUntil: 'networkidle',
timeout: 120000
});
console.log('⏳ Waiting 60 seconds to capture all API calls...\n');
await page.waitForTimeout(60000);
console.log(`\n📊 Captured ${allResponses.length} JSON API responses\n`);
// Analyze responses to find dispensary data
let dispensaryData: any[] = [];
for (const resp of allResponses) {
const data = resp.data;
// Look for arrays that might contain dispensary data
const checkForDispensaries = (obj: any, path = ''): any[] => {
if (Array.isArray(obj) && obj.length > 50) {
// Check if array items look like dispensaries
const sample = obj[0];
if (sample && typeof sample === 'object') {
const keys = Object.keys(sample);
if (keys.some(k => k.toLowerCase().includes('name') ||
k.toLowerCase().includes('address') ||
k.toLowerCase().includes('facility'))) {
console.log(` ✅ Found potential dispensary array at ${path} with ${obj.length} items`);
console.log(` Sample keys: ${keys.slice(0, 10).join(', ')}`);
return obj;
}
}
}
if (typeof obj === 'object' && obj !== null) {
for (const [key, value] of Object.entries(obj)) {
const result = checkForDispensaries(value, `${path}.${key}`);
if (result.length > 0) return result;
}
}
return [];
};
const found = checkForDispensaries(data);
if (found.length > 0) {
dispensaryData = found;
console.log(`\n🎯 Found dispensary data! ${found.length} entries`);
console.log(` URL: ${resp.url}\n`);
// Show sample of first entry
console.log('📋 Sample entry:');
console.log(JSON.stringify(found[0], null, 2).substring(0, 500));
break;
}
}
if (dispensaryData.length === 0) {
console.log('❌ Could not find dispensary data in API responses\n');
console.log('🔍 All captured URLs:');
allResponses.forEach((r, i) => {
console.log(` ${i + 1}. ${r.url}`);
});
// Save raw responses for manual inspection
console.log('\n💾 Saving raw API responses to /tmp/azdhs-api-responses.json for inspection...');
const fs = require('fs');
fs.writeFileSync('/tmp/azdhs-api-responses.json', JSON.stringify(allResponses, null, 2));
// The finally block below closes the browser and the pool
return;
}
// Save to database
console.log('\n💾 Saving AZDHS dispensaries to database...\n');
let savedCount = 0;
let updatedCount = 0;
let skippedCount = 0;
for (const item of dispensaryData) {
try {
// Extract fields - need to inspect the actual structure
// Common Salesforce field patterns: Name, Name__c, FacilityName, etc.
const name = item.Name || item.name || item.FacilityName || item.facility_name ||
item.Name__c || item.dispensaryName || item.BusinessName;
const address = item.Address || item.address || item.Street || item.street ||
item.Address__c || item.StreetAddress || item.street_address;
const city = item.City || item.city || item.City__c;
const state = item.State || item.state || item.State__c || 'AZ';
const zip = item.Zip || item.zip || item.ZipCode || item.zip_code || item.PostalCode || item.Zip__c;
const phone = item.Phone || item.phone || item.PhoneNumber || item.phone_number || item.Phone__c;
const email = item.Email || item.email || item.Email__c;
const lat = item.Latitude || item.latitude || item.lat || item.Latitude__c;
const lng = item.Longitude || item.longitude || item.lng || item.lon || item.Longitude__c;
if (!name || name.length < 3) {
skippedCount++;
continue;
}
// Check if exists
const existing = await pool.query(
'SELECT id FROM stores WHERE LOWER(name) = LOWER($1) AND state = $2 AND data_source = $3',
[name, state, 'azdhs']
);
const slug = name.toLowerCase().replace(/[^a-z0-9]+/g, '-');
const dutchieUrl = `https://azcarecheck.azdhs.gov/s/?name=${encodeURIComponent(name)}`;
if (existing.rows.length > 0) {
await pool.query(`
UPDATE stores SET
address = COALESCE($1, address),
city = COALESCE($2, city),
zip = COALESCE($3, zip),
phone = COALESCE($4, phone),
email = COALESCE($5, email),
latitude = COALESCE($6, latitude),
longitude = COALESCE($7, longitude),
updated_at = CURRENT_TIMESTAMP
WHERE id = $8
`, [address, city, zip, phone, email, lat, lng, existing.rows[0].id]);
updatedCount++;
} else {
await pool.query(`
INSERT INTO stores (
name, slug, dutchie_url, address, city, state, zip, phone, email,
latitude, longitude, data_source, active, created_at, updated_at
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, 'azdhs', true, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
`, [name, slug, dutchieUrl, address, city, state, zip, phone, email, lat, lng]);
savedCount++;
}
} catch (error) {
console.error(`Error saving: ${error}`);
skippedCount++;
}
}
console.log(`\n✅ Saved ${savedCount} new AZDHS dispensaries`);
console.log(`✅ Updated ${updatedCount} existing AZDHS dispensaries`);
if (skippedCount > 0) console.log(`⚠️ Skipped ${skippedCount} entries`);
// Show totals by source
const totals = await pool.query(`
SELECT data_source, COUNT(*) as count
FROM stores
WHERE state = 'AZ'
GROUP BY data_source
ORDER BY data_source
`);
console.log('\n📊 Arizona dispensaries by source:');
console.table(totals.rows);
} catch (error) {
console.error(`❌ Error: ${error}`);
throw error;
} finally {
await browser.close();
await pool.end();
}
}
scrapeAZDHSAPI();

View File

@@ -0,0 +1,193 @@
import { chromium } from 'playwright-extra';
import stealth from 'puppeteer-extra-plugin-stealth';
import { pool } from './src/db/migrate';
chromium.use(stealth());
async function scrapeAZDHSAuto() {
console.log('🏛️ Scraping AZDHS - Automatic Mode\n');
const browser = await chromium.launch({
headless: false, // Visible so you can see it working
});
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
});
const page = await context.newPage();
try {
console.log('📄 Loading AZDHS page...');
await page.goto('https://azcarecheck.azdhs.gov/s/?facilityId=001t000000L0TApAAN', {
waitUntil: 'domcontentloaded',
timeout: 60000
});
console.log('⏳ Waiting 30 seconds for page to fully load and for you to scroll...\n');
await page.waitForTimeout(30000);
console.log('📦 Extracting all dispensaries from the page...\n');
// Extract all dispensaries
const dispensaries = await page.evaluate(() => {
const results: any[] = [];
// Look for all possible dispensary container elements
const containers = document.querySelectorAll(
'article, [class*="facility"], [class*="dispensary"], [class*="location"], ' +
'.slds-card, lightning-card, [data-id], [data-facility-id]'
);
containers.forEach((card) => {
const disp: any = {};
// Get all text from the card
const fullText = card.textContent?.trim() || '';
disp.rawText = fullText.substring(0, 500);
// Try various selectors for name
const nameSelectors = ['h3', 'h2', 'h4', '[class*="title"]', '[class*="name"]', 'strong', 'b'];
for (const selector of nameSelectors) {
const el = card.querySelector(selector);
if (el && el.textContent && el.textContent.trim().length > 3) {
disp.name = el.textContent.trim();
break;
}
}
// Extract phone
const phoneLink = card.querySelector('a[href^="tel:"]');
if (phoneLink) {
disp.phone = phoneLink.getAttribute('href')?.replace('tel:', '').replace(/\D/g, '');
} else {
// Look for phone pattern in text
const phoneMatch = fullText.match(/(\d{3}[-.]?\d{3}[-.]?\d{4})/);
if (phoneMatch) disp.phone = phoneMatch[1].replace(/\D/g, '');
}
// Extract email
const emailLink = card.querySelector('a[href^="mailto:"]');
if (emailLink) {
disp.email = emailLink.getAttribute('href')?.replace('mailto:', '');
}
// Extract address - look for street address pattern
const addressMatch = fullText.match(/(\d+\s+[A-Za-z0-9\s]+(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr|Lane|Ln|Way|Circle|Court|Parkway)\.?(?:\s+(?:Suite|Ste|Unit|#)\s*[\w-]+)?)/i);
if (addressMatch) {
disp.address = addressMatch[1].trim();
}
// Extract city
const cityMatch = fullText.match(/([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*),\s*AZ/);
if (cityMatch) {
disp.city = cityMatch[1];
}
// Extract ZIP
const zipMatch = fullText.match(/\b(\d{5})(?:-\d{4})?\b/);
if (zipMatch) {
disp.zip = zipMatch[1];
}
// Only add if we found at least a name
if (disp.name && disp.name.length > 3) {
results.push(disp);
}
});
return results;
});
console.log(`✅ Found ${dispensaries.length} dispensary entries!\n`);
if (dispensaries.length > 0) {
console.log('📋 Sample of first 5:');
console.table(dispensaries.slice(0, 5).map(d => ({
name: d.name?.substring(0, 40),
phone: d.phone,
city: d.city,
})));
}
// Save to database
console.log('\n💾 Saving to database with data_source="azdhs"...\n');
let savedCount = 0;
let updatedCount = 0;
let skippedCount = 0;
for (const disp of dispensaries) {
if (!disp.name) {
skippedCount++;
continue;
}
try {
// Check if exists by name
const existing = await pool.query(
'SELECT id FROM stores WHERE LOWER(name) = LOWER($1) AND state = $2 AND data_source = $3',
[disp.name, 'AZ', 'azdhs']
);
const slug = disp.name.toLowerCase().replace(/[^a-z0-9]+/g, '-');
const dutchieUrl = `https://azcarecheck.azdhs.gov/s/?name=${encodeURIComponent(disp.name)}`;
if (existing.rows.length > 0) {
await pool.query(`
UPDATE stores SET
address = COALESCE($1, address),
city = COALESCE($2, city),
zip = COALESCE($3, zip),
phone = COALESCE($4, phone),
email = COALESCE($5, email),
updated_at = CURRENT_TIMESTAMP
WHERE id = $6
`, [disp.address, disp.city, disp.zip, disp.phone, disp.email, existing.rows[0].id]);
updatedCount++;
} else {
await pool.query(`
INSERT INTO stores (
name, slug, dutchie_url, address, city, state, zip, phone, email,
data_source, active, created_at, updated_at
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, 'azdhs', true, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
`, [disp.name, slug, dutchieUrl, disp.address, disp.city, 'AZ', disp.zip, disp.phone, disp.email]);
savedCount++;
}
} catch (error) {
console.error(`Error saving ${disp.name}: ${error}`);
skippedCount++;
}
}
console.log(`\n✅ Saved ${savedCount} new AZDHS dispensaries`);
console.log(`✅ Updated ${updatedCount} existing AZDHS dispensaries`);
if (skippedCount > 0) console.log(`⚠️ Skipped ${skippedCount} entries`);
// Show totals by source
const totals = await pool.query(`
SELECT data_source, COUNT(*) as count
FROM stores
WHERE state = 'AZ'
GROUP BY data_source
ORDER BY data_source
`);
console.log('\n📊 Arizona dispensaries by source:');
console.table(totals.rows);
console.log('\n✅ AZDHS scraping complete!');
} catch (error) {
console.error(`❌ Error: ${error}`);
throw error;
} finally {
console.log('\n👉 Browser will close in 5 seconds...');
await page.waitForTimeout(5000);
await browser.close();
await pool.end();
}
}
scrapeAZDHSAuto();

View File

@@ -0,0 +1,108 @@
import { chromium } from 'playwright-extra';
import stealth from 'puppeteer-extra-plugin-stealth';
import { pool } from './src/db/migrate';

chromium.use(stealth());

async function scrapeAZDHSBetter() {
  console.log('🏛️ Scraping AZDHS official map (improved approach)...\n');

  const browser = await chromium.launch({
    headless: false,
  });
  const context = await browser.newContext({
    viewport: { width: 1920, height: 1080 },
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  });
  const page = await context.newPage();

  // Capture JSON responses from requests whose URL looks facility-related
  const apiData: any[] = [];
  page.on('response', async (response) => {
    const url = response.url();
    if (url.includes('dispensar') || url.includes('facility') || url.includes('location')) {
      try {
        const json = await response.json();
        console.log(`📡 Captured API response from: ${url.substring(0, 100)}...`);
        apiData.push({ url, data: json });
      } catch {
        // Response body is not JSON (or is no longer available); ignore it
      }
    }
  });

  try {
    console.log('📄 Loading AZDHS page (waiting up to 60s for JavaScript)...');
    await page.goto('https://azcarecheck.azdhs.gov/s/?facilityId=001t000000L0TApAAN', {
      waitUntil: 'domcontentloaded',
      timeout: 60000,
    });

    // Wait longer for JavaScript to execute
    console.log('⏳ Waiting 20 seconds for Salesforce to fully load...');
    await page.waitForTimeout(20000);

    // Try to find and click "View All" or expand the map
    console.log('🔍 Looking for buttons to expand results...');
    const viewAllButton = page.locator('button:has-text("View All"), button:has-text("Show All"), a:has-text("View All")').first();
    if (await viewAllButton.isVisible().catch(() => false)) {
      console.log(' ✅ Found View All button, clicking...');
      await viewAllButton.click();
      await page.waitForTimeout(5000);
    }

    // Try extracting data directly from page
    console.log('\n📦 Extracting dispensary data from page...');
    const dispensaries = await page.evaluate(() => {
      const results: any[] = [];
      // Look for various data patterns
      const elements = document.querySelectorAll('[data-facility], [data-location], article, .facility, .location, .dispensary');
      elements.forEach((el) => {
        const text = el.textContent || '';
        // Only consider elements whose text is a plausible size for one listing
        if (text.length > 20 && text.length < 500) {
          // Look for a run of 2-6 capitalized words that could be a name
          const nameMatch = text.match(/([A-Z][a-z]+(?:\s+[A-Z][a-z]+){1,5})/);
          if (nameMatch) {
            results.push({
              name: nameMatch[0], // Candidate name only; the pattern is loose
              rawText: text.substring(0, 200),
              element: el.className,
            });
          }
        }
      });
      return results;
    });

    console.log(`\n📊 Found ${dispensaries.length} potential dispensary elements`);
    console.log(`📊 Captured ${apiData.length} API responses`);
    if (apiData.length > 0) {
      console.log('\n🎯 Analyzing API data...');
      console.log(JSON.stringify(apiData[0], null, 2).substring(0, 1000));
    }
    if (dispensaries.length > 0) {
      console.log('\n📋 Sample dispensary elements:');
      console.log(dispensaries.slice(0, 3));
    }
  } catch (error) {
    console.error(`❌ Error: ${error}`);
    throw error;
  } finally {
    await browser.close();
    await pool.end();
  }
}

// The error is already logged inside the function; set a failing exit code
// instead of leaving the rejected promise unhandled.
scrapeAZDHSBetter().catch(() => {
  process.exitCode = 1;
});
