feat: Add stale process monitor, users route, landing page, archive old scripts

- Add backend stale process monitoring API (/api/stale-processes)
- Add users management route
- Add frontend landing page and stale process monitor UI on /scraper-tools
- Move old development scripts to backend/archive/
- Update frontend build with new features

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
180
backend/archive/SESSION_SUMMARY.md
Normal file
@@ -0,0 +1,180 @@
# Session Summary - Age Gate Bypass Implementation

## Problem
Brands weren't populating when clicking "Scrape Store" for Curaleaf dispensaries. Root cause: category discovery was finding 0 categories because age verification gates were blocking access to the Dutchie menu.

## What Was Accomplished

### 1. Created Universal Age Gate Bypass System
**File:** `/home/kelly/dutchie-menus/backend/src/utils/age-gate.ts`

Three main functions:
- `setAgeGateCookies(page, url, state)` - Sets age gate bypass cookies BEFORE navigation
- `bypassAgeGate(page, state)` - Attempts to bypass age gate AFTER page load using multiple methods
- `detectStateFromUrl(url)` - Auto-detects state from URL patterns

**Key Features:**
- Multiple bypass strategies: custom dropdowns, standard selects, buttons, state cards
- Enhanced React event dispatching (mousedown, mouseup, click, change, input)
- Proper failure detection - checks final URL to verify success
- Cookie-based approach as primary method
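
The summary does not show how `detectStateFromUrl(url)` matches URL patterns. As a rough illustration only, one way such a helper could work is to look for a two-letter state code in the store slug. The function name `detectStateFromUrlSketch`, the lookup table, and the tokenizing rule below are all assumptions, not the contents of `age-gate.ts`.

```typescript
// Hypothetical lookup table; the real age-gate.ts may cover other states or use other patterns.
const STATE_BY_CODE = new Map<string, string>([
  ['az', 'Arizona'],
  ['fl', 'Florida'],
  ['nj', 'New Jersey'],
]);

// Returns the first state whose two-letter code appears as a standalone token in the URL,
// or null when no known code is present.
function detectStateFromUrlSketch(url: string): string | null {
  // Split on every non-alphanumeric character so "curaleaf-az-48th-street-med" yields "az".
  const tokens = url.toLowerCase().split(/[^a-z0-9]+/);
  for (const token of tokens) {
    const state = STATE_BY_CODE.get(token);
    if (state !== undefined) return state;
  }
  return null;
}
```

With this sketch, a Curaleaf Arizona store URL resolves to `Arizona`, while a URL without a state token returns `null`, which the caller would need to handle before setting cookies.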

### 2. Updated Category Discovery
**File:** `/home/kelly/dutchie-menus/backend/src/services/category-discovery.ts`

Changes at lines 120-129:
```typescript
// Set age gate bypass cookies BEFORE navigation
const state = detectStateFromUrl(baseUrl);
await setAgeGateCookies(page, baseUrl, state);

await page.goto(baseUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });

// If age gate still appears, try to bypass it
await bypassAgeGate(page, state);
```

### 3. Updated Scraper
**File:** `/home/kelly/dutchie-menus/backend/src/services/scraper.ts`

Changes at lines 379-392:
- Imports setAgeGateCookies (line 10)
- Sets cookies before navigation (line 381)
- Attempts bypass if age gate still appears (line 392)

### 4. Test Scripts Created

- **`test-improved-age-gate.ts`** - Tests cookie-based bypass approach
- **`capture-age-gate-cookies.ts`** - Opens visible browser to manually capture real cookies (requires X11)
- **`debug-age-gate-detailed.ts`** - Visible browser debugging
- **`debug-after-state-select.ts`** - Checks state after selecting Arizona

## Current Status

### ✅ Working
1. Age gate detection properly identifies gates
2. Cookie setting function works (cookies are set correctly)
3. Multiple bypass methods attempt to click through gates
4. Failure detection now accurately reports when bypass fails
5. Category discovery successfully created 10 categories for Curaleaf store (ID 18)

### ❌ Not Working
**Curaleaf's Specific Age Gate:**
- URL: `https://curaleaf.com/age-gate?returnurl=...`
- Uses React-based custom dropdown (shadcn/radix UI)
- Ignores cookies we set
- Doesn't respond to automated clicks
- Doesn't trigger navigation after state selection

**Why It Fails:**
- Curaleaf's React implementation likely checks for:
  - Real user interaction patterns
  - Additional signals beyond cookies (session storage, local storage)
  - Specific event sequences that automation doesn't replicate
- Automated clicks don't trigger the React state changes needed

## Test Results

```bash
# Latest test - Cookie approach
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx test-improved-age-gate.ts
```

**Result:** ❌ FAILED
- Cookies set successfully
- Page still redirects to `/age-gate`
- Click automation finds elements but navigation doesn't occur
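
The failure signal in this test is the final URL: the page ends up back on `/age-gate`. A minimal sketch of that kind of check, assuming the only signal is the path of the final URL, could look like the following. The helper name `isStillOnAgeGate` is hypothetical and is not taken from `age-gate.ts`.

```typescript
// Hypothetical helper: reports a failed bypass when the final URL's path contains "/age-gate".
// The query string and fragment are ignored so a "returnurl" parameter cannot cause a false match.
function isStillOnAgeGate(finalUrl: string): boolean {
  const [path = ''] = finalUrl.split(/[?#]/);
  return path.toLowerCase().includes('/age-gate');
}
```

A caller would pass the page's URL after navigation and any bypass attempt, and treat a `true` result as a failed bypass.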

## Database State

**Categories created:** 10 categories for store_id 18 (Curaleaf - 48th Street)
- Categories are in the database and ready
- Dutchie menu detection works
- Category URLs are properly formatted

**Brands:** Not tested yet (can't scrape without bypassing age gate)

## Files Modified

1. `/home/kelly/dutchie-menus/backend/src/utils/age-gate.ts` - Created
2. `/home/kelly/dutchie-menus/backend/src/services/category-discovery.ts` - Updated
3. `/home/kelly/dutchie-menus/backend/src/services/scraper.ts` - Updated
4. `/home/kelly/dutchie-menus/backend/curaleaf-cookies.json` - Test cookies (don't work)

## Next Steps / Options

### Option 1: Test with Different Store
Find a Dutchie dispensary with a simpler age gate that responds to automation.

### Option 2: Get Real Cookies
Run `capture-age-gate-cookies.ts` on a machine with display/X11:
```bash
# On machine with display
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx capture-age-gate-cookies.ts
# Manually complete age gate in browser
# Press ENTER to capture real cookies
# Copy cookies to production server
```
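
Captured cookies can expire between capture and reuse on another machine, so it may help to filter them before setting them on a page. The sketch below assumes the captured file holds cookie objects shaped like Puppeteer's, where `expires` is seconds since the UNIX epoch and `-1` marks a session cookie. The `StoredCookie` type and `usableCookies` helper are hypothetical and not part of the scripts listed above.

```typescript
// Assumed shape of one captured cookie; only the fields this sketch needs.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  // Seconds since the UNIX epoch; -1 (or missing) is treated as a session cookie.
  expires?: number;
}

// Keeps session cookies and cookies whose expiry is still in the future at nowMs.
function usableCookies(cookies: StoredCookie[], nowMs: number): StoredCookie[] {
  const nowSec = nowMs / 1000;
  return cookies.filter(
    (cookie) => cookie.expires === undefined || cookie.expires < 0 || cookie.expires > nowSec,
  );
}
```

Passing `Date.now()` as `nowMs` gives the current view; taking the time as a parameter keeps the helper easy to test.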

### Option 3: Switch to Playwright
Playwright may handle React apps better than Puppeteer. Consider migrating the age gate bypass to use Playwright.

### Option 4: Use Proxy with Session Persistence
Use a rotating proxy service that maintains session state between requests.

### Option 5: Focus on Other Stores First
Skip Curaleaf for now and implement scraping for dispensaries without complex age gates, then circle back.

## How to Continue

1. **Check if categories exist:**
   ```sql
   SELECT * FROM categories WHERE store_id = 18;
   ```

2. **Test scraping (will fail at age gate):**
   ```bash
   # Via UI: Click "Scrape Store" button for Curaleaf
   # OR via test script
   ```

3. **Try different store:**
   - Find another Dutchie dispensary
   - Add it to the stores table
   - Run category discovery
   - Test scraping

## Important Notes

- Expect every cannabis dispensary site to have an age gate
- The bypass infrastructure is in place and should work for simpler gates, though that has not been tested yet
- Curaleaf specifically appears to use React patterns that resist automation
- Categories ARE created successfully despite the age gate (using fallback detection)
- The scraper should work once the age gate can be bypassed

## Key Insight

Category discovery succeeded because it checks the page source to detect Dutchie, then uses predefined categories. Product scraping requires actually viewing the product listings, which can't happen while stuck at the age gate.
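
To make the fallback concrete, here is a minimal sketch of "detect Dutchie in the page source, then use predefined categories". Everything in it is an assumption for illustration: the detection rule (any mention of "dutchie" in the HTML), the three category slugs, the `/products/<slug>` URL pattern, and the names `CategorySeed`, `looksLikeDutchie`, and `fallbackCategories`. The real logic lives in `category-discovery.ts` and is not shown in this summary.

```typescript
// Hypothetical shape of a category row to insert.
interface CategorySeed {
  name: string;
  slug: string;
  url: string;
}

// Assumed slugs; the actual predefined list (10 categories) is not shown in this summary.
const PREDEFINED_SLUGS = ['flower', 'vape-pens', 'concentrates'];

// Assumed detection rule: the page source mentions "dutchie" anywhere.
function looksLikeDutchie(html: string): boolean {
  return /dutchie/i.test(html);
}

// Returns predefined categories for a Dutchie page, or an empty list otherwise.
function fallbackCategories(html: string, baseUrl: string): CategorySeed[] {
  if (!looksLikeDutchie(html)) return [];
  const root = baseUrl.replace(/\/+$/, ''); // drop trailing slashes before appending paths
  return PREDEFINED_SLUGS.map((slug) => ({
    // "vape-pens" becomes "Vape Pens"
    name: slug
      .split('-')
      .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
      .join(' '),
    slug,
    url: `${root}/products/${slug}`,
  }));
}
```

Because this path never needs to open a product listing, it can succeed even when the age gate is still in the way, which matches the behaviour described above.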

## Commands to Resume

```bash
# Start backend (if not running)
cd /home/kelly/dutchie-menus/backend
PORT=3012 DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npm run dev

# Check categories
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx -e "
import { pool } from './src/db/migrate';
const result = await pool.query('SELECT * FROM categories WHERE store_id = 18');
console.log(result.rows);
process.exit(0);
"

# Test age gate bypass
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx test-improved-age-gate.ts
```

## What to Tell Claude When Resuming

"Continue working on the age gate bypass for Curaleaf. We have categories created but can't scrape products because the age gate blocks us. The cookie approach didn't work. Options: try a different store, get real cookies, or switch to Playwright."
71
backend/archive/add-curaleaf-az-stores.js
Normal file
@@ -0,0 +1,71 @@
|
||||
const { Pool } = require('pg');
|
||||
|
||||
const pool = new Pool({
|
||||
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
|
||||
});
|
||||
|
||||
const azStores = [
|
||||
{ slug: 'curaleaf-az-48th-street-med', name: 'Curaleaf - 48th Street (Medical)', city: 'Phoenix' },
|
||||
{ slug: 'curaleaf-az-83rd-dispensary-med', name: 'Curaleaf - 83rd Ave (Medical)', city: 'Phoenix' },
|
||||
{ slug: 'curaleaf-az-bell-med', name: 'Curaleaf - Bell (Medical)', city: 'Phoenix' },
|
||||
{ slug: 'curaleaf-az-camelback-med', name: 'Curaleaf - Camelback (Medical)', city: 'Phoenix' },
|
||||
{ slug: 'curaleaf-az-central-med', name: 'Curaleaf - Central (Medical)', city: 'Phoenix' },
|
||||
{ slug: 'curaleaf-az-gilbert-med', name: 'Curaleaf - Gilbert (Medical)', city: 'Gilbert' },
|
||||
{ slug: 'curaleaf-az-glendale-east', name: 'Curaleaf - Glendale East', city: 'Glendale' },
|
||||
{ slug: 'curaleaf-az-glendale-east-the-kind-relief-med', name: 'Curaleaf - Glendale East Kind Relief (Medical)', city: 'Glendale' },
|
||||
{ slug: 'curaleaf-az-glendale-med', name: 'Curaleaf - Glendale (Medical)', city: 'Glendale' },
|
||||
{ slug: 'curaleaf-az-midtown-med', name: 'Curaleaf - Midtown (Medical)', city: 'Phoenix' },
|
||||
{ slug: 'curaleaf-az-peoria-med', name: 'Curaleaf - Peoria (Medical)', city: 'Peoria' },
|
||||
{ slug: 'curaleaf-az-phoenix-med', name: 'Curaleaf - Phoenix Airport (Medical)', city: 'Phoenix' },
|
||||
{ slug: 'curaleaf-az-queen-creek', name: 'Curaleaf - Queen Creek', city: 'Queen Creek' },
|
||||
{ slug: 'curaleaf-az-queen-creek-whoa-qc-inc-med', name: 'Curaleaf - Queen Creek WHOA (Medical)', city: 'Queen Creek' },
|
||||
{ slug: 'curaleaf-az-scottsdale-natural-remedy-patient-center-med', name: 'Curaleaf - Scottsdale Natural Remedy (Medical)', city: 'Scottsdale' },
|
||||
{ slug: 'curaleaf-az-sedona-med', name: 'Curaleaf - Sedona (Medical)', city: 'Sedona' },
|
||||
{ slug: 'curaleaf-az-tucson-med', name: 'Curaleaf - Tucson (Medical)', city: 'Tucson' },
|
||||
{ slug: 'curaleaf-az-youngtown-med', name: 'Curaleaf - Youngtown (Medical)', city: 'Youngtown' }
|
||||
];
|
||||
|
||||
async function addStores() {
|
||||
const client = await pool.connect();
|
||||
try {
|
||||
for (const store of azStores) {
|
||||
const dutchieUrl = `https://curaleaf.com/stores/${store.slug}`;
|
||||
|
||||
// Skip sandbox stores
|
||||
if (store.slug.includes('sandbox')) continue;
|
||||
|
||||
const result = await client.query(`
|
||||
INSERT INTO stores (
|
||||
name,
|
||||
slug,
|
||||
dutchie_url,
|
||||
active,
|
||||
scrape_enabled,
|
||||
logo_url
|
||||
)
|
||||
VALUES ($1, $2, $3, $4, $5, $6)
|
||||
ON CONFLICT (slug) DO UPDATE
|
||||
SET name = $1, dutchie_url = $3
|
||||
RETURNING id, name
|
||||
`, [
|
||||
store.name,
|
||||
store.slug,
|
||||
dutchieUrl,
|
||||
true,
|
||||
true,
|
||||
'https://curaleaf.com/favicon.ico' // Using favicon as logo for now
|
||||
]);
|
||||
|
||||
console.log(`✅ Added: ${result.rows[0].name} (ID: ${result.rows[0].id})`);
|
||||
}
|
||||
|
||||
console.log(`\n🎉 Successfully added ${azStores.length} Curaleaf Arizona stores!`);
|
||||
} catch (error) {
|
||||
console.error('❌ Error adding stores:', error);
|
||||
} finally {
|
||||
client.release();
|
||||
pool.end();
|
||||
}
|
||||
}
|
||||
|
||||
addStores();
|
||||
71
backend/archive/add-curaleaf-docker.js
Normal file
@@ -0,0 +1,71 @@
|
||||
const { Pool } = require('pg');
|
||||
|
||||
const pool = new Pool({
|
||||
connectionString: 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'
|
||||
});
|
||||
|
||||
const azStores = [
|
||||
{ slug: 'curaleaf-az-48th-street-med', name: 'Curaleaf - 48th Street (Medical)', city: 'Phoenix' },
|
||||
{ slug: 'curaleaf-az-83rd-dispensary-med', name: 'Curaleaf - 83rd Ave (Medical)', city: 'Phoenix' },
|
||||
{ slug: 'curaleaf-az-bell-med', name: 'Curaleaf - Bell (Medical)', city: 'Phoenix' },
|
||||
{ slug: 'curaleaf-az-camelback-med', name: 'Curaleaf - Camelback (Medical)', city: 'Phoenix' },
|
||||
{ slug: 'curaleaf-az-central-med', name: 'Curaleaf - Central (Medical)', city: 'Phoenix' },
|
||||
{ slug: 'curaleaf-az-gilbert-med', name: 'Curaleaf - Gilbert (Medical)', city: 'Gilbert' },
|
||||
{ slug: 'curaleaf-az-glendale-east', name: 'Curaleaf - Glendale East', city: 'Glendale' },
|
||||
{ slug: 'curaleaf-az-glendale-east-the-kind-relief-med', name: 'Curaleaf - Glendale East Kind Relief (Medical)', city: 'Glendale' },
|
||||
{ slug: 'curaleaf-az-glendale-med', name: 'Curaleaf - Glendale (Medical)', city: 'Glendale' },
|
||||
{ slug: 'curaleaf-az-midtown-med', name: 'Curaleaf - Midtown (Medical)', city: 'Phoenix' },
|
||||
{ slug: 'curaleaf-az-peoria-med', name: 'Curaleaf - Peoria (Medical)', city: 'Peoria' },
|
||||
{ slug: 'curaleaf-az-phoenix-med', name: 'Curaleaf - Phoenix Airport (Medical)', city: 'Phoenix' },
|
||||
{ slug: 'curaleaf-az-queen-creek', name: 'Curaleaf - Queen Creek', city: 'Queen Creek' },
|
||||
{ slug: 'curaleaf-az-queen-creek-whoa-qc-inc-med', name: 'Curaleaf - Queen Creek WHOA (Medical)', city: 'Queen Creek' },
|
||||
{ slug: 'curaleaf-az-scottsdale-natural-remedy-patient-center-med', name: 'Curaleaf - Scottsdale Natural Remedy (Medical)', city: 'Scottsdale' },
|
||||
{ slug: 'curaleaf-az-sedona-med', name: 'Curaleaf - Sedona (Medical)', city: 'Sedona' },
|
||||
{ slug: 'curaleaf-az-tucson-med', name: 'Curaleaf - Tucson (Medical)', city: 'Tucson' },
|
||||
{ slug: 'curaleaf-az-youngtown-med', name: 'Curaleaf - Youngtown (Medical)', city: 'Youngtown' }
|
||||
];
|
||||
|
||||
async function addStores() {
|
||||
const client = await pool.connect();
|
||||
try {
|
||||
for (const store of azStores) {
|
||||
const dutchieUrl = `https://curaleaf.com/stores/${store.slug}`;
|
||||
|
||||
// Skip sandbox stores
|
||||
if (store.slug.includes('sandbox')) continue;
|
||||
|
||||
const result = await client.query(`
|
||||
INSERT INTO stores (
|
||||
name,
|
||||
slug,
|
||||
dutchie_url,
|
||||
active,
|
||||
scrape_enabled,
|
||||
logo_url
|
||||
)
|
||||
VALUES ($1, $2, $3, $4, $5, $6)
|
||||
ON CONFLICT (slug) DO UPDATE
|
||||
SET name = $1, dutchie_url = $3, logo_url = $6
|
||||
RETURNING id, name
|
||||
`, [
|
||||
store.name,
|
||||
store.slug,
|
||||
dutchieUrl,
|
||||
true,
|
||||
true,
|
||||
'https://curaleaf.com/favicon.ico' // Using favicon as logo for now
|
||||
]);
|
||||
|
||||
console.log(`✅ Added: ${result.rows[0].name} (ID: ${result.rows[0].id})`);
|
||||
}
|
||||
|
||||
console.log(`\n🎉 Successfully added ${azStores.length} Curaleaf Arizona stores!`);
|
||||
} catch (error) {
|
||||
console.error('❌ Error adding stores:', error);
|
||||
} finally {
|
||||
client.release();
|
||||
pool.end();
|
||||
}
|
||||
}
|
||||
|
||||
addStores();
|
||||
22
backend/archive/add-discount-columns.ts
Normal file
@@ -0,0 +1,22 @@
import { pool } from './src/db/migrate.js';

async function main() {
  console.log('Adding discount columns to products table...');

  try {
    await pool.query(`
      ALTER TABLE products
      ADD COLUMN IF NOT EXISTS discount_type VARCHAR(50),
      ADD COLUMN IF NOT EXISTS discount_value VARCHAR(100);
    `);

    console.log('✅ Successfully added discount_type and discount_value columns');
  } catch (error: any) {
    console.error('❌ Error adding columns:', error.message);
    throw error;
  } finally {
    await pool.end();
  }
}

main().catch(console.error);
82
backend/archive/add-geo-fields.ts
Normal file
@@ -0,0 +1,82 @@
|
||||
import { pool } from './src/db/migrate';
|
||||
|
||||
async function addGeoFields() {
|
||||
console.log('🗺️ Adding geo-location fields...\n');
|
||||
|
||||
try {
|
||||
await pool.query(`
|
||||
ALTER TABLE stores
|
||||
ADD COLUMN IF NOT EXISTS latitude DECIMAL(10, 8),
|
||||
ADD COLUMN IF NOT EXISTS longitude DECIMAL(11, 8),
|
||||
ADD COLUMN IF NOT EXISTS region VARCHAR(100),
|
||||
ADD COLUMN IF NOT EXISTS market_area VARCHAR(255),
|
||||
ADD COLUMN IF NOT EXISTS timezone VARCHAR(50)
|
||||
`);
|
||||
|
||||
console.log('✅ Added geo fields to stores table');
|
||||
|
||||
// Create indexes for geo queries
|
||||
await pool.query(`
|
||||
CREATE INDEX IF NOT EXISTS idx_stores_location ON stores(latitude, longitude) WHERE latitude IS NOT NULL;
|
||||
CREATE INDEX IF NOT EXISTS idx_stores_city_state ON stores(city, state);
|
||||
CREATE INDEX IF NOT EXISTS idx_stores_region ON stores(region);
|
||||
CREATE INDEX IF NOT EXISTS idx_stores_market ON stores(market_area);
|
||||
`);
|
||||
|
||||
console.log('✅ Created geo indexes');
|
||||
|
||||
// Create location-based views
|
||||
await pool.query(`
|
||||
CREATE OR REPLACE VIEW stores_by_region AS
|
||||
SELECT
|
||||
region,
|
||||
state,
|
||||
COUNT(*) as store_count,
|
||||
COUNT(DISTINCT city) as cities,
|
||||
array_agg(DISTINCT name ORDER BY name) as store_names
|
||||
FROM stores
|
||||
WHERE active = true
|
||||
GROUP BY region, state
|
||||
ORDER BY store_count DESC
|
||||
`);
|
||||
|
||||
await pool.query(`
|
||||
CREATE OR REPLACE VIEW market_coverage AS
|
||||
SELECT
|
||||
city,
|
||||
state,
|
||||
zip,
|
||||
COUNT(*) as dispensaries,
|
||||
array_agg(name ORDER BY name) as store_names,
|
||||
COUNT(DISTINCT id) as unique_stores
|
||||
FROM stores
|
||||
WHERE active = true
|
||||
GROUP BY city, state, zip
|
||||
ORDER BY dispensaries DESC
|
||||
`);
|
||||
|
||||
console.log('✅ Created location views');
|
||||
|
||||
console.log('\n✅ Geo-location setup complete!');
|
||||
console.log('\n📊 Available location views:');
|
||||
console.log(' - stores_by_region: Stores grouped by region/state');
|
||||
console.log(' - market_coverage: Dispensary density by city');
|
||||
|
||||
console.log('\n💡 Your database now supports:');
|
||||
console.log(' ✅ Lead Generation (contact info + locations)');
|
||||
console.log(' ✅ Market Research (pricing + inventory data)');
|
||||
console.log(' ✅ Investment Planning (market coverage + trends)');
|
||||
console.log(' ✅ Retail Partner Discovery (store directory)');
|
||||
console.log(' ✅ Geo-targeted Campaigns (lat/long + regions)');
|
||||
console.log(' ✅ Trend Analysis (price history + timestamps)');
|
||||
console.log(' ✅ Directory/App Creation (full store catalog)');
|
||||
console.log(' ✅ Delivery Optimization (locations + addresses)');
|
||||
|
||||
} catch (error) {
|
||||
console.error('❌ Error:', error);
|
||||
} finally {
|
||||
await pool.end();
|
||||
}
|
||||
}
|
||||
|
||||
addGeoFields();
|
||||
215
backend/archive/add-price-history.ts
Normal file
@@ -0,0 +1,215 @@
|
||||
import { pool } from './src/db/migrate';
|
||||
|
||||
async function addPriceHistory() {
|
||||
console.log('💰 Adding price history tracking...\n');
|
||||
|
||||
const client = await pool.connect();
|
||||
|
||||
try {
|
||||
await client.query('BEGIN');
|
||||
|
||||
// Step 1: Create price_history table
|
||||
console.log('1. Creating price_history table...');
|
||||
await client.query(`
|
||||
CREATE TABLE IF NOT EXISTS price_history (
|
||||
id SERIAL PRIMARY KEY,
|
||||
product_id INTEGER REFERENCES products(id) ON DELETE CASCADE,
|
||||
store_id INTEGER REFERENCES stores(id) ON DELETE CASCADE,
|
||||
brand_id INTEGER REFERENCES brands(id) ON DELETE SET NULL,
|
||||
category_id INTEGER REFERENCES categories(id) ON DELETE SET NULL,
|
||||
product_name VARCHAR(500),
|
||||
price DECIMAL(10, 2),
|
||||
sale_price DECIMAL(10, 2),
|
||||
original_price DECIMAL(10, 2),
|
||||
discount_percentage DECIMAL(5, 2),
|
||||
discount_amount DECIMAL(10, 2),
|
||||
in_stock BOOLEAN DEFAULT true,
|
||||
is_special BOOLEAN DEFAULT false,
|
||||
recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
||||
)
|
||||
`);
|
||||
|
||||
// Step 2: Add indexes for fast price queries
|
||||
console.log('2. Creating indexes for price queries...');
|
||||
await client.query(`
|
||||
CREATE INDEX IF NOT EXISTS idx_price_history_product ON price_history(product_id, recorded_at DESC);
|
||||
CREATE INDEX IF NOT EXISTS idx_price_history_store ON price_history(store_id, recorded_at DESC);
|
||||
CREATE INDEX IF NOT EXISTS idx_price_history_brand ON price_history(brand_id, recorded_at DESC);
|
||||
CREATE INDEX IF NOT EXISTS idx_price_history_date ON price_history(recorded_at DESC);
|
||||
CREATE INDEX IF NOT EXISTS idx_price_history_price_change ON price_history(product_id, price, recorded_at DESC);
|
||||
`);
|
||||
|
||||
// Step 3: Create function to log price changes
|
||||
console.log('3. Creating price change trigger function...');
|
||||
await client.query(`
|
||||
CREATE OR REPLACE FUNCTION log_price_change()
|
||||
RETURNS TRIGGER AS $$
|
||||
BEGIN
|
||||
-- Only log if price actually changed or product just appeared
|
||||
IF (TG_OP = 'INSERT') OR
|
||||
(OLD.price IS DISTINCT FROM NEW.price) OR
|
||||
(OLD.sale_price IS DISTINCT FROM NEW.sale_price) OR
|
||||
(OLD.discount_percentage IS DISTINCT FROM NEW.discount_percentage) OR
|
||||
(OLD.in_stock IS DISTINCT FROM NEW.in_stock) THEN
|
||||
|
||||
INSERT INTO price_history (
|
||||
product_id, store_id, brand_id, category_id, product_name,
|
||||
price, sale_price, original_price, discount_percentage, discount_amount,
|
||||
in_stock, is_special, recorded_at
|
||||
) VALUES (
|
||||
NEW.id, NEW.store_id, NEW.brand_id, NEW.category_id, NEW.name,
|
||||
NEW.price, NEW.sale_price, NEW.original_price,
|
||||
NEW.discount_percentage, NEW.discount_amount,
|
||||
NEW.in_stock, NEW.is_special, CURRENT_TIMESTAMP
|
||||
);
|
||||
END IF;
|
||||
|
||||
RETURN NEW;
|
||||
END;
|
||||
$$ LANGUAGE plpgsql;
|
||||
`);
|
||||
|
||||
// Step 4: Create trigger on products table
|
||||
console.log('4. Creating trigger on products table...');
|
||||
await client.query(`
|
||||
DROP TRIGGER IF EXISTS price_change_trigger ON products;
|
||||
CREATE TRIGGER price_change_trigger
|
||||
AFTER INSERT OR UPDATE ON products
|
||||
FOR EACH ROW
|
||||
EXECUTE FUNCTION log_price_change();
|
||||
`);
|
||||
|
||||
// Step 5: Populate initial price history from existing products
|
||||
console.log('5. Populating initial price history...');
|
||||
await client.query(`
|
||||
INSERT INTO price_history (
|
||||
product_id, store_id, brand_id, category_id, product_name,
|
||||
price, sale_price, original_price, discount_percentage, discount_amount,
|
||||
in_stock, is_special, recorded_at
|
||||
)
|
||||
SELECT
|
||||
id, store_id, brand_id, category_id, name,
|
||||
price, sale_price, original_price, discount_percentage, discount_amount,
|
||||
in_stock, is_special, COALESCE(first_seen_at, created_at)
|
||||
FROM products
|
||||
WHERE price IS NOT NULL
|
||||
ON CONFLICT DO NOTHING
|
||||
`);
|
||||
|
||||
const countResult = await client.query('SELECT COUNT(*) FROM price_history');
|
||||
console.log(` ✅ ${countResult.rows[0].count} price records created`);
|
||||
|
||||
// Step 6: Create helpful views for price monitoring
|
||||
console.log('6. Creating price monitoring views...');
|
||||
|
||||
// Price changes view
|
||||
await client.query(`
|
||||
CREATE OR REPLACE VIEW price_changes AS
|
||||
WITH price_with_previous AS (
|
||||
SELECT
|
||||
ph.*,
|
||||
LAG(ph.price) OVER (PARTITION BY ph.product_id ORDER BY ph.recorded_at) as previous_price,
|
||||
LAG(ph.recorded_at) OVER (PARTITION BY ph.product_id ORDER BY ph.recorded_at) as previous_date
|
||||
FROM price_history ph
|
||||
)
|
||||
SELECT
|
||||
pwp.product_id,
|
||||
pwp.product_name,
|
||||
s.name as store_name,
|
||||
b.name as brand_name,
|
||||
pwp.previous_price,
|
||||
pwp.price as current_price,
|
||||
pwp.price - pwp.previous_price as price_change,
|
||||
ROUND(((pwp.price - pwp.previous_price) / NULLIF(pwp.previous_price, 0) * 100)::numeric, 2) as price_change_percent,
|
||||
pwp.previous_date,
|
||||
pwp.recorded_at as current_date
|
||||
FROM price_with_previous pwp
|
||||
JOIN stores s ON pwp.store_id = s.id
|
||||
LEFT JOIN brands b ON pwp.brand_id = b.id
|
||||
WHERE pwp.previous_price IS NOT NULL
|
||||
AND pwp.price IS DISTINCT FROM pwp.previous_price
|
||||
ORDER BY pwp.recorded_at DESC
|
||||
`);
|
||||
|
||||
// Current prices view
|
||||
await client.query(`
|
||||
CREATE OR REPLACE VIEW current_prices AS
|
||||
SELECT DISTINCT ON (product_id)
|
||||
ph.product_id,
|
||||
ph.product_name,
|
||||
s.name as store_name,
|
||||
b.name as brand_name,
|
||||
c.name as category_name,
|
||||
ph.price,
|
||||
ph.sale_price,
|
||||
ph.discount_percentage,
|
||||
ph.in_stock,
|
||||
ph.is_special,
|
||||
ph.recorded_at as last_updated
|
||||
FROM price_history ph
|
||||
JOIN stores s ON ph.store_id = s.id
|
||||
LEFT JOIN brands b ON ph.brand_id = b.id
|
||||
LEFT JOIN categories c ON ph.category_id = c.id
|
||||
ORDER BY ph.product_id, ph.recorded_at DESC
|
||||
`);
|
||||
|
||||
// Price trends view (last 30 days)
|
||||
await client.query(`
|
||||
CREATE OR REPLACE VIEW price_trends_30d AS
|
||||
WITH price_data AS (
|
||||
SELECT
|
||||
ph.product_id,
|
||||
ph.product_name,
|
||||
s.name as store_name,
|
||||
ph.price,
|
||||
ph.recorded_at,
|
||||
ROW_NUMBER() OVER (PARTITION BY ph.product_id ORDER BY ph.recorded_at ASC) as first_row,
|
||||
ROW_NUMBER() OVER (PARTITION BY ph.product_id ORDER BY ph.recorded_at DESC) as last_row
|
||||
FROM price_history ph
|
||||
JOIN stores s ON ph.store_id = s.id
|
||||
WHERE ph.recorded_at >= CURRENT_DATE - INTERVAL '30 days'
|
||||
)
|
||||
SELECT
|
||||
product_id,
|
||||
product_name,
|
||||
store_name,
|
||||
COUNT(*) as price_points,
|
||||
MIN(price) as min_price,
|
||||
MAX(price) as max_price,
|
||||
ROUND(AVG(price)::numeric, 2) as avg_price,
|
||||
MAX(CASE WHEN first_row = 1 THEN price END) as starting_price,
|
||||
MAX(CASE WHEN last_row = 1 THEN price END) as current_price
|
||||
FROM price_data
|
||||
GROUP BY product_id, product_name, store_name
|
||||
`);
|
||||
|
||||
await client.query('COMMIT');
|
||||
|
||||
console.log('\n✅ Price history tracking enabled!');
|
||||
console.log('\n📊 Available price monitoring views:');
|
||||
console.log(' - price_changes: See all price increases/decreases');
|
||||
console.log(' - current_prices: Latest price for each product');
|
||||
console.log(' - price_trends_30d: Price trends over last 30 days');
|
||||
|
||||
console.log('\n💡 Example queries:');
|
||||
console.log(' -- Recent price increases:');
|
||||
console.log(' SELECT * FROM price_changes WHERE price_change > 0 ORDER BY "current_date" DESC LIMIT 10;');
|
||||
console.log(' ');
|
||||
console.log(' -- Products on sale:');
|
||||
console.log(' SELECT * FROM current_prices WHERE sale_price IS NOT NULL;');
|
||||
console.log(' ');
|
||||
console.log(' -- Biggest price drops:');
|
||||
console.log(' SELECT * FROM price_changes WHERE price_change < 0 ORDER BY price_change ASC LIMIT 10;');
|
||||
|
||||
} catch (error) {
|
||||
await client.query('ROLLBACK');
|
||||
console.error('❌ Error:', error);
|
||||
throw error;
|
||||
} finally {
|
||||
client.release();
|
||||
await pool.end();
|
||||
}
|
||||
}
|
||||
|
||||
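// Errors inside the transaction are already logged above; set a failing exit code
// instead of leaving the rethrown error as an unhandled rejection.
addPriceHistory().catch(() => { process.exitCode = 1; });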
|
||||
45
backend/archive/add-proxy-location-columns.ts
Normal file
@@ -0,0 +1,45 @@
import { Pool } from 'pg';

const pool = new Pool({
  connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});

async function addLocationColumns() {
  try {
    console.log('Adding location columns to proxies table...\n');

    await pool.query(`
      ALTER TABLE proxies
      ADD COLUMN IF NOT EXISTS city VARCHAR(100),
      ADD COLUMN IF NOT EXISTS state VARCHAR(100),
      ADD COLUMN IF NOT EXISTS country VARCHAR(100),
      ADD COLUMN IF NOT EXISTS country_code VARCHAR(10),
      ADD COLUMN IF NOT EXISTS latitude DECIMAL(10, 8),
      ADD COLUMN IF NOT EXISTS longitude DECIMAL(11, 8)
    `);

    console.log('✅ Location columns added successfully\n');

    // Show updated schema
    const result = await pool.query(`
      SELECT column_name, data_type
      FROM information_schema.columns
      WHERE table_name = 'proxies'
      ORDER BY ordinal_position
    `);

    console.log('Updated proxies table schema:');
    console.log('─'.repeat(60));
    result.rows.forEach(row => {
      console.log(` ${row.column_name}: ${row.data_type}`);
    });
    console.log('─'.repeat(60));

  } catch (error: any) {
    console.error('Error:', error.message);
  } finally {
    await pool.end();
  }
}

addLocationColumns();
74
backend/archive/add-sol-flower-stores.ts
Normal file
@@ -0,0 +1,74 @@
|
||||
import { pool } from './src/db/migrate';
|
||||
|
||||
const solFlowerStores = [
|
||||
{
|
||||
name: 'Sol Flower - Sun City',
|
||||
slug: 'sol-flower-sun-city',
|
||||
    dutchie_url: 'https://dutchie.com/dispensary/sol-flower-dispensary',
  },
  {
    name: 'Sol Flower - South Tucson',
    slug: 'sol-flower-south-tucson',
    dutchie_url: 'https://dutchie.com/dispensary/sol-flower-dispensary-south-tucson',
  },
  {
    name: 'Sol Flower - North Tucson',
    slug: 'sol-flower-north-tucson',
    dutchie_url: 'https://dutchie.com/dispensary/sol-flower-dispensary-north-tucson',
  },
  {
    name: 'Sol Flower - McClintock (Tempe)',
    slug: 'sol-flower-mcclintock',
    dutchie_url: 'https://dutchie.com/dispensary/sol-flower-dispensary-mcclintock',
  },
  {
    name: 'Sol Flower - Deer Valley (Phoenix)',
    slug: 'sol-flower-deer-valley',
    dutchie_url: 'https://dutchie.com/dispensary/sol-flower-dispensary-deer-valley',
  },
];

async function addSolFlowerStores() {
  console.log('🌻 Adding Sol Flower stores to database...\n');

  try {
    for (const store of solFlowerStores) {
      // Check if store already exists
      const existing = await pool.query(
        'SELECT id FROM stores WHERE slug = $1',
        [store.slug]
      );

      if (existing.rows.length > 0) {
        console.log(`⏭️ Skipping ${store.name} - already exists (ID: ${existing.rows[0].id})`);
        continue;
      }

      // Insert store
      const result = await pool.query(
        `INSERT INTO stores (name, slug, dutchie_url, active, scrape_enabled, logo_url)
         VALUES ($1, $2, $3, true, true, $4)
         RETURNING id`,
        [store.name, store.slug, store.dutchie_url, 'https://dutchie.com/favicon.ico']
      );

      console.log(`✅ Added ${store.name} (ID: ${result.rows[0].id})`);
    }

    console.log('\n✅ All Sol Flower stores added successfully!');

    // Show all stores
    console.log('\n📊 All stores in database:');
    const allStores = await pool.query(
      'SELECT id, name, dutchie_url FROM stores ORDER BY id'
    );
    console.table(allStores.rows);

  } catch (error) {
    console.error('❌ Error:', error);
  } finally {
    await pool.end();
  }
}

addSolFlowerStores();
90
backend/archive/add-test-brands.ts
Normal file
@@ -0,0 +1,90 @@
import { Pool } from 'pg';

const pool = new Pool({
  connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});

async function addTestBrands() {
  try {
    // Update the store slug to match the local URL
    console.log('Updating store slug...');
    await pool.query(
      `UPDATE stores
       SET slug = $1,
           dutchie_url = $2,
           updated_at = NOW()
       WHERE slug = $3`,
      [
        'curaleaf-az-48th-street',
        'https://curaleaf.com/stores/curaleaf-dispensary-48th-street',
        'curaleaf-az-48th-street-med'
      ]
    );
    console.log('✓ Store slug updated\n');

    // Get the store ID
    const storeResult = await pool.query(
      'SELECT id FROM stores WHERE slug = $1',
      ['curaleaf-az-48th-street']
    );

    if (storeResult.rows.length === 0) {
      console.log('Store not found!');
      return;
    }

    const storeId = storeResult.rows[0].id;

    // Sample products with brands commonly found at dispensaries (category is informational only; it is not inserted below)
    const testProducts = [
      { name: 'Select Elite Live Resin Cartridge - Clementine', brand: 'Select', price: 45.00, category: 'vape-pens', thc: 82.5 },
      { name: 'Curaleaf Flower - Blue Dream', brand: 'Curaleaf', price: 35.00, category: 'flower', thc: 22.0 },
      { name: 'Grassroots RSO Syringe', brand: 'Grassroots', price: 40.00, category: 'concentrates', thc: 75.0 },
      { name: 'Stiiizy Pod - Skywalker OG', brand: 'Stiiizy', price: 50.00, category: 'vape-pens', thc: 85.0 },
      { name: 'Cookies Flower - Gary Payton', brand: 'Cookies', price: 55.00, category: 'flower', thc: 28.0 },
      { name: 'Raw Garden Live Resin - Wedding Cake', brand: 'Raw Garden', price: 48.00, category: 'concentrates', thc: 80.5 },
      { name: 'Jeeter Pre-Roll - Zkittlez', brand: 'Jeeter', price: 12.00, category: 'pre-rolls', thc: 24.0 },
      { name: 'Kiva Camino Gummies - Wild Cherry', brand: 'Kiva', price: 20.00, category: 'edibles', thc: 5.0 },
      { name: 'Wyld Gummies - Raspberry', brand: 'Wyld', price: 18.00, category: 'edibles', thc: 10.0 },
      { name: 'Papa & Barkley Releaf Balm', brand: 'Papa & Barkley', price: 45.00, category: 'topicals', thc: 3.0 },
      { name: 'Brass Knuckles Cartridge - Gorilla Glue', brand: 'Brass Knuckles', price: 42.00, category: 'vape-pens', thc: 83.0 },
      { name: 'Heavy Hitters Ultra Extract - Sour Diesel', brand: 'Heavy Hitters', price: 55.00, category: 'concentrates', thc: 90.0 },
      { name: 'Cresco Liquid Live Resin - Pineapple Express', brand: 'Cresco', price: 50.00, category: 'vape-pens', thc: 87.0 },
      { name: 'Verano Pre-Roll - Mag Landrace', brand: 'Verano', price: 15.00, category: 'pre-rolls', thc: 26.0 },
      { name: 'Select Nano Gummies - Watermelon', brand: 'Select', price: 22.00, category: 'edibles', thc: 10.0 }
    ];

    console.log(`Inserting ${testProducts.length} test products with brands...`);
    console.log('─'.repeat(80));

    for (const product of testProducts) {
      await pool.query(
        `INSERT INTO products (
          store_id, name, brand, price, thc_percentage,
          dutchie_url, in_stock
        )
        VALUES ($1, $2, $3, $4, $5, $6, true)`,
        [
          storeId,
          product.name,
          product.brand,
          product.price,
          product.thc,
          `https://curaleaf.com/stores/curaleaf-dispensary-48th-street/product/${product.name.toLowerCase().replace(/[^a-z0-9]+/g, '-')}` // collapse spaces, "&" and " - " into single hyphens
        ]
      );
      console.log(`✓ ${product.brand} - ${product.name}`);
    }

    console.log('─'.repeat(80));
    console.log(`\n✅ Added ${testProducts.length} test products with brands to the store\n`);
    console.log(`View at: http://localhost:5174/stores/az/curaleaf/curaleaf-az-48th-street\n`);

  } catch (error: any) {
    console.error('Error:', error.message);
  } finally {
    await pool.end();
  }
}

addTestBrands();
23
backend/archive/cancel-pending-job.js
Normal file
@@ -0,0 +1,23 @@
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: process.env.DATABASE_URL || 'postgresql://kelly:kelly@localhost:5432/hub'
});

(async () => {
  try {
    await pool.query(`
      UPDATE proxy_test_jobs
      SET status = 'cancelled',
          completed_at = CURRENT_TIMESTAMP,
          updated_at = CURRENT_TIMESTAMP
      WHERE id = 2
    `);

    console.log('✅ Cancelled job ID 2');
    process.exit(0);
  } catch (error) {
    console.error('Error:', error);
    process.exit(1);
  }
})();
53
backend/archive/capture-age-gate-cookies.ts
Normal file
@@ -0,0 +1,53 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { writeFileSync } from 'fs';

puppeteer.use(StealthPlugin());

async function captureAgeGateCookies() {
  const browser = await puppeteer.launch({
    headless: false, // Visible browser so you can complete age gate manually
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',
      '--disable-blink-features=AutomationControlled'
    ]
  });

  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });

  console.log('\n===========================================');
  console.log('INSTRUCTIONS:');
  console.log('1. A browser window will open');
  console.log('2. Complete the age gate manually');
  console.log('3. Wait until you see the store page load');
  console.log('4. Press ENTER in this terminal when done');
  console.log('===========================================\n');

  await page.goto('https://curaleaf.com/stores/curaleaf-az-48th-street');

  // Wait for user to complete age gate manually
  await new Promise((resolve) => {
    process.stdin.once('data', () => resolve(null));
  });

  // Get cookies after age gate
  const cookies = await page.cookies();
  console.log('\nCaptured cookies:', JSON.stringify(cookies, null, 2));

  // Save cookies to file
  writeFileSync(
    '/home/kelly/dutchie-menus/backend/curaleaf-cookies.json',
    JSON.stringify(cookies, null, 2)
  );

  console.log('\n✅ Cookies saved to curaleaf-cookies.json');
  console.log('Current URL:', page.url());

  await browser.close();
  process.exit(0);
}

captureAgeGateCookies();
27
backend/archive/check-48th-store.ts
Normal file
@@ -0,0 +1,27 @@
import { Pool } from 'pg';

const pool = new Pool({
  connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});

async function check() {
  try {
    const result = await pool.query("SELECT id, name, slug, dutchie_url FROM stores WHERE slug LIKE '%48th%'");
    console.log('Stores with "48th" in slug:');
    console.log('─'.repeat(80));
    result.rows.forEach(store => {
      console.log(`ID: ${store.id}`);
      console.log(`Name: ${store.name}`);
      console.log(`Slug: ${store.slug}`);
      console.log(`URL: ${store.dutchie_url}`);
      console.log('─'.repeat(80));
    });
    console.log(`Total: ${result.rowCount}`);
  } catch (error: any) {
    console.error('Error:', error.message);
  } finally {
    await pool.end();
  }
}

check();
23
backend/archive/check-brand-jobs.ts
Normal file
@@ -0,0 +1,23 @@
import { pool } from './src/db/migrate.js';

async function main() {
  const result = await pool.query(`
    SELECT id, brand_name, status, products_scraped
    FROM brand_jobs
    WHERE dispensary_id = 112
    AND status = 'completed'
    ORDER BY products_scraped DESC
    LIMIT 10
  `);

  console.log('\nCompleted Brand Jobs:');
  console.log('='.repeat(80));
  result.rows.forEach((row: any) => {
    console.log(`${row.id}: ${row.brand_name} - ${row.products_scraped} products`);
  });
  console.log('='.repeat(80) + '\n');

  await pool.end();
}

main().catch(console.error);
22
backend/archive/check-brand-names.ts
Normal file
@@ -0,0 +1,22 @@
import { pool } from './src/db/migrate.js';

async function checkBrandNames() {
  const result = await pool.query(`
    SELECT brand_slug, brand_name
    FROM brand_scrape_jobs
    WHERE dispensary_id = 112
    ORDER BY id
    LIMIT 20
  `);

  console.log('\nBrand Names in Database:');
  console.log('='.repeat(60));
  result.rows.forEach((row, idx) => {
    console.log(`${idx + 1}. slug: ${row.brand_slug}`);
    console.log(` name: ${row.brand_name}`);
  });

  await pool.end();
}

checkBrandNames().catch(console.error);
17
backend/archive/check-brands-table.ts
Normal file
@@ -0,0 +1,17 @@
import { pool } from './src/db/migrate';

async function checkTable() {
  const result = await pool.query(`
    SELECT column_name, data_type, is_nullable
    FROM information_schema.columns
    WHERE table_name = 'brands'
    ORDER BY ordinal_position
  `);

  console.log('Brands table structure:');
  console.table(result.rows);

  await pool.end();
}

checkTable().catch(console.error); // report query failures instead of leaving an unhandled rejection
55
backend/archive/check-brands.ts
Normal file
@@ -0,0 +1,55 @@
import pg from 'pg';
const { Pool } = pg;

const pool = new Pool({
  connectionString: process.env.DATABASE_URL
});

async function checkBrands() {
  try {
    // Get dispensary info
    const dispensaryResult = await pool.query(
      "SELECT id, name FROM dispensaries WHERE dutchie_slug = 'AZ-Deeply-Rooted'"
    );

    if (dispensaryResult.rows.length === 0) {
      console.log('Dispensary not found');
      return;
    }

    const dispensary = dispensaryResult.rows[0];
    console.log(`Dispensary: ${dispensary.name} (ID: ${dispensary.id})`);

    // Get brand count
    const brandCountResult = await pool.query(
      `SELECT COUNT(DISTINCT brand) as brand_count
       FROM products
       WHERE dispensary_id = $1 AND brand IS NOT NULL AND brand != ''`,
      [dispensary.id]
    );

    console.log(`\nTotal distinct brands: ${brandCountResult.rows[0].brand_count}`);

    // List all brands (GROUP BY already yields one row per brand, so DISTINCT is not needed)
    const brandsResult = await pool.query(
      `SELECT brand, COUNT(*) as product_count
       FROM products
       WHERE dispensary_id = $1 AND brand IS NOT NULL AND brand != ''
       GROUP BY brand
       ORDER BY brand`,
      [dispensary.id]
    );

    console.log(`\nBrands found:`);
    brandsResult.rows.forEach(row => {
      console.log(` - ${row.brand} (${row.product_count} products)`);
    });

  } catch (error) {
    console.error('Error:', error);
  } finally {
    await pool.end();
  }
}

checkBrands();
27
backend/archive/check-db.ts
Normal file
@@ -0,0 +1,27 @@
import { Pool } from 'pg';

const pool = new Pool({
  connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});

async function checkDB() {
  try {
    const result = await pool.query(`
      SELECT COUNT(*) as total,
             COUNT(*) FILTER (WHERE active = true) as active
      FROM proxies
    `);

    console.log('Proxies:', result.rows[0]);

    const stores = await pool.query('SELECT slug FROM stores LIMIT 5');
    console.log('Sample stores:', stores.rows.map(r => r.slug));

  } catch (error: any) {
    console.error('Error:', error.message);
  } finally {
    await pool.end();
  }
}

checkDB();
74
backend/archive/check-deeply-rooted.ts
Normal file
@@ -0,0 +1,74 @@
import pg from 'pg';
const { Pool } = pg;

const pool = new Pool({
  connectionString: process.env.DATABASE_URL
});

async function check() {
  try {
    // Get dispensary info by name (SELECT * so the available columns can be listed below)
    const dispensaryResult = await pool.query(
      "SELECT * FROM dispensaries WHERE name ILIKE '%Deeply Rooted%' LIMIT 1"
    );

    if (dispensaryResult.rows.length === 0) {
      console.log('Dispensary not found. Listing up to 10 dispensaries:');
      const all = await pool.query("SELECT id, name FROM dispensaries LIMIT 10");
      all.rows.forEach(d => console.log(` ID ${d.id}: ${d.name}`));
      return;
    }

    const dispensary = dispensaryResult.rows[0];
    console.log(`Dispensary: ${dispensary.name} (ID: ${dispensary.id})`);
    console.log(`Columns:`, Object.keys(dispensary));

    // Get product count
    const productCountResult = await pool.query(
      `SELECT COUNT(*) as total_products FROM products WHERE dispensary_id = $1`,
      [dispensary.id]
    );
    console.log(`\nTotal products: ${productCountResult.rows[0].total_products}`);

    // Get brand count and list
    const brandCountResult = await pool.query(
      `SELECT COUNT(DISTINCT brand) as brand_count
       FROM products
       WHERE dispensary_id = $1 AND brand IS NOT NULL AND brand != ''`,
      [dispensary.id]
    );

    console.log(`Total distinct brands: ${brandCountResult.rows[0].brand_count}`);

    // List all brands (GROUP BY already yields one row per brand, so DISTINCT is not needed)
    const brandsResult = await pool.query(
      `SELECT brand, COUNT(*) as product_count
       FROM products
       WHERE dispensary_id = $1 AND brand IS NOT NULL AND brand != ''
       GROUP BY brand
       ORDER BY product_count DESC`,
      [dispensary.id]
    );

    console.log(`\nBrands with products:`);
    brandsResult.rows.forEach(row => {
      console.log(` - ${row.brand} (${row.product_count} products)`);
    });

    // Count products without brands
    const noBrandResult = await pool.query(
      `SELECT COUNT(*) as no_brand_count
       FROM products
       WHERE dispensary_id = $1 AND (brand IS NULL OR brand = '')`,
      [dispensary.id]
    );
    console.log(`\nProducts without brand: ${noBrandResult.rows[0].no_brand_count}`);

  } catch (error) {
    console.error('Error:', error);
  } finally {
    await pool.end();
  }
}

check();
33
backend/archive/check-discounts.ts
Normal file
@@ -0,0 +1,33 @@
import pkg from 'pg';
const { Pool } = pkg;

const pool = new Pool({
  connectionString: process.env.DATABASE_URL
});

async function main() {
  const result = await pool.query(`
    SELECT
      COUNT(*) as total_products,
      COUNT(CASE WHEN discount_type IS NOT NULL AND discount_value IS NOT NULL THEN 1 END) as products_with_discounts
    FROM products
  `);

  console.log('Product Count:');
  console.log(result.rows[0]);

  // Get a sample of products with discounts
  const sample = await pool.query(`
    SELECT name, brand, regular_price, sale_price, discount_type, discount_value
    FROM products
    WHERE discount_type IS NOT NULL AND discount_value IS NOT NULL
    LIMIT 5
  `);

  console.log('\nSample Products with Discounts:');
  console.log(sample.rows);

  await pool.end();
}

main().catch(console.error);
25
backend/archive/check-enrichment-progress.ts
Normal file
@@ -0,0 +1,25 @@
import { pool } from './src/db/migrate.js';

async function main() {
  const result = await pool.query(`
    SELECT
      COUNT(*) as total,
      COUNT(regular_price) as with_prices,
      COUNT(*) - COUNT(regular_price) as without_prices
    FROM products
    WHERE dispensary_id = 112
  `);

  const stats = result.rows[0];
  const pct = parseInt(stats.total, 10) > 0 ? ((parseInt(stats.with_prices, 10) / parseInt(stats.total, 10)) * 100).toFixed(1) : '0.0'; // avoid NaN when there are no products

  console.log('\nENRICHMENT PROGRESS:');
  console.log(` Total products: ${stats.total}`);
  console.log(` With prices: ${stats.with_prices} (${pct}%)`);
  console.log(` Without prices: ${stats.without_prices}`);
  console.log('');

  await pool.end();
}

main().catch(console.error);
89
backend/archive/check-for-prices.ts
Normal file
@@ -0,0 +1,89 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';

async function checkForPrices() {
  const proxy = await getRandomProxy();
  if (!proxy) {
    console.log('No proxy available');
    process.exit(1);
  }

  const browser = await firefox.launch({ headless: true });
  const context = await browser.newContext({
    viewport: { width: 1920, height: 1080 },
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    proxy: {
      server: proxy.server,
      username: proxy.username,
      password: proxy.password
    }
  });

  const page = await context.newPage();

  const brandUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/brands/alien-labs';
  console.log(`Loading: ${brandUrl}`);

  await page.goto(brandUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
  await page.waitForTimeout(5000);

  // Check for any dollar signs on the entire page
  const pageText = await page.evaluate(() => document.body.textContent);
  const hasDollarSigns = pageText?.includes('$');

  console.log('\n' + '='.repeat(80));
  console.log('PRICE AVAILABILITY CHECK:');
  console.log('='.repeat(80));
  console.log(`\nPage contains '$' symbol: ${hasDollarSigns ? 'YES' : 'NO'}`);

  if (hasDollarSigns) {
    // Find all text containing dollar signs
    const priceElements = await page.evaluate(() => {
      const walker = document.createTreeWalker(
        document.body,
        NodeFilter.SHOW_TEXT,
        null
      );

      const results: string[] = [];
      let node;

      while (node = walker.nextNode()) {
        const text = node.textContent?.trim();
        if (text && text.includes('$')) {
          results.push(text);
        }
      }

      return results.slice(0, 10); // First 10 instances
    });

    console.log('\nText containing "$":');
    priceElements.forEach((text, idx) => {
      console.log(` ${idx + 1}. ${text.substring(0, 100)}`);
    });
  }

  // Check specifically within product cards
  const productCardPrices = await page.evaluate(() => {
    const cards = Array.from(document.querySelectorAll('a[href*="/product/"]'));
    return cards.slice(0, 5).map(card => ({
      text: card.textContent?.substring(0, 200),
      hasDollar: card.textContent?.includes('$') || false
    }));
  });

  console.log('\nFirst 5 Product Cards:');
  productCardPrices.forEach((card, idx) => {
    console.log(`\n Card ${idx + 1}:`);
    console.log(` Has $: ${card.hasDollar}`);
    console.log(` Text: ${card.text}`);
  });

  console.log('\n' + '='.repeat(80));

  await browser.close();
  process.exit(0);
}

checkForPrices().catch(console.error);
33
backend/archive/check-jobs.js
Normal file
@@ -0,0 +1,33 @@
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: process.env.DATABASE_URL || 'postgresql://kelly:kelly@localhost:5432/hub'
});

(async () => {
  try {
    const result = await pool.query(`
      SELECT id, status, total_proxies, tested_proxies, passed_proxies, failed_proxies,
             created_at, started_at, completed_at
      FROM proxy_test_jobs
      ORDER BY created_at DESC
      LIMIT 5
    `);

    console.log('\n📊 Recent Proxy Test Jobs:');
    console.log('='.repeat(80));
    result.rows.forEach(job => {
      console.log(`\nJob ID: ${job.id}`);
      console.log(`Status: ${job.status}`);
      console.log(`Progress: ${job.tested_proxies}/${job.total_proxies} (${job.passed_proxies} passed, ${job.failed_proxies} failed)`);
      console.log(`Created: ${job.created_at}`);
      console.log(`Started: ${job.started_at || 'N/A'}`);
      console.log(`Completed: ${job.completed_at || 'N/A'}`);
    });

    process.exit(0);
  } catch (error) {
    console.error('Error:', error);
    process.exit(1);
  }
})();
40
backend/archive/check-jobs.ts
Normal file
@@ -0,0 +1,40 @@
import pg from 'pg';

const client = new pg.Client({
  connectionString: process.env.DATABASE_URL,
});

async function checkJobs() {
  await client.connect();

  const statusRes = await client.query(`
    SELECT status, COUNT(*) as count
    FROM brand_scrape_jobs
    WHERE dispensary_id = 112
    GROUP BY status
    ORDER BY status
  `);

  console.log('\n📊 Job Status Summary:');
  console.log('====================');
  statusRes.rows.forEach(row => {
    console.log(`${row.status}: ${row.count}`);
  });

  const activeRes = await client.query(`
    SELECT worker_id, COUNT(*) as count
    FROM brand_scrape_jobs
    WHERE dispensary_id = 112 AND status = 'in_progress'
    GROUP BY worker_id
  `);

  console.log('\n👷 Active Workers:');
  console.log('==================');
  activeRes.rows.forEach(row => {
    console.log(`${row.worker_id}: ${row.count} jobs`);
  });

  await client.end();
}

checkJobs().catch(console.error); // report connection or query failures instead of leaving an unhandled rejection
110
backend/archive/check-leaks.ts
Normal file
@@ -0,0 +1,110 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { Pool } from 'pg';

puppeteer.use(StealthPlugin());

const pool = new Pool({
  connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});

async function check() {
  let browser;

  try {
    const proxyResult = await pool.query(`SELECT host, port, protocol FROM proxies ORDER BY RANDOM() LIMIT 1`);
    const proxy = proxyResult.rows[0];

    browser = await puppeteer.launch({
      headless: true,
      args: ['--no-sandbox', '--disable-setuid-sandbox', `--proxy-server=${proxy.protocol}://${proxy.host}:${proxy.port}`]
    });

    const page = await browser.newPage();
    await page.setUserAgent('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');

    console.log('🔍 CHECKING FOR DATA LEAKS\n');

    await page.goto('https://curaleaf.com/stores/curaleaf-dispensary-phoenix-airport/brands', {
      waitUntil: 'networkidle2',
      timeout: 60000
    });

    await new Promise((resolve) => setTimeout(resolve, 5000)); // plain timer; page.waitForTimeout was removed in recent Puppeteer releases

    // Check what browser exposes
    const browserData = await page.evaluate(() => ({
      // Automation detection
      webdriver: navigator.webdriver,
      hasHeadlessUA: /headless/i.test(navigator.userAgent),

      // User agent
      userAgent: navigator.userAgent,

      // Chrome detection
      hasChrome: typeof (window as any).chrome !== 'undefined',
      chromeKeys: (window as any).chrome ? Object.keys((window as any).chrome) : [],

      // Permissions
      permissions: navigator.permissions ? 'exists' : 'missing',

      // Languages
      languages: navigator.languages,
      language: navigator.language,

      // Plugins
      pluginCount: navigator.plugins.length,

      // Platform
      platform: navigator.platform,

      // Screen
      screenWidth: screen.width,
      screenHeight: screen.height,

      // JavaScript working?
      jsWorking: true,

      // Page content
      title: document.title,
      bodyLength: document.body.innerHTML.length,
      hasReactRoot: document.getElementById('__next') !== null,
      scriptTags: document.querySelectorAll('script').length
    }));

    console.log('📋 BROWSER FINGERPRINT:');
    console.log('─'.repeat(60));
    console.log('navigator.webdriver:', browserData.webdriver, browserData.webdriver ? '❌ LEAKED!' : '✅');
    console.log('navigator.userAgent:', browserData.userAgent);
    console.log('Has "headless" in UA:', browserData.hasHeadlessUA, browserData.hasHeadlessUA ? '❌' : '✅');
    console.log('window.chrome exists:', browserData.hasChrome, browserData.hasChrome ? '✅' : '❌ SUSPICIOUS');
    console.log('Chrome keys:', browserData.chromeKeys.join(', '));
    console.log('Languages:', browserData.languages);
    console.log('Platform:', browserData.platform);
    console.log('Plugins:', browserData.pluginCount);

    console.log('\n📄 PAGE STATE:');
    console.log('─'.repeat(60));
    console.log('JavaScript executing:', browserData.jsWorking ? '✅ YES' : '❌ NO');
    console.log('Page title:', `"${browserData.title}"`);
    console.log('Body HTML size:', browserData.bodyLength, 'chars');
    console.log('React root exists:', browserData.hasReactRoot ? '✅' : '❌');
    console.log('Script tags:', browserData.scriptTags);

    if (browserData.bodyLength < 1000) {
      console.log('\n⚠️ PROBLEM: Body too small! JS likely failed to load/execute');
    }

    if (!browserData.title) {
      console.log('⚠️ PROBLEM: No page title! Page didn\'t render');
    }

  } catch (error: any) {
    console.error('❌', error.message);
  } finally {
    if (browser) await browser.close();
    await pool.end();
  }
}

check();
72
backend/archive/check-product-data.ts
Normal file
@@ -0,0 +1,72 @@
import { pool } from './src/db/migrate.js';

async function checkProductData() {
  // Get a few recently saved products
  const result = await pool.query(`
    SELECT
      slug,
      name,
      brand,
      variant,
      regular_price,
      sale_price,
      thc_percentage,
      cbd_percentage,
      strain_type,
      in_stock,
      stock_status,
      image_url
    FROM products
    WHERE dispensary_id = 112
    AND brand IN ('(the) Essence', 'Abundant Organics', 'AAchieve', 'Alien Labs')
    ORDER BY updated_at DESC
    LIMIT 10
  `);

  console.log('\n📊 Recently Saved Products:');
  console.log('='.repeat(100));

  result.rows.forEach((row, idx) => {
    console.log(`\n${idx + 1}. ${row.name} (${row.brand})`);
    console.log(` Variant: ${row.variant || 'N/A'}`);
    console.log(` Regular Price: ${row.regular_price != null ? `$${row.regular_price}` : 'N/A'}`); // no stray "$" or "%" around N/A
    console.log(` Sale Price: ${row.sale_price != null ? `$${row.sale_price}` : 'N/A'}`);
    console.log(` THC %: ${row.thc_percentage != null ? `${row.thc_percentage}%` : 'N/A'}`);
    console.log(` CBD %: ${row.cbd_percentage != null ? `${row.cbd_percentage}%` : 'N/A'}`);
    console.log(` Strain: ${row.strain_type || 'N/A'}`);
    console.log(` Stock: ${row.stock_status || (row.in_stock ? 'In stock' : 'Out of stock')}`);
    console.log(` Image: ${row.image_url ? '✓' : 'N/A'}`);
  });

  console.log('\n' + '='.repeat(100));

  // Count how many products have complete data
  const stats = await pool.query(`
    SELECT
      COUNT(*) as total,
      COUNT(regular_price) as has_price,
      COUNT(thc_percentage) as has_thc,
      COUNT(cbd_percentage) as has_cbd,
      COUNT(variant) as has_variant,
      COUNT(strain_type) as has_strain,
      COUNT(image_url) as has_image
    FROM products
    WHERE dispensary_id = 112
    AND brand IN ('(the) Essence', 'Abundant Organics', 'AAchieve', 'Alien Labs')
  `);

  const stat = stats.rows[0];
  console.log('\n📈 Data Completeness for Recently Scraped Brands:');
  console.log(` Total products: ${stat.total}`);
  console.log(` Has price: ${stat.has_price} (${Math.round(stat.has_price / stat.total * 100) || 0}%)`); // "|| 0" avoids NaN when total is 0
  console.log(` Has THC%: ${stat.has_thc} (${Math.round(stat.has_thc / stat.total * 100) || 0}%)`);
  console.log(` Has CBD%: ${stat.has_cbd} (${Math.round(stat.has_cbd / stat.total * 100) || 0}%)`);
  console.log(` Has variant: ${stat.has_variant} (${Math.round(stat.has_variant / stat.total * 100) || 0}%)`);
  console.log(` Has strain type: ${stat.has_strain} (${Math.round(stat.has_strain / stat.total * 100) || 0}%)`);
  console.log(` Has image: ${stat.has_image} (${Math.round(stat.has_image / stat.total * 100) || 0}%)`);
  console.log('');

  await pool.end();
}

checkProductData().catch(console.error);
75
backend/archive/check-product-detail-page.ts
Normal file
@@ -0,0 +1,75 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';

async function checkProductDetailPage() {
  const proxy = await getRandomProxy();
  if (!proxy) {
    console.log('No proxy available');
    process.exit(1);
  }

  const browser = await firefox.launch({ headless: true });
  const context = await browser.newContext({
    viewport: { width: 1920, height: 1080 },
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    proxy: {
      server: proxy.server,
      username: proxy.username,
      password: proxy.password
    }
  });

  const page = await context.newPage();

  // Load a product detail page
  const productUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/product/alien-labs-cured-resin-cart-dark-web';
  console.log(`Loading: ${productUrl}`);

  await page.goto(productUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
  await page.waitForTimeout(5000);

  // Extract all data from the product detail page
  const productData = await page.evaluate(() => {
    const pageText = document.body.textContent || '';

    // Check for prices
    const priceElements = Array.from(document.querySelectorAll('*')).filter(el => {
      const text = el.textContent?.trim() || '';
      return text.match(/\$\d+/) && el.children.length === 0; // Leaf nodes only
    });

    // Check for stock information
    const stockElements = Array.from(document.querySelectorAll('*')).filter(el => {
      const text = el.textContent?.toLowerCase() || '';
      return (text.includes('stock') || text.includes('available')) && el.children.length === 0; // 'stock' also covers 'in stock' / 'out of stock'
    });

    return {
      hasPrice: pageText.includes('$'),
      priceText: priceElements.slice(0, 5).map(el => el.textContent?.trim()),
      stockText: stockElements.slice(0, 5).map(el => el.textContent?.trim()),
      pageTextSample: pageText.substring(0, 500)
    };
  });

  console.log('\n' + '='.repeat(80));
  console.log('PRODUCT DETAIL PAGE DATA:');
  console.log('='.repeat(80));
  console.log('\nHas "$" symbol:', productData.hasPrice);
  console.log('\nPrice elements found:');
  productData.priceText.forEach((text, idx) => {
    console.log(` ${idx + 1}. ${text}`);
  });
  console.log('\nStock elements found:');
  productData.stockText.forEach((text, idx) => {
    console.log(` ${idx + 1}. ${text}`);
  });
  console.log('\nPage text sample:');
  console.log(productData.pageTextSample);
  console.log('\n' + '='.repeat(80));

  await browser.close();
  process.exit(0);
}

checkProductDetailPage().catch(console.error);
56
backend/archive/check-product-prices.ts
Normal file
@@ -0,0 +1,56 @@
import { pool } from './src/db/migrate.js';

async function checkProductPrices() {
  const result = await pool.query(`
    SELECT
      id,
      name,
      brand,
      regular_price,
      sale_price,
      in_stock,
      stock_status
    FROM products
    WHERE dispensary_id = 112
    ORDER BY brand, name
    LIMIT 50
  `);

  console.log('\n' + '='.repeat(100));
  console.log('PRODUCTS WITH PRICES');
  console.log('='.repeat(100) + '\n');

  result.rows.forEach((row, idx) => {
    const regularPrice = row.regular_price ? `$${parseFloat(row.regular_price).toFixed(2)}` : 'N/A';
    const salePrice = row.sale_price ? `$${parseFloat(row.sale_price).toFixed(2)}` : 'N/A';
    const stock = row.in_stock ? (row.stock_status || 'In Stock') : 'Out of Stock';

    console.log(`${idx + 1}. ${row.brand} - ${row.name.substring(0, 50)}`);
    console.log(` Price: ${regularPrice} | Sale: ${salePrice} | Stock: ${stock}`);
    console.log('');
  });

  console.log('='.repeat(100) + '\n');

  // Summary stats
  const stats = await pool.query(`
    SELECT
      COUNT(*) as total_products,
      COUNT(regular_price) as products_with_price,
      COUNT(sale_price) as products_with_sale,
      COUNT(CASE WHEN in_stock THEN 1 END) as in_stock_count
    FROM products
    WHERE dispensary_id = 112
  `);

  console.log('SUMMARY:');
  console.log(` Total products: ${stats.rows[0].total_products}`);
  console.log(` Products with regular price: ${stats.rows[0].products_with_price}`);
  console.log(` Products with sale price: ${stats.rows[0].products_with_sale}`);
  console.log(` Products in stock: ${stats.rows[0].in_stock_count}`);
  console.log('\n');

  await pool.end();
}

checkProductPrices().catch(console.error);
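Review note: node-postgres returns `NUMERIC` columns as strings by default, so price formatting has to accept both strings and numbers. The sketch below shows one way to centralise that. `formatPrice` is a hypothetical helper, not something defined in this repository, and it assumes the price columns are `NUMERIC` (the schema is not shown in this diff).

```typescript
// Hypothetical helper (not in the repository): format a price column for display.
// Assumes the driver may hand back a string (NUMERIC), a number, or null.
function formatPrice(value: string | number | null | undefined): string {
  if (value === null || value === undefined || value === '') {
    return 'N/A';
  }
  const n = typeof value === 'number' ? value : parseFloat(value);
  // parseFloat yields NaN for non-numeric text; treat that as missing.
  return Number.isFinite(n) ? `$${n.toFixed(2)}` : 'N/A';
}

console.log(formatPrice('12.5'));
console.log(formatPrice(null));
```

Unlike a truthiness check on the raw value, this helper reports a price of `0` as `$0.00` rather than `N/A`. Whether that is desirable depends on how the scraper records missing prices.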
47
backend/archive/check-products-schema.ts
Normal file
@@ -0,0 +1,47 @@
import { pool } from './src/db/migrate.js';

async function main() {
  const result = await pool.query(`
    SELECT column_name, data_type, is_nullable
    FROM information_schema.columns
    WHERE table_name = 'products'
    ORDER BY ordinal_position;
  `);

  console.log('Products table columns:');
  result.rows.forEach(row => {
    console.log(` ${row.column_name}: ${row.data_type} (${row.is_nullable === 'YES' ? 'nullable' : 'NOT NULL'})`);
  });

  const constraints = await pool.query(`
    SELECT constraint_name, constraint_type
    FROM information_schema.table_constraints
    WHERE table_name = 'products';
  `);

  console.log('\nProducts table constraints:');
  constraints.rows.forEach(row => {
    console.log(` ${row.constraint_name}: ${row.constraint_type}`);
  });

  // Get unique constraints details
  const uniqueConstraints = await pool.query(`
    SELECT
      tc.constraint_name,
      kcu.column_name
    FROM information_schema.table_constraints tc
    JOIN information_schema.key_column_usage kcu
      ON tc.constraint_name = kcu.constraint_name
    WHERE tc.table_name = 'products'
      AND tc.constraint_type IN ('PRIMARY KEY', 'UNIQUE');
  `);

  console.log('\nUnique/Primary key constraints:');
  uniqueConstraints.rows.forEach(row => {
    console.log(` ${row.constraint_name}: ${row.column_name}`);
  });

  await pool.end();
}

main().catch(console.error);
29
backend/archive/check-products.ts
Normal file
@@ -0,0 +1,29 @@
import { Pool } from 'pg';

const pool = new Pool({
  connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});

async function checkProducts() {
  try {
    console.log('Products table columns:');
    const productsColumns = await pool.query(`
      SELECT column_name, data_type
      FROM information_schema.columns
      WHERE table_name = 'products'
      ORDER BY ordinal_position
    `);
    productsColumns.rows.forEach(r => console.log(` - ${r.column_name}: ${r.data_type}`));

    console.log('\nSample products:');
    const products = await pool.query('SELECT * FROM products LIMIT 3');
    console.log(JSON.stringify(products.rows, null, 2));

  } catch (error: any) {
    console.error('Error:', error.message);
  } finally {
    await pool.end();
  }
}

checkProducts();
31
backend/archive/check-proxies.ts
Normal file
@@ -0,0 +1,31 @@
import { Pool } from 'pg';

const pool = new Pool({
  connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});

async function checkProxies() {
  try {
    const result = await pool.query(`
      SELECT
        COUNT(*) as total,
        COUNT(*) FILTER (WHERE active = true) as active,
        COUNT(*) FILTER (WHERE state = 'Arizona') as arizona
      FROM proxies
    `);

    console.log('Proxy Stats:');
    console.log('─'.repeat(40));
    console.log(`Total: ${result.rows[0].total}`);
    console.log(`Active: ${result.rows[0].active}`);
    console.log(`Arizona: ${result.rows[0].arizona}`);
    console.log('─'.repeat(40));

  } catch (error: any) {
    console.error('Error:', error.message);
  } finally {
    await pool.end();
  }
}

checkProxies();
36
backend/archive/check-proxy-stats.js
Normal file
@@ -0,0 +1,36 @@
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: process.env.DATABASE_URL || 'postgresql://kelly:kelly@localhost:5432/hub'
});

(async () => {
  try {
    const stats = await pool.query(`
      SELECT
        COUNT(*) as total,
        COUNT(*) FILTER (WHERE active = true) as active,
        COUNT(*) FILTER (WHERE active = false) as inactive,
        COUNT(*) FILTER (WHERE test_result = 'success') as passed,
        COUNT(*) FILTER (WHERE test_result = 'failed') as failed,
        COUNT(*) FILTER (WHERE test_result IS NULL) as untested
      FROM proxies
    `);

    const s = stats.rows[0];
    console.log('\n📊 Proxy Statistics:');
    console.log('='.repeat(60));
    console.log(`Total Proxies: ${s.total}`);
    console.log(`Active: ${s.active} (passing tests)`);
    console.log(`Inactive: ${s.inactive} (failed tests)`);
    console.log(`Test Results:`);
    console.log(` ✅ Passed: ${s.passed}`);
    console.log(` ❌ Failed: ${s.failed}`);
    console.log(` ⚪ Untested: ${s.untested}`);

    process.exit(0);
  } catch (error) {
    console.error('Error:', error);
    process.exit(1);
  }
})();
60
backend/archive/check-scraper-data.js
Normal file
@@ -0,0 +1,60 @@
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: process.env.DATABASE_URL || 'postgresql://kelly:kelly@localhost:5432/hub'
});

(async () => {
  try {
    // Check for categories that have been scraped
    const historyResult = await pool.query(`
      SELECT
        s.id as store_id,
        s.name as store_name,
        c.id as category_id,
        c.name as category_name,
        c.last_scraped_at,
        (
          SELECT COUNT(*)
          FROM products p
          WHERE p.store_id = s.id
          AND p.category_id = c.id
        ) as product_count
      FROM stores s
      LEFT JOIN categories c ON c.store_id = s.id
      WHERE c.last_scraped_at IS NOT NULL
      ORDER BY c.last_scraped_at DESC
      LIMIT 10
    `);

    console.log('\n📊 Scraper History:');
    console.log('='.repeat(80));
    if (historyResult.rows.length === 0) {
      console.log('No scraper history found. No categories have been scraped yet.');
    } else {
      historyResult.rows.forEach(row => {
        console.log(`\nStore: ${row.store_name} (ID: ${row.store_id})`);
        console.log(`Category: ${row.category_name} (ID: ${row.category_id})`);
        console.log(`Last Scraped: ${row.last_scraped_at}`);
        console.log(`Products: ${row.product_count}`);
      });
    }

    // Check total categories
    const totalCategoriesResult = await pool.query(`
      SELECT COUNT(*) as total FROM categories
    `);
    console.log(`\n\nTotal Categories: ${totalCategoriesResult.rows[0].total}`);

    // Check categories with last_scraped_at
    const scrapedCategoriesResult = await pool.query(`
      SELECT COUNT(*) as scraped FROM categories WHERE last_scraped_at IS NOT NULL
    `);
    console.log(`Categories Scraped: ${scrapedCategoriesResult.rows[0].scraped}`);

    process.exit(0);
  } catch (error) {
    console.error('Error:', error);
    process.exit(1);
  }
})();
32
backend/archive/check-select-prices.ts
Normal file
@@ -0,0 +1,32 @@
import { pool } from './src/db/migrate.js';

async function main() {
  const result = await pool.query(`
    SELECT name, brand, regular_price, sale_price, in_stock, stock_status
    FROM products
    WHERE dispensary_id = 112
      AND brand = 'Select'
    ORDER BY name
    LIMIT 10
  `);

  console.log('\n' + '='.repeat(100));
  console.log('SELECT BRAND PRODUCTS WITH PRICES (NEW ONE-PASS APPROACH)');
  console.log('='.repeat(100) + '\n');

  result.rows.forEach((row, idx) => {
    const regularPrice = row.regular_price ? `$${parseFloat(row.regular_price).toFixed(2)}` : 'N/A';
    const salePrice = row.sale_price ? `$${parseFloat(row.sale_price).toFixed(2)}` : 'N/A';
    const stock = row.in_stock ? (row.stock_status || 'In Stock') : 'Out of Stock';

    console.log(`${idx + 1}. ${row.name}`);
    console.log(` Price: ${regularPrice} | Sale: ${salePrice} | Stock: ${stock}`);
    console.log('');
  });

  console.log('='.repeat(100) + '\n');

  await pool.end();
}

main().catch(console.error);
161
backend/archive/check-special-product.ts
Normal file
@@ -0,0 +1,161 @@
import { firefox } from 'playwright';
import { pool } from './src/db/migrate.js';
import { getRandomProxy } from './src/utils/proxyManager.js';

async function checkProduct() {
  const proxy = await getRandomProxy();
  if (!proxy) {
    console.log('No proxy available');
    process.exit(1);
  }

  console.log(`Using proxy: ${proxy.server}`);

  const browser = await firefox.launch({
    headless: true,
    firefoxUserPrefs: {
      'geo.enabled': true,
    }
  });

  const context = await browser.newContext({
    viewport: { width: 1920, height: 1080 },
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
    geolocation: { latitude: 33.4484, longitude: -112.0740 },
    permissions: ['geolocation'],
    proxy: {
      server: proxy.server,
      username: proxy.username,
      password: proxy.password
    }
  });

  const page = await context.newPage();

  try {
    console.log('Loading product page...');
    const url = process.argv[2] || 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/product/abundant-organics-flower-mylar-abundant-horizon';
    await page.goto(url, {
      waitUntil: 'domcontentloaded',
      timeout: 30000
    });

    await page.waitForTimeout(5000);

    const productData = await page.evaluate(() => {
      const data: any = { fields: {} };
      const allText = document.body.textContent || '';

      // 1. BASIC INFO
      const nameEl = document.querySelector('h1');
      data.fields.name = nameEl?.textContent?.trim() || null;

      // 2. CATEGORY - look for breadcrumbs or category links
      const breadcrumbs = Array.from(document.querySelectorAll('[class*="breadcrumb"] a, nav a'));
      data.fields.category = breadcrumbs.map(b => b.textContent?.trim()).filter(Boolean);

      // 3. BRAND
      const brandSelectors = ['[class*="brand"]', '[data-testid*="brand"]', 'span:has-text("Brand")', 'label:has-text("Brand")'];
      for (const sel of brandSelectors) {
        try {
          const el = document.querySelector(sel);
          if (el && el.textContent && !el.textContent.includes('Brand:')) {
            data.fields.brand = el.textContent.trim();
            break;
          }
        } catch {}
      }

      // 4. PRICES
      const priceMatches = allText.match(/\$(\d+\.?\d*)/g);
      data.fields.prices = priceMatches || [];

      // 5. THC/CBD CONTENT
      const thcMatch = allText.match(/THC[:\s]*(\d+\.?\d*)\s*%/i);
      const cbdMatch = allText.match(/CBD[:\s]*(\d+\.?\d*)\s*%/i);
      data.fields.thc = thcMatch ? parseFloat(thcMatch[1]) : null;
      data.fields.cbd = cbdMatch ? parseFloat(cbdMatch[1]) : null;

      // 6. STRAIN TYPE
      if (allText.match(/\bindica\b/i)) data.fields.strainType = 'Indica';
      else if (allText.match(/\bsativa\b/i)) data.fields.strainType = 'Sativa';
      else if (allText.match(/\bhybrid\b/i)) data.fields.strainType = 'Hybrid';

      // 7. WEIGHT/SIZE OPTIONS
      const weights = allText.matchAll(/(\d+\.?\d*\s*(?:g|oz|mg|ml|gram|ounce))/gi);
      data.fields.weights = Array.from(weights).map(m => m[1].trim());

      // 8. DESCRIPTION
      const descSelectors = ['[class*="description"]', '[class*="Description"]', 'p[class*="product"]'];
      for (const sel of descSelectors) {
        const el = document.querySelector(sel);
        if (el?.textContent && el.textContent.length > 20) {
          data.fields.description = el.textContent.trim().substring(0, 500);
          break;
        }
      }

      // 9. EFFECTS
      const effectNames = ['Relaxed', 'Happy', 'Euphoric', 'Uplifted', 'Creative', 'Energetic', 'Focused', 'Calm', 'Sleepy', 'Hungry'];
      data.fields.effects = effectNames.filter(e => allText.match(new RegExp(`\\b${e}\\b`, 'i')));

      // 10. TERPENES
      const terpeneNames = ['Myrcene', 'Limonene', 'Caryophyllene', 'Pinene', 'Linalool', 'Humulene'];
      data.fields.terpenes = terpeneNames.filter(t => allText.match(new RegExp(`\\b${t}\\b`, 'i')));

      // 11. FLAVORS
      const flavorNames = ['Sweet', 'Citrus', 'Earthy', 'Pine', 'Berry', 'Diesel', 'Sour', 'Floral', 'Spicy'];
      data.fields.flavors = flavorNames.filter(f => allText.match(new RegExp(`\\b${f}\\b`, 'i')));

      // 12. SPECIAL INFO
      data.fields.hasSpecialText = allText.includes('Special') || allText.includes('Sale') || allText.includes('Deal');
      const endsMatch = allText.match(/(?:ends?|expires?)\s+(?:in\s+)?(\d+)\s+(min|hour|day)/i);
      data.fields.specialEndsIn = endsMatch ? `${endsMatch[1]} ${endsMatch[2]}` : null;

      // 13. IMAGE URLS
      const images = Array.from(document.querySelectorAll('img[src*="dutchie"]'));
      data.fields.imageUrls = images.map(img => (img as HTMLImageElement).src).filter(Boolean);

      // 14. ALL VISIBLE TEXT (for debugging)
      data.allVisibleText = allText.substring(0, 1000);

      // 15. STRUCTURED DATA FROM SCRIPTS
      const scripts = Array.from(document.querySelectorAll('script'));
      data.structuredData = {};

      for (const script of scripts) {
        const content = script.textContent || '';

        const idMatch = content.match(/"id":"([a-f0-9-]+)"/);
        if (idMatch && idMatch[1].length > 10) {
          data.structuredData.productId = idMatch[1];
        }

        const variantMatch = content.match(/"variantId":"([^"]+)"/);
        if (variantMatch) {
          data.structuredData.variantId = variantMatch[1];
        }

        const categoryMatch = content.match(/"category":"([^"]+)"/);
        if (categoryMatch) {
          data.structuredData.category = categoryMatch[1];
        }
      }

      return data;
    });

    console.log('\n=== PRODUCT DATA (Time: ' + new Date().toISOString() + ') ===');
    console.log(JSON.stringify(productData, null, 2));

    await browser.close();
    await pool.end();
  } catch (error) {
    console.error('Error:', error);
    await browser.close();
    await pool.end();
    process.exit(1);
  }
}

checkProduct();
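Review note: the THC/CBD extraction in this script runs inside `page.evaluate`, which makes it awkward to test without a browser. A minimal sketch of the same regular expressions pulled into a pure function follows. `extractPotency` and the `Potency` interface are hypothetical names introduced here for illustration and are not part of the archived script.

```typescript
// Hypothetical refactor (not in the archived script): the same THC/CBD
// patterns as the page.evaluate block, applied to a plain string.
interface Potency {
  thc: number | null;
  cbd: number | null;
}

function extractPotency(text: string): Potency {
  const grab = (label: string): number | null => {
    // Label, optional colon/whitespace, a decimal number, optional whitespace, then "%".
    const m = text.match(new RegExp(`${label}[:\\s]*(\\d+\\.?\\d*)\\s*%`, 'i'));
    return m ? parseFloat(m[1]) : null;
  };
  return { thc: grab('THC'), cbd: grab('CBD') };
}

console.log(extractPotency('Hybrid | THC: 24.5% | CBD 0.1 %'));
```

Because the pattern requires the number to follow the label directly (apart from colons and whitespace), a label such as `THCa: 20%` does not match. That mirrors the behaviour of the original expressions.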
49
backend/archive/check-store.ts
Normal file
@@ -0,0 +1,49 @@
import { Pool } from 'pg';

const pool = new Pool({
  connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});

async function checkStore() {
  try {
    const store = await pool.query(`
      SELECT id, name, slug FROM stores WHERE slug = 'curaleaf-az-48th-street'
    `);

    if (store.rows.length > 0) {
      console.log('Store found:', store.rows[0]);

      // Check if it has products
      const products = await pool.query(`
        SELECT COUNT(*) as total, COUNT(DISTINCT brand) as brands
        FROM products WHERE store_id = $1
      `, [store.rows[0].id]);

      console.log('Store products:', products.rows[0]);

      // Check distinct brands
      const brands = await pool.query(`
        SELECT DISTINCT brand FROM products
        WHERE store_id = $1 AND brand IS NOT NULL
        ORDER BY brand
      `, [store.rows[0].id]);

      console.log('\nCurrent brands:', brands.rows.map(r => r.brand));
    } else {
      console.log('Store not found');

      // Show available stores
      const stores = await pool.query(`
        SELECT slug FROM stores WHERE slug LIKE '%48th%'
      `);
      console.log('Stores with 48th:', stores.rows.map(r => r.slug));
    }

  } catch (error: any) {
    console.error('Error:', error.message);
  } finally {
    await pool.end();
  }
}

checkStore();
24
backend/archive/check-stores.ts
Normal file
@@ -0,0 +1,24 @@
import { pool } from './src/db/migrate';

async function checkStores() {
  try {
    const result = await pool.query(`
      SELECT id, name, slug, dutchie_url
      FROM stores
      WHERE name ILIKE '%sol flower%'
      ORDER BY name
    `);

    console.log(`Found ${result.rows.length} Sol Flower stores:\n`);
    result.rows.forEach(store => {
      console.log(`ID ${store.id}: ${store.name}`);
      console.log(` URL: ${store.dutchie_url}\n`);
    });
  } catch (error) {
    console.error('Error:', error);
  } finally {
    await pool.end();
  }
}

checkStores();
35
backend/archive/check-tables.ts
Normal file
@@ -0,0 +1,35 @@
import { Pool } from 'pg';

const pool = new Pool({
  connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});

async function checkTables() {
  try {
    const result = await pool.query(`
      SELECT table_name
      FROM information_schema.tables
      WHERE table_schema = 'public'
      ORDER BY table_name
    `);

    console.log('Tables in database:');
    result.rows.forEach(r => console.log(' -', r.table_name));

    // Check stores table structure
    console.log('\nStores table columns:');
    const storesColumns = await pool.query(`
      SELECT column_name, data_type
      FROM information_schema.columns
      WHERE table_name = 'stores'
    `);
    storesColumns.rows.forEach(r => console.log(` - ${r.column_name}: ${r.data_type}`));

  } catch (error: any) {
    console.error('Error:', error.message);
  } finally {
    await pool.end();
  }
}

checkTables();
105
backend/archive/check-treez-categories.ts
Normal file
@@ -0,0 +1,105 @@
import { chromium } from 'playwright';

const GOOGLE_UA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

async function main() {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({
    userAgent: GOOGLE_UA
  });

  const page = await context.newPage();

  try {
    console.log('Loading menu page...');
    await page.goto('https://best.treez.io/onlinemenu/?customerType=ADULT', {
      waitUntil: 'networkidle',
      timeout: 30000
    });

    await page.waitForTimeout(3000);

    // Look for category navigation elements
    console.log('\n=== Checking for category filters/tabs ===\n');

    // Check for common category selectors
    const categorySelectors = [
      'nav a',
      'nav button',
      '[role="tab"]',
      '[class*="category"]',
      '[class*="filter"]',
      '[class*="nav"]',
      '.menu-category',
      '.category-filter',
      '.product-category'
    ];

    for (const selector of categorySelectors) {
      const elements = await page.locator(selector).all();
      if (elements.length > 0) {
        console.log(`\nFound ${elements.length} elements matching "${selector}":`);
        for (let i = 0; i < Math.min(10, elements.length); i++) {
          const text = await elements[i].textContent();
          const href = await elements[i].getAttribute('href');
          const className = await elements[i].getAttribute('class');
          console.log(` ${i + 1}. Text: "${text?.trim()}" | Class: "${className}" | Href: "${href}"`);
        }
      }
    }

    // Check the main navigation
    console.log('\n=== Main Navigation Structure ===\n');
    const navElements = await page.locator('nav, [role="navigation"]').all();
    console.log(`Found ${navElements.length} navigation elements`);

    for (let i = 0; i < navElements.length; i++) {
      const navHtml = await navElements[i].innerHTML();
      console.log(`\nNavigation ${i + 1}:`);
      console.log(navHtml.substring(0, 500)); // First 500 chars
      console.log('...');
    }

    // Check for dropdowns or select elements
    console.log('\n=== Checking for dropdowns ===\n');
    const selects = await page.locator('select').all();
    console.log(`Found ${selects.length} select elements`);

    for (let i = 0; i < selects.length; i++) {
      const options = await selects[i].locator('option').all();
      console.log(`\nSelect ${i + 1} has ${options.length} options:`);
      for (let j = 0; j < Math.min(10, options.length); j++) {
        const text = await options[j].textContent();
        const value = await options[j].getAttribute('value');
        console.log(` - "${text}" (value: ${value})`);
      }
    }

    // Look for any clickable category buttons
    console.log('\n=== Checking for category buttons ===\n');
    const buttons = await page.locator('button').all();
    console.log(`Found ${buttons.length} total buttons`);

    const categoryButtons = [];
    for (const button of buttons) {
      const text = await button.textContent();
      const className = await button.getAttribute('class');
      if (text && (text.includes('Flower') || text.includes('Edible') || text.includes('Vape') ||
          text.includes('Concentrate') || text.includes('Pre-Roll') || text.includes('All'))) {
        categoryButtons.push({ text: text.trim(), class: className });
      }
    }

    console.log(`Found ${categoryButtons.length} potential category buttons:`);
    categoryButtons.forEach((btn, i) => {
      console.log(` ${i + 1}. "${btn.text}" (class: ${btn.class})`);
    });

  } catch (error: any) {
    console.error('Error:', error.message);
  } finally {
    await browser.close();
  }
}

main().catch(console.error);
52
backend/archive/check-treez-data.ts
Normal file
@@ -0,0 +1,52 @@
import { pool } from './src/db/migrate.js';

async function main() {
  try {
    // Count total products and unique brands
    const stats = await pool.query(`
      SELECT
        COUNT(*) as total_products,
        COUNT(DISTINCT brand) as unique_brands
      FROM products
      WHERE dispensary_id = 149
    `);

    console.log('Stats:', stats.rows[0]);

    // Get sample products to verify brand extraction
    const samples = await pool.query(`
      SELECT brand, name, variant, dutchie_url
      FROM products
      WHERE dispensary_id = 149
      ORDER BY RANDOM()
      LIMIT 10
    `);

    console.log('\nSample products:');
    samples.rows.forEach(row => {
      console.log(`Brand: "${row.brand}" | Name: "${row.name}" | Variant: "${row.variant}"`);
    });

    // Get brand distribution
    const brands = await pool.query(`
      SELECT brand, COUNT(*) as count
      FROM products
      WHERE dispensary_id = 149
      GROUP BY brand
      ORDER BY count DESC
      LIMIT 15
    `);

    console.log('\nTop brands:');
    brands.rows.forEach(row => {
      console.log(`${row.brand}: ${row.count} products`);
    });

  } catch (error: any) {
    console.error('Error:', error.message);
  } finally {
    await pool.end();
  }
}

main().catch(console.error);
70
backend/archive/check-treez-pagination.ts
Normal file
@@ -0,0 +1,70 @@
import { chromium } from 'playwright';

const GOOGLE_UA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

async function main() {
  console.log('Checking BEST Dispensary Treez menu for pagination...');

  const browser = await chromium.launch({ headless: false });
  const context = await browser.newContext({ userAgent: GOOGLE_UA });
  const page = await context.newPage();

  try {
    console.log('Loading menu page...');
    await page.goto('https://best.treez.io/onlinemenu/?customerType=ADULT', {
      waitUntil: 'networkidle',
      timeout: 30000
    });

    await page.waitForTimeout(3000);

    // Check initial count
    const initialItems = await page.locator('.menu-item').all();
    console.log(`Initial menu items found: ${initialItems.length}`);

    // Check for pagination controls
    const paginationButtons = await page.locator('button:has-text("Next"), button:has-text("Load More"), .pagination, [class*="page"], [class*="Pagination"]').all();
    console.log(`Pagination controls found: ${paginationButtons.length}`);

    // Check page height and scroll
    const scrollHeight = await page.evaluate(() => document.body.scrollHeight);
    console.log(`Page scroll height: ${scrollHeight}px`);

    // Try scrolling to bottom
    console.log('Scrolling to bottom...');
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await page.waitForTimeout(2000);

    // Check if more items loaded after scroll
    const afterScrollItems = await page.locator('.menu-item').all();
    console.log(`Menu items after scroll: ${afterScrollItems.length}`);

    // Check for categories/filters
    const categories = await page.locator('[class*="category"], [class*="filter"], nav a, .nav-link').all();
    console.log(`Category/filter links found: ${categories.length}`);

    if (categories.length > 0) {
      console.log('\nCategory links:');
      for (let i = 0; i < Math.min(categories.length, 10); i++) {
        const text = await categories[i].textContent();
        const href = await categories[i].getAttribute('href');
        console.log(` - ${text?.trim()} (${href})`);
      }
    }

    // Take a screenshot
    await page.screenshot({ path: '/tmp/treez-menu-check.png', fullPage: true });
    console.log('\nScreenshot saved to /tmp/treez-menu-check.png');

    // Get page HTML to analyze structure
    const html = await page.content();
    console.log(`\nPage HTML length: ${html.length} characters`);

  } catch (error: any) {
    console.error('Error:', error.message);
  } finally {
    await browser.close();
  }
}

main().catch(console.error);
215
backend/archive/complete-schema-migration.ts
Normal file
@@ -0,0 +1,215 @@
|
||||
import { pool } from './src/db/migrate';
|
||||
|
||||
async function migrateCompleteSchema() {
|
||||
console.log('🔧 Migrating to complete normalized schema...\n');
|
||||
|
||||
const client = await pool.connect();
|
||||
|
||||
try {
|
||||
await client.query('BEGIN');
|
||||
|
||||
// Step 1: Add brand_id to products table
|
||||
console.log('1. Adding brand_id column to products...');
|
||||
await client.query(`
|
||||
ALTER TABLE products
|
||||
ADD COLUMN IF NOT EXISTS brand_id INTEGER REFERENCES brands(id) ON DELETE SET NULL
|
||||
`);
|
||||
|
||||
// Step 2: Ensure brands.name has UNIQUE constraint
|
||||
console.log('2. Ensuring brands.name has UNIQUE constraint...');
|
||||
await client.query(`
|
||||
ALTER TABLE brands DROP CONSTRAINT IF EXISTS brands_name_key;
|
||||
ALTER TABLE brands ADD CONSTRAINT brands_name_key UNIQUE (name);
|
||||
`);
|
||||
|
||||
// Step 3: Migrate existing brand text to brands table and update FKs
|
||||
console.log('3. Migrating existing brand data...');
|
||||
await client.query(`
|
||||
-- Insert unique brands from products into brands table
|
||||
INSERT INTO brands (name)
|
||||
SELECT DISTINCT brand
|
||||
FROM products
|
||||
WHERE brand IS NOT NULL AND brand != ''
|
||||
ON CONFLICT (name) DO NOTHING
|
||||
`);
|
||||
|
||||
// Update products to use brand_id
|
||||
await client.query(`
|
||||
UPDATE products p
|
||||
SET brand_id = b.id
|
||||
FROM brands b
|
||||
WHERE p.brand = b.name
|
||||
AND p.brand IS NOT NULL
|
||||
AND p.brand != ''
|
||||
`);
|
||||
|
||||
// Step 4: Create product_brands junction table for historical tracking
|
||||
console.log('3. Creating product_brands tracking table...');
|
||||
await client.query(`
|
||||
CREATE TABLE IF NOT EXISTS product_brands (
|
||||
id SERIAL PRIMARY KEY,
|
||||
product_id INTEGER REFERENCES products(id) ON DELETE CASCADE,
|
||||
brand_id INTEGER REFERENCES brands(id) ON DELETE CASCADE,
|
||||
first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
UNIQUE(product_id, brand_id)
|
||||
)
|
||||
`);
|
||||
|
||||
// Populate product_brands from current data
|
||||
await client.query(`
|
||||
INSERT INTO product_brands (product_id, brand_id)
|
||||
SELECT id, brand_id
|
||||
FROM products
|
||||
WHERE brand_id IS NOT NULL
|
||||
ON CONFLICT (product_id, brand_id) DO NOTHING
|
||||
`);
|
||||
|
||||
// Step 5: Add store contact information and address fields
|
||||
console.log('4. Adding store contact info and address fields...');
|
||||
await client.query(`
|
||||
ALTER TABLE stores
|
||||
ADD COLUMN IF NOT EXISTS address TEXT,
|
||||
ADD COLUMN IF NOT EXISTS city VARCHAR(255),
|
||||
ADD COLUMN IF NOT EXISTS state VARCHAR(50),
|
||||
ADD COLUMN IF NOT EXISTS zip VARCHAR(20),
|
||||
ADD COLUMN IF NOT EXISTS phone VARCHAR(50),
|
||||
ADD COLUMN IF NOT EXISTS website TEXT,
|
||||
ADD COLUMN IF NOT EXISTS email VARCHAR(255)
|
||||
`);
|
||||
|
||||
// Step 6: Add product discount tracking
|
||||
console.log('5. Adding product discount fields...');
|
||||
await client.query(`
|
||||
ALTER TABLE products
|
||||
ADD COLUMN IF NOT EXISTS discount_percentage DECIMAL(5, 2),
|
||||
ADD COLUMN IF NOT EXISTS discount_amount DECIMAL(10, 2),
|
||||
ADD COLUMN IF NOT EXISTS sale_price DECIMAL(10, 2)
|
||||
`);
|
||||
|
||||
// Step 7: Add missing timestamp columns
|
||||
console.log('6. Ensuring all timestamp columns exist...');
|
||||
|
||||
// Products timestamps
|
||||
await client.query(`
|
||||
ALTER TABLE products
|
||||
ADD COLUMN IF NOT EXISTS first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
ADD COLUMN IF NOT EXISTS last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
||||
`);
|
||||
|
||||
// Categories timestamps
|
||||
await client.query(`
|
||||
ALTER TABLE categories
|
||||
ADD COLUMN IF NOT EXISTS first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
ADD COLUMN IF NOT EXISTS last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
||||
`);
|
||||
|
||||
// Step 8: Add indexes for reporting queries
|
||||
console.log('7. Creating indexes for fast reporting...');
|
||||
|
||||
await client.query(`
|
||||
-- Brand exposure queries (which stores carry which brands)
|
||||
CREATE INDEX IF NOT EXISTS idx_store_brands_brand_active ON store_brands(brand_id, active);
|
||||
CREATE INDEX IF NOT EXISTS idx_store_brands_store_active ON store_brands(store_id, active);
|
||||
CREATE INDEX IF NOT EXISTS idx_store_brands_dates ON store_brands(first_seen_at, last_seen_at);
|
||||
|
||||
-- Product queries by store and brand
|
||||
CREATE INDEX IF NOT EXISTS idx_products_store_brand ON products(store_id, brand_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_products_brand_stock ON products(brand_id, in_stock);
|
||||
CREATE INDEX IF NOT EXISTS idx_products_dates ON products(first_seen_at, last_seen_at);
|
||||
|
||||
-- Category queries
|
||||
CREATE INDEX IF NOT EXISTS idx_categories_store ON categories(store_id, scrape_enabled);
|
||||
|
||||
-- Specials queries
|
||||
CREATE INDEX IF NOT EXISTS idx_products_specials ON products(store_id, is_special) WHERE is_special = true;
|
||||
`);
|
||||
|
||||
// Step 9: Create helper views for common queries
|
||||
console.log('8. Creating reporting views...');
|
||||
|
||||
// Brand exposure view
|
||||
await client.query(`
|
||||
CREATE OR REPLACE VIEW brand_exposure AS
|
||||
SELECT
|
||||
b.id as brand_id,
|
||||
b.name as brand_name,
|
||||
COUNT(DISTINCT sb.store_id) as store_count,
|
||||
COUNT(DISTINCT CASE WHEN sb.active THEN sb.store_id END) as active_store_count,
|
||||
MIN(sb.first_seen_at) as first_seen,
|
||||
MAX(sb.last_seen_at) as last_seen
|
||||
FROM brands b
|
||||
LEFT JOIN store_brands sb ON b.id = sb.brand_id
|
||||
GROUP BY b.id, b.name
|
||||
ORDER BY active_store_count DESC, brand_name
|
||||
`);
|
||||
|
||||
// Brand timeline view (track adds/drops)
|
||||
await client.query(`
|
||||
CREATE OR REPLACE VIEW brand_timeline AS
|
||||
SELECT
|
||||
sb.id,
|
||||
b.name as brand_name,
|
||||
s.name as store_name,
|
||||
sb.first_seen_at as added_on,
|
||||
CASE
|
||||
WHEN sb.active THEN NULL
|
||||
ELSE sb.last_seen_at
|
||||
END as dropped_on,
|
||||
sb.active as currently_active
|
||||
FROM store_brands sb
|
||||
JOIN brands b ON sb.brand_id = b.id
|
||||
JOIN stores s ON sb.store_id = s.id
|
||||
ORDER BY sb.first_seen_at DESC
|
||||
`);
|
||||
|
||||
// Product inventory view
|
||||
await client.query(`
|
||||
CREATE OR REPLACE VIEW product_inventory AS
|
||||
SELECT
|
||||
p.id,
|
||||
p.name as product_name,
|
||||
b.name as brand_name,
|
||||
s.name as store_name,
|
||||
c.name as category_name,
|
||||
p.price,
|
||||
p.in_stock,
|
||||
p.is_special,
|
||||
p.first_seen_at,
|
||||
p.last_seen_at
|
||||
FROM products p
|
||||
JOIN stores s ON p.store_id = s.id
|
||||
LEFT JOIN brands b ON p.brand_id = b.id
|
||||
LEFT JOIN categories c ON p.category_id = c.id
|
||||
ORDER BY p.last_seen_at DESC
|
||||
`);
|
||||
|
||||
await client.query('COMMIT');
|
||||
|
||||
console.log('\n✅ Schema migration complete!');
|
||||
console.log('\n📊 Available reporting views:');
|
||||
console.log(' - brand_exposure: See how many stores carry each brand');
|
||||
console.log(' - brand_timeline: Track when brands were added/dropped');
|
||||
console.log(' - product_inventory: Full product catalog with store/brand info');
|
||||
|
||||
console.log('\n💡 Example queries:');
|
||||
console.log(' -- Brands by exposure:');
|
||||
console.log(' SELECT * FROM brand_exposure ORDER BY active_store_count DESC;');
|
||||
console.log(' ');
|
||||
console.log(' -- Recently dropped brands:');
|
||||
console.log(' SELECT * FROM brand_timeline WHERE dropped_on IS NOT NULL ORDER BY dropped_on DESC;');
|
||||
console.log(' ');
|
||||
console.log(' -- Products by brand:');
|
||||
console.log(' SELECT * FROM product_inventory WHERE brand_name = \'Sol Flower\';');
|
||||
|
||||
} catch (error) {
|
||||
await client.query('ROLLBACK');
|
||||
console.error('❌ Migration failed:', error);
|
||||
throw error;
|
||||
} finally {
|
||||
client.release();
|
||||
await pool.end();
|
||||
}
|
||||
}
|
||||
|
||||
migrateCompleteSchema();
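Review note: the brand backfill in the script above runs as two statements, an `INSERT ... SELECT DISTINCT ... ON CONFLICT (name) DO NOTHING` followed by an `UPDATE ... FROM brands`. The sketch below models that behavior in memory to show why re-running it is safe. It is only an illustration: `ProductRow`, `BrandRow` and `backfillBrands` are hypothetical names that do not exist in the repository, and the id assignment is simplified compared with a real Postgres `SERIAL` sequence.

```typescript
// Hypothetical in-memory model of the brand backfill; not part of the migration script.
interface ProductRow {
  id: number;
  brand: string | null;
  brand_id: number | null;
}

interface BrandRow {
  id: number;
  name: string;
}

function backfillBrands(products: ProductRow[], brands: BrandRow[]): void {
  // Mirrors: INSERT INTO brands (name) SELECT DISTINCT brand ... ON CONFLICT (name) DO NOTHING
  const byName = new Map<string, BrandRow>();
  for (const b of brands) byName.set(b.name, b);
  // Simplification: a real SERIAL sequence does not necessarily continue from max(id).
  let nextId = brands.reduce((max, b) => Math.max(max, b.id), 0) + 1;

  for (const p of products) {
    const name = p.brand;
    if (name === null || name === '') continue; // WHERE brand IS NOT NULL AND brand != ''
    if (!byName.has(name)) {
      const row: BrandRow = { id: nextId++, name };
      brands.push(row);
      byName.set(name, row);
    }
  }

  // Mirrors: UPDATE products p SET brand_id = b.id FROM brands b WHERE p.brand = b.name
  for (const p of products) {
    const name = p.brand;
    if (name === null || name === '') continue;
    const match = byName.get(name);
    if (match) p.brand_id = match.id;
  }
}
```

Calling `backfillBrands` twice on the same arrays leaves them unchanged after the first call, which is the property the `ON CONFLICT ... DO NOTHING` clause gives the SQL version.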
|
||||
13
backend/archive/count-products.ts
Normal file
@@ -0,0 +1,13 @@
|
||||
import { pool } from './src/db/migrate.js';
|
||||
|
||||
async function countProducts() {
|
||||
const result = await pool.query(
|
||||
`SELECT COUNT(*) as total FROM products WHERE dispensary_id = 112`
|
||||
);
|
||||
|
||||
console.log(`Total products for Deeply Rooted: ${result.rows[0].total}`);
|
||||
|
||||
await pool.end();
|
||||
}
|
||||
|
||||
countProducts();
|
||||
32
backend/archive/create-azdhs-table.ts
Normal file
@@ -0,0 +1,32 @@
|
||||
import { pool } from './src/db/migrate';
|
||||
|
||||
async function createAZDHSTable() {
|
||||
console.log('🗄️ Creating azdhs_list table...\n');
|
||||
|
||||
await pool.query(`
|
||||
CREATE TABLE IF NOT EXISTS azdhs_list (
|
||||
id SERIAL PRIMARY KEY,
|
||||
name VARCHAR(255) NOT NULL,
|
||||
company_name VARCHAR(255),
|
||||
slug VARCHAR(255),
|
||||
address VARCHAR(500),
|
||||
city VARCHAR(100),
|
||||
state VARCHAR(2) DEFAULT 'AZ',
|
||||
zip VARCHAR(10),
|
||||
phone VARCHAR(20),
|
||||
email VARCHAR(255),
|
||||
status_line TEXT,
|
||||
azdhs_url TEXT,
|
||||
latitude DECIMAL(10, 8),
|
||||
longitude DECIMAL(11, 8),
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
||||
);
|
||||
`);
|
||||
|
||||
console.log('✅ Table created successfully!');
|
||||
|
||||
await pool.end();
|
||||
}
|
||||
|
||||
createAZDHSTable();
|
||||
81
backend/archive/create-brands-table.ts
Normal file
@@ -0,0 +1,81 @@
|
||||
import { Pool } from 'pg';
|
||||
|
||||
const pool = new Pool({
|
||||
connectionString: 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'
|
||||
});
|
||||
|
||||
async function createBrandsTable() {
|
||||
try {
|
||||
console.log('Creating brands table...');
|
||||
|
||||
await pool.query(`
|
||||
CREATE TABLE IF NOT EXISTS brands (
|
||||
id SERIAL PRIMARY KEY,
|
||||
store_id INTEGER NOT NULL REFERENCES stores(id) ON DELETE CASCADE,
|
||||
name VARCHAR(255) NOT NULL,
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
||||
)
|
||||
`);
|
||||
|
||||
console.log('✅ Brands table created');
|
||||
|
||||
// Create index for faster queries
|
||||
await pool.query(`
|
||||
CREATE INDEX IF NOT EXISTS idx_brands_store_id ON brands(store_id)
|
||||
`);
|
||||
|
||||
console.log('✅ Index created on store_id');
|
||||
|
||||
// Create unique constraint
|
||||
await pool.query(`
|
||||
CREATE UNIQUE INDEX IF NOT EXISTS idx_brands_store_name ON brands(store_id, name)
|
||||
`);
|
||||
|
||||
console.log('✅ Unique constraint created on (store_id, name)');
|
||||
|
||||
console.log('\nCreating specials table...');
|
||||
|
||||
await pool.query(`
|
||||
CREATE TABLE IF NOT EXISTS specials (
|
||||
id SERIAL PRIMARY KEY,
|
||||
store_id INTEGER NOT NULL REFERENCES stores(id) ON DELETE CASCADE,
|
||||
product_id INTEGER REFERENCES products(id) ON DELETE CASCADE,
|
||||
name VARCHAR(255) NOT NULL,
|
||||
description TEXT,
|
||||
discount_amount NUMERIC(10, 2),
|
||||
discount_percentage NUMERIC(5, 2),
|
||||
special_price NUMERIC(10, 2),
|
||||
original_price NUMERIC(10, 2),
|
||||
valid_date DATE NOT NULL,
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
||||
)
|
||||
`);
|
||||
|
||||
console.log('✅ Specials table created');
|
||||
|
||||
// Create composite index for fast date-based queries
|
||||
await pool.query(`
|
||||
CREATE INDEX IF NOT EXISTS idx_specials_store_date ON specials(store_id, valid_date DESC)
|
||||
`);
|
||||
|
||||
console.log('✅ Index created on (store_id, valid_date)');
|
||||
|
||||
// Create index on product_id for joins
|
||||
await pool.query(`
|
||||
CREATE INDEX IF NOT EXISTS idx_specials_product_id ON specials(product_id)
|
||||
`);
|
||||
|
||||
console.log('✅ Index created on product_id');
|
||||
|
||||
console.log('\n🎉 All tables and indexes created successfully!');
|
||||
|
||||
} catch (error: any) {
|
||||
console.error('❌ Error:', error.message);
|
||||
} finally {
|
||||
await pool.end();
|
||||
}
|
||||
}
|
||||
|
||||
createBrandsTable();
|
||||
61
backend/archive/create-brands-tables.ts
Normal file
@@ -0,0 +1,61 @@
|
||||
import { pool } from './src/db/migrate';
|
||||
|
||||
async function createBrandsTables() {
|
||||
console.log('📦 Creating brands tracking tables...\n');
|
||||
|
||||
try {
|
||||
// Brands table - stores unique brands across all stores
|
||||
await pool.query(`
|
||||
CREATE TABLE IF NOT EXISTS brands (
|
||||
id SERIAL PRIMARY KEY,
|
||||
name VARCHAR(255) UNIQUE NOT NULL,
|
||||
logo_url TEXT,
|
||||
first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
||||
)
|
||||
`);
|
||||
console.log('✅ Created brands table');
|
||||
|
||||
// Store-Brand relationship - tracks which brands are at which stores
|
||||
await pool.query(`
|
||||
CREATE TABLE IF NOT EXISTS store_brands (
|
||||
id SERIAL PRIMARY KEY,
|
||||
store_id INTEGER REFERENCES stores(id) ON DELETE CASCADE,
|
||||
brand_id INTEGER REFERENCES brands(id) ON DELETE CASCADE,
|
||||
first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
active BOOLEAN DEFAULT true,
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
UNIQUE(store_id, brand_id)
|
||||
)
|
||||
`);
|
||||
console.log('✅ Created store_brands table');
|
||||
|
||||
// Add indexes for performance
|
||||
await pool.query(`
|
||||
CREATE INDEX IF NOT EXISTS idx_store_brands_store_id ON store_brands(store_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_store_brands_brand_id ON store_brands(brand_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_store_brands_active ON store_brands(active);
|
||||
CREATE INDEX IF NOT EXISTS idx_brands_name ON brands(name);
|
||||
`);
|
||||
console.log('✅ Created indexes');
|
||||
|
||||
console.log('\n✅ Brands tables created successfully!');
|
||||
console.log('\nTable structure:');
|
||||
console.log(' brands: Stores unique brand names and logos');
|
||||
console.log(' store_brands: Tracks which brands are at which stores with timestamps');
|
||||
console.log('\nReports you can run:');
|
||||
console.log(' - Brand exposure: How many stores carry each brand');
|
||||
console.log(' - Brand timeline: When brands were added/removed from stores');
|
||||
console.log(' - Store changes: Which brands were added/dropped at a store');
|
||||
} catch (error) {
|
||||
console.error('❌ Error:', error);
|
||||
} finally {
|
||||
await pool.end();
|
||||
}
|
||||
}
|
||||
|
||||
createBrandsTables();
|
||||
38
backend/archive/create-table.js
Normal file
@@ -0,0 +1,38 @@
|
||||
const { Pool } = require('pg');
|
||||
|
||||
const pool = new Pool({
|
||||
connectionString: process.env.DATABASE_URL
|
||||
});
|
||||
|
||||
(async () => {
|
||||
try {
|
||||
await pool.query(`
|
||||
CREATE TABLE IF NOT EXISTS proxy_test_jobs (
|
||||
id SERIAL PRIMARY KEY,
|
||||
status VARCHAR(20) NOT NULL DEFAULT 'pending',
|
||||
total_proxies INTEGER NOT NULL DEFAULT 0,
|
||||
tested_proxies INTEGER NOT NULL DEFAULT 0,
|
||||
passed_proxies INTEGER NOT NULL DEFAULT 0,
|
||||
failed_proxies INTEGER NOT NULL DEFAULT 0,
|
||||
started_at TIMESTAMP,
|
||||
completed_at TIMESTAMP,
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
||||
);
|
||||
`);
|
||||
|
||||
await pool.query(`
|
||||
CREATE INDEX IF NOT EXISTS idx_proxy_test_jobs_status ON proxy_test_jobs(status);
|
||||
`);
|
||||
|
||||
await pool.query(`
|
||||
CREATE INDEX IF NOT EXISTS idx_proxy_test_jobs_created_at ON proxy_test_jobs(created_at DESC);
|
||||
`);
|
||||
|
||||
console.log('✅ Table created successfully');
|
||||
process.exit(0);
|
||||
} catch (error) {
|
||||
console.error('❌ Error:', error);
|
||||
process.exit(1);
|
||||
}
|
||||
})();
|
||||
22
backend/archive/curaleaf-cookies.json
Normal file
@@ -0,0 +1,22 @@
|
||||
[
|
||||
{
|
||||
"name": "age_gate_passed",
|
||||
"value": "true",
|
||||
"domain": ".curaleaf.com",
|
||||
"path": "/",
|
||||
"expires": 9999999999,
|
||||
"httpOnly": false,
|
||||
"secure": false,
|
||||
"sameSite": "Lax"
|
||||
},
|
||||
{
|
||||
"name": "selected_state",
|
||||
"value": "Arizona",
|
||||
"domain": ".curaleaf.com",
|
||||
"path": "/",
|
||||
"expires": 9999999999,
|
||||
"httpOnly": false,
|
||||
"secure": false,
|
||||
"sameSite": "Lax"
|
||||
}
|
||||
]
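Review note: the two cookie records above differ only in `name` and `value`. A minimal sketch of generating them from a domain and a state name follows. `CookieRecord` and `buildAgeGateCookies` are hypothetical names introduced for illustration. The cookie names and the `expires` value are copied from the JSON file, and whether the site actually honors these cookies is not something this sketch can confirm.

```typescript
// Hypothetical helper; field names follow the JSON records in curaleaf-cookies.json.
interface CookieRecord {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number; // Unix timestamp in seconds
  httpOnly: boolean;
  secure: boolean;
  sameSite: 'Strict' | 'Lax' | 'None';
}

function buildAgeGateCookies(
  domain: string,
  stateName: string,
  expires: number = 9999999999
): CookieRecord[] {
  // Fields shared by both cookies in the archived file.
  const base = {
    domain,
    path: '/',
    expires,
    httpOnly: false,
    secure: false,
    sameSite: 'Lax' as const,
  };
  return [
    { name: 'age_gate_passed', value: 'true', ...base },
    { name: 'selected_state', value: stateName, ...base },
  ];
}
```

`JSON.stringify(buildAgeGateCookies('.curaleaf.com', 'Arizona'), null, 2)` would produce records with the same fields as the archived file, though the key order differs.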
|
||||
83
backend/archive/debug-after-state-select.ts
Normal file
@@ -0,0 +1,83 @@
|
||||
import puppeteer from 'puppeteer-extra';
|
||||
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
|
||||
import { Browser, Page } from 'puppeteer';
|
||||
|
||||
puppeteer.use(StealthPlugin());
|
||||
|
||||
async function debugAfterStateSelect() {
|
||||
let browser: Browser | null = null;
|
||||
|
||||
try {
|
||||
const url = 'https://curaleaf.com/stores/curaleaf-az-48th-street';
|
||||
|
||||
browser = await puppeteer.launch({
|
||||
headless: 'new',
|
||||
args: [
|
||||
'--no-sandbox',
|
||||
'--disable-setuid-sandbox',
|
||||
'--disable-dev-shm-usage',
|
||||
'--disable-blink-features=AutomationControlled'
|
||||
]
|
||||
});
|
||||
|
||||
const page = await browser.newPage();
|
||||
await page.setViewport({ width: 1920, height: 1080 });
|
||||
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
|
||||
|
||||
console.log('Loading page...');
|
||||
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
|
||||
await page.waitForTimeout(5000);
|
||||
|
||||
// Click dropdown and select Arizona
|
||||
const stateButton = await page.$('button#state');
|
||||
if (stateButton) {
|
||||
console.log('Clicking state button...');
|
||||
await stateButton.click();
|
||||
await page.waitForTimeout(800);
|
||||
|
||||
console.log('Clicking Arizona...');
|
||||
await page.evaluate(() => {
|
||||
const options = Array.from(document.querySelectorAll('[role="option"]'));
|
||||
const arizona = options.find(el => el.textContent?.toLowerCase() === 'arizona');
|
||||
if (arizona instanceof HTMLElement) {
|
||||
arizona.click();
|
||||
}
|
||||
});
|
||||
|
||||
await page.waitForTimeout(1000);
|
||||
|
||||
console.log('\n=== AFTER selecting Arizona ===');
|
||||
|
||||
// Check what buttons are now visible
|
||||
const elementsAfter = await page.evaluate(() => {
|
||||
return {
|
||||
buttons: Array.from(document.querySelectorAll('button')).map(b => ({
|
||||
text: b.textContent?.trim(),
|
||||
classes: b.className,
|
||||
id: b.id,
|
||||
visible: b.offsetParent !== null
|
||||
})),
|
||||
links: Array.from(document.querySelectorAll('a')).filter(a => a.offsetParent !== null).map(a => ({
|
||||
text: a.textContent?.trim(),
|
||||
href: a.href
|
||||
})),
|
||||
hasAgeQuestion: document.body.textContent?.includes('21') || document.body.textContent?.includes('age')
|
||||
};
|
||||
});
|
||||
|
||||
console.log('\nVisible buttons:', JSON.stringify(elementsAfter.buttons.filter(b => b.visible), null, 2));
|
||||
console.log('\nVisible links:', JSON.stringify(elementsAfter.links, null, 2));
|
||||
console.log('\nHas age question:', elementsAfter.hasAgeQuestion);
|
||||
}
|
||||
|
||||
await browser.close();
|
||||
process.exit(0);
|
||||
|
||||
} catch (error) {
|
||||
console.error('Error:', error);
|
||||
if (browser) await browser.close();
|
||||
process.exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
debugAfterStateSelect();
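Review note: the option lookup in the script above compares `el.textContent?.toLowerCase()` to `'arizona'` exactly, so an option whose text carries surrounding whitespace or a line break would not match. A minimal sketch of a more tolerant comparison follows. `matchesState` is a hypothetical helper, not something the repository defines, and it still requires the whole label to equal the state name so that container elements are not matched by accident.

```typescript
// Hypothetical helper: compares an option label to a state name,
// ignoring case and collapsing/trimming whitespace on both sides.
function matchesState(optionText: string | null | undefined, state: string): boolean {
  if (!optionText) return false;
  const normalize = (s: string): string => s.replace(/\s+/g, ' ').trim().toLowerCase();
  return normalize(optionText) === normalize(state);
}
```

Because it is self-contained, the same function body could be inlined in the `page.evaluate` callback in place of the strict equality check.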
|
||||
96
backend/archive/debug-age-gate-detailed.ts
Normal file
@@ -0,0 +1,96 @@
|
||||
import puppeteer from 'puppeteer-extra';
|
||||
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
|
||||
import { Browser, Page } from 'puppeteer';
|
||||
|
||||
puppeteer.use(StealthPlugin());
|
||||
|
||||
async function debugDetailedAgeGate() {
|
||||
let browser: Browser | null = null;
|
||||
|
||||
try {
|
||||
const url = 'https://curaleaf.com/stores/curaleaf-az-48th-street';
|
||||
|
||||
browser = await puppeteer.launch({
|
||||
headless: false, // Run with visible browser
|
||||
args: [
|
||||
'--no-sandbox',
|
||||
'--disable-setuid-sandbox',
|
||||
'--disable-dev-shm-usage',
|
||||
'--disable-blink-features=AutomationControlled'
|
||||
]
|
||||
});
|
||||
|
||||
const page = await browser.newPage();
|
||||
await page.setViewport({ width: 1920, height: 1080 });
|
||||
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
|
||||
|
||||
console.log('Loading page...');
|
||||
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
|
||||
await page.waitForTimeout(5000);
|
||||
|
||||
console.log(`\nCurrent URL: ${page.url()}`);
|
||||
|
||||
// Check for dropdown button
|
||||
console.log('\nLooking for dropdown button #state...');
|
||||
const stateButton = await page.$('button#state');
|
||||
console.log('State button found:', !!stateButton);
|
||||
|
||||
if (stateButton) {
|
||||
console.log('Clicking state button...');
|
||||
await stateButton.click();
|
||||
await page.waitForTimeout(1000);
|
||||
|
||||
console.log('\nLooking for dropdown options after clicking...');
|
||||
const options = await page.evaluate(() => {
|
||||
// Look for any elements that appeared after clicking
|
||||
const allElements = Array.from(document.querySelectorAll('[role="option"], [class*="option"], [class*="Option"], li'));
|
||||
return allElements.slice(0, 20).map(el => ({
|
||||
text: el.textContent?.trim(),
|
||||
tag: el.tagName,
|
||||
role: el.getAttribute('role'),
|
||||
classes: el.className
|
||||
}));
|
||||
});
|
||||
|
||||
console.log('Found options:', JSON.stringify(options, null, 2));
|
||||
|
||||
// Try to click Arizona
|
||||
console.log('\nTrying to click Arizona option...');
|
||||
const clicked = await page.evaluate(() => {
|
||||
const allElements = Array.from(document.querySelectorAll('[role="option"], [class*="option"], [class*="Option"], li, div, span'));
|
||||
const arizonaEl = allElements.find(el => el.textContent?.toLowerCase().includes('arizona'));
|
||||
if (arizonaEl instanceof HTMLElement) {
|
||||
console.log('Found Arizona element:', arizonaEl.textContent);
|
||||
arizonaEl.click();
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
});
|
||||
|
||||
console.log('Arizona clicked:', clicked);
|
||||
|
||||
if (clicked) {
|
||||
console.log('Waiting for navigation...');
|
||||
try {
|
||||
await page.waitForNavigation({ timeout: 10000 });
|
||||
console.log('Navigation successful!');
|
||||
} catch (e) {
|
||||
console.log('Navigation timeout');
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
console.log(`\nFinal URL: ${page.url()}`);
|
||||
console.log('\nPress Ctrl+C to close browser...');
|
||||
|
||||
// Keep browser open for inspection
|
||||
await new Promise(() => {});
|
||||
|
||||
} catch (error) {
|
||||
console.error('Error:', error);
|
||||
if (browser) await browser.close();
|
||||
process.exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
debugDetailedAgeGate();
|
||||
78
backend/archive/debug-age-gate-elements.ts
Normal file
@@ -0,0 +1,78 @@
|
||||
import puppeteer from 'puppeteer-extra';
|
||||
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
|
||||
import { Browser, Page } from 'puppeteer';
|
||||
|
||||
puppeteer.use(StealthPlugin());
|
||||
|
||||
async function debugAgeGateElements() {
|
||||
let browser: Browser | null = null;
|
||||
|
||||
try {
|
||||
const url = 'https://curaleaf.com/stores/curaleaf-az-48th-street';
|
||||
|
||||
browser = await puppeteer.launch({
|
||||
headless: 'new',
|
||||
args: [
|
||||
'--no-sandbox',
|
||||
'--disable-setuid-sandbox',
|
||||
'--disable-dev-shm-usage',
|
||||
'--disable-blink-features=AutomationControlled'
|
||||
]
|
||||
});
|
||||
|
||||
const page = await browser.newPage();
|
||||
await page.setViewport({ width: 1920, height: 1080 });
|
||||
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
|
||||
|
||||
console.log('Loading page...');
|
||||
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
|
||||
await page.waitForTimeout(5000); // Wait for React to render
|
||||
|
||||
const elements = await page.evaluate(() => {
|
||||
const allClickable = Array.from(document.querySelectorAll('button, a, div[role="button"], [onclick]'));
|
||||
|
||||
return {
|
||||
buttons: Array.from(document.querySelectorAll('button')).map(b => ({
|
||||
text: b.textContent?.trim(),
|
||||
classes: b.className,
|
||||
id: b.id
|
||||
})),
|
||||
links: Array.from(document.querySelectorAll('a')).map(a => ({
|
||||
text: a.textContent?.trim(),
|
||||
href: a.href,
|
||||
classes: a.className
|
||||
})),
|
||||
divs: Array.from(document.querySelectorAll('div[role="button"], div[onclick], [class*="card"], [class*="Card"], [class*="state"], [class*="State"]')).slice(0, 20).map(d => ({
|
||||
text: d.textContent?.trim().substring(0, 100),
|
||||
classes: d.className,
|
||||
role: d.getAttribute('role')
|
||||
}))
|
||||
};
|
||||
});
|
||||
|
||||
console.log('\n=== BUTTONS ===');
|
||||
elements.buttons.forEach((b, i) => {
|
||||
console.log(`${i + 1}. "${b.text}" [${b.classes}] #${b.id}`);
|
||||
});
|
||||
|
||||
console.log('\n=== LINKS ===');
|
||||
elements.links.slice(0, 10).forEach((a, i) => {
|
||||
console.log(`${i + 1}. "${a.text}" -> ${a.href}`);
|
||||
});
|
||||
|
||||
console.log('\n=== DIVS/CARDS ===');
|
||||
elements.divs.forEach((d, i) => {
|
||||
console.log(`${i + 1}. "${d.text}" [${d.classes}] role=${d.role}`);
|
||||
});
|
||||
|
||||
await browser.close();
|
||||
process.exit(0);
|
||||
|
||||
} catch (error) {
|
||||
console.error('Error:', error);
|
||||
if (browser) await browser.close();
|
||||
process.exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
debugAgeGateElements();
|
||||
88
backend/archive/debug-azdhs-page.ts
Normal file
@@ -0,0 +1,88 @@
|
||||
import { chromium } from 'playwright-extra';
|
||||
import stealth from 'puppeteer-extra-plugin-stealth';
|
||||
import { pool } from './src/db/migrate';
|
||||
|
||||
chromium.use(stealth());
|
||||
|
||||
async function debugAZDHSPage() {
|
||||
console.log('🔍 Debugging AZDHS page structure...\n');
|
||||
|
||||
const browser = await chromium.launch({
|
||||
headless: false,
|
||||
});
|
||||
|
||||
const context = await browser.newContext({
|
||||
viewport: { width: 1920, height: 1080 },
|
||||
});
|
||||
|
||||
const page = await context.newPage();
|
||||
|
||||
try {
|
||||
console.log('📄 Loading page...');
|
||||
await page.goto('https://azcarecheck.azdhs.gov/s/?facilityId=001t000000L0TApAAN', {
|
||||
waitUntil: 'domcontentloaded',
|
||||
timeout: 60000
|
||||
});
|
||||
|
||||
console.log('⏳ Waiting 30 seconds for you to scroll and load all dispensaries...\n');
|
||||
await page.waitForTimeout(30000);
|
||||
|
||||
console.log('🔍 Analyzing page structure...\n');
|
||||
|
||||
const debug = await page.evaluate(() => {
|
||||
// Get all unique tag names
|
||||
const allElements = document.querySelectorAll('*');
|
||||
const tagCounts: any = {};
|
||||
const classSamples: string[] = [];
|
||||
|
||||
allElements.forEach(el => {
|
||||
const tag = el.tagName.toLowerCase();
|
||||
tagCounts[tag] = (tagCounts[tag] || 0) + 1;
|
||||
|
||||
// Sample some classes
|
||||
if (el.className && typeof el.className === 'string' && el.className.length > 0 && classSamples.length < 50) {
|
||||
classSamples.push(el.className.substring(0, 80));
|
||||
}
|
||||
});
|
||||
|
||||
// Look for elements with text that might be dispensary names
|
||||
const textElements: any[] = [];
|
||||
allElements.forEach(el => {
|
||||
const text = el.textContent?.trim() || '';
|
||||
if (text.length > 10 && text.length < 200 && el.children.length < 5) {
|
||||
textElements.push({
|
||||
tag: el.tagName.toLowerCase(),
|
||||
class: el.className ? el.className.substring(0, 50) : '',
|
||||
text: text.substring(0, 100)
|
||||
});
|
||||
}
|
||||
});
|
||||
|
||||
return {
|
||||
totalElements: allElements.length,
|
||||
tagCounts: Object.entries(tagCounts).sort((a: any, b: any) => b[1] - a[1]).slice(0, 20),
|
||||
classSamples: classSamples.slice(0, 20),
|
||||
textElementsSample: textElements.slice(0, 10)
|
||||
};
|
||||
});
|
||||
|
||||
console.log('📊 Page Structure Analysis:');
|
||||
console.log(`\nTotal elements: ${debug.totalElements}`);
|
||||
console.log('\nTop 20 element types:');
|
||||
console.table(debug.tagCounts);
|
||||
console.log('\nSample classes:');
|
||||
debug.classSamples.forEach((c: string, i: number) => console.log(` ${i + 1}. ${c}`));
|
||||
console.log('\nSample text elements (potential dispensary names):');
|
||||
console.table(debug.textElementsSample);
|
||||
|
||||
} catch (error) {
|
||||
console.error(`❌ Error: ${error}`);
|
||||
} finally {
|
||||
console.log('\n👉 Browser will stay open for 30 seconds so you can inspect...');
|
||||
await page.waitForTimeout(30000);
|
||||
await browser.close();
|
||||
await pool.end();
|
||||
}
|
||||
}
|
||||
|
||||
debugAZDHSPage();
|
||||
79
backend/archive/debug-brands-page.ts
Normal file
@@ -0,0 +1,79 @@
|
||||
import { firefox } from 'playwright';
|
||||
import { pool } from './src/db/migrate.js';
|
||||
import { getRandomProxy } from './src/utils/proxyManager.js';
|
||||
|
||||
const dispensaryId = 112;
|
||||
|
||||
async function main() {
|
||||
const dispensaryResult = await pool.query(
|
||||
"SELECT id, name, menu_url FROM dispensaries WHERE id = $1",
|
||||
[dispensaryId]
|
||||
);
|
||||
|
||||
const menuUrl = dispensaryResult.rows[0].menu_url;
|
||||
const proxy = await getRandomProxy();
|
||||
|
||||
const browser = await firefox.launch({ headless: true });
|
||||
const context = await browser.newContext({
|
||||
viewport: { width: 1920, height: 1080 },
|
||||
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
|
||||
proxy: {
|
||||
server: proxy.server,
|
||||
username: proxy.username,
|
||||
password: proxy.password
|
||||
}
|
||||
});
|
||||
|
||||
const page = await context.newPage();
|
||||
const brandsUrl = `${menuUrl}/brands`;
|
||||
|
||||
console.log(`Loading: ${brandsUrl}`);
|
||||
await page.goto(brandsUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
|
||||
await page.waitForSelector('a[href*="/brands/"]', { timeout: 45000 });
|
||||
await page.waitForTimeout(3000);
|
||||
|
||||
// Get the HTML structure of the first 10 brand links
|
||||
const brandStructures = await page.evaluate(() => {
|
||||
const brandLinks = Array.from(document.querySelectorAll('a[href*="/brands/"]')).slice(0, 10);
|
||||
|
||||
return brandLinks.map(link => {
|
||||
const href = link.getAttribute('href') || '';
|
||||
const slug = href.split('/brands/')[1]?.replace(/\/$/, '') || '';
|
||||
|
||||
return {
|
||||
slug,
|
||||
innerHTML: (link as HTMLElement).innerHTML.substring(0, 300),
|
||||
textContent: link.textContent?.trim(),
|
||||
childElementCount: link.childElementCount,
|
||||
children: Array.from(link.children).map(child => ({
|
||||
tag: child.tagName.toLowerCase(),
|
||||
class: child.className,
|
||||
text: child.textContent?.trim()
|
||||
}))
|
||||
};
|
||||
});
|
||||
});
|
||||
|
||||
console.log('\n' + '='.repeat(80));
|
||||
console.log('BRAND LINK STRUCTURES:');
|
||||
console.log('='.repeat(80));
|
||||
|
||||
brandStructures.forEach((brand, idx) => {
|
||||
console.log(`\n${idx + 1}. slug: ${brand.slug}`);
|
||||
console.log(` textContent: "${brand.textContent}"`);
|
||||
console.log(` childElementCount: ${brand.childElementCount}`);
|
||||
console.log(` children:`);
|
||||
brand.children.forEach((child, childIdx) => {
|
||||
console.log(` ${childIdx + 1}. <${child.tag}> class="${child.class}"`);
|
||||
console.log(` text: "${child.text}"`);
|
||||
});
|
||||
console.log(` innerHTML: ${brand.innerHTML.substring(0, 200)}`);
|
||||
});
|
||||
|
||||
console.log('\n' + '='.repeat(80));
|
||||
|
||||
await browser.close();
|
||||
await pool.end();
|
||||
}
|
||||
|
||||
main().catch(console.error);
|
||||
77
backend/archive/debug-curaleaf-buttons.ts
Normal file
@@ -0,0 +1,77 @@
import { chromium } from 'playwright';

async function debugCuraleafButtons() {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    viewport: { width: 1280, height: 720 }
  });

  const page = await context.newPage();

  await page.goto('https://curaleaf.com/stores/curaleaf-dispensary-48th-street');
  await page.waitForTimeout(2000);

  console.log('\n=== BEFORE STATE SELECTION ===\n');

  // Collect all visible buttons, button-role elements, and links
  const buttonsBefore = await page.locator('button, [role="button"], a').evaluateAll(elements => {
    return elements.map(el => ({
      tag: el.tagName,
      text: el.textContent?.trim().substring(0, 50),
      id: el.id,
      class: el.className,
      visible: (el as HTMLElement).offsetParent !== null
    })).filter(b => b.visible);
  });

  console.log('Buttons before state selection:');
  buttonsBefore.forEach((b, i) => console.log(`${i + 1}. ${b.tag} - "${b.text}" [id: ${b.id}]`));

  // Click state dropdown
  const stateButton = page.locator('button#state').first();
  await stateButton.click();
  await page.waitForTimeout(1000);

  // Click Arizona
  const arizona = page.locator('[role="option"]').filter({ hasText: /^Arizona$/i }).first();
  await arizona.click();
  await page.waitForTimeout(2000);

  console.log('\n=== AFTER STATE SELECTION ===\n');

  const buttonsAfter = await page.locator('button, [role="button"], a').evaluateAll(elements => {
    return elements.map(el => ({
      tag: el.tagName,
      text: el.textContent?.trim().substring(0, 50),
      id: el.id,
      class: el.className,
      type: el.getAttribute('type'),
      visible: (el as HTMLElement).offsetParent !== null
    })).filter(b => b.visible);
  });

  console.log('Buttons after state selection:');
  buttonsAfter.forEach((b, i) => console.log(`${i + 1}. ${b.tag} - "${b.text}" [id: ${b.id}] [type: ${b.type}]`));

  // Check for any form elements
  const forms = await page.locator('form').count();
  console.log(`\nForms on page: ${forms}`);

  if (forms > 0) {
    const formActions = await page.locator('form').evaluateAll(forms => {
      return forms.map(f => ({
        action: (f as HTMLFormElement).action,
        method: (f as HTMLFormElement).method
      }));
    });
    console.log('Form details:', formActions);
  }

  await page.screenshot({ path: '/tmp/curaleaf-debug-after-state.png', fullPage: true });
  console.log('\n📸 Screenshot: /tmp/curaleaf-debug-after-state.png');

  await browser.close();
}

debugCuraleafButtons();
103
backend/archive/debug-dutchie-detection.ts
Normal file
@@ -0,0 +1,103 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { bypassAgeGate, detectStateFromUrl } from '../src/utils/age-gate';
import { Browser, Page } from 'puppeteer';

puppeteer.use(StealthPlugin());

async function debugDutchieDetection() {
  let browser: Browser | null = null;

  try {
    const url = 'https://curaleaf.com/stores/curaleaf-az-48th-street';

    browser = await puppeteer.launch({
      headless: 'new',
      args: [
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-dev-shm-usage',
        '--disable-blink-features=AutomationControlled'
      ]
    });

    const page = await browser.newPage();
    await page.setViewport({ width: 1920, height: 1080 });
    await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');

    console.log('Loading page...');
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
    await page.waitForTimeout(3000);

    console.log('Bypassing age gate...');
    const state = detectStateFromUrl(url);
    await bypassAgeGate(page, state);

    console.log('\nWaiting 5 more seconds for Dutchie menu to load...');
    await page.waitForTimeout(5000);

    console.log('\nChecking for Dutchie markers...');

    const dutchieInfo = await page.evaluate(() => {
      // Check window.reactEnv
      const hasReactEnv = !!(window as any).reactEnv;
      const reactEnvDetails = hasReactEnv ? JSON.stringify((window as any).reactEnv, null, 2) : null;

      // Check HTML content
      const htmlContent = document.documentElement.innerHTML;
      const hasAdminDutchie = htmlContent.includes('admin.dutchie.com');
      const hasApiDutchie = htmlContent.includes('api.dutchie.com');
      const hasEmbeddedMenu = htmlContent.includes('embedded-menu');
      const hasReactEnvInHtml = htmlContent.includes('window.reactEnv');

      // Check for Dutchie-specific elements
      const hasProductListItems = document.querySelectorAll('[data-testid="product-list-item"]').length;
      const hasDutchieScript = !!document.querySelector('script[src*="dutchie"]');

      // Check meta tags
      const metaTags = Array.from(document.querySelectorAll('meta')).map(m => ({
        name: m.getAttribute('name'),
        content: m.getAttribute('content'),
        property: m.getAttribute('property')
      })).filter(m => m.name || m.property);

      return {
        hasReactEnv,
        reactEnvDetails,
        hasAdminDutchie,
        hasApiDutchie,
        hasEmbeddedMenu,
        hasReactEnvInHtml,
        hasProductListItems,
        hasDutchieScript,
        metaTags,
        pageTitle: document.title,
        url: window.location.href
      };
    });

    console.log('\n=== Dutchie Detection Results ===');
    console.log('window.reactEnv exists:', dutchieInfo.hasReactEnv);
    if (dutchieInfo.reactEnvDetails) {
      console.log('window.reactEnv contents:', dutchieInfo.reactEnvDetails);
    }
    console.log('Has admin.dutchie.com in HTML:', dutchieInfo.hasAdminDutchie);
    console.log('Has api.dutchie.com in HTML:', dutchieInfo.hasApiDutchie);
    console.log('Has "embedded-menu" in HTML:', dutchieInfo.hasEmbeddedMenu);
    console.log('Has "window.reactEnv" in HTML:', dutchieInfo.hasReactEnvInHtml);
    console.log('Product list items found:', dutchieInfo.hasProductListItems);
    console.log('Has Dutchie script tag:', dutchieInfo.hasDutchieScript);
    console.log('Page title:', dutchieInfo.pageTitle);
    console.log('Current URL:', dutchieInfo.url);

    await browser.close();
    process.exit(0);

  } catch (error) {
    console.error('Error:', error);
    if (browser) await browser.close();
    process.exit(1);
  }
}

debugDutchieDetection();
134
backend/archive/debug-dutchie-selectors.ts
Normal file
@@ -0,0 +1,134 @@
import { createStealthBrowser, createStealthContext, waitForPageLoad, isCloudflareChallenge, waitForCloudflareChallenge } from '../src/utils/stealthBrowser';
import { getRandomProxy } from '../src/utils/proxyManager';
import { pool } from '../src/db/migrate';
import * as fs from 'fs/promises';

async function debugDutchieSelectors() {
  console.log('🔍 Debugging Dutchie page structure...\n');

  const url = 'https://dutchie.com/dispensary/sol-flower-dispensary';

  // Get proxy
  const proxy = await getRandomProxy();
  console.log(`Using proxy: ${proxy?.server || 'none'}\n`);

  const browser = await createStealthBrowser({ proxy: proxy || undefined, headless: true });

  try {
    const context = await createStealthContext(browser, { state: 'Arizona' });
    const page = await context.newPage();

    console.log(`Loading: ${url}`);
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });

    // Check for Cloudflare
    if (await isCloudflareChallenge(page)) {
      console.log('🛡️ Cloudflare detected, waiting...');
      await waitForCloudflareChallenge(page, 60000);
    }

    await waitForPageLoad(page);

    // Wait for content
    await page.waitForTimeout(5000);

    console.log('\n📸 Taking screenshot...');
    await page.screenshot({ path: '/tmp/dutchie-page.png', fullPage: true });

    console.log('💾 Saving HTML...');
    const html = await page.content();
    await fs.writeFile('/tmp/dutchie-page.html', html);

    console.log('\n🔎 Looking for common React/product patterns...\n');

    // Try to find product containers by various methods
    const patterns = [
      // Product links and test data attributes
      'a[href*="/product/"]',
      '[data-testid*="product"]',
      '[data-cy*="product"]',
      '[data-test*="product"]',

      // Common class patterns
      '[class*="ProductCard"]',
      '[class*="product-card"]',
      '[class*="Product_"]',
      '[class*="MenuItem"]',
      '[class*="menu-item"]',

      // Semantic HTML
      'article',
      '[role="article"]',
      '[role="listitem"]',

      // Link patterns
      'a[href*="/menu/"]',
      'a[href*="/products/"]',
      'a[href*="/item/"]',
    ];

    for (const selector of patterns) {
      const count = await page.locator(selector).count();
      if (count > 0) {
        console.log(`✓ ${selector}: ${count} elements`);

        // Get details of first element
        try {
          const first = page.locator(selector).first();
          const html = await first.evaluate(el => el.outerHTML.substring(0, 500));
          const classes = await first.getAttribute('class');
          const testId = await first.getAttribute('data-testid');

          console.log(` Classes: ${classes || 'none'}`);
          console.log(` Data-testid: ${testId || 'none'}`);
          console.log(` HTML preview: ${html}...`);
          console.log('');
        } catch (e) {
          console.log(` (Could not get element details)`);
        }
      }
    }

    // Try to extract actual product links
    console.log('\n🔗 Looking for product links...\n');
    const links = await page.locator('a[href*="/product/"], a[href*="/menu/"], a[href*="/item/"]').all();

    if (links.length > 0) {
      console.log(`Found ${links.length} potential product links:`);
      for (let i = 0; i < Math.min(5, links.length); i++) {
        const href = await links[i].getAttribute('href');
        const text = await links[i].textContent();
        console.log(` ${i + 1}. ${href}`);
        console.log(` Text: ${text?.substring(0, 100)}`);
      }
    }

    // Check page title and URL
    console.log(`\n📄 Page title: ${await page.title()}`);
    console.log(`📍 Final URL: ${page.url()}`);

    // Try to find the main content container
    console.log('\n🎯 Looking for main content container...\n');
    const mainPatterns = ['main', '[role="main"]', '#root', '#app', '[id*="app"]'];
    for (const selector of mainPatterns) {
      const count = await page.locator(selector).count();
      if (count > 0) {
        console.log(`✓ ${selector}: found`);
        const classes = await page.locator(selector).first().getAttribute('class');
        console.log(` Classes: ${classes || 'none'}`);
      }
    }

    console.log('\n✅ Debug complete!');
    console.log('📸 Screenshot saved to: /tmp/dutchie-page.png');
    console.log('💾 HTML saved to: /tmp/dutchie-page.html');

  } catch (error) {
    console.error('❌ Error:', error);
  } finally {
    await browser.close();
    await pool.end();
  }
}

debugDutchieSelectors();
171
backend/archive/debug-google-scraper.ts
Normal file
@@ -0,0 +1,171 @@
import { chromium } from 'playwright';
import { pool } from '../src/db/migrate';
import { getRandomProxy } from '../src/utils/proxyManager';
import * as fs from 'fs';

async function debugGoogleScraper() {
  console.log('🔍 Debugging Google scraper with proxy\n');

  // Get a proxy
  const proxy = await getRandomProxy();
  if (!proxy) {
    console.log('❌ No proxies available');
    await pool.end();
    return;
  }

  console.log(`🔌 Using proxy: ${proxy.server}\n`);

  const browser = await chromium.launch({
    headless: false, // Run in visible mode
    args: ['--disable-blink-features=AutomationControlled']
  });

  const contextOptions: any = {
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    viewport: { width: 1920, height: 1080 },
    locale: 'en-US',
    timezoneId: 'America/Phoenix',
    geolocation: { latitude: 33.4484, longitude: -112.0740 },
    permissions: ['geolocation'],
    proxy: {
      server: proxy.server,
      username: proxy.username,
      password: proxy.password
    }
  };

  const context = await browser.newContext(contextOptions);

  // Add stealth
  await context.addInitScript(() => {
    Object.defineProperty(navigator, 'webdriver', { get: () => false });
    (window as any).chrome = { runtime: {} };
  });

  const page = await context.newPage();

  try {
    // Test with the "All Greens Dispensary" example
    const testAddress = '1035 W Main St, Quartzsite, AZ 85346';
    const searchQuery = `${testAddress} dispensary`;
    const searchUrl = `https://www.google.com/search?q=${encodeURIComponent(searchQuery)}`;

    console.log(`🔍 Testing search: ${searchQuery}`);
    console.log(`📍 URL: ${searchUrl}\n`);

    await page.goto(searchUrl, { waitUntil: 'networkidle', timeout: 30000 });
    await page.waitForTimeout(3000);

    // Take screenshot
    await page.screenshot({ path: '/tmp/google-search-debug.png', fullPage: true });
    console.log('📸 Screenshot saved to /tmp/google-search-debug.png\n');

    // Get the full HTML
    const html = await page.content();
    fs.writeFileSync('/tmp/google-search-debug.html', html);
    console.log('💾 HTML saved to /tmp/google-search-debug.html\n');

    // Try to find any text that looks like "All Greens"
    const pageText = await page.evaluate(() => document.body.innerText);
    const hasAllGreens = pageText.toLowerCase().includes('all greens');
    console.log(`🔍 Page contains "All Greens": ${hasAllGreens}\n`);

    if (hasAllGreens) {
      console.log('✅ Google found the business!\n');

      // Let's try to find where the name appears in the DOM
      const nameInfo = await page.evaluate(() => {
        const results: any[] = [];
        const walker = document.createTreeWalker(
          document.body,
          NodeFilter.SHOW_TEXT,
          null
        );

        let node;
        while (node = walker.nextNode()) {
          const text = node.textContent?.trim() || '';
          if (text.toLowerCase().includes('all greens')) {
            const element = node.parentElement;
            results.push({
              text: text,
              tagName: element?.tagName,
              className: element?.className,
              id: element?.id,
              dataAttrs: Array.from(element?.attributes || [])
                .filter(attr => attr.name.startsWith('data-'))
                .map(attr => `${attr.name}="${attr.value}"`)
            });
          }
        }
        return results;
      });

      console.log('📍 Found "All Greens" in these elements:');
      console.log(JSON.stringify(nameInfo, null, 2));
    }

    // Try current selectors
    console.log('\n🧪 Testing current selectors:\n');

    const nameSelectors = [
      '[data-attrid="title"]',
      'h2[data-attrid="title"]',
      '.SPZz6b h2',
      'h3.LC20lb',
      '.kp-header .SPZz6b'
    ];

    for (const selector of nameSelectors) {
      const element = await page.$(selector);
      if (element) {
        const text = await element.textContent();
        console.log(`✅ ${selector}: "${text?.trim()}"`);
      } else {
        console.log(`❌ ${selector}: not found`);
      }
    }

    // Look for website links
    console.log('\n🔗 Looking for website links:\n');
    const links = await page.evaluate(() => {
      const allLinks = Array.from(document.querySelectorAll('a[href]'));
      return allLinks
        .filter(a => {
          const href = (a as HTMLAnchorElement).href;
          return href &&
            !href.includes('google.com') &&
            !href.includes('youtube.com') &&
            !href.includes('facebook.com');
        })
        .slice(0, 10)
        .map(a => ({
          href: (a as HTMLAnchorElement).href,
          text: a.textContent?.trim().substring(0, 50),
          className: a.className
        }));
    });

    console.log('First 10 non-Google links:');
    console.log(JSON.stringify(links, null, 2));

    // Look for phone numbers
    console.log('\n📞 Looking for phone numbers:\n');
    const phoneMatches = pageText.match(/\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/g);
    if (phoneMatches) {
      console.log('Found phone numbers:', phoneMatches);
    } else {
      console.log('No phone numbers found in page text');
    }

    console.log('\n⏸️ Browser will stay open for 30 seconds for manual inspection...');
    await page.waitForTimeout(30000);

  } finally {
    await browser.close();
    await pool.end();
  }
}

debugGoogleScraper().catch(console.error);
56
backend/archive/debug-product-card.ts
Normal file
@@ -0,0 +1,56 @@
import { firefox } from 'playwright';
import { getRandomProxy } from '../src/utils/proxyManager.js';

async function debugProductCard() {
  const proxy = await getRandomProxy();
  if (!proxy) {
    console.log('No proxy available');
    process.exit(1);
  }

  const browser = await firefox.launch({ headless: true });
  const context = await browser.newContext({
    viewport: { width: 1920, height: 1080 },
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    proxy: {
      server: proxy.server,
      username: proxy.username,
      password: proxy.password
    }
  });

  const page = await context.newPage();

  const brandUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/brands/alien-labs';
  console.log(`Loading: ${brandUrl}`);

  await page.goto(brandUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
  await page.waitForTimeout(3000);

  // Get the first product card's href, HTML, and text (truncated)
  const cardData = await page.evaluate(() => {
    const card = document.querySelector('a[href*="/product/"]');
    if (!card) return null;

    return {
      href: card.getAttribute('href'),
      innerHTML: card.innerHTML.substring(0, 2000),
      textContent: card.textContent?.substring(0, 1000)
    };
  });

  console.log('\n' + '='.repeat(80));
  console.log('FIRST PRODUCT CARD DATA:');
  console.log('='.repeat(80));
  console.log('\nHREF:', cardData?.href);
  console.log('\nTEXT CONTENT:');
  console.log(cardData?.textContent);
  console.log('\nHTML (first 2000 chars):');
  console.log(cardData?.innerHTML);
  console.log('='.repeat(80));

  await browser.close();
  process.exit(0);
}

debugProductCard().catch(console.error);
97
backend/archive/debug-products-page.ts
Normal file
@@ -0,0 +1,97 @@
import { firefox } from 'playwright';
import { getRandomProxy } from '../src/utils/proxyManager.js';
import { pool } from '../src/db/migrate.js';

async function debugPage() {
  const proxy = await getRandomProxy();
  if (!proxy) {
    console.log('No proxy available');
    process.exit(1);
  }

  const browser = await firefox.launch({
    headless: true,
    firefoxUserPrefs: { 'geo.enabled': true }
  });

  const context = await browser.newContext({
    viewport: { width: 1920, height: 1080 },
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    geolocation: { latitude: 33.4484, longitude: -112.0740 },
    permissions: ['geolocation'],
    proxy: {
      server: proxy.server,
      username: proxy.username,
      password: proxy.password
    }
  });

  const page = await context.newPage();

  try {
    console.log('Loading page...');
    await page.goto('https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/products/', {
      waitUntil: 'domcontentloaded',
      timeout: 60000
    });

    await page.waitForTimeout(5000);

    // Take screenshot
    await page.screenshot({ path: '/tmp/products-page.png' });
    console.log('Screenshot saved to /tmp/products-page.png');

    // Get HTML sample
    const html = await page.content();
    console.log('\n=== PAGE TITLE ===');
    console.log(await page.title());

    console.log('\n=== SEARCHING FOR PRODUCT ELEMENTS ===');

    // Try different selectors
    const tests = [
      'a[href*="/product/"]',
      '[class*="Product"]',
      '[class*="product"]',
      '[class*="card"]',
      '[class*="Card"]',
      '[data-testid*="product"]',
      'article',
      '[role="article"]',
    ];

    for (const selector of tests) {
      const count = await page.locator(selector).count();
      console.log(`${selector.padEnd(35)} → ${count} elements`);
    }

    // Get the first 10 links with "/product/" in the href
    console.log('\n=== ALL LINKS WITH "product" IN HREF ===');
    const productLinks = await page.evaluate(() => {
      return Array.from(document.querySelectorAll('a'))
        .filter(a => a.href.includes('/product/'))
        .map(a => ({
          href: a.href,
          text: a.textContent?.trim().substring(0, 100),
          classes: a.className
        }))
        .slice(0, 10);
    });
    console.table(productLinks);

    // Get sample HTML of body
    console.log('\n=== SAMPLE HTML (first 2000 chars) ===');
    const bodyHtml = await page.evaluate(() => document.body.innerHTML);
    console.log(bodyHtml.substring(0, 2000));

    await browser.close();
    await pool.end();
  } catch (error) {
    console.error('Error:', error);
    await browser.close();
    await pool.end();
    process.exit(1);
  }
}

debugPage();
113
backend/archive/debug-sale-price.ts
Normal file
@@ -0,0 +1,113 @@
import { firefox } from 'playwright';
import { getRandomProxy } from '../src/utils/proxyManager.js';

async function main() {
  const productUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/product/easy-tiger-live-rosin-aio-pete-s-peach';

  console.log('🔍 Investigating Product with Sale Price...\n');
  console.log(`URL: ${productUrl}\n`);

  const proxyConfig = await getRandomProxy();
  if (!proxyConfig) {
    throw new Error('No proxy available');
  }
  console.log(`🔐 Using proxy: ${proxyConfig.server}\n`);

  const browser = await firefox.launch({
    headless: true,
    proxy: proxyConfig
  });

  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0'
  });

  const page = await context.newPage();

  try {
    console.log('📄 Loading product page...');
    await page.goto(productUrl, {
      waitUntil: 'domcontentloaded',
      timeout: 60000
    });

    await page.waitForTimeout(3000);
    console.log('✅ Page loaded\n');

    // Get the full page HTML for inspection
    const html = await page.content();

    // Look for price-related elements
    const priceData = await page.evaluate(() => {
      // Try JSON-LD structured data
      const scripts = Array.from(document.querySelectorAll('script[type="application/ld+json"]'));
      let jsonLdData: any = null;

      for (const script of scripts) {
        try {
          const data = JSON.parse(script.textContent || '');
          if (data['@type'] === 'Product') {
            jsonLdData = data;
            break;
          }
        } catch (e) { /* ignore malformed JSON-LD blocks */ }
      }

      // Look for price elements in various ways
      const priceElements = Array.from(document.querySelectorAll('[class*="price"], [class*="Price"]'));
      const priceTexts = priceElements.map(el => ({
        className: el.className,
        textContent: el.textContent?.trim().substring(0, 100)
      }));

      // Get all text containing dollar signs
      const pageText = document.body.textContent || '';
      const priceMatches = pageText.match(/\$\d+\.?\d*/g);

      // Look for strikethrough prices (often used for original price when there's a sale)
      const strikethroughElements = Array.from(document.querySelectorAll('s, del, [style*="line-through"]'));
      const strikethroughPrices = strikethroughElements.map(el => el.textContent?.trim());

      // Look for elements with "sale", "special", "discount" in class names
      const saleElements = Array.from(document.querySelectorAll('[class*="sale"], [class*="Sale"], [class*="special"], [class*="Special"], [class*="discount"], [class*="Discount"]'));
      const saleTexts = saleElements.map(el => ({
        className: el.className,
        textContent: el.textContent?.trim().substring(0, 100)
      }));

      return {
        jsonLdData,
        priceElements: priceTexts.slice(0, 10),
        priceMatches: priceMatches?.slice(0, 20) || [],
        strikethroughPrices: strikethroughPrices.slice(0, 5),
        saleElements: saleTexts.slice(0, 10)
      };
    });

    console.log('💰 Price Data Found:');
    console.log(JSON.stringify(priceData, null, 2));

    // Take a screenshot for visual reference
    await page.screenshot({ path: '/tmp/sale-price-product.png', fullPage: true });
    console.log('\n📸 Screenshot saved to /tmp/sale-price-product.png');

    // Print the outer HTML of the first three price elements
    const priceHtmlSnippet = await page.evaluate(() => {
      const priceElements = Array.from(document.querySelectorAll('[class*="price"], [class*="Price"]'));
      if (priceElements.length > 0) {
        return priceElements.slice(0, 3).map(el => el.outerHTML).join('\n\n');
      }
      return 'No price elements found';
    });

    console.log('\n📝 Price HTML Snippet:');
    console.log(priceHtmlSnippet);

  } catch (error: any) {
    console.error('❌ Error:', error.message);
  } finally {
    await browser.close();
  }
}

main().catch(console.error);
91
backend/archive/debug-scrape.ts
Normal file
@@ -0,0 +1,91 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { Pool } from 'pg';
import fs from 'fs';

puppeteer.use(StealthPlugin());

const pool = new Pool({
  connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});

async function debug() {
  let browser;

  try {
    // Get proxy
    const proxyResult = await pool.query(`SELECT host, port, protocol FROM proxies ORDER BY RANDOM() LIMIT 1`);
    const proxy = proxyResult.rows[0];
    const proxyUrl = `${proxy.protocol}://${proxy.host}:${proxy.port}`;

    console.log('🔌 Proxy:', proxyUrl);

    browser = await puppeteer.launch({
      headless: true,
      args: ['--no-sandbox', '--disable-setuid-sandbox', `--proxy-server=${proxyUrl}`]
    });

    const page = await browser.newPage();

    // Set Googlebot UA
    await page.setUserAgent('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');

    // Log all requests being made
    page.on('request', request => {
      console.log('\n📤 REQUEST:', request.method(), request.url());
      console.log(' Headers:', JSON.stringify(request.headers(), null, 2));
    });

    // Log all responses
    page.on('response', response => {
      console.log('\n📥 RESPONSE:', response.status(), response.url());
      console.log(' Headers:', JSON.stringify(response.headers(), null, 2));
    });

    const url = 'https://curaleaf.com/stores/curaleaf-dispensary-phoenix-airport/brands';
    console.log('\n🌐 Going to:', url);

    await page.goto(url, { waitUntil: 'networkidle2', timeout: 60000 });
    await page.waitForTimeout(3000);

    // Get what the browser sees
    const pageData = await page.evaluate(() => ({
      title: document.title,
      url: window.location.href,
      userAgent: navigator.userAgent,
      bodyHTML: document.body.innerHTML,
      bodyText: document.body.innerText
    }));

    console.log('\n📄 PAGE DATA:');
    console.log('Title:', pageData.title);
    console.log('URL:', pageData.url);
    console.log('User Agent (browser sees):', pageData.userAgent);
    console.log('Body HTML length:', pageData.bodyHTML.length, 'chars');
    console.log('Body text length:', pageData.bodyText.length, 'chars');

    // Save HTML to file
    fs.writeFileSync('/tmp/page.html', pageData.bodyHTML);
    console.log('\n💾 Saved HTML to /tmp/page.html');

    // Save screenshot
    await page.screenshot({ path: '/tmp/screenshot.png', fullPage: true });
    console.log('📸 Saved screenshot to /tmp/screenshot.png');

    // Show first 500 chars of HTML
    console.log('\n📝 First 500 chars of HTML:');
    console.log(pageData.bodyHTML.substring(0, 500));

    // Show first 500 chars of text
    console.log('\n📝 First 500 chars of text:');
    console.log(pageData.bodyText.substring(0, 500));

  } catch (error: any) {
    console.error('❌ Error:', error.message);
  } finally {
    if (browser) await browser.close();
    await pool.end();
  }
}

debug();
68
backend/archive/debug-sol-flower.ts
Normal file
@@ -0,0 +1,68 @@
import { chromium } from 'playwright';

async function debugSolFlower() {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext();
  const page = await context.newPage();

  try {
    // Set age gate bypass cookies
    await context.addCookies([
      {
        name: 'age_verified',
        value: 'true',
        domain: '.dutchie.com',
        path: '/',
      },
      {
        name: 'initial_location',
        value: JSON.stringify({ state: 'Arizona' }),
        domain: '.dutchie.com',
        path: '/',
      },
    ]);

    console.log('🌐 Loading Sol Flower Sun City shop page...');
    await page.goto('https://dutchie.com/dispensary/sol-flower-dispensary/shop', {
      waitUntil: 'networkidle',
    });

    console.log('📸 Taking screenshot...');
    await page.screenshot({ path: '/tmp/sol-flower-shop.png', fullPage: true });

    // Try to find products with various selectors
    console.log('\n🔍 Looking for products with different selectors:');

    const selectors = [
      'a[href*="/product/"]',
      '[data-testid="product-card"]',
      '[data-testid="product"]',
      '.product-card',
      '.ProductCard',
      'article',
      '[role="article"]',
    ];

    for (const selector of selectors) {
      const count = await page.locator(selector).count();
      console.log(` ${selector}: ${count} elements`);
    }

    // Log the page title
    console.log('\n📄 Page title:', await page.title());

    // Check if there's any text indicating no products
    const bodyText = await page.locator('body').textContent();
    if (bodyText?.includes('No products') || bodyText?.includes('no items')) {
      console.log('⚠️ Page indicates no products available');
    }

    console.log('\n✅ Screenshot saved to /tmp/sol-flower-shop.png');
  } catch (error) {
    console.error('❌ Error:', error);
  } finally {
    await browser.close();
  }
}

debugSolFlower();
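Review note on debug-sol-flower.ts: the cookie setup hard-codes both the state and the cookie domain. A minimal sketch of a reusable builder is shown here. `buildAgeGateCookies` and `AgeGateCookie` are hypothetical names that do not exist in the repo, and the cookie names are simply the two used in the script; the source does not confirm that the site honors them.

```typescript
// Hypothetical helper (not in the repo): builds the same two cookies the
// debug script sets, parameterized by state and cookie domain.
interface AgeGateCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
}

function buildAgeGateCookies(state: string, domain: string = '.dutchie.com'): AgeGateCookie[] {
  return [
    { name: 'age_verified', value: 'true', domain, path: '/' },
    // The location cookie carries the state as a JSON string, as in the script.
    { name: 'initial_location', value: JSON.stringify({ state }), domain, path: '/' },
  ];
}
```

With a Playwright `BrowserContext`, the call would look like `await context.addCookies(buildAgeGateCookies('Arizona'))`, issued before `page.goto`.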
151
backend/archive/debug-specials-page.ts
Normal file
@@ -0,0 +1,151 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';

async function main() {
  const menuUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted';
  const specialsUrl = `${menuUrl}/specials`;

  console.log('🔍 Investigating Specials Page...\n');
  console.log(`URL: ${specialsUrl}\n`);

  const proxyConfig = await getRandomProxy();
  if (!proxyConfig) {
    throw new Error('No proxy available');
  }
  console.log(`🔐 Using proxy: ${proxyConfig.server}\n`);

  const browser = await firefox.launch({
    headless: true,
    proxy: proxyConfig
  });

  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0'
  });

  const page = await context.newPage();

  try {
    console.log('📄 Loading specials page...');
    await page.goto(specialsUrl, {
      waitUntil: 'domcontentloaded',
      timeout: 60000
    });

    await page.waitForTimeout(3000);

    // Scroll to load content
    for (let i = 0; i < 10; i++) {
      await page.evaluate(() => window.scrollBy(0, window.innerHeight));
      await page.waitForTimeout(1000);
    }

    console.log('✅ Page loaded\n');

    // Check for JSON-LD data
    const jsonLdData = await page.evaluate(() => {
      const scripts = Array.from(document.querySelectorAll('script[type="application/ld+json"]'));
      return scripts.map(script => {
        try {
          return JSON.parse(script.textContent || '');
        } catch (e) {
          return null;
        }
      }).filter(Boolean);
    });

    if (jsonLdData.length > 0) {
      console.log('📋 JSON-LD Data Found:');
      console.log(JSON.stringify(jsonLdData, null, 2));
      console.log('\n');
    }

    // Look for product cards (only the first 5 are returned)
    const productCards = await page.evaluate(() => {
      const cards = Array.from(document.querySelectorAll('a[href*="/product/"]'));
      return cards.slice(0, 5).map(card => ({
        href: card.getAttribute('href'),
        text: card.textContent?.trim().substring(0, 100)
      }));
    });

    console.log('🛍️ Product cards sampled (up to 5):', productCards.length);
    if (productCards.length > 0) {
      console.log('First 5 products:');
      productCards.forEach((card, idx) => {
        console.log(` ${idx + 1}. ${card.href}`);
        console.log(` Text: ${card.text}\n`);
      });
    }

    // Look for special indicators
    const specialData = await page.evaluate(() => {
      const pageText = document.body.textContent || '';

      // Look for common special-related keywords
      const hasDiscount = pageText.toLowerCase().includes('discount');
      const hasSale = pageText.toLowerCase().includes('sale');
      // Match "off" as a whole word so "coffee" or "office" do not count
      const hasOff = /\boff\b/i.test(pageText);
      const hasDeal = pageText.toLowerCase().includes('deal');
      const hasPromo = pageText.toLowerCase().includes('promo');

      // Look for percentage or dollar off indicators
      const percentMatches = pageText.match(/(\d+)%\s*off/gi);
      const dollarMatches = pageText.match(/\$(\d+)\s*off/gi);

      // Try to find any special tags or badges
      const badges = Array.from(document.querySelectorAll('[class*="badge"], [class*="tag"], [class*="special"], [class*="sale"], [class*="discount"]'));
      const badgeTexts = badges.map(b => b.textContent?.trim()).filter(Boolean).slice(0, 10);

      return {
        keywords: {
          hasDiscount,
          hasSale,
          hasOff,
          hasDeal,
          hasPromo
        },
        percentMatches: percentMatches || [],
        dollarMatches: dollarMatches || [],
        badgeTexts,
        totalBadges: badges.length
      };
    });

    console.log('\n🏷️ Special Indicators:');
    console.log(JSON.stringify(specialData, null, 2));

    // Get page title and any heading text
    const pageInfo = await page.evaluate(() => {
      const title = document.title;
      const h1 = document.querySelector('h1')?.textContent?.trim();
      const h2s = Array.from(document.querySelectorAll('h2')).map(h => h.textContent?.trim()).slice(0, 3);

      return { title, h1, h2s };
    });

    console.log('\n📰 Page Info:');
    console.log(JSON.stringify(pageInfo, null, 2));

    // Check if there are any price elements visible
    const priceInfo = await page.evaluate(() => {
      const pageText = document.body.textContent || '';
      const priceMatches = pageText.match(/\$(\d+\.?\d*)/g);

      return {
        pricesFound: priceMatches?.length || 0,
        samplePrices: priceMatches?.slice(0, 10) || []
      };
    });

    console.log('\n💰 Price Info:');
    console.log(JSON.stringify(priceInfo, null, 2));

  } catch (error: any) {
    console.error('❌ Error:', error.message);
  } finally {
    await browser.close();
  }
}

main().catch(console.error);
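Review note on debug-specials-page.ts: the script collects raw "N% off" and "$N off" strings from the page text. If structured values are wanted, a sketch along these lines could parse them into numbers. `parseDiscounts` and `Discount` are hypothetical names introduced for illustration, and the pattern is an assumption about how specials are worded, not something verified against live Dutchie pages.

```typescript
// Hypothetical helper: turns "20% off" / "$5 off" phrases into typed records.
interface Discount {
  kind: 'percent' | 'dollar';
  amount: number;
}

function parseDiscounts(text: string): Discount[] {
  const found: Discount[] = [];
  // Group 1 captures a percentage, group 2 a dollar amount. The trailing \b
  // keeps words such as "offer" from matching.
  const pattern = /(?:(\d+(?:\.\d+)?)\s*%|\$\s*(\d+(?:\.\d+)?))\s*off\b/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    if (match[1] !== undefined) {
      found.push({ kind: 'percent', amount: parseFloat(match[1]) });
    } else if (match[2] !== undefined) {
      found.push({ kind: 'dollar', amount: parseFloat(match[2]) });
    }
  }
  return found;
}
```

Inside `page.evaluate` the same function body could run against `document.body.textContent`, since it relies only on standard regular expressions.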
17
backend/archive/delete-dispensary-products.ts
Normal file
@@ -0,0 +1,17 @@
import { pool } from './src/db/migrate.js';

async function main() {
  try {
    const result = await pool.query(`
      DELETE FROM products WHERE dispensary_id = 149
    `);

    console.log(`Deleted ${result.rowCount} products from dispensary 149`);
  } catch (error: any) {
    console.error('Error:', error.message);
  } finally {
    await pool.end();
  }
}

main().catch(console.error);
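Review note on delete-dispensary-products.ts: the dispensary id is hard-coded in the SQL text. A possible variant, sketched here under the assumption that the id would be passed on the command line, validates the argument and uses a `$1` placeholder, as other scripts in this commit already do with `pool.query`. `buildDeleteProductsQuery` and `SqlQuery` are hypothetical names.

```typescript
// Hypothetical helper: validates a raw CLI argument and returns a
// parameterized DELETE instead of interpolating the id into the SQL text.
interface SqlQuery {
  text: string;
  values: number[];
}

function buildDeleteProductsQuery(rawId: string | undefined): SqlQuery {
  const dispensaryId = Number(rawId);
  // Rejects missing, empty, non-numeric, fractional, zero and negative input.
  if (!Number.isInteger(dispensaryId) || dispensaryId <= 0) {
    throw new Error(`Invalid dispensary id: ${String(rawId)}`);
  }
  return {
    text: 'DELETE FROM products WHERE dispensary_id = $1',
    values: [dispensaryId],
  };
}
```

In the script this would be used as `const q = buildDeleteProductsQuery(process.argv[2]); await pool.query(q.text, q.values);`, so a typo in the argument fails before any rows are touched.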
71
backend/archive/diagnose-curaleaf-page.ts
Normal file
@@ -0,0 +1,71 @@
import { chromium } from 'playwright';
import { bypassAgeGatePlaywright } from './src/utils/age-gate-playwright';

async function diagnoseCuraleafPage() {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    viewport: { width: 1280, height: 720 }
  });

  const page = await context.newPage();

  try {
    console.log('Loading Curaleaf page...');
    await page.goto('https://curaleaf.com/stores/curaleaf-dispensary-48th-street', {
      waitUntil: 'domcontentloaded'
    });
    await page.waitForTimeout(2000);

    console.log('Bypassing age gate...');
    const bypassed = await bypassAgeGatePlaywright(page, 'Arizona');

    if (!bypassed) {
      console.log('❌ Failed to bypass age gate');
      // The finally block below closes the browser.
      return;
    }

    console.log('✅ Age gate bypassed!');
    console.log(`Current URL: ${page.url()}`);

    await page.waitForTimeout(5000);

    // Check page title
    const title = await page.title();
    console.log(`\nPage title: ${title}`);

    // Check for "menu" or "shop" links
    const menuLinks = await page.locator('a:has-text("menu"), a:has-text("shop"), a:has-text("order")').count();
    console.log(`\nMenu/Shop links found: ${menuLinks}`);

    if (menuLinks > 0) {
      const links = await page.locator('a:has-text("menu"), a:has-text("shop"), a:has-text("order")').all();
      console.log('\nMenu links:');
      for (const link of links.slice(0, 5)) {
        const text = await link.textContent();
        const href = await link.getAttribute('href');
        console.log(` - ${text}: ${href}`);
      }
    }

    // Check body text
    const bodyText = await page.textContent('body') || '';
    console.log(`\nBody text length: ${bodyText.length} characters`);
    console.log(`Contains "menu": ${bodyText.toLowerCase().includes('menu')}`);
    console.log(`Contains "shop": ${bodyText.toLowerCase().includes('shop')}`);
    console.log(`Contains "product": ${bodyText.toLowerCase().includes('product')}`);
    console.log(`Contains "dutchie": ${bodyText.toLowerCase().includes('dutchie')}`);

    // Take screenshot
    await page.screenshot({ path: '/tmp/curaleaf-diagnosed.png', fullPage: true });
    console.log('\n📸 Screenshot saved: /tmp/curaleaf-diagnosed.png');

  } catch (error) {
    console.error('Error:', error);
  } finally {
    await browser.close();
  }
}

diagnoseCuraleafPage();
319
backend/archive/enrich-azdhs-from-google-maps.ts
Normal file
@@ -0,0 +1,319 @@
import { firefox } from 'playwright';
import { pool } from './src/db/migrate';
import { getRandomProxy } from './src/utils/proxyManager';

interface DispensaryEnrichment {
  id: number;
  azdhs_name: string;
  address: string;
  city: string;
  state: string;
  zip: string;
  dba_name?: string;
  website?: string;
  google_phone?: string;
  google_rating?: number;
  google_review_count?: number;
  confidence: 'high' | 'medium' | 'low';
  notes?: string;
}

async function enrichFromGoogleMaps() {
  console.log('🦊 Enriching AZDHS dispensaries from Google Maps using Firefox\n');

  // Get a proxy
  const proxy = await getRandomProxy();
  if (!proxy) {
    console.log('❌ No proxies available');
    await pool.end();
    return;
  }

  console.log(`🔌 Using proxy: ${proxy.server}\n`);

  const browser = await firefox.launch({
    headless: true,
    firefoxUserPrefs: {
      'geo.enabled': true,
      'geo.provider.use_corelocation': true,
      'geo.prompt.testing': true,
      'geo.prompt.testing.allow': true,
    }
  });

  const contextOptions: any = {
    viewport: { width: 1920, height: 1080 },
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
    geolocation: { latitude: 33.4484, longitude: -112.0740 }, // Phoenix, AZ
    permissions: ['geolocation'],
    proxy: {
      server: proxy.server,
      username: proxy.username,
      password: proxy.password
    }
  };

  const context = await browser.newContext(contextOptions);
  const page = await context.newPage();

  try {
    // Get all dispensaries that don't have website yet
    const result = await pool.query(`
      SELECT id, slug, name, address, city, state, zip, phone, website, dba_name
      FROM dispensaries
      WHERE website IS NULL OR website = ''
      ORDER BY id
      LIMIT 50
    `);

    const dispensaries = result.rows;
    console.log(`📋 Found ${dispensaries.length} dispensaries to enrich\n`);

    let changesCreated = 0;
    let failed = 0;
    let skipped = 0;

    for (const disp of dispensaries) {
      console.log(`\n🔍 Processing: ${disp.name}`);
      console.log(` Address: ${disp.address}, ${disp.city}, ${disp.state} ${disp.zip}`);

      try {
        // Search Google Maps with dispensary name + address for better results
        const searchQuery = `${disp.name} ${disp.address}, ${disp.city}, ${disp.state} ${disp.zip}`;
        const encodedQuery = encodeURIComponent(searchQuery);
        const url = `https://www.google.com/maps/search/${encodedQuery}`;

        console.log(` 📍 Searching Maps: ${searchQuery}`);
        await page.goto(url, {
          waitUntil: 'domcontentloaded',
          timeout: 30000
        });

        // Wait for results
        await page.waitForTimeout(3000);

        // Extract business data from the first result
        const businessData = await page.evaluate(() => {
          const data: any = {};

          // Try to find the place name from the side panel
          const nameSelectors = [
            'h1[class*="fontHeadline"]',
            'h1.DUwDvf',
            '[data-item-id*="name"] h1'
          ];

          for (const selector of nameSelectors) {
            const el = document.querySelector(selector);
            if (el?.textContent) {
              data.name = el.textContent.trim();
              break;
            }
          }

          // Try to find website
          const websiteSelectors = [
            'a[data-item-id="authority"]',
            'a[data-tooltip="Open website"]',
            'a[aria-label*="Website"]'
          ];

          for (const selector of websiteSelectors) {
            const el = document.querySelector(selector) as HTMLAnchorElement;
            if (el?.href && !el.href.includes('google.com')) {
              data.website = el.href;
              break;
            }
          }

          // Try to find phone
          const phoneSelectors = [
            'button[data-item-id*="phone"]',
            'button[aria-label*="Phone"]',
            '[data-tooltip*="Copy phone number"]'
          ];

          for (const selector of phoneSelectors) {
            const el = document.querySelector(selector);
            if (el?.textContent) {
              const phoneMatch = el.textContent.match(/\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/);
              if (phoneMatch) {
                data.phone = phoneMatch[0];
                break;
              }
            }
          }

          // Try to find rating
          const ratingEl = document.querySelector('[role="img"][aria-label*="stars"]');
          if (ratingEl) {
            const label = ratingEl.getAttribute('aria-label');
            const match = label?.match(/(\d+\.?\d*)\s*stars?/);
            if (match) {
              data.rating = parseFloat(match[1]);
            }
          }

          // Try to find review count
          const reviewEl = document.querySelector('[aria-label*="reviews"]');
          if (reviewEl) {
            const label = reviewEl.getAttribute('aria-label');
            const match = label?.match(/([\d,]+)\s*reviews?/);
            if (match) {
              // Strip thousands separators and parse as base 10
              data.reviewCount = parseInt(match[1].replace(/,/g, ''), 10);
            }
          }

          return data;
        });

        console.log(` Found data:`, businessData);

        // Determine confidence level
        let confidence: 'high' | 'medium' | 'low' = 'low';
        if (businessData.name && businessData.website && businessData.phone) {
          confidence = 'high';
        } else if (businessData.name && (businessData.website || businessData.phone)) {
          confidence = 'medium';
        }

        // Track if any changes were made for this dispensary
        let changesMadeForDispensary = 0;

        // Create change records for each field that has new data
        if (businessData.name && businessData.name !== disp.dba_name) {
          await pool.query(`
            INSERT INTO dispensary_changes (
              dispensary_id, field_name, old_value, new_value,
              confidence_score, source, change_notes
            ) VALUES ($1, $2, $3, $4, $5, $6, $7)
          `, [
            disp.id,
            'dba_name',
            disp.dba_name || null,
            businessData.name,
            confidence,
            'google_maps',
            `Found via Google Maps search for "${disp.name}"`
          ]);
          console.log(` 📝 Created change record for DBA name`);
          changesMadeForDispensary++;
        }

        if (businessData.website && businessData.website !== disp.website) {
          await pool.query(`
            INSERT INTO dispensary_changes (
              dispensary_id, field_name, old_value, new_value,
              confidence_score, source, change_notes, requires_recrawl
            ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
          `, [
            disp.id,
            'website',
            disp.website || null,
            businessData.website,
            confidence,
            'google_maps',
            `Found via Google Maps search for "${disp.name}"`,
            true
          ]);
          console.log(` 📝 Created change record for website (requires recrawl)`);
          changesMadeForDispensary++;
        }

        if (businessData.phone && businessData.phone !== disp.phone) {
          await pool.query(`
            INSERT INTO dispensary_changes (
              dispensary_id, field_name, old_value, new_value,
              confidence_score, source, change_notes
            ) VALUES ($1, $2, $3, $4, $5, $6, $7)
          `, [
            disp.id,
            'phone',
            disp.phone || null,
            businessData.phone,
            confidence,
            'google_maps',
            `Found via Google Maps search for "${disp.name}"`
          ]);
          console.log(` 📝 Created change record for phone`);
          changesMadeForDispensary++;
        }

        if (businessData.rating) {
          await pool.query(`
            INSERT INTO dispensary_changes (
              dispensary_id, field_name, old_value, new_value,
              confidence_score, source, change_notes
            ) VALUES ($1, $2, $3, $4, $5, $6, $7)
          `, [
            disp.id,
            'google_rating',
            null,
            businessData.rating.toString(),
            confidence,
            'google_maps',
            `Google rating from Maps search`
          ]);
          console.log(` 📝 Created change record for Google rating`);
          changesMadeForDispensary++;
        }

        if (businessData.reviewCount) {
          await pool.query(`
            INSERT INTO dispensary_changes (
              dispensary_id, field_name, old_value, new_value,
              confidence_score, source, change_notes
            ) VALUES ($1, $2, $3, $4, $5, $6, $7)
          `, [
            disp.id,
            'google_review_count',
            null,
            businessData.reviewCount.toString(),
            confidence,
            'google_maps',
            `Google review count from Maps search`
          ]);
          console.log(` 📝 Created change record for Google review count`);
          changesMadeForDispensary++;
        }

        if (changesMadeForDispensary > 0) {
          console.log(` ✅ Created ${changesMadeForDispensary} change record(s) for review (${confidence} confidence)`);
          changesCreated += changesMadeForDispensary;
        } else {
          console.log(` ⏭️ No new data found`);
          skipped++;
        }

      } catch (error) {
        console.log(` ❌ Error: ${error}`);
        failed++;
      }

      // Rate limiting - wait between requests
      await page.waitForTimeout(3000 + Math.random() * 2000);
    }

    console.log('\n' + '='.repeat(80));
    console.log(`\n📊 Summary:`);
    console.log(` 📝 Change records created: ${changesCreated}`);
    console.log(` ⏭️ Skipped (no new data): ${skipped}`);
    console.log(` ❌ Failed: ${failed}`);
    console.log(`\n💡 Visit the Change Approval page to review and approve these changes.`);

  } finally {
    await browser.close();
    await pool.end();
  }
}

async function main() {
  try {
    await enrichFromGoogleMaps();
  } catch (error) {
    console.error('Fatal error:', error);
    process.exit(1);
  }
}

main();
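Review note on enrich-azdhs-from-google-maps.ts: the script repeats nearly the same `INSERT INTO dispensary_changes` call five times. One way to collapse that, sketched here as an assumption rather than as the repo's approach, is to first build a list of change records in memory and then insert them in a single loop. All names below (`buildChangeRecords`, `scoreConfidence`, `ChangeRecord`, and the other types) are hypothetical; the confidence rule and the field names are copied from the script.

```typescript
type Confidence = 'high' | 'medium' | 'low';

interface BusinessData {
  name?: string;
  website?: string;
  phone?: string;
  rating?: number;
  reviewCount?: number;
}

interface ExistingDispensary {
  id: number;
  dba_name?: string | null;
  website?: string | null;
  phone?: string | null;
}

interface ChangeRecord {
  dispensaryId: number;
  fieldName: string;
  oldValue: string | null;
  newValue: string;
  confidence: Confidence;
  requiresRecrawl: boolean;
}

// Same rule as the script: name + website + phone is high,
// name plus one of the two is medium, anything else is low.
function scoreConfidence(data: BusinessData): Confidence {
  if (data.name && data.website && data.phone) return 'high';
  if (data.name && (data.website || data.phone)) return 'medium';
  return 'low';
}

function buildChangeRecords(disp: ExistingDispensary, data: BusinessData): ChangeRecord[] {
  const confidence = scoreConfidence(data);
  const records: ChangeRecord[] = [];

  // Records a change only when a new value exists and differs from the old one.
  const add = (
    fieldName: string,
    oldValue: string | null,
    newValue: string | undefined,
    requiresRecrawl: boolean = false
  ): void => {
    if (newValue && newValue !== oldValue) {
      records.push({ dispensaryId: disp.id, fieldName, oldValue, newValue, confidence, requiresRecrawl });
    }
  };

  add('dba_name', disp.dba_name || null, data.name);
  add('website', disp.website || null, data.website, true);
  add('phone', disp.phone || null, data.phone);
  // Ratings and review counts carry no old value, as in the script.
  add('google_rating', null, data.rating ? String(data.rating) : undefined);
  add('google_review_count', null, data.reviewCount ? String(data.reviewCount) : undefined);

  return records;
}
```

Each returned record maps onto one parameterized insert, so the database loop becomes a single `for` over `records`, and the comparison logic can be unit-tested without a browser or a database.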
284
backend/archive/enrich-dispensaries-from-google.ts
Normal file
@@ -0,0 +1,284 @@
import { chromium } from 'playwright';
import { pool } from './src/db/migrate';
import { getStateProxy, getRandomProxy } from './src/utils/proxyManager';

interface DispensaryEnrichment {
  id: number;
  azdhs_name: string;
  address: string;
  city: string;
  state: string;
  zip: string;
  dba_name?: string;
  website?: string;
  google_phone?: string;
  google_rating?: number;
  google_review_count?: number;
  confidence: 'high' | 'medium' | 'low';
  notes?: string;
}

async function enrichDispensariesFromGoogle() {
  console.log('🔍 Starting Google enrichment for AZDHS dispensaries\n');

  // Get an Arizona proxy if available, otherwise any proxy
  let proxy = await getStateProxy('Arizona');
  if (!proxy) {
    console.log('⚠️ No Arizona proxy available, trying any available proxy...');
    proxy = await getRandomProxy();
  }

  if (!proxy) {
    console.log('❌ No proxies available. Please add proxies to the database.');
    await pool.end();
    return;
  }

  console.log(`🔌 Using proxy: ${proxy.server}\n`);

  const browser = await chromium.launch({
    headless: true,
    args: [
      '--disable-blink-features=AutomationControlled',
    ]
  });

  const contextOptions: any = {
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    viewport: { width: 1920, height: 1080 },
    locale: 'en-US',
    timezoneId: 'America/Phoenix',
    geolocation: { latitude: 33.4484, longitude: -112.0740 }, // Phoenix, AZ
    permissions: ['geolocation'],
    proxy: {
      server: proxy.server,
      username: proxy.username,
      password: proxy.password
    }
  };

  const context = await browser.newContext(contextOptions);

  // Add stealth techniques
  await context.addInitScript(() => {
    // Remove webdriver flag
    Object.defineProperty(navigator, 'webdriver', { get: () => false });

    // Chrome runtime
    (window as any).chrome = { runtime: {} };

    // Permissions
    const originalQuery = window.navigator.permissions.query;
    window.navigator.permissions.query = (parameters: any) => (
      parameters.name === 'notifications' ?
        Promise.resolve({ state: Notification.permission } as PermissionStatus) :
        originalQuery(parameters)
    );
  });

  const page = await context.newPage();

  try {
    // Get all dispensaries that don't have website yet
    const result = await pool.query(`
      SELECT id, name, address, city, state, zip, phone
      FROM azdhs_list
      WHERE website IS NULL OR website = ''
      ORDER BY id
      LIMIT 2
    `);

    const dispensaries = result.rows;
    console.log(`📋 Found ${dispensaries.length} dispensaries to enrich\n`);

    let enriched = 0;
    let failed = 0;
    const needsReview: DispensaryEnrichment[] = [];

    for (const disp of dispensaries) {
      console.log(`\n🔍 Processing: ${disp.name}`);
      console.log(` Address: ${disp.address}, ${disp.city}, ${disp.state} ${disp.zip}`);

      try {
        // Search Google for the address + dispensary
        const searchQuery = `${disp.address}, ${disp.city}, ${disp.state} ${disp.zip} dispensary`;
        const searchUrl = `https://www.google.com/search?q=${encodeURIComponent(searchQuery)}`;

        console.log(` Searching: ${searchQuery}`);
        await page.goto(searchUrl, { waitUntil: 'domcontentloaded', timeout: 15000 });
        await page.waitForTimeout(2000);

        // Try to extract Google Business info
        const businessData = await page.evaluate(() => {
          const data: any = {};

          // Try to find business name
          const nameSelectors = [
            '[data-attrid="title"]',
            'h2[data-attrid="title"]',
            '.SPZz6b h2',
            'h3.LC20lb',
            '.kp-header .SPZz6b'
          ];

          for (const selector of nameSelectors) {
            const el = document.querySelector(selector);
            if (el?.textContent) {
              data.name = el.textContent.trim();
              break;
            }
          }

          // Try to find website
          const websiteSelectors = [
            'a[data-dtype="d3ph"]',
            '.yuRUbf a',
            'a.ab_button[href^="http"]'
          ];

          for (const selector of websiteSelectors) {
            const el = document.querySelector(selector) as HTMLAnchorElement;
            if (el?.href && !el.href.includes('google.com')) {
              data.website = el.href;
              break;
            }
          }

          // Try to find phone
          const phoneSelectors = [
            '[data-dtype="d3ph"]',
            'span[data-dtype="d3ph"]',
            '.LrzXr.zdqRlf'
          ];

          for (const selector of phoneSelectors) {
            const el = document.querySelector(selector);
            if (el?.textContent && /\d{3}.*\d{3}.*\d{4}/.test(el.textContent)) {
              data.phone = el.textContent.trim();
              break;
            }
          }

          // Try to find rating
          const ratingEl = document.querySelector('.Aq14fc');
          if (ratingEl?.textContent) {
            const match = ratingEl.textContent.match(/(\d+\.?\d*)/);
            if (match) data.rating = parseFloat(match[1]);
          }

          // Try to find review count (allow thousands separators such as "1,234")
          const reviewEl = document.querySelector('.hqzQac span');
          if (reviewEl?.textContent) {
            const match = reviewEl.textContent.match(/([\d,]+)/);
            if (match) data.reviewCount = parseInt(match[1].replace(/,/g, ''), 10);
          }

          return data;
        });

        console.log(` Found data:`, businessData);

        // Determine confidence level
        let confidence: 'high' | 'medium' | 'low' = 'low';
        if (businessData.name && businessData.website && businessData.phone) {
          confidence = 'high';
        } else if (businessData.name && (businessData.website || businessData.phone)) {
          confidence = 'medium';
        }

        const enrichment: DispensaryEnrichment = {
          id: disp.id,
          azdhs_name: disp.name,
          address: disp.address,
          city: disp.city,
          state: disp.state,
          zip: disp.zip,
          dba_name: businessData.name,
          website: businessData.website,
          google_phone: businessData.phone,
          google_rating: businessData.rating,
          google_review_count: businessData.reviewCount,
          confidence
        };

        if (confidence === 'high') {
          // Auto-update high confidence matches
          await pool.query(`
            UPDATE azdhs_list
            SET
              dba_name = $1,
              website = $2,
              google_rating = $3,
              google_review_count = $4,
              updated_at = CURRENT_TIMESTAMP
            WHERE id = $5
          `, [
            businessData.name,
            businessData.website,
            businessData.rating,
            businessData.reviewCount,
            disp.id
          ]);

          console.log(` ✅ Updated (high confidence)`);
          enriched++;
        } else {
          // Flag for manual review
          needsReview.push(enrichment);
          console.log(` ⚠️ Needs review (${confidence} confidence)`);
        }

      } catch (error) {
        console.log(` ❌ Error: ${error}`);
        failed++;
      }

      // Rate limiting - wait between requests
      await page.waitForTimeout(3000 + Math.random() * 2000);
    }

    console.log('\n' + '='.repeat(80));
    console.log(`\n📊 Summary:`);
    console.log(` ✅ Enriched: ${enriched}`);
    console.log(` ⚠️ Needs review: ${needsReview.length}`);
    console.log(` ❌ Failed: ${failed}`);

    if (needsReview.length > 0) {
      console.log('\n📋 Dispensaries needing manual review:\n');
      console.table(needsReview.map(d => ({
        ID: d.id,
        'AZDHS Name': d.azdhs_name.substring(0, 30),
        'Google Name': d.dba_name?.substring(0, 30) || '-',
        Website: d.website ? 'Yes' : 'No',
        Phone: d.google_phone ? 'Yes' : 'No',
        Confidence: d.confidence
      })));
    }

  } finally {
    await browser.close();
    await pool.end();
  }
}

// Add missing columns if they don't exist
async function setupDatabase() {
  await pool.query(`
    ALTER TABLE azdhs_list
    ADD COLUMN IF NOT EXISTS dba_name VARCHAR(255),
    ADD COLUMN IF NOT EXISTS google_rating DECIMAL(2,1),
    ADD COLUMN IF NOT EXISTS google_review_count INTEGER
  `);
}

async function main() {
  try {
    await setupDatabase();
    await enrichDispensariesFromGoogle();
  } catch (error) {
    console.error('Fatal error:', error);
    process.exit(1);
  }
}

main();
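Review note on enrich-dispensaries-from-google.ts: review counts scraped from search results are plain text, and larger counts are commonly written with thousands separators. A small parser, sketched here with the hypothetical name `parseReviewCount`, makes that handling explicit and testable outside the browser. The input formats in the comments are assumptions about what the page might contain, not captured output.

```typescript
// Hypothetical helper: extracts the first integer from text such as
// "1,234 Google reviews" or "(87)", tolerating thousands separators.
function parseReviewCount(text: string): number | undefined {
  // First alternative: digit groups joined by commas. Second: a plain digit run.
  const match = text.match(/(\d{1,3}(?:,\d{3})+|\d+)/);
  if (!match) return undefined;
  return parseInt(match[1].replace(/,/g, ''), 10);
}
```

Returning `undefined` when no digits are found keeps "no data" distinct from a real count of zero.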
218
backend/archive/enrich-prices.ts
Normal file
@@ -0,0 +1,218 @@
import { firefox } from 'playwright';
import { pool } from './src/db/migrate.js';
import { getRandomProxy } from './src/utils/proxyManager.js';

const workerNum = process.argv[2] || `P${Date.now().toString().slice(-4)}`;
const dispensaryId = parseInt(process.argv[3] || '112', 10);
const batchSize = 10; // Process 10 products per batch

interface Product {
  id: number;
  slug: string;
  name: string;
  brand: string;
  dutchie_url: string;
}

async function getProductsNeedingPrices(limit: number): Promise<Product[]> {
  const result = await pool.query(`
    SELECT id, slug, name, brand, dutchie_url
    FROM products
    WHERE dispensary_id = $1
      AND regular_price IS NULL
      AND dutchie_url IS NOT NULL
    ORDER BY id
    LIMIT $2
  `, [dispensaryId, limit]);

  return result.rows;
}

async function extractPriceFromPage(page: any, productUrl: string): Promise<{
  regularPrice?: number;
  salePrice?: number;
}> {
  try {
    console.log(`[${workerNum}] Loading: ${productUrl}`);

    await page.goto(productUrl, {
      waitUntil: 'domcontentloaded',
      timeout: 30000
    });

    await page.waitForTimeout(2000);

    // Extract price data from the page
    const priceData = await page.evaluate(() => {
      // Try JSON-LD structured data first
      const scripts = Array.from(document.querySelectorAll('script[type="application/ld+json"]'));

      for (const script of scripts) {
        try {
          const data = JSON.parse(script.textContent || '');
          if (data['@type'] === 'Product' && data.offers) {
            return {
              regularPrice: parseFloat(data.offers.price) || undefined,
              salePrice: undefined
            };
          }
        } catch (e) {
          // Continue to next script
        }
      }

      // Fallback: extract from page text
      const pageText = document.body.textContent || '';

      // Look for price patterns like $30.00, $40.00
      const priceMatches = pageText.match(/\$(\d+\.?\d*)/g);

      if (priceMatches && priceMatches.length > 0) {
        const prices = priceMatches.map(p => parseFloat(p.replace('$', '')));

        // If we find multiple prices, treat the lower of the first two as the sale price and the higher as the regular price
        if (prices.length >= 2) {
          return {
            salePrice: Math.min(prices[0], prices[1]),
            regularPrice: Math.max(prices[0], prices[1])
          };
        } else if (prices.length === 1) {
          return {
            regularPrice: prices[0],
            salePrice: undefined
          };
        }
      }

      return { regularPrice: undefined, salePrice: undefined };
    });

    return priceData;

  } catch (error: any) {
    console.log(`[${workerNum}] ⚠️ Error loading page: ${error.message}`);
    return { regularPrice: undefined, salePrice: undefined };
  }
}

async function updateProductPrice(
  productId: number,
  regularPrice?: number,
  salePrice?: number
): Promise<void> {
  await pool.query(`
    UPDATE products
    SET regular_price = $1,
        sale_price = $2,
        updated_at = CURRENT_TIMESTAMP
    WHERE id = $3
  `, [regularPrice || null, salePrice || null, productId]);
}

async function main() {
  console.log(`\n${'='.repeat(70)}`);
  console.log(`💰 PRICE ENRICHMENT WORKER - ${workerNum}`);
  console.log(` Dispensary ID: ${dispensaryId}`);
  console.log(` Batch Size: ${batchSize} products`);
  console.log(`${'='.repeat(70)}\n`);

  // Get dispensary info
  const dispensaryResult = await pool.query(
    "SELECT id, name, menu_url FROM dispensaries WHERE id = $1",
    [dispensaryId]
  );

  if (dispensaryResult.rows.length === 0) {
    console.error(`[${workerNum}] ❌ Dispensary ID ${dispensaryId} not found`);
    process.exit(1);
  }

  console.log(`[${workerNum}] ✅ Dispensary: ${dispensaryResult.rows[0].name}\n`);

  // Get proxy
  const proxy = await getRandomProxy();
  if (!proxy) {
    console.log(`[${workerNum}] ❌ No proxy available`);
    process.exit(1);
  }

  console.log(`[${workerNum}] 🔐 Using proxy: ${proxy.server}\n`);

  // Launch browser
  const browser = await firefox.launch({ headless: true });

  const context = await browser.newContext({
    viewport: { width: 1920, height: 1080 },
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    proxy: {
      server: proxy.server,
      username: proxy.username,
      password: proxy.password
    }
  });

  const page = await context.newPage();

  let totalProcessed = 0;
  let totalWithPrices = 0;
  let totalNoPrices = 0;
  let batchNum = 0;

  // Keep processing batches
  while (true) {
    const products = await getProductsNeedingPrices(batchSize);

    if (products.length === 0) {
      console.log(`[${workerNum}] ℹ️ No more products need price enrichment`);
      break;
    }

    batchNum++;
    console.log(`[${workerNum}] ${'─'.repeat(70)}`);
    console.log(`[${workerNum}] 📦 BATCH #${batchNum}: Processing ${products.length} products`);
    console.log(`[${workerNum}] ${'─'.repeat(70)}\n`);

    for (let i = 0; i < products.length; i++) {
      const product = products[i];

      console.log(`[${workerNum}] [${i + 1}/${products.length}] ${product.brand} - ${product.name.substring(0, 40)}`);

      const { regularPrice, salePrice } = await extractPriceFromPage(page, product.dutchie_url);

      await updateProductPrice(product.id, regularPrice, salePrice);

      totalProcessed++;

      if (regularPrice || salePrice) {
        totalWithPrices++;
        const priceStr = salePrice
          ? `Sale: $${salePrice.toFixed(2)} (Reg: $${regularPrice?.toFixed(2) || 'N/A'})`
          : `Price: $${regularPrice?.toFixed(2)}`;
        console.log(`[${workerNum}] ✅ ${priceStr}`);
      } else {
        totalNoPrices++;
        console.log(`[${workerNum}] ⚠️ No price found`);
      }

      // Small delay between products
      await page.waitForTimeout(500);
    }

    console.log(`\n[${workerNum}] ✅ Batch #${batchNum} complete\n`);

    // Delay between batches
    await page.waitForTimeout(2000);
  }

  console.log(`\n[${workerNum}] ${'='.repeat(70)}`);
  console.log(`[${workerNum}] ✅ PRICE ENRICHMENT COMPLETE`);
  console.log(`[${workerNum}] Products processed: ${totalProcessed}`);
  console.log(`[${workerNum}] Products with prices: ${totalWithPrices}`);
  console.log(`[${workerNum}] Products without prices: ${totalNoPrices}`);
  console.log(`[${workerNum}] ${'='.repeat(70)}\n`);

  await browser.close();
  await pool.end();
}

main().catch(console.error);
178
backend/archive/enrich-single-dispensary.ts
Normal file
@@ -0,0 +1,178 @@
import { firefox } from 'playwright';
import { pool } from './src/db/migrate';
import { getRandomProxy } from './src/utils/proxyManager';

async function enrichSingleDispensary() {
  const address = '1115 Circulo Mercado';
  const city = 'Rio Rico';
  const state = 'AZ';
  const zip = '85648';

  console.log(`🦊 Enriching: ${address}, ${city}, ${state} ${zip}\n`);

  const proxy = await getRandomProxy();
  if (!proxy) {
    console.log('❌ No proxies available');
    await pool.end();
    return;
  }

  console.log(`🔌 Using proxy: ${proxy.server}\n`);

  const browser = await firefox.launch({
    headless: false,
    firefoxUserPrefs: {
      'geo.enabled': true,
      'geo.provider.use_corelocation': true,
      'geo.prompt.testing': true,
      'geo.prompt.testing.allow': true,
    }
  });

  const contextOptions: any = {
    viewport: { width: 1920, height: 1080 },
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
    geolocation: { latitude: 33.4484, longitude: -112.0740 },
    permissions: ['geolocation'],
    proxy: {
      server: proxy.server,
      username: proxy.username,
      password: proxy.password
    }
  };

  const context = await browser.newContext(contextOptions);
  const page = await context.newPage();

  try {
    // Search Google Maps
    const searchQuery = `dispensary ${address}, ${city}, ${state} ${zip}`;
    const encodedQuery = encodeURIComponent(searchQuery);
    const url = `https://www.google.com/maps/search/${encodedQuery}`;

    console.log(`📍 Searching Maps: ${searchQuery}`);
    await page.goto(url, {
      waitUntil: 'domcontentloaded',
      timeout: 30000
    });

    // Wait for results
    await page.waitForTimeout(5000);

    // Extract business data
    const businessData = await page.evaluate(() => {
      const data: any = {};

      // Try to find the place name
      const nameSelectors = [
        'h1[class*="fontHeadline"]',
        'h1.DUwDvf',
        '[data-item-id*="name"] h1'
      ];

      for (const selector of nameSelectors) {
        const el = document.querySelector(selector);
        if (el?.textContent) {
          data.name = el.textContent.trim();
          break;
        }
      }

      // Try to find website
      const websiteSelectors = [
        'a[data-item-id="authority"]',
        'a[data-tooltip="Open website"]',
        'a[aria-label*="Website"]'
      ];

      for (const selector of websiteSelectors) {
        const el = document.querySelector(selector) as HTMLAnchorElement;
        if (el?.href && !el.href.includes('google.com')) {
          data.website = el.href;
          break;
        }
      }

      // Try to find phone
      const phoneSelectors = [
        'button[data-item-id*="phone"]',
        'button[aria-label*="Phone"]',
        '[data-tooltip*="Copy phone number"]'
      ];

      for (const selector of phoneSelectors) {
        const el = document.querySelector(selector);
        if (el?.textContent) {
          const phoneMatch = el.textContent.match(/\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/);
          if (phoneMatch) {
            data.phone = phoneMatch[0];
            break;
          }
        }
      }

      // Try to find rating
      const ratingEl = document.querySelector('[role="img"][aria-label*="stars"]');
      if (ratingEl) {
        const label = ratingEl.getAttribute('aria-label');
        const match = label?.match(/(\d+\.?\d*)\s*stars?/);
        if (match) {
          data.rating = parseFloat(match[1]);
        }
      }

      // Try to find review count
      const reviewEl = document.querySelector('[aria-label*="reviews"]');
      if (reviewEl) {
        const label = reviewEl.getAttribute('aria-label');
        const match = label?.match(/([\d,]+)\s*reviews?/);
        if (match) {
          data.reviewCount = parseInt(match[1].replace(/,/g, ''), 10);
        }
      }

      return data;
    });

    console.log(`\n✅ Found data:`, businessData);

    // Update dutchie database
    if (businessData.name) {
      await pool.query(`
        UPDATE azdhs_list
        SET
          dba_name = $1,
          website = $2,
          phone = $3,
          google_rating = $4,
          google_review_count = $5,
          updated_at = CURRENT_TIMESTAMP
        WHERE address = $6 AND city = $7
      `, [
        businessData.name,
        businessData.website,
        businessData.phone?.replace(/\D/g, ''),
        businessData.rating,
        businessData.reviewCount,
        address,
        city
      ]);

      console.log(`\n✅ Updated database!`);
    } else {
      console.log(`\n❌ No business name found`);
    }

    // Keep browser open for 10 seconds so you can see the results
    console.log(`\n⏳ Keeping browser open for 10 seconds...`);
    await page.waitForTimeout(10000);

  } catch (error) {
    console.log(`❌ Error: ${error}`);
  } finally {
    await browser.close();
    await pool.end();
  }
}

enrichSingleDispensary();
75
backend/archive/explore-curaleaf-menu.ts
Normal file
@@ -0,0 +1,75 @@
import { chromium } from 'playwright';
import { bypassAgeGatePlaywright } from './src/utils/age-gate-playwright';

async function exploreCuraleafMenu() {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    viewport: { width: 1280, height: 720 }
  });

  const page = await context.newPage();

  try {
    console.log('Loading Curaleaf and bypassing age gate...\n');
    await page.goto('https://curaleaf.com/stores/curaleaf-dispensary-48th-street');
    await page.waitForTimeout(2000);
    await bypassAgeGatePlaywright(page, 'Arizona');
    await page.waitForTimeout(3000);

    console.log('Current URL:', page.url());
    console.log('\nLooking for menu navigation...\n');

    // Look for "View Menu", "Shop Now", "Order Online" type links
    const menuSelectors = [
      'a:has-text("View Menu")',
      'a:has-text("Shop Now")',
      'a:has-text("Order Online")',
      'a:has-text("Browse Menu")',
      'button:has-text("View Menu")',
      'button:has-text("Shop Now")'
    ];

    for (const selector of menuSelectors) {
      const count = await page.locator(selector).count();
      if (count > 0) {
        console.log(`Found: ${selector} (${count} elements)`);
        const element = page.locator(selector).first();
        const href = await element.getAttribute('href').catch(() => null);
        const text = await element.textContent();
        console.log(` Text: ${text}`);
        console.log(` Link: ${href}`);

        if (href) {
          console.log(`\n✅ Found menu link! Clicking to see where it goes...`);
          await element.click();
          await page.waitForTimeout(5000);
          console.log(` New URL: ${page.url()}\n`);
          break;
        }
      }
    }

    // Check if the URL changed
    const currentUrl = page.url();
    if (currentUrl.includes('menu') || currentUrl.includes('shop')) {
      console.log(`✅ Navigated to menu page: ${currentUrl}\n`);

      // Now look for products
      await page.waitForTimeout(3000);
      const productCount = await page.locator('[data-testid^="product"], .product-card, [class*="ProductCard"]').count();
      console.log(`Products found: ${productCount}`);
    }

    // Take final screenshot
    await page.screenshot({ path: '/tmp/curaleaf-menu-exploration.png', fullPage: true });
    console.log('\n📸 Screenshot saved: /tmp/curaleaf-menu-exploration.png');

  } catch (error) {
    console.error('Error:', error);
  } finally {
    await browser.close();
  }
}

exploreCuraleafMenu();
61
backend/archive/find-dutchie-menu-curaleaf.ts
Normal file
@@ -0,0 +1,61 @@
import { chromium } from 'playwright';
import { bypassAgeGatePlaywright } from './src/utils/age-gate-playwright';

async function findDutchieMenu() {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    viewport: { width: 1280, height: 720 }
  });

  const page = await context.newPage();

  try {
    console.log('Loading Curaleaf page and bypassing age gate...\n');
    await page.goto('https://curaleaf.com/stores/curaleaf-dispensary-48th-street');
    await page.waitForTimeout(2000);
    await bypassAgeGatePlaywright(page, 'Arizona');
    await page.waitForTimeout(5000);

    console.log('Looking for Dutchie menu...\n');

    // Check for iframes
    const frames = page.frames();
    console.log(`Total frames on page: ${frames.length}`);

    for (let i = 0; i < frames.length; i++) {
      const frame = frames[i];
      const url = frame.url();
      console.log(`Frame ${i}: ${url}`);

      if (url.includes('dutchie')) {
        console.log(` ✅ This is the Dutchie menu!`);

        // Try to find the actual menu URL
        const menuUrl = url;
        console.log(`\n📍 Dutchie Menu URL: ${menuUrl}\n`);

        // We should scrape this URL directly instead of the Curaleaf page
        console.log('💡 Strategy: Scrape the Dutchie iframe URL directly');
        console.log(` Example: ${menuUrl.split('?')[0]}/shop/flower\n`);
      }
    }

    // Also check for links to Dutchie
    const dutchieLinks = await page.locator('a[href*="dutchie"]').all();
    console.log(`\nDutchie links found: ${dutchieLinks.length}`);

    for (const link of dutchieLinks.slice(0, 3)) {
      const href = await link.getAttribute('href');
      const text = await link.textContent();
      console.log(` - ${text}: ${href}`);
    }

  } catch (error) {
    console.error('Error:', error);
  } finally {
    await browser.close();
  }
}

findDutchieMenu();
90
backend/archive/find-price-location.ts
Normal file
@@ -0,0 +1,90 @@
import { firefox } from 'playwright';
import { getRandomProxy } from './src/utils/proxyManager.js';

async function findPriceLocation() {
  const proxy = await getRandomProxy();
  if (!proxy) {
    console.log('No proxy available');
    process.exit(1);
  }

  const browser = await firefox.launch({ headless: true });
  const context = await browser.newContext({
    viewport: { width: 1920, height: 1080 },
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    proxy: {
      server: proxy.server,
      username: proxy.username,
      password: proxy.password
    }
  });

  const page = await context.newPage();

  const brandUrl = 'https://dutchie.com/embedded-menu/AZ-Deeply-Rooted/brands/alien-labs';
  console.log(`Loading: ${brandUrl}`);

  await page.goto(brandUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
  await page.waitForTimeout(5000);

  // Find where the prices are in the DOM
  const priceLocations = await page.evaluate(() => {
    const findPriceElements = (root: Element) => {
      const walker = document.createTreeWalker(
        root,
        NodeFilter.SHOW_ELEMENT,
        null
      );

      const results: Array<{
        tag: string;
        classes: string;
        text: string;
        isProductCard: boolean;
        parentInfo: string;
      }> = [];

      let node: Node | null;

      while (node = walker.nextNode()) {
        const element = node as Element;
        const text = element.textContent?.trim() || '';

        if (text.includes('$') && text.match(/\$\d+/)) {
          const isProductCard = element.closest('a[href*="/product/"]') !== null;
          const parent = element.parentElement;

          results.push({
            tag: element.tagName.toLowerCase(),
            classes: element.className,
            text: text.substring(0, 150),
            isProductCard,
            parentInfo: parent ? `${parent.tagName.toLowerCase()}.${parent.className}` : 'none'
          });
        }
      }

      return results.slice(0, 15);
    };

    return findPriceElements(document.body);
  });

  console.log('\n' + '='.repeat(80));
  console.log('PRICE ELEMENT LOCATIONS:');
  console.log('='.repeat(80));

  priceLocations.forEach((loc, idx) => {
    console.log(`\n${idx + 1}. <${loc.tag}> ${loc.classes ? `class="${loc.classes}"` : ''}`);
    console.log(` In product card: ${loc.isProductCard ? 'YES' : 'NO'}`);
    console.log(` Parent: ${loc.parentInfo}`);
    console.log(` Text: ${loc.text}`);
  });

  console.log('\n' + '='.repeat(80));

  await browser.close();
  process.exit(0);
}

findPriceLocation().catch(console.error);
82
backend/archive/find-products-curaleaf.ts
Normal file
@@ -0,0 +1,82 @@
import { chromium } from 'playwright';
import { bypassAgeGatePlaywright } from './src/utils/age-gate-playwright';

async function findProducts() {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    viewport: { width: 1280, height: 720 }
  });

  const page = await context.newPage();

  try {
    console.log('Loading and bypassing age gate...\n');
    await page.goto('https://curaleaf.com/stores/curaleaf-dispensary-48th-street');
    await page.waitForTimeout(2000);
    await bypassAgeGatePlaywright(page, 'Arizona');
    await page.waitForTimeout(5000);

    console.log('Scrolling page...\n');
    await page.evaluate(async () => {
      for (let i = 0; i < 3; i++) {
        window.scrollBy(0, 1000);
        await new Promise(r => setTimeout(r, 500));
      }
    });
    await page.waitForTimeout(2000);

    // Try various product selectors
    const selectors = [
      '[data-testid^="product"]',
      '.product',
      '[class*="Product"]',
      '[class*="product"]',
      'article',
      '[role="article"]',
      '.card',
      '[class*="Card"]',
      '[class*="item"]',
      '[class*="Item"]'
    ];

    console.log('Trying different product selectors:\n');

    for (const selector of selectors) {
      const count = await page.locator(selector).count();
      if (count > 0) {
        console.log(`✅ ${selector}: ${count} elements found`);

        // Get first few elements
        const elements = await page.locator(selector).all();
        for (let i = 0; i < Math.min(3, elements.length); i++) {
          const text = await elements[i].textContent();
          const classes = await elements[i].getAttribute('class');
          console.log(` ${i + 1}. Classes: ${classes}`);
          console.log(` Text (first 100 chars): ${text?.trim().substring(0, 100)}`);
        }
        console.log('');
      }
    }

    // Check for buttons/links that might load products
    const menuButtons = await page.locator('button:has-text("menu"), button:has-text("shop"), a:has-text("View Menu")').count();
    console.log(`\nMenu/Shop buttons: ${menuButtons}`);

    if (menuButtons > 0) {
      const buttons = await page.locator('button:has-text("menu"), button:has-text("shop"), a:has-text("View Menu")').all();
      console.log('Menu buttons found:');
      for (const btn of buttons.slice(0, 3)) {
        const text = await btn.textContent();
        console.log(` - ${text?.trim()}`);
      }
    }

  } catch (error) {
    console.error('Error:', error);
  } finally {
    await browser.close();
  }
}

findProducts();
58
backend/archive/fix-brand-names.ts
Normal file
@@ -0,0 +1,58 @@
import { pool } from './src/db/migrate.js';

async function fixBrandNames() {
  console.log('\n' + '='.repeat(60));
  console.log('FIXING BRAND NAMES WITH DUPLICATED LETTERS');
  console.log('='.repeat(60) + '\n');

  // Get all brands with potential duplication (where first letter is duplicated)
  const result = await pool.query(`
    SELECT id, brand_slug, brand_name
    FROM brand_scrape_jobs
    WHERE dispensary_id = 112
      AND LENGTH(brand_name) > 1
      AND SUBSTRING(brand_name, 1, 1) = SUBSTRING(brand_name, 2, 1)
    ORDER BY brand_name
  `);

  console.log(`Found ${result.rows.length} brands with potential duplication:\n`);

  let fixed = 0;
  let skipped = 0;

  for (const row of result.rows) {
    const originalName = row.brand_name;
    // Remove the first letter
    const fixedName = originalName.substring(1);

    console.log(`${row.id}. "${originalName}" → "${fixedName}"`);

    // Update the database
    await pool.query(`
      UPDATE brand_scrape_jobs
      SET brand_name = $1, updated_at = NOW()
      WHERE id = $2
    `, [fixedName, row.id]);

    // Also update products table if brand was already scraped
    const updateResult = await pool.query(`
      UPDATE products
      SET brand = $1, updated_at = CURRENT_TIMESTAMP
      WHERE dispensary_id = 112 AND brand = $2
    `, [fixedName, originalName]);

    if (updateResult.rowCount && updateResult.rowCount > 0) {
      console.log(` ✓ Updated ${updateResult.rowCount} products`);
    }

    fixed++;
  }

  console.log('\n' + '='.repeat(60));
  console.log(`✅ FIXED ${fixed} BRAND NAMES`);
  console.log('='.repeat(60) + '\n');

  await pool.end();
}

fixBrandNames().catch(console.error);
52
backend/archive/fix-brands-table.ts
Normal file
@@ -0,0 +1,52 @@
import { pool } from './src/db/migrate';

async function fixBrandsTable() {
  console.log('🔧 Fixing brands table structure...\n');

  try {
    // Drop old tables
    await pool.query('DROP TABLE IF EXISTS store_brands CASCADE');
    await pool.query('DROP TABLE IF EXISTS product_brands CASCADE');
    await pool.query('DROP TABLE IF EXISTS brands CASCADE');

    console.log('✅ Dropped old tables');

    // Create brands table with correct structure
    await pool.query(`
      CREATE TABLE brands (
        id SERIAL PRIMARY KEY,
        name VARCHAR(255) UNIQUE NOT NULL,
        logo_url TEXT,
        first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
      )
    `);
    console.log('✅ Created brands table');

    // Create store_brands junction table
    await pool.query(`
      CREATE TABLE store_brands (
        id SERIAL PRIMARY KEY,
        store_id INTEGER REFERENCES stores(id) ON DELETE CASCADE,
        brand_id INTEGER REFERENCES brands(id) ON DELETE CASCADE,
        first_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        last_seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        active BOOLEAN DEFAULT true,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        UNIQUE(store_id, brand_id)
      )
    `);
    console.log('✅ Created store_brands table');

    console.log('\n✅ Brands tables fixed!');
  } catch (error) {
    console.error('❌ Error:', error);
  } finally {
    await pool.end();
  }
}

fixBrandsTable();
48
backend/archive/fix-category-urls.ts
Normal file
@@ -0,0 +1,48 @@
import { pool } from './src/db/migrate';
import { dutchieTemplate } from './src/scrapers/templates/dutchie';

async function fixCategoryUrls() {
  console.log('🔧 Fixing category URLs for Dutchie stores\n');

  try {
    // Get all categories with /shop/ in the URL
    const result = await pool.query(`
      SELECT id, store_id, name, slug, dutchie_url
      FROM categories
      WHERE dutchie_url LIKE '%/shop/%'
      ORDER BY store_id, id
    `);

    console.log(`Found ${result.rows.length} categories to fix:\n`);

    for (const category of result.rows) {
      console.log(`Category: ${category.name}`);
      console.log(` Old URL: ${category.dutchie_url}`);

      // Extract base URL (everything before /shop/)
      const baseUrl = category.dutchie_url.split('/shop/')[0];

      // Use template to build correct URL
      const newUrl = dutchieTemplate.buildCategoryUrl(baseUrl, category.name);

      console.log(` New URL: ${newUrl}`);

      // Update the category
      await pool.query(`
        UPDATE categories
        SET dutchie_url = $1
        WHERE id = $2
      `, [newUrl, category.id]);

      console.log(` ✅ Updated\n`);
    }

    console.log(`\n✅ All category URLs fixed!`);
  } catch (error) {
    console.error('❌ Error:', error);
  } finally {
    await pool.end();
  }
}

fixCategoryUrls();
72
backend/archive/fix-docker-db-urls.ts
Normal file
@@ -0,0 +1,72 @@
import { Pool } from 'pg';

// Docker database connection
const pool = new Pool({
  connectionString: 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'
});

const updates = [
  { id: 18, url: 'https://dutchie.com/dispensary/curaleaf-dispensary-48th-street' },
  { id: 19, url: 'https://dutchie.com/dispensary/curaleaf-83rd-ave' },
  { id: 20, url: 'https://dutchie.com/dispensary/curaleaf-bell-road' },
  { id: 21, url: 'https://dutchie.com/dispensary/curaleaf-camelback' },
  { id: 22, url: 'https://dutchie.com/dispensary/curaleaf-central' },
  { id: 23, url: 'https://dutchie.com/dispensary/curaleaf-gilbert' },
  { id: 24, url: 'https://dutchie.com/dispensary/curaleaf-glendale-east' },
  { id: 25, url: 'https://dutchie.com/dispensary/curaleaf-glendale-east-kind-relief' },
  { id: 26, url: 'https://dutchie.com/dispensary/curaleaf-glendale' },
  { id: 27, url: 'https://dutchie.com/dispensary/curaleaf-dispensary-midtown' },
  { id: 28, url: 'https://dutchie.com/dispensary/curaleaf-dispensary-peoria' },
  { id: 29, url: 'https://dutchie.com/dispensary/curaleaf-phoenix' },
  { id: 30, url: 'https://dutchie.com/dispensary/curaleaf-queen-creek' },
  { id: 31, url: 'https://dutchie.com/dispensary/curaleaf-queen-creek-whoa' },
  { id: 32, url: 'https://dutchie.com/dispensary/curaleaf-dispensary-scottsdale' },
  { id: 33, url: 'https://dutchie.com/dispensary/curaleaf-dispensary-sedona' },
  { id: 34, url: 'https://dutchie.com/dispensary/curaleaf-tucson' },
  { id: 35, url: 'https://dutchie.com/dispensary/curaleaf-youngtown' },
];

const solFlowerStores = [
  { name: 'Sol Flower - Sun City', slug: 'sol-flower-sun-city', url: 'https://dutchie.com/dispensary/sol-flower-dispensary' },
  { name: 'Sol Flower - South Tucson', slug: 'sol-flower-south-tucson', url: 'https://dutchie.com/dispensary/sol-flower-dispensary-south-tucson' },
  { name: 'Sol Flower - North Tucson', slug: 'sol-flower-north-tucson', url: 'https://dutchie.com/dispensary/sol-flower-dispensary-north-tucson' },
  { name: 'Sol Flower - McClintock (Tempe)', slug: 'sol-flower-mcclintock', url: 'https://dutchie.com/dispensary/sol-flower-dispensary-mcclintock' },
  { name: 'Sol Flower - Deer Valley (Phoenix)', slug: 'sol-flower-deer-valley', url: 'https://dutchie.com/dispensary/sol-flower-dispensary-deer-valley' },
];

async function fixDatabase() {
  console.log('🔧 Fixing Docker database...\n');

  try {
    // Update Curaleaf URLs
    console.log('Updating Curaleaf URLs to Dutchie...');
    for (const update of updates) {
      await pool.query('UPDATE stores SET dutchie_url = $1 WHERE id = $2', [update.url, update.id]);
      console.log(`✅ Updated store ID ${update.id}`);
    }

    // Add Sol Flower stores
    console.log('\nAdding Sol Flower stores...');
    for (const store of solFlowerStores) {
      const result = await pool.query(
        `INSERT INTO stores (name, slug, dutchie_url, active, scrape_enabled, logo_url)
         VALUES ($1, $2, $3, true, true, $4)
         ON CONFLICT (slug) DO UPDATE SET dutchie_url = $3
         RETURNING id`,
        [store.name, store.slug, store.url, 'https://dutchie.com/favicon.ico']
      );
      console.log(`✅ Added ${store.name} (ID: ${result.rows[0].id})`);
    }

    console.log('\n✅ Database fixed! Showing all stores:');
    const all = await pool.query('SELECT id, name, dutchie_url FROM stores ORDER BY id');
    console.table(all.rows);

  } catch (error) {
    console.error('❌ Error:', error);
  } finally {
    await pool.end();
  }
}

fixDatabase();
38
backend/archive/fix-stuck-job.js
Normal file
@@ -0,0 +1,38 @@
|
||||
const { Pool } = require('pg');
|
||||
|
||||
const pool = new Pool({
|
||||
connectionString: process.env.DATABASE_URL || 'postgresql://kelly:kelly@localhost:5432/hub'
|
||||
});
|
||||
|
||||
(async () => {
|
||||
try {
|
||||
// Check for stuck jobs
|
||||
const result = await pool.query(`
|
||||
SELECT id, status, total_proxies, tested_proxies, started_at
|
||||
FROM proxy_test_jobs
|
||||
WHERE status IN ('pending', 'running')
|
||||
ORDER BY created_at DESC
|
||||
`);
|
||||
|
||||
console.log('Found stuck jobs:', result.rows);
|
||||
|
||||
// Mark them as cancelled
|
||||
if (result.rows.length > 0) {
|
||||
await pool.query(`
|
||||
UPDATE proxy_test_jobs
|
||||
SET status = 'cancelled',
|
||||
completed_at = CURRENT_TIMESTAMP,
|
||||
updated_at = CURRENT_TIMESTAMP
|
||||
WHERE status IN ('pending', 'running')
|
||||
`);
|
||||
console.log('✅ Cleaned up', result.rows.length, 'stuck jobs');
|
||||
} else {
|
||||
console.log('No stuck jobs found');
|
||||
}
|
||||
|
||||
process.exit(0);
|
||||
} catch (error) {
|
||||
console.error('❌ Error:', error);
|
||||
process.exit(1);
|
||||
}
|
||||
})();
|
||||
33
backend/archive/get-best-dispensary.ts
Normal file
@@ -0,0 +1,33 @@
|
||||
import { pool } from './src/db/migrate.js';
|
||||
|
||||
async function main() {
|
||||
const result = await pool.query(`
|
||||
SELECT
|
||||
id,
|
||||
name,
|
||||
slug,
|
||||
website,
|
||||
menu_url,
|
||||
scraper_template,
|
||||
scraper_config,
|
||||
address,
|
||||
city,
|
||||
state,
|
||||
zip,
|
||||
phone,
|
||||
email
|
||||
FROM dispensaries
|
||||
WHERE slug = 'best-dispensary'
|
||||
`);
|
||||
|
||||
if (result.rows.length === 0) {
|
||||
console.log('BEST Dispensary not found');
|
||||
} else {
|
||||
console.log('BEST Dispensary Details:');
|
||||
console.log(JSON.stringify(result.rows[0], null, 2));
|
||||
}
|
||||
|
||||
await pool.end();
|
||||
}
|
||||
|
||||
main().catch(console.error);
|
||||
23
backend/archive/get-dispensary-ids.ts
Normal file
@@ -0,0 +1,23 @@
|
||||
import { pool } from './src/db/migrate.js';
|
||||
|
||||
async function getDispensaryIds() {
|
||||
const result = await pool.query(
|
||||
`SELECT id, name, slug
|
||||
FROM dispensaries
|
||||
WHERE dutchie_url IS NOT NULL
|
||||
ORDER BY id
|
||||
LIMIT 30`
|
||||
);
|
||||
|
||||
console.log('Dispensary IDs available for scraping:');
|
||||
console.log('=====================================');
|
||||
result.rows.forEach(row => {
|
||||
console.log(`${row.id}: ${row.name} (${row.slug})`);
|
||||
});
|
||||
|
||||
console.log(`\nListed ${result.rows.length} dispensaries (query is capped at 30)`);
|
||||
|
||||
await pool.end();
|
||||
}
|
||||
|
||||
getDispensaryIds().catch(console.error);
|
||||
76
backend/archive/import-proxies.ts
Normal file
@@ -0,0 +1,76 @@
|
||||
import { readFileSync } from 'fs';
|
||||
import { Pool } from 'pg';
|
||||
|
||||
const pool = new Pool({
|
||||
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
|
||||
});
|
||||
|
||||
async function importProxies() {
|
||||
try {
|
||||
console.log('📥 Reading proxy list from file...\n');
|
||||
|
||||
const proxyFile = '/home/kelly/Downloads/proxyscrape_premium_http_proxies.txt';
|
||||
const fileContent = readFileSync(proxyFile, 'utf-8');
|
||||
const lines = fileContent.trim().split('\n');
|
||||
|
||||
console.log(`Found ${lines.length} proxies in file\n`);
|
||||
|
||||
let imported = 0;
|
||||
let duplicates = 0;
|
||||
let errors = 0;
|
||||
|
||||
for (const line of lines) {
|
||||
const trimmed = line.trim();
|
||||
if (!trimmed) continue;
|
||||
|
||||
const [host, portStr] = trimmed.split(':');
|
||||
const port = parseInt(portStr, 10); // explicit radix; NaN is rejected by the check below
|
||||
|
||||
if (!host || !port) {
|
||||
console.log(`❌ Invalid format: ${trimmed}`);
|
||||
errors++;
|
||||
continue;
|
||||
}
|
||||
|
||||
try {
|
||||
// Insert proxy without testing (set active = false initially)
|
||||
const result = await pool.query(`
|
||||
INSERT INTO proxies (host, port, protocol, active)
|
||||
VALUES ($1, $2, 'http', false)
|
||||
ON CONFLICT (host, port, protocol) DO NOTHING
|
||||
RETURNING id
|
||||
`, [host, port]);
|
||||
|
||||
if (result.rows.length > 0) {
|
||||
imported++;
|
||||
if (imported % 100 === 0) {
|
||||
console.log(`📥 Imported ${imported} proxies...`);
|
||||
}
|
||||
} else {
|
||||
duplicates++;
|
||||
}
|
||||
} catch (error: any) {
|
||||
console.log(`❌ Error importing ${host}:${port}: ${error.message}`);
|
||||
errors++;
|
||||
}
|
||||
}
|
||||
|
||||
console.log('\n✅ Import complete!');
|
||||
console.log('─'.repeat(60));
|
||||
console.log(`Imported: ${imported}`);
|
||||
console.log(`Duplicates: ${duplicates}`);
|
||||
console.log(`Errors: ${errors}`);
|
||||
console.log('─'.repeat(60));
|
||||
|
||||
// Show final count
|
||||
const countResult = await pool.query('SELECT COUNT(*) as total FROM proxies');
|
||||
console.log(`\n📊 Total proxies in database: ${countResult.rows[0].total}\n`);
|
||||
|
||||
} catch (error: any) {
|
||||
console.error('❌ Error:', error.message);
|
||||
} finally {
|
||||
await pool.end();
|
||||
}
|
||||
}
|
||||
|
||||
importProxies();
|
||||
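Review note: the `host:port` parsing in `import-proxies.ts` accepts lines such as `1.2.3.4:80abc`, because `parseInt` stops at the first non-digit, and it silently ignores anything after a second colon. A minimal sketch of a stricter parser follows. `parseProxyLine` and `ParsedProxy` are hypothetical names, not part of the script, and rejecting lines with extra `:`-separated fields is an assumption about the input file format.

```typescript
// Hypothetical helper: validate one "host:port" line before inserting it.
interface ParsedProxy {
  host: string;
  port: number;
}

function parseProxyLine(line: string): ParsedProxy | null {
  const trimmed = line.trim();
  if (!trimmed) return null;

  // Assumption: exactly two fields. Lines with credentials
  // (host:port:user:pass) would need a different rule.
  const parts = trimmed.split(':');
  if (parts.length !== 2) return null;

  const [host, portStr] = parts;
  // Require digits only, so "80abc" is rejected instead of parsed as 80.
  if (!host || !/^\d+$/.test(portStr)) return null;

  const port = Number.parseInt(portStr, 10);
  if (port < 1 || port > 65535) return null;

  return { host, port };
}
```

In the import loop, a `null` result would take the same path as the existing "Invalid format" branch.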
47
backend/archive/inspect-treez-structure.ts
Normal file
@@ -0,0 +1,47 @@
|
||||
import { chromium } from 'playwright';
|
||||
|
||||
// Note: despite the name, this is a desktop Chrome user agent string, not a Googlebot one.
const GOOGLE_UA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
|
||||
|
||||
async function main() {
|
||||
const browser = await chromium.launch({ headless: true });
|
||||
const context = await browser.newContext({
|
||||
userAgent: GOOGLE_UA
|
||||
});
|
||||
|
||||
const page = await context.newPage();
|
||||
|
||||
try {
|
||||
console.log('Loading menu page...');
|
||||
await page.goto('https://best.treez.io/onlinemenu/?customerType=ADULT', {
|
||||
waitUntil: 'networkidle',
|
||||
timeout: 30000
|
||||
});
|
||||
|
||||
await page.waitForTimeout(3000);
|
||||
|
||||
// Get first 3 menu items and inspect their HTML structure
|
||||
const menuItems = await page.locator('.menu-item').all();
|
||||
|
||||
console.log(`\nFound ${menuItems.length} menu items. Inspecting first 3:\n`);
|
||||
|
||||
for (let i = 0; i < Math.min(3, menuItems.length); i++) {
|
||||
const item = menuItems[i];
|
||||
const html = await item.innerHTML();
|
||||
const text = await item.textContent();
|
||||
|
||||
console.log(`\n========== ITEM ${i + 1} ==========`);
|
||||
console.log('HTML:');
|
||||
console.log(html);
|
||||
console.log('\nText Content:');
|
||||
console.log(text);
|
||||
console.log('='.repeat(40));
|
||||
}
|
||||
|
||||
} catch (error: any) {
|
||||
console.error('Error:', error.message);
|
||||
} finally {
|
||||
await browser.close();
|
||||
}
|
||||
}
|
||||
|
||||
main().catch(console.error);
|
||||
50
backend/archive/launch-25-scrapers.sh
Executable file
@@ -0,0 +1,50 @@
|
||||
#!/bin/bash
|
||||
|
||||
# Launch 25 parallel brand scrapers for Deeply Rooted (dispensary ID 112)
|
||||
# 90 brands total:
|
||||
# - Workers 1-15: 4 brands each (brands 0-59)
|
||||
# - Workers 16-25: 3 brands each (brands 60-89)
|
||||
|
||||
DISPENSARY_ID=112
|
||||
DB_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus"
|
||||
|
||||
echo "🚀 Launching 25 parallel brand scrapers..."
|
||||
echo "============================================================"
|
||||
|
||||
# Workers 1-15: 4 brands each
|
||||
for i in {1..15}; do
|
||||
START=$(( (i-1) * 4 ))
|
||||
END=$(( START + 3 ))
|
||||
echo "Starting Worker $i: brands $START-$END"
|
||||
DATABASE_URL="$DB_URL" npx tsx scrape-parallel-brands.ts $DISPENSARY_ID $START $END "W$i" 2>&1 | tee /tmp/scraper-w$i.log &
|
||||
done
|
||||
|
||||
# Workers 16-25: 3 brands each
|
||||
for i in {16..25}; do
|
||||
START=$(( 60 + (i-16) * 3 ))
|
||||
END=$(( START + 2 ))
|
||||
echo "Starting Worker $i: brands $START-$END"
|
||||
DATABASE_URL="$DB_URL" npx tsx scrape-parallel-brands.ts $DISPENSARY_ID $START $END "W$i" 2>&1 | tee /tmp/scraper-w$i.log &
|
||||
done
|
||||
|
||||
echo "============================================================"
|
||||
echo "✅ All 25 workers launched!"
|
||||
echo ""
|
||||
echo "Monitor progress with:"
|
||||
echo " tail -f /tmp/scraper-w1.log (or w2, w3, etc.)"
|
||||
echo " ps aux | grep scrape-parallel"
|
||||
echo ""
|
||||
echo "View all logs:"
|
||||
echo " tail -f /tmp/scraper-w*.log"
|
||||
echo ""
|
||||
echo "Brand allocation:"
|
||||
echo " Workers 1-15: 4 brands each (brands 0-59)"
|
||||
echo " Workers 16-25: 3 brands each (brands 60-89)"
|
||||
echo " Total: 90 brands across 25 workers"
|
||||
echo "============================================================"
|
||||
|
||||
# Wait for all background jobs to finish
|
||||
wait
|
||||
|
||||
echo ""
|
||||
echo "✅ All 25 workers completed!"
|
||||
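Review note: the brand allocation in `launch-25-scrapers.sh` (workers 1-15 take 4 brands, workers 16-25 take 3) is hard-coded for exactly 90 brands and 25 workers. A minimal sketch of computing the same ranges for any counts is below. It is written in TypeScript to match the rest of the archive; `planWorkerRanges` and `WorkerRange` are hypothetical names and the script itself does not use them.

```typescript
// Hypothetical helper: split brand indexes 0..totalBrands-1 into contiguous,
// near-equal ranges, giving the first (totalBrands % workers) workers one extra.
interface WorkerRange {
  worker: number; // 1-based worker number, as in the shell script
  start: number;  // first brand index (inclusive)
  end: number;    // last brand index (inclusive)
}

function planWorkerRanges(totalBrands: number, workers: number): WorkerRange[] {
  if (workers < 1) throw new RangeError('workers must be >= 1');

  const base = Math.floor(totalBrands / workers);
  const extra = totalBrands % workers;
  const ranges: WorkerRange[] = [];

  let start = 0;
  for (let w = 1; w <= workers; w++) {
    const size = base + (w <= extra ? 1 : 0);
    if (size === 0) break; // more workers than brands: remaining workers get nothing
    ranges.push({ worker: w, start, end: start + size - 1 });
    start += size;
  }
  return ranges;
}
```

For 90 brands and 25 workers this reproduces the allocation in the script's header comment: worker 1 gets brands 0-3, worker 15 ends at 59, worker 16 gets 60-62, and worker 25 ends at 89.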
22
backend/archive/list-dispensaries.ts
Normal file
@@ -0,0 +1,22 @@
|
||||
import { pool } from './src/db/migrate.js';
|
||||
|
||||
async function main() {
|
||||
const result = await pool.query(`
|
||||
SELECT id, name, slug, menu_url, scraper_template
|
||||
FROM dispensaries
|
||||
ORDER BY name
|
||||
`);
|
||||
|
||||
console.log('Available Dispensaries:');
|
||||
result.rows.forEach((row, idx) => {
|
||||
console.log(`${idx + 1}. ${row.name} (ID: ${row.id})`);
|
||||
console.log(` Slug: ${row.slug}`);
|
||||
console.log(` Menu URL: ${row.menu_url || 'N/A'}`);
|
||||
console.log(` Template: ${row.scraper_template || 'N/A'}`);
|
||||
console.log('');
|
||||
});
|
||||
|
||||
await pool.end();
|
||||
}
|
||||
|
||||
main().catch(console.error);
|
||||
27
backend/archive/list-stores.ts
Normal file
@@ -0,0 +1,27 @@
|
||||
import { Pool } from 'pg';
|
||||
|
||||
const pool = new Pool({
|
||||
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
|
||||
});
|
||||
|
||||
async function listStores() {
|
||||
try {
|
||||
const result = await pool.query('SELECT id, name, slug, dutchie_url FROM stores ORDER BY id LIMIT 10');
|
||||
|
||||
console.log('\nStores in database:');
|
||||
console.log('─'.repeat(80));
|
||||
result.rows.forEach(store => {
|
||||
console.log(`ID: ${store.id} | Slug: ${store.slug}`);
|
||||
console.log(`Name: ${store.name}`);
|
||||
console.log(`URL: ${store.dutchie_url}`);
|
||||
console.log('─'.repeat(80));
|
||||
});
|
||||
console.log(`Showing ${result.rowCount} stores (query is limited to 10)\n`);
|
||||
} catch (error: any) {
|
||||
console.error('Error:', error.message);
|
||||
} finally {
|
||||
await pool.end();
|
||||
}
|
||||
}
|
||||
|
||||
listStores();
|
||||
83
backend/archive/migrate-proxy-locations.ts
Normal file
@@ -0,0 +1,83 @@
|
||||
import { Pool } from 'pg';
|
||||
|
||||
async function migrateProxyLocations() {
|
||||
const sailPool = new Pool({
|
||||
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
|
||||
});
|
||||
|
||||
const dutchiePool = new Pool({
|
||||
connectionString: 'postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus'
|
||||
});
|
||||
|
||||
try {
|
||||
console.log('🔄 Migrating proxy location data from SAIL → DUTCHIE\n');
|
||||
|
||||
// Get all proxies with location data from SAIL
|
||||
const result = await sailPool.query(`
|
||||
SELECT host, port, city, state, country, country_code, active
|
||||
FROM proxies
|
||||
WHERE city IS NOT NULL
|
||||
ORDER BY host, port
|
||||
`);
|
||||
|
||||
console.log(`📋 Found ${result.rows.length} proxies with location data in SAIL\n`);
|
||||
|
||||
let updated = 0;
|
||||
let notFound = 0;
|
||||
|
||||
for (const proxy of result.rows) {
|
||||
// Update matching proxy in DUTCHIE database
|
||||
const updateResult = await dutchiePool.query(`
|
||||
UPDATE proxies
|
||||
SET
|
||||
city = $1,
|
||||
state = $2,
|
||||
country = $3,
|
||||
country_code = $4,
|
||||
active = $5
|
||||
WHERE host = $6 AND port = $7
|
||||
RETURNING id
|
||||
`, [
|
||||
proxy.city,
|
||||
proxy.state,
|
||||
proxy.country,
|
||||
proxy.country_code,
|
||||
proxy.active,
|
||||
proxy.host,
|
||||
proxy.port
|
||||
]);
|
||||
|
||||
if (updateResult.rowCount > 0) {
|
||||
updated++;
|
||||
if (updated % 100 === 0) {
|
||||
console.log(` ✅ Updated ${updated}/${result.rows.length}...`);
|
||||
}
|
||||
} else {
|
||||
notFound++;
|
||||
console.log(` ⚠️ Proxy not found in DUTCHIE: ${proxy.host}:${proxy.port}`);
|
||||
}
|
||||
}
|
||||
|
||||
console.log(`\n📊 Results:`);
|
||||
console.log(` ✅ Updated: ${updated}`);
|
||||
console.log(` ⚠️ Not found: ${notFound}`);
|
||||
|
||||
// Verify final counts
|
||||
const dutchieCheck = await dutchiePool.query(`
|
||||
SELECT
|
||||
COUNT(*) as total,
|
||||
COUNT(*) FILTER (WHERE city IS NOT NULL) as with_location
|
||||
FROM proxies
|
||||
`);
|
||||
|
||||
console.log(`\n✅ DUTCHIE database now has:`);
|
||||
console.log(` Total proxies: ${dutchieCheck.rows[0].total}`);
|
||||
console.log(` With location data: ${dutchieCheck.rows[0].with_location}`);
|
||||
|
||||
} finally {
|
||||
await sailPool.end();
|
||||
await dutchiePool.end();
|
||||
}
|
||||
}
|
||||
|
||||
migrateProxyLocations().catch(console.error);
|
||||
204
backend/archive/parse-azdhs-copied-data.ts
Normal file
@@ -0,0 +1,204 @@
|
||||
import { pool } from './src/db/migrate';
|
||||
import * as fs from 'fs';
|
||||
|
||||
async function parseAZDHSCopiedData() {
|
||||
console.log('📋 Parsing manually copied AZDHS data...\n');
|
||||
|
||||
const fileContent = fs.readFileSync('/home/kelly/Documents/azdhs dispos', 'utf-8');
|
||||
const lines = fileContent.split('\n').map(l => l.trim()).filter(l => l.length > 0);
|
||||
|
||||
const dispensaries: any[] = [];
|
||||
let i = 0;
|
||||
|
||||
while (i < lines.length) {
|
||||
// Skip if we hit "Get Details" without processing (edge case)
|
||||
if (lines[i] === 'Get Details') {
|
||||
i++;
|
||||
continue;
|
||||
}
|
||||
|
||||
let name = lines[i];
|
||||
let companyName = '';
|
||||
let statusLine = '';
|
||||
let address = '';
|
||||
let linesConsumed = 0;
|
||||
|
||||
// Check if next line is "Get Details" (edge case: end of data)
|
||||
if (i + 1 < lines.length && lines[i + 1] === 'Get Details') {
|
||||
i += 2;
|
||||
continue;
|
||||
}
|
||||
|
||||
// Two possible patterns:
|
||||
// Pattern 1 (5 lines): Name, Company, Status, Address, Get Details
|
||||
// Pattern 2 (4 lines): Name, Status, Address, Get Details (company name = dispensary name)
|
||||
|
||||
// Check if line i+1 contains "Operating" (status line)
|
||||
const nextLine = lines[i + 1];
|
||||
if (nextLine && nextLine.includes('Operating')) {
|
||||
// Pattern 2: 4-line format
|
||||
companyName = name; // Company name same as dispensary name
|
||||
statusLine = lines[i + 1];
|
||||
address = lines[i + 2];
|
||||
const getDetails = lines[i + 3];
|
||||
|
||||
if (getDetails !== 'Get Details') {
|
||||
console.log(`⚠️ Skipping malformed 4-line record at line ${i}: ${name}`);
|
||||
i++;
|
||||
continue;
|
||||
}
|
||||
linesConsumed = 4;
|
||||
} else {
|
||||
// Pattern 1: 5-line format
|
||||
companyName = lines[i + 1];
|
||||
statusLine = lines[i + 2];
|
||||
address = lines[i + 3];
|
||||
const getDetails = lines[i + 4];
|
||||
|
||||
if (getDetails !== 'Get Details') {
|
||||
console.log(`⚠️ Skipping malformed 5-line record at line ${i}: ${name}`);
|
||||
i++;
|
||||
continue;
|
||||
}
|
||||
linesConsumed = 5;
|
||||
}
|
||||
|
||||
// Parse phone from status line
|
||||
let phone = '';
|
||||
const phoneMatch = statusLine.match(/(\(\d{3}\)\s*\d{3}-\d{4}|\d{3}-\d{3}-\d{4}|\d{10})/);
|
||||
if (phoneMatch) {
|
||||
phone = phoneMatch[1].replace(/\D/g, ''); // Remove all non-digits
|
||||
}
|
||||
|
||||
// Parse address components
|
||||
// Format: "123 Street Name, City, AZ 85001"
|
||||
let street = '', city = '', state = 'AZ', zip = '';
|
||||
|
||||
const addressParts = address.split(',').map(p => p.trim());
|
||||
|
||||
if (addressParts.length >= 3) {
|
||||
street = addressParts.slice(0, -2).join(', '); // Everything before city
|
||||
city = addressParts[addressParts.length - 2]; // Second to last
|
||||
|
||||
// Last part should be "AZ 85001"
|
||||
const stateZip = addressParts[addressParts.length - 1];
|
||||
const stateZipMatch = stateZip.match(/([A-Z]{2})\s+(\d{5})/);
|
||||
if (stateZipMatch) {
|
||||
state = stateZipMatch[1];
|
||||
zip = stateZipMatch[2];
|
||||
}
|
||||
} else if (addressParts.length === 2) {
|
||||
street = addressParts[0];
|
||||
const cityStateZip = addressParts[1];
|
||||
|
||||
// Try to extract "City, AZ 85001" from second part
|
||||
const match = cityStateZip.match(/([^,]+),?\s+([A-Z]{2})\s+(\d{5})/);
|
||||
if (match) {
|
||||
city = match[1].trim();
|
||||
state = match[2];
|
||||
zip = match[3];
|
||||
}
|
||||
} else {
|
||||
// Single part address - try best effort
|
||||
street = address;
|
||||
const zipMatch = address.match(/\b(\d{5})\b/);
|
||||
if (zipMatch) zip = zipMatch[1];
|
||||
|
||||
const cityMatch = address.match(/,\s*([A-Za-z\s]+),\s*AZ/);
|
||||
if (cityMatch) city = cityMatch[1].trim();
|
||||
}
|
||||
|
||||
dispensaries.push({
|
||||
name,
|
||||
companyName,
|
||||
phone,
|
||||
street,
|
||||
city,
|
||||
state,
|
||||
zip,
|
||||
statusLine
|
||||
});
|
||||
|
||||
// Move to next record
|
||||
i += linesConsumed;
|
||||
}
|
||||
|
||||
console.log(`✅ Parsed ${dispensaries.length} dispensaries\n`);
|
||||
|
||||
if (dispensaries.length > 0) {
|
||||
console.log('📋 Sample of first 5:');
|
||||
console.table(dispensaries.slice(0, 5).map(d => ({
|
||||
name: d.name.substring(0, 30),
|
||||
city: d.city,
|
||||
phone: d.phone,
|
||||
zip: d.zip
|
||||
})));
|
||||
}
|
||||
|
||||
// Save to database
|
||||
console.log('\n💾 Saving to azdhs_list table...\n');
|
||||
|
||||
let savedCount = 0;
|
||||
let updatedCount = 0;
|
||||
let skippedCount = 0;
|
||||
|
||||
for (const disp of dispensaries) {
|
||||
if (!disp.name || disp.name.length < 3) {
|
||||
skippedCount++;
|
||||
continue;
|
||||
}
|
||||
|
||||
try {
|
||||
// Check if exists by name + address + state (to handle multiple locations with same name)
|
||||
const existing = await pool.query(
|
||||
'SELECT id FROM azdhs_list WHERE LOWER(name) = LOWER($1) AND LOWER(address) = LOWER($2) AND state = $3',
|
||||
[disp.name, disp.street, disp.state]
|
||||
);
|
||||
|
||||
const slug = disp.name.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, ''); // also trim leading/trailing hyphens
|
||||
const azdhsUrl = `https://azcarecheck.azdhs.gov/s/?name=${encodeURIComponent(disp.name)}`;
|
||||
|
||||
if (existing.rows.length > 0) {
|
||||
await pool.query(`
|
||||
UPDATE azdhs_list SET
|
||||
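-- The parser emits '' (not NULL) for missing fields, so NULLIF is needed
-- for COALESCE to fall back to the stored value.
company_name = COALESCE(NULLIF($1, ''), company_name),
address = COALESCE(NULLIF($2, ''), address),
city = COALESCE(NULLIF($3, ''), city),
zip = COALESCE(NULLIF($4, ''), zip),
phone = COALESCE(NULLIF($5, ''), phone),
status_line = COALESCE(NULLIF($6, ''), status_line),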
|
||||
updated_at = CURRENT_TIMESTAMP
|
||||
WHERE id = $7
|
||||
`, [disp.companyName, disp.street, disp.city, disp.zip, disp.phone, disp.statusLine, existing.rows[0].id]);
|
||||
updatedCount++;
|
||||
} else {
|
||||
await pool.query(`
|
||||
INSERT INTO azdhs_list (
|
||||
name, company_name, slug, address, city, state, zip, phone, status_line, azdhs_url,
|
||||
created_at, updated_at
|
||||
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
|
||||
`, [disp.name, disp.companyName, slug, disp.street, disp.city, disp.state, disp.zip, disp.phone, disp.statusLine, azdhsUrl]);
|
||||
savedCount++;
|
||||
}
|
||||
} catch (error) {
|
||||
console.error(`Error saving ${disp.name}: ${error}`);
|
||||
skippedCount++;
|
||||
}
|
||||
}
|
||||
|
||||
console.log(`\n✅ Saved ${savedCount} new AZDHS dispensaries`);
|
||||
console.log(`✅ Updated ${updatedCount} existing AZDHS dispensaries`);
|
||||
if (skippedCount > 0) console.log(`⚠️ Skipped ${skippedCount} entries`);
|
||||
|
||||
// Show total in azdhs_list
|
||||
const total = await pool.query(`SELECT COUNT(*) as total FROM azdhs_list`);
|
||||
console.log(`\n🎯 Total in azdhs_list table: ${total.rows[0].total}`);
|
||||
|
||||
// Show total in stores (for comparison)
|
||||
const storesTotal = await pool.query(`SELECT COUNT(*) as total FROM stores WHERE state = 'AZ'`);
|
||||
console.log(`🎯 Total in stores table (AZ): ${storesTotal.rows[0].total}`);
|
||||
|
||||
await pool.end();
|
||||
}
|
||||
|
||||
parseAZDHSCopiedData().catch(console.error);
|
||||
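Review note: `parse-azdhs-copied-data.ts` walks the file with a manual index and decides between the 4-line and 5-line record patterns by looking for "Operating" in the next line. A minimal sketch of an alternative is below: split on the `Get Details` sentinel and classify each chunk by its length. This is a different rule from the one the script uses, and `groupRecords` and `RawRecord` are hypothetical names.

```typescript
// Hypothetical helper: group copied lines into records using the
// "Get Details" line as an end-of-record sentinel.
interface RawRecord {
  name: string;
  companyName: string;
  statusLine: string;
  address: string;
}

function groupRecords(
  lines: string[],
  sentinel = 'Get Details'
): { records: RawRecord[]; malformed: string[][] } {
  const records: RawRecord[] = [];
  const malformed: string[][] = [];
  let chunk: string[] = [];

  for (const line of lines) {
    if (line !== sentinel) {
      chunk.push(line);
      continue;
    }
    if (chunk.length === 4) {
      // Pattern 1: Name, Company, Status, Address
      const [name, companyName, statusLine, address] = chunk;
      records.push({ name, companyName, statusLine, address });
    } else if (chunk.length === 3) {
      // Pattern 2: Name, Status, Address (company name = dispensary name)
      const [name, statusLine, address] = chunk;
      records.push({ name, companyName: name, statusLine, address });
    } else if (chunk.length > 0) {
      // Anything else is kept for inspection rather than silently dropped.
      malformed.push(chunk);
    }
    chunk = [];
  }

  // Trailing lines with no closing sentinel are also reported.
  if (chunk.length > 0) malformed.push(chunk);
  return { records, malformed };
}
```

Because a bad chunk is discarded as a whole, one malformed record cannot shift the alignment of the records that follow it, which the line-by-line `i++` recovery in the script can do.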
181
backend/archive/populate-jobs.ts
Normal file
@@ -0,0 +1,181 @@
|
||||
import { firefox } from 'playwright';
|
||||
import { pool } from './src/db/migrate.js';
|
||||
import { getRandomProxy } from './src/utils/proxyManager.js';
|
||||
|
||||
const dispensaryId = parseInt(process.argv[2] || '112', 10);
|
||||
|
||||
interface Brand {
|
||||
slug: string;
|
||||
name: string;
|
||||
url: string;
|
||||
}
|
||||
|
||||
async function scrapeBrandsList(menuUrl: string, context: any, page: any): Promise<Brand[]> {
|
||||
try {
|
||||
const brandsUrl = `${menuUrl}/brands`;
|
||||
console.log(`📄 Loading brands page: ${brandsUrl}`);
|
||||
|
||||
await page.goto(brandsUrl, {
|
||||
waitUntil: 'domcontentloaded',
|
||||
timeout: 60000
|
||||
});
|
||||
|
||||
console.log(`⏳ Waiting for brands to render...`);
|
||||
await page.waitForSelector('a[href*="/brands/"]', { timeout: 45000 });
|
||||
console.log(`✅ Brands appeared!`);
|
||||
await page.waitForTimeout(3000);
|
||||
|
||||
// Scroll to load all brands
|
||||
console.log(`📜 Scrolling to load all brands...`);
|
||||
let previousHeight = 0;
|
||||
let scrollAttempts = 0;
|
||||
const maxScrolls = 10;
|
||||
|
||||
while (scrollAttempts < maxScrolls) {
|
||||
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
|
||||
await page.waitForTimeout(1500);
|
||||
|
||||
const currentHeight = await page.evaluate(() => document.body.scrollHeight);
|
||||
if (currentHeight === previousHeight) break;
|
||||
|
||||
previousHeight = currentHeight;
|
||||
scrollAttempts++;
|
||||
}
|
||||
|
||||
// Extract all brand links
|
||||
const brands = await page.evaluate(() => {
|
||||
const brandLinks = Array.from(document.querySelectorAll('a[href*="/brands/"]'));
|
||||
|
||||
const extracted = brandLinks.map(link => {
|
||||
const href = link.getAttribute('href') || '';
|
||||
const slug = href.split('/brands/')[1]?.replace(/\/$/, '') || '';
|
||||
|
||||
// Get brand name from the ContentWrapper div to avoid placeholder letter duplication
|
||||
const contentWrapper = link.querySelector('[class*="ContentWrapper"]');
|
||||
const name = contentWrapper?.textContent?.trim() || link.textContent?.trim() || slug;
|
||||
|
||||
return {
|
||||
slug,
|
||||
name,
|
||||
url: new URL(href, window.location.href).href // resolve relative hrefs against the current page
|
||||
};
|
||||
});
|
||||
|
||||
// Filter out duplicates and invalid entries
|
||||
const seen = new Set();
|
||||
const unique = extracted.filter(b => {
|
||||
if (!b.slug || !b.name || seen.has(b.slug)) return false;
|
||||
seen.add(b.slug);
|
||||
return true;
|
||||
});
|
||||
|
||||
return unique;
|
||||
});
|
||||
|
||||
console.log(`✅ Found ${brands.length} total brands`);
|
||||
return brands;
|
||||
|
||||
} catch (error: any) {
|
||||
console.error(`❌ Error scraping brands list:`, error.message);
|
||||
return [];
|
||||
}
|
||||
}
|
||||
|
||||
async function main() {
|
||||
console.log(`\n${'='.repeat(60)}`);
|
||||
console.log(`🏭 POPULATING JOB QUEUE`);
|
||||
console.log(` Dispensary ID: ${dispensaryId}`);
|
||||
console.log(`${'='.repeat(60)}\n`);
|
||||
|
||||
// Get dispensary info
|
||||
const dispensaryResult = await pool.query(
|
||||
"SELECT id, name, menu_url FROM dispensaries WHERE id = $1",
|
||||
[dispensaryId]
|
||||
);
|
||||
|
||||
if (dispensaryResult.rows.length === 0) {
|
||||
console.error(`❌ Dispensary ID ${dispensaryId} not found`);
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const menuUrl = dispensaryResult.rows[0].menu_url;
|
||||
console.log(`✅ Dispensary: ${dispensaryResult.rows[0].name}`);
|
||||
console.log(` Menu URL: ${menuUrl}\n`);
|
||||
|
||||
// Get proxy
|
||||
const proxy = await getRandomProxy();
|
||||
if (!proxy) {
|
||||
console.log(`❌ No proxy available`);
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
console.log(`🔐 Using proxy: ${proxy.server}\n`);
|
||||
|
||||
// Launch browser
|
||||
const browser = await firefox.launch({ headless: true });
|
||||
|
||||
const context = await browser.newContext({
|
||||
viewport: { width: 1920, height: 1080 },
|
||||
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
|
||||
proxy: {
|
||||
server: proxy.server,
|
||||
username: proxy.username,
|
||||
password: proxy.password
|
||||
}
|
||||
});
|
||||
|
||||
const page = await context.newPage();
|
||||
|
||||
// Get all brands
|
||||
const allBrands = await scrapeBrandsList(menuUrl, context, page);
|
||||
|
||||
if (allBrands.length === 0) {
|
||||
console.log(`❌ No brands found`);
|
||||
await browser.close();
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
console.log(`\n📋 Found ${allBrands.length} brands. Populating job queue...\n`);
|
||||
|
||||
// Insert jobs into database
|
||||
let inserted = 0;
|
||||
let skipped = 0;
|
||||
|
||||
for (const brand of allBrands) {
|
||||
try {
|
||||
      const result = await pool.query(`
        INSERT INTO brand_scrape_jobs (dispensary_id, brand_slug, brand_name, status)
        VALUES ($1, $2, $3, 'pending')
        ON CONFLICT (dispensary_id, brand_slug) DO NOTHING
        RETURNING id
      `, [dispensaryId, brand.slug, brand.name]);

      // RETURNING yields a row only when the INSERT actually happened;
      // a conflict (job already queued) returns zero rows.
      if (result.rows.length > 0) {
        inserted++;
        if (inserted % 10 === 0) {
          console.log(`   Inserted ${inserted}/${allBrands.length} jobs...`);
        }
      } else {
        skipped++;
      }
|
||||
} catch (error: any) {
|
||||
console.error(`❌ Error inserting job for ${brand.name}:`, error.message);
|
||||
}
|
||||
}
|
||||
|
||||
console.log(`\n${'='.repeat(60)}`);
|
||||
console.log(`✅ JOB QUEUE POPULATED`);
|
||||
console.log(` Total brands: ${allBrands.length}`);
|
||||
console.log(` Jobs inserted: ${inserted}`);
|
||||
console.log(` Jobs skipped (already exist): ${skipped}`);
|
||||
console.log(`${'='.repeat(60)}\n`);
|
||||
|
||||
await browser.close();
|
||||
await pool.end();
|
||||
}
|
||||
|
||||
main().catch(console.error);
|
||||
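Review note: the brand extraction in `populate-jobs.ts` runs inside `page.evaluate`, which makes it hard to test without a browser. A minimal sketch of the same slug extraction and de-duplication as plain functions follows. `brandFromHref` and `dedupeBrands` are hypothetical names, and resolving relative links with `new URL(href, baseUrl)` is an assumption about what the `url` field is meant to hold.

```typescript
interface Brand {
  slug: string;
  name: string;
  url: string;
}

// Hypothetical helper: build a Brand from a link's href and visible text.
// Returns null when the href does not contain a "/brands/<slug>" segment.
function brandFromHref(href: string, name: string, baseUrl: string): Brand | null {
  const slug = href.split('/brands/')[1]?.replace(/\/$/, '') ?? '';
  if (!slug) return null;
  return {
    slug,
    name: name.trim() || slug,          // fall back to the slug when the text is empty
    url: new URL(href, baseUrl).href    // absolute URL even for relative hrefs
  };
}

// Hypothetical helper: keep the first occurrence of each slug.
function dedupeBrands(brands: Brand[]): Brand[] {
  const seen = new Set<string>();
  return brands.filter(b => {
    if (seen.has(b.slug)) return false;
    seen.add(b.slug);
    return true;
  });
}
```

With this split, the `page.evaluate` callback only needs to return `{ href, text }` pairs, and the mapping can run (and be unit-tested) in Node.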
155
backend/archive/populate-proxy-locations.ts
Normal file
@@ -0,0 +1,155 @@
|
||||
import { pool } from './src/db/migrate';
|
||||
import { logger } from './src/services/logger';
|
||||
|
||||
interface GeoLocation {
|
||||
status: string;
|
||||
country: string;
|
||||
countryCode: string;
|
||||
region: string;
|
||||
regionName: string;
|
||||
city: string;
|
||||
lat: number;
|
||||
lon: number;
|
||||
query: string;
|
||||
}
|
||||
|
||||
/**
|
||||
* Fetch geolocation data from ip-api.com (free, 45 req/min)
|
||||
*/
|
||||
async function fetchGeoLocation(ip: string): Promise<GeoLocation | null> {
|
||||
try {
|
||||
const response = await fetch(`http://ip-api.com/json/${ip}?fields=status,country,countryCode,region,regionName,city,lat,lon,query`);
|
||||
const data = await response.json();
|
||||
|
||||
if (data.status === 'success') {
|
||||
return data as GeoLocation;
|
||||
}
|
||||
|
||||
logger.warn('geo', `Failed to lookup ${ip}: ${data.message || 'Unknown error'}`);
|
||||
return null;
|
||||
} catch (error) {
|
||||
logger.error('geo', `Error fetching geolocation for ${ip}: ${error}`);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Fetch geolocation data in batches (up to 100 IPs at once)
|
||||
*/
|
||||
async function fetchGeoLocationBatch(ips: string[]): Promise<Map<string, GeoLocation>> {
|
||||
try {
|
||||
const response = await fetch('http://ip-api.com/batch?fields=status,country,countryCode,region,regionName,city,lat,lon,query', {
|
||||
method: 'POST',
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
},
|
||||
body: JSON.stringify(ips),
|
||||
});
|
||||
|
||||
const data = await response.json() as GeoLocation[];
|
||||
const results = new Map<string, GeoLocation>();
|
||||
|
||||
for (const item of data) {
|
||||
if (item.status === 'success') {
|
||||
results.set(item.query, item);
|
||||
}
|
||||
}
|
||||
|
||||
return results;
|
||||
} catch (error) {
|
||||
logger.error('geo', `Error fetching batch geolocation: ${error}`);
|
||||
return new Map();
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Sleep for specified milliseconds
|
||||
*/
|
||||
function sleep(ms: number): Promise<void> {
|
||||
return new Promise(resolve => setTimeout(resolve, ms));
|
||||
}
|
||||
|
||||
async function populateProxyLocations() {
|
||||
console.log('🌍 Populating proxy geolocation data...\n');
|
||||
|
||||
try {
|
||||
// Get all proxies that don't have location data
|
||||
const result = await pool.query(
|
||||
'SELECT id, host FROM proxies WHERE city IS NULL OR country IS NULL ORDER BY id'
|
||||
);
|
||||
|
||||
const proxies = result.rows;
|
||||
console.log(`Found ${proxies.length} proxies without geolocation data\n`);
|
||||
|
||||
if (proxies.length === 0) {
|
||||
console.log('✅ All proxies already have geolocation data!');
|
||||
// No pool.end() here: the finally block below closes the pool, and ending it twice throws.
|
||||
return;
|
||||
}
|
||||
|
||||
// Process in batches of 100 (API limit)
|
||||
const batchSize = 100;
|
||||
let processed = 0;
|
||||
let updated = 0;
|
||||
|
||||
for (let i = 0; i < proxies.length; i += batchSize) {
|
||||
const batch = proxies.slice(i, i + batchSize);
|
||||
const ips = batch.map(p => p.host);
|
||||
|
||||
console.log(`📍 Processing batch ${Math.floor(i / batchSize) + 1}/${Math.ceil(proxies.length / batchSize)} (${ips.length} IPs)...`);
|
||||
|
||||
const geoData = await fetchGeoLocationBatch(ips);
|
||||
|
||||
// Update database for each successful lookup
|
||||
for (const proxy of batch) {
|
||||
const geo = geoData.get(proxy.host);
|
||||
|
||||
if (geo) {
|
||||
await pool.query(
|
||||
`UPDATE proxies
|
||||
SET city = $1, state = $2, country = $3, country_code = $4, latitude = $5, longitude = $6, updated_at = NOW()
|
||||
WHERE id = $7`,
|
||||
[geo.city, geo.regionName, geo.country, geo.countryCode, geo.lat, geo.lon, proxy.id]
|
||||
);
|
||||
updated++;
|
||||
console.log(` ✓ ${proxy.host} -> ${geo.city}, ${geo.regionName}, ${geo.country}`);
|
||||
} else {
|
||||
console.log(` ✗ ${proxy.host} -> Lookup failed`);
|
||||
}
|
||||
|
||||
processed++;
|
||||
}
|
||||
|
||||
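// Rate limiting: ip-api.com documents a lower limit for /batch than for single lookups
// (15 requests/min at the time of writing; verify against the current docs),
// so wait 4.5 seconds between batches
if (i + batchSize < proxies.length) {
console.log('⏳ Waiting 4.5s to respect rate limits...\n');
await sleep(4500);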
|
||||
}
|
||||
}
|
||||
|
||||
console.log(`\n✅ Completed!`);
|
||||
console.log(` Processed: ${processed} proxies`);
|
||||
console.log(` Updated: ${updated} proxies`);
|
||||
console.log(` Failed: ${processed - updated} proxies\n`);
|
||||
|
||||
// Show location distribution
|
||||
const locationStats = await pool.query(`
|
||||
SELECT country, state, city, COUNT(*) as count
|
||||
FROM proxies
|
||||
WHERE country IS NOT NULL
|
||||
GROUP BY country, state, city
|
||||
ORDER BY count DESC
|
||||
LIMIT 20
|
||||
`);
|
||||
|
||||
console.log('📊 Top 20 proxy locations:');
|
||||
console.table(locationStats.rows);
|
||||
|
||||
} catch (error) {
|
||||
console.error('❌ Error:', error);
|
||||
} finally {
|
||||
await pool.end();
|
||||
}
|
||||
}
|
||||
|
||||
populateProxyLocations();
|
||||
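The batch loop in the script above (fixed-size batches with a pause between them, but not after the last one) can be sketched as a reusable helper. This is only an illustrative sketch: `chunk`, `sleep` and `processInBatches` are hypothetical names that do not appear in the commit, and the handler stands in for whatever lookup the real script performs.

```typescript
// Hypothetical helper (not part of the commit): split a list into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error('Batch size must be positive');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Promise-based delay, assumed to behave like the script's own sleep().
const sleep = (ms: number): Promise<void> =>
  new Promise(resolve => setTimeout(resolve, ms));

// Run `handler` on each batch in order, pausing `delayMs` between batches
// but not after the final one.
async function processInBatches<T, R>(
  items: T[],
  size: number,
  delayMs: number,
  handler: (batch: T[], index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  const batches = chunk(items, size);
  for (let index = 0; index < batches.length; index++) {
    results.push(await handler(batches[index], index));
    if (index < batches.length - 1) await sleep(delayMs);
  }
  return results;
}
```

With this shape, the script's loop would reduce to something like `processInBatches(proxies, 100, 1500, batch => fetchGeoLocationBatch(batch.map(p => p.host)))`, keeping the rate-limit pause in one place.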
165
backend/archive/preview-azdhs-import.ts
Normal file
@@ -0,0 +1,165 @@
|
||||
import * as fs from 'fs';
|
||||
|
||||
async function previewImport() {
|
||||
console.log('📋 Preview of AZDHS import data...\n');
|
||||
|
||||
const fileContent = fs.readFileSync('/home/kelly/Documents/azdhs dispos', 'utf-8');
|
||||
const lines = fileContent.split('\n').map(l => l.trim()).filter(l => l.length > 0);
|
||||
|
||||
const dispensaries: any[] = [];
|
||||
let i = 0;
|
||||
|
||||
while (i < lines.length) {
|
||||
if (lines[i] === 'Get Details') {
|
||||
i++;
|
||||
continue;
|
||||
}
|
||||
|
||||
if (i + 1 < lines.length && lines[i + 1] === 'Get Details') {
|
||||
i += 2;
|
||||
continue;
|
||||
}
|
||||
|
||||
let name = lines[i];
|
||||
let companyName = '';
|
||||
let statusLine = '';
|
||||
let address = '';
|
||||
let linesConsumed = 0;
|
||||
|
||||
const nextLine = lines[i + 1] || '';
|
||||
if (nextLine.includes('Operating')) {
|
||||
companyName = name;
|
||||
statusLine = lines[i + 1];
|
||||
address = lines[i + 2];
|
||||
const getDetails = lines[i + 3];
|
||||
if (getDetails !== 'Get Details') {
|
||||
i++;
|
||||
continue;
|
||||
}
|
||||
linesConsumed = 4;
|
||||
} else {
|
||||
companyName = lines[i + 1];
|
||||
statusLine = lines[i + 2];
|
||||
address = lines[i + 3];
|
||||
const getDetails = lines[i + 4];
|
||||
if (getDetails !== 'Get Details') {
|
||||
i++;
|
||||
continue;
|
||||
}
|
||||
linesConsumed = 5;
|
||||
}
|
||||
|
||||
// Parse phone from status line
|
||||
let phone = '';
|
||||
const phoneMatch = statusLine.match(/(\(\d{3}\)\s*\d{3}-\d{4}|\d{3}-\d{3}-\d{4}|\d{10})/);
|
||||
if (phoneMatch) {
|
||||
phone = phoneMatch[1].replace(/\D/g, '');
|
||||
}
|
||||
|
||||
// Parse operating status and entity type
|
||||
const statusParts = statusLine.split('·').map(p => p.trim());
|
||||
const operatingStatus = statusParts[0] || '';
|
||||
const entityType = statusParts[1] || '';
|
||||
|
||||
// Parse address components
|
||||
let street = '', city = '', state = 'AZ', zip = '';
|
||||
|
||||
const addressParts = address.split(',').map(p => p.trim());
|
||||
|
||||
if (addressParts.length >= 3) {
|
||||
street = addressParts.slice(0, -2).join(', ');
|
||||
city = addressParts[addressParts.length - 2];
|
||||
const stateZip = addressParts[addressParts.length - 1];
|
||||
const stateZipMatch = stateZip.match(/([A-Z]{2})\s+(\d{5})/);
|
||||
if (stateZipMatch) {
|
||||
state = stateZipMatch[1];
|
||||
zip = stateZipMatch[2];
|
||||
}
|
||||
} else if (addressParts.length === 2) {
|
||||
street = addressParts[0];
|
||||
const cityStateZip = addressParts[1];
|
||||
const match = cityStateZip.match(/([^,]+),?\s+([A-Z]{2})\s+(\d{5})/);
|
||||
if (match) {
|
||||
city = match[1].trim();
|
||||
state = match[2];
|
||||
zip = match[3];
|
||||
}
|
||||
} else {
|
||||
street = address;
|
||||
const zipMatch = address.match(/\b(\d{5})\b/);
|
||||
if (zipMatch) zip = zipMatch[1];
|
||||
const cityMatch = address.match(/,\s*([A-Za-z\s]+),\s*AZ/);
|
||||
if (cityMatch) city = cityMatch[1].trim();
|
||||
}
|
||||
|
||||
dispensaries.push({
|
||||
name,
|
||||
companyName,
|
||||
phone,
|
||||
street,
|
||||
city,
|
||||
state,
|
||||
zip,
|
||||
operatingStatus,
|
||||
entityType
|
||||
});
|
||||
|
||||
i += linesConsumed;
|
||||
}
|
||||
|
||||
console.log(`✅ Found ${dispensaries.length} dispensaries\n`);
|
||||
|
||||
// Show first 10
|
||||
console.log('📋 First 10 dispensaries:\n');
|
||||
console.table(dispensaries.slice(0, 10).map(d => ({
|
||||
name: d.name.substring(0, 30),
|
||||
company: d.companyName.substring(0, 30),
|
||||
city: d.city,
|
||||
phone: d.phone,
|
||||
status: d.operatingStatus,
|
||||
type: d.entityType.substring(0, 20)
|
||||
})));
|
||||
|
||||
// Show last 10
|
||||
console.log('\n📋 Last 10 dispensaries:\n');
|
||||
console.table(dispensaries.slice(-10).map(d => ({
|
||||
name: d.name.substring(0, 30),
|
||||
company: d.companyName.substring(0, 30),
|
||||
city: d.city,
|
||||
phone: d.phone,
|
||||
status: d.operatingStatus,
|
||||
type: d.entityType.substring(0, 20)
|
||||
})));
|
||||
|
||||
// Show counts by city
|
||||
const cityCounts: any = {};
|
||||
dispensaries.forEach(d => {
|
||||
cityCounts[d.city] = (cityCounts[d.city] || 0) + 1;
|
||||
});
|
||||
|
||||
console.log('\n📊 Dispensaries by city (top 10):\n');
|
||||
const sortedCities = Object.entries(cityCounts)
|
||||
.sort((a: any, b: any) => b[1] - a[1])
|
||||
.slice(0, 10);
|
||||
console.table(sortedCities.map(([city, count]) => ({ city, count })));
|
||||
|
||||
// Show entity types
|
||||
const entityTypes: any = {};
|
||||
dispensaries.forEach(d => {
|
||||
entityTypes[d.entityType] = (entityTypes[d.entityType] || 0) + 1;
|
||||
});
|
||||
|
||||
console.log('\n📊 Dispensaries by entity type:\n');
|
||||
console.table(Object.entries(entityTypes).map(([type, count]) => ({ type, count })));
|
||||
|
||||
// Show operating status
|
||||
const statuses: any = {};
|
||||
dispensaries.forEach(d => {
|
||||
statuses[d.operatingStatus] = (statuses[d.operatingStatus] || 0) + 1;
|
||||
});
|
||||
|
||||
console.log('\n📊 Dispensaries by operating status:\n');
|
||||
console.table(Object.entries(statuses).map(([status, count]) => ({ status, count })));
|
||||
}
|
||||
|
||||
previewImport();
|
||||
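The address-splitting logic in the preview script above can be isolated into a pure function, which makes it easier to check against sample lines before running an import. The sketch below is an assumption-laden illustration: `parseAddress` and `ParsedAddress` are hypothetical names, and the two-part regex is a slightly stricter (anchored, non-greedy) variant of the one in the script rather than a copy of it.

```typescript
// Hypothetical result shape (not part of the commit).
interface ParsedAddress {
  street: string;
  city: string;
  state: string;
  zip: string;
}

// Split "street, city, ST 12345"-style strings into components.
// Falls back to `defaultState` when no state code is found.
function parseAddress(address: string, defaultState = 'AZ'): ParsedAddress {
  const parsed: ParsedAddress = { street: '', city: '', state: defaultState, zip: '' };
  const parts = address.split(',').map(p => p.trim());

  if (parts.length >= 3) {
    // Everything before the last two parts is the street (may contain a suite).
    parsed.street = parts.slice(0, -2).join(', ');
    parsed.city = parts[parts.length - 2];
    const stateZip = parts[parts.length - 1].match(/([A-Z]{2})\s+(\d{5})/);
    if (stateZip) {
      parsed.state = stateZip[1];
      parsed.zip = stateZip[2];
    }
  } else if (parts.length === 2) {
    // "street, City ST 12345": the city is whatever precedes the state code.
    parsed.street = parts[0];
    const cityStateZip = parts[1].match(/^(.+?)\s+([A-Z]{2})\s+(\d{5})/);
    if (cityStateZip) {
      parsed.city = cityStateZip[1].trim();
      parsed.state = cityStateZip[2];
      parsed.zip = cityStateZip[3];
    }
  } else {
    // No commas at all: keep the raw text and pick out a ZIP if present.
    parsed.street = address.trim();
    const zip = address.match(/\b(\d{5})\b/);
    if (zip) parsed.zip = zip[1];
  }

  return parsed;
}
```

For example, `parseAddress('100 W Main St, Suite 4, Mesa, AZ 85201')` keeps the suite as part of the street and reads the city from the second-to-last segment.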
29
backend/archive/reset-jobs.ts
Normal file
@@ -0,0 +1,29 @@
import { pool } from './src/db/migrate.js';

async function resetJobs() {
  const result = await pool.query(`
    UPDATE brand_scrape_jobs
    SET status = 'pending',
        worker_id = NULL,
        started_at = NULL,
        completed_at = NULL,
        products_found = 0,
        products_saved = 0,
        error_message = NULL,
        retry_count = 0
    WHERE dispensary_id = 112
  `);

  console.log(`✅ Reset ${result.rowCount} jobs to pending`);

  const count = await pool.query(
    'SELECT COUNT(*) FROM brand_scrape_jobs WHERE dispensary_id = 112 AND status = $1',
    ['pending']
  );

  console.log(`📊 Total pending jobs: ${count.rows[0].count}`);

  await pool.end();
}

resetJobs();
43
backend/archive/reset-one-job.ts
Normal file
@@ -0,0 +1,43 @@
import { pool } from './src/db/migrate.js';

async function main() {
  // Check job status
  const statusResult = await pool.query(`
    SELECT status, COUNT(*) as count
    FROM brand_scrape_jobs
    WHERE dispensary_id = 112
    GROUP BY status
    ORDER BY status
  `);

  console.log('\nJob Status Distribution:');
  statusResult.rows.forEach(row => {
    console.log(` ${row.status}: ${row.count}`);
  });

  // Reset one completed job to pending
  const resetResult = await pool.query(`
    UPDATE brand_scrape_jobs
    SET status = 'pending',
        worker_id = NULL,
        started_at = NULL,
        completed_at = NULL,
        products_found = NULL,
        products_saved = NULL,
        updated_at = NOW()
    WHERE dispensary_id = 112
      AND status = 'completed'
      AND brand_slug = 'select'
    RETURNING id, brand_name, status
  `);

  if (resetResult.rows.length > 0) {
    console.log(`\nReset job: ${resetResult.rows[0].brand_name} (ID: ${resetResult.rows[0].id})`);
  } else {
    console.log('\nNo job reset (Select brand not found in completed jobs)');
  }

  await pool.end();
}

main().catch(console.error);
45
backend/archive/run-location-migration.js
Normal file
@@ -0,0 +1,45 @@
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: process.env.DATABASE_URL || 'postgresql://kelly:kelly@localhost:5432/hub'
});

(async () => {
  try {
    console.log('🔄 Running location migration...');

    await pool.query(`
      ALTER TABLE proxies
      ADD COLUMN IF NOT EXISTS city VARCHAR(100),
      ADD COLUMN IF NOT EXISTS state VARCHAR(100),
      ADD COLUMN IF NOT EXISTS country VARCHAR(100),
      ADD COLUMN IF NOT EXISTS country_code VARCHAR(2),
      ADD COLUMN IF NOT EXISTS location_updated_at TIMESTAMP
    `);

    console.log('✅ Added columns to proxies table');

    await pool.query(`
      CREATE INDEX IF NOT EXISTS idx_proxies_location ON proxies(country_code, state, city)
    `);

    console.log('✅ Created location index');

    await pool.query(`
      ALTER TABLE failed_proxies
      ADD COLUMN IF NOT EXISTS city VARCHAR(100),
      ADD COLUMN IF NOT EXISTS state VARCHAR(100),
      ADD COLUMN IF NOT EXISTS country VARCHAR(100),
      ADD COLUMN IF NOT EXISTS country_code VARCHAR(2),
      ADD COLUMN IF NOT EXISTS location_updated_at TIMESTAMP
    `);

    console.log('✅ Added columns to failed_proxies table');
    console.log('✅ Migration complete!');

    process.exit(0);
  } catch (error) {
    console.error('❌ Migration failed:', error);
    process.exit(1);
  }
})();
28
backend/archive/run-logo-migration.js
Normal file
@@ -0,0 +1,28 @@
const { Pool } = require('pg');
const fs = require('fs');
const path = require('path');

const pool = new Pool({
  connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
});

async function runMigration() {
  const client = await pool.connect();

  try {
    const migrationPath = '/home/kelly/dutchie-menus/backend/migrations/010_store_logo.sql';
    const sql = fs.readFileSync(migrationPath, 'utf8');

    console.log('Running migration: 010_store_logo.sql');
    await client.query(sql);
    console.log('✅ Migration completed successfully');

  } catch (error) {
    console.error('❌ Migration failed:', error);
  } finally {
    client.release();
    pool.end();
  }
}

runMigration();
18
backend/archive/run-migration.ts
Normal file
@@ -0,0 +1,18 @@
import { pool } from './src/db/migrate.js';
import fs from 'fs';

async function runMigration() {
  try {
    const sql = fs.readFileSync('/home/kelly/dutchie-menus/backend/migrations/add-stock-columns.sql', 'utf8');
    console.log('Running migration: add-stock-columns.sql');
    await pool.query(sql);
    console.log('✅ Migration complete - stock_quantity and stock_status columns added');
    await pool.end();
  } catch (error: any) {
    console.error('❌ Migration failed:', error.message);
    await pool.end();
    process.exit(1);
  }
}

runMigration();
18
backend/archive/run-scrape-jobs-migration.ts
Normal file
@@ -0,0 +1,18 @@
import { pool } from './src/db/migrate.js';
import fs from 'fs';

async function runMigration() {
  try {
    const sql = fs.readFileSync('/home/kelly/dutchie-menus/backend/migrations/create-scrape-jobs.sql', 'utf8');
    console.log('Running migration: create-scrape-jobs.sql');
    await pool.query(sql);
    console.log('✅ Migration complete - brand_scrape_jobs table created');
    await pool.end();
  } catch (error: any) {
    console.error('❌ Migration failed:', error.message);
    await pool.end();
    process.exit(1);
  }
}

runMigration();
4042
backend/archive/schema-dump.sql
Normal file
File diff suppressed because it is too large
147
backend/archive/scrape-48th-brands.ts
Normal file
@@ -0,0 +1,147 @@
|
||||
import puppeteer from 'puppeteer-extra';
|
||||
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
|
||||
import { Pool } from 'pg';
|
||||
|
||||
puppeteer.use(StealthPlugin());
|
||||
|
||||
const pool = new Pool({
|
||||
connectionString: 'postgresql://sail:password@localhost:5432/dutchie_menus'
|
||||
});
|
||||
|
||||
async function main() {
|
||||
let browser;
|
||||
|
||||
try {
|
||||
console.log('STEP 2: Getting random proxy from pool...');
|
||||
const proxyResult = await pool.query(`
|
||||
SELECT host, port, protocol FROM proxies
|
||||
ORDER BY RANDOM() LIMIT 1
|
||||
`);
|
||||
|
||||
const proxy = proxyResult.rows[0];
|
||||
console.log(`✅ Selected proxy: ${proxy.host}:${proxy.port}\n`);
|
||||
|
||||
console.log('STEP 3: Launching browser with proxy + anti-fingerprint...');
|
||||
const proxyUrl = `${proxy.protocol}://${proxy.host}:${proxy.port}`;
|
||||
|
||||
browser = await puppeteer.launch({
|
||||
headless: true,
|
||||
args: [
|
||||
'--no-sandbox',
|
||||
'--disable-setuid-sandbox',
|
||||
`--proxy-server=${proxyUrl}`,
|
||||
'--disable-blink-features=AutomationControlled'
|
||||
]
|
||||
});
|
||||
|
||||
const page = await browser.newPage();
|
||||
|
||||
// Set Googlebot user-agent
|
||||
await page.setUserAgent('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
|
||||
console.log('✅ Set UA to Googlebot\n');
|
||||
|
||||
// Anti-fingerprint: spoof timezone, geolocation, remove webdriver
|
||||
await page.evaluateOnNewDocument(() => {
|
||||
// Timezone (Arizona)
|
||||
Object.defineProperty(Intl.DateTimeFormat.prototype, 'resolvedOptions', {
|
||||
value: function() { return { timeZone: 'America/Phoenix' }; }
|
||||
});
|
||||
|
||||
// Geolocation (Phoenix)
|
||||
Object.defineProperty(navigator, 'geolocation', {
|
||||
get: () => ({
|
||||
getCurrentPosition: (success: any) => {
|
||||
setTimeout(() => success({
|
||||
coords: { latitude: 33.4484, longitude: -112.0740, accuracy: 100 }
|
||||
}), 100);
|
||||
}
|
||||
})
|
||||
});
|
||||
|
||||
// Remove webdriver
|
||||
Object.defineProperty(navigator, 'webdriver', { get: () => false });
|
||||
});
|
||||
console.log('✅ Fingerprint spoofed (timezone=Arizona, geo=Phoenix, webdriver=hidden)\n');
|
||||
|
||||
console.log('STEP 4: Navigating to Curaleaf Phoenix Airport brands page...');
|
||||
const url = 'https://curaleaf.com/stores/curaleaf-dispensary-phoenix-airport/brands';
|
||||
console.log(`URL: ${url}\n`);
|
||||
|
||||
await page.goto(url, { waitUntil: 'networkidle2', timeout: 60000 });
|
||||
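await new Promise(resolve => setTimeout(resolve, 5000)); // plain delay; page.waitForTimeout() is not available in newer Puppeteer releases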
|
||||
|
||||
console.log('STEP 5: Scraping brand data from page...');
|
||||
|
||||
// Get page info for debugging
|
||||
const pageInfo = await page.evaluate(() => ({
|
||||
title: document.title,
|
||||
url: window.location.href,
|
||||
bodyLength: document.body.innerHTML.length
|
||||
}));
|
||||
|
||||
console.log(`Page title: "${pageInfo.title}"`);
|
||||
console.log(`Current URL: ${pageInfo.url}`);
|
||||
console.log(`Body HTML length: ${pageInfo.bodyLength} chars\n`);
|
||||
|
||||
// Scrape brands
|
||||
const brands = await page.evaluate(() => {
|
||||
// Try multiple selectors
|
||||
const selectors = [
|
||||
'[data-testid*="brand"]',
|
||||
'[class*="Brand"]',
|
||||
'[class*="brand"]',
|
||||
'a[href*="/brand/"]',
|
||||
'.brand-card',
|
||||
'.brand-item'
|
||||
];
|
||||
|
||||
const found = new Set<string>();
|
||||
|
||||
selectors.forEach(selector => {
|
||||
document.querySelectorAll(selector).forEach(el => {
|
||||
const text = el.textContent?.trim();
|
||||
if (text && text.length > 0 && text.length < 50) {
|
||||
found.add(text);
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
return Array.from(found);
|
||||
});
|
||||
|
||||
console.log(`✅ Found ${brands.length} brands:\n`);
|
||||
brands.forEach((b, i) => console.log(` ${i + 1}. ${b}`));
|
||||
|
||||
if (brands.length === 0) {
|
||||
console.log('\n⚠️ No brands found. Possible reasons:');
|
||||
console.log(' - IP/proxy is blocked');
|
||||
console.log(' - Page requires different selectors');
|
||||
console.log(' - Brands load asynchronously');
|
||||
return;
|
||||
}
|
||||
|
||||
console.log('\n\nSTEP 6: Saving brands to database...');
|
||||
|
||||
let saved = 0;
|
||||
for (const brand of brands) {
|
||||
try {
|
||||
await pool.query(`
|
||||
INSERT INTO products (store_id, name, brand, dutchie_url, in_stock)
|
||||
VALUES (1, $1, $2, $3, true)
|
||||
ON CONFLICT (store_id, name, brand) DO NOTHING
|
||||
`, [`${brand} Product`, brand, url]);
|
||||
saved++;
|
||||
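} catch (e: any) {
// Log insert failures instead of silently swallowing them
console.error(`  ✗ Failed to save ${brand}: ${e.message}`);
}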
|
||||
}
|
||||
|
||||
console.log(`✅ Saved ${saved} brands to database\n`);
|
||||
|
||||
} catch (error: any) {
|
||||
console.error('❌ ERROR:', error.message);
|
||||
} finally {
|
||||
if (browser) await browser.close();
|
||||
await pool.end();
|
||||
}
|
||||
}
|
||||
|
||||
main();
|
||||
212
backend/archive/scrape-azdhs-api.ts
Normal file
@@ -0,0 +1,212 @@
|
||||
import { chromium } from 'playwright-extra';
|
||||
import stealth from 'puppeteer-extra-plugin-stealth';
|
||||
import { pool } from './src/db/migrate';
|
||||
|
||||
chromium.use(stealth());
|
||||
|
||||
async function scrapeAZDHSAPI() {
|
||||
console.log('🏛️ Scraping AZDHS via API interception...\n');
|
||||
|
||||
const browser = await chromium.launch({
|
||||
headless: false,
|
||||
});
|
||||
|
||||
const context = await browser.newContext({
|
||||
viewport: { width: 1920, height: 1080 },
|
||||
});
|
||||
|
||||
const page = await context.newPage();
|
||||
|
||||
// Capture ALL API responses
|
||||
const allResponses: any[] = [];
|
||||
|
||||
page.on('response', async (response) => {
|
||||
const url = response.url();
|
||||
const contentType = response.headers()['content-type'] || '';
|
||||
|
||||
// Only capture JSON responses from azcarecheck domain
|
||||
if (url.includes('azcarecheck.azdhs.gov') && contentType.includes('json')) {
|
||||
try {
|
||||
const json = await response.json();
|
||||
console.log(`📡 Captured JSON from: ${url.substring(0, 80)}...`);
|
||||
allResponses.push({
|
||||
url,
|
||||
data: json,
|
||||
status: response.status()
|
||||
});
|
||||
} catch (e) {
|
||||
// Not valid JSON or couldn't parse
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
try {
|
||||
console.log('📄 Loading AZDHS page...');
|
||||
|
||||
await page.goto('https://azcarecheck.azdhs.gov/s/?facilityId=001t000000L0TApAAN', {
|
||||
waitUntil: 'networkidle',
|
||||
timeout: 120000
|
||||
});
|
||||
|
||||
console.log('⏳ Waiting 60 seconds to capture all API calls...\n');
|
||||
await page.waitForTimeout(60000);
|
||||
|
||||
console.log(`\n📊 Captured ${allResponses.length} JSON API responses\n`);
|
||||
|
||||
// Analyze responses to find dispensary data
|
||||
let dispensaryData: any[] = [];
|
||||
|
||||
for (const resp of allResponses) {
|
||||
const data = resp.data;
|
||||
|
||||
// Look for arrays that might contain dispensary data
|
||||
const checkForDispensaries = (obj: any, path = ''): any[] => {
|
||||
if (Array.isArray(obj) && obj.length > 50) {
|
||||
// Check if array items look like dispensaries
|
||||
const sample = obj[0];
|
||||
if (sample && typeof sample === 'object') {
|
||||
const keys = Object.keys(sample);
|
||||
if (keys.some(k => k.toLowerCase().includes('name') ||
|
||||
k.toLowerCase().includes('address') ||
|
||||
k.toLowerCase().includes('facility'))) {
|
||||
console.log(` ✅ Found potential dispensary array at ${path} with ${obj.length} items`);
|
||||
console.log(` Sample keys: ${keys.slice(0, 10).join(', ')}`);
|
||||
return obj;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if (typeof obj === 'object' && obj !== null) {
|
||||
for (const [key, value] of Object.entries(obj)) {
|
||||
const result = checkForDispensaries(value, `${path}.${key}`);
|
||||
if (result.length > 0) return result;
|
||||
}
|
||||
}
|
||||
|
||||
return [];
|
||||
};
|
||||
|
||||
const found = checkForDispensaries(data);
|
||||
if (found.length > 0) {
|
||||
dispensaryData = found;
|
||||
console.log(`\n🎯 Found dispensary data! ${found.length} entries`);
|
||||
console.log(` URL: ${resp.url}\n`);
|
||||
|
||||
// Show sample of first entry
|
||||
console.log('📋 Sample entry:');
|
||||
console.log(JSON.stringify(found[0], null, 2).substring(0, 500));
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if (dispensaryData.length === 0) {
|
||||
console.log('❌ Could not find dispensary data in API responses\n');
|
||||
console.log('🔍 All captured URLs:');
|
||||
allResponses.forEach((r, i) => {
|
||||
console.log(` ${i + 1}. ${r.url}`);
|
||||
});
|
||||
|
||||
// Save raw responses for manual inspection
|
||||
console.log('\n💾 Saving raw API responses to /tmp/azdhs-api-responses.json for inspection...');
|
||||
const fs = await import('fs'); // dynamic import works whether the script runs as CommonJS or as an ES module
|
||||
fs.writeFileSync('/tmp/azdhs-api-responses.json', JSON.stringify(allResponses, null, 2));
|
||||
|
||||
// browser.close() and pool.end() run in the finally block below; calling pool.end() here as well would end the pool twice.
|
||||
return;
|
||||
}
|
||||
|
||||
// Save to database
|
||||
console.log('\n💾 Saving AZDHS dispensaries to database...\n');
|
||||
|
||||
let savedCount = 0;
|
||||
let updatedCount = 0;
|
||||
let skippedCount = 0;
|
||||
|
||||
for (const item of dispensaryData) {
|
||||
try {
|
||||
// Extract fields - need to inspect the actual structure
|
||||
// Common Salesforce field patterns: Name, Name__c, FacilityName, etc.
|
||||
const name = item.Name || item.name || item.FacilityName || item.facility_name ||
|
||||
item.Name__c || item.dispensaryName || item.BusinessName;
|
||||
|
||||
const address = item.Address || item.address || item.Street || item.street ||
|
||||
item.Address__c || item.StreetAddress || item.street_address;
|
||||
|
||||
const city = item.City || item.city || item.City__c;
|
||||
const state = item.State || item.state || item.State__c || 'AZ';
|
||||
const zip = item.Zip || item.zip || item.ZipCode || item.zip_code || item.PostalCode || item.Zip__c;
|
||||
const phone = item.Phone || item.phone || item.PhoneNumber || item.phone_number || item.Phone__c;
|
||||
const email = item.Email || item.email || item.Email__c;
|
||||
const lat = item.Latitude || item.latitude || item.lat || item.Latitude__c;
|
||||
const lng = item.Longitude || item.longitude || item.lng || item.lon || item.Longitude__c;
|
||||
|
||||
if (!name || name.length < 3) {
|
||||
skippedCount++;
|
||||
continue;
|
||||
}
|
||||
|
||||
// Check if exists
|
||||
const existing = await pool.query(
|
||||
'SELECT id FROM stores WHERE LOWER(name) = LOWER($1) AND state = $2 AND data_source = $3',
|
||||
[name, state, 'azdhs']
|
||||
);
|
||||
|
||||
const slug = name.toLowerCase().replace(/[^a-z0-9]+/g, '-');
|
||||
const dutchieUrl = `https://azcarecheck.azdhs.gov/s/?name=${encodeURIComponent(name)}`;
|
||||
|
||||
if (existing.rows.length > 0) {
|
||||
await pool.query(`
|
||||
UPDATE stores SET
|
||||
address = COALESCE($1, address),
|
||||
city = COALESCE($2, city),
|
||||
zip = COALESCE($3, zip),
|
||||
phone = COALESCE($4, phone),
|
||||
email = COALESCE($5, email),
|
||||
latitude = COALESCE($6, latitude),
|
||||
longitude = COALESCE($7, longitude),
|
||||
updated_at = CURRENT_TIMESTAMP
|
||||
WHERE id = $8
|
||||
`, [address, city, zip, phone, email, lat, lng, existing.rows[0].id]);
|
||||
updatedCount++;
|
||||
} else {
|
||||
await pool.query(`
|
||||
INSERT INTO stores (
|
||||
name, slug, dutchie_url, address, city, state, zip, phone, email,
|
||||
latitude, longitude, data_source, active, created_at, updated_at
|
||||
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, 'azdhs', true, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
|
||||
`, [name, slug, dutchieUrl, address, city, state, zip, phone, email, lat, lng]);
|
||||
savedCount++;
|
||||
}
|
||||
} catch (error) {
|
||||
console.error(`Error saving: ${error}`);
|
||||
skippedCount++;
|
||||
}
|
||||
}
|
||||
|
||||
console.log(`\n✅ Saved ${savedCount} new AZDHS dispensaries`);
|
||||
console.log(`✅ Updated ${updatedCount} existing AZDHS dispensaries`);
|
||||
if (skippedCount > 0) console.log(`⚠️ Skipped ${skippedCount} entries`);
|
||||
|
||||
// Show totals by source
|
||||
const totals = await pool.query(`
|
||||
SELECT data_source, COUNT(*) as count
|
||||
FROM stores
|
||||
WHERE state = 'AZ'
|
||||
GROUP BY data_source
|
||||
ORDER BY data_source
|
||||
`);
|
||||
|
||||
console.log('\n📊 Arizona dispensaries by source:');
|
||||
console.table(totals.rows);
|
||||
|
||||
} catch (error) {
|
||||
console.error(`❌ Error: ${error}`);
|
||||
throw error;
|
||||
} finally {
|
||||
await browser.close();
|
||||
await pool.end();
|
||||
}
|
||||
}
|
||||
|
||||
scrapeAZDHSAPI();
|
||||
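The slug expression used in the script above, `name.toLowerCase().replace(/[^a-z0-9]+/g, '-')`, leaves a leading or trailing hyphen when a name starts or ends with punctuation (for example a name ending in a closing parenthesis). A minimal sketch of a variant that trims those hyphens follows; `slugify` is a hypothetical helper name, not something defined in the commit.

```typescript
// Hypothetical helper (not part of the commit): build a URL-friendly slug.
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // same substitution as in the script
    .replace(/^-+|-+$/g, '');    // additionally strip leading/trailing hyphens
}
```

Whether trimmed slugs are desirable depends on how existing `stores.slug` values were generated; switching to this variant on a populated table could produce slugs that no longer match rows saved earlier.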
193
backend/archive/scrape-azdhs-auto.ts
Normal file
@@ -0,0 +1,193 @@
|
||||
import { chromium } from 'playwright-extra';
|
||||
import stealth from 'puppeteer-extra-plugin-stealth';
|
||||
import { pool } from './src/db/migrate';
|
||||
|
||||
chromium.use(stealth());
|
||||
|
||||
async function scrapeAZDHSAuto() {
|
||||
console.log('🏛️ Scraping AZDHS - Automatic Mode\n');
|
||||
|
||||
const browser = await chromium.launch({
|
||||
headless: false, // Visible so you can see it working
|
||||
});
|
||||
|
||||
const context = await browser.newContext({
|
||||
viewport: { width: 1920, height: 1080 },
|
||||
});
|
||||
|
||||
const page = await context.newPage();
|
||||
|
||||
try {
|
||||
console.log('📄 Loading AZDHS page...');
|
||||
|
||||
await page.goto('https://azcarecheck.azdhs.gov/s/?facilityId=001t000000L0TApAAN', {
|
||||
waitUntil: 'domcontentloaded',
|
||||
timeout: 60000
|
||||
});
|
||||
|
||||
console.log('⏳ Waiting 30 seconds for page to fully load and for you to scroll...\n');
|
||||
await page.waitForTimeout(30000);
|
||||
|
||||
console.log('📦 Extracting all dispensaries from the page...\n');
|
||||
|
||||
// Extract all dispensaries
|
||||
const dispensaries = await page.evaluate(() => {
|
||||
const results: any[] = [];
|
||||
|
||||
// Look for all possible dispensary container elements
|
||||
const containers = document.querySelectorAll(
|
||||
'article, [class*="facility"], [class*="dispensary"], [class*="location"], ' +
|
||||
'.slds-card, lightning-card, [data-id], [data-facility-id]'
|
||||
);
|
||||
|
||||
containers.forEach((card) => {
|
||||
const disp: any = {};
|
||||
|
||||
// Get all text from the card
|
||||
const fullText = card.textContent?.trim() || '';
|
||||
disp.rawText = fullText.substring(0, 500);
|
||||
|
||||
// Try various selectors for name
|
||||
const nameSelectors = ['h3', 'h2', 'h4', '[class*="title"]', '[class*="name"]', 'strong', 'b'];
|
||||
for (const selector of nameSelectors) {
|
||||
const el = card.querySelector(selector);
|
||||
if (el && el.textContent && el.textContent.trim().length > 3) {
|
||||
disp.name = el.textContent.trim();
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
// Extract phone
|
||||
const phoneLink = card.querySelector('a[href^="tel:"]');
|
||||
if (phoneLink) {
|
||||
disp.phone = phoneLink.getAttribute('href')?.replace('tel:', '').replace(/\D/g, '');
|
||||
} else {
|
||||
// Look for phone pattern in text
|
||||
const phoneMatch = fullText.match(/(\d{3}[-.]?\d{3}[-.]?\d{4})/);
|
||||
if (phoneMatch) disp.phone = phoneMatch[1];
|
||||
}
|
||||
|
||||
// Extract email
|
||||
const emailLink = card.querySelector('a[href^="mailto:"]');
|
||||
if (emailLink) {
|
||||
disp.email = emailLink.getAttribute('href')?.replace('mailto:', '');
|
||||
}
|
||||
|
||||
// Extract address - look for street address pattern
|
||||
const addressMatch = fullText.match(/(\d+\s+[A-Za-z0-9\s]+(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr|Lane|Ln|Way|Circle|Court|Parkway)\.?(?:\s+(?:Suite|Ste|Unit|#)\s*[\w-]+)?)/i);
|
||||
if (addressMatch) {
|
||||
disp.address = addressMatch[1].trim();
|
||||
}
|
||||
|
||||
// Extract city
|
||||
const cityMatch = fullText.match(/([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*),\s*AZ/);
|
||||
if (cityMatch) {
|
||||
disp.city = cityMatch[1];
|
||||
}
|
||||
|
||||
// Extract ZIP
|
||||
const zipMatch = fullText.match(/\b(\d{5})(?:-\d{4})?\b/);
|
||||
if (zipMatch) {
|
||||
disp.zip = zipMatch[1];
|
||||
}
|
||||
|
||||
// Only add if we found at least a name
|
||||
if (disp.name && disp.name.length > 3) {
|
||||
results.push(disp);
|
||||
}
|
||||
});
|
||||
|
||||
return results;
|
||||
});
|
||||
|
||||
console.log(`✅ Found ${dispensaries.length} dispensary entries!\n`);
|
||||
|
||||
if (dispensaries.length > 0) {
|
||||
console.log('📋 Sample of first 5:');
|
||||
console.table(dispensaries.slice(0, 5).map(d => ({
|
||||
name: d.name?.substring(0, 40),
|
||||
phone: d.phone,
|
||||
city: d.city,
|
||||
})));
|
||||
}
|
||||
|
||||
// Save to database
|
||||
console.log('\n💾 Saving to database with data_source="azdhs"...\n');
|
||||
|
||||
let savedCount = 0;
|
||||
let updatedCount = 0;
|
||||
let skippedCount = 0;
|
||||
|
||||
for (const disp of dispensaries) {
|
||||
if (!disp.name) {
|
||||
skippedCount++;
|
||||
continue;
|
||||
}
|
||||
|
||||
try {
|
||||
// Check if exists by name
|
||||
const existing = await pool.query(
|
||||
'SELECT id FROM stores WHERE LOWER(name) = LOWER($1) AND state = $2 AND data_source = $3',
|
||||
[disp.name, 'AZ', 'azdhs']
|
||||
);
|
||||
|
||||
const slug = disp.name.toLowerCase().replace(/[^a-z0-9]+/g, '-');
|
||||
const dutchieUrl = `https://azcarecheck.azdhs.gov/s/?name=${encodeURIComponent(disp.name)}`;
|
||||
|
||||
if (existing.rows.length > 0) {
|
||||
await pool.query(`
|
||||
UPDATE stores SET
|
||||
address = COALESCE($1, address),
|
||||
city = COALESCE($2, city),
|
||||
zip = COALESCE($3, zip),
|
||||
phone = COALESCE($4, phone),
|
||||
email = COALESCE($5, email),
|
||||
updated_at = CURRENT_TIMESTAMP
|
||||
WHERE id = $6
|
||||
`, [disp.address, disp.city, disp.zip, disp.phone, disp.email, existing.rows[0].id]);
|
||||
updatedCount++;
|
||||
} else {
|
||||
await pool.query(`
|
||||
INSERT INTO stores (
|
||||
name, slug, dutchie_url, address, city, state, zip, phone, email,
|
||||
data_source, active, created_at, updated_at
|
||||
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, 'azdhs', true, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
|
||||
`, [disp.name, slug, dutchieUrl, disp.address, disp.city, 'AZ', disp.zip, disp.phone, disp.email]);
|
||||
savedCount++;
|
||||
}
|
||||
} catch (error) {
|
||||
console.error(`Error saving ${disp.name}: ${error}`);
|
||||
skippedCount++;
|
||||
}
|
||||
}
|
||||
|
||||
console.log(`\n✅ Saved ${savedCount} new AZDHS dispensaries`);
|
||||
console.log(`✅ Updated ${updatedCount} existing AZDHS dispensaries`);
|
||||
if (skippedCount > 0) console.log(`⚠️ Skipped ${skippedCount} entries`);
|
||||
|
||||
// Show totals by source
|
||||
const totals = await pool.query(`
|
||||
SELECT data_source, COUNT(*) as count
|
||||
FROM stores
|
||||
WHERE state = 'AZ'
|
||||
GROUP BY data_source
|
||||
ORDER BY data_source
|
||||
`);
|
||||
|
||||
console.log('\n📊 Arizona dispensaries by source:');
|
||||
console.table(totals.rows);
|
||||
|
||||
console.log('\n✅ AZDHS scraping complete!');
|
||||
|
||||
} catch (error) {
|
||||
console.error(`❌ Error: ${error}`);
|
||||
throw error;
|
||||
} finally {
|
||||
console.log('\n👉 Browser will close in 5 seconds...');
|
||||
await page.waitForTimeout(5000);
|
||||
await browser.close();
|
||||
await pool.end();
|
||||
}
|
||||
}
|
||||
|
||||
scrapeAZDHSAuto();
|
||||
108
backend/archive/scrape-azdhs-better.ts
Normal file
@@ -0,0 +1,108 @@
import { chromium } from 'playwright-extra';
import stealth from 'puppeteer-extra-plugin-stealth';
import { pool } from './src/db/migrate';

chromium.use(stealth());

async function scrapeAZDHSBetter() {
  console.log('🏛️ Scraping AZDHS official map (improved approach)...\n');

  const browser = await chromium.launch({
    headless: false,
  });

  const context = await browser.newContext({
    viewport: { width: 1920, height: 1080 },
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  });

  const page = await context.newPage();

  // Capture API requests
  const apiData: any[] = [];
  page.on('response', async (response) => {
    const url = response.url();
    if (url.includes('dispensar') || url.includes('facility') || url.includes('location')) {
      try {
        const json = await response.json();
        console.log(`📡 Captured API response from: ${url.substring(0, 100)}...`);
        apiData.push({ url, data: json });
      } catch (e) {
        // Not JSON
      }
    }
  });

  try {
    console.log('📄 Loading AZDHS page (waiting up to 60s for JavaScript)...');

    await page.goto('https://azcarecheck.azdhs.gov/s/?facilityId=001t000000L0TApAAN', {
      waitUntil: 'domcontentloaded',
      timeout: 60000
    });

    // Wait longer for JavaScript to execute
    console.log('⏳ Waiting 20 seconds for Salesforce to fully load...');
    await page.waitForTimeout(20000);

    // Try to find and click "View All" or expand the map
    console.log('🔍 Looking for buttons to expand results...');

    const viewAllButton = page.locator('button:has-text("View All"), button:has-text("Show All"), a:has-text("View All")').first();
    if (await viewAllButton.isVisible().catch(() => false)) {
      console.log(' ✅ Found View All button, clicking...');
      await viewAllButton.click();
      await page.waitForTimeout(5000);
    }

    // Try extracting data directly from page
    console.log('\n📦 Extracting dispensary data from page...');

    const dispensaries = await page.evaluate(() => {
      const results: any[] = [];

      // Look for various data patterns
      const elements = document.querySelectorAll('[data-facility], [data-location], article, .facility, .location, .dispensary');

      elements.forEach((el) => {
        const text = el.textContent || '';

        // Try to extract structured data
        if (text.length > 20 && text.length < 500) {
          // Look for name patterns
          const nameMatch = text.match(/([A-Z][a-z]+(?:\s+[A-Z][a-z]+){1,5})/);
          if (nameMatch) {
            results.push({
              rawText: text.substring(0, 200),
              element: el.className,
            });
          }
        }
      });

      return results;
    });

    console.log(`\n📊 Found ${dispensaries.length} potential dispensary elements`);
    console.log(`📊 Captured ${apiData.length} API responses`);

    if (apiData.length > 0) {
      console.log('\n🎯 Analyzing API data...');
      console.log(JSON.stringify(apiData[0], null, 2).substring(0, 1000));
    }

    if (dispensaries.length > 0) {
      console.log('\n📋 Sample dispensary elements:');
      console.log(dispensaries.slice(0, 3));
    }

  } catch (error) {
    console.error(`❌ Error: ${error}`);
    throw error;
  } finally {
    await browser.close();
    await pool.end();
  }
}

scrapeAZDHSBetter();
Some files were not shown because too many files have changed in this diff.