Session Summary - Age Gate Bypass Implementation
Problem
Brands weren't populating when clicking "Scrape Store" for Curaleaf dispensaries. Root cause: category discovery was finding 0 categories because age verification gates were blocking access to the Dutchie menu.
What Was Accomplished
1. Created Universal Age Gate Bypass System
File: /home/kelly/dutchie-menus/backend/src/utils/age-gate.ts
Three main functions:
- setAgeGateCookies(page, url, state) - Sets age gate bypass cookies BEFORE navigation
- bypassAgeGate(page, state) - Attempts to bypass the age gate AFTER page load using multiple methods
- detectStateFromUrl(url) - Auto-detects state from URL patterns
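For reference, a minimal sketch of what URL-based state detection could look like. The real detectStateFromUrl in age-gate.ts is not reproduced in this summary, so the name detectStateFromUrlSketch, the token table, and the null fallback are all assumptions for illustration.

```typescript
// Hypothetical token table: only a few states, and only tokens that are
// unlikely to collide with ordinary English words in a URL path.
const STATE_TOKENS = new Map<string, string>([
  ["az", "Arizona"],
  ["arizona", "Arizona"],
  ["fl", "Florida"],
  ["florida", "Florida"],
  ["il", "Illinois"],
  ["illinois", "Illinois"],
]);

// Returns the first state found in the URL path, or null if none matches.
function detectStateFromUrlSketch(url: string): string | null {
  let pathname: string;
  try {
    pathname = new URL(url).pathname;
  } catch {
    return null; // not a parseable absolute URL
  }
  // Split the path into lowercase alphabetic tokens: "/curaleaf-az-48th" -> ["curaleaf", "az", "th"]
  const tokens = pathname.toLowerCase().split(/[^a-z]+/).filter(Boolean);
  for (const token of tokens) {
    const state = STATE_TOKENS.get(token);
    if (state !== undefined) return state;
  }
  return null;
}
```

Matching whole tokens rather than substrings avoids false hits such as "az" inside "plaza". Two-letter codes like "in", "or", or "me" would need extra care and are left out of the sketch.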
Key Features:
- Multiple bypass strategies: custom dropdowns, standard selects, buttons, state cards
- Enhanced React event dispatching (mousedown, mouseup, click, change, input)
- Proper failure detection - checks final URL to verify success
- Cookie-based approach as primary method
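A minimal sketch of the final-URL check described above. It assumes a gate can be recognized by "/age-gate" in the path, as in the Curaleaf URL seen this session; the function names and result shape are hypothetical, not the ones in age-gate.ts.

```typescript
interface BypassResult {
  success: boolean;
  finalUrl: string;
  reason?: string;
}

// Assumption: the gate lives at a path containing "/age-gate".
// Only the path is checked, so a query string that mentions the gate does not count.
function isAgeGateUrl(url: string): boolean {
  try {
    return new URL(url).pathname.toLowerCase().includes("/age-gate");
  } catch {
    return false; // unparseable URL: cannot be identified as a gate
  }
}

// Decide success from where the browser ended up, not from whether a click was sent.
function evaluateBypass(finalUrl: string): BypassResult {
  if (isAgeGateUrl(finalUrl)) {
    return { success: false, finalUrl, reason: "still on age gate URL" };
  }
  return { success: true, finalUrl };
}
```

With Puppeteer, the input would be page.url() read after the bypass attempt, e.g. evaluateBypass(page.url()).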
2. Updated Category Discovery
File: /home/kelly/dutchie-menus/backend/src/services/category-discovery.ts
Changes at lines 120-129:
// Set age gate bypass cookies BEFORE navigation
const state = detectStateFromUrl(baseUrl);
await setAgeGateCookies(page, baseUrl, state);
await page.goto(baseUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
// If age gate still appears, try to bypass it
await bypassAgeGate(page, state);
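Cookies set before navigation have to be scoped to the target host or the browser won't send them on the first request. A sketch of building such cookies in the shape Puppeteer's page.setCookie accepts. The cookie names age_verified and selected_state are made up for illustration; the names Curaleaf actually reads are unknown (the ones tried this session were ignored).

```typescript
// Subset of the fields Puppeteer's cookie parameters allow.
interface CookieSketch {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number; // seconds since the UNIX epoch
}

const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60;

// Build bypass cookies for the host of `url`. Cookie names are hypothetical.
function buildAgeGateCookies(
  url: string,
  state: string,
  nowMs: number = Date.now()
): CookieSketch[] {
  const { hostname } = new URL(url);
  const base = {
    domain: hostname,
    path: "/",
    expires: Math.floor(nowMs / 1000) + THIRTY_DAYS_SECONDS,
  };
  return [
    { ...base, name: "age_verified", value: "true" },
    { ...base, name: "selected_state", value: state },
  ];
}
```

Usage would be along the lines of await page.setCookie(...buildAgeGateCookies(baseUrl, state)) before page.goto.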
3. Updated Scraper
File: /home/kelly/dutchie-menus/backend/src/services/scraper.ts
Changes at lines 379-392:
- Imports setAgeGateCookies (line 10)
- Sets cookies before navigation (line 381)
- Attempts bypass if age gate still appears (line 392)
4. Test Scripts Created
test-improved-age-gate.ts - Tests cookie-based bypass approach
capture-age-gate-cookies.ts - Opens visible browser to manually capture real cookies (requires X11)
debug-age-gate-detailed.ts - Visible browser debugging
debug-after-state-select.ts - Checks state after selecting Arizona
Current Status
✅ Working
- Age gate detection properly identifies gates
- Cookie setting function works (cookies are set correctly)
- Multiple bypass methods attempt to click through gates
- Failure detection now accurately reports when bypass fails
- Category discovery successfully created 10 categories for Curaleaf store (ID 18)
❌ Not Working
Curaleaf's Specific Age Gate:
- URL: https://curaleaf.com/age-gate?returnurl=...
- Uses React-based custom dropdown (shadcn/radix UI)
- Ignores cookies we set
- Doesn't respond to automated clicks
- Doesn't trigger navigation after state selection
Why It Fails:
- Curaleaf's React implementation likely checks for:
  - Real user interaction patterns
  - Additional signals beyond cookies (session storage, local storage)
  - Specific event sequences that automation doesn't replicate
- Automated clicks don't trigger the React state changes needed
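One way to test the "additional signals" theory: snapshot localStorage/sessionStorage before and after completing the gate by hand, then diff the two. A sketch of the diff step only. The names are hypothetical, and nothing here has been run against Curaleaf.

```typescript
type StorageSnapshot = Record<string, string>;

interface StorageDiff {
  added: string[];
  changed: string[];
  removed: string[];
}

function hasKey(snapshot: StorageSnapshot, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(snapshot, key);
}

// Compare two key/value snapshots and report which keys the gate touched.
function diffStorage(before: StorageSnapshot, after: StorageSnapshot): StorageDiff {
  const added: string[] = [];
  const changed: string[] = [];
  const removed: string[] = [];
  for (const key of Object.keys(after)) {
    if (!hasKey(before, key)) added.push(key);
    else if (before[key] !== after[key]) changed.push(key);
  }
  for (const key of Object.keys(before)) {
    if (!hasKey(after, key)) removed.push(key);
  }
  return { added: added.sort(), changed: changed.sort(), removed: removed.sort() };
}
```

A snapshot could be taken in the page with something like page.evaluate(() => Object.fromEntries(Object.entries(localStorage))). Any keys that show up under added or changed are candidates to replicate alongside the cookies.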
Test Results
# Latest test - Cookie approach
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx test-improved-age-gate.ts
Result: ❌ FAILED
- Cookies set successfully
- Page still redirects to /age-gate
- Click automation finds elements but navigation doesn't occur
Database State
Categories created: 10 categories for store_id 18 (Curaleaf - 48th Street)
- Categories are in the database and ready
- Dutchie menu detection works
- Category URLs are properly formatted
Brands: Not tested yet (can't scrape without bypassing age gate)
Files Modified
- /home/kelly/dutchie-menus/backend/src/utils/age-gate.ts - Created
- /home/kelly/dutchie-menus/backend/src/services/category-discovery.ts - Updated
- /home/kelly/dutchie-menus/backend/src/services/scraper.ts - Updated
- /home/kelly/dutchie-menus/backend/curaleaf-cookies.json - Test cookies (don't work)
Next Steps / Options
Option 1: Test with Different Store
Find a Dutchie dispensary with a simpler age gate that responds to automation.
Option 2: Get Real Cookies
Run capture-age-gate-cookies.ts on a machine with display/X11:
# On machine with display
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx capture-age-gate-cookies.ts
# Manually complete age gate in browser
# Press ENTER to capture real cookies
# Copy cookies to production server
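Captured cookies can go stale between capture and use on the server. A sketch of filtering them before loading, assuming the capture file is a JSON array of Puppeteer-style cookie objects (Puppeteer reports expires in seconds since the epoch, with -1 for session cookies). The function name and the assumed file shape are hypothetical.

```typescript
interface CapturedCookie {
  name: string;
  value: string;
  domain: string;
  path?: string;
  expires?: number; // seconds since the UNIX epoch; -1 or missing means session cookie
}

// Keep cookies that apply to `host` and have not expired yet.
function usableCookies(
  cookies: CapturedCookie[],
  host: string,
  nowMs: number = Date.now()
): CapturedCookie[] {
  const nowSec = nowMs / 1000;
  const h = host.toLowerCase();
  return cookies.filter((c) => {
    // A leading dot on the domain means "this domain and its subdomains".
    const domain = c.domain.replace(/^\./, "").toLowerCase();
    const domainMatches = h === domain || h.endsWith("." + domain);
    const expires = c.expires ?? -1;
    const isSession = expires < 0;
    return domainMatches && (isSession || expires > nowSec);
  });
}
```

If this returns an empty list for curaleaf.com, the capture needs to be redone before blaming the scraper.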
Option 3: Switch to Playwright
Playwright may handle React apps better than Puppeteer. Consider migrating the age gate bypass to use Playwright.
Option 4: Use Proxy with Session Persistence
Use a rotating proxy service with sticky sessions, so the same session state is kept between requests.
Option 5: Focus on Other Stores First
Skip Curaleaf for now and implement scraping for dispensaries without complex age gates, then circle back.
How to Continue
- Check if categories exist:
  SELECT * FROM categories WHERE store_id = 18;
- Test scraping (will fail at age gate):
  # Via UI: Click "Scrape Store" button for Curaleaf
  # OR via test script
- Try different store:
  - Find another Dutchie dispensary
  - Add it to the stores table
  - Run category discovery
  - Test scraping
Important Notes
- All cannabis dispensary sites will have age gates (as you correctly noted)
- The bypass infrastructure is in place and should work for simpler gates (not yet tested against one)
- Curaleaf's gate specifically uses a React-based custom dropdown that did not respond to our automation
- Categories ARE created successfully despite the age gate (using fallback detection)
- The scraper should work once we can bypass the age gate (brand scraping is still untested)
Key Insight
Category discovery succeeded because it checks the page source to detect Dutchie, then uses predefined categories. Product scraping requires actually viewing the product listings, which can't happen while stuck at the age gate.
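A sketch of that fallback path, to make the difference concrete. The "dutchie" marker check, the category slugs, and the /products/ URL shape are assumptions for illustration; the real category-discovery.ts logic and its 10 categories are not reproduced in this summary.

```typescript
// Hypothetical predefined category slugs (the real list has 10 entries).
const PREDEFINED_CATEGORIES: readonly string[] = [
  "flower",
  "pre-rolls",
  "vaporizers",
  "concentrates",
  "edibles",
];

// Assumption: a Dutchie-backed page mentions "dutchie" somewhere in its source.
function looksLikeDutchieMenu(html: string): boolean {
  return /dutchie/i.test(html);
}

// Build category URLs from the base URL without visiting any product page.
function buildCategoryUrls(
  baseUrl: string,
  slugs: readonly string[] = PREDEFINED_CATEGORIES
): string[] {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return slugs.map((slug) => `${root}/products/${slug}`);
}

// Fallback discovery: needs only the page source, never the rendered listings.
function discoverCategoriesSketch(html: string, baseUrl: string): string[] {
  return looksLikeDutchieMenu(html) ? buildCategoryUrls(baseUrl) : [];
}
```

Nothing in this path needs the product listings to render, which is why it can succeed while the scraper stays stuck at /age-gate.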
Commands to Resume
# Start backend (if not running)
cd /home/kelly/dutchie-menus/backend
PORT=3012 DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npm run dev
# Check categories
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx -e "
import { pool } from './src/db/migrate';
const result = await pool.query('SELECT * FROM categories WHERE store_id = 18');
console.log(result.rows);
process.exit(0);
"
# Test age gate bypass
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx test-improved-age-gate.ts
What to Tell Claude When Resuming
"Continue working on the age gate bypass for Curaleaf. We have categories created but can't scrape products because the age gate blocks us. The cookie approach didn't work. Options: try a different store, get real cookies, or switch to Playwright."