cannaiq/backend/SESSION_SUMMARY.md
2025-11-28 19:45:44 -07:00

Session Summary - Age Gate Bypass Implementation

Problem

Brands weren't populating when clicking "Scrape Store" for Curaleaf dispensaries. Root cause: category discovery was finding 0 categories because age verification gates were blocking access to the Dutchie menu.

What Was Accomplished

1. Created Universal Age Gate Bypass System

File: /home/kelly/dutchie-menus/backend/src/utils/age-gate.ts

Three main functions:

  • setAgeGateCookies(page, url, state) - Sets age gate bypass cookies BEFORE navigation
  • bypassAgeGate(page, state) - Attempts to bypass age gate AFTER page load using multiple methods
  • detectStateFromUrl(url) - Auto-detects state from URL patterns
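As an illustration of what URL-pattern state detection can look like, here is a minimal sketch. It is not the code in age-gate.ts: the function name detectStateFromUrlSketch, the pattern table, the example URLs, and the choice to return null when nothing matches are all assumptions made for the example.

```typescript
// Hypothetical pattern table: each entry maps a URL path fragment to a state name.
// The real detectStateFromUrl in age-gate.ts may use entirely different patterns.
const STATE_PATTERNS: Array<[RegExp, string]> = [
  [/[-/](az|arizona)([-/]|$)/i, "Arizona"],
  [/[-/](mi|michigan)([-/]|$)/i, "Michigan"],
  [/[-/](il|illinois)([-/]|$)/i, "Illinois"],
];

// Returns the first state whose pattern matches the URL path, or null if none match
// or the input is not a parseable absolute URL.
function detectStateFromUrlSketch(url: string): string | null {
  let path: string;
  try {
    path = new URL(url).pathname;
  } catch (_err) {
    return null;
  }
  for (const [pattern, state] of STATE_PATTERNS) {
    if (pattern.test(path)) {
      return state;
    }
  }
  return null;
}
```

Matching only against the path (not the query string) keeps a returnurl parameter from influencing the result; whether that is the right behaviour for every store is an open question.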

Key Features:

  • Multiple bypass strategies: custom dropdowns, standard selects, buttons, state cards
  • Enhanced React event dispatching (mousedown, mouseup, click, change, input)
  • Proper failure detection - checks final URL to verify success
  • Cookie-based approach as primary method
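The "checks final URL" step can be sketched as a small pure function. This is an assumption about how such a check might be written, not a copy of age-gate.ts; the name isAgeGateUrl is hypothetical, and the only detail taken from these notes is that Curaleaf's gate lives at a path segment named age-gate.

```typescript
// Hypothetical helper: returns true when the URL still points at an age gate page,
// i.e. when a bypass attempt should be reported as failed.
function isAgeGateUrl(finalUrl: string): boolean {
  try {
    const { pathname } = new URL(finalUrl);
    // Compare whole path segments so that e.g. "/blog/age-gate-news" does not match.
    return pathname.toLowerCase().split("/").includes("age-gate");
  } catch (_err) {
    // An unparseable URL is treated as a failure, the conservative choice.
    return true;
  }
}
```

In a Puppeteer flow this would typically be called with page.url() after the bypass attempt, so that a redirect back to the gate is reported as a failure rather than silently ignored.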

2. Updated Category Discovery

File: /home/kelly/dutchie-menus/backend/src/services/category-discovery.ts

Changes at lines 120-129:

// Set age gate bypass cookies BEFORE navigation
const state = detectStateFromUrl(baseUrl);
await setAgeGateCookies(page, baseUrl, state);

await page.goto(baseUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });

// If age gate still appears, try to bypass it
await bypassAgeGate(page, state);

3. Updated Scraper

File: /home/kelly/dutchie-menus/backend/src/services/scraper.ts

Changes at lines 379-392 (plus the import at line 10):

  • Imports setAgeGateCookies (line 10)
  • Sets cookies before navigation (line 381)
  • Attempts bypass if age gate still appears (line 392)

4. Test Scripts Created

  • test-improved-age-gate.ts - Tests cookie-based bypass approach
  • capture-age-gate-cookies.ts - Opens visible browser to manually capture real cookies (requires X11)
  • debug-age-gate-detailed.ts - Visible browser debugging
  • debug-after-state-select.ts - Checks state after selecting Arizona

Current Status

Working

  1. Age gate detection properly identifies gates
  2. Cookie setting function works (cookies are set correctly)
  3. Multiple bypass methods attempt to click through gates
  4. Failure detection now accurately reports when bypass fails
  5. Category discovery successfully created 10 categories for Curaleaf store (ID 18)

Not Working

Curaleaf's Specific Age Gate:

  • URL: https://curaleaf.com/age-gate?returnurl=...
  • Uses a React-based custom dropdown (shadcn/Radix UI)
  • Ignores cookies we set
  • Doesn't respond to automated clicks
  • Doesn't trigger navigation after state selection

Why It Fails:

  • Curaleaf's React implementation likely checks for:
    • Real user interaction patterns
    • Additional signals beyond cookies (session storage, local storage)
    • Specific event sequences that automation doesn't replicate
  • Automated clicks don't trigger the React state changes needed

Test Results

# Latest test - Cookie approach
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx test-improved-age-gate.ts

Result: FAILED

  • Cookies set successfully
  • Page still redirects to /age-gate
  • Click automation finds elements but navigation doesn't occur

Database State

Categories created: 10 categories for store_id 18 (Curaleaf - 48th Street)

  • Categories are in the database and ready
  • Dutchie menu detection works
  • Category URLs are properly formatted

Brands: Not tested yet (can't scrape without bypassing age gate)

Files Modified

  1. /home/kelly/dutchie-menus/backend/src/utils/age-gate.ts - Created
  2. /home/kelly/dutchie-menus/backend/src/services/category-discovery.ts - Updated
  3. /home/kelly/dutchie-menus/backend/src/services/scraper.ts - Updated
  4. /home/kelly/dutchie-menus/backend/curaleaf-cookies.json - Test cookies (don't work)

Next Steps / Options

Option 1: Test with Different Store

Find a Dutchie dispensary with a simpler age gate that responds to automation.

Option 2: Get Real Cookies

Run capture-age-gate-cookies.ts on a machine with display/X11:

# On machine with display
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx capture-age-gate-cookies.ts
# Manually complete age gate in browser
# Press ENTER to capture real cookies
# Copy cookies to production server
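Before copying captured cookies to another machine, it may help to drop entries that are malformed or already expired. The sketch below assumes the capture script writes a JSON array in the shape Puppeteer's page.cookies() returns (name, value, domain, and expires in seconds since the epoch, with -1 for session cookies). The actual format written by capture-age-gate-cookies.ts is not shown in these notes, and parseCapturedCookies is a hypothetical helper.

```typescript
// Assumed cookie shape, modelled on Puppeteer's cookie objects.
interface CapturedCookie {
  name: string;
  value: string;
  domain: string;
  path?: string;
  expires?: number; // seconds since epoch; -1 or missing means session cookie
}

// Parses a cookie JSON file and keeps only well-formed, unexpired cookies.
function parseCapturedCookies(
  json: string,
  nowSeconds: number = Date.now() / 1000
): CapturedCookie[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of cookies");
  }
  return parsed.filter((c): c is CapturedCookie => {
    if (typeof c !== "object" || c === null) return false;
    const cookie = c as Record<string, unknown>;
    if (
      typeof cookie.name !== "string" ||
      typeof cookie.value !== "string" ||
      typeof cookie.domain !== "string"
    ) {
      return false;
    }
    const expires = cookie.expires;
    // Session cookies carry no usable expiry, so keep them.
    if (typeof expires !== "number" || expires < 0) return true;
    return expires > nowSeconds;
  });
}
```

Filtering at load time makes a stale cookie file fail visibly (an empty result) instead of producing another unexplained redirect to the age gate.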

Option 3: Switch to Playwright

Playwright may handle React apps better than Puppeteer. Consider migrating the age gate bypass to use Playwright.

Option 4: Use Proxy with Session Persistence

Use a rotating proxy service that maintains session state between requests.

Option 5: Focus on Other Stores First

Skip Curaleaf for now and implement scraping for dispensaries without complex age gates, then circle back.

How to Continue

  1. Check if categories exist:

    SELECT * FROM categories WHERE store_id = 18;
    
  2. Test scraping (will fail at age gate):

    # Via UI: Click "Scrape Store" button for Curaleaf
    # OR via test script
    
  3. Try different store:

    • Find another Dutchie dispensary
    • Add it to the stores table
    • Run category discovery
    • Test scraping

Important Notes

  • All cannabis dispensary sites will have age gates (as you correctly noted)
  • The bypass infrastructure is solid and should work for simpler gates
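  • Curaleaf's gate specifically resists automation so far: it ignores the cookies we set and does not react to scripted clicks (likely due to how its React dropdown handles events)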
  • Categories ARE created successfully despite age gate (using fallback detection)
  • The scraper WILL work once we can bypass the age gate

Key Insight

Category discovery succeeded because it checks the page source to detect Dutchie, then uses predefined categories. Product scraping requires actually viewing the product listings, which can't happen while stuck at the age gate.
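The fallback path described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in category-discovery.ts: the "dutchie" substring check, the slug list, the /products/ URL layout, and both function names are hypothetical.

```typescript
// Hypothetical list of predefined category slugs used when the live menu cannot be read.
const PREDEFINED_CATEGORY_SLUGS = [
  "flower",
  "pre-rolls",
  "vaporizers",
  "edibles",
  "concentrates",
];

// Assumed detection rule: the page source mentions Dutchie somewhere
// (for example in an embed script URL).
function looksLikeDutchieMenu(html: string): boolean {
  return /dutchie/i.test(html);
}

// Builds category URLs from the predefined slugs, or returns an empty list
// when the page does not look like a Dutchie menu.
function buildFallbackCategoryUrls(baseUrl: string, html: string): string[] {
  if (!looksLikeDutchieMenu(html)) {
    return [];
  }
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return PREDEFINED_CATEGORY_SLUGS.map((slug) => `${base}/products/${slug}`);
}
```

Because this path only needs the raw page source, it can succeed even when the browser is parked on the age gate, which is consistent with categories existing while product scraping still fails.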

Commands to Resume

# Start backend (if not running)
cd /home/kelly/dutchie-menus/backend
PORT=3012 DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npm run dev

# Check categories
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx -e "
  import { pool } from './src/db/migrate';
  const result = await pool.query('SELECT * FROM categories WHERE store_id = 18');
  console.log(result.rows);
  process.exit(0);
"

# Test age gate bypass
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx test-improved-age-gate.ts

What to Tell Claude When Resuming

"Continue working on the age gate bypass for Curaleaf. We have categories created but can't scrape products because the age gate blocks us. The cookie approach didn't work. Options: try a different store, get real cookies, or switch to Playwright."