# Session Summary - Age Gate Bypass Implementation

## Problem
Brands weren't populating when clicking "Scrape Store" for Curaleaf dispensaries. Root cause: category discovery was finding 0 categories because age verification gates were blocking access to the Dutchie menu.

## What Was Accomplished

### 1. Created Universal Age Gate Bypass System
**File:** `/home/kelly/dutchie-menus/backend/src/utils/age-gate.ts`

Three main functions:
- `setAgeGateCookies(page, url, state)` - Sets age gate bypass cookies BEFORE navigation
- `bypassAgeGate(page, state)` - Attempts to bypass age gate AFTER page load using multiple methods
- `detectStateFromUrl(url)` - Auto-detects state from URL patterns

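The body of `detectStateFromUrl` is not reproduced in these notes. As a rough illustration of what "auto-detects state from URL patterns" could mean, here is a minimal sketch. The name `guessStateFromUrl`, the slug table, and the `null` fallback are all assumptions made for this example, not the contents of `age-gate.ts`.

```typescript
// Hypothetical lookup table (slug -> state name). Not taken from age-gate.ts.
const STATE_SLUGS = new Map<string, string>([
  ['az', 'Arizona'],
  ['arizona', 'Arizona'],
  ['fl', 'Florida'],
  ['florida', 'Florida'],
]);

// Returns a state name when any URL token (path segment, subdomain label,
// or query value) matches a known slug; returns null when nothing matches.
function guessStateFromUrl(url: string): string | null {
  // Split on every non-letter so "az" only matches as a whole token,
  // never as part of a longer word such as "amazing".
  const tokens = url.toLowerCase().split(/[^a-z]+/).filter(Boolean);
  for (const token of tokens) {
    const state = STATE_SLUGS.get(token);
    if (state !== undefined) {
      return state;
    }
  }
  return null;
}
```

With this table, `guessStateFromUrl('https://example.com/shop/arizona/menu')` returns `'Arizona'`, and a URL with no matching token returns `null` so the caller can choose its own default.
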
**Key Features:**
- Multiple bypass strategies: custom dropdowns, standard selects, buttons, state cards
- Enhanced React event dispatching (mousedown, mouseup, click, change, input)
- Proper failure detection - checks final URL to verify success
- Cookie-based approach as primary method

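The final-URL check mentioned above can be sketched as a small pure function. This is an illustration only: the helper name `isStuckOnAgeGate` is hypothetical, and the `age-gate` path segment is inferred from the Curaleaf redirect URL recorded later in these notes, not from the actual code in `age-gate.ts`.

```typescript
// Returns true when the final URL is still an age-gate route,
// meaning the bypass attempt did not succeed.
function isStuckOnAgeGate(finalUrl: string): boolean {
  try {
    const { pathname } = new URL(finalUrl);
    // Compare whole path segments so "/age-gate" matches but "/sage-gateway" does not.
    return pathname.toLowerCase().split('/').includes('age-gate');
  } catch {
    // An unparseable URL cannot confirm success, so report failure.
    return true;
  }
}
```

In a Puppeteer flow the argument would typically be `page.url()` read after the bypass attempt has settled.
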
### 2. Updated Category Discovery
**File:** `/home/kelly/dutchie-menus/backend/src/services/category-discovery.ts`

Changes at lines 120-129:
```typescript
// Set age gate bypass cookies BEFORE navigation
const state = detectStateFromUrl(baseUrl);
await setAgeGateCookies(page, baseUrl, state);

await page.goto(baseUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });

// If age gate still appears, try to bypass it
await bypassAgeGate(page, state);
```

### 3. Updated Scraper
**File:** `/home/kelly/dutchie-menus/backend/src/services/scraper.ts`

Changes at lines 379-392, plus one import at line 10:
- Imports `setAgeGateCookies` (line 10)
- Sets cookies before navigation (line 381)
- Attempts bypass if age gate still appears (line 392)

### 4. Test Scripts Created

- **`test-improved-age-gate.ts`** - Tests cookie-based bypass approach
- **`capture-age-gate-cookies.ts`** - Opens visible browser to manually capture real cookies (requires X11)
- **`debug-age-gate-detailed.ts`** - Visible browser debugging
- **`debug-after-state-select.ts`** - Checks state after selecting Arizona

## Current Status

### ✅ Working
1. Age gate detection properly identifies gates
2. Cookie setting function works (cookies are set correctly)
3. Multiple bypass methods attempt to click through gates
4. Failure detection now accurately reports when bypass fails
5. Category discovery successfully created 10 categories for Curaleaf store (ID 18)

### ❌ Not Working
**Curaleaf's Specific Age Gate:**
- URL: `https://curaleaf.com/age-gate?returnurl=...`
- Uses React-based custom dropdown (shadcn/radix UI)
- Ignores cookies we set
- Doesn't respond to automated clicks
- Doesn't trigger navigation after state selection

**Why It Fails:**
- Curaleaf's React implementation likely checks for:
  - Real user interaction patterns
  - Additional signals beyond cookies (session storage, local storage)
  - Specific event sequences that automation doesn't replicate
- Automated clicks don't trigger the React state changes needed

## Test Results

```bash
# Latest test - Cookie approach
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx test-improved-age-gate.ts
```

**Result:** ❌ FAILED
- Cookies set successfully
- Page still redirects to `/age-gate`
- Click automation finds elements but navigation doesn't occur

## Database State

**Categories created:** 10 categories for store_id 18 (Curaleaf - 48th Street)
- Categories are in the database and ready
- Dutchie menu detection works
- Category URLs are properly formatted

**Brands:** Not tested yet (can't scrape without bypassing age gate)

## Files Modified

1. `/home/kelly/dutchie-menus/backend/src/utils/age-gate.ts` - Created
2. `/home/kelly/dutchie-menus/backend/src/services/category-discovery.ts` - Updated
3. `/home/kelly/dutchie-menus/backend/src/services/scraper.ts` - Updated
4. `/home/kelly/dutchie-menus/backend/curaleaf-cookies.json` - Test cookies (don't work)

## Next Steps / Options

### Option 1: Test with Different Store
Find a Dutchie dispensary with a simpler age gate that responds to automation.

### Option 2: Get Real Cookies
Run `capture-age-gate-cookies.ts` on a machine with display/X11:
```bash
# On machine with display
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx capture-age-gate-cookies.ts
# Manually complete age gate in browser
# Press ENTER to capture real cookies
# Copy cookies to production server
```

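Once real cookies have been captured, it may help to drop expired or unrelated entries before replaying them. The sketch below assumes the capture file is a JSON array of objects shaped like Puppeteer's cookie objects (`name`, `value`, `domain`, optional `path` and `expires`). That file format, the helper name `usableCookies`, and the cookie names in the usage note are assumptions for illustration, not details taken from `capture-age-gate-cookies.ts`.

```typescript
// One captured cookie. Field names follow Puppeteer's cookie objects.
interface CapturedCookie {
  name: string;
  value: string;
  domain: string;
  path?: string;
  expires?: number; // Unix time in seconds; -1 or absent marks a session cookie
}

// Keeps cookies that apply to `host` and have not expired at `nowSeconds`.
function usableCookies(json: string, host: string, nowSeconds: number): CapturedCookie[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error('Expected a JSON array of cookies');
  }
  return (parsed as CapturedCookie[]).filter((cookie) => {
    // A leading dot on the domain means "this domain and its subdomains".
    const domain = cookie.domain.replace(/^\./, '');
    const matchesHost = host === domain || host.endsWith('.' + domain);
    const expires = cookie.expires;
    const notExpired = expires === undefined || expires < 0 || expires > nowSeconds;
    return matchesHost && notExpired;
  });
}
```

For example, given a file holding one valid `.curaleaf.com` cookie, one expired cookie, and one cookie for another domain, `usableCookies(fileText, 'curaleaf.com', Date.now() / 1000)` returns only the first. The result could then be passed to `page.setCookie(...)` before navigation.
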
### Option 3: Switch to Playwright
Playwright may handle React apps better than Puppeteer. Consider migrating the age gate bypass to use Playwright.

### Option 4: Use Proxy with Session Persistence
Use a rotating proxy service that maintains session state between requests.

### Option 5: Focus on Other Stores First
Skip Curaleaf for now and implement scraping for dispensaries without complex age gates, then circle back.

## How to Continue

1. **Check if categories exist:**
   ```sql
   SELECT * FROM categories WHERE store_id = 18;
   ```

2. **Test scraping (will fail at age gate):**
   ```bash
   # Via UI: Click "Scrape Store" button for Curaleaf
   # OR via test script
   ```

3. **Try different store:**
   - Find another Dutchie dispensary
   - Add it to the stores table
   - Run category discovery
   - Test scraping

## Important Notes

- All cannabis dispensary sites will have age gates (as you correctly noted)
- The bypass infrastructure is in place and should work for simpler gates, though it has not yet been verified against one
- Curaleaf specifically appears to use advanced React patterns that resist automation
- Categories ARE created successfully despite the age gate (using fallback detection)
- The scraper should work once we can bypass the age gate

## Key Insight

Category discovery succeeded because it checks the page source to detect Dutchie, then uses predefined categories. Product scraping requires actually viewing the product listings, which can't happen while stuck at the age gate.

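The fallback path described above can be sketched roughly as follows. Everything specific here is an assumption for illustration: the case-insensitive `dutchie` marker, the `/products/<slug>` URL shape, the function names, and the caller-supplied slugs. None of it is taken from `category-discovery.ts`.

```typescript
// Hypothetical marker check: any mention of "dutchie" in the HTML counts as a match.
function looksLikeDutchieMenu(html: string): boolean {
  return /dutchie/i.test(html);
}

// Builds one URL per predefined slug. The "/products/<slug>" shape is an assumption.
function buildCategoryUrls(baseUrl: string, slugs: string[]): string[] {
  const root = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  return slugs.map((slug) => `${root}/products/${encodeURIComponent(slug)}`);
}

// Returns category URLs when the page source looks like a Dutchie menu,
// otherwise an empty list. No product listing has to be rendered for this to work.
function discoverCategoryUrls(html: string, baseUrl: string, slugs: string[]): string[] {
  return looksLikeDutchieMenu(html) ? buildCategoryUrls(baseUrl, slugs) : [];
}
```

Because this path only inspects the raw page source, it can succeed even when the rendered page is still the age gate, which is consistent with categories existing while product scraping remains blocked.
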
## Commands to Resume

```bash
# Start backend (if not running)
cd /home/kelly/dutchie-menus/backend
PORT=3012 DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npm run dev

# Check categories
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx -e "
import { pool } from './src/db/migrate';
const result = await pool.query('SELECT * FROM categories WHERE store_id = 18');
console.log(result.rows);
process.exit(0);
"

# Test age gate bypass
DATABASE_URL="postgresql://dutchie:dutchie_local_pass@localhost:54320/dutchie_menus" npx tsx test-improved-age-gate.ts
```

## What to Tell Claude When Resuming

"Continue working on the age gate bypass for Curaleaf. We have categories created but can't scrape products because the age gate blocks us. The cookie approach didn't work. Options: try a different store, get real cookies, or switch to Playwright."