Automated eBoard Scraping Solutions
This guide covers fully automated solutions to bypass Incapsula protection without manual cookie extraction.
Summary of Options
| Solution | Cost | Difficulty | Success Rate | Speed |
|---|---|---|---|---|
| 1. Undetected ChromeDriver | Free | Easy | 70-85% | Medium |
| 2. Playwright + Residential Proxies | $10-50/month | Medium | 90-95% | Fast |
| 3. Browser Automation Services | $30-100/month | Easy | 95-99% | Fast |
| 4. CAPTCHA Solving Service | $1-3/1000 solves | Medium | 85-90% | Slow |
| 5. Reverse Engineer the API | Free | Hard | Unknown (API may not exist) | Fastest |
Option 1: Undetected ChromeDriver (Recommended Free Option)
Why It Works
undetected-chromedriver patches Selenium to bypass bot detection:
- Removes the navigator.webdriver flag
- Uses a real Chrome binary driven by a patched ChromeDriver
- Randomizes browser fingerprints
- Avoids common detection patterns
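A minimal sketch of what this looks like in code, assuming `undetected-chromedriver` is installed and Chrome is available locally. The `fetch_page` and `looks_blocked` helpers and the marker string are illustrative assumptions, not part of the project's scraper; the marker should be checked against a real blocked response from the target site.

```python
# Text that commonly appears on Incapsula block pages (assumed heuristic).
BLOCK_MARKERS = ("Incapsula incident ID",)


def looks_blocked(html: str) -> bool:
    """Return True if the HTML resembles an Incapsula block page."""
    return any(marker in html for marker in BLOCK_MARKERS)


def fetch_page(url: str, wait_seconds: float = 5.0) -> str:
    """Load a URL with undetected-chromedriver and return the page source."""
    import time

    # Imported lazily so looks_blocked() can be used without the package.
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    options.add_argument("--window-size=1280,900")
    driver = uc.Chrome(options=options)
    try:
        driver.get(url)
        time.sleep(wait_seconds)  # give the JavaScript challenge time to finish
        return driver.page_source
    finally:
        driver.quit()
```

Calling `looks_blocked(fetch_page(url))` after each run gives a simple pass/fail signal for judging whether this option is good enough.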
Installation
source .venv/bin/activate
pip install undetected-chromedriver
Usage
# Run the new scraper
python agents/scraper_undetected.py
Or integrate into main scraper:
python main.py scrape \
  --state AL \
  --municipality "Tuscaloosa City Schools" \
  --url "http://simbli.eboardsolutions.com/index.aspx?s=2088" \
  --platform eboard \
  --use-undetected \
  --max-events 0
Pros
- ✅ Free
- ✅ No external services required
- ✅ Works for most Incapsula sites
- ✅ Easy to implement
Cons
- ❌ May still fail on very strict Incapsula settings
- ❌ Requires GUI environment (can't run headless on some systems)
- ❌ Slower than Playwright
Option 2: Residential Proxies (Best Success Rate)
Why It Works
Incapsula detects datacenter IPs. Residential proxies route through real home IPs that appear legitimate.
Recommended Providers
BrightData (formerly Luminati)
- Cost: ~$15/GB or $500/month unlimited
- Success rate: 95%+
- Rotating residential IPs
- https://brightdata.com
SmartProxy
- Cost: $75/month for 5GB
- Easy to use
- Good for small projects
- https://smartproxy.com
Oxylabs
- Cost: $15/GB
- Enterprise-grade
- https://oxylabs.io
Implementation
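# Install Playwright and its Chromium build
pip install playwright
playwright install chromium

# Configure proxy in scraper (inside an async function)
from playwright.async_api import async_playwright

async with async_playwright() as p:
    browser = await p.chromium.launch(
        proxy={
            'server': 'http://proxy.smartproxy.com:10000',
            'username': 'your_username',
            'password': 'your_password'
        }
    )
    # ... rest of scraping code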
Add to agents/scraper.py
# In _scrape_eboard method, add:
import os

proxy_config = None
if os.getenv('RESIDENTIAL_PROXY_URL'):
    proxy_config = {
        'server': os.getenv('RESIDENTIAL_PROXY_URL'),
        'username': os.getenv('PROXY_USERNAME'),
        'password': os.getenv('PROXY_PASSWORD')
    }

browser = await p.chromium.launch(
    proxy=proxy_config,
    headless=True
)
.env Configuration
# Add to .env file
RESIDENTIAL_PROXY_URL=http://proxy.smartproxy.com:10000
PROXY_USERNAME=your_username
PROXY_PASSWORD=your_password
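The three variables above can be read in one place and turned into the dict that Playwright's `launch(proxy=...)` expects. This is a sketch under assumptions: `build_proxy_config` is a hypothetical helper name, and it presumes the `.env` file has already been loaded into the process environment (for example with python-dotenv).

```python
import os
from typing import Mapping, Optional


def build_proxy_config(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Build Playwright's proxy dict from environment-style variables.

    Returns None when RESIDENTIAL_PROXY_URL is unset, so the result can be
    passed straight to chromium.launch(proxy=...).
    """
    server = env.get("RESIDENTIAL_PROXY_URL", "").strip()
    if not server:
        return None
    config = {"server": server}
    # Only include credentials that are actually set.
    for key, var in (("username", "PROXY_USERNAME"), ("password", "PROXY_PASSWORD")):
        value = env.get(var, "").strip()
        if value:
            config[key] = value
    return config
```

Taking the environment as a parameter keeps the function easy to test with a plain dict and avoids sending empty credentials to the proxy.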
Pros
- ✅ High success rate (90-95%, per the summary table)
- ✅ Works on most Incapsula configurations
- ✅ Can run headless
- ✅ Fast and reliable
Cons
- ❌ Costs money ($10-50/month for small projects)
- ❌ Requires account setup
- ❌ May have usage limits
Option 3: Browser Automation Services (Easiest)
Why It Works
These services run real browsers in the cloud and handle all anti-bot evasion automatically.
Recommended Services
Browserless.io
- Cost: $40/month for 20 hours
- Managed Playwright/Puppeteer
- Built-in proxy rotation
- https://browserless.io
from playwright.async_api import async_playwright

async with async_playwright() as p:
    # This endpoint speaks the Chrome DevTools Protocol, so connect over CDP
    browser = await p.chromium.connect_over_cdp(
        'wss://chrome.browserless.io?token=YOUR_TOKEN'
    )
    page = await browser.new_page()
    await page.goto('https://simbli.eboardsolutions.com/...')
ScrapingBee
- Cost: $49/month for 100k credits
- Handles all anti-bot automatically
- Simple REST API
- https://scrapingbee.com
import requests

response = requests.get(
    'https://app.scrapingbee.com/api/v1/',
    params={
        'api_key': 'YOUR_API_KEY',
        'url': 'https://simbli.eboardsolutions.com/...',
        'render_js': 'true',
        'premium_proxy': 'true'
    },
    timeout=120  # rendered requests can be slow
)
response.raise_for_status()  # fail loudly on HTTP errors
content = response.text
Apify
- Cost: $49/month
- Pre-built scrapers for common sites
- Can create custom scrapers
- https://apify.com
Pros
- ✅ Fully managed (no maintenance)
- ✅ Very high success rate
- ✅ Handles updates to anti-bot automatically
- ✅ Can scale easily
Cons
- ❌ Most expensive option
- ❌ Requires external service dependency
- ❌ May have rate limits
Option 4: Captcha Solving Service
Why It Works
If Incapsula shows a CAPTCHA, these services solve it automatically using AI or human workers.
Recommended Services
2Captcha
- Cost: $2.99 per 1000 CAPTCHAs
- Supports reCAPTCHA, hCaptcha, Incapsula
- https://2captcha.com
Anti-Captcha
- Cost: $2 per 1000 CAPTCHAs
- Fast (10-30 seconds)
- https://anti-captcha.com
Implementation
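pip install 2captcha-python

import logging
import os
from twocaptcha import TwoCaptcha

logger = logging.getLogger(__name__)
solver = TwoCaptcha(os.getenv('2CAPTCHA_API_KEY'))

# When Incapsula shows a reCAPTCHA (run inside the async scraping function)
try:
    # solver.recaptcha() blocks until the service returns a token
    result = solver.recaptcha(
        sitekey='SITE_KEY_FROM_PAGE',
        url='https://simbli.eboardsolutions.com/...'
    )
    # Inject the token into the page, passing it as an argument instead of
    # interpolating it into the script string
    await page.evaluate(
        '(token) => { document.getElementById("g-recaptcha-response").innerHTML = token; }',
        result['code']
    )
    await page.click('button[type="submit"]')
except Exception as e:
    logger.error(f"CAPTCHA solving failed: {e}")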
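The `SITE_KEY_FROM_PAGE` placeholder has to come from the challenge page itself. A minimal sketch, assuming the page embeds a standard reCAPTCHA widget that carries its key in a `data-sitekey` attribute; `extract_sitekey` is a hypothetical helper, and a challenge that embeds the key some other way would need a different lookup.

```python
import re
from typing import Optional

# reCAPTCHA widgets carry their site key in a data-sitekey attribute.
SITEKEY_PATTERN = re.compile(r'data-sitekey=["\']([\w-]+)["\']')


def extract_sitekey(html: str) -> Optional[str]:
    """Return the first data-sitekey value in the HTML, or None if absent."""
    match = SITEKEY_PATTERN.search(html)
    return match.group(1) if match else None
```

In the Playwright scraper this could be called as `extract_sitekey(await page.content())`. A `None` result means no widget was found, so the solver call can be skipped.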
Pros
- ✅ Solves CAPTCHAs automatically
- ✅ Relatively cheap
- ✅ Works with existing scraper
Cons
- ❌ Only useful if CAPTCHA appears
- ❌ Slower (10-30 seconds per solve)
- ❌ Not 100% success rate
- ❌ Costs money per use
Option 5: Reverse Engineer the API
Why It Works
eBoard likely has backend APIs that mobile apps or internal tools use. These APIs may have weaker protection.
How to Find APIs
- Use browser DevTools:
  - Open the eBoard site in Chrome
  - Press F12 → Network tab
  - Look for XHR/Fetch requests
  - Check requests to:
    - /api/
    - .ashx files
    - .asmx files (SOAP endpoints)
- Check for mobile app:
  - Search App Store / Google Play for "eBoard Solutions"
  - Decompile APK to find API endpoints
  - Use mitmproxy to intercept app traffic
- Look for GraphQL/REST endpoints:
  curl -I https://simbli.eboardsolutions.com/api/meetings
  curl -I https://simbli.eboardsolutions.com/graphql
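The DevTools step can also be automated by letting Playwright record the XHR/fetch requests a page makes. This is a sketch only: `is_api_candidate` and `log_api_requests` are hypothetical helpers, the path hints are the ones listed above, and nothing here confirms that eBoard exposes such endpoints.

```python
from urllib.parse import urlparse

# Path fragments from the checklist above.
API_HINTS = ("/api/", ".ashx", ".asmx")


def is_api_candidate(url: str) -> bool:
    """Return True if the URL path looks like a backend endpoint."""
    path = urlparse(url).path.lower()
    return any(hint in path for hint in API_HINTS)


async def log_api_requests(start_url: str, wait_ms: int = 10_000) -> list:
    """Open the page and collect XHR/fetch requests that look like API calls."""
    from playwright.async_api import async_playwright  # lazy import

    found = []

    def on_request(request) -> None:
        if request.resource_type in ("xhr", "fetch") and is_api_candidate(request.url):
            found.append(f"{request.method} {request.url}")

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        page.on("request", on_request)
        await page.goto(start_url)
        await page.wait_for_timeout(wait_ms)  # let background requests fire
        await browser.close()
    return found
```

Run it with `asyncio.run(log_api_requests(url))`. If Incapsula blocks the headless browser, the list will simply come back empty, so this works best combined with one of the options above.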
Example (if API exists)
import httpx

# Hypothetical API endpoint
async with httpx.AsyncClient() as client:
    response = await client.get(
        'https://simbli.eboardsolutions.com/api/v1/meetings',
        params={'school_id': 2088},
        headers={'User-Agent': 'eBoard-Mobile/1.0'}
    )
    meetings = response.json()
Pros
- ✅ Fastest option
- ✅ Little or no bot detection (if the API is less protected than the site)
- ✅ Free
- ✅ Most reliable
Cons
- ❌ Requires reverse engineering skills
- ❌ API may not exist
- ❌ API may require authentication
- ❌ May violate Terms of Service
Recommended Approach
For Personal/Research Projects (Free)
Start with Option 1 (Undetected ChromeDriver)
# Install
pip install undetected-chromedriver
# Run test
python agents/scraper_undetected.py
If that fails, use manual cookies (current approach) as fallback.
For Production/Reliable Scraping ($)
Use Option 2 (Residential Proxies)
Budget: ~$15-75/month depending on volume
Best provider for this use case: SmartProxy ($75/month for 5GB)
# Sign up at smartproxy.com
# Add credentials to .env
# Enable proxy in scraper
RESIDENTIAL_PROXY_URL=http://proxy.smartproxy.com:10000
PROXY_USERNAME=your_username
PROXY_PASSWORD=your_password
For Large Scale / Enterprise
Use Option 3 (Browserless.io or ScrapingBee)
Budget: $40-100/month
Most reliable, fully managed solution.
Implementation Plan
Phase 1: Try Free Options
- ✅ Install undetected-chromedriver
- ✅ Test on Tuscaloosa City Schools
- ✅ Measure success rate over 10 runs
- If success rate > 80%, use this going forward
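The 10-run measurement can be scripted so the 80% threshold is checked the same way every time. A minimal sketch: `measure_success_rate` is a hypothetical helper, and `attempt` stands for any zero-argument callable that returns True when a scrape succeeded.

```python
from typing import Callable


def measure_success_rate(attempt: Callable[[], bool], runs: int = 10) -> float:
    """Call attempt() `runs` times and return the fraction that succeeded.

    An attempt that raises is counted as a failure.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    successes = 0
    for _ in range(runs):
        try:
            if attempt():
                successes += 1
        except Exception:
            pass  # treat errors (timeouts, blocks) as failed runs
    return successes / runs
```

A result above `0.8` would meet the Phase 1 criterion. Spacing the runs out over time gives a more realistic picture than ten back-to-back requests.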
Phase 2: Add Proxy Support (If Phase 1 Fails)
- Add proxy configuration to existing Playwright scraper
- Sign up for SmartProxy trial
- Test with residential proxy
- If successful, add to production
Phase 3: Optimize
- Add retry logic with exponential backoff
- Rotate between different methods
- Cache successful cookies for reuse
- Monitor success rate and adjust
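The first two items, retrying with exponential backoff and rotating between methods, can be combined in one small driver. This is a sketch under assumptions: `backoff_delays` and `scrape_with_fallback` are hypothetical names, and each entry in `methods` is a zero-argument callable that returns page HTML on success, or returns None or raises on failure.

```python
import random
import time
from typing import Callable, Optional, Sequence


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> list:
    """Exponential delays: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]


def scrape_with_fallback(
    methods: Sequence[Callable[[], Optional[str]]],
    retries: int = 3,
    base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try each method in order, retrying each with backoff before moving on."""
    for method in methods:
        for delay in backoff_delays(retries, base):
            try:
                html = method()
                if html:
                    return html
            except Exception:
                pass  # fall through to the backoff sleep
            # Jitter avoids retrying on a perfectly regular schedule.
            sleep(delay + random.uniform(0, delay / 2))
    return None
```

Ordering `methods` from cheapest to most expensive (for example undetected-chromedriver first, then the proxied Playwright scraper) keeps paid options as a last resort. The `sleep` parameter exists so tests can skip the real waiting.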
Next Steps
Possible follow-up work, in rough order of effort:
- Integrate undetected-chromedriver into the main scraper (1-click solution)
- Add residential proxy support to existing code (requires proxy account)
- Try to reverse engineer the eBoard API (advanced, may take time)
- Create a hybrid approach that tries multiple methods automatically