
eBoard Platform Manual Download Guide

Issue: Incapsula Bot Protection

eBoard Solutions (https://simbli.eboardsolutions.com) uses Incapsula anti-bot protection that blocks automated scraping, even with advanced tools like Playwright. The platform requires manual interaction to access meeting documents.

Affected School Districts

Tuscaloosa City Schools

Tuscaloosa County Schools

Manual Download Steps

1. Access Meeting Listings

  1. Visit the district's meetings page on https://simbli.eboardsolutions.com in your browser
  2. You'll see a calendar or list of board meetings
  3. Each meeting shows the date and has document links

2. Download Documents

For each meeting:

  • Click on the meeting date to view details
  • Look for:
    • Agenda (usually PDF)
    • Minutes (usually PDF)
    • Packets (supporting materials)
  • Right-click each document → "Save As"
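When a session has been flagged, "Save As" can store an HTML challenge page under a .pdf name. A minimal sketch for catching that before import, assuming only that a real PDF begins with the `%PDF-` signature; the helper names `is_probably_pdf` and `find_bad_downloads` are illustrative and not part of the existing codebase:

```python
from pathlib import Path

# Every well-formed PDF starts with this signature.
PDF_MAGIC = b"%PDF-"


def is_probably_pdf(path: Path) -> bool:
    """Return True if the file begins with the PDF signature."""
    with path.open("rb") as fh:
        return fh.read(len(PDF_MAGIC)) == PDF_MAGIC


def find_bad_downloads(directory: str) -> list:
    """List *.pdf files in `directory` that do not look like real PDFs."""
    return sorted(
        p for p in Path(directory).glob("*.pdf") if not is_probably_pdf(p)
    )
```

Running `find_bad_downloads('/path/to/downloaded/pdfs')` after a download session gives a list of files to fetch again. The check only inspects the first bytes, so it will not detect a truncated or otherwise corrupt PDF.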

3. Organize Downloads

Save each file using this naming pattern, with the meeting date in YYYY-MM-DD form and the document type as the final component:

tuscaloosa_city_schools_YYYY-MM-DD_agenda.pdf
tuscaloosa_city_schools_YYYY-MM-DD_minutes.pdf
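To keep names consistent across many downloads, the pattern can be generated and checked in code. This is a sketch under the assumption that the district prefix is the lowercased district name with underscores; `build_filename`, `parse_filename`, and the sample date are hypothetical:

```python
import re
from datetime import date
from typing import Optional

# <district>_YYYY-MM-DD_<doctype>, matching the naming pattern above.
FILENAME_RE = re.compile(
    r"^(?P<district>[a-z0-9_]+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<doc_type>[a-z0-9]+)$"
)


def build_filename(district: str, meeting_date: date, doc_type: str) -> str:
    """Build a file name such as tuscaloosa_city_schools_2024-03-12_agenda.pdf."""
    slug = re.sub(r"[^a-z0-9]+", "_", district.lower()).strip("_")
    kind = re.sub(r"[^a-z0-9]+", "", doc_type.lower())
    return f"{slug}_{meeting_date.isoformat()}_{kind}.pdf"


def parse_filename(stem: str) -> Optional[dict]:
    """Split a file stem (no extension) into its parts, or return None."""
    match = FILENAME_RE.match(stem)
    return match.groupdict() if match else None


print(build_filename("Tuscaloosa City Schools", date(2024, 3, 12), "Agenda"))
print(parse_filename("tuscaloosa_city_schools_2024-03-12_agenda"))
```

Because the district prefix itself contains underscores, the parser anchors on the date rather than on a fixed field position.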

4. Import into System

Once the files are downloaded, you can import them with a script such as:

import asyncio
import hashlib
from pathlib import Path

from agents.scraper import ScraperAgent
from pipeline.delta_lake import DeltaLakePipeline


async def import_manual_pdfs(pdf_directory: str):
    """Import manually downloaded PDFs into the system."""
    documents = []
    scraper = ScraperAgent()
    async with scraper:
        for pdf_path in sorted(Path(pdf_directory).glob("*.pdf")):
            # Extract content from PDF
            content = await scraper._scrape_pdf_document(str(pdf_path))
            if not content:
                continue

            # Parse filename for metadata. The pattern is
            # <district>_YYYY-MM-DD_<type>; the district prefix contains
            # underscores, so split from the right.
            parts = pdf_path.stem.rsplit('_', 2)
            date_str = parts[1] if len(parts) == 3 else ""
            doc_type = parts[2] if len(parts) == 3 else "document"

            doc = {
                'document_id': hashlib.md5(str(pdf_path).encode()).hexdigest(),
                'source_url': pdf_path.resolve().as_uri(),
                # Change this for Tuscaloosa County Schools files
                'municipality': 'Tuscaloosa City Schools',
                'state': 'AL',
                'meeting_date': date_str,
                'meeting_type': 'Board Meeting',
                'title': pdf_path.stem,
                'content': content,
                'metadata': {
                    'source': 'manual_download',
                    'platform': 'eboard',
                    'doc_type': doc_type,
                },
            }
            documents.append(doc)

    # Write to Delta Lake (skip the write when nothing was extracted)
    if documents:
        pipeline = DeltaLakePipeline()
        pipeline.write_raw_documents(documents)

    return documents

# Usage:
# asyncio.run(import_manual_pdfs('/path/to/downloaded/pdfs'))

Alternative: RSS Feeds

Some eBoard installations offer RSS feeds or calendar exports:

  1. Look for RSS icon on meetings page
  2. Look for "Subscribe" or "Export to Calendar" options
  3. These may bypass the web interface restrictions
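If a district does expose a feed, its entries can be read with the standard library. The sketch below assumes a plain RSS 2.0 layout (`item` elements with `title`, `link`, and `pubDate`); whether eBoard publishes such a feed is unconfirmed, and the sample feed and its URL are invented for illustration. It parses a feed saved from the browser, since fetching it from a script may hit the same protection:

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, saved from the browser for illustration only.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Board Meetings</title>
  <item>
    <title>Regular Board Meeting</title>
    <link>https://example.org/meeting/1</link>
    <pubDate>Tue, 12 Mar 2024 18:00:00 GMT</pubDate>
  </item>
</channel></rss>"""


def parse_rss_items(xml_text: str) -> list:
    """Return one dict per <item> with its title, link, and publication date."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return items


for entry in parse_rss_items(SAMPLE_FEED):
    print(entry["published"], entry["title"], entry["link"])
```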

Future Enhancement Ideas

  1. Browser Extension: Create a Chrome extension that scrapes while you browse
  2. API Discovery: Research if eBoard has any undocumented APIs
  3. Selenium Grid: Run distributed browser sessions through residential proxy services for more sophisticated bot evasion
  4. Contact District: Request bulk export of meeting documents directly

Why Automation Fails

eBoard's Incapsula protection includes:

  • Browser fingerprinting (detects headless browsers)
  • IP reputation checking
  • JavaScript challenges (requires full browser execution)
  • Session tracking (blocks rapid sequential requests)
  • Rate limiting per IP address

Even with Playwright running in visible mode, subsequent page navigations get blocked once the system detects automated patterns.
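A scraper can at least recognize when it has been served a challenge page instead of content, so it stops rather than storing junk. This is a heuristic sketch only: the marker strings are ones commonly reported on Incapsula challenge pages and have not been verified against eBoard's deployment, and the length threshold is an arbitrary assumption:

```python
# Strings commonly reported on Incapsula challenge/block pages (assumption).
INCAPSULA_BODY_MARKERS = ("_Incapsula_Resource", "Incapsula incident ID")


def looks_like_incapsula_block(
    status: int, body: str, max_challenge_len: int = 5000
) -> bool:
    """Heuristic: True when a response resembles a challenge page.

    A marker alone is not enough, because protected sites may reference
    the same resource on normal pages. Require either a 403 status or a
    suspiciously short body as well.
    """
    has_marker = any(marker in body for marker in INCAPSULA_BODY_MARKERS)
    return has_marker and (status == 403 or len(body) < max_challenge_len)
```

When the function returns True, the sensible response for this project is to stop the run and fall back to the manual steps above.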

For comprehensive school district data:

  1. Prioritize: Focus on city government data (working well)
  2. Manual collection: Download key school board meetings manually
  3. Selective import: Import only the most relevant documents
  4. Direct contact: Reach out to school district IT for data sharing agreement
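The selective import step can be sketched as a filter over the downloaded files, assuming they follow the naming pattern from "Organize Downloads". The function name `select_for_import` and the default document types are illustrative choices, not existing project code:

```python
from datetime import date
from pathlib import Path


def select_for_import(paths, since: date, doc_types=("agenda", "minutes")):
    """Keep files dated on or after `since` whose type is in `doc_types`."""
    selected = []
    for raw in paths:
        path = Path(raw)
        # <district>_YYYY-MM-DD_<type>: split from the right because the
        # district prefix contains underscores.
        parts = path.stem.rsplit("_", 2)
        if len(parts) != 3:
            continue
        _, date_str, doc_type = parts
        try:
            meeting_date = date.fromisoformat(date_str)
        except ValueError:
            continue  # skip files that do not follow the naming pattern
        if meeting_date >= since and doc_type in doc_types:
            selected.append(path)
    return selected
```

For example, `select_for_import(Path('/path/to/downloaded/pdfs').glob('*.pdf'), since=date(2024, 1, 1))` would keep only agendas and minutes from 2024 onward; the cutoff date here is arbitrary.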

Status

  • Tuscaloosa City Government: Automated scraping works (SuiteOne Media platform)
  • Tuscaloosa City Schools: Manual download required (eBoard + Incapsula)
  • Tuscaloosa County Schools: Manual download required (eBoard + Incapsula)