eBoard Platform Manual Download Guide
Issue: Incapsula Bot Protection
eBoard Solutions (https://simbli.eboardsolutions.com) uses Incapsula anti-bot protection that blocks automated scraping, even with advanced tools like Playwright. The platform requires manual interaction to access meeting documents.
Affected School Districts
Tuscaloosa City Schools
- URL: http://simbli.eboardsolutions.com/index.aspx?s=2088
- Meetings: http://simbli.eboardsolutions.com/SB_Meetings/SB_MeetingListing.aspx?S=2088
Tuscaloosa County Schools
- URL: https://simbli.eboardsolutions.com/SB_Meetings/SB_MeetingListing.aspx?S=2092
- Website: https://www.tcss.net/board-of-education (links to eBoard)
Manual Download Steps
1. Access Meeting Listings
- Visit the meetings URL above in your browser
- You'll see a calendar or list of board meetings
- Each meeting shows the date and has document links
2. Download Documents
For each meeting:
- Click on the meeting date to view details
- Look for:
- Agenda (usually PDF)
- Minutes (usually PDF)
- Packets (supporting materials)
- Right-click each document → "Save As"
3. Organize Downloads
Save files with naming pattern:
tuscaloosa_city_schools_YYYY-MM-DD_agenda.pdf
tuscaloosa_city_schools_YYYY-MM-DD_minutes.pdf
4. Import into System
Once downloaded, you can import them manually:
from pipeline.delta_lake import DeltaLakePipeline
from agents.scraper import ScraperAgent
import asyncio
async def import_manual_pdfs(pdf_directory: str):
"""Import manually downloaded PDFs into the system."""
scraper = ScraperAgent()
async with scraper:
documents = []
for pdf_path in Path(pdf_directory).glob("*.pdf"):
# Extract content from PDF
content = await scraper._scrape_pdf_document(str(pdf_path))
if content:
# Parse filename for metadata
parts = pdf_path.stem.split('_')
date_str = parts[2] if len(parts) > 2 else ""
doc_type = parts[3] if len(parts) > 3 else "document"
doc = {
'document_id': hashlib.md5(str(pdf_path).encode()).hexdigest(),
'source_url': f'file://{pdf_path}',
'municipality': 'Tuscaloosa City Schools',
'state': 'AL',
'meeting_date': date_str,
'meeting_type': 'Board Meeting',
'title': pdf_path.stem,
'content': content,
'metadata': {'source': 'manual_download', 'platform': 'eboard'}
}
documents.append(doc)
# Write to Delta Lake
pipeline = DeltaLakePipeline()
pipeline.write_raw_documents(documents)
return documents
# Usage:
# asyncio.run(import_manual_pdfs('/path/to/downloaded/pdfs'))
Alternative: RSS Feeds
Some eBoard installations offer RSS feeds or calendar exports:
- Look for RSS icon on meetings page
- Look for "Subscribe" or "Export to Calendar" options
- These may bypass the web interface restrictions
Future Enhancement Ideas
- Browser Extension: Create a Chrome extension that scrapes while you browse
- API Discovery: Research if eBoard has any undocumented APIs
- Selenium Grid: Use residential proxy services for more sophisticated bot evasion
- Contact District: Request bulk export of meeting documents directly
Why Automation Fails
eBoard's Incapsula protection includes:
- Browser fingerprinting (detects headless browsers)
- IP reputation checking
- JavaScript challenges (requires full browser execution)
- Session tracking (blocks rapid sequential requests)
- Rate limiting per IP address
Even with Playwright running in visible mode, subsequent page navigations get blocked once the system detects automated patterns.
Recommended Approach
For comprehensive school district data:
- Prioritize: Focus on city government data (working well)
- Manual collection: Download key school board meetings manually
- Selective import: Import only the most relevant documents
- Direct contact: Reach out to school district IT for data sharing agreement
Status
- ✅ Tuscaloosa City Government: Automated scraping works (SuiteOne Media platform)
- ❌ Tuscaloosa City Schools: Manual download required (eBoard + Incapsula)
- ❌ Tuscaloosa County Schools: Manual download required (eBoard + Incapsula)