eBoard Platform Manual Download Guide
Issue: Incapsula Bot Protection
eBoard Solutions (https://simbli.eboardsolutions.com) uses Incapsula anti-bot protection that blocks automated scraping, even with advanced tools like Playwright. The platform requires manual interaction to access meeting documents.
Affected School Districts
| District (AL) | jurisdiction_id | Public hub / board page | Simbli agendas & minutes |
|---|---|---|---|
| Tuscaloosa City School District | school_district_0103360 | Board of Education (Finalsite) | Simbli meeting listing S=2088 · index s=2088 |
| Tuscaloosa County School District (TCSS) | school_district_0103390 | Board of Education (Finalsite; links to Simbli) | Simbli meeting listing S=2092 · index s=2092 |
Curated Tuscaloosa city (school_district_0103360) and county (school_district_0103390) hub + Simbli URLs are in the dbt seed jurisdiction_website_url_overrides.csv. NCES does not emit Simbli links, so %simbli% appears only via these overrides after dbt seed + dbt run --select int_jurisdiction_websites.
Query from Postgres
After dbt seed and dbt run --select int_jurisdiction_websites, query intermediate.int_jurisdiction_websites (defined in dbt_project/models/intermediate/int_jurisdiction_websites.sql). Tables under public.* may be wrong or stale.
Tuscaloosa County School District (hub + Simbli):
SELECT
jurisdiction_id,
organization_name,
website_source,
website_url
FROM intermediate.int_jurisdiction_websites
WHERE jurisdiction_id = 'school_district_0103390'
AND (
website_url ILIKE '%tcss.net%'
OR website_url ILIKE '%simbli.eboardsolutions.com%'
)
ORDER BY website_url;
Tuscaloosa City School District (board hub + Simbli):
SELECT
jurisdiction_id,
organization_name,
website_source,
website_url
FROM intermediate.int_jurisdiction_websites
WHERE jurisdiction_id = 'school_district_0103360'
AND (
website_url ILIKE '%tuscaloosacityschools.com%'
OR website_url ILIKE '%simbli.eboardsolutions.com%'
)
ORDER BY website_url;
Simbli URLs only (Tuscaloosa City). Prefer trim(website_url) and path patterns so the query still matches Simbli rows when the host string differs slightly:
SELECT
jurisdiction_id,
organization_name,
website_source,
trim(website_url) AS website_url
FROM intermediate.int_jurisdiction_websites
WHERE jurisdiction_id = 'school_district_0103360'
AND (
trim(website_url) ILIKE '%simbli%'
OR trim(website_url) ILIKE '%SB_MeetingListing.aspx%'
OR trim(website_url) ILIKE '%/SB_Meetings/%'
)
ORDER BY website_url;
If this returns nothing, confirm the seed row exists and rebuild int_jurisdiction_websites (dbt seed + dbt run).
Both districts (debug):
SELECT jurisdiction_id, organization_name, website_source, website_url
FROM intermediate.int_jurisdiction_websites
WHERE jurisdiction_id IN ('school_district_0103360', 'school_district_0103390')
ORDER BY jurisdiction_id, website_url;
Seed table only:
SELECT jurisdiction_id, website_url
FROM seeds.jurisdiction_website_url_overrides
WHERE jurisdiction_id IN ('school_district_0103360', 'school_district_0103390')
ORDER BY jurisdiction_id, website_url;
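When the Simbli queries return nothing, it can help to confirm that the override rows exist in the seed CSV before rebuilding. The following is a minimal sketch, assuming the CSV has jurisdiction_id and website_url columns (the same columns the seed-table query selects). The function name and the file path in the usage comment are hypothetical.

```python
import csv

# The two Tuscaloosa district IDs from the table above.
DISTRICT_IDS = {"school_district_0103360", "school_district_0103390"}


def simbli_override_rows(csv_file, jurisdiction_ids=DISTRICT_IDS):
    """Return (jurisdiction_id, website_url) pairs for Simbli override rows.

    csv_file is any open text file (or file-like object) with a header row.
    Column names are an assumption based on the seed-table query.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        jid = (row.get("jurisdiction_id") or "").strip()
        url = (row.get("website_url") or "").strip()
        if jid in jurisdiction_ids and "simbli" in url.lower():
            rows.append((jid, url))
    return rows


# Usage (the path is an assumption; adjust it to your dbt project layout):
# with open("dbt_project/seeds/jurisdiction_website_url_overrides.csv", newline="") as f:
#     for jid, url in simbli_override_rows(f):
#         print(jid, url)
```

An empty result suggests the seed file itself is missing the rows, in which case rerunning dbt will not help until the CSV is fixed.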
Manual Download Steps
1. Access Meeting Listings
- Visit the meetings URL above in your browser
- You'll see a calendar or list of board meetings
- Each meeting shows the date and has document links
2. Download Documents
For each meeting:
- Click on the meeting date to view details
- Look for:
- Agenda (usually PDF)
- Minutes (usually PDF)
- Packets (supporting materials)
- Right-click each document → "Save As"
3. Organize Downloads
Save files with naming pattern:
tuscaloosa_city_schools_YYYY-MM-DD_agenda.pdf
tuscaloosa_city_schools_YYYY-MM-DD_minutes.pdf
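The naming pattern can be generated and checked programmatically, so that misnamed files are caught before import. This is a minimal sketch. The helper names are hypothetical, and the regular expression assumes a lowercase district prefix, an ISO date, and a single-word document type, as in the two examples above.

```python
import re
from datetime import date

# <district_slug>_<YYYY-MM-DD>_<doc_type>.pdf
FILENAME_RE = re.compile(
    r"^(?P<district>[a-z0-9_]+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<doc_type>[a-z]+)\.pdf$"
)


def build_filename(district_slug: str, meeting_date: date, doc_type: str) -> str:
    """Build a filename that follows the naming pattern."""
    return f"{district_slug}_{meeting_date.isoformat()}_{doc_type.lower()}.pdf"


def parse_filename(name: str):
    """Return (district_slug, meeting_date, doc_type), or None if the name does not match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    try:
        meeting_date = date.fromisoformat(match.group("date"))
    except ValueError:
        # Digits in the right shape but not a real calendar date.
        return None
    return match.group("district"), meeting_date, match.group("doc_type")
```

Because the district prefix itself contains underscores, the date is located by its shape rather than by its position in the name.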
4. Import into System
Once downloaded, you can import them manually:
import asyncio
import hashlib
from pathlib import Path

from agents.scraper import ScraperAgent
from pipeline.delta_lake import DeltaLakePipeline


async def import_manual_pdfs(pdf_directory: str):
    """Import manually downloaded PDFs into the system."""
    scraper = ScraperAgent()
    documents = []
    async with scraper:
        for pdf_path in sorted(Path(pdf_directory).glob("*.pdf")):
            # Extract content from PDF
            content = await scraper._scrape_pdf_document(str(pdf_path))
            if not content:
                continue

            # Parse filename for metadata. Split from the right because the
            # district prefix itself contains underscores:
            # tuscaloosa_city_schools_YYYY-MM-DD_agenda -> [prefix, date, type]
            parts = pdf_path.stem.rsplit('_', 2)
            date_str = parts[1] if len(parts) == 3 else ""
            doc_type = parts[2] if len(parts) == 3 else "document"

            doc = {
                'document_id': hashlib.md5(str(pdf_path).encode()).hexdigest(),
                'source_url': f'file://{pdf_path}',
                'municipality': 'Tuscaloosa City Schools',
                'state': 'AL',
                'meeting_date': date_str,
                'meeting_type': 'Board Meeting',
                'title': pdf_path.stem,
                'content': content,
                # doc_type (agenda, minutes, ...) is recorded alongside the source info
                'metadata': {
                    'source': 'manual_download',
                    'platform': 'eboard',
                    'document_type': doc_type,
                },
            }
            documents.append(doc)

    # Write to Delta Lake
    pipeline = DeltaLakePipeline()
    pipeline.write_raw_documents(documents)
    return documents


# Usage:
# asyncio.run(import_manual_pdfs('/path/to/downloaded/pdfs'))
Alternative: RSS Feeds
Some eBoard installations offer RSS feeds or calendar exports:
- Look for RSS icon on meetings page
- Look for "Subscribe" or "Export to Calendar" options
- These may bypass the web interface restrictions
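If a district's meetings page does expose an RSS feed, the document links can be pulled from the saved feed XML with the standard library. This is a sketch under assumptions: it has not been confirmed that these eBoard sites publish feeds, and the layout assumed here (RSS 2.0 items with a link element and optional enclosure elements) is generic RSS, not a documented eBoard format. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def pdf_links_from_rss(feed_xml: str):
    """Return (item_title, url) pairs for every PDF link found in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        # Candidate URLs: the item's <link> plus any <enclosure url="...">.
        candidates = [(item.findtext("link") or "").strip()]
        for enclosure in item.findall("enclosure"):
            candidates.append(enclosure.get("url", "").strip())
        for url in candidates:
            # Ignore the query string when checking the file extension.
            if url.lower().split("?", 1)[0].endswith(".pdf"):
                results.append((title, url))
    return results
```

The function only parses XML that has already been saved from the browser. Fetching the feed automatically may hit the same protection as the rest of the site.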
Future Enhancement Ideas
- Browser Extension: Create a Chrome extension that scrapes while you browse
- API Discovery: Research if eBoard has any undocumented APIs
- Selenium Grid: Use residential proxy services for more sophisticated bot evasion
- Contact District: Request bulk export of meeting documents directly
Why Automation Fails
eBoard's Incapsula protection includes:
- Browser fingerprinting (detects headless browsers)
- IP reputation checking
- JavaScript challenges (requires full browser execution)
- Session tracking (blocks rapid sequential requests)
- Rate limiting per IP address
Even with Playwright running in visible mode, subsequent page navigations get blocked once the system detects automated patterns.
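One practical consequence is that a saved "PDF" can turn out to be a block page. The sketch below classifies a downloaded file as a real PDF, a likely Incapsula block page, or unknown. The marker strings are an assumption based on commonly reported Incapsula block pages, not something confirmed for eBoard, and the function names are hypothetical. Treat the result as a heuristic.

```python
# Substrings commonly reported in Incapsula block pages (assumption, lowercase).
BLOCK_MARKERS = ("_incapsula_resource", "incapsula incident id")


def looks_like_incapsula_block(body: str) -> bool:
    """Heuristic: True if the text contains a known block-page marker."""
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def classify_download(data: bytes) -> str:
    """Return 'pdf', 'blocked', or 'unknown' for the raw bytes of a saved file."""
    # PDF files begin with the "%PDF-" header.
    if data.lstrip().startswith(b"%PDF-"):
        return "pdf"
    if looks_like_incapsula_block(data.decode("utf-8", errors="replace")):
        return "blocked"
    return "unknown"


# Usage:
# from pathlib import Path
# print(classify_download(Path("/path/to/downloaded/file.pdf").read_bytes()))
```

Running this check over a download folder before import keeps block pages out of the document store.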
Recommended Approach
For comprehensive school district data:
- Prioritize: Focus on city government data (working well)
- Manual collection: Download key school board meetings manually
- Selective import: Import only the most relevant documents
- Direct contact: Reach out to school district IT for data sharing agreement
Status
- Tuscaloosa City Government: Automated scraping works (SuiteOne Media platform)
- Tuscaloosa City Schools: Manual download required (eBoard + Incapsula)
- Tuscaloosa County Schools: Manual download required (eBoard + Incapsula)