eBoard Platform Manual Download Guide

Issue: Incapsula Bot Protection

eBoard Solutions (https://simbli.eboardsolutions.com) uses Incapsula anti-bot protection that blocks automated scraping, even with advanced tools like Playwright. The platform requires manual interaction to access meeting documents.

Affected School Districts

| District (AL) | `jurisdiction_id` | Public hub / board page | Simbli agendas & minutes |
| --- | --- | --- | --- |
| Tuscaloosa City School District | `school_district_0103360` | Board of Education (Finalsite) | Simbli meeting listing `S=2088` · index `s=2088` |
| Tuscaloosa County School District (TCSS) | `school_district_0103390` | Board of Education (Finalsite; links to Simbli) | Simbli meeting listing `S=2092` · index `s=2092` |

The curated hub and Simbli URLs for Tuscaloosa City (school_district_0103360) and Tuscaloosa County (school_district_0103390) live in the dbt seed jurisdiction_website_url_overrides.csv. NCES data does not include Simbli links, so URLs matching %simbli% reach int_jurisdiction_websites only through these overrides, and only after running dbt seed followed by dbt run --select int_jurisdiction_websites.

Query from Postgres

After running dbt seed and dbt run --select int_jurisdiction_websites, query intermediate.int_jurisdiction_websites (defined in dbt_project/models/intermediate/int_jurisdiction_websites.sql). Do not rely on public.*, which may be wrong or stale.

Tuscaloosa County School District — hub + Simbli:

SELECT
  jurisdiction_id,
  organization_name,
  website_source,
  website_url
FROM intermediate.int_jurisdiction_websites
WHERE jurisdiction_id = 'school_district_0103390'
  AND (
    website_url ILIKE '%tcss.net%'
    OR website_url ILIKE '%simbli.eboardsolutions.com%'
  )
ORDER BY website_url;

Tuscaloosa City School District — board hub + Simbli:

SELECT
  jurisdiction_id,
  organization_name,
  website_source,
  website_url
FROM intermediate.int_jurisdiction_websites
WHERE jurisdiction_id = 'school_district_0103360'
  AND (
    website_url ILIKE '%tuscaloosacityschools.com%'
    OR website_url ILIKE '%simbli.eboardsolutions.com%'
  )
ORDER BY website_url;

Simbli URLs only (Tuscaloosa City) — prefer trim(website_url) and path patterns so you still match Simbli if the host string differs slightly:

SELECT
  jurisdiction_id,
  organization_name,
  website_source,
  trim(website_url) AS website_url
FROM intermediate.int_jurisdiction_websites
WHERE jurisdiction_id = 'school_district_0103360'
  AND (
    trim(website_url) ILIKE '%simbli%'
    OR trim(website_url) ILIKE '%SB_MeetingListing.aspx%'
    OR trim(website_url) ILIKE '%/SB_Meetings/%'
  )
ORDER BY website_url;

If this returns nothing, confirm the seed row exists and rebuild int_jurisdiction_websites (dbt seed + dbt run).
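When rows have already been pulled into Python, the same filter can be applied client-side. The following is a minimal sketch that mirrors the three ILIKE patterns from the query above; the function name `is_simbli_url` is hypothetical and not part of the project.

```python
# Substrings taken from the ILIKE patterns in the SQL above, lowercased
# so the comparison is case-insensitive like ILIKE.
SIMBLI_PATTERNS = ("simbli", "sb_meetinglisting.aspx", "/sb_meetings/")


def is_simbli_url(website_url: str) -> bool:
    """Return True if the URL matches any Simbli pattern (hypothetical helper).

    Note: str.strip() removes all surrounding whitespace, which is slightly
    broader than SQL trim(), which removes spaces only by default.
    """
    normalized = website_url.strip().lower()
    return any(pattern in normalized for pattern in SIMBLI_PATTERNS)
```

Matching on the path fragments as well as the host keeps the check useful when the stored host string differs slightly, which is the same reason the SQL uses them.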

Both districts (debug):

SELECT jurisdiction_id, organization_name, website_source, website_url
FROM intermediate.int_jurisdiction_websites
WHERE jurisdiction_id IN ('school_district_0103360', 'school_district_0103390')
ORDER BY jurisdiction_id, website_url;

Seed table only:

SELECT jurisdiction_id, website_url
FROM seeds.jurisdiction_website_url_overrides
WHERE jurisdiction_id IN ('school_district_0103360', 'school_district_0103390')
ORDER BY jurisdiction_id, website_url;
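
If the seed query returns no rows, it can help to check the CSV file itself before rebuilding. This sketch assumes the seed file has a `jurisdiction_id` column (as the seed query above suggests); the helper name `missing_override_ids` is hypothetical.

```python
import csv
import io

# The two district IDs this guide covers.
EXPECTED_IDS = frozenset({"school_district_0103360", "school_district_0103390"})


def missing_override_ids(csv_text: str, expected_ids=EXPECTED_IDS) -> set:
    """Return the expected jurisdiction IDs that have no row in the seed CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {
        row["jurisdiction_id"].strip()
        for row in reader
        if row.get("jurisdiction_id")  # skip rows where the column is empty or absent
    }
    return set(expected_ids) - seen
```

Pass it the contents of jurisdiction_website_url_overrides.csv, read from wherever the seed lives in your checkout. An empty set means both districts have at least one override row.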

Manual Download Steps

1. Access Meeting Listings

Open the district's Simbli meeting listing URL (see the Affected School Districts table) in your browser
You'll see a calendar or list of board meetings
Each meeting shows the date and has document links

2. Download Documents

For each meeting:

Click on the meeting date to view details
Look for:
- Agenda (usually PDF)
- Minutes (usually PDF)
- Packets (supporting materials)
Right-click each document → "Save As"

3. Organize Downloads

Save files with naming pattern:

tuscaloosa_city_schools_YYYY-MM-DD_agenda.pdf
tuscaloosa_city_schools_YYYY-MM-DD_minutes.pdf
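
To keep names consistent across many downloads, the pattern can be generated rather than typed by hand. This is a minimal sketch; the function name and the set of allowed document types are assumptions (the guide names only agenda and minutes in the pattern, and "packet" is added here for the packets mentioned in step 2).

```python
from datetime import date

# "agenda" and "minutes" come from the naming pattern above;
# "packet" is an assumed extension for supporting materials.
ALLOWED_DOC_TYPES = {"agenda", "minutes", "packet"}


def build_download_name(district_slug: str, meeting_date: date, doc_type: str) -> str:
    """Build '<district_slug>_<YYYY-MM-DD>_<doc_type>.pdf' (hypothetical helper)."""
    if doc_type not in ALLOWED_DOC_TYPES:
        raise ValueError(f"unexpected document type: {doc_type!r}")
    # date.isoformat() yields YYYY-MM-DD, matching the pattern.
    return f"{district_slug}_{meeting_date.isoformat()}_{doc_type}.pdf"
```

For example, `build_download_name("tuscaloosa_city_schools", date(2024, 1, 9), "agenda")` produces a name in the same shape as the pattern above (the date here is illustrative only).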

4. Import into System

Once downloaded, you can import them manually:

import asyncio
import hashlib
from pathlib import Path

from agents.scraper import ScraperAgent
from pipeline.delta_lake import DeltaLakePipeline

async def import_manual_pdfs(pdf_directory: str):
    """Import manually downloaded PDFs into the system."""
    scraper = ScraperAgent()
    async with scraper:
        documents = []
        
        for pdf_path in Path(pdf_directory).glob("*.pdf"):
            # Extract content from PDF
            content = await scraper._scrape_pdf_document(str(pdf_path))
            
            if content:
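                # Parse filename for metadata. The district prefix itself
                # contains underscores, so split from the right:
                # <district>_<YYYY-MM-DD>_<doc_type>
                parts = pdf_path.stem.rsplit('_', 2)
                date_str = parts[1] if len(parts) == 3 else ""
                doc_type = parts[2] if len(parts) == 3 else "document"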
                
                doc = {
                    'document_id': hashlib.md5(str(pdf_path).encode()).hexdigest(),
                    'source_url': f'file://{pdf_path}',
                    'municipality': 'Tuscaloosa City Schools',
                    'state': 'AL',
                    'meeting_date': date_str,
                    'meeting_type': 'Board Meeting',
                    'title': pdf_path.stem,
                    'content': content,
                    'metadata': {'source': 'manual_download', 'platform': 'eboard'}
                }
                documents.append(doc)
        
        # Write to Delta Lake
        pipeline = DeltaLakePipeline()
        pipeline.write_raw_documents(documents)
        
        return documents

# Usage:
# asyncio.run(import_manual_pdfs('/path/to/downloaded/pdfs'))
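
Before running the import, it may be worth checking that every file in the directory follows the naming pattern from step 3, since misnamed files would otherwise be imported with wrong or empty metadata. The sketch below is a standalone pre-flight check; `parse_download_name` and the regular expression are hypothetical and not part of the project code.

```python
import re
from datetime import date
from pathlib import Path

# <district>_<YYYY-MM-DD>_<doc_type>; the district part may contain underscores.
NAME_PATTERN = re.compile(
    r"^(?P<district>[a-z0-9_]+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<doc_type>[a-z]+)$"
)


def parse_download_name(pdf_path):
    """Return district/date/doc_type fields, or None if the name does not fit."""
    match = NAME_PATTERN.match(Path(pdf_path).stem)
    if match is None:
        return None
    fields = match.groupdict()
    try:
        # Reject strings that look like dates but are not valid calendar dates.
        date.fromisoformat(fields["date"])
    except ValueError:
        return None
    return fields


def find_misnamed_pdfs(pdf_directory):
    """List PDFs in the directory whose names do not follow the pattern."""
    return [p for p in Path(pdf_directory).glob("*.pdf") if parse_download_name(p) is None]
```

Running `find_misnamed_pdfs('/path/to/downloaded/pdfs')` first and renaming anything it reports keeps the imported `meeting_date` values clean.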

Alternative: RSS Feeds

Some eBoard installations offer RSS feeds or calendar exports:

Look for RSS icon on meetings page
Look for "Subscribe" or "Export to Calendar" options
These may bypass the web interface restrictions
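
If a district does expose a feed, listing its entries takes only the standard library. The sketch below assumes a standard RSS 2.0 document with `item`, `title`, `link`, `pubDate`, and optional `enclosure` elements; whether any eBoard site publishes such a feed, and in what shape, has not been verified here. The function name `list_feed_items` is hypothetical.

```python
import xml.etree.ElementTree as ET


def list_feed_items(rss_xml: str) -> list:
    """Extract title, link, date, and attachment URL from RSS 2.0 text."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")  # optional attached file, e.g. a PDF
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
            "enclosure_url": enclosure.get("url") if enclosure is not None else None,
        })
    return items
```

The function takes the feed text rather than a URL, so the feed can be saved from the browser by hand and parsed offline, in keeping with the manual workflow above.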

Future Enhancement Ideas

Browser Extension: Create a Chrome extension that scrapes while you browse
API Discovery: Research if eBoard has any undocumented APIs
Selenium Grid: Run browsers on Selenium Grid behind residential proxy services for more sophisticated bot evasion
Contact District: Request bulk export of meeting documents directly

Why Automation Fails

eBoard's Incapsula protection includes:

Browser fingerprinting (detects headless browsers)
IP reputation checking
JavaScript challenges (requires full browser execution)
Session tracking (blocks rapid sequential requests)
Rate limiting per IP address

Even with Playwright running in visible mode, subsequent page navigations get blocked once the system detects automated patterns.
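
When testing any collection approach, it helps to recognize a block page instead of mistaking it for an empty meeting list. The following is a rough heuristic sketch. The two marker strings are ones commonly reported on Incapsula challenge and block pages; they are an assumption here and have not been confirmed against eBoard responses. The function name is hypothetical.

```python
# Lowercased marker substrings; assumed, not verified against eBoard.
INCAPSULA_MARKERS = ("_incapsula_resource", "incapsula incident id")


def looks_like_incapsula_block(html: str) -> bool:
    """Heuristically flag HTML that appears to be an Incapsula block page."""
    lowered = html.lower()
    return any(marker in lowered for marker in INCAPSULA_MARKERS)
```

A negative result does not prove the page is genuine content, so treat this only as a quick sanity check on saved responses.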

Recommended Approach

For comprehensive school district data:

Prioritize: Focus on city government data (working well)
Manual collection: Download key school board meetings manually
Selective import: Import only the most relevant documents
Direct contact: Reach out to school district IT for data sharing agreement

Status

✅ Tuscaloosa City Government: Automated scraping works (SuiteOne Media platform)
❌ Tuscaloosa City Schools: Manual download required (eBoard + Incapsula)
❌ Tuscaloosa County Schools: Manual download required (eBoard + Incapsula)
