Gold Table Pipeline
Transform bronze/cache data into curated gold tables ready for analysis, dashboards, and AI applications.
🎉 Successfully Created!
This pipeline processes 153,452 meeting records from 18 years of civic engagement data (2006-2023) into structured gold tables.
📊 Meeting Data Pipeline Results
Created Gold Tables
| Table | Size | Records | Description |
|---|---|---|---|
| meetings_calendar | 1.71 MB | 153,452 | Meeting dates, locations, jurisdictions |
| meetings_transcripts | 2.8 GB | 153,452 | Full searchable meeting text |
| meetings_demographics | 1.17 MB | 153,452 | Census data linked to meetings |
| meetings_topics | 1.04 MB | 153,452 | Extracted topics and themes |
| meetings_decisions | TBD | TBD | Policy decisions and votes |
meetings_calendar.parquet
Meeting metadata and basic information.
Columns:
- meeting_id: Unique identifier
- jurisdiction: City/county name
- channel_type: "OFFICIAL GOVT"
- record_index: Original record index
meetings_transcripts.parquet
Full searchable text from meeting captions/minutes.
Columns:
- meeting_id: Links to calendar
- jurisdiction: City/county
- transcript_text: Full meeting text
- word_count: Number of words
- has_captions: Boolean flag
Size: 2.8 GB of searchable civic engagement content!
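As a rough illustration of how the transcript columns might be used for search, here is a minimal sketch. It assumes rows arrive as plain dicts keyed by the column names above; the function name `search_transcripts` and its parameters are hypothetical and not part of the pipeline.

```python
import re


def search_transcripts(rows, query, min_words=0, context=40):
    """Yield (meeting_id, jurisdiction, snippet) for rows whose text contains `query`.

    `rows` is any iterable of dicts using the meetings_transcripts column names.
    This helper is an illustrative assumption, not code from the pipeline.
    """
    # Case-insensitive literal match; re.escape keeps punctuation in the query literal.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    for row in rows:
        text = row.get("transcript_text") or ""
        # Skip very short transcripts if the caller asked for a minimum length.
        if (row.get("word_count") or 0) < min_words:
            continue
        match = pattern.search(text)
        if match is None:
            continue
        # Return a little surrounding text so the hit can be read in context.
        start = max(0, match.start() - context)
        end = match.end() + context
        yield row["meeting_id"], row["jurisdiction"], text[start:end].strip()
```

Because the table is about 2.8 GB, it is probably better to stream rows in batches than to load the whole file. With pyarrow, for example, `ParquetFile.iter_batches()` combined with `RecordBatch.to_pylist()` yields dict rows that can be passed straight to this function.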
meetings_demographics.parquet
Links meetings to jurisdiction-level demographic data from the US Census.
Columns:
- meeting_id: Links to calendar
- jurisdiction: City/county
- acs_18_pop: Population
- acs_18_median_age: Median age
- acs_18_median_hh_inc: Median household income
- acs_18_median_gross_rent: Median rent
- acs_18_white, acs_18_black, acs_18_asian, acs_18_hispanic: Demographics
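Since every table shares meeting_id, demographic context can be attached to meetings with a simple keyed lookup. The sketch below shows the idea on plain dict rows; `attach_demographics` and its default `fields` are hypothetical names chosen for illustration. With a DataFrame library the same thing would normally be a left join on meeting_id.

```python
def attach_demographics(meetings, demographics,
                        fields=("acs_18_pop", "acs_18_median_hh_inc")):
    """Left-join selected demographic fields onto meeting rows by meeting_id.

    Both inputs are iterables of dicts using the column names listed above.
    Meetings with no demographic match keep their row, with None for each field.
    """
    # Index the demographics rows once so each lookup is O(1).
    by_id = {row["meeting_id"]: row for row in demographics}
    joined = []
    for meeting in meetings:
        demo = by_id.get(meeting["meeting_id"], {})
        joined.append({**meeting, **{f: demo.get(f) for f in fields}})
    return joined
```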
meetings_topics.parquet
Topics extracted from transcripts using keyword matching.
Columns:
- meeting_id: Links to calendar
- jurisdiction: City/county
- topics: Comma-separated topic list
- topic_count: Number of topics
Detected Topics:
- budget
- infrastructure
- public_safety
- health
- education
- parks
- zoning
- contracts
- ordinances
- public_comment
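To make the keyword-matching approach concrete, here is a minimal sketch that produces the topics and topic_count columns. The topic names come from the list above. The keyword lists and the function name `extract_topics` are illustrative assumptions; the pipeline's actual keywords are not shown in this document.

```python
import re

# Topic names match the documented list; the keywords are ASSUMED for illustration.
TOPIC_KEYWORDS = {
    "budget": ["budget", "appropriation", "fiscal year"],
    "infrastructure": ["road", "bridge", "sewer", "water main"],
    "public_safety": ["police", "fire department", "public safety"],
    "health": ["health", "hospital", "clinic"],
    "education": ["school", "education", "student"],
    "parks": ["park", "recreation", "playground"],
    "zoning": ["zoning", "rezoning", "variance"],
    "contracts": ["contract", "bid", "vendor"],
    "ordinances": ["ordinance"],
    "public_comment": ["public comment", "public hearing"],
}

# One compiled pattern per topic. Word boundaries plus an optional plural suffix
# let "park" match "parks" but not "parking".
TOPIC_PATTERNS = {
    topic: re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")(?:s|es)?\b",
        re.IGNORECASE,
    )
    for topic, keywords in TOPIC_KEYWORDS.items()
}


def extract_topics(text):
    """Return the topics and topic_count values for one transcript."""
    found = [topic for topic, pattern in TOPIC_PATTERNS.items() if pattern.search(text)]
    return {"topics": ",".join(found), "topic_count": len(found)}
```

Keyword matching is cheap and transparent, but it misses paraphrases and can over-trigger on common words. That trade-off is presumably why NLP/ML topic extraction appears under future enhancements.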
🏛️ Nonprofit Data Pipeline
Ready to discover and process nonprofit data from free APIs.
Planned Gold Tables
- nonprofits_organizations.parquet
  - Basic info: name, EIN, NTEE code, location
- nonprofits_financials.parquet
  - Revenue, assets, expenses from IRS Form 990
- nonprofits_programs.parquet
  - Services and programs offered
- nonprofits_locations.parquet
  - Geographic service areas
Data Sources
- ProPublica Nonprofit Explorer - IRS Form 990 data (FREE!)
- IRS Tax Exempt Org Search - Official tax-exempt status
- Every.org - Charity profiles
- Findhelp.org - Local services directory
🚀 Usage
Run Both Pipelines
cd /home/developer/projects/open-navigator
source .venv/bin/activate
python scripts/create_all_gold_tables.py
Run Only Meetings
python scripts/create_all_gold_tables.py --meetings-only
Run Only Nonprofits
# Discover nonprofits in specific states
python scripts/create_all_gold_tables.py --nonprofits-only --states AL MI
# Add more states
python scripts/create_all_gold_tables.py --nonprofits-only --states AL MI NY CA TX
Skip API Discovery
If you've already discovered nonprofits and want to regenerate gold tables:
python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery
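The flags above could be wired up roughly as follows. This is only a sketch of how such a command line might be parsed with argparse, not the actual contents of create_all_gold_tables.py. The step names returned by `plan_steps` are hypothetical, and treating `--meetings-only` and `--nonprofits-only` as mutually exclusive is an assumption.

```python
import argparse


def build_parser():
    """Build a parser for the flags shown in the usage examples (illustrative)."""
    parser = argparse.ArgumentParser(description="Create gold tables (sketch)")
    # ASSUMPTION: the two "only" flags cannot be combined.
    only = parser.add_mutually_exclusive_group()
    only.add_argument("--meetings-only", action="store_true")
    only.add_argument("--nonprofits-only", action="store_true")
    # One or more state abbreviations, e.g. --states AL MI
    parser.add_argument("--states", nargs="+", default=[], metavar="STATE")
    parser.add_argument("--skip-discovery", action="store_true")
    return parser


def plan_steps(args):
    """Translate parsed flags into an ordered list of (hypothetical) step names."""
    steps = []
    if not args.nonprofits_only:
        steps.append("meetings")
    if not args.meetings_only:
        # Discovery calls the external APIs; skipping it reuses existing bronze data.
        if not args.skip_discovery:
            steps.append("nonprofit_discovery")
        steps.append("nonprofit_gold_tables")
    return steps
```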
📁 Pipeline Architecture
pipeline/
├── create_meetings_gold_tables.py # Meeting data → Gold tables
├── create_nonprofits_gold_tables.py # Nonprofit discovery → Gold tables
└── huggingface_publisher.py # Publish to HuggingFace
scripts/
└── create_all_gold_tables.py # Main orchestration
data/
├── cache/localview/ # 18 years of meeting data ✅
├── bronze/nonprofits/ # Discovered nonprofit data
└── gold/ # ⭐ CURATED GOLD TABLES
    ├── meetings_*.parquet # 5 meeting tables
    └── nonprofits_*.parquet # 4 nonprofit tables
🔍 Use Cases
For Policy Makers
- Search 153K+ meeting transcripts for policy discussions
- Track budget decisions across jurisdictions over 18 years
- Analyze demographic context of policy decisions
For Researchers
- Text analysis of government transparency
- Topic modeling across jurisdictions
- Temporal analysis of civic engagement
For Developers
- Power search features in React app
- Feed AI/LLM applications with civic data
- Create visualizations and dashboards
For Families
- Find nonprofits by service area (food, housing, health)
- Compare financial health of organizations
- Discover programs in your community
📈 Performance
Meeting Pipeline
- Processing Time: ~2-3 minutes
- Records/Second: ~1,000-1,500
- Memory Usage: ~4-6 GB peak
- Output Size: 2.8 GB total
Nonprofit Pipeline
- API Rate Limit: 1 request/second (respectful to free APIs)
- Records/State: ~100-500 per NTEE code
- Recommended: Start with 2-5 states
- No API Key Required: All sources are free!
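A limit of 1 request/second can be enforced with a small throttle called before every API request. The sketch below is one simple way to do it. The `RateLimiter` class is hypothetical and not taken from the pipeline; the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time


class RateLimiter:
    """Allow at most one call per `min_interval` seconds (illustrative sketch)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous permitted call

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

In use, a discovery loop would create one `RateLimiter(1.0)` and call `limiter.wait()` immediately before each HTTP request, so that slow responses do not add extra delay on top of the required spacing.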
🔄 Data Refresh
Update Meeting Tables
python pipeline/create_meetings_gold_tables.py
Update Nonprofit Tables
python pipeline/create_nonprofits_gold_tables.py --states AL MI
🎯 Next Steps
Immediate Actions
- ✅ Run meeting pipeline (DONE!)
- ⏳ Run nonprofit pipeline for key states
- 📊 Integrate gold tables into React app
- 🔍 Add search features using transcript data
- 📈 Create visualizations
Future Enhancements
- Add NLP/ML topic extraction
- Entity recognition (people, orgs, places)
- Sentiment analysis of public comments
- Cross-reference meetings with nonprofits
- Time-series analysis tables
- Geospatial joins
🤝 Contributing
To add new gold tables:
- Create processing function in pipeline file
- Add to the create_all_gold_tables() method
- Document schema and use cases
- Test with sample data
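The steps above might look roughly like this in practice. Everything here is a hypothetical stand-in: the class name `MeetingsGoldPipeline`, the example table meetings_jurisdiction_counts, and the in-memory list-of-dicts representation are assumptions made to keep the sketch self-contained. Only the create_all_gold_tables() method name comes from this document.

```python
from collections import Counter


class MeetingsGoldPipeline:
    """Hypothetical stand-in for the class that owns create_all_gold_tables()."""

    def __init__(self, records):
        self.records = list(records)

    def create_calendar_table(self):
        # Existing-style table: one row per meeting.
        return [
            {"meeting_id": r["meeting_id"], "jurisdiction": r["jurisdiction"]}
            for r in self.records
        ]

    # Step 1: create a processing function for the new table.
    def create_jurisdiction_counts_table(self):
        counts = Counter(r["jurisdiction"] for r in self.records)
        return [
            {"jurisdiction": name, "meeting_count": n}
            for name, n in sorted(counts.items())
        ]

    def create_all_gold_tables(self):
        # Step 2: register the new function alongside the existing ones.
        builders = {
            "meetings_calendar": self.create_calendar_table,
            "meetings_jurisdiction_counts": self.create_jurisdiction_counts_table,
        }
        return {name: build() for name, build in builders.items()}
```

For the final step, a handful of hand-written records is usually enough to confirm that the new table has the expected columns and row counts before running against the full dataset.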
✨ Success Metrics
- ✅ 153,452 meeting records processed
- ✅ 2.8 GB of searchable transcripts
- ✅ 18 years of civic history
- ✅ 5 gold tables from meetings (meetings_decisions still TBD)
- 🎯 4 nonprofit tables planned
- 🚀 100% free data sources!
📚 Learn More
Ready to discover nonprofits?
python scripts/create_all_gold_tables.py --nonprofits-only --states AL MI