Skip to main content

Gold Table Pipeline

Transform bronze/cache data into curated gold tables ready for analysis, dashboards, and AI applications.

🎉 Successfully Created!

This pipeline processes 153,452 meeting records from 18 years of civic engagement data (2006-2023) into structured gold tables.


📊 Meeting Data Pipeline Results

:::tip Success! 153,452 meeting records processed from 18 years of data (2006-2023) :::

Created Gold Tables

TableSizeRecordsDescription
meetings_calendar1.71 MB153,452Meeting dates, locations, jurisdictions
meetings_transcripts2.8 GB153,452Full searchable meeting text
meetings_demographics1.17 MB153,452Census data linked to meetings
meetings_topics1.04 MB153,452Extracted topics and themes
meetings_decisionsTBDTBDPolicy decisions and votes

meetings_calendar.parquet

Meeting metadata and basic information.

Columns:

  • meeting_id - Unique identifier
  • jurisdiction - City/county name
  • channel_type - "OFFICIAL GOVT"
  • record_index - Original record index

meetings_transcripts.parquet

Full searchable text from meeting captions/minutes.

Columns:

  • meeting_id - Links to calendar
  • jurisdiction - City/county
  • transcript_text - Full meeting text
  • word_count - Number of words
  • has_captions - Boolean flag

Size: 2.8 GB of searchable civic engagement content!

meetings_demographics.parquet

Links meetings to jurisdiction demographic data from US Census.

Columns:

  • meeting_id - Links to calendar
  • jurisdiction - City/county
  • acs_18_pop - Population
  • acs_18_median_age - Median age
  • acs_18_median_hh_inc - Median household income
  • acs_18_median_gross_rent - Median rent
  • acs_18_white, acs_18_black, acs_18_asian, acs_18_hispanic - Demographics

meetings_topics.parquet

Extracted topics using keyword matching.

Columns:

  • meeting_id - Links to calendar
  • jurisdiction - City/county
  • topics - Comma-separated topic list
  • topic_count - Number of topics

Detected Topics:

  • budget
  • infrastructure
  • public_safety
  • health
  • education
  • parks
  • zoning
  • contracts
  • ordinances
  • public_comment

🏛️ Nonprofit Data Pipeline

Ready to discover and process nonprofit data from free APIs.

Planned Gold Tables

  1. nonprofits_organizations.parquet

    • Basic info: name, EIN, NTEE code, location
  2. nonprofits_financials.parquet

    • Revenue, assets, expenses from IRS Form 990
  3. nonprofits_programs.parquet

    • Services and programs offered
  4. nonprofits_locations.parquet

    • Geographic service areas

Data Sources

  • ProPublica Nonprofit Explorer - IRS Form 990 data (FREE!)
  • IRS Tax Exempt Org Search - Official tax-exempt status
  • Every.org - Charity profiles
  • Findhelp.org - Local services directory

🚀 Usage

Run Both Pipelines

cd /home/developer/projects/open-navigator
source .venv/bin/activate
python scripts/create_all_gold_tables.py

Run Only Meetings

python scripts/create_all_gold_tables.py --meetings-only

Run Only Nonprofits

# Discover nonprofits in specific states
python scripts/create_all_gold_tables.py --nonprofits-only --states AL MI

# Add more states
python scripts/create_all_gold_tables.py --nonprofits-only --states AL MI NY CA TX

Skip API Discovery

If you've already discovered nonprofits and want to regenerate gold tables:

python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery

📁 Pipeline Architecture

pipeline/
├── create_meetings_gold_tables.py # Meeting data → Gold tables
├── create_nonprofits_gold_tables.py # Nonprofit discovery → Gold tables
└── huggingface_publisher.py # Publish to HuggingFace

scripts/
└── create_all_gold_tables.py # Main orchestration

data/
├── cache/localview/ # 18 years of meeting data ✅
├── bronze/nonprofits/ # Discovered nonprofit data
└── gold/ # ⭐ CURATED GOLD TABLES
├── meetings_*.parquet # 5 meeting tables
└── nonprofits_*.parquet # 4 nonprofit tables

🔍 Use Cases

For Policy Makers

  • Search 153K+ meeting transcripts for policy discussions
  • Track budget decisions across jurisdictions over 18 years
  • Analyze demographic context of policy decisions

For Researchers

  • Text analysis of government transparency
  • Topic modeling across jurisdictions
  • Temporal analysis of civic engagement

For Developers

  • Power search features in React app
  • Feed AI/LLM applications with civic data
  • Create visualizations and dashboards

For Families

  • Find nonprofits by service area (food, housing, health)
  • Compare financial health of organizations
  • Discover programs in your community

📈 Performance

Meeting Pipeline

  • Processing Time: ~2-3 minutes
  • Records/Second: ~1,000-1,500
  • Memory Usage: ~4-6 GB peak
  • Output Size: 2.8 GB total

Nonprofit Pipeline

  • API Rate Limit: 1 request/second (respectful to free APIs)
  • Records/State: ~100-500 per NTEE code
  • Recommended: Start with 2-5 states
  • No API Key Required: All sources are free!

🔄 Data Refresh

Update Meeting Tables

python pipeline/create_meetings_gold_tables.py

Update Nonprofit Tables

python pipeline/create_nonprofits_gold_tables.py --states AL MI

🎯 Next Steps

Immediate Actions

  1. ✅ Run meeting pipeline (DONE!)
  2. ⏳ Run nonprofit pipeline for key states
  3. 📊 Integrate gold tables into React app
  4. 🔍 Add search features using transcript data
  5. 📈 Create visualizations

Future Enhancements

  • Add NLP/ML topic extraction
  • Entity recognition (people, orgs, places)
  • Sentiment analysis of public comments
  • Cross-reference meetings with nonprofits
  • Time-series analysis tables
  • Geospatial joins

🤝 Contributing

To add new gold tables:

  1. Create processing function in pipeline file
  2. Add to create_all_gold_tables() method
  3. Document schema and use cases
  4. Test with sample data

✨ Success Metrics

  • 153,452 meeting records processed
  • 2.8 GB of searchable transcripts
  • 18 years of civic history
  • 5 gold tables from meetings
  • 🎯 4 nonprofit tables ready
  • 🚀 100% free data sources!

📚 Learn More


Ready to discover nonprofits?

python scripts/create_all_gold_tables.py --nonprofits-only --states AL MI