Data Model & Entity Relationship Diagram
Comprehensive overview of all data entities extracted, processed, and uploaded to HuggingFace datasets.
� HuggingFace Dataset Structure
Current Datasets Being Uploaded
open-navigator-data/
├── jurisdictions/ # 🏛️ Core jurisdiction data
│ ├── cities # 19,000+ incorporated places
│ ├── counties # 3,144 U.S. counties
│ ├── states # 50 states + DC, territories
│ ├── school_districts # 13,000+ districts (NCES data)
│ └── census_data # Basic FIPS codes & census year reference
│
├── demographics/ # 👥 Comprehensive demographic data (U.S. Census)
│ ├── population # Total population, age distribution
│ ├── race_ethnicity # Race and ethnicity breakdowns
│ ├── income_economics # Income, poverty, SNAP benefits
│ ├── education # Educational attainment levels
│ ├── housing # Housing units, ownership, values
│ ├── employment # Unemployment, labor force participation
│ └── health_insurance # Insurance coverage (uninsured, Medicaid, Medicare)
│
├── social/ # 📱 Social media presence
│ ├── twitter # Twitter/X accounts
│ ├── facebook # Facebook pages
│ ├── instagram # Instagram accounts
│ └── linkedin # LinkedIn pages
│
├── videos/ # 📹 Video & streaming platforms
│ ├── youtube_channels # Government YouTube channels
│ ├── vimeo # Vimeo accounts
│ └── livestreams # Live meeting streams
│
├── platforms/ # 🖥️ Meeting management systems
│ ├── legistar # Legistar URLs
│ ├── granicus # Granicus links
│ ├── suiteone # SuiteOne systems
│ └── civicplus # CivicPlus platforms
│
├── domains/ # 🌐 Official government websites
│ ├── gsa_domains # .gov domain registry
│ ├── municipal_websites # City/county websites
│ └── state_portals # State government sites
│
├── events/ # 📋 Meetings, Hearings & Public Events
│ ├── events # Government meetings, public hearings, community forums, town halls
│ ├── event_participants # Officials and organizations participating in events
│ ├── event_agenda_items # Individual agenda topics discussed
│ ├── event_documents # Agendas, minutes, presentations, handouts
│ ├── event_media # Video recordings, livestreams, audio files
│ └── event_bills # Bills discussed or considered at meetings
│
├── contacts/ # 👥 All People - Officials, Candidates, Donors, Constituents
│ ├── officials # Elected and appointed officials (mayors, council members, legislators)
│ ├── official_roles # Current and historical positions held
│ ├── official_contacts # Email, phone, office addresses
│ ├── official_identifiers # External IDs (Twitter, OpenStates, Ballotpedia)
│ ├── official_links # Websites, social media profiles
│ ├── candidates # Political candidates (House, Senate, President - FEC data)
│ ├── nonprofit_donors # Nonprofit leadership political giving (FEC analysis)
│ └── constituents # Donors, volunteers, members, beneficiaries (all people)
│
├── nonprofits/ # 🏢 Nonprofit organizations & churches
│ ├── irs_eobmf # IRS EO-BMF bulk data (1.9M+ organizations) - PRIMARY SOURCE
│ ├── irs_nonprofits # Legacy IRS 990 data (deprecated - use irs_eobmf)
│ ├── propublica_data # ProPublica API (financials, NTEE codes)
│ ├── everyorg_data # Every.org API (missions, causes, logos)
│ ├── nonprofit_990s # Detailed Form 990 financials (yearly filings)
│ ├── congregations # 🛐 Church & congregation data (ARDA, HIFLD, NCS)
│ ├── constituents # 🤝 Donors, volunteers, members, beneficiaries (Microsoft CDM)
│ ├── donations # 💝 Financial contributions and in-kind gifts (Microsoft CDM)
│ ├── campaigns # 📣 Fundraising campaigns and appeals (Microsoft CDM)
│ ├── memberships # 🎫 Member enrollment and renewals (Microsoft CDM)
│ ├── volunteer_activities # 🙋 Volunteer hours and activities (Microsoft CDM)
│ ├── program_delivery # 🎯 Programs and services delivered (Microsoft CDM)
│ └── program_outcomes # 📊 Impact metrics and outcome measurements (Microsoft CDM)
│
├── grants/ # 💵 Grant funding transactions
│ ├── nonprofit_grants # Grants to nonprofits (from 990 Schedule I)
│ ├── government_grants # Government grants to orgs/jurisdictions
│ ├── foundation_grants # Private foundation grants
│ └── federal_grants # Federal funding programs
│
├── causes/ # 🎯 Cause & category taxonomy
│ ├── ntee_codes # IRS NTEE classification system
│ └── everyorg_causes # Every.org cause tags
│
├── budgets/ # 💰 Government budgets & finances
│ ├── city_budgets # City/municipal budgets & spending
│ ├── county_budgets # County budgets & expenditures
│ ├── state_budgets # State government finances
│ ├── school_budgets # School district finances (NCES F-33)
│ ├── bond_debt # Municipal bonds & debt obligations
│ ├── budget_line_items # 📊 Detailed budget categories & line items
│ └── budget_deltas # 🔍 Budget-to-Minutes Delta analysis (political economy)
│
├── decisions/ # ⚖️ Policy decisions & political economy analysis
│ ├── policy_decisions # Extracted decisions from meetings
│ ├── decision_frames # Frame analysis (rhetoric patterns)
│ ├── decision_options # Options considered & rejected
│ ├── decision_tradeoffs # Tradeoffs discussed (cost vs benefit, etc.)
│ ├── stakeholder_positions # 👥 Who spoke for/against (Influence Radar)
│ ├── decision_votes # Detailed vote records per decision
│ ├── deferral_patterns # 📅 Stalling detection (same topic, multiple deferrals)
│ ├── deferral_instances # Individual tabling events linked to patterns
│ ├── keyword_density # Quantitative indicators (grant/taxpayer/emergency)
│ └── deferral_patterns # Tabled/delayed decisions (temporal analysis)
│
├── elections/ # 📅 Election cycles & temporal analysis
│ ├── election_cycles # Election dates & periods
│ └── election_influences # Pre/post-election decision patterns
│
├── campaigns/ # 💰 Political campaign finance (FEC data)
│ ├── committees # PACs, Super PACs, campaign committees
│ └── contributions # Individual political contributions $200+
│
├── civic/ # 🗳️ Google Civic & Wikidata
│ ├── civic_divisions # OCD divisions
│ ├── representatives # From Google Civic API
│ ├── wikidata_entities # Structured entities
│ └── dbpedia_resources # Wikipedia infobox data
│
├── ballots/ # 🗳️ Ballot initiatives & referendums
│ ├── state_measures # State propositions (fluoridation votes!)
│ ├── local_measures # City/county ballot questions
│ └── election_results # Historical voting outcomes
│
├── bills/ # 📜 Legislation & Lawmaking
│ ├── bills # Bills and resolutions (1.5M+ from all 50 states)
│ ├── bill_actions # Bill history (introduced, committee, floor vote, signed)
│ ├── bill_sponsorships # Primary sponsors and co-sponsors
│ ├── bill_abstracts # Bill summaries and descriptions
│ ├── bill_versions # Different versions of bill text (introduced, amended, enrolled)
│ ├── bill_version_links # Links to PDF/HTML bill text
│ ├── bill_documents # Supporting documents (fiscal notes, amendments, analysis)
│ ├── bill_document_links # Links to supporting documents
│ ├── bill_subjects # Subject/topic tags per bill
│ ├── legislative_sessions # Session identifiers (2023 Regular, 2024 Special, etc.)
│ ├── vote_events # Roll call votes on bills
│ ├── vote_counts # Vote tallies (yes, no, abstain, absent)
│ └── individual_votes # Individual legislator votes (yes/no/abstain)
│
├── topics/ # 🎯 Advocacy causes & campaigns
│ ├── topic_definitions # Validated survey questions from Roper Center
│ ├── survey_questions # Public opinion question wording library
│ ├── jurisdiction_topics # What each city is discussing
│ └── advocacy_alerts # Opportunities for engagement
│
├── surveys/ # 📊 Public opinion research & polling data
│ ├── survey_providers # Polling organizations (Gallup, Pew, Roper, etc.)
│ ├── survey_studies # Individual survey studies/waves
│ ├── survey_variables # Questions/items asked in surveys
│ ├── survey_responses # Aggregate and individual response data
│ ├── ipoll_metadata # Roper iPoll catalog metadata
│ └── survey_crosstabs # Breakdowns by demographics, geography
│
├── factchecks/ # ✅ Fact-checking & claim verification
│ ├── claim_reviews # Google Fact Check API (ClaimReview schema)
│ ├── politifact # PolitiFact Truth-O-Meter ratings
│ ├── factcheck_org # FactCheck.org verified claims
│ └── verified_claims # Aggregated fact-check database
│
├── civic_tech/ # 💻 Open source projects & hackathons
│ ├── github_repositories # Civic tech projects (GitHub API)
│ ├── project_metadata # Code for America, USDR, Civic Tech Field Guide
│ ├── contributors # Maintainers and core contributors
│ ├── project_issues # Good first issues, contribution opportunities
│ ├── hackathons # Civic hackathon events
│ ├── hackathon_projects # Projects built at hackathons
│ ├── brigade_chapters # Code for America brigade locations
│ └── project_funding # GitHub Sponsors, grants, OpenCollective
│
├── community_solutions/ # 🌟 Community engagement & use cases
│ ├── engagement_spectrum # Spectrum of Community Engagement to Ownership
│ ├── use_case_catalog # Harvard Data-Smart City Solutions examples
│ ├── data_academies # Brookings Institution training programs
│ ├── success_stories # Real-world outcomes (Providence, Portland, Tempe)
│ ├── metric_templates # Pre-built analytics for common challenges
│ └── workflow_guides # Step-by-step community data workflows
│
├── analytics/ # 📊 Time dimensions & metric views
│ ├── date_dimension # Date/time reference table (YYYY-MM-DD, day_of_week, fiscal_year)
│ ├── temporal_relationships # Time-series joins for all entities
│ ├── metric_views # Pre-computed analytics (advocacy, spending, nonprofit impact)
│ ├── aggregated_stats # Monthly/quarterly/yearly rollups
│ └── dashboard_metrics # Real-time dashboard data feeds
│
├── standards/ # 🌐 Schema.org, Popolo, CEDS, IATI exports
│ ├── schema_org_jsonld # JSON-LD exports (Event, Person, Organization, Legislation, ClaimReview)
│ ├── popolo_exports # Popolo-compliant JSON (Person, Organization, Membership, VoteEvent)
│ ├── ceds_aligned # CEDS-compliant education data (Element IDs, Option Sets)
│ ├── ocd_divisions # Open Civic Data division IDs
│ ├── iati_activities # IATI Standard v2.03 XML (programs, grants, humanitarian aid)
│ └── rdf_triples # RDF/Turtle semantic web exports
│
├── vocabulary/ # 🔧 OMOP-inspired concept & terminology (SYSTEM-INTERNAL)
│ ├── concept # Master concept table (cities, causes, officials)
│ ├── vocabulary # Vocabulary sources (OCD_ID, IRS_NTEE, US_Census)
│ ├── concept_class # Concept classifications (City, County, 501c3, Mayor)
│ ├── concept_relationship # Relationships (City → County, Topic → Legislation)
│ └── domain # Domain groupings (Jurisdiction, Nonprofit, Policy)
│
└── exports/ # 📤 API-ready formatted exports
├── csv_bulk # CSV downloads for all datasets
├── json_api # REST API JSON responses
├── graphql_schema # GraphQL schema definitions
└── parquet_optimized # Compressed Parquet (default format)
Parquet File Naming Convention
Rule: Use underscores (_) consistently, NOT hyphens (-)
Format: {category}_{subcategory}.parquet
Examples:
✅ CORRECT (using underscores):
jurisdictions_cities.parquet
jurisdictions_counties.parquet
jurisdictions_states.parquet
jurisdictions_school_districts.parquet
social_twitter.parquet
social_facebook.parquet
videos_youtube_channels.parquet
events_events.parquet
contacts_officials.parquet
bills_bills.parquet
nonprofits_organizations.parquet
nonprofits_financials.parquet
nonprofits_programs.parquet
nonprofits_locations.parquet
nonprofits_irs_eobmf.parquet
nonprofits_constituents.parquet
nonprofits_donations.parquet
nonprofits_campaigns.parquet # Nonprofit fundraising campaigns (NOT political)
contacts_candidates.parquet # Political candidates (FEC)
contacts_nonprofit_donors.parquet # Nonprofit leadership political giving (FEC analysis)
contacts_constituents.parquet # Donors, volunteers, members, beneficiaries
campaigns_committees.parquet # Political committees/PACs (FEC)
campaigns_contributions.parquet # Political contributions (FEC)
nonprofits_memberships.parquet
nonprofits_volunteer_activities.parquet
nonprofits_program_delivery.parquet
nonprofits_program_outcomes.parquet
grants_federal_grants.parquet
contacts_officials.parquet
contacts_official_roles.parquet
contacts_official_contacts.parquet
contacts_official_identifiers.parquet
contacts_official_links.parquet
contacts_candidates.parquet
contacts_nonprofit_donors.parquet
contacts_constituents.parquet
bills_bills.parquet
bills_bill_actions.parquet
bills_bill_sponsorships.parquet
bills_bill_abstracts.parquet
bills_bill_versions.parquet
bills_bill_version_links.parquet
bills_bill_documents.parquet
bills_bill_document_links.parquet
bills_bill_subjects.parquet
bills_legislative_sessions.parquet
bills_vote_events.parquet
bills_vote_counts.parquet
bills_individual_votes.parquet
events.parquet
event_participants.parquet
event_agenda_items.parquet
event_documents.parquet
event_media.parquet
event_bills.parquet
budgets_city_budgets.parquet
surveys_national_polls.parquet
surveys_roper_questions.parquet
surveys_survey_providers.parquet
surveys_survey_studies.parquet
surveys_survey_variables.parquet
surveys_survey_responses.parquet
surveys_ipoll_metadata.parquet
factchecks_claim_reviews.parquet
factchecks_politifact.parquet
analytics_date_dimension.parquet
analytics_metric_views.parquet
analytics_temporal_relationships.parquet
standards_schema_org_jsonld.parquet
standards_popolo_exports.parquet
standards_ceds_aligned.parquet
standards_iati_activities.parquet
vocabulary_concept.parquet
vocabulary_vocabulary.parquet
vocabulary_concept_class.parquet
vocabulary_concept_relationship.parquet
❌ INCORRECT (using hyphens):
jurisdictions-cities.parquet
social-twitter.parquet
meetings-government-meetings.parquet
surveys-national-polls.parquet
factchecks-claim-reviews.parquet
analytics-date-dimension.parquet
standards-schema-org.parquet
Why Underscores?
- ✅ Python-friendly variable names (can use
data.jurisdictions_cities) - ✅ SQL-compatible column names
- ✅ Consistent with folder structure (
school_districts, notschool-districts) - ✅ Better for programmatic access
- ✅ Avoids shell escaping issues
Repository Name Exception:
- HuggingFace repo:
CommunityOne/open-navigator-data(hyphen is fine for URLs) - File names inside repo: Use underscores (
jurisdictions_cities.parquet)
🔄 Data Extraction Pipeline
Phase 1: Discovery (Bronze Layer)
- Census Data → Jurisdictions list
- GSA Domains → Government websites
- NCES → School districts with financial data (F-33 forms)
- IRS EO-BMF → ALL 1.9M+ U.S. tax-exempt organizations (PRIMARY SOURCE)
- IRS TEOS → Legacy nonprofit EINs (deprecated - use IRS EO-BMF)
- Census of Governments → Municipal budgets & finances
- URL Discovery → Meeting platforms, YouTube, budget PDFs
- Social Media → Twitter, Facebook accounts
Phase 2: Enrichment (Silver Layer)
- IRS EO-BMF → Complete nonprofit registry with 28 data fields per organization
- ProPublica Nonprofit Explorer → Enhanced financial data, detailed 990 filings
- Every.org API → Nonprofit causes, missions, logos
- ARDA (Association of Religion Data Archives) → Congregation characteristics, health ministries
- HIFLD Places of Worship → Geospatial church locations (350K+ congregations)
- National Congregations Study → Social service provision patterns
- NCES F-33 Finance Survey → School district budgets, per-pupil spending
- Census Annual Survey → State/local government finances
- Municipal Securities Rulemaking Board (EMMA) → Bond debt data
- YouTube API → Channel statistics
- Open States PostgreSQL Database → Complete legislative data (~10 GB monthly dump)
- 8,600+ people (legislators, governors, mayors) across all 50 states + DC + Puerto Rico
- 1.5M+ bills with full text and history
- 13M+ bill actions (introduced, committee, amendments, floor votes, signed)
- 7.2M+ sponsorships (primary sponsors and co-sponsors)
- 3.5M+ bill versions (as introduced, committee substitute, enrolled, enacted)
- 180K+ events (legislative meetings, hearings, committee sessions)
- 835K+ event participants (who spoke, testified, or attended)
- 524K+ agenda items from meetings
- Vote events with individual legislator positions
- Organizations (legislative bodies, committees)
- Jurisdictions (states, territories)
- Updated monthly from https://data.openstates.org/postgres/monthly/
- OpenStates People Repository → Current legislator contact info
- GitHub repo: https://github.com/openstates/people
- YAML files with email, phone, district offices
- Social media profiles and website links
- Updated daily via automated scrapers
- Wikidata SPARQL → Entity relationships
- DBpedia → Wikipedia structured data
- Google Civic → Representatives
- OpenFEC API → Political contributions, candidates, committees (campaign finance)
- GitHub API → Civic tech projects, contributors, issues
- Civic Tech Field Guide → Curated project taxonomy
- Code for America → Brigade projects and hackathons
- Digital Public Goods Alliance → DPG-certified open source projects
Phase 3: Processing (Gold Layer)
- Meeting Extraction → Agenda/minutes text
- Video Transcripts → YouTube captions
- Document Analysis → Keyword detection
- Relationship Mapping → Entity connections
- Oral Health Filtering → Topic classification
- Temporal Indexing → Date dimension table, time-series relationships
- Metric View Creation → Pre-computed analytics (advocacy activity, government spending, nonprofit impact)
- Schema.org JSON-LD → Structured data exports (Event, Person, Organization, Legislation, ClaimReview)
- Popolo Compliance → Open government standard exports (Person, Organization, Membership, VoteEvent)
- CEDS Alignment → Education data mapping to NCES Element IDs and Option Sets
New Dataset Categories Explained
📊 Analytics Datasets
Purpose: Enable time-series analysis, trend detection, and dashboard metrics without complex SQL queries.
| Dataset | Description | Refresh Frequency |
|---|---|---|
analytics_date_dimension | Calendar reference table with fiscal years, quarters, day-of-week, holidays | Static (updated annually) |
analytics_temporal_relationships | Pre-joined date keys for all time-based entities (meetings, votes, budgets, filings) | Daily |
analytics_metric_views | Pre-computed analytics like advocacy_activity, government_spending, nonprofit_impact | Hourly |
analytics_aggregated_stats | Monthly/quarterly/yearly rollups (meeting counts, budget totals, grant sums) | Daily |
analytics_dashboard_metrics | Real-time feeds for dashboards (active meetings today, trending topics, hot ballot measures) | Every 5 minutes |
Example Use Case:
-- Instead of complex joins, use metric view:
SELECT * FROM analytics_metric_views
WHERE metric_name = 'advocacy_activity'
AND jurisdiction_id = 'ocd-division/country:us/state:al/place:birmingham'
AND date_period = '2024-Q1';
🌐 Standards-Compliant Exports
Purpose: Maximum interoperability with civic tech platforms, search engines, and semantic web tools.
| Dataset | Standard | Use Case | Consumers |
|---|---|---|---|
standards_schema_org_jsonld | Schema.org JSON-LD | Google Search rich results, voice assistants | Google, Bing, Alexa, Siri |
standards_popolo_exports | Popolo Project | Civic tech platform integration | mySociety, OpenNorth, Sunlight Foundation |
standards_ceds_aligned | Common Education Data Standards | Education data exchange, NCES reporting | State education depts, Ed-Fi, IMS Global |
standards_ocd_divisions | Open Civic Data IDs | Cross-platform jurisdiction referencing | Google Civic, Ballotpedia, Vote Smart |
standards_rdf_triples | RDF/Turtle | Linked open data, knowledge graphs | DBpedia, Wikidata, SPARQL endpoints |
Example Schema.org Export:
{
"@context": "https://schema.org",
"@type": "GovernmentOrganization",
"name": "Birmingham City Council",
"address": {
"@type": "PostalAddress",
"addressLocality": "Birmingham",
"addressRegion": "AL"
},
"event": [{
"@type": "Event",
"name": "Regular City Council Meeting",
"startDate": "2024-01-15T18:00:00-06:00"
}]
}
✅ Fact-Checking Datasets
Purpose: Verify claims made in meetings, legislation, and political speech.
| Dataset | Source | Fields | Update Frequency |
|---|---|---|---|
factchecks_claim_reviews | Google Fact Check API | claimReviewed, reviewRating, author, datePublished | Daily |
factchecks_politifact | PolitiFact web scraping | claim, ruling, truth_o_meter, context | Daily |
factchecks_factcheck_org | FactCheck.org API/scraping | claim, verdict, analysis, sources | Daily |
factchecks_verified_claims | Aggregated + deduplicated | claim_text, consensus_rating, verification_sources | Daily |
Integration with Meetings:
- Cross-reference meeting transcripts with verified claims
- Flag unverified statements in legislative debates
- Track politician accuracy scores over time
�📊 Complete Data Model (ERD)
Interactive Entity Relationship Diagram
⚖️ Political Economy Analysis Framework
The ERD includes specialized entities for political economy analysis - understanding the "WHY" behind government decisions, not just the "WHAT". These entities support a 4-step advocacy framework to expose gaps between rhetoric and reality:
The 4-Step Framework for Effective Change
Step 1: Rhetoric Gap - Frame Analysis
Goal: Establish they ALREADY agree it's important (stop the "need" debate)
ERD Support:
- DECISION_FRAME: Tracks how decisions are framed ("public health" vs "fiscal responsibility" vs "equity")
- KEYWORD_DENSITY: Measures rhetoric patterns ("priority", "essential", "critical" per 1000 words)
- POLICY_DECISION.primary_frame: Primary framing language used
Analysis Output:
Frame Distribution:
12x public health
8x fiscal responsibility
5x equity/access
3x economic development
→ They SAY oral health is a "priority" - hold them to it!
Step 2: Displacement Matrix - Budget-to-Minutes Delta
Goal: Show they HAD the money (stop the "budget constraint" excuse)
ERD Support:
- BUDGET_LINE_ITEM: Detailed budget categories with year-over-year changes
- BUDGET_DELTA: Compares meeting rhetoric to actual funding changes
- BUDGET_DELTA.delta_type: Classifies as "Expansion", "Lip Service", or "Hidden Priority"
- BUDGET_DELTA.delta_score: Quantifies rhetoric vs reality gap (-1 to +1)
Analysis Output:
🎭 Lip Service Detected:
• School dental program: -$50,000 decrease
Mentioned 12x in meetings as "critical for children"
Logic: PERFORMATIVE POLITICS
💰 Hidden Priority (Where the money REALLY went):
• IT Infrastructure: +$200,000 increase
Only mentioned 1x in meetings
Logic: Bureaucratic inertia - avoiding scrutiny
Step 3: Influence Radar - Stakeholder Analysis
Goal: Name who's blocking it (force personal accountability)
ERD Support:
- STAKEHOLDER_POSITION: Who spoke for/against with their arguments
- STAKEHOLDER_POSITION.speaking_order: Track who speaks when (early speakers often most influential)
- DECISION_VOTE: Individual vote records with stated reasons
- DECISION_VOTE.switched_position: Flag when officials change their stance
Analysis Output:
👥 Blocking Coalition:
• Taxpayer Association (opponent) - spoke 1st
Argument: "Cost concerns in tight budget"
Counter: School nurses budget UP $300K same meeting
• Council Member Smith (voted NO)
Stated reason: "Need more study"
Pattern: Has voted NO on 3 oral health items since 2022
Step 4: Deferral Pattern - Temporal Voting Analysis
Goal: Show they're stalling, not studying (expose the tactic)
ERD Support:
- DEFERRAL_PATTERN: Tracks same topic across multiple meetings
first_mentioned: When decision was first introducedlast_discussed: Most recent tabling datetotal_deferrals: How many times tabled/postponedmonths_in_limbo: Time elapsed since first mentionpattern_type: "Rationale of Attrition" / "Sincere Analysis" / "Political Timing"strategic_inference: Inferred reason for delaynext_review_date: When scheduled to revisit (if stated)
- DEFERRAL_INSTANCE: Individual tabling events with dates
deferral_date: When it was tabledstated_reason: Official justification givenspeaker: Who gave the reasonmonths_since_first: Time elapsedreason_changed: Flag for shifting justifications
- POLICY_DECISION.outcome: Captures "tabled/deferred/postponed"
- DECISION_OPTION.rejection_reason: Stated excuses for delay
- ELECTION_CYCLE: Links decisions to election timelines
- ELECTION_CYCLE.pre_election_spike_detected: Flag incumbent protection patterns
Complete Date Tracking:
-- All dates needed to track stalling:
MEETING.meeting_date -- When meeting occurred
POLICY_DECISION.meeting_date -- When decision was discussed
DEFERRAL_PATTERN.first_mentioned -- First introduction date
DEFERRAL_PATTERN.last_discussed -- Most recent tabling
DEFERRAL_INSTANCE.deferral_date -- Each individual deferral
ELECTION_CYCLE.election_date -- Election timing context
DECISION_VOTE.days_since_election -- Temporal political context
Analysis Output:
📅 Stalling Pattern DETECTED:
• Topic: "Fluoridation proposal"
• First mentioned: 2020-03-15 (4 years, 1 month ago)
• Total deferrals: 4 times
• Pattern type: Rationale of Attrition
📊 Timeline of Shifting Justifications:
- 2020-03: "Need more study" (12mo before election)
- 2020-09: "Budget constraints" (6mo before election)
- 2022-01: "Need public input" (new council members)
- 2024-05: "Awaiting staff report" (still pending)
🔍 Strategic Inference:
"They're NOT studying - they're AVOIDING! The board isn't
debating the merit; they're waiting for the advocate's
momentum to fade before the next election cycle."
🚨 Discomfort Score: 10/10 (Extremely politically sensitive)
Quantitative "Why" Indicators
Additional metrics to infer governance logic:
| Entity | Metric | Reveals |
|---|---|---|
| POLICY_DECISION.contention_score | Ratio of dissent (0-100) | Political sensitivity of topic |
| KEYWORD_DENSITY | "grant" vs "taxpayer" frequency | Decision driver: outside funding vs local demand |
| KEYWORD_DENSITY | "emergency" occurrence | Reactive vs planned governance |
| ELECTION_CYCLE.avg_project_cost_before | Spending spikes pre-election | Incumbency protection tactics |
| BUDGET_DELTA.underlying_logic | Genuine vs Performative vs Bureaucratic | Real priorities revealed |
Implementation Files
These analyses are fully implemented in the codebase:
extraction/decision_analyzer.py- Frame analysis, stakeholder extractionextraction/budget_analyzer.py- Budget delta calculation, opportunity cost mappingextraction/temporal_analyzer.py- Election cycle analysis, deferral patternsexamples/tuscaloosa_political_economy.py- Complete end-to-end analysis
See Political Economy Analysis Guide for detailed implementation status and usage examples.
🌐 Data Standards & Interoperability
Popolo Project Alignment
Our data model follows the Popolo Project specification for representing people, organizations, and elected positions. Popolo is an international open government data standard adopted by 30+ civic tech organizations worldwide including mySociety, Sunlight Foundation, OpenNorth, and Civic Commons.
Popolo Class Mappings
| Popolo Class | Our Entity | Description | Key Fields |
|---|---|---|---|
| Person | LEADER | Elected officials, government employees | full_name, email, phone, photo_url |
| Organization | ORGANIZATION | Nonprofits, government agencies, companies | name, org_type, address, website |
| Membership | LEADER ↔ ORGANIZATION | Relationship between people and organizations | leader_id, jurisdiction_id, term_start, term_end |
| Post | LEADER.position_type | Positions within organizations | title, office, position_type |
| Contact Detail | Embedded fields | Communication methods | email, phone, website in multiple entities |
| Motion | AGENDA, LEGISLATION | Formal proposals for decision | title, description, status |
| Vote Event | VOTE | Voting on motions/bills | vote_date, item_description |
| Count | VOTE, LEGISLATION | Vote tallies | vote_yes, vote_no, vote_value |
| Area | JURISDICTION | Geographic/political boundaries | jurisdiction_type, state_code, county_name |
| Event | MEETING | Gatherings with agendas | meeting_date, meeting_type, body_name |
| Speech | MINUTES, VIDEO | Spoken statements | transcript_text, summary_text |
Schema.org Type Mappings
Our entities map to Schema.org types for SEO-optimized structured data and semantic web compatibility:
| Our Entity | Schema.org Type | Properties | JSON-LD Export |
|---|---|---|---|
| JURISDICTION | AdministrativeArea | name, address, geo, telephone, url | ✅ City/county pages |
| MEETING | Event | name, startDate, endDate, location, organizer | ✅ Google Calendar rich results |
| LEADER | Person + GovernmentOfficial | name, email, telephone, jobTitle | ✅ Official profiles |
| ORGANIZATION | Organization + NGO | name, address, telephone, foundingDate | ✅ Nonprofit listings |
| LEGISLATION | Legislation | name, legislationDate, legislationPassedBy | ✅ Bill tracking |
| BALLOT_MEASURE | Legislation | name, datePosted, legislationChanges | ✅ Ballot guides |
| VOTE | VoteAction | agent, candidate, actionOption | ✅ Voting records |
| FACT_CHECK | ClaimReview | claimReviewed, reviewRating, author | ✅ Google Fact Check Explorer |
| SCHOOL_DISTRICT | EducationalOrganization | name, numberOfStudents, address | ✅ School district info |
| VIDEO | VideoObject | name, description, uploadDate, duration | ✅ YouTube integration |
| DOCUMENT | DigitalDocument | name, fileFormat, datePublished | ✅ Document library |
| Microsoft CDM Nonprofit Entities | |||
| CONSTITUENT | Person | name, email, telephone, address | ✅ Donor/volunteer profiles |
| DONATION | DonateAction | agent (Person), recipient (Organization), price | ✅ Donation receipts |
| CAMPAIGN | FundingScheme | name, startDate, endDate, url | ✅ Fundraising campaigns |
| MEMBERSHIP | ProgramMembership | member (Person), hostingOrganization, membershipNumber | ✅ Member cards |
| VOLUNTEER_ACTIVITY | VolunteerAction | agent (Person), startTime, endTime, location | ✅ Volunteer tracking |
| PROGRAM_DELIVERY | Service | name, provider, serviceType, areaServed | ✅ Program catalog |
| PROGRAM_OUTCOME | Observation | measurementTechnique, measuredValue, observationDate | ✅ Impact reporting |
Example: Meeting as Schema.org Event
{
"@context": "https://schema.org",
"@type": "Event",
"@id": "https://www.communityone.com/meetings/city-council-2024-01-15",
"name": "Birmingham City Council Regular Meeting",
"description": "Monthly city council meeting covering budget, zoning, and public health initiatives",
"startDate": "2024-01-15T18:00:00-06:00",
"endDate": "2024-01-15T20:30:00-06:00",
"eventStatus": "https://schema.org/EventScheduled",
"eventAttendanceMode": "https://schema.org/MixedEventAttendanceMode",
"location": {
"@type": "Place",
"name": "Birmingham City Hall",
"address": {
"@type": "PostalAddress",
"streetAddress": "710 N 20th St",
"addressLocality": "Birmingham",
"addressRegion": "AL",
"postalCode": "35203"
}
},
"organizer": {
"@type": "GovernmentOrganization",
"name": "Birmingham City Council",
"url": "https://www.birminghamal.gov/council/"
},
"recordedIn": {
"@type": "VideoObject",
"name": "City Council Meeting Recording",
"uploadDate": "2024-01-16",
"duration": "PT2H30M",
"thumbnailUrl": "https://example.com/thumbnail.jpg",
"contentUrl": "https://youtube.com/watch?v=example"
},
"subEvent": [
{
"@type": "Event",
"name": "Public Comment Period",
"startDate": "2024-01-15T18:15:00-06:00"
}
]
}
Common Education Data Standards (CEDS) Alignment
Our SCHOOL_DISTRICT entity follows CEDS specifications for education data interoperability:
| Our Field | CEDS Element | Element ID | NCES Alignment |
|---|---|---|---|
nces_id | LEA Identifier (NCES) | 000827 | CCD LEA ID |
district_name | Name of Institution | 000168 | Official district name |
district_type | LEA Type | 000108 | Regular/Specialized/Service Agency |
total_students | Student Count | 001475 | Fall enrollment |
total_schools | Number of Schools | 000856 | Operational schools count |
total_revenue | Total Revenue | 000612 | F-33 Survey Line A09 |
total_expenditures | Total Expenditures | 000611 | F-33 Survey Line B13 |
per_pupil_spending | Expenditure per Student | 000613 | Total exp / enrollment |
federal_revenue | Federal Revenue | 000614 | ESEA Title I, IDEA |
state_revenue | State Revenue | 000615 | State aid formulas |
local_revenue | Local Revenue | 000616 | Property tax, bonds |
superintendent | Chief Administrator | 000240 | District superintendent |
school_year | School Year | 000243 | YYYY-YYYY format |
CEDS Option Sets:
- LEA Type (000108): Regular local school district, Specialized (charter/magnet), Supervisory Union, Service Agency, State/Federal Agency
- Operational Status (000533): Open, Closed, New, Changed Agency, Temporarily Closed
- Locale Type (001315): City (Large/Midsize/Small), Suburb, Town, Rural (NCES Urban-centric codes)
Benefits:
- ✅ Compatible with NCES Common Core of Data (CCD) and F-33 Finance Survey
- ✅ Aligns with Ed-Fi Alliance, IMS Global, and SIF Association standards
- ✅ Supports federal reporting for ESSA, Title I, IDEA compliance
Microsoft Common Data Model for Nonprofits
Our nonprofit constituent management entities follow Microsoft's Common Data Model for Nonprofits, enabling seamless integration with Dynamics 365 and Power Platform:
| Our Entity | Microsoft CDM Entity | Description | Key Relationships |
|---|---|---|---|
| CONSTITUENT | Constituent | Donors, volunteers, members, beneficiaries | → DONATION, MEMBERSHIP, VOLUNTEER_ACTIVITY |
| DONATION | Donation | Financial contributions and in-kind gifts | ← CONSTITUENT, → CAMPAIGN, → DESIGNATION |
| CAMPAIGN | Campaign | Fundraising campaigns and appeals | → DONATION |
| DESIGNATION | Designation | Fund allocation (programs, unrestricted, endowment) | ← DONATION |
| MEMBERSHIP | Membership | Member enrollment and renewals | ← CONSTITUENT, ← ORGANIZATION |
| VOLUNTEER_ACTIVITY | Volunteer Preference | Volunteer activities and hours | ← CONSTITUENT |
| PROGRAM_DELIVERY | Delivery Framework | Programs and services delivered | ← ORGANIZATION, → PROGRAM_OUTCOME |
| PROGRAM_OUTCOME | Objective | Measurable impact and KPIs | ← PROGRAM_DELIVERY |
Microsoft CDM Core Patterns:
- Constituent-Centric Design: All engagement activities (donations, volunteering, membership) link to CONSTITUENT
- Designation-Based Accounting: Donations are allocated to specific designations (funds, programs, campaigns)
- Campaign Tracking: Multi-channel fundraising campaigns track goals, raised amounts, donor counts
- Outcome Measurement: Programs track objectives with target vs. actual metrics
- Temporal Tracking: Start/end dates on memberships, campaigns, and programs for lifecycle management
Integration Points:
| Microsoft Product | Integration Type | Use Case |
|---|---|---|
| Dynamics 365 Nonprofit | Native CDM compatibility | CRM for constituent relationship management |
| Power BI | Direct data connection | Fundraising dashboards, donor analytics |
| Power Apps | Low-code app builder | Volunteer management apps, event registration |
| Power Automate | Workflow automation | Donation receipts, membership renewals |
| Azure Synapse | Cloud analytics | Large-scale constituent analytics |
Example: Constituent-Donation Relationship
-- Find top 10 donors by lifetime giving
SELECT
c.constituent_id,
c.first_name,
c.last_name,
c.email,
c.lifetime_giving_total,
COUNT(d.donation_id) as donation_count,
AVG(d.amount) as avg_donation,
MAX(d.donation_date) as last_donation_date
FROM CONSTITUENT c
JOIN DONATION d ON c.constituent_id = d.constituent_id
WHERE c.constituent_type = 'Donor'
GROUP BY c.constituent_id, c.first_name, c.last_name, c.email, c.lifetime_giving_total
ORDER BY c.lifetime_giving_total DESC
LIMIT 10;
Benefits:
- ✅ Microsoft Ecosystem: Native compatibility with Dynamics 365, Power Platform, Azure
- ✅ Industry Standard: Used by large nonprofits (United Way, Boys & Girls Clubs, etc.)
- ✅ Grant Reporting: Built-in support for outcome tracking and funder reporting
- ✅ LYBNT Analysis: "Last Year But Not This Year" donor reactivation queries
- ✅ Constituent 360: Unified view of all engagement touchpoints
Underlying Standards
Popolo builds upon W3C, IETF, and DCMI specifications for maximum interoperability:
| Standard | Use Case | Example in Our Model |
|---|---|---|
| FOAF (Friend of a Friend) | Social network, people relationships | LEADER connections, ORGANIZATION networks |
| vCard (IETF RFC 6350) | Contact information | email, phone, address fields across entities |
| Schema.org | Structured web data | Meeting metadata, organization profiles for SEO |
| DCMI Terms | Metadata, provenance | created_at, updated_at, source_url timestamps |
| W3C Organization Ontology | Hierarchical organizations | Government hierarchy, nonprofit structures |
| ISA Location Core Vocabulary | Address standardization | address, city, state_code, latitude, longitude |
| GeoNames Ontology | Geographic identifiers | Place names, jurisdiction boundaries |
| SKOS | Taxonomies and classification | NTEE codes, policy topic categories |
Benefits of Standards Compliance
- Interoperability: Data can be easily shared with other civic tech platforms
- API Compatibility: Standard field names work with existing tools (e.g., EveryPolitician, Open Civic Data)
- Semantic Web: RDF/JSON-LD export capabilities for linked open data
- Tooling: Existing libraries and validators (e.g.,
pupa,everypolitician-popolo) - Documentation: Well-documented schemas reduce onboarding time
Example: Popolo-Compatible JSON-LD Export
{
"@context": "http://www.popoloproject.com/contexts/person.jsonld",
"@type": "Person",
"id": "ocd-person/12345678-90ab-cdef-1234-567890abcdef",
"name": "Jane Doe",
"email": "jane.doe@example.gov",
"links": [{
"note": "official website",
"url": "https://example.gov/mayor"
}],
"memberships": [{
"@type": "Membership",
"organization_id": "ocd-organization/jurisdiction/us/city/springfield",
"post_id": "mayor",
"role": "Mayor",
"start_date": "2022-01-01",
"end_date": "2026-12-31"
}],
"contact_details": [{
"type": "email",
"value": "jane.doe@example.gov",
"note": "official"
}, {
"type": "voice",
"value": "+1-555-123-4567"
}],
"sources": [{
"url": "https://example.gov/government/officials",
"note": "Official city website"
}]
}
Open Civic Data (OCD-ID) Identifiers
We use OCD-IDs for jurisdiction identifiers following OCDEP 2:
Format: ocd-division/country:<country_code>/<type>:<type_id>
Examples:
- State: ocd-division/country:us/state:al
- County: ocd-division/country:us/state:al/county:jefferson
- City: ocd-division/country:us/state:al/place:birmingham
- School District: ocd-division/country:us/state:al/school_district:birmingham_city
Normalization Rules:
- Lowercase ASCII characters (a-z)
- Numbers (0-9)
- Valid punctuation:
._~- - Spaces → underscores
- Remove special characters
📊 Data Statistics
| Entity Type | Estimated Count | Source |
|---|---|---|
| Jurisdictions | 22,000+ | Census Gazetteer |
| Counties | 3,144 | FIPS codes |
| Cities | 19,000+ | Incorporated places |
| School Districts | 13,000+ | NCES CCD |
| School District Budgets | 13,000+ | NCES F-33 Finance Survey |
| Government Budgets | 22,000+ | Census of Governments |
| Municipal Bonds | TBD | EMMA (MSRB) |
| Nonprofits | 3,000,000+ | IRS TEOS |
| Nonprofit 990 Filings | 10,000,000+ | ProPublica (10+ years) |
| Microsoft CDM: Nonprofit Engagement | ||
| Constituents | TBD | Donors, volunteers, members, beneficiaries (Microsoft CDM) |
| Donations | TBD | Financial contributions and in-kind gifts (Microsoft CDM) |
| Campaigns | TBD | Fundraising campaigns and appeals (Microsoft CDM) |
| Memberships | TBD | Member enrollments and renewals (Microsoft CDM) |
| Volunteer Activities | TBD | Volunteer hours and service events (Microsoft CDM) |
| Program Delivery Records | TBD | Programs and services delivered (Microsoft CDM) |
| Program Outcomes | TBD | Impact metrics and KPIs (Microsoft CDM) |
| Grants (Individual Awards) | TBD | IRS 990-I, USASpending.gov, Foundation Center |
| Federal Grants | 100,000+ | USASpending.gov API |
| Nonprofit Causes | 600+ | NTEE + Every.org |
| YouTube Channels | 5,000+ | Discovery pipeline |
| Meeting Platforms | 10,000+ | URL detection |
| State Legislators | 7,300+ | Open States |
| Meetings & Events | 500,000+ | Scraped (govt, hearings, events, trainings) |
| Trainings | TBD | Professional development, workshops |
| Documents | 2,000,000+ | PDF extraction |
| Ballot Measures | TBD | State/local election sites |
| State Bills | 100,000+ | Open States API |
| Policy Topics | ~50 | Curated + extracted |
| Analytics & Standards | ||
| Date Dimension Records | ~7,300 | 20 years (2010-2030) |
| Metric Views | ~100 | Pre-computed analytics definitions |
| Temporal Relationships | ~1M+ | Date keys for all time-based entities |
| Schema.org JSON-LD Exports | ~500K+ | Event, Person, Organization, Legislation, ClaimReview |
| Popolo Exports | ~100K+ | Person, Organization, Membership, VoteEvent |
| CEDS-Aligned Records | 13,000+ | School districts with NCES Element IDs |
| OCD Division IDs | 22,000+ | All jurisdictions with standardized identifiers |
| IATI Activity Files | TBD | Programs, grants, humanitarian aid (v2.03 XML) |
| Fact-Checking | ||
| Verified Claims | ~50K+ | Google Fact Check API |
| PolitiFact Ratings | ~20K+ | Truth-O-Meter rulings |
| FactCheck.org Articles | ~10K+ | Verified fact-checks |
| Vocabulary & Concepts (OMOP-Inspired) | ||
| Concept Entries | ~2M+ | Cities, nonprofits, officials, topics, demographics |
| Vocabularies | 10+ | OCD_ID, IRS_NTEE, US_Census, NCES, OHDSI (Gender/Race/Ethnicity) |
| Concept Classes | 20+ | City, County, 501c3, Mayor, Health Policy, etc. |
| Concept Relationships | ~5M+ | Hierarchies (City→County→State), Associations (Topic→Legislation) |
| OHDSI Gender Concepts | 3 | MALE, FEMALE, OTHER (Athena standard) |
| OHDSI Race Concepts | 20+ | Census OMB categories (Athena standard) |
| OHDSI Ethnicity Concepts | 2 | Hispanic or Latino, Not Hispanic or Latino (Athena standard) |
� See Also
For full citations, licenses, BibTeX references, and detailed attribution for all data sources, standards, and research:
👉 View Citations & Data Sources
Includes academic research, government data APIs, civic tech standards (OCD-ID, Popolo, Schema.org, CEDS, IATI), Microsoft CDM, OMOP CDM, fact-checking sources, and more.
�📝 Meeting & Event Types
Event Categories in the MEETING Entity
The MEETING entity tracks 4 main event categories to capture all civic engagement opportunities:
1. Government Meetings (event_category: "government_meeting")
- City council meetings, school board meetings, county commissions
- Official business conducted by elected bodies
- Fields:
body_name,meeting_type(regular, special, emergency) - Example: "Tuscaloosa City Council Regular Meeting - 3rd Tuesday"
2. Public Hearings (event_category: "public_hearing")
- Public comment sessions on specific issues
- Budget hearings, zoning hearings, policy feedback
- Fields:
meeting_type(budget, zoning, policy) - Example: "Public Hearing on FY2026 Water System Fluoridation Budget"
3. Community Events (event_category: "community_event")
- Town halls, community forums, listening sessions
- Informal engagement between government and citizens
- Fields:
location_type(in-person, virtual, hybrid) - Example: "Town Hall on Community Health Priorities"
4. Trainings (event_category: "training") ⭐ NEW
- Professional development workshops
- Continuing education for healthcare workers, teachers, officials
- Certification courses, skill-building sessions
- Fields:
training_topic- Subject matter (e.g., "Pediatric Oral Health", "Water Fluoridation Safety")target_audience- Who should attend (e.g., "Dental Hygienists", "School Nurses", "Water Operators")presenter- Trainer/instructor name or organizationrequires_registration- Boolean flagregistration_fee- Cost to attend (0 for free)max_capacity- Attendance limitend_date- Training end time (multi-day events)
- Example: "Fluoride Varnish Application Training for School Nurses (3 CEU)"
Why Trainings Matter for Advocacy
Capacity Building:
- ✅ Identify training gaps ("No fluoride varnish training in past 2 years")
- ✅ Track professional development opportunities
- ✅ Monitor continuing education credits (CEUs) offered
Stakeholder Engagement:
- ✅ Find healthcare workers trained in specific skills
- ✅ Identify champions (frequent training attendees)
- ✅ Target outreach to trained professionals
Policy Implementation:
- ✅ "City wants dental screenings but no trained staff" → Show available trainings
- ✅ Track certification status (who's qualified to implement policy)
- ✅ Link training availability to policy feasibility
Example Questions Now Answerable:
- "What oral health trainings are offered in Alabama?" → Filter by
training_topicLIKE '%oral%' - "Which jurisdictions offer free fluoride training?" →
registration_fee = 0ANDtraining_topicLIKE '%fluoride%' - "How many school nurses attended varnish training last year?" → Count attendees by
target_audience - "Are there upcoming water fluoridation operator trainings?" →
training_topicANDmeeting_date> TODAY
Meeting Types Within Each Category
Government Meetings:
- Regular sessions, special sessions, emergency meetings
- Work sessions, committee meetings, executive sessions
Public Hearings:
- Budget hearings, zoning hearings, policy feedback sessions
- Environmental impact hearings, license applications
Community Events:
- Town halls, listening sessions, community forums
- Neighborhood meetings, stakeholder roundtables
Trainings:
- Professional development workshops
- Certification courses (CPR, fluoride application, etc.)
- Continuing education (CEU/CME credits)
- Skill-building sessions (motivational interviewing, cultural competency)
�💰 Nonprofit Funding Source Tracking
Revenue Source Breakdown (Form 990 Data)
The NONPROFIT_FINANCES entity tracks 10 different revenue sources to understand how nonprofits are funded:
1. Grant Revenue (Institutional Funding)
government_grants- Federal, state, local government grantsfoundation_grants- Private foundation grants (Gates, Ford, etc.)- Why it matters: Grant-dependent orgs may be less sustainable, more restrictive
2. Donation Revenue (Community Funding)
individual_donations- Direct donations from peoplecorporate_donations- Corporate giving programsmembership_dues- Member subscriptions/fees- Why it matters: Grassroots funding = community support, more flexible use
3. Earned Revenue (Self-Sufficiency)
program_service_revenue- Fees for services (clinic visits, classes, etc.)special_events_revenue- Galas, fundraisers, eventsrental_income- Property rentalssale_of_assets- Asset sales- Why it matters: Self-generated revenue = sustainability, independence
4. Investment Revenue
investment_income- Interest, dividends, capital gains- Why it matters: Endowment size, financial health
5. Other Revenue
other_revenue- Miscellaneous sources- Why it matters: Unusual funding patterns
Calculated Metrics
-
overhead_ratio= (admin_expenses + fundraising_expenses) / total_expenses- Lower = more efficient (more goes to programs)
- Industry benchmark: <25% overhead is "good"
-
fundraising_efficiency= contributions_received / fundraising_expenses- Higher = better (more money raised per dollar spent)
- Industry benchmark: $4+ raised per $1 spent
Why This Matters for Advocacy
Find sustainable partners:
- ✅ High individual donations = community trust
- ✅ Diversified revenue = financial stability
- ⚠️ Single-grant dependent = risky partnership
Evaluate efficiency:
- ✅ Low overhead ratio = more program dollars
- ✅ High fundraising efficiency = good stewardship
- ⚠️ High admin costs = potential waste
Identify funding gaps:
- Compare similar nonprofits' revenue mix
- Find underutilized funding sources (e.g., membership programs)
- Target corporate donation opportunities
Example Questions Now Answerable:
- "Which dental nonprofits have the most individual donors?" (community support)
- "What's the average overhead for oral health organizations?" (efficiency benchmark)
- "Are dental nonprofits more grant-dependent or self-sufficient?" (sustainability)
- "Which funders support oral health work?" (foundation grants analysis)
💵 Grant Tracking System
Individual Grant Transactions (GRANT Entity)
The GRANT entity tracks individual grant awards beyond just aggregate 990 financials. This provides transaction-level detail for:
Grant Fields
- Recipient Info:
recipient_ein,recipient_name,recipient_type(nonprofit, government, etc.) - Funder Info:
funder_name,funder_ein,funder_type(foundation, government, corporate) - Grant Details:
grant_amount,grant_purpose,program_area - Timeline:
award_date,start_date,end_date,grant_duration_months - Status:
grant_status(active, completed, terminated) - Type:
funding_source(federal, state, foundation, corporate) - Restrictions:
multi_year,restrictions,reporting_requirements
Data Sources
IRS Form 990 Schedule I:
- Grants PAID by nonprofits to other organizations
- Required for organizations granting >$5,000/year
- Shows foundation giving patterns
USASpending.gov API (FREE):
- All federal grants to states, localities, nonprofits
- Contract and grant transactions $25K+
- Real-time data updated daily
Foundation Center/Candid:
- Private foundation grants (990-PF data)
- Grant descriptions, amounts, recipients
State Grant Databases:
- State-level grant programs
- Varies by state
Why Grant Tracking Matters
Follow the Money:
- ✅ "Who funds oral health work in Alabama?" → Track all grants by
program_area - ✅ "Which foundations support fluoridation?" → Search grant purposes
- ✅ "How much federal money goes to dental access?" → Sum
funding_source = federal
Find Funding Opportunities:
- ✅ Identify active grant programs (similar grants to similar orgs)
- ✅ Discover new funders entering a program area
- ✅ Track grant sizes and typical durations
Partnership Intelligence:
- ✅ "Who else is this foundation funding?" → Find collaborators
- ✅ "What's this nonprofit's grant portfolio?" → Assess stability
- ✅ Multi-year grants = long-term commitment signal
Policy Implementation:
- ✅ "Is there grant funding for this program?" → Search active grants
- ✅ "Which jurisdictions received similar grants?" → Learn from others
- ✅ Track grant requirements and restrictions
Example Questions Now Answerable:
-
"What federal grants support dental health in Alabama schools?" →
funding_source = 'federal'ANDprogram_areaLIKE '%dental%' ANDrecipient_type = 'school_district' -
"Which foundations give the largest oral health grants?" → GROUP BY
funder_nameWHEREprogram_areaLIKE '%oral health%' ORDER BY SUM(grant_amount) -
"How long do typical dental access grants last?" → AVG(
grant_duration_months) WHEREprogram_area= 'dental access' -
"Which nonprofits receive multi-year fluoridation funding?" →
multi_year = trueANDgrant_purposeLIKE '%fluoride%' -
"What grants end in the next 6 months?" →
end_dateBETWEEN NOW() AND NOW() + 6 MONTHS (renewal opportunities!)
Dataset Structure
grants/
├── nonprofit_grants # Grants TO nonprofits (Schedule I recipients)
├── government_grants # Federal/state grants to jurisdictions
├── foundation_grants # Private foundation giving (990-PF)
└── federal_grants # USASpending.gov federal grants
⏰ Time Dimension Modeling
To enable robust time-series analysis, trend tracking, and temporal comparisons, we implement a comprehensive time dimension alongside our fact tables.
Time Dimension Table
DATE_DIMENSION {
date date PK
year int
quarter int
quarter_name string
month int
month_name string
month_abbr string
day_of_month int
day_of_week int
day_name string
day_abbr string
week_of_year int
fiscal_year int
fiscal_quarter int
fiscal_month int
is_weekend boolean
is_holiday boolean
holiday_name string
is_business_day boolean
days_in_month int
year_month string
year_quarter string
}
Temporal Relationships
All time-bound entities link to the date dimension for consistent temporal analysis:
Temporal Analysis Patterns
Year-over-Year Comparisons:
SELECT
d.year,
d.quarter_name,
COUNT(m.meeting_id) as meeting_count,
COUNT(m.meeting_id) - LAG(COUNT(m.meeting_id)) OVER (ORDER BY d.year, d.quarter) as yoy_change
FROM MEETING m
JOIN DATE_DIMENSION d ON m.meeting_date = d.date
WHERE d.year BETWEEN 2023 AND 2025
GROUP BY d.year, d.quarter, d.quarter_name
ORDER BY d.year, d.quarter;
Fiscal Period Aggregation:
SELECT
d.fiscal_year,
d.fiscal_quarter,
SUM(b.total_expenditures) as total_spending,
AVG(b.total_expenditures) as avg_spending
FROM GOVERNMENT_BUDGET b
JOIN DATE_DIMENSION d ON b.fiscal_year = d.fiscal_year
WHERE b.jurisdiction_type = 'city'
GROUP BY d.fiscal_year, d.fiscal_quarter;
Trend Detection:
-- Identify growing advocacy momentum
SELECT
d.year_month,
COUNT(DISTINCT pt.topic_id) as active_topics,
COUNT(m.meeting_id) as related_meetings,
COUNT(bm.measure_id) as ballot_initiatives
FROM DATE_DIMENSION d
LEFT JOIN MEETING m ON m.meeting_date = d.date AND m.oral_health_related = true
LEFT JOIN POLICY_TRACKER pt ON d.date BETWEEN pt.start_date AND COALESCE(pt.end_date, CURRENT_DATE)
LEFT JOIN BALLOT_MEASURE bm ON bm.election_date = d.date
WHERE d.date >= DATE_SUB(CURRENT_DATE, INTERVAL 24 MONTH)
GROUP BY d.year_month
ORDER BY d.year_month;
📊 Metric Views
Metric views provide pre-aggregated, analysis-ready datasets combining multiple source tables with built-in dimensions, measures, and filters.
Core Metric View Components
| Component | Description | Example |
|---|---|---|
| Source | Base table, view, or SQL query containing the data | MEETING, GOVERNMENT_BUDGET, NONPROFIT_FILING |
| Dimensions | Column attributes used to segment or group metrics | jurisdiction_type, fiscal_year, policy_topic |
| Measures | Column aggregations that produce metrics | COUNT(meeting_id) as meeting_count, SUM(grant_amount) as total_funding |
| Filters | Conditions applied to source data to define scope | oral_health_related = true, fiscal_year > 2020 |
| Joins | Relationships between tables to enrich data | JOIN JURISDICTION ON meeting.jurisdiction_id = jurisdiction.jurisdiction_id |
Example Metric Views
1. Advocacy Activity Metrics
Purpose: Track oral health advocacy momentum across jurisdictions
CREATE VIEW metric_advocacy_activity AS
SELECT
-- Dimensions
j.jurisdiction_id,
j.jurisdiction_type,
j.state_code,
j.county_name,
d.year,
d.quarter_name,
d.month_name,
pt.topic_name,
-- Measures
COUNT(DISTINCT m.meeting_id) as meeting_count,
COUNT(DISTINCT bm.measure_id) as ballot_measure_count,
COUNT(DISTINCT l.bill_id) as legislation_count,
COUNT(DISTINCT fc.claim_id) as fact_check_count,
-- Calculated Metrics
SUM(CASE WHEN m.oral_health_related THEN 1 ELSE 0 END) as oral_health_meeting_count,
AVG(CASE WHEN bm.result = 'passed' THEN 1 ELSE 0 END) as ballot_success_rate,
COUNT(DISTINCT n.nonprofit_id) as active_nonprofit_count
FROM JURISDICTION j
JOIN DATE_DIMENSION d ON d.date BETWEEN DATE_SUB(CURRENT_DATE, INTERVAL 365 DAY) AND CURRENT_DATE
LEFT JOIN MEETING m ON m.jurisdiction_id = j.jurisdiction_id AND m.meeting_date = d.date
LEFT JOIN POLICY_TRACKER pt ON pt.jurisdiction_id = j.jurisdiction_id
LEFT JOIN BALLOT_MEASURE bm ON bm.jurisdiction_id = j.jurisdiction_id AND bm.election_date = d.date
LEFT JOIN LEGISLATION l ON l.state_code = j.state_code AND l.introduced_date = d.date
LEFT JOIN FACT_CHECK fc ON fc.published_date = d.date
LEFT JOIN NONPROFIT n ON n.jurisdiction_id = j.jurisdiction_id
WHERE
-- Filters
d.is_business_day = true
AND (
m.oral_health_related = true
OR pt.topic_area LIKE '%oral health%'
OR pt.topic_area LIKE '%dental%'
OR pt.topic_area LIKE '%fluoride%'
)
GROUP BY
j.jurisdiction_id, j.jurisdiction_type, j.state_code, j.county_name,
d.year, d.quarter_name, d.month_name, pt.topic_name;
Usage:
-- Find top 10 most active jurisdictions for oral health advocacy
SELECT
jurisdiction_id,
state_code,
jurisdiction_type,
SUM(meeting_count) as total_meetings,
SUM(ballot_measure_count) as total_ballot_measures,
SUM(oral_health_meeting_count) as oral_health_meetings
FROM metric_advocacy_activity
WHERE year = 2025
GROUP BY jurisdiction_id, state_code, jurisdiction_type
ORDER BY total_meetings DESC
LIMIT 10;
2. Government Spending Metrics
Purpose: Analyze government budget allocations and trends
CREATE VIEW metric_government_spending AS
SELECT
-- Dimensions
j.jurisdiction_id,
j.jurisdiction_type,
j.state_code,
j.population,
d.fiscal_year,
d.fiscal_quarter,
bc.category_name as budget_category,
-- Measures
SUM(gb.total_revenue) as total_revenue,
SUM(gb.total_expenditures) as total_expenditures,
SUM(gb.total_debt) as total_debt,
SUM(gb.property_tax_revenue) as property_tax_revenue,
SUM(gb.federal_grants) as federal_grants,
SUM(gb.state_grants) as state_grants,
-- Calculated Metrics
SUM(gb.total_revenue) / NULLIF(j.population, 0) as revenue_per_capita,
SUM(gb.total_expenditures) / NULLIF(j.population, 0) as spending_per_capita,
SUM(gb.total_debt) / NULLIF(j.population, 0) as debt_per_capita,
(SUM(gb.total_revenue) - SUM(gb.total_expenditures)) as budget_surplus_deficit,
SUM(bc.amount) / NULLIF(SUM(gb.total_expenditures), 0) * 100 as category_pct_of_budget
FROM JURISDICTION j
JOIN DATE_DIMENSION d ON d.fiscal_year BETWEEN 2020 AND 2025
JOIN GOVERNMENT_BUDGET gb ON gb.jurisdiction_id = j.jurisdiction_id AND gb.fiscal_year = d.fiscal_year
LEFT JOIN BUDGET_CATEGORY bc ON bc.budget_id = gb.budget_id
WHERE
-- Filters
gb.total_expenditures > 0
AND j.population > 0
GROUP BY
j.jurisdiction_id, j.jurisdiction_type, j.state_code, j.population,
d.fiscal_year, d.fiscal_quarter, bc.category_name;
Usage:
-- Compare spending per capita across jurisdiction types
SELECT
jurisdiction_type,
fiscal_year,
AVG(spending_per_capita) as avg_spending_per_capita,
AVG(revenue_per_capita) as avg_revenue_per_capita,
AVG(debt_per_capita) as avg_debt_per_capita
FROM metric_government_spending
GROUP BY jurisdiction_type, fiscal_year
ORDER BY fiscal_year DESC, avg_spending_per_capita DESC;
3. Nonprofit Impact Metrics
Purpose: Measure nonprofit activity, funding, and service delivery
CREATE VIEW metric_nonprofit_impact AS
SELECT
-- Dimensions
n.nonprofit_id,
n.organization_name,
n.state,
n.city,
nc.ntee_code,
nc.category_name,
d.tax_year,
-- Measures
SUM(nf.total_revenue) as total_revenue,
SUM(nf.total_expenses) as total_expenses,
SUM(nf.total_assets) as total_assets,
SUM(nf.program_service_expenses) as program_expenses,
SUM(nf.fundraising_expenses) as fundraising_expenses,
SUM(nf.management_expenses) as management_expenses,
COUNT(DISTINCT g.grant_id) as grants_received_count,
SUM(g.grant_amount) as total_grants_received,
-- Calculated Metrics
SUM(nf.program_service_expenses) / NULLIF(SUM(nf.total_expenses), 0) * 100 as program_expense_ratio,
SUM(nf.fundraising_expenses) / NULLIF(SUM(nf.total_revenue), 0) * 100 as fundraising_efficiency,
(SUM(nf.total_revenue) - SUM(nf.total_expenses)) as net_income,
COUNT(DISTINCT nf.year) as years_active
FROM NONPROFIT n
JOIN DATE_DIMENSION d ON d.year BETWEEN 2020 AND 2025
JOIN NONPROFIT_FILING nf ON nf.ein = n.ein AND nf.tax_year = d.year
LEFT JOIN NONPROFIT_CAUSE nc ON nc.cause_id = n.primary_cause_id
LEFT JOIN GRANT g ON g.recipient_ein = n.ein AND g.tax_year = d.year
WHERE
-- Filters
nf.total_revenue > 0
AND (
nc.category_name LIKE '%health%'
OR nc.category_name LIKE '%dental%'
OR n.mission_statement LIKE '%oral health%'
)
GROUP BY
n.nonprofit_id, n.organization_name, n.state, n.city,
nc.ntee_code, nc.category_name, d.tax_year;
Usage:
-- Identify high-performing health nonprofits by program efficiency
SELECT
organization_name,
state,
tax_year,
total_revenue,
program_expense_ratio,
fundraising_efficiency
FROM metric_nonprofit_impact
WHERE
tax_year = 2024
AND total_revenue > 1000000
AND program_expense_ratio > 75 -- More than 75% goes to programs
ORDER BY program_expense_ratio DESC, total_revenue DESC
LIMIT 20;
Metric View Best Practices
- Grain Definition: Clearly define the granularity of each metric view (e.g., per jurisdiction per month)
- Performance: Pre-aggregate expensive calculations to improve query performance
- Incremental Updates: Design views to support incremental refresh rather than full rebuilds
- Documentation: Document all dimension values, measure calculations, and filter logic
- Naming Convention: Use
metric_prefix followed by descriptive name (e.g.,metric_advocacy_activity) - Testing: Validate measure calculations against source data to ensure accuracy
Query Optimization
For large-scale analytics, metric views can be materialized:
-- Materialize for fast querying
CREATE MATERIALIZED VIEW metric_advocacy_activity_mat AS
SELECT * FROM metric_advocacy_activity;
-- Refresh incrementally
REFRESH MATERIALIZED VIEW metric_advocacy_activity_mat;
-- Add indexes on common filter/join columns
CREATE INDEX idx_advocacy_state_year
ON metric_advocacy_activity_mat(state_code, year);
CREATE INDEX idx_advocacy_jurisdiction
ON metric_advocacy_activity_mat(jurisdiction_id);
🔧 System-Internal Tables (OMOP-Inspired Vocabulary)
Purpose: Standardized terminology and concept management following OHDSI OMOP Common Data Model principles. These tables provide semantic interoperability and enable precise data linkage across all entities.
⚠️ Technical Note: These are internal reference tables used by data engineers and ETL pipelines. Non-technical users can ignore this section.
Vocabulary & Concept System (OMOP-Inspired)
Vocabulary Sources
| Vocabulary ID | Vocabulary Name | Source | Use Case |
|---|---|---|---|
| OCD_ID | Open Civic Data Division IDs | https://github.com/opencivicdata/ocd-division-ids | Standard jurisdiction identifiers |
| IRS_NTEE | IRS National Taxonomy of Exempt Entities | IRS TEOS | Nonprofit classification |
| US_Census | U.S. Census Bureau | Census Gazetteer, ACS | Demographics, geography |
| NCES | National Center for Education Statistics | NCES CCD, F-33 | School districts, education data |
| OHDSI_Gender | OHDSI Gender | Athena | Standard gender concepts (interoperable with medical research) |
| OHDSI_Race | OHDSI Race | Athena | Standard race concepts (OMB Classification) |
| OHDSI_Ethnicity | OHDSI Ethnicity | Athena | Standard ethnicity concepts (Hispanic/Latino) |
| OpenNavigator | Custom Civic Concepts | Internal | Cities, officials, topics (ID > 2,000,000,000) |
Concept Classes for Civic Data
| Concept Class | Domain | Examples |
|---|---|---|
| City | Jurisdiction | Incorporated places, consolidated city-counties |
| County | Jurisdiction | U.S. counties, county equivalents |
| State | Jurisdiction | U.S. states, territories, DC |
| School District | Jurisdiction | LEAs (Local Educational Agencies) |
| 501c3 | Nonprofit | Tax-exempt charitable organizations |
| 501c4 | Nonprofit | Social welfare organizations |
| Mayor | Position | Chief executive of city government |
| Council Member | Position | Legislative member |
| Superintendent | Position | School district chief administrator |
| Health Policy | Topic | Fluoridation, nutrition, dental care |
| Education Policy | Topic | School funding, curriculum, facilities |
Example Concept Entries
-- Standard concept for a city
INSERT INTO CONCEPT VALUES (
2000000001, -- concept_id (custom range)
'Birmingham, Alabama', -- concept_name
'Jurisdiction', -- domain_id
'OCD_ID', -- vocabulary_id
'City', -- concept_class_id
'S', -- standard_concept (Standard)
'ocd-division/country:us/state:al/place:birmingham', -- concept_code
'2020-01-01', -- valid_start_date
'2099-12-31', -- valid_end_date
NULL -- invalid_reason
);
-- Concept for a nonprofit cause
INSERT INTO CONCEPT VALUES (
2000000101, -- concept_id
'Animal Welfare', -- concept_name
'Nonprofit', -- domain_id
'IRS_NTEE', -- vocabulary_id
'501c3', -- concept_class_id
'C', -- standard_concept (Classification)
'D20', -- concept_code (NTEE code)
'2020-01-01',
'2099-12-31',
NULL
);
-- Gender concept from OHDSI Athena (interoperable with medical research)
INSERT INTO CONCEPT VALUES (
8507, -- concept_id (OHDSI standard)
'MALE', -- concept_name
'Gender', -- domain_id
'OHDSI_Gender', -- vocabulary_id
'Gender', -- concept_class_id
'S',
'M',
'1970-01-01',
'2099-12-31',
NULL
);
Concept Relationships
-- City is part of County
INSERT INTO CONCEPT_RELATIONSHIP VALUES (
2000000001, -- Birmingham, AL (concept_id_1)
2000000050, -- Jefferson County, AL (concept_id_2)
'Is part of', -- relationship_id
'2020-01-01',
'2099-12-31',
NULL
);
-- Topic regulates Legislation
INSERT INTO CONCEPT_RELATIONSHIP VALUES (
2000000201, -- Water Fluoridation topic
2000000305, -- Ordinance 101
'Regulates',
'2024-01-15',
'2099-12-31',
NULL
);
-- Organization addresses Topic
INSERT INTO CONCEPT_RELATIONSHIP VALUES (
2000000401, -- Dental Health Foundation
2000000201, -- Water Fluoridation
'Addresses',
'2023-06-01',
'2099-12-31',
NULL
);
Implementation in Existing Tables
Updated JURISDICTION Schema
CREATE TABLE JURISDICTION (
jurisdiction_id VARCHAR(255) PRIMARY KEY,
name VARCHAR(255),
jurisdiction_type VARCHAR(50),
-- OMOP-style concept references
city_concept_id INTEGER REFERENCES CONCEPT(concept_id),
city_type_concept_id INTEGER REFERENCES CONCEPT(concept_id), -- e.g., "Consolidated City-County"
source_value VARCHAR(255), -- Original text from data source (e.g., "SF, Calif")
state_code VARCHAR(2),
county_name VARCHAR(100),
population INTEGER,
-- ... other fields
);
Updated ORGANIZATION Schema
CREATE TABLE ORGANIZATION (
org_id VARCHAR(255) PRIMARY KEY,
ein VARCHAR(20),
name VARCHAR(255),
-- OMOP-style concept references
organization_concept_id INTEGER REFERENCES CONCEPT(concept_id),
cause_concept_id INTEGER REFERENCES CONCEPT(concept_id), -- Links to NTEE concept
org_type_concept_id INTEGER REFERENCES CONCEPT(concept_id), -- 501c3, 501c4, etc.
source_value VARCHAR(255), -- Original organization name from IRS
ntee_code VARCHAR(10),
-- ... other fields
);
Updated DEMOGRAPHICS Schema
CREATE TABLE DEMOGRAPHICS (
demographics_id VARCHAR(255) PRIMARY KEY,
jurisdiction_id VARCHAR(255) REFERENCES JURISDICTION(jurisdiction_id),
-- OMOP-style concept references (OHDSI Athena vocabularies)
race_concept_id INTEGER REFERENCES CONCEPT(concept_id), -- Standard OHDSI race codes
ethnicity_concept_id INTEGER REFERENCES CONCEPT(concept_id), -- Hispanic/Latino classification
gender_concept_id INTEGER REFERENCES CONCEPT(concept_id), -- MALE/FEMALE/OTHER
race_source_value VARCHAR(100), -- Census text (e.g., "White alone")
ethnicity_source_value VARCHAR(100), -- Census text
gender_source_value VARCHAR(50), -- Census text
-- ... other demographic fields
);
Benefits of OMOP-Style Vocabulary
- Semantic Interoperability: Civic data can be joined with healthcare research data using standard OHDSI demographic concepts
- Reproducible IDs: Deterministic hashing (uuid5) on strings like "JURISDICTION:CITY:NEW_YORK" generates consistent concept_id values
- Version Control:
valid_start_dateandvalid_end_datetrack concept changes over time - Relationship Tracking:
CONCEPT_RELATIONSHIPtable captures hierarchies (City → County → State) and associations (Topic → Legislation) - Source Traceability:
source_valuepreserves original text whileconcept_idprovides standardized reference
ETL Mapping Strategy
import uuid
def generate_concept_id(domain: str, type: str, identifier: str) -> int:
"""
Generate deterministic concept_id using UUID5.
Returns integer > 2,000,000,000 for custom civic concepts.
"""
namespace = uuid.UUID('6ba7b810-9dad-11d1-80b4-00c04fd430c8') # OMOP namespace
concept_string = f"{domain}:{type}:{identifier}".upper()
concept_uuid = uuid.uuid5(namespace, concept_string)
# Convert to integer in custom range
return 2_000_000_000 + (int(concept_uuid.hex[:8], 16) % 1_000_000_000)
# Example usage
city_concept_id = generate_concept_id("JURISDICTION", "CITY", "NEW_YORK")
print(city_concept_id) # e.g., 2045879021 (repeatable)
Downloading OHDSI Athena Vocabularies
- Visit https://athena.ohdsi.org/
- Select vocabularies:
- ✅ Gender
- ✅ Race
- ✅ Ethnicity
- ✅ Geography (US Counties, States)
- Download CSV files
- Import into
vocabulary/folder:vocabulary_gender.parquetvocabulary_race.parquetvocabulary_ethnicity.parquetvocabulary_geography.parquet
🎯 Missing Datasets to Add
High Priority
- Ballot Measures - ✅ Added to data model! Fluoridation votes, bond measures
- State Legislation - ✅ Added to data model! Open States API (FREE)
- Policy Topics - ✅ Added to data model! Oral health advocacy tracking
- Government Finances - ✅ Added to data model! City/county/state budgets, Census of Governments
- School Finances - ✅ Added to data model! NCES F-33 per-pupil spending, revenues
- Nonprofit Financials - ✅ Added to data model! Form 990 detailed financials (10M+ filings)
- Census Demographics - Full census data per jurisdiction (beyond population)
- Procurement Records - Government contracts
- Election Results - Historical voting data
- Health Outcomes - CDC PLACES data (oral health metrics!)
- Environmental Data - EPA water quality (fluoridation levels)
Medium Priority
- Property Records - Public assessment data
- Crime Statistics - UCR/NIBRS data
- Business Licenses - Local business registrations (dental clinics!)
- Building Permits - Construction activity
- Code Violations - Inspection records
- Police Reports - Public safety incidents
- Fire Department Data - Emergency response
- Parks & Recreation - Facilities & programs
- Transportation Data - Traffic & transit
Integration Improvements
- Full Wikidata Sync - All civic entities
- DBpedia Expansion - Complete local government coverage
- Ballotpedia Data - (if budget allows) Electoral info & analysis
- Court Records - Public dockets
- Tax Records - Property tax data
🚀 Implementation Status
✅ Completed
- Jurisdiction discovery pipeline
- YouTube channel discovery
- Meeting platform detection
- NCES school district ingestion
- Open States API integration
- Wikidata SPARQL queries
- DBpedia Lookup API
- Google Civic API (code ready)
- Social media discovery
- HuggingFace upload pipeline
🔨 In Progress
- Meeting minutes extraction (Tuscaloosa pilot)
- Video transcript processing
- Document keyword detection
- Nonprofit data enrichment
📋 Planned
- Automated meeting scraping at scale
- Real-time meeting notifications
- Budget document parsing
- Full census integration
- Health outcome correlation
📚 Related Documentation
Data Standards & Specifications
-
Popolo Project - Open government data specification for people, organizations, and elected positions. Our LEADER, ORGANIZATION, and JURISDICTION entities follow Popolo schema conventions for maximum interoperability with civic tech platforms.
-
Schema.org - W3C structured data vocabulary for semantic web. Our entities map to Schema.org types (Event, Person, Organization, Legislation, ClaimReview, etc.) enabling SEO-optimized JSON-LD exports, Google Search rich results, and voice assistant compatibility.
-
Common Education Data Standards (CEDS) - U.S. Department of Education data standards for K-12, postsecondary, and workforce data. Our SCHOOL_DISTRICT entity aligns with CEDS Element IDs and NCES survey specifications (CCD, F-33 Finance).
-
Open Civic Data (OCD-IDs) - Standardized division identifiers for jurisdictions. Format:
ocd-division/country:us/state:al/place:birmingham
Internal Documentation
Last Updated: 2026-05-02
Data Model Version: 2.1 (Popolo-compatible)