Skip to main content

Working with Open States Legislative Data

Complete guide to using the Open States PostgreSQL dump downloaded from Plural Policy.

� Quick Start (Complete Setup)

Prerequisites:

  • PostgreSQL 15+ installed (Download)
  • Python 3.12+ with venv
  • ~15 GB free disk space

Full Setup (20-30 minutes):

# 1. Download BOTH schema and data files (~10 GB total, takes 5-10 min)
python scripts/bulk_legislative_download.py --postgres --month 2026-04

# This downloads TWO files:
# - 2026-04-schema.pgdump (~50 MB) - creates tables
# - 2026-04-public.pgdump (~10 GB) - contains data

# 2. Create database
createdb openstates

# 3. Restore schema first (creates all tables, ~30 seconds)
pg_restore \
--dbname=openstates \
--clean \
--if-exists \
--no-owner \
--no-privileges \
data/cache/legislation_bulk/postgres/2026-04-schema.pgdump

# 4. Restore data (takes 10-15 minutes for 9.8 GB)
pg_restore \
--dbname=openstates \
--data-only \
--disable-triggers \
--no-owner \
--no-privileges \
data/cache/legislation_bulk/postgres/2026-04-public.pgdump

# 5. Verify tables loaded
psql openstates -c "\dt" | grep opencivicdata

# 6. Test query
psql openstates -c "SELECT COUNT(*) FROM opencivicdata_person;"

# 7. Add to .env file
echo "OPENSTATES_DATABASE_URL=postgresql://postgres:postgres@localhost:5432/openstates" >> .env

# 8. Test Python connection
python -c "
from dotenv import load_dotenv
import os
from sqlalchemy import create_engine
load_dotenv()
engine = create_engine(os.getenv('OPENSTATES_DATABASE_URL'))
with engine.connect() as conn:
from sqlalchemy import text
result = conn.execute(text('SELECT COUNT(*) FROM opencivicdata_person'))
print(f'✅ Connected! Total people: {result.scalar()}')
"

Expected Output:

✅ Connected! Total people: 7342

You're ready to query legislative data!

�📥 Download the Data

The PostgreSQL dump requires TWO separate files from OpenStates:

  1. Schema file (~50 MB): Creates all database tables, indexes, and constraints
  2. Data file (~10 GB): Contains all the actual legislative data
# Download both files for the latest month
python scripts/bulk_legislative_download.py --postgres --month 2026-04

# Files saved to:
# data/cache/legislation_bulk/postgres/2026-04-schema.pgdump (schema)
# data/cache/legislation_bulk/postgres/2026-04-public.pgdump (data)

Why two files?

OpenStates separates schema and data for flexibility:

  • Schema file: Small, updates rarely, creates table structure
  • Data file: Large, updated monthly, contains all records

This allows you to update data without recreating the schema each time.

What's Included:

  • 7,300+ state legislators with complete profiles
  • 100,000+ bills with full text (2020+)
  • Committee assignments and memberships
  • Roll call votes with individual legislator positions
  • Bill sponsorships and co-sponsors
  • Bill actions (timeline of committee/floor activity)
  • Multiple bill versions (as introduced, amended, enrolled)
  • Legislator contact information (district and capitol offices)

🗄️ Load Data into PostgreSQL

Prerequisites:

  • PostgreSQL 15+ installed (Download here)
  • Both schema and data files downloaded (see above section)

Setup Steps (Two-Step Restore):

# 1. Create database
createdb openstates

# 2. Restore SCHEMA first (creates tables, indexes, constraints)
pg_restore \
--dbname=openstates \
--clean \
--if-exists \
--no-owner \
--no-privileges \
data/cache/legislation_bulk/postgres/2026-04-schema.pgdump

# This takes ~30 seconds and creates all tables

# 3. Restore DATA (loads all records into tables)
pg_restore \
--dbname=openstates \
--data-only \
--disable-triggers \
--no-owner \
--no-privileges \
data/cache/legislation_bulk/postgres/2026-04-public.pgdump

# This takes 10-15 minutes for 9.8 GB of data

# 4. Verify restoration
psql openstates -c "\dt" # List all tables (should see 30+ opencivicdata_* tables)

# 5. Test query
psql openstates -c "SELECT COUNT(*) FROM opencivicdata_person;"
# Expected: ~7,300+ legislators

Add to .env:

OPENSTATES_DATABASE_URL=postgresql://postgres:postgres@localhost:5432/openstates

Automated Script:

For convenience, use the provided setup script that does both steps:

./scripts/setup_openstates_db.sh

This script:

  • ✅ Checks that both schema and data files exist
  • ✅ Creates the database if needed
  • ✅ Restores schema first, then data
  • ✅ Verifies the restore was successful
  • ✅ Shows table counts

Troubleshooting:

# If you need to set PostgreSQL password
psql -U postgres -c "ALTER USER postgres PASSWORD 'postgres';"

# If port 5432 is busy, use different port
psql -p 5433 # Then update .env accordingly

Option 2: Docker PostgreSQL (Clean Isolation)

Use a different port (5433) to avoid conflicts with main database:

# Start PostgreSQL container
docker run -d \
--name openstates-db \
-e POSTGRES_PASSWORD=openstates \
-e POSTGRES_DB=openstates \
-p 5433:5432 \
-v $(pwd)/data/cache/legislation_bulk/postgres:/dumps \
postgres:15

# Restore SCHEMA first
docker exec -i openstates-db \
pg_restore \
--dbname=openstates \
--username=postgres \
--clean \
--if-exists \
--no-owner \
--no-privileges \
/dumps/2026-04-schema.pgdump

# Restore DATA second
docker exec -i openstates-db \
pg_restore \
--dbname=openstates \
--username=postgres \
--data-only \
--disable-triggers \
--no-owner \
--no-privileges \
/dumps/2026-04-public.pgdump

# Test connection
docker exec -it openstates-db psql -U postgres -d openstates -c "SELECT COUNT(*) FROM opencivicdata_person;"

Add to .env:

OPENSTATES_DATABASE_URL=postgresql://postgres:openstates@localhost:5433/openstates

Stop/Start Container:

docker stop openstates-db # Stop container
docker start openstates-db # Start container
docker rm openstates-db # Remove container

📊 Database Schema

Core Tables

The Open States database follows the Popolo Project standard for representing political data.

opencivicdata_person - Legislators

SELECT
id, -- OCD-ID: ocd-person/{uuid}
name, -- Full name
given_name, -- First name
family_name, -- Last name
gender, -- Male/Female/Other
email, -- Official email
biography, -- Bio text
birth_date, -- Date of birth
image, -- Photo URL
created_at,
updated_at
FROM opencivicdata_person
LIMIT 5;

opencivicdata_personmembership - Legislator Roles

SELECT
person_id, -- Links to opencivicdata_person.id
organization_id, -- Legislature ID
post_id, -- District/position
label, -- "Senator"/"Representative"
role, -- "upper"/"lower"
start_date, -- Term start
end_date -- Term end
FROM opencivicdata_personmembership
WHERE end_date > CURRENT_DATE -- Active legislators
LIMIT 5;

opencivicdata_bill - Legislation

SELECT
id, -- OCD-ID: ocd-bill/{uuid}
identifier, -- HB 123, SB 456
title, -- Bill title
classification, -- bill/resolution/concurrent_resolution
subject, -- Array of topics
from_organization_id, -- Legislature
legislative_session_id,
created_at,
updated_at
FROM opencivicdata_bill
WHERE subject @> ARRAY['Health']::varchar[] -- Health-related bills
LIMIT 5;

opencivicdata_billabstract - Bill Summaries

SELECT
bill_id,
abstract, -- Summary text
note, -- "Official Summary"
date
FROM opencivicdata_billabstract
LIMIT 5;

opencivicdata_billsponsorship - Bill Sponsors

SELECT
bill_id,
person_id, -- Links to opencivicdata_person.id
classification, -- "primary"/"cosponsor"
primary_sponsorship, -- Boolean
entity_type -- "person"/"organization"
FROM opencivicdata_billsponsorship
LIMIT 5;

opencivicdata_voteevent - Roll Call Votes

SELECT
id, -- OCD-ID: ocd-vote/{uuid}
bill_id,
organization_id, -- Chamber
motion_text, -- "Passage of HB 123"
motion_classification, -- "passage"/"amendment"
result, -- "pass"/"fail"
start_date, -- Vote date
created_at,
updated_at
FROM opencivicdata_voteevent
LIMIT 5;

opencivicdata_personvote - Individual Legislator Votes

SELECT
vote_event_id,
voter_id, -- Links to opencivicdata_person.id
option, -- "yes"/"no"/"abstain"/"absent"
voter_name, -- Name at time of vote
note
FROM opencivicdata_personvote
LIMIT 5;

opencivicdata_organization - Committees

SELECT
id, -- OCD-ID: ocd-organization/{uuid}
name, -- "Committee on Health and Human Services"
classification, -- "committee"/"subcommittee"/"legislature"
parent_id, -- Parent committee (for subcommittees)
jurisdiction_id,
created_at,
updated_at
FROM opencivicdata_organization
WHERE classification = 'committee'
LIMIT 5;

🔍 Useful Queries

SELECT
b.identifier,
b.title,
b.subject,
p.name AS sponsor,
b.created_at
FROM opencivicdata_bill b
LEFT JOIN opencivicdata_billsponsorship bs ON bs.bill_id = b.id AND bs.primary_sponsorship = true
LEFT JOIN opencivicdata_person p ON p.id = bs.person_id
LEFT JOIN opencivicdata_legislativesession ls ON ls.id = b.legislative_session_id
WHERE ls.jurisdiction_id = 'ocd-jurisdiction/country:us/state:al/government'
AND ls.identifier LIKE '2024%'
AND (
b.subject @> ARRAY['Health']::varchar[]
OR b.title ILIKE '%dental%'
OR b.title ILIKE '%oral health%'
OR b.title ILIKE '%medicaid%'
)
ORDER BY b.created_at DESC;

List Active Legislators with Committee Assignments

SELECT
p.name,
p.party_name,
pm.role AS chamber,
pm.label AS position,
o.name AS committee
FROM opencivicdata_person p
JOIN opencivicdata_personmembership pm
ON pm.person_id = p.id
AND pm.end_date > CURRENT_DATE
LEFT JOIN opencivicdata_personmembership cm
ON cm.person_id = p.id
AND cm.end_date > CURRENT_DATE
LEFT JOIN opencivicdata_organization o
ON o.id = cm.organization_id
AND o.classification = 'committee'
WHERE pm.organization_id LIKE 'ocd-organization%/legislature'
ORDER BY p.name;

Track Bill Progress Through Legislature

SELECT
b.identifier,
b.title,
ba.date AS action_date,
ba.description AS action,
ba.classification,
o.name AS organization
FROM opencivicdata_bill b
JOIN opencivicdata_billaction ba ON ba.bill_id = b.id
LEFT JOIN opencivicdata_organization o ON o.id = ba.organization_id
WHERE b.identifier = 'HB 123'
AND b.from_organization_id LIKE '%state:al%'
ORDER BY ba.date;

Count Bills by Legislator (Top Sponsors)

SELECT
p.name,
COUNT(DISTINCT bs.bill_id) AS bills_sponsored
FROM opencivicdata_person p
JOIN opencivicdata_billsponsorship bs ON bs.person_id = p.id
WHERE bs.primary_sponsorship = true
GROUP BY p.id, p.name
ORDER BY bills_sponsored DESC
LIMIT 20;

Find Roll Call Votes on Health Bills

SELECT
b.identifier,
b.title,
v.motion_text,
v.result,
v.start_date,
COUNT(CASE WHEN pv.option = 'yes' THEN 1 END) AS yes_votes,
COUNT(CASE WHEN pv.option = 'no' THEN 1 END) AS no_votes,
COUNT(CASE WHEN pv.option = 'abstain' THEN 1 END) AS abstain_votes
FROM opencivicdata_bill b
JOIN opencivicdata_voteevent v ON v.bill_id = b.id
LEFT JOIN opencivicdata_personvote pv ON pv.vote_event_id = v.id
WHERE b.subject @> ARRAY['Health']::varchar[]
GROUP BY b.id, b.identifier, b.title, v.id, v.motion_text, v.result, v.start_date
ORDER BY v.start_date DESC;

🐍 Python Integration with SQLAlchemy

Using Environment Variables

Setup .env file first (see above), then:

import os
from sqlalchemy import create_engine, text
import pandas as pd
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Connect using OPENSTATES_DATABASE_URL from .env
engine = create_engine(os.getenv('OPENSTATES_DATABASE_URL'))

# Test connection
with engine.connect() as conn:
result = conn.execute(text("SELECT COUNT(*) FROM opencivicdata_person"))
print(f"Total people in database: {result.scalar()}")

# Query legislators
legislators_df = pd.read_sql_query("""
SELECT
p.name,
p.email,
p.party_name,
pm.role AS chamber,
pm.label AS position,
j.name AS state
FROM opencivicdata_person p
JOIN opencivicdata_personmembership pm ON pm.person_id = p.id
JOIN opencivicdata_jurisdiction j ON j.id LIKE '%' || pm.organization_id || '%'
WHERE pm.end_date > CURRENT_DATE
LIMIT 100
""", engine)

print(f"Active legislators: {len(legislators_df)}")
print(legislators_df.head())

# Query health bills
health_bills_df = pd.read_sql_query("""
SELECT
b.identifier,
b.title,
b.subject,
ls.identifier AS session
FROM opencivicdata_bill b
JOIN opencivicdata_legislativesession ls ON ls.id = b.legislative_session_id
WHERE b.subject @> ARRAY['Health']::varchar[]
AND ls.identifier LIKE '2024%'
LIMIT 50
""", engine)

print(f"Health bills in 2024: {len(health_bills_df)}")

Complete Example Script

Save as scripts/query_openstates.py:

#!/usr/bin/env python3
"""
Query Open States PostgreSQL database for legislative data.

Usage:
python scripts/query_openstates.py --state al --topic health
"""
import os
import argparse
from sqlalchemy import create_engine
import pandas as pd
from dotenv import load_dotenv

load_dotenv()

def get_engine():
"""Create database engine from environment variable."""
db_url = os.getenv('OPENSTATES_DATABASE_URL')
if not db_url:
raise ValueError(
"OPENSTATES_DATABASE_URL not set in .env file. "
"See website/docs/guides/open-states-legislative-data.md for setup."
)
return create_engine(db_url)

def query_legislators(engine, state_code=None):
"""Get active legislators, optionally filtered by state."""
where_clause = ""
if state_code:
where_clause = f"AND j.id LIKE '%state:{state_code}%'"

query = f"""
SELECT
p.name,
p.email,
p.party_name,
pm.role AS chamber,
pm.label AS position,
j.name AS state
FROM opencivicdata_person p
JOIN opencivicdata_personmembership pm ON pm.person_id = p.id
JOIN opencivicdata_jurisdiction j ON j.id LIKE '%' || pm.organization_id || '%'
WHERE pm.end_date > CURRENT_DATE
{where_clause}
ORDER BY p.name
"""

return pd.read_sql_query(query, engine)

def query_bills_by_topic(engine, topic, state_code=None, year=2024):
"""Get bills by topic (Health, Education, etc.)."""
where_clause = ""
if state_code:
where_clause = f"AND ls.jurisdiction_id LIKE '%state:{state_code}%'"

query = f"""
SELECT
b.identifier,
b.title,
b.subject,
ls.identifier AS session,
ls.jurisdiction_id
FROM opencivicdata_bill b
JOIN opencivicdata_legislativesession ls ON ls.id = b.legislative_session_id
WHERE b.subject @> ARRAY['{topic}']::varchar[]
AND ls.identifier LIKE '{year}%'
{where_clause}
ORDER BY b.created_at DESC
"""

return pd.read_sql_query(query, engine)

if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Query Open States database")
parser.add_argument("--state", help="State code (e.g., al, ca, ny)")
parser.add_argument("--topic", default="Health", help="Bill topic")
parser.add_argument("--year", type=int, default=2024, help="Legislative year")

args = parser.parse_args()

engine = get_engine()

print(f"\n📊 Querying Open States Database...")
print(f" State: {args.state or 'All'}")
print(f" Topic: {args.topic}")
print(f" Year: {args.year}")

# Get legislators
legislators = query_legislators(engine, args.state)
print(f"\n👥 Active Legislators: {len(legislators)}")
print(legislators.head())

# Get bills
bills = query_bills_by_topic(engine, args.topic, args.state, args.year)
print(f"\n📜 {args.topic} Bills in {args.year}: {len(bills)}")
print(bills.head())

# Save to CSV
output_dir = "output/openstates_queries"
os.makedirs(output_dir, exist_ok=True)

legislators.to_csv(f"{output_dir}/legislators_{args.state or 'all'}.csv", index=False)
bills.to_csv(f"{output_dir}/bills_{args.topic}_{args.year}.csv", index=False)

print(f"\n✅ Results saved to {output_dir}/")

Run it:

# Query all health bills in Alabama for 2024
python scripts/query_openstates.py --state al --topic Health --year 2024

# Query all education bills nationwide
python scripts/query_openstates.py --topic Education

📊 Export to Parquet for HuggingFace

Complete Export Script

Save as scripts/export_openstates_parquet.py:

#!/usr/bin/env python3
"""
Export Open States PostgreSQL data to Parquet files for HuggingFace.

Usage:
python scripts/export_openstates_parquet.py --output data/gold/legislation/
"""
import os
import argparse
import pandas as pd
from sqlalchemy import create_engine
from dotenv import load_dotenv
from loguru import logger

load_dotenv()

def get_engine():
"""Get database engine from environment."""
db_url = os.getenv('OPENSTATES_DATABASE_URL')
if not db_url:
raise ValueError("OPENSTATES_DATABASE_URL not set in .env")
return create_engine(db_url)

def export_legislators(engine, output_dir):
"""Export active legislators."""
logger.info("Exporting legislators...")

df = pd.read_sql_query("""
SELECT
p.id AS legislator_id,
p.name AS full_name,
p.given_name,
p.family_name,
p.gender,
p.email,
p.biography,
p.birth_date,
p.death_date,
p.image,
p.party[1]->>'name' AS party_name,
p.created_at,
p.updated_at
FROM opencivicdata_person p
WHERE id IN (
SELECT DISTINCT person_id
FROM opencivicdata_personmembership
WHERE end_date > CURRENT_DATE
)
""", engine)

output_path = os.path.join(output_dir, 'legislation_legislators.parquet')
df.to_parquet(output_path, index=False, compression='snappy')
logger.success(f"✅ Exported {len(df)} legislators to {output_path}")
return len(df)

def export_legislator_roles(engine, output_dir):
"""Export legislator roles/terms."""
logger.info("Exporting legislator roles...")

df = pd.read_sql_query("""
SELECT
pm.id AS role_id,
pm.person_id AS legislator_id,
pm.organization_id AS legislature_id,
pm.post_id,
pm.label AS position,
pm.role AS chamber,
pm.start_date AS term_start,
pm.end_date AS term_end,
pm.created_at,
pm.updated_at
FROM opencivicdata_personmembership pm
WHERE pm.organization_id LIKE 'ocd-organization%'
""", engine)

output_path = os.path.join(output_dir, 'legislation_legislator_roles.parquet')
df.to_parquet(output_path, index=False, compression='snappy')
logger.success(f"✅ Exported {len(df)} roles to {output_path}")
return len(df)

def export_bills(engine, output_dir):
"""Export bills from 2020+."""
logger.info("Exporting bills (2020+)...")

df = pd.read_sql_query("""
SELECT
b.id AS bill_id,
b.identifier,
b.title,
b.classification,
b.subject,
b.from_organization_id AS legislature_id,
b.legislative_session_id,
b.created_at,
b.updated_at
FROM opencivicdata_bill b
WHERE b.created_at >= '2020-01-01'
""", engine)

output_path = os.path.join(output_dir, 'legislation_bills.parquet')
df.to_parquet(output_path, index=False, compression='snappy')
logger.success(f"✅ Exported {len(df)} bills to {output_path}")
return len(df)

def export_bill_sponsors(engine, output_dir):
"""Export bill sponsorships."""
logger.info("Exporting bill sponsors...")

df = pd.read_sql_query("""
SELECT
bs.id AS sponsor_id,
bs.bill_id,
bs.person_id AS legislator_id,
bs.classification AS sponsor_type,
bs.primary AS is_primary_sponsor,
bs.entity_type
FROM opencivicdata_billsponsorship bs
""", engine)

output_path = os.path.join(output_dir, 'legislation_bill_sponsors.parquet')
df.to_parquet(output_path, index=False, compression='snappy')
logger.success(f"✅ Exported {len(df)} sponsorships to {output_path}")
return len(df)

def export_vote_events(engine, output_dir):
"""Export vote events from 2020+."""
logger.info("Exporting vote events...")

df = pd.read_sql_query("""
SELECT
v.id AS vote_event_id,
v.bill_id,
v.organization_id,
v.motion_text,
v.motion_classification,
v.result,
v.start_date AS vote_date,
v.created_at,
v.updated_at
FROM opencivicdata_voteevent v
WHERE v.start_date >= '2020-01-01'
""", engine)

output_path = os.path.join(output_dir, 'legislation_vote_events.parquet')
df.to_parquet(output_path, index=False, compression='snappy')
logger.success(f"✅ Exported {len(df)} vote events to {output_path}")
return len(df)

def export_legislator_votes(engine, output_dir):
"""Export individual legislator votes."""
logger.info("Exporting legislator votes...")

df = pd.read_sql_query("""
SELECT
pv.id AS legislator_vote_id,
pv.vote_event_id,
pv.voter_id AS legislator_id,
pv.option AS vote_position,
pv.voter_name,
pv.note
FROM opencivicdata_personvote pv
""", engine)

output_path = os.path.join(output_dir, 'legislation_legislator_votes.parquet')
df.to_parquet(output_path, index=False, compression='snappy')
logger.success(f"✅ Exported {len(df)} legislator votes to {output_path}")
return len(df)

def export_committees(engine, output_dir):
"""Export committees."""
logger.info("Exporting committees...")

df = pd.read_sql_query("""
SELECT
o.id AS committee_id,
o.jurisdiction_id,
o.name,
o.classification,
o.parent_id,
o.created_at,
o.updated_at
FROM opencivicdata_organization o
WHERE o.classification IN ('committee', 'subcommittee')
""", engine)

output_path = os.path.join(output_dir, 'legislation_committees.parquet')
df.to_parquet(output_path, index=False, compression='snappy')
logger.success(f"✅ Exported {len(df)} committees to {output_path}")
return len(df)

def export_committee_memberships(engine, output_dir):
"""Export committee memberships."""
logger.info("Exporting committee memberships...")

df = pd.read_sql_query("""
SELECT
pm.id AS membership_id,
pm.organization_id AS committee_id,
pm.person_id AS legislator_id,
pm.role,
pm.start_date,
pm.end_date,
pm.created_at,
pm.updated_at
FROM opencivicdata_personmembership pm
WHERE pm.organization_id IN (
SELECT id FROM opencivicdata_organization
WHERE classification IN ('committee', 'subcommittee')
)
""", engine)

output_path = os.path.join(output_dir, 'legislation_committee_memberships.parquet')
df.to_parquet(output_path, index=False, compression='snappy')
logger.success(f"✅ Exported {len(df)} memberships to {output_path}")
return len(df)

if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Export Open States data to Parquet")
parser.add_argument(
"--output",
default="data/gold/legislation/",
help="Output directory for Parquet files"
)

args = parser.parse_args()
os.makedirs(args.output, exist_ok=True)

logger.info(f"📊 Exporting Open States data to {args.output}")

engine = get_engine()

# Export all tables
stats = {
'legislators': export_legislators(engine, args.output),
'legislator_roles': export_legislator_roles(engine, args.output),
'bills': export_bills(engine, args.output),
'bill_sponsors': export_bill_sponsors(engine, args.output),
'vote_events': export_vote_events(engine, args.output),
'legislator_votes': export_legislator_votes(engine, args.output),
'committees': export_committees(engine, args.output),
'committee_memberships': export_committee_memberships(engine, args.output),
}

logger.success("\n✅ Export complete!")
logger.info("\n📊 Summary:")
for table, count in stats.items():
logger.info(f" {table}: {count:,} records")

Run the export:

# Export all tables to Parquet
python scripts/export_openstates_parquet.py --output data/gold/legislation/

# Verify files were created
ls -lh data/gold/legislation/

# Output:
# legislation_legislators.parquet
# legislation_legislator_roles.parquet
# legislation_bills.parquet
# legislation_bill_sponsors.parquet
# legislation_vote_events.parquet
# legislation_legislator_votes.parquet
# legislation_committees.parquet
# legislation_committee_memberships.parquet

Upload to HuggingFace

# Install huggingface-hub if not already installed
pip install huggingface-hub

# Upload datasets
python scripts/upload_to_huggingface.py \
--dataset CommunityOne/open-navigator-data \
--folder data/gold/legislation/

🎯 Oral Health Policy Use Cases

Finding Water Fluoridation Legislation

SELECT
b.identifier,
b.title,
ls.jurisdiction_id,
b.subject,
b.created_at
FROM opencivicdata_bill b
JOIN opencivicdata_legislativesession ls ON ls.id = b.legislative_session_id
WHERE (
b.title ILIKE '%fluorid%'
OR b.title ILIKE '%water treatment%'
OR EXISTS (
SELECT 1 FROM opencivicdata_billabstract ba
WHERE ba.bill_id = b.id
AND ba.abstract ILIKE '%fluorid%'
)
)
ORDER BY b.created_at DESC;

Tracking Medicaid Dental Coverage Expansion

SELECT
b.identifier,
b.title,
p.name AS sponsor,
v.result AS vote_outcome,
v.start_date AS vote_date
FROM opencivicdata_bill b
LEFT JOIN opencivicdata_billsponsorship bs ON bs.bill_id = b.id AND bs.primary_sponsorship = true
LEFT JOIN opencivicdata_person p ON p.id = bs.person_id
LEFT JOIN opencivicdata_voteevent v ON v.bill_id = b.id
WHERE (
b.title ILIKE '%medicaid%' AND b.title ILIKE '%dental%'
OR b.title ILIKE '%medicaid%' AND b.title ILIKE '%oral health%'
)
ORDER BY b.created_at DESC;

School-Based Dental Screening Programs

SELECT
b.identifier,
b.title,
ls.identifier AS session,
j.name AS state
FROM opencivicdata_bill b
JOIN opencivicdata_legislativesession ls ON ls.id = b.legislative_session_id
JOIN opencivicdata_jurisdiction j ON j.id = ls.jurisdiction_id
WHERE (
b.title ILIKE '%school%' AND b.title ILIKE '%dental%'
OR b.title ILIKE '%school%' AND b.title ILIKE '%oral health%'
OR b.title ILIKE '%school nurse%' AND b.title ILIKE '%screening%'
)
ORDER BY b.created_at DESC;