OpenStates Integration & Contribution Opportunities

This document outlines our integration with OpenStates/Plural Policy and potential opportunities to contribute code back to the open-source community.

📚 References Added to Citations

We cite OpenStates resources in the following places:

In Root Citations (CITATIONS.md)

In Website Documentation (website/docs/data-sources/citations.md)

  • ✅ Comprehensive OpenStates/Plural Policy section
  • ✅ PostgreSQL dump setup instructions
  • ✅ Contribution guidelines
  • ✅ BibTeX citation

In Contributing Guide (CONTRIBUTING.md)

  • ✅ Code of Conduct alignment with OpenStates
  • ✅ Upstream contribution guidelines
  • ✅ Testing requirements for scraper contributions

🔄 Our Current OpenStates Integration

Data We Use

  1. PostgreSQL Monthly Dumps (9.8 GB+)

    • Complete legislative database for all 50 states
    • Script: scripts/bulk_legislative_download.py --postgres --month 2026-04
    • Setup: scripts/setup_openstates_db.sh
    • Use: Local SQL queries, no API rate limits
  2. CSV/JSON Session Data

    • Per-state legislative sessions
    • Bill text, votes, sponsors
    • Committee assignments
  3. Video Sources

    • YouTube channel URLs from sources field
    • Granicus video portal links
    • Meeting archive locations

Our Implementation

File: discovery/openstates_sources.py

  • Fetches jurisdiction data via API
  • Extracts video sources (YouTube, Vimeo, Granicus)
  • Maps to our jurisdiction database

File: scripts/bulk_legislative_download.py

  • Downloads PostgreSQL dumps
  • Downloads CSV/JSON session data
  • Handles all 50 states + DC + PR

🤝 Code We Could Contribute to OpenStates Scrapers

The openstates-scrapers repository collects legislative data with Python scrapers (built on libraries such as scrapelib and spatula rather than Scrapy). We have complementary code that could enhance their project:

1. Video Source Discovery Patterns

Our Code: discovery/youtube_channel_discovery.py

What it does:

  • Finds all YouTube channels for a jurisdiction (not just first match)
  • Scrapes homepages for YouTube links
  • Uses YouTube Data API for verification
  • Discovers Vimeo and Granicus portals

Potential Contribution:

  • Add video source extraction to OpenStates scrapers
  • Enhance sources field with verified YouTube channels
  • Automate Granicus portal discovery

Example Pattern:

# Our code finds these patterns
patterns = {
    "youtube_channel": r"youtube\.com/(?:c/|channel/|user/|@)([\w-]+)",
    "vimeo_channel": r"vimeo\.com/([\w-]+)",
    "granicus": r"granicus\.com/([^/]+)",
}
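As a minimal sketch of how the YouTube and Vimeo patterns above might be applied, the function below scans a page's HTML and keeps every unique match per platform instead of stopping at the first. The name `find_video_sources` and the sample HTML are illustrative assumptions, not code taken from discovery/youtube_channel_discovery.py.

```python
import re

# YouTube and Vimeo patterns from the example above, compiled once.
PATTERNS = {
    "youtube_channel": re.compile(r"youtube\.com/(?:c/|channel/|user/|@)([\w-]+)"),
    "vimeo_channel": re.compile(r"vimeo\.com/([\w-]+)"),
}


def find_video_sources(html: str) -> dict:
    """Return every unique identifier matched for each platform, in page order."""
    found = {}
    for platform, pattern in PATTERNS.items():
        # dict.fromkeys de-duplicates while keeping first-seen order.
        matches = list(dict.fromkeys(pattern.findall(html)))
        if matches:
            found[platform] = matches
    return found


# Hypothetical homepage fragment with one repeated link.
sample = (
    '<a href="https://www.youtube.com/@CALegislature">Floor sessions</a>'
    '<a href="https://www.youtube.com/channel/UC123-abc">Committees</a>'
    '<a href="https://www.youtube.com/@CALegislature">Duplicate link</a>'
)
result = find_video_sources(sample)
print(result)
```

Platforms with no matches are left out of the result, so callers can treat an empty dictionary as "nothing found". Matches would still need verification (for example through the YouTube Data API) before being stored.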

2. Legistar/Granicus Platform Detection

Our Code: discovery/url_discovery_agent.py

What it does:

  • Identifies Legistar instances across cities
  • Maps Granicus video portals
  • Extracts meeting URLs and agendas

Potential Contribution:

  • Enhance OpenStates scrapers with meeting video links
  • Add Legistar meeting agenda extraction
  • Contribute URL validation patterns

Platform Patterns We Use:

platforms = {
    "granicus": ["granicus.com", "legistar.com"],
    "youtube": ["youtube.com", "youtu.be"],
    "vimeo": ["vimeo.com"],
}
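One way a domain table like this could drive detection is hostname matching. The sketch below is an assumption about usage, not the logic in discovery/url_discovery_agent.py; `detect_platform` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlparse

PLATFORMS = {
    "granicus": ["granicus.com", "legistar.com"],
    "youtube": ["youtube.com", "youtu.be"],
    "vimeo": ["vimeo.com"],
}


def detect_platform(url: str) -> Optional[str]:
    """Return the platform whose domain matches the URL's host, else None."""
    host = (urlparse(url).hostname or "").lower()
    for platform, domains in PLATFORMS.items():
        for domain in domains:
            # Accept the domain itself or any subdomain, but reject
            # look-alikes such as "notyoutube.com".
            if host == domain or host.endswith("." + domain):
                return platform
    return None


print(detect_platform("https://alabama.granicus.com/ViewPublisher.php?view_id=6"))
```

Comparing parsed hostnames rather than searching the raw string avoids false positives when a platform name appears in a path or query string. URLs without a scheme have no parsed hostname and return None in this sketch.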

3. Meeting Archive Scraping

Our Code: agents/scraper.py

What it does:

  • Scrapes PDF meeting minutes
  • Extracts text from scanned documents (OCR)
  • Parses meeting dates and types
  • Handles multiple document formats
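The date and type parsing step could look roughly like the following, once text has been extracted from a PDF or OCR pass. This is a sketch under stated assumptions: the keyword list, the long-form date format, and the name `parse_meeting_header` are hypothetical and do not describe agents/scraper.py.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

# Hypothetical keyword list; a real scraper would tune this per jurisdiction.
MEETING_TYPES = ("special", "regular", "work session", "public hearing")

# Matches dates written like "April 29, 2026".
DATE_RE = re.compile(r"\b([A-Z][a-z]+ \d{1,2}, \d{4})\b")


def parse_meeting_header(text: str) -> Tuple[Optional[date], Optional[str]]:
    """Extract the first valid long-form date and the first known meeting type."""
    meeting_date = None
    for candidate in DATE_RE.findall(text):
        try:
            # %B expects English month names under the default locale.
            meeting_date = datetime.strptime(candidate, "%B %d, %Y").date()
            break
        except ValueError:
            continue  # e.g. "Room 12, 2026" has the shape of a date but is not one
    lowered = text.lower()
    meeting_type = next((t for t in MEETING_TYPES if t in lowered), None)
    return meeting_date, meeting_type


print(parse_meeting_header("City Council Regular Meeting Minutes - April 29, 2026"))
```

Returning None for either field, rather than raising, lets a batch job record partial metadata and flag the document for manual review.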

Potential Contribution:

  • Add meeting minutes text extraction to OpenStates
  • Enhance bill analysis with meeting context
  • Link bills to meeting discussions

📝 How to Contribute to OpenStates Scrapers

The steps below follow the OpenStates local database documentation:

1. Set Up the OpenStates Development Environment

# Clone the scrapers repository
git clone https://github.com/openstates/openstates-scrapers.git
cd openstates-scrapers

# Install dependencies
pip install -r requirements.txt

# Set up a local PostgreSQL database
createdb openstates

# Import schema (if needed)
psql -d openstates -f schema/openstates.sql

2. Test Your Scraper Locally

# Run a specific state scraper
os-update al --scrape --rpm 10

# Validate data
os-update al --scrape --validate

3. Follow Their Code of Conduct

All contributions must follow the OpenStates Code of Conduct:

  • Be respectful and professional
  • Welcome diverse perspectives
  • Focus on what's best for the community
  • Show empathy towards other contributors

4. Submit Pull Request

# Create feature branch
git checkout -b feature/video-sources

# Make changes (add video discovery to a state scraper)
# Example: scrapers/al/videos.py

# Test thoroughly
os-update al --scrape --rpm 10

# Commit and push
git commit -m "Add video source discovery for Alabama legislature"
git push origin feature/video-sources

# Open PR on GitHub

🎯 Specific Contribution Ideas

Priority 1: Add Video Sources to Scrapers

Goal: Enhance the sources field with verified video links

States to Start With:

  • Alabama - Has YouTube channel, needs verification
  • California - @CALegislature (well-documented)
  • Texas - Multiple chambers on YouTube
  • New York - Both Assembly and Senate channels

Implementation:

# In scrapers/al/__init__.py
class AlabamaScraper(BaseScraper):
    def scrape_sources(self):
        """Add video sources for Alabama legislature."""
        return {
            "youtube": "https://www.youtube.com/@AlabamaLegislature",
            "granicus": "https://alabama.granicus.com/ViewPublisher.php?view_id=6",
        }

Priority 2: Meeting Minutes Integration

Goal: Link bills to meeting discussions

Use Case:

  • When bill HB123 is discussed in committee
  • Link to YouTube timestamp of discussion
  • Extract quotes from meeting minutes
  • Connect legislators' comments to votes

Implementation:

# Add meeting metadata to bill objects
bill.add_source(
    url="https://www.youtube.com/watch?v=xyz&t=1234s",
    note="Committee discussion at 20:34",
)
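The URL offset and the human-readable time in the note have to agree (1234 seconds is 20:34), so it may help to derive both from one value. The helper below is a hypothetical sketch; `timestamped_source` is not part of OpenStates or of our codebase, and it only builds the two arguments shown above.

```python
def timestamped_source(video_id: str, seconds: int) -> dict:
    """Build a YouTube URL that starts at `seconds`, plus a matching note."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    # Show the hours field only when the offset is at least one hour.
    if hours:
        clock = f"{hours}:{minutes:02d}:{secs:02d}"
    else:
        clock = f"{minutes}:{secs:02d}"
    return {
        "url": f"https://www.youtube.com/watch?v={video_id}&t={seconds}s",
        "note": f"Committee discussion at {clock}",
    }


source = timestamped_source("xyz", 1234)
print(source)
```

The resulting dictionary could then be unpacked into the call above, for example `bill.add_source(**source)`.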

Priority 3: Granicus Portal Scraping

Goal: Automate discovery of Granicus video portals

Pattern:

  • Many jurisdictions use Granicus for meeting videos
  • URLs follow pattern: {jurisdiction}.granicus.com/ViewPublisher.php?view_id={id}
  • Could automate discovery and link to OpenStates jurisdictions
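To illustrate the URL pattern above, here is a small sketch that builds a portal URL and parses one back into its jurisdiction and view ID. Both function names are hypothetical, and the sketch assumes portals are served over HTTPS at exactly the `/ViewPublisher.php` path; real portals may vary.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse

GRANICUS_SUFFIX = ".granicus.com"


def build_granicus_url(jurisdiction: str, view_id: int) -> str:
    """Fill the {jurisdiction} and {id} slots of the portal URL pattern."""
    return f"https://{jurisdiction}{GRANICUS_SUFFIX}/ViewPublisher.php?view_id={view_id}"


def parse_granicus_url(url: str) -> Optional[Tuple[str, int]]:
    """Return (jurisdiction, view_id) if the URL fits the pattern, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if not host.endswith(GRANICUS_SUFFIX) or parsed.path != "/ViewPublisher.php":
        return None
    view_ids = parse_qs(parsed.query).get("view_id", [])
    # Require exactly one numeric view_id parameter.
    if len(view_ids) != 1 or not view_ids[0].isdecimal():
        return None
    jurisdiction = host[: -len(GRANICUS_SUFFIX)]
    return jurisdiction, int(view_ids[0])


print(parse_granicus_url("https://alabama.granicus.com/ViewPublisher.php?view_id=6"))
```

The parsed jurisdiction slug is only a candidate key; mapping it to an OpenStates jurisdiction would still need a lookup table or manual confirmation.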

🔒 License Compatibility

Our License

  • Code: Open source (check root LICENSE file)
  • Data: Citations required (see CITATIONS.md)

OpenStates License

  • Code: BSD-style license (permissive)
  • Data: Public domain (bulk downloads)
  • Content: Varies by state (some restrictions)

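Compatibility: Our code contributions should be compatible with their license, but confirm this against the LICENSE files in both repositories before submitting, since our own license terms are set by the root LICENSE file.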


📚 Required Reading Before Contributing

Before submitting any code to OpenStates, review:

  1. Local Database Setup: https://docs.openstates.org/contributing/local-database/

    • How to set up PostgreSQL locally
    • How to run scrapers in development
    • How to test data quality
  2. Scraper Development Guide: https://docs.openstates.org/contributing/scrapers/

    • Scraping patterns used
    • Data validation requirements
    • Testing procedures
  3. Code of Conduct: https://docs.openstates.org/code-of-conduct/

    • Community standards
    • Communication guidelines
    • Enforcement policies
  4. Schema Documentation: https://github.com/openstates/people/blob/master/schema.md

    • Data model structure
    • Required vs optional fields
    • Relationship patterns

🚀 Next Steps

For This Project

  1. Citations Added - OpenStates properly credited
  2. Code of Conduct - Aligned with their standards
  3. Local Database - PostgreSQL dumps integrated
  4. Test Contributions - Validate our code works with their schema

For Community Contribution

  1. Identify Target State - Choose state needing video sources
  2. Test Locally - Set up OpenStates dev environment
  3. Develop Scraper - Add video discovery code
  4. Submit PR - Follow their contribution guidelines
  5. Iterate - Respond to code review feedback

💡 Benefits of Contributing

For OpenStates:

  • Enhanced video source coverage
  • Better meeting-to-bill linkage
  • More comprehensive legislative tracking

For Our Project:

  • Upstream improvements benefit us
  • Community recognition
  • Better data quality for all users

For Civic Tech:

  • Shared infrastructure improvements
  • Reduced duplication of effort
  • Stronger open-source ecosystem


Last Updated: April 29, 2026
Maintained By: Open Navigator team