API Troubleshooting

Common issues when working with external APIs and their solutions.

ProPublica Nonprofit Explorer API

500 Internal Server Error

Symptom:

ERROR | ProPublica API request failed: 500 Server Error: Internal Server Error

Cause: The ProPublica API is experiencing server-side issues. This is not a problem with your code or configuration.

Solution:

The pipeline now includes automatic retry logic with exponential backoff:

  1. Automatic retries: Up to 3 attempts per request
  2. Exponential backoff: 2s, 4s, 8s delays between retries
  3. Graceful degradation: Continues processing other states/NTEE codes if one fails
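The retry behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the function name `call_with_backoff` and its parameters are hypothetical, and it assumes the "3" means three retries after the first attempt, so that all three delays (2s, 4s, 8s) are used.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_retries: int = 3,      # assumption: 3 retries after the first attempt
    base_delay: float = 2.0,   # first delay in seconds; doubles on each retry
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call func, retrying on any exception with delays of 2s, 4s, 8s.

    Returns None if every attempt fails, so the caller can move on to the
    next state/NTEE code instead of aborting the whole run.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == max_retries:
                print(f"Giving up after {attempt + 1} attempts: {exc}")
                return None
            delay = base_delay * (2 ** attempt)  # 2, 4, 8
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
    return None
```

The `sleep` argument is injectable so the delays can be checked without actually waiting. Returning `None` instead of raising is one way to get the graceful degradation described in item 3.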

What to do:

  1. Wait and retry - API issues are usually temporary:

    # Try again in 5-10 minutes
    python scripts/create_all_gold_tables.py --nonprofits-only --states AL MI

  2. Try different states - Some states may work while others fail:

    # Try California and Texas instead
    python scripts/create_all_gold_tables.py --nonprofits-only --states CA TX

  3. Use cached data - If you've successfully discovered data before:

    # Use existing bronze data
    python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery

  4. Check API status - Visit the ProPublica website to check for known issues

  5. Reduce request volume - Try fewer NTEE codes at once by modifying the script

:::tip Success Rate

The pipeline shows a discovery summary with success/failure counts:

DISCOVERY SUMMARY
Total requests: 12
Successful: 8 (66.7%)
No results: 2
Failed: 2
Total nonprofits discovered: 1,247

Even with some failures, you'll still get useful data!

:::
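For readers curious how the figures in such a summary relate to each other, here is a small sketch that tallies per-request outcomes into the same layout. The function name and the status labels (`"success"`, `"no_results"`, `"failed"`) are hypothetical and not taken from the pipeline.

```python
from collections import Counter
from typing import Iterable, Tuple


def format_discovery_summary(results: Iterable[Tuple[str, int]]) -> str:
    """Build a summary from (status, nonprofit_count) pairs.

    status is one of "success", "no_results", or "failed" (hypothetical labels).
    """
    results = list(results)
    counts = Counter(status for status, _ in results)
    total = len(results)
    # Only successful requests contribute discovered nonprofits.
    found = sum(n for status, n in results if status == "success")
    pct = 100.0 * counts["success"] / total if total else 0.0
    return "\n".join([
        "DISCOVERY SUMMARY",
        f"Total requests: {total}",
        f"Successful: {counts['success']} ({pct:.1f}%)",
        f"No results: {counts['no_results']}",
        f"Failed: {counts['failed']}",
        f"Total nonprofits discovered: {found:,}",
    ])
```

The success rate is simply successful requests divided by total requests, so 8 of 12 gives 66.7%.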

Rate Limiting

Symptom:

Too many requests

Solution: The pipeline includes automatic rate limiting (1 request/second). If the API still returns "Too many requests", the built-in retry logic retries the request after a backoff delay.
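A limit of one request per second can be enforced with a small helper like the one below. This is a sketch under assumptions: the class name `RateLimiter` and its interface are hypothetical, and the pipeline's real implementation may differ.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Enforce a minimum interval between calls (default: 1 request/second)."""

    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling `limiter.wait()` immediately before each API request keeps requests at least one second apart. `time.monotonic` is used as the default clock because it is not affected by system clock adjustments.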

Timeout Errors

Symptom:

Request timeout

Solution:

  • Automatic retry with exponential backoff
  • Timeout increased to 30 seconds per request
  • If all retries fail, continues to next request
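The "continues to next request" behavior might look like the loop below. It is an illustrative sketch only: `discover_all` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever function actually calls the API with the 30-second timeout.

```python
import socket
from typing import Callable, Dict, Iterable, List, Tuple

REQUEST_TIMEOUT_SECONDS = 30  # per-request timeout from the list above

Pair = Tuple[str, str]  # (state, NTEE code)


def discover_all(
    pairs: Iterable[Pair],
    fetch: Callable[[str, str, float], List[dict]],
) -> Tuple[Dict[Pair, List[dict]], List[Pair]]:
    """Fetch every (state, ntee_code) pair, skipping the ones that time out."""
    results: Dict[Pair, List[dict]] = {}
    failed: List[Pair] = []
    for state, ntee in pairs:
        try:
            results[(state, ntee)] = fetch(state, ntee, REQUEST_TIMEOUT_SECONDS)
        except (TimeoutError, socket.timeout):
            # Record the failure and move on instead of aborting the run.
            failed.append((state, ntee))
    return results, failed
```

If the HTTP client is the `requests` library, the exception to catch would be `requests.exceptions.Timeout` instead.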

Alternative Data Sources

If the ProPublica API is consistently unavailable, you can use these alternative sources:

1. IRS Data

Direct download of IRS data.

2. Every.org API

Alternative nonprofit data source (requires registration).

3. GuideStar/Candid

Comprehensive nonprofit database (some features require subscription).

Pipeline Best Practices

Start Small

# Test with one state first
python scripts/create_all_gold_tables.py --nonprofits-only --states AL

Check Cached Data

# See what's already been discovered
ls -lh data/cache/nonprofits/
ls -lh data/bronze/nonprofits/

Monitor Progress

The pipeline provides detailed logging:

  • ✅ Successful requests
  • ⚠️ No results found
  • ❌ Failed requests
  • Progress counter (8/12)

Use Skip Discovery

If you've already discovered data and just want to regenerate gold tables:

python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery

Error Codes Reference

| Error Code | Meaning | Solution |
|------------|---------|----------|
| 500 | Server error | Retry later, API is down |
| 429 | Too many requests | Built-in rate limiting handles this |
| 404 | Not found | Check state/NTEE code validity |
| 403 | Forbidden | Check if API requires authentication |
| Timeout | Request took too long | Automatic retry with backoff |
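In code, the error-code reference could be expressed as a lookup that also says whether a retry is worthwhile. The names below are hypothetical, and treating 500, 429, and timeouts as retryable is an assumption drawn from the solutions listed, not confirmed pipeline behavior.

```python
from typing import Tuple, Union

# Actions paraphrased from the error-code reference; labels are illustrative.
ERROR_ACTIONS = {
    500: "retry_later",
    429: "wait_for_rate_limit",
    404: "check_state_and_ntee_code",
    403: "check_authentication",
    "timeout": "retry_with_backoff",
}

# Assumption: only transient errors are worth retrying automatically.
RETRYABLE = {500, 429, "timeout"}


def classify_error(code: Union[int, str]) -> Tuple[str, bool]:
    """Return (action, retryable) for an HTTP status code or "timeout"."""
    action = ERROR_ACTIONS.get(code, "unknown")
    return action, code in RETRYABLE
```

A 404 or 403 is not retried here because repeating the same request would not change the outcome; those point to a problem with the parameters or credentials.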

Getting Help

If issues persist:

  1. Check cache directory - Data may have been partially downloaded:

    ls -lh data/cache/nonprofits/

  2. Review logs - Detailed error messages help diagnose issues

  3. Try different parameters:

    # Different states
    --states NY CA FL

    # Skip discovery (use cached)
    --skip-discovery

  4. File an issue - Include:

    • Error messages
    • States/NTEE codes attempted
    • Timestamp
    • Discovery summary output

Success Stories

Expected behavior:

  • Some requests may fail (API issues)
  • Pipeline continues processing
  • You get partial results from successful requests
  • Summary shows what worked vs. what failed

Example successful run:

DISCOVERY SUMMARY
Total requests: 24 (4 states × 6 NTEE codes)
Successful: 18 (75%)
No results: 4
Failed: 2
Total nonprofits discovered: 3,421

✅ Created gold tables with 3,421 nonprofit records!

Even with 2 failed requests, you got 3,400+ nonprofits!


Quick Reference

# Standard run (handles failures gracefully)
python scripts/create_all_gold_tables.py --nonprofits-only --states AL MI

# Use cached data (skip API calls)
python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery

# Try different states if some fail
python scripts/create_all_gold_tables.py --nonprofits-only --states CA TX NY

# Run only meetings (no API calls)
python scripts/create_all_gold_tables.py --meetings-only