API Troubleshooting
Common issues when working with external APIs and their solutions.
ProPublica Nonprofit Explorer API
500 Internal Server Error
Symptom:
ERROR | ProPublica API request failed: 500 Server Error: Internal Server Error
Cause: The ProPublica API is experiencing server-side issues. This is not a problem with your code or configuration.
Solution:
The pipeline now includes automatic retry logic with exponential backoff:
- Automatic retries: Up to 3 attempts per request
- Exponential backoff: 2s, 4s, 8s delays between retries
- Graceful degradation: Continues processing other states/NTEE codes if one fails
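The retry behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: `retry_with_backoff`, its parameters, and the injectable `sleep` argument are hypothetical names chosen for this example. Note that with three attempts only the 2s and 4s delays are ever used; an 8s delay would apply only if a fourth attempt were allowed.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 3,        # assumed to match "up to 3 attempts"
    base_delay: float = 2.0,      # first delay in seconds
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing
) -> T:
    """Run `call`, retrying on any exception with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller record the failure
            # Delay doubles after each failure: 2s, 4s, 8s, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter is only a convenience so the delays can be checked without actually waiting.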
What to do:
- Wait and retry - API issues are usually temporary:
  # Try again in 5-10 minutes
  python scripts/create_all_gold_tables.py --nonprofits-only --states AL MI
- Try different states - Some states may work while others fail:
  # Try California and Texas instead
  python scripts/create_all_gold_tables.py --nonprofits-only --states CA TX
- Use cached data - If you've successfully discovered data before:
  # Use existing bronze data
  python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery
- Check API status - Visit the ProPublica website to check for known issues
- Reduce request volume - Try fewer NTEE codes at once by modifying the script
:::tip Success Rate
The pipeline shows a discovery summary with success/failure counts:
DISCOVERY SUMMARY
Total requests: 12
Successful: 8 (66.7%)
No results: 2
Failed: 2
Total nonprofits discovered: 1,247
Even with some failures, you'll still get useful data!
:::
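For reference, the figures in the discovery summary can be derived from per-request outcomes as in the sketch below. The function name `format_discovery_summary` and the outcome labels `"success"`, `"empty"`, and `"failed"` are assumptions made for this example; the pipeline's internal representation may differ.

```python
from collections import Counter
from typing import Iterable


def format_discovery_summary(outcomes: Iterable[str], discovered: int) -> str:
    """Build a summary from outcome labels ("success", "empty", "failed")."""
    outcomes = list(outcomes)
    counts = Counter(outcomes)  # missing labels count as 0
    total = len(outcomes)
    # Success rate is successful requests over all requests, as a percentage.
    rate = 100 * counts["success"] / total if total else 0.0
    return "\n".join([
        "DISCOVERY SUMMARY",
        f"Total requests: {total}",
        f"Successful: {counts['success']} ({rate:.1f}%)",
        f"No results: {counts['empty']}",
        f"Failed: {counts['failed']}",
        f"Total nonprofits discovered: {discovered:,}",
    ])
```

With 8 successes out of 12 requests this yields the 66.7% shown in the summary above.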
Rate Limiting
Symptom:
Too many requests
Solution: The pipeline includes automatic rate limiting (1 request/second). If you still hit the limit, the built-in retry logic retries the request automatically.
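A limit of one request per second can be enforced with a small helper like the one below. This is only a sketch of the general technique: the `RateLimiter` class and its `wait` method are hypothetical and are not taken from the pipeline's source.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Block so that consecutive calls to wait() are at least min_interval apart."""

    def __init__(
        self,
        min_interval: float = 1.0,  # seconds between requests (1 request/second)
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon after the previous request: pause for the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `limiter.wait()` immediately before each API request keeps the request rate at or below one per second.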
Timeout Errors
Symptom:
Request timeout
Solution:
- Automatic retry with exponential backoff
- Timeout increased to 30 seconds per request
- If all retries fail, continues to next request
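The "continue to the next request" behavior might look like the loop below. It is a sketch under stated assumptions: `discover_all` and the `fetch` callable are hypothetical, and any retrying is assumed to happen inside `fetch` before it finally raises.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

REQUEST_TIMEOUT_SECONDS = 30  # per-request timeout stated above


def discover_all(
    states: Iterable[str],
    ntee_codes: Iterable[str],
    fetch: Callable[[str, str, float], list],  # hypothetical request function
) -> Tuple[Dict[Tuple[str, str], list], List[Tuple[str, str, str]]]:
    """Try every state/NTEE combination, recording failures instead of stopping."""
    results: Dict[Tuple[str, str], list] = {}
    failures: List[Tuple[str, str, str]] = []
    for state, ntee in product(states, ntee_codes):
        try:
            results[(state, ntee)] = fetch(state, ntee, REQUEST_TIMEOUT_SECONDS)
        except Exception as exc:
            # One failed combination must not abort the whole run.
            failures.append((state, ntee, str(exc)))
    return results, failures
```

Collecting failures in a list rather than raising is what allows a run to finish with partial results.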
Alternative Data Sources
If the ProPublica API is consistently unavailable, consider these alternative sources:
1. IRS Tax Exempt Organization Search
Direct download of IRS data:
2. Every.org API
Alternative nonprofit data source (requires registration):
3. GuideStar/Candid
Comprehensive nonprofit database (some features require subscription):
Pipeline Best Practices
Start Small
# Test with one state first
python scripts/create_all_gold_tables.py --nonprofits-only --states AL
Check Cached Data
# See what's already been discovered
ls -lh data/cache/nonprofits/
ls -lh data/bronze/nonprofits/
Monitor Progress
The pipeline provides detailed logging:
- ✅ Successful requests
- ⚠️ No results found
- ❌ Failed requests
- Progress counter (8/12)
Use Skip Discovery
If you've already discovered data and just want to regenerate gold tables:
python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery
Error Codes Reference
| Error Code | Meaning | Solution |
|---|---|---|
| 500 | Server error | Retry later, API is down |
| 429 | Too many requests | Built-in rate limiting handles this |
| 404 | Not found | Check state/NTEE code validity |
| 403 | Forbidden | Check if API requires authentication |
| Timeout | Request took too long | Automatic retry with backoff |
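The table above can also be expressed as a lookup, which is handy when deciding programmatically whether a failure is worth retrying. The `Advice` type, the `advise` function, and the `retryable` flag are assumptions introduced for this sketch; the flag is inferred from the table's wording (retry-oriented solutions versus "check your input") and is not documented pipeline behavior.

```python
from typing import Dict, NamedTuple, Optional, Union


class Advice(NamedTuple):
    meaning: str
    action: str
    retryable: bool  # inferred from the table, not from the pipeline source


ADVICE: Dict[Union[int, str], Advice] = {
    500: Advice("Server error", "Retry later, API is down", True),
    429: Advice("Too many requests", "Built-in rate limiting handles this", True),
    404: Advice("Not found", "Check state/NTEE code validity", False),
    403: Advice("Forbidden", "Check if API requires authentication", False),
    "timeout": Advice("Request took too long", "Automatic retry with backoff", True),
}


def advise(code: Union[int, str]) -> Optional[Advice]:
    """Return the table row for an HTTP status code or "timeout", else None."""
    return ADVICE.get(code)
```

Codes outside the table return `None`, leaving the caller to decide how to treat them.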
Getting Help
If issues persist:
- Check cache directory - Data may have been partially downloaded:
  ls -lh data/cache/nonprofits/
- Review logs - Detailed error messages help diagnose issues
- Try different parameters:
  # Different states
  --states NY CA FL
  # Skip discovery (use cached)
  --skip-discovery
- File an issue - Include:
  - Error messages
  - States/NTEE codes attempted
  - Timestamp
  - Discovery summary output
Success Stories
Expected behavior:
- Some requests may fail (API issues)
- Pipeline continues processing
- You get partial results from successful requests
- Summary shows what worked vs. what failed
Example successful run:
DISCOVERY SUMMARY
Total requests: 24 (4 states × 6 NTEE codes)
Successful: 18 (75%)
No results: 4
Failed: 2
Total nonprofits discovered: 3,421
✅ Created gold tables with 3,421 nonprofit records!
Even with 2 failed requests, you got 3,400+ nonprofits!
Quick Reference
# Standard run (handles failures gracefully)
python scripts/create_all_gold_tables.py --nonprofits-only --states AL MI
# Use cached data (skip API calls)
python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery
# Try different states if some fail
python scripts/create_all_gold_tables.py --nonprofits-only --states CA TX NY
# Run only meetings (no API calls)
python scripts/create_all_gold_tables.py --meetings-only