DBT Models for Stats Aggregates with Trending Causes
Overview
The dbt project includes models to load jurisdiction_state_aggregate table with location-specific trending causes based on decisions from the last 90 days.
✨ NEW: The frontend now displays trending causes dynamically based on the selected geography (national, state, county, or city level).
Quick Start
# Update trending causes data (runs all required dbt models)
./scripts/data/update_trending_causes.sh
# Or manually:
cd dbt_project
dbt run --select stg_bronze_decisions int_trending_causes_by_jurisdiction jurisdiction_state_aggregate
How It Works
- User selects a location in the frontend (e.g., "Mobile, AL")
- Frontend queries
/api/stats?state=AL&city=Mobile - API returns stats with
trending_causesJSONB field - Frontend displays the top trending causes for that location in the last 90 days
- Fallback: If no location-specific causes exist, shows global trending causes from
/api/trending
Models Created
1. Staging Layer
stg_bronze_decisions.sql
- Cleans and normalizes bronze_decisions data
- Adds
is_recentflag for decisions in last 90 days - Calculates
days_since_decisionfor trending analysis - Filters out decisions without dates
2. Intermediate Layer
int_trending_causes_by_jurisdiction.sql
- Aggregates decisions by cause (NTEE major group) and jurisdiction
- Ranks causes by decision count and recency
- Includes top 10 trending causes per jurisdiction
- Generates JSON structure with:
- Cause category and code
- Decision count and unique topics
- Most recent decision date
- Sample headlines (up to 3)
3. Marts Layer
jurisdiction_state_aggregate.sql
- Builds the final jurisdiction_state_aggregate table
- Supports multiple levels: national, state, county, city, jurisdiction
- Includes
trending_causesas JSONB column - Joins trending causes data from intermediate model
Schema Changes
Added trending_causes JSONB column to jurisdiction_state_aggregate table:
ALTER TABLE jurisdiction_state_aggregate ADD COLUMN IF NOT EXISTS trending_causes JSONB;
Trending Causes JSON Structure
The trending_causes JSONB column contains different structures depending on the aggregation level:
City Level (Jurisdiction-Specific)
[
{
"cause": "Education and Workforce",
"code": "COFOG-09",
"decision_count": 5,
"topics": 3,
"most_recent": "2024-05-22",
"rank": 1,
"sample_headlines": [
"MPS highlights literacy strategies...",
"Board approves new curriculum...",
"Teacher hiring approved..."
]
},
{
"cause": "Health",
"code": "COFOG-07",
"decision_count": 3,
"topics": 2,
"most_recent": "2024-05-20",
"rank": 2,
"sample_headlines": [...]
}
]
State Level (Aggregated Across State)
[
{
"cause": "Education and Workforce",
"decision_count": 127,
"jurisdictions": 15
},
{
"cause": "Health",
"decision_count": 89,
"jurisdictions": 12
}
]
National Level (Aggregated Across Nation)
[
{
"cause": "Education and Workforce",
"decision_count": 1543,
"states": 42
},
{
"cause": "Infrastructure",
"decision_count": 1201,
"states": 38
}
]
Usage
Running the Models
# Quick update (recommended)
./scripts/data/update_trending_causes.sh
# Or step-by-step:
cd dbt_project
# Install dependencies
dbt deps
# Run staging and intermediate models
dbt run --select stg_bronze_decisions int_trending_causes_by_jurisdiction
# Run marts layer (jurisdiction_state_aggregate)
dbt run --select jurisdiction_state_aggregate
# Run all models
dbt run
# Test data quality
dbt test
Verifying the Data
After running the models, verify trending causes are populated:
-- Check city-level trending causes
SELECT
city,
state_code,
jsonb_array_length(trending_causes) as cause_count,
trending_causes
FROM jurisdiction_state_aggregate
WHERE level = 'city'
AND trending_causes IS NOT NULL
AND city ILIKE '%Mobile%'
LIMIT 1;
-- See top causes for a state
SELECT
state_code,
jsonb_pretty(trending_causes) as causes
FROM jurisdiction_state_aggregate
WHERE level = 'state'
AND state_code = 'AL';
-- National trending causes
SELECT jsonb_pretty(trending_causes)
FROM jurisdiction_state_aggregate
WHERE level = 'national';
Testing in the Frontend
-
Start the application:
./start-all.sh -
Search for a location (e.g., "Mobile, AL")
-
Observe the trending topics bar at the top - it should show location-specific causes
-
Switch to different locations and see the trending causes update dynamically
Integration with Python Scripts
The existing Python migration scripts in packages/hosting/src/hosting/neon/ can now:
- Use dbt to generate jurisdiction_state_aggregate
- OR continue using Python aggregation
- Merge both approaches (Python for counts, dbt for trending causes)
Recommended Workflow
# In migrate.py or update_stats.py
import subprocess
# Run dbt models first to calculate trending causes
subprocess.run(['dbt', 'run', '--select', 'jurisdiction_state_aggregate'],
cwd='/path/to/dbt_project')
# Then update counts using Python (jurisdictions, nonprofits, etc.)
# The trending_causes column will be preserved
Dependencies
Bronze Tables Required
bronze_decisions- Policy decisions with dates and themesbronze_events- Meeting events with jurisdiction info
Source Configuration
Sources are defined in models/staging/_staging.yml:
- Database:
open_navigator - Schema:
bronze
Data Quality Tests
The models include data quality tests:
# stg_bronze_decisions
- decision_date: not_null
- bronze_decision_id: unique, not_null
# int_trending_causes_by_jurisdiction
- state_code: not_null
- jurisdiction_name: not_null
- cause_category: not_null
- decision_count: not_null
# jurisdiction_state_aggregate
- level: not_null, accepted_values
- last_updated: not_null
Run tests with:
dbt test
Maintenance
Incremental Updates
The models currently use full refresh. For incremental updates:
- Change materialization to
incremental - Add
is_incremental()logic - Filter by
extracted_at > max(last_updated)
{% if is_incremental() %}
WHERE extracted_at > (SELECT MAX(last_updated) FROM {{ this }})
{% endif %}
Refreshing Trending Causes
Trending causes should be refreshed daily:
# Cron job example
0 2 * * * cd /path/to/dbt_project && dbt run --select jurisdiction_state_aggregate
Next Steps
- Populate counts: Update Python scripts or create dbt models to load actual jurisdiction/nonprofit counts
- Add indexes: Create GIN index on
trending_causesJSONB column for faster queries - API integration: Update
/api/statsendpoint to returntrending_causes - Frontend: Display trending causes in dashboard/stats pages