Open Navigator dbt Project
Transforms bronze AI extractions into production-ready search tables.
๐ฏ Purposeโ
This dbt project handles:
- Bronze โ Production transformations (AI extracted data)
- Data quality testing
- Incremental processing (only new records)
- Entity deduplication
- Documentation generation
๐ Project Structureโ
dbt_project/
โโโ dbt_project.yml # Project configuration
โโโ profiles.yml.example # Database connection template
โโโ models/
โ โโโ staging/ # Clean bronze data
โ โ โโโ _staging.yml
โ โ โโโ stg_bronze_contacts.sql
โ โ โโโ stg_bronze_organizations_meetings.sql
โ โ โโโ stg_bronze_bills.sql
โ โโโ intermediate/ # Deduplicate
โ โ โโโ _intermediate.yml
โ โ โโโ int_contacts_deduped.sql
โ โโโ marts/ # Production tables
โ โโโ _marts.yml
โ โโโ contact_ai.sql
โโโ macros/ # Reusable SQL functions
โ โโโ calculate_confidence.sql
โ โโโ normalize_bill_number.sql
โ โโโ normalize_name.sql
โโโ tests/ # Custom data quality tests
โโโ README.md # This file
๐ Quick Startโ
1. Install dbtโ
# Install dbt-postgres
pip install dbt-postgres
# Verify installation
dbt --version
2. Configure Database Connectionโ
# Copy example profiles
cp profiles.yml.example ~/.dbt/profiles.yml
# Edit with your database credentials
nano ~/.dbt/profiles.yml
Or set environment variables:
export POSTGRES_PASSWORD=your_password
export NEON_HOST=your-neon-host.neon.tech
export NEON_USER=your_user
export NEON_PASSWORD=your_password
3. Test Connectionโ
# Check dbt can connect
dbt debug
# Should show:
# โ Connection test: [OK connection ok]
4. Run Modelsโ
# Run all models
dbt run
# Run specific model
dbt run --select stg_bronze_contacts
# Run with full refresh (rebuild everything)
dbt run --full-refresh
# Run tests
dbt test
# Generate documentation
dbt docs generate
dbt docs serve # Opens in browser
๐ Model Layersโ
Staging (models/staging/)โ
Purpose: Clean and normalize bronze data
stg_bronze_contacts.sql- Clean contact names, filter invalid recordsstg_bronze_organizations_meetings.sql- Normalize org names, clean EINsstg_bronze_bills.sql- Standardize bill numbers
Materialization: view (no storage, computed on-the-fly)
Intermediate (models/intermediate/)โ
Purpose: Deduplicate and prepare for production
int_contacts_deduped.sql- One record per person per org
Materialization: table (stored, fast to query)
Marts (models/marts/)โ
Purpose: Production-ready tables for API
contact_ai.sql- AI-extracted contacts (incremental)
Materialization: incremental (only processes new records)
๐งช Testingโ
Run Testsโ
# Run all tests
dbt test
# Run tests for specific model
dbt test --select contact_ai
# Run specific test type
dbt test --select test_type:unique
dbt test --select test_type:not_null
Available Testsโ
-
Schema tests (in
.ymlfiles)unique- No duplicatesnot_null- No NULL valuesaccepted_values- Value in allowed listrelationships- Foreign key exists
-
Custom tests (in
tests/folder)- Custom SQL assertions
๐ Incremental Processingโ
Models marked materialized='incremental' only process new records:
{% if is_incremental() %}
WHERE extracted_at > (SELECT MAX(last_updated) FROM {{ this }})
{% endif %}
Full Refreshโ
To rebuild everything from scratch:
dbt run --full-refresh --select contact_ai
๐จ Macrosโ
Reusable SQL functions in macros/:
calculate_confidence(datasource)โ
SELECT {{ calculate_confidence('datasource') }} as score
-- Returns 1.0 for authoritative, 0.60 for AI extraction
normalize_bill_number(column)โ
SELECT {{ normalize_bill_number('official_number') }} as bill_num
-- 'HB 123' โ 'HB123'
normalize_name(column)โ
SELECT {{ normalize_name('full_name') }} as name_clean
-- Lowercase, trim, remove special chars
๐ Workflow Integrationโ
Combined with Python ETLโ
#!/bin/bash
# Full ETL pipeline
# 1. Python: Load bronze data
python scripts/datasources/gemini/load_meeting_transcripts_bronze.py
# 2. dbt: Transform to production
cd dbt_project
dbt run --select staging+
dbt run --select intermediate+
dbt run --select marts+
dbt test
# 3. Python: Export to parquet (if needed)
cd ..
python scripts/data/export_to_gold_parquet.py
๐ Troubleshootingโ
"relation does not exist"โ
Problem: Source table not found
Solution: Check you're connected to the right database
dbt debug
# Look at "target" database
"Compilation Error: macro 'dbt_utils' is not defined"โ
Problem: Missing dbt packages
Solution: Install packages
# Create packages.yml
cat > packages.yml << EOF
packages:
- package: dbt-labs/dbt_utils
version: 1.1.1
EOF
# Install
dbt deps
"Incremental model not updating"โ
Problem: New records not being processed
Solution: Check timestamp logic
# Full refresh to rebuild
dbt run --full-refresh --select contact_ai
๐ Resourcesโ
๐ Related Documentationโ
- dbt ETL Strategy - Full architecture guide
- Bronze to Production Merge - Merge strategy
- Data Sources - All data sources
โญ๏ธ Next Stepsโ
- Install packages:
dbt deps - Run models:
dbt run - Run tests:
dbt test - Generate docs:
dbt docs generate && dbt docs serve - Iterate: Add more models incrementally