
GSA .gov Domain Integration

Overview

This document describes the integration of GSA (General Services Administration) .gov domain data into the jurisdictions_details_search table.

Data Source: https://github.com/cisagov/dotgov-data

Type of Load: ENRICHMENT/UPDATE LOAD

  • Does NOT create new jurisdiction records
  • Adds official .gov domain information to existing jurisdictions
  • Updates records with GSA-verified government contact data

Data Fields Added

| Column | Type | Description |
|---|---|---|
| gov_domains | JSONB | Array of all official .gov domains registered to this jurisdiction |
| security_contact_email | TEXT | Official security/technical contact email from GSA registry |
| gsa_organization_name | TEXT | Official organization name as registered with GSA |
| gsa_domain_type | TEXT | GSA classification (City, County, State, Special District, etc.) |
| gsa_last_updated | TIMESTAMP | When GSA data was last loaded |

GSA Source Data Fields

The GSA dataset contains:

  • Domain name (e.g., "bostonma.gov")
  • Domain type (Federal, State, County, City, Township, Special District, School District)
  • Organization name (official registered name)
  • Suborganization name (departments, divisions)
  • City (jurisdiction location)
  • State (2-letter code)
  • Security contact email (technical/security contact)
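As a rough illustration, the fields above could be read into typed records as follows. The CSV header names used here are assumed from the field list and should be checked against the actual file; the sample row, including its email address, is made up for the example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class GsaDomainRow:
    domain: str
    domain_type: str
    organization: str
    suborganization: str
    city: str
    state: str
    security_contact_email: str


def _field(record: dict, name: str) -> str:
    """Return a trimmed field value, or an empty string if it is missing."""
    return (record.get(name) or "").strip()


def parse_gsa_csv(text: str) -> list[GsaDomainRow]:
    """Parse CSV text into rows. Header names are assumptions, not confirmed."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append(GsaDomainRow(
            domain=_field(record, "Domain name").lower(),
            domain_type=_field(record, "Domain type"),
            organization=_field(record, "Organization name"),
            suborganization=_field(record, "Suborganization name"),
            city=_field(record, "City"),
            state=_field(record, "State").upper(),
            security_contact_email=_field(record, "Security contact email"),
        ))
    return rows


# Made-up sample row for illustration only.
SAMPLE = (
    "Domain name,Domain type,Organization name,Suborganization name,"
    "City,State,Security contact email\n"
    "BOSTONMA.GOV,City,City of Boston,,Boston,MA,security@example.gov\n"
)
```

Lowercasing the domain and uppercasing the state at parse time keeps later comparisons case-insensitive without repeating the conversion.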

Matching Strategy

City Matching

Matches by normalized city name + state code:

# Examples:
"Town of Abington, MA" -> normalized to "abington" + "MA"
"City of Boston, MA" -> normalized to "boston" + "MA"

Common prefixes removed during normalization:

  • "City of "
  • "Town of "
  • "Village of "
  • "Township of "
  • "Borough of "
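A minimal sketch of this normalization, assuming the prefix list above is complete and that an optional trailing ", ST" should be dropped before matching. The function name `normalize_city` is hypothetical and is not taken from the loading script.

```python
import re

# Prefixes listed in this section, lowercased for comparison.
PREFIXES = ("city of ", "town of ", "village of ", "township of ", "borough of ")


def normalize_city(name: str, state: str) -> tuple[str, str]:
    """Return the (normalized name, state code) pair used as a match key."""
    # Drop a trailing ", ST" if the name carries its own state suffix.
    base = re.sub(r",\s*[A-Za-z]{2}\s*$", "", name).strip().lower()
    for prefix in PREFIXES:
        if base.startswith(prefix):
            base = base[len(prefix):]
            break
    # Collapse repeated whitespace left over from the source data.
    base = re.sub(r"\s+", " ", base).strip()
    return base, state.strip().upper()
```

For example, `normalize_city("Town of Abington, MA", "MA")` yields `("abington", "MA")`, matching the first example above.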

County Matching

Matches by county name + state code:

# Examples:
"King County, WA" -> normalized to "king county" + "WA"
"Suffolk County, MA" -> normalized to "suffolk county" + "MA"

Coverage Statistics (6 Dev States)

Overall Summary

  • Total jurisdictions checked: 4,383
  • Matched with GSA domains: 679 (15.5%)
  • No GSA match found: 3,704 (84.5%)

By State and Type

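| State | Type | Total | With GSA | Coverage | With Contact Email |
|---|---|---|---|---|---|
| AL | City | 594 | 115 | 19.4% | 75 |
| AL | County | 67 | 0 | 0.0% | 0 |
| GA | City | 675 | 174 | 25.8% | 104 |
| GA | County | 159 | 0 | 0.0% | 0 |
| IN | City | 976 | 85 | 8.7% | 54 |
| IN | County | 92 | 0 | 0.0% | 0 |
| MA | City | 248 | 36 | 14.5% | 29 |
| MA | County | 14 | 0 | 0.0% | 0 |
| WA | City | 639 | 124 | 19.4% | 94 |
| WA | County | 39 | 0 | 0.0% | 0 |
| WI | City | 808 | 143 | 17.7% | 102 |
| WI | County | 72 | 2 | 2.8% | 1 |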
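The summary figures can be re-derived from the per-state rows. The snippet below copies the Total and With GSA values from the table above and recomputes the overall and county-only coverage; it is a consistency check, not part of the loading script.

```python
# (state, type, total, with_gsa) copied from the table above.
ROWS = [
    ("AL", "City", 594, 115), ("AL", "County", 67, 0),
    ("GA", "City", 675, 174), ("GA", "County", 159, 0),
    ("IN", "City", 976, 85), ("IN", "County", 92, 0),
    ("MA", "City", 248, 36), ("MA", "County", 14, 0),
    ("WA", "City", 639, 124), ("WA", "County", 39, 0),
    ("WI", "City", 808, 143), ("WI", "County", 72, 2),
]


def coverage(matched: int, total: int) -> float:
    """Coverage as a percentage rounded to one decimal place."""
    return round(100 * matched / total, 1)


total = sum(row[2] for row in ROWS)
matched = sum(row[3] for row in ROWS)
county_total = sum(row[2] for row in ROWS if row[1] == "County")
county_matched = sum(row[3] for row in ROWS if row[1] == "County")

print(total, matched, coverage(matched, total))
print(county_total, county_matched, coverage(county_matched, county_total))
```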

Key Insights

Cities: 8-26% have official .gov domains registered

  • Best coverage: Georgia (25.8%), Alabama (19.4%), Washington (19.4%)
  • Lowest coverage: Indiana (8.7%)

⚠️ Counties: Very low .gov domain registration

  • Only 2 of 443 counties matched a .gov domain (0.5%)
  • Counties may use state-run websites or have no dedicated domain; a 0% match rate in five of six states may also mean county names are not matching GSA records
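When investigating the low county match rate, it can help to see exactly what key is being compared. The sketch below builds the (county name, state code) key described under County Matching. The handling of "County of X" is an assumption added for illustration and is not described in the source; `county_key` is a hypothetical name.

```python
import re


def county_key(name: str, state: str) -> tuple[str, str]:
    """Build the (county name, state code) match key."""
    # Drop a trailing ", ST" and collapse whitespace.
    base = re.sub(r",\s*[A-Za-z]{2}\s*$", "", name)
    base = re.sub(r"\s+", " ", base).strip().lower()
    # Assumption, not from the source: treat "County of X" as "X county".
    inverted = re.fullmatch(r"county of (.+)", base)
    if inverted:
        base = f"{inverted.group(1)} county"
    return base, state.strip().upper()
```

Printing the keys produced for both the jurisdiction table and the GSA rows is a quick way to see whether the two sides ever produce the same string.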

Example Enriched Records

City with Multiple Domains

{
  "jurisdiction_name": "Monroe",
  "state_code": "WI",
  "jurisdiction_type": "city",
  "gov_domains": [
    "cityofmonroewi.gov",
    "pdmonroewi.gov",
    "townofclarnowi.gov",
    "townofjordanwi.gov",
    "townofmonroewi.gov",
    "townofsylvesterwi.gov"
  ],
  "security_contact_email": "rjacobson@cityofmonroe.org",
  "gsa_organization_name": "City of Monroe",
  "gsa_domain_type": "City",
  "website_url": "https://cityofmonroewi.gov"
}

City with Security Contact

{
  "jurisdiction_name": "Beloit",
  "state_code": "WI",
  "jurisdiction_type": "city",
  "gov_domains": [
    "beloitwi.gov",
    "newarkwi.gov",
    "townofbeloitwi.gov",
    "townofturtlewi.gov"
  ],
  "security_contact_email": "adminnotification@beloitwi.gov",
  "gsa_organization_name": "City of Beloit",
  "gsa_domain_type": "City"
}
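One way a record ends up with several domains is that every GSA row sharing the same city and state key is collected into one array. The sketch below assumes that grouping rule and assumes each of the Beloit domains above is registered with City = Beloit and State = WI; neither assumption is confirmed by the source.

```python
import json
from collections import defaultdict


def group_domains(rows):
    """Group (city, state, domain) tuples into {(city_key, state): sorted domains}."""
    grouped = defaultdict(set)
    for city, state, domain in rows:
        key = (city.strip().lower(), state.strip().upper())
        grouped[key].add(domain.strip().lower())
    return {key: sorted(domains) for key, domains in grouped.items()}


# Domains from the Beloit example; the City/State values are assumed.
rows = [
    ("Beloit", "WI", "townofturtlewi.gov"),
    ("Beloit", "WI", "beloitwi.gov"),
    ("Beloit", "WI", "newarkwi.gov"),
    ("Beloit", "WI", "townofbeloitwi.gov"),
    ("Beloit", "WI", "BELOITWI.GOV"),  # duplicate with different casing
]

domains = group_domains(rows)[("beloit", "WI")]
gov_domains_json = json.dumps(domains)  # value suitable for a JSONB column
```

Using a set before sorting removes duplicates and keeps the stored array stable between runs, which helps keep re-runs idempotent.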

Scripts

Main Loading Script

File: scripts/datasources/gsa/load_gsa_domains_to_postgres.py

Usage:

# Load all states
python scripts/datasources/gsa/load_gsa_domains_to_postgres.py

# Load specific states
python scripts/datasources/gsa/load_gsa_domains_to_postgres.py --states AL,GA,MA,WA

# Dry run (preview without updating)
python scripts/datasources/gsa/load_gsa_domains_to_postgres.py --states AL,GA --dry-run

Features:

  • ✅ Downloads latest GSA data from GitHub (cached for 24 hours)
  • ✅ Normalizes jurisdiction names for matching
  • ✅ Batch updates with ON CONFLICT handling
  • ✅ Dry-run mode for testing
  • ✅ Comprehensive statistics and logging
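The 24-hour cache can be approximated with a file modification-time check. The sketch below assumes the downloaded data is stored as a single local file; the function name and the file-based approach are illustrative assumptions, not the script's confirmed design.

```python
import os
import time

CACHE_MAX_AGE_SECONDS = 24 * 60 * 60  # 24 hours, per the feature list


def cache_is_fresh(path, now=None, max_age=CACHE_MAX_AGE_SECONDS):
    """Return True if the cached file exists and is younger than max_age seconds."""
    if not os.path.exists(path):
        return False
    current = time.time() if now is None else now
    return (current - os.path.getmtime(path)) < max_age
```

A caller would download the file only when `cache_is_fresh(path)` returns False. The `now` parameter exists so the check can be tested without waiting a day.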

Database Schema Updates

-- Add columns (idempotent)
ALTER TABLE jurisdictions_details_search
    ADD COLUMN IF NOT EXISTS gov_domains JSONB DEFAULT '[]',
    ADD COLUMN IF NOT EXISTS security_contact_email TEXT,
    ADD COLUMN IF NOT EXISTS gsa_organization_name TEXT,
    ADD COLUMN IF NOT EXISTS gsa_domain_type TEXT,
    ADD COLUMN IF NOT EXISTS gsa_last_updated TIMESTAMP;

-- Create index for domain lookups
CREATE INDEX IF NOT EXISTS idx_jurisdiction_details_gov_domains
    ON jurisdictions_details_search USING gin(gov_domains);

Use Cases

1. Find Jurisdictions with Official Government Domains

SELECT
    jurisdiction_name,
    state_code,
    gov_domains,
    website_url
FROM jurisdictions_details_search
WHERE jsonb_array_length(gov_domains) > 0
ORDER BY state_code, jurisdiction_name;

2. Get Security Contact for a Jurisdiction

SELECT
    jurisdiction_name,
    security_contact_email,
    gsa_organization_name
FROM jurisdictions_details_search
WHERE jurisdiction_name ILIKE '%boston%'
  AND state_code = 'MA';

3. Find Jurisdictions with Multiple Domains

SELECT
    jurisdiction_name,
    state_code,
    jsonb_array_length(gov_domains) AS domain_count,
    gov_domains
FROM jurisdictions_details_search
WHERE jsonb_array_length(gov_domains) > 3
ORDER BY domain_count DESC;

4. Validate Website URLs Against GSA Registry

SELECT
    jurisdiction_name,
    website_url,
    gov_domains,
    CASE
        WHEN website_url IS NULL THEN 'No website'
        -- Compare the bare host name: scheme, "www.", port, and path are removed
        WHEN gov_domains ? substring(lower(website_url) from '^(?:https?://)?(?:www\.)?([^/:?#]+)') THEN 'GSA verified'
        ELSE 'Not in GSA registry'
    END AS verification_status
FROM jurisdictions_details_search
WHERE jsonb_array_length(gov_domains) > 0
LIMIT 20;
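A Python-side version of this check might first reduce the URL to a bare host name, so that a trailing slash, a path, or a leading "www." does not cause a false "Not in GSA registry" result. The helper names below are hypothetical.

```python
from urllib.parse import urlsplit


def url_host(url: str) -> str:
    """Return the lowercase host of a URL without a leading 'www.'."""
    candidate = url.strip()
    # urlsplit only fills in the host when a scheme or '//' is present.
    if "://" not in candidate:
        candidate = "//" + candidate
    host = urlsplit(candidate).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_gsa_verified(website_url, gov_domains) -> bool:
    """True if the website's host is one of the registered .gov domains."""
    if not website_url:
        return False
    return url_host(website_url) in {domain.lower() for domain in gov_domains}
```

With this normalization, `https://www.cityofmonroewi.gov/services` and `cityofmonroewi.gov` both resolve to the same host before comparison.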

Limitations

Low County Coverage

  • Only 0.5% of counties (2 of 443) matched a registered .gov domain
  • Some counties may use state-operated websites (e.g., county.state.gov)
  • Some counties have no dedicated web presence

City Name Variations

Some jurisdictions may not match due to:

  • Inconsistent naming (GSA: "City of X" vs Census: "X city")
  • Merged jurisdictions (one domain, multiple census places)
  • Special characters or apostrophes
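The variations listed above could be reduced by cleaning both sides before comparison. The sketch below is one possible approach, not the script's behavior: it strips GSA-style prefixes from one side and Census-style suffixes from the other, and removes apostrophes and accents from both. All function names are hypothetical.

```python
import re
import unicodedata

GSA_PREFIXES = ("city of ", "town of ", "village of ", "township of ", "borough of ")
CENSUS_SUFFIXES = (" city", " town", " village", " township", " borough")


def _clean(name: str) -> str:
    """Lowercase, strip accents and apostrophes, and collapse punctuation."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().replace("'", "").replace("\u2019", "")
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalize_gsa_name(name: str) -> str:
    """Normalize a GSA-style name such as 'City of X'."""
    text = _clean(name)
    for prefix in GSA_PREFIXES:
        if text.startswith(prefix):
            return text[len(prefix):]
    return text


def normalize_census_name(name: str) -> str:
    """Normalize a Census-style name such as 'X city'."""
    text = _clean(name)
    for suffix in CENSUS_SUFFIXES:
        if text.endswith(suffix):
            return text[:-len(suffix)]
    return text
```

Keeping the suffix rule on the Census side only avoids truncating names that legitimately end in "City" when they come from the GSA data.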

Domain Ownership

  • GSA data shows registered domains, not necessarily active websites
  • Some domains may redirect to other sites
  • Multiple domains may point to the same website
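Because a registered domain is not necessarily a live website, a follow-up check could request each domain and record the status and final URL after redirects. The sketch below uses only the standard library and takes the opener as a parameter so it can be exercised without network access; `check_domain` is a hypothetical helper, and some servers may reject HEAD requests.

```python
import urllib.error
import urllib.request


def check_domain(domain, opener=urllib.request.urlopen, timeout=10):
    """Return (status_code, final_url), or (None, None) if the host is unreachable."""
    request = urllib.request.Request(f"https://{domain}/", method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            # geturl() reflects the final URL after any redirects.
            return response.status, response.geturl()
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 403 or 404).
        return exc.code, exc.url
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or TLS error.
        return None, None
```

Comparing the final URL's host with the original domain shows whether the domain redirects elsewhere.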

Maintenance

Update Frequency

  • GSA updates the domain list continuously as new domains are registered
  • Recommended cadence for the enrichment load: monthly or quarterly
  • Cache is valid for 24 hours to avoid excessive downloads

Re-running the Load

The load is idempotent and safe to re-run:

# Updates existing records with latest GSA data
python scripts/datasources/gsa/load_gsa_domains_to_postgres.py --states AL,GA,IN,MA,WA,WI

Monitoring

Check for stale data:

SELECT
    state_code,
    COUNT(*) AS jurisdictions_with_gsa,
    MAX(gsa_last_updated) AS most_recent_update,
    MIN(gsa_last_updated) AS oldest_update
FROM jurisdictions_details_search
-- gov_domains defaults to '[]', so test for a non-empty array rather than NOT NULL
WHERE jsonb_array_length(gov_domains) > 0
GROUP BY state_code
ORDER BY state_code;
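The query results can be turned into a simple alert. The sketch below flags states whose most recent update is older than a threshold; the 90-day default is an assumption based on the quarterly cadence mentioned above, and `stale_states` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def stale_states(rows, now, max_age_days=90):
    """Return state codes whose latest GSA update is missing or older than the cutoff.

    rows: iterable of (state_code, most_recent_update) pairs from the query.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [
        state
        for state, most_recent in rows
        if most_recent is None or most_recent < cutoff
    ]


# Illustrative timestamps only; not real load dates.
example_rows = [
    ("AL", datetime(2024, 1, 1)),
    ("GA", datetime(2024, 5, 20)),
    ("IN", None),
]
flagged = stale_states(example_rows, now=datetime(2024, 6, 1))
```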

Next Steps

Potential Enhancements

  1. Expand to All 50 States

    python scripts/datasources/gsa/load_gsa_domains_to_postgres.py
    # (no --states filter = all states)
  2. Domain Validation

    • Check if domains are actually active (HTTP status)
    • Verify SSL certificates
    • Update website_url with verified primary domain
  3. County Domain Discovery

    • Scrape state government portals for county websites
    • Check for <county-name>.<state>.gov patterns
    • Alternative sources: Wikipedia, county associations
  4. Integration with YouTube Discovery

    • Cross-reference .gov domains with YouTube channel URLs
    • Identify official government channels
    • Flag non-governmental channels

References