Cleanup Roadmap: Manager's Memory
This is the Manager surface for the monorepo refactor: the high-level map of what's done, what's next, and how to route work. It pairs with two other persistent stores; keep all three in sync, don't duplicate:
- `CLAUDE.md` (repo root): the standing rules every session/sub-agent must obey.
- Claude Code memory (`.claude/.../memory/`): cross-session facts; the canonical, detailed refactor recipe + progress lives in `project_core_lib_refactor.md`.
- This file: the living backlog + routing table for the cleanup.
Goal
Eliminate the top-level scripts/ tree. Move everything into packages/ refactored
correctly as libraries (not just relocated). Any leftover scripts/ module is a
porting candidate, not a permanent home.
Sub-agent routing (the specialists)
The Manager (interactive Claude) holds this roadmap and routes scoped work to isolated
specialists in .claude/agents/. Each runs in a clean, narrow context and returns only
a summary; no raw file dumps bleed back.
| Specialist | Owns | Spin up for |
|---|---|---|
| `python-packages-specialist` | the `packages/` uv workspace (accessibility, agents, core, core-lib, datamodels, ingestion, llm, scrapers) | "where should this Python live", porting `scripts/` → `packages/`, library refactors. Enforces "prefer packages, never add to `scripts/`" |
| `data-dbt-specialist` | `dbt_project/`, `scripts/datasources/*`, `scripts/enrichment*`, `scripts/discovery/` | dbt models, SQL/JSONB transformation logic, data semantics |
| `api-specialist` | `api/` (app, routes, models, auth, errors, batch_jobs) | FastAPI routes, Pydantic schemas, API DB access, OTel |
| `frontend-specialist` | `frontend/src/`, `website/` | React/TS components, hooks, API client, Tailwind, Docusaurus |
Overlap note: Python library structure / where code lives → `python-packages-specialist`; dbt/SQL transformation semantics → `data-dbt-specialist`. A datasource port touches both: lead with python-packages for the move, pull in data-dbt for the SQL details.
Route by file scope. A task that crosses layers gets split: the specialist flags out-of-scope work in its summary and hands it back to the Manager to re-route.
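The routing rule above can be sketched as a small path matcher. This is only an illustration: the `ROUTES` list, `route()`, and `split_task()` are hypothetical names, the glob patterns are transcribed from the "Owns" column of the table, and paths that no specialist owns are assumed to stay with the Manager.

```python
from collections import defaultdict
from fnmatch import fnmatchcase
from typing import Dict, List, Optional

# Ownership patterns transcribed from the routing table (hypothetical encoding).
# fnmatch's "*" also matches "/", so "api/*" covers nested files.
ROUTES = [
    ("python-packages-specialist", ["packages/*"]),
    ("data-dbt-specialist", [
        "dbt_project/*",
        "scripts/datasources/*",
        "scripts/enrichment*",
        "scripts/discovery/*",
    ]),
    ("api-specialist", ["api/*"]),
    ("frontend-specialist", ["frontend/src/*", "website/*"]),
]


def route(path: str) -> Optional[str]:
    """Return the owning specialist for a repo-relative path, or None."""
    for specialist, patterns in ROUTES:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return specialist
    return None


def split_task(paths: List[str]) -> Dict[str, List[str]]:
    """Group a cross-layer task's files by owner; unowned files go to 'manager'."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        buckets[route(path) or "manager"].append(path)
    return dict(buckets)
```

A task touching `api/routes/users.py` and `frontend/src/App.tsx` would come back as two buckets, one per specialist, which matches the "split, then re-route" behaviour described above.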
The established port recipe (summary; full version in memory)
- Branch off `main`: `feat/datasource-<source>-port`.
- Two commits, never one: first a pure `git mv legacy_loader.py <name>_pipeline.py` (so `git blame --follow` survives), then a second commit that refactors contents.
- Per script: define `<Name>Row(RawRow)` pydantic schema (Field max_length = bronze widths); implement `<Source>Pipeline(DataSourcePipeline[<Name>Row])` with `extract()` (async stream of validated dicts) + `load_batch()` (parameterized `text()` UPSERT, JSONB via `CAST(:col AS jsonb)`); replace psycopg2 / hardcoded `DATABASE_URL` with `core_lib.db` (`async_session`, `get_async_engine`). Preserve pure helpers and UPSERT ON CONFLICT semantics verbatim. Keep `--file` / `--limit` / `--truncate`.
- Unit tests in `tests/test_<source>_<name>_pipeline.py` (helpers, schema +/-, metadata, synthetic-file extract, error paths).
- After porting, grep the whole repo for the OLD import path: exporters/QA/frontend-prep scripts silently still import it. Update or list in the PR body.
- New workspace member → `uv sync` (or `.venv/bin/pip install -e packages/<new>`).
- Triage before porting: many "still references old table" scripts are dead/superseded by dbt. Grep usages + check for a dbt replacement before assuming a port; archive dead ones to `archive/datasources/<source>/` via `git mv`.
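To make the `load_batch()` statement shape concrete, here is a minimal sketch of a parameterized UPSERT string with `:name` bind parameters and `CAST(:col AS jsonb)` for JSONB columns. `build_upsert()` and the table and column names are hypothetical, not part of `core_lib`. A real port should keep the legacy loader's ON CONFLICT clause verbatim, as the recipe says, rather than regenerate it.

```python
from typing import Sequence


def build_upsert(
    table: str,
    columns: Sequence[str],
    conflict_cols: Sequence[str],
    jsonb_cols: Sequence[str] = (),
) -> str:
    """Build a PostgreSQL UPSERT using :name bind parameters.

    Identifiers are interpolated directly, so they must be constants from
    the pipeline code, never user input. Values travel only as bind params.
    """
    values = [
        f"CAST(:{col} AS jsonb)" if col in jsonb_cols else f":{col}"
        for col in columns
    ]
    # Update every non-key column from the row that failed to insert.
    updates = [f"{col} = EXCLUDED.{col}" for col in columns if col not in conflict_cols]
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        f"VALUES ({', '.join(values)}) "
        f"ON CONFLICT ({', '.join(conflict_cols)}) "
    )
    sql += f"DO UPDATE SET {', '.join(updates)}" if updates else "DO NOTHING"
    return sql


# Hypothetical table and columns, for illustration only.
EXAMPLE_SQL = build_upsert(
    "bronze.example_rows",
    columns=["id", "name", "payload"],
    conflict_cols=["id"],
    jsonb_cols=["payload"],
)
```

In a pipeline the resulting string would presumably be wrapped in SQLAlchemy's `text()` and executed with the batch as a list of dicts, each JSONB value already serialized to a JSON string.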
Status (as of 2026-05-30)
Done / merged to main: core-lib framework + 6 ports (census/states, fec/contributions,
gsa/domains, hifld/locations, dot/events, uscm/mayors). 16 branches consolidated → just
main. packages/llm extracted (gemini + enrichment subpackages). Migration-048 cleanup
swept refs to the dropped public.jurisdiction table (now public.civic_jurisdiction).
scripts/colab/ eliminated → packages/llm/src/llm/governance/ (2026-05-30): 24 live
modules + notebook + README + mount_drive.sh + 2 CLIs moved via git mv (blame preserved);
flat Colab imports rewritten to package-relative (`from .x import …`); dead `colab_public_data.py`
and `colab_notebook_ui.py` (+ its test) deleted. Notebook bootstrap now adds `packages/llm/src` to `sys.path` and imports `llm.governance.*`; CLIs run via `python -m llm.governance.<cli>`. Tests (`test_colab_bootstrap`, `test_colab_runtime_phases`, `test_meeting_consolidated_summary`, `test_pipeline_media_scope`) repointed. Residual cross-dep: governance modules still import `scripts.utils.gdrive_paths` (shared with `scripts/discovery/*`); port that util next so the package stops reaching into `scripts/`.
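Residual cross-dependencies like the one above can be listed mechanically. The sketch below uses the standard-library `ast` module to find absolute imports of a given top-level module. `find_imports()` is a hypothetical helper, not an existing repo tool, and it only reads `.py` files, so notebooks would need a separate check.

```python
import ast
from pathlib import Path
from typing import List, Tuple


def find_imports(root: str, prefix: str) -> List[Tuple[str, int, str]]:
    """Return sorted (file, line, module) for absolute imports of `prefix` or its submodules."""
    hits = []
    for path in Path(root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that do not parse as Python
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                modules = [node.module]  # level 0 excludes relative imports
            else:
                continue
            for module in modules:
                if module == prefix or module.startswith(prefix + "."):
                    rel = path.relative_to(root).as_posix()
                    hits.append((rel, node.lineno, module))
    return sorted(hits)
```

Calling `find_imports("packages/llm/src", "scripts")` would list every place the package still reaches into `scripts/`. The same helper could serve the recipe's "grep for the OLD import path" step by passing the old module path as `prefix`, though `from pkg import name` forms only record `pkg`, so a broader prefix catches more.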
In flight: feat/llm-enrichment-extraction (current branch): enrichment subpackage port.
Backlog (prioritized):
- Small/clean ports: nccs, naco, ballotpedia (measures), nces.
- Medium (multi-loader): census (acs, municipalities, …), parcels, jurisdictions, openstates.
- Complex (need scoping, don't fit DataSourcePipeline cleanly): irs/load_irs_bmf.py, ballotpedia_integration.py (1570L), google_civic (1147L), wikidata (18 files), youtube (29 files).
- HTTP downloaders (BaseAsyncClient migration, not DataSourcePipeline): download_gsa_domains, download_hifld, download_state_dot_public_pages, load_fec_bulk.
- Skip (not pipelines): one-off SQL fixes, demos, helper modules, READMEs.
Remaining scripts/ subdirs still to triage: data, database, datasources,
deployment, discovery, eboard, enrichment, enrichment_ai, examples, frontend, huggingface,
jurisdictions, localview, maintenance, mcp, media, migrations, scraping, utils, wikicommons,
wikimedia. (colab: done → packages/llm/src/llm/governance/.)
Context hygiene (native, not hand-rolled)
Claude Code handles compaction and tool-result lifecycle automatically, so don't build a message-pruning wrapper. When a unit of work finishes: record durable facts in memory, update this roadmap's Status, and start a fresh session for the next module.