Plans008 2026 02 21 Package Restructure
Package Restructure
Overview
Reorganize the monorepo package layout to better reflect language boundaries and pipeline concerns. Merge @packages/sync + @packages/format into a single @packages/pipeline TypeScript package. Rename @packages/transform → @python/analytics under a new @python/ workspace prefix. Restructure @source/ into @source/raw/, @source/clean/, @source/config/. Add Vitest as the test runner in @packages/pipeline. The result is a coherent two-language split: all TypeScript pipeline logic in one package, all Python/dbt analytics in one package.
This plan is purely structural — no new logic, no new capabilities. Plan 009 (Data Quality Pipeline) depends on this restructure being complete.
Goals
- Merge @packages/sync + @packages/format into @packages/pipeline with unified CLI using sub-commands
- Move @packages/transform → @python/analytics with @python/ as a new pnpm workspace prefix
- Restructure @source/ into raw/, clean/, config/ subdirectories; update all YAML config paths
- Add Vitest to @packages/pipeline with a passing scaffold test
- Update all references: root package.json scripts, pnpm-workspace.yaml, turbo.json, AGENTS.md
- pnpm build && pnpm test:type && pnpm test passes with the new layout
Non-Goals
- Implementing the extract step (xlsx → CSV); that is Plan 009
- Implementing the validate step; that is Plan 009
- Adding dbt is_valid flagging; that is Plan 009
- Writing audit documentation; that is Plan 009
- Changing any transform SQL logic
- Changing any format report logic
- Moving the @packages/db package
Phases
- Phase 1: @source/ restructure (create subdirectories, move xlsx files, update YAML configs)
- Phase 2: @python/ workspace (add prefix to pnpm-workspace.yaml, move and rename @packages/transform)
- Phase 3: @packages/pipeline (scaffold package, migrate @packages/sync source, migrate @packages/format source, wire unified CLI)
- Phase 4: Vitest (add Vitest dependency and scaffold tests)
- Phase 5: Wiring (update root scripts, turbo config, AGENTS.md, delete old packages; verify build)
Success
- @source/raw/ contains all xlsx files; @source/clean/ and @source/config/ exist
- All @source/config/*.yaml paths updated from @source/LAPORAN... → @source/raw/LAPORAN...
- @python/analytics/ exists with all dbt files; pyproject.toml name is atlas-analytics
- pnpm-workspace.yaml packages includes @python/**
- @packages/sync and @packages/format directories removed
- @packages/pipeline/ contains merged source from both, organized into extract/, load/, validate/, format/ subdirs under @source/
- @packages/pipeline CLI dispatches on sub-command: extract, load, validate, format, seed, run
- Vitest installed; pnpm --filter @packages/pipeline test passes with ≥1 test
- Root scripts updated: extract, sync (→ load), validate, format, pipeline shortcuts
- pnpm build && pnpm test:type && pnpm lint passes across all packages
Requirements
- Plans 001–007 completed (existing packages in place)
- pnpm available at monorepo root
- No running processes holding atlas.db open during restructure
- @python/ is a new directory at the monorepo root (sibling to @packages/, @services/, @core/)
Context
Why This Approach
- Extract + load + validate share the same config loader, DuckDB connection, and entity/year/month CLI arguments, so one package with one CLI is the natural fit
- Format reads DuckDB marts produced by dbt; it runs in the same TypeScript runtime and shares the same duck.ts patterns, so merging avoids duplication
- The @python/ prefix makes the language boundary explicit at the workspace level: any developer (or AI agent) immediately knows @python/* means Python/uv tooling only
- Renaming transform → analytics generalizes the package for future Python work (notebooks, validation scripts) alongside dbt without needing another package
- Two language packages (one TS, one Python) match how the pipeline actually works: TS extracts/loads/formats, Python transforms
Key Constraints
- dbt requires dbt_project.yml, profiles.yml, models/, macros/ at the root of the dbt project directory, so @python/analytics/ must be flat (dbt files at root), not a nested sub-directory
- profiles.yml currently points to ../../atlas.db relative to @packages/transform/; after the move to @python/analytics/ the relative path is still ../../atlas.db (same directory depth), so no change is needed
- @packages/db is referenced by @packages/sync as workspace:*; @packages/pipeline must preserve this dependency
- Root package.json sync:all script chains multiple pnpm run sync calls; it must be updated to use the new pnpm run sync command (which maps to pipeline load)
- SheetJS (xlsx) is already in root package.json dependencies catalog (util), so @packages/pipeline can reference it via catalog:util
- Turbo's task graph (dependsOn: ["^build"]) must still resolve correctly after package moves
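The profiles.yml constraint above can be sanity-checked with a small sketch. It assumes a hypothetical monorepo root of /repo and uses a hand-rolled resolver (resolveFrom is an illustrative name, not part of the codebase) so the snippet has no dependencies; in real code, Node's path.resolve would serve the same purpose.

```typescript
// Resolve a relative path against an absolute POSIX-style directory.
// Hypothetical helper for illustration only.
function resolveFrom(dir: string, relative: string): string {
  const segments = dir.split("/").filter((s) => s.length > 0);
  for (const part of relative.split("/")) {
    if (part === "" || part === ".") continue; // skip empty and current-dir parts
    if (part === "..") segments.pop(); // step up one directory
    else segments.push(part);
  }
  return "/" + segments.join("/");
}

// "/repo" is an assumed monorepo root; only the directory depth matters here.
const before = resolveFrom("/repo/@packages/transform", "../../atlas.db");
const after = resolveFrom("/repo/@python/analytics", "../../atlas.db");

// Both locations sit two levels below the root, so the target is identical.
console.log(before === after);
```

Because both package directories are exactly two levels below the root, the same relative string reaches the same file from either location.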
Edge Cases
- @packages/format exports types from @source/index.ts (used by @services/present and @services/dashboard); the merged package must preserve these exports or update consumers
- @packages/sync has @packages/db as a workspace dependency; this must carry over to @packages/pipeline
- The @source/config/*.yaml path: fields are project-root-relative (@source/LAPORAN...); after moving xlsx to @source/raw/, these must become @source/raw/LAPORAN...; the loadConfig() normalizer resolves paths from project root, so this is a YAML-only change
- @services/dashboard and @services/present may import from @packages/format; check and update imports before deleting @packages/format
- The sync:all root script hardcodes --source xlsx; after the restructure this becomes --source csv (a Plan 009 concern, not this plan). Leave --source xlsx working for now, since load still reads xlsx directly until Plan 009
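The YAML-only path change described above could be scripted roughly as follows. This is a sketch under assumptions: it treats each config as text with quoted path: values, the function names are hypothetical, and the sample file name is invented for illustration (the real LAPORAN... names are not reproduced here).

```typescript
const OLD_PREFIX = "@source/";
const NEW_PREFIX = "@source/raw/";
// Subdirectories introduced by the restructure; paths already under them are left alone.
const NEW_SUBDIRS = ["raw/", "clean/", "config/"];

// Rewrite a single project-root-relative path value (hypothetical helper).
function rewriteSourcePath(value: string): string {
  if (!value.startsWith(OLD_PREFIX)) return value;
  const rest = value.slice(OLD_PREFIX.length);
  if (NEW_SUBDIRS.some((dir) => rest.startsWith(dir))) return value; // already migrated
  return NEW_PREFIX + rest;
}

// Rewrite every `path:` value in a YAML document given as text.
// Only the leading token of the value is matched, which is enough to swap the prefix
// even when the file name contains spaces.
function rewriteYamlPaths(yaml: string): string {
  return yaml.replace(
    /^(\s*(?:-\s*)?path:\s*["']?)(\S+)/gm,
    (_match: string, head: string, value: string) => head + rewriteSourcePath(value),
  );
}

// Assumed config shape and invented file name, for illustration only.
const sample = 'entity: IONS\npath: "@source/example report.xlsx"\n';
console.log(rewriteYamlPaths(sample));
```

The rewrite is idempotent, so running it twice over the same config leaves already-migrated paths untouched.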
Tradeoffs
- Merging sync + format into one package increases the package's dependency footprint (both DuckDB connection patterns + SheetJS + pptxgenjs-adjacent concerns); acceptable because they're runtime-isolated (load runs first, format runs after dbt, never concurrent)
- Keeping @packages/db separate (not merging into pipeline): Drizzle schema and LibSQL are a distinct operational concern; the pipeline only seeds it and does not own it
- Flat @python/analytics/ (dbt at root) vs nested @python/analytics/transform/: flat chosen because dbt tooling assumes project root; nesting would require --project-dir flags everywhere
Skills
- plan: plan file format and conventions
- turborepo: workspace config, task pipeline, package naming conventions
- vitest: test runner setup and scaffold
Boundaries
- Always: Update Progress.md after each task completion
- Always: Verify pnpm test:type passes before marking any TypeScript task complete
- Always: Preserve all existing dbt SQL logic unchanged; this plan is structural only
- Always: Preserve all existing format report logic unchanged
- Ask first: Any dependency addition not already in pnpm-workspace.yaml catalogs
- Ask first: Any change to @packages/db schema or seed logic
- Never: Delete old package directories before verifying the new package builds and tests pass
- Never: Change dbt model SQL, macros, or profiles.yml logic
- Never: Change format section query logic (revenue.ts, orders.ts, marketing.ts, schools.ts)
Questions
- Should @packages/format exports be preserved for @services/dashboard and @services/present? → Check during T-010 (consumer audit); update imports in consumers if needed
- profiles.yml path after move? → ../../atlas.db is unchanged (same directory depth)
- @source/clean/: empty directory committed to git? → Yes, with a .gitkeep; Plan 009 will populate it
- CLI sub-command separator: pnpm run pipeline -- load --entity IONS? → Yes, one -- at the pnpm boundary; sub-command as first positional arg in script
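The sub-command convention in the last answer might look like the sketch below. The sub-command names come from this plan; parseInvocation, ParsedInvocation, and the flag-parsing rules are assumptions made for illustration, not the actual CLI implementation.

```typescript
// Sub-commands listed in the Success criteria of this plan.
const SUB_COMMANDS = ["extract", "load", "validate", "format", "seed", "run"] as const;
type SubCommand = (typeof SUB_COMMANDS)[number];

interface ParsedInvocation {
  command: SubCommand;
  flags: Record<string, string>;
}

// Parse the arguments that follow the script name, e.g. ["load", "--entity", "IONS"].
// Assumption: flags take the form `--name value`; a flag with no value is stored as "true".
function parseInvocation(argv: string[]): ParsedInvocation {
  const [first, ...rest] = argv;
  if (first === undefined || !(SUB_COMMANDS as readonly string[]).includes(first)) {
    throw new Error(
      `Unknown sub-command "${first ?? ""}"; expected one of: ${SUB_COMMANDS.join(", ")}`,
    );
  }
  const flags: Record<string, string> = {};
  for (let i = 0; i < rest.length; i += 1) {
    const token = rest[i];
    if (token === undefined || !token.startsWith("--")) continue; // ignore stray positionals
    const next = rest[i + 1];
    if (next !== undefined && !next.startsWith("--")) {
      flags[token.slice(2)] = next;
      i += 1; // the value has been consumed
    } else {
      flags[token.slice(2)] = "true";
    }
  }
  return { command: first as SubCommand, flags };
}

// Equivalent of: pnpm run pipeline -- load --entity IONS
const parsed = parseInvocation(["load", "--entity", "IONS"]);
console.log(parsed.command, parsed.flags);
```

In an entry script, the argument array would typically be process.argv.slice(2), with the returned command used to pick the handler for that step.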