Atlas Plan
Plan 008 (2026-02-21): Package Restructure

Package Restructure

Overview

Reorganize the monorepo package layout to better reflect language boundaries and pipeline concerns. Merge @packages/sync and @packages/format into a single TypeScript package, @packages/pipeline. Rename @packages/transform → @python/analytics under a new @python/ workspace prefix. Restructure @source/ into @source/raw/, @source/clean/, and @source/config/. Add Vitest as the test runner for @packages/pipeline. The result is a coherent two-language split: all TypeScript pipeline logic lives in one package, and all Python/dbt analytics live in another.

This plan is purely structural — no new logic, no new capabilities. Plan 009 (Data Quality Pipeline) depends on this restructure being complete.

Goals

  • Merge @packages/sync + @packages/format into @packages/pipeline with a unified, sub-command-based CLI
  • Move @packages/transform → @python/analytics, adding @python/ as a new pnpm workspace prefix
  • Restructure @source/ into raw/, clean/, config/ subdirectories; update all YAML config paths
  • Add Vitest to @packages/pipeline with a passing scaffold test
  • Update all references: root package.json scripts, pnpm-workspace.yaml, turbo.json, AGENTS.md
  • pnpm build && pnpm test:type && pnpm test passes with the new layout

Non-Goals

  • Implementing the extract step (xlsx → CSV) — that is Plan 009
  • Implementing the validate step — that is Plan 009
  • Adding dbt is_valid flagging — that is Plan 009
  • Writing audit documentation — that is Plan 009
  • Changing any transform SQL logic
  • Changing any format report logic
  • Moving the @packages/db package

Phases

  • Phase 1: @source/ restructure — create subdirectories, move xlsx files, update YAML configs
  • Phase 2: @python/ workspace — add prefix to pnpm-workspace.yaml, move and rename @packages/transform
  • Phase 3: @packages/pipeline — scaffold package, migrate @packages/sync source, migrate @packages/format source, wire unified CLI
  • Phase 4: Vitest — add Vitest dependency and scaffold tests
  • Phase 5: Wiring — update root scripts, turbo config, AGENTS.md, delete old packages; verify build
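Phase 1's file move could be scripted as a dry-run plan followed by an apply step, so the move list can be reviewed before anything on disk changes. The sketch below is hypothetical (planXlsxMoves and applyXlsxMoves are not existing project code); it assumes the xlsx files sit directly in the source directory and that a plain rename is enough because everything stays on one filesystem.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// One planned move: a top-level xlsx file relocated under raw/.
export interface Move {
  from: string;
  to: string;
}

// Dry run (hypothetical helper): lists top-level *.xlsx files in `sourceDir`
// and maps each one to `sourceDir/raw/<name>`. Nothing is moved here.
export function planXlsxMoves(sourceDir: string): Move[] {
  const rawDir = path.join(sourceDir, "raw");
  return fs
    .readdirSync(sourceDir, { withFileTypes: true })
    .filter((entry) => entry.isFile() && entry.name.toLowerCase().endsWith(".xlsx"))
    .map((entry) => ({
      from: path.join(sourceDir, entry.name),
      to: path.join(rawDir, entry.name),
    }))
    .sort((a, b) => a.from.localeCompare(b.from));
}

// Apply step: creates raw/, clean/, config/ (no-op if present), adds
// clean/.gitkeep so git tracks the empty directory, then renames each file.
export function applyXlsxMoves(sourceDir: string, moves: Move[]): void {
  for (const dir of ["raw", "clean", "config"]) {
    fs.mkdirSync(path.join(sourceDir, dir), { recursive: true });
  }
  const gitkeep = path.join(sourceDir, "clean", ".gitkeep");
  if (!fs.existsSync(gitkeep)) fs.writeFileSync(gitkeep, "");
  for (const { from, to } of moves) fs.renameSync(from, to);
}
```

Printing the planned moves before calling applyXlsxMoves gives a reviewable record of exactly which files Phase 1 relocates.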

Success

  • @source/raw/ contains all xlsx files; @source/clean/ and @source/config/ exist
  • All @source/config/*.yaml paths updated from @source/LAPORAN... to @source/raw/LAPORAN...
  • @python/analytics/ exists with all dbt files; pyproject.toml name is atlas-analytics
  • pnpm-workspace.yaml packages includes @python/**
  • @packages/sync and @packages/format directories removed
  • @packages/pipeline/ contains the merged source of both packages, organized into extract/, load/, validate/, and format/ subdirectories under the package's own @source/ directory
  • @packages/pipeline CLI dispatches on sub-command: extract, load, validate, format, seed, run
  • Vitest installed; pnpm --filter @packages/pipeline test passes with ≥1 test
  • Root scripts updated: extract, sync (→ load), validate, format, pipeline shortcuts
  • pnpm build && pnpm test:type && pnpm lint passes across all packages
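The YAML path update can be done with a narrow text rewrite instead of a parse-and-reserialize round trip, which keeps comments and formatting intact. This is a sketch under assumptions: rewritePathFields is a hypothetical helper, each path: value is assumed to sit on a single line, and the on-disk prefix is assumed to be source/LAPORAN (the plan writes it as @source/LAPORAN...).

```typescript
// Hypothetical helper: rewrites only values of `path:` keys that start with
// `fromPrefix`. Other keys (e.g. `output_path:`) and other lines are untouched.
// Assumption: each path is a single-line `path: value`, optionally quoted.
export function rewritePathFields(yamlText: string, fromPrefix: string, toPrefix: string): string {
  // Escape regex metacharacters so the prefix is matched literally.
  const escaped = fromPrefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Captures: indentation, optional list dash, `path:`, spaces, optional quote.
  const pattern = new RegExp(`^([ \\t]*(?:-[ \\t]+)?path:[ \\t]*["']?)${escaped}`, "gm");
  return yamlText.replace(pattern, (_match: string, lead: string) => `${lead}${toPrefix}`);
}
```

Because already-migrated values start with source/raw/LAPORAN rather than source/LAPORAN, running the rewrite a second time leaves the files unchanged.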

Requirements

  • Plans 001–007 completed (existing packages in place)
  • pnpm available at monorepo root
  • No running processes holding atlas.db open during restructure
  • @python/ is a new directory at the monorepo root (sibling to @packages/, @services/, @core/)

Context

Why This Approach

  • Extract + load + validate share the same config loader, DuckDB connection, and entity/year/month CLI arguments — one package, one CLI is the natural fit
  • Format reads DuckDB marts produced by dbt; it runs in the same TypeScript runtime and shares the same duck.ts patterns — merging avoids duplication
  • @python/ prefix makes the language boundary explicit at the workspace level — any developer (or AI agent) immediately knows @python/* means Python/uv tooling only
  • Renaming transform → analytics generalizes the package for future Python work (notebooks, validation scripts) alongside dbt without needing another package
  • Two language packages (one TS, one Python) matches how the pipeline actually works: TS extracts/loads/formats, Python transforms
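Because extract, load, and validate share the entity/year/month arguments, one parser can serve all of them. A minimal sketch using Node's built-in util.parseArgs; the option names come from the plan, while parseSharedOptions, the string-typed flags, and the validation rules (four-digit year, month 1 to 12) are assumptions for illustration.

```typescript
import { parseArgs } from "node:util";

// Options shared by extract, load, and validate (names from the plan).
export interface PipelineOptions {
  entity?: string;
  year?: number;
  month?: number;
}

// Hypothetical shared parser. strict: true rejects unknown flags;
// positionals (such as a sub-command) are returned untouched.
export function parseSharedOptions(args: string[]): { options: PipelineOptions; positionals: string[] } {
  const { values, positionals } = parseArgs({
    args,
    options: {
      entity: { type: "string" },
      year: { type: "string" },
      month: { type: "string" },
    },
    allowPositionals: true,
    strict: true,
  });

  const options: PipelineOptions = {};
  if (values.entity !== undefined) options.entity = values.entity;
  if (values.year !== undefined) {
    // Assumed rule: a four-digit calendar year.
    if (!/^\d{4}$/.test(values.year)) throw new Error(`--year must be a 4-digit year, got "${values.year}"`);
    options.year = Number(values.year);
  }
  if (values.month !== undefined) {
    // Assumed rule: an integer month from 1 to 12.
    const month = Number(values.month);
    if (!Number.isInteger(month) || month < 1 || month > 12) {
      throw new Error(`--month must be 1-12, got "${values.month}"`);
    }
    options.month = month;
  }
  return { options, positionals };
}
```

Keeping validation in one place means a malformed --month fails the same way whether it reaches extract, load, or validate.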

Key Constraints

  • dbt requires dbt_project.yml, profiles.yml, models/, macros/ at the root of the dbt project directory — @python/analytics/ must be flat (dbt files at root), not a nested sub-directory
  • profiles.yml currently points to ../../atlas.db relative to @packages/transform/. After the move to @python/analytics/, the directory depth is unchanged, so the same relative path still resolves to the root atlas.db and no change is needed
  • @packages/db is referenced by @packages/sync as workspace:*; @packages/pipeline must preserve this dependency
  • The root package.json sync:all script chains multiple pnpm run sync calls; these must be updated to use the new pnpm run sync command (which maps to pipeline load)
  • SheetJS (xlsx) is already in root package.json dependencies catalog (util) — @packages/pipeline can reference it via catalog:util
  • Turbo's task graph (dependsOn: ["^build"]) must still resolve correctly after package moves
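The claim that profiles.yml needs no change is plain path arithmetic: both the old and new package directories sit two levels below the repo root. A quick check with node:path, assuming the relative path is resolved against the dbt project directory as described above and that the on-disk directories are packages/transform and python/analytics (resolveFromPackage is a hypothetical helper):

```typescript
import * as path from "node:path";

// Resolves a relative path (as written in profiles.yml) against a package
// directory. POSIX semantics keep the result identical on every OS.
export function resolveFromPackage(repoRoot: string, packageDir: string, relative: string): string {
  return path.posix.resolve(repoRoot, packageDir, relative);
}

// "/repo" stands in for the monorepo root.
const root = "/repo";
const before = resolveFromPackage(root, "packages/transform", "../../atlas.db"); // "/repo/atlas.db"
const after = resolveFromPackage(root, "python/analytics", "../../atlas.db"); // "/repo/atlas.db"
// A nested layout (python/analytics/transform) would resolve one level too deep:
const nested = resolveFromPackage(root, "python/analytics/transform", "../../atlas.db"); // "/repo/python/atlas.db"

console.log(before === after, nested);
```

The nested case illustrates one more cost of the rejected python/analytics/transform layout: the database path would also need editing.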

Edge Cases

  • @packages/format exports types from @source/index.ts (used by @services/present and @services/dashboard) — merged package must preserve these exports or update consumers
  • @packages/sync has @packages/db as a workspace dependency — this must carry over to @packages/pipeline
  • The @source/config/*.yaml path: fields are project-root-relative (@source/LAPORAN...) — after moving xlsx to @source/raw/, these must become @source/raw/LAPORAN...; the loadConfig() normalizer resolves paths from project root so this is a YAML-only change
  • @services/dashboard and @services/present may import from @packages/format — check and update imports before deleting @packages/format
  • The sync:all root script hardcodes --source xlsx. Switching it to --source csv is a Plan 009 concern, not part of this plan; leave --source xlsx working for now, because load still reads xlsx directly until Plan 009
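For the consumer audit, a heuristic scan for import specifiers is usually enough to find every file that must change before @packages/format is deleted. This sketch is hypothetical (importsPackage and findConsumers are illustrative names), uses a regex rather than a real TypeScript parser, and takes the package name as a parameter because the plan does not state its npm name; @atlas/format in the usage note is a placeholder.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Matches the specifier in `... from "x"`, `import "x"`, `import("x")`, and `require("x")`.
// A regex is a heuristic: it can also match text inside strings or comments.
const IMPORT_RE = /(?:\bfrom\s*|\bimport\s*\(?\s*|\brequire\s*\(\s*)["']([^"']+)["']/g;

// True when `source` imports `pkg` itself or any subpath such as `pkg/types`.
export function importsPackage(source: string, pkg: string): boolean {
  for (const match of source.matchAll(IMPORT_RE)) {
    const spec = match[1] ?? "";
    if (spec === pkg || spec.startsWith(`${pkg}/`)) return true;
  }
  return false;
}

// Recursively lists JS/TS files under `rootDir` that import `pkg`,
// skipping node_modules and dist.
export function findConsumers(rootDir: string, pkg: string): string[] {
  const hits: string[] = [];
  const walk = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        if (entry.name !== "node_modules" && entry.name !== "dist") walk(full);
      } else if (/\.[cm]?[jt]sx?$/.test(entry.name) && importsPackage(fs.readFileSync(full, "utf8"), pkg)) {
        hits.push(full);
      }
    }
  };
  walk(rootDir);
  return hits.sort();
}
```

For example, findConsumers("services", "@atlas/format") would list the files under @services/ whose imports need updating; an empty result is the signal that deletion is safe from the consumer side.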

Tradeoffs

  • Merging sync + format into one package increases the package's dependency footprint (both DuckDB connection patterns + SheetJS + pptxgenjs-adjacent concerns) — acceptable because they're runtime-isolated (load runs first, format runs after dbt, never concurrent)
  • Keeping @packages/db separate (not merging it into pipeline): the Drizzle schema and LibSQL are a distinct operational concern, and the pipeline only seeds the database rather than owning it
  • Flat @python/analytics/ (dbt at root) vs nested @python/analytics/transform/ — flat chosen because dbt tooling assumes project root; nesting would require --project-dir flags everywhere

Skills

  • plan — plan file format and conventions
  • turborepo — workspace config, task pipeline, package naming conventions
  • vitest — test runner setup and scaffold

Boundaries

  • Always: Update Progress.md after each task completion
  • Always: Verify pnpm test:type passes before marking any TypeScript task complete
  • Always: Preserve all existing dbt SQL logic unchanged — this plan is structural only
  • Always: Preserve all existing format report logic unchanged
  • Ask first: Any dependency addition not already in pnpm-workspace.yaml catalogs
  • Ask first: Any change to @packages/db schema or seed logic
  • Never: Delete old package directories before verifying the new package builds and tests pass
  • Never: Change dbt model SQL, macros, or profiles.yml logic
  • Never: Change format section query logic (revenue.ts, orders.ts, marketing.ts, schools.ts)

Questions

  • Should @packages/format exports be preserved for @services/dashboard and @services/present? → Check during T-010 (consumer audit); update imports in consumers if needed
  • profiles.yml path after move? → ../../atlas.db is unchanged (same directory depth)
  • @source/clean/ — empty directory committed to git? → Yes, with a .gitkeep; Plan 009 will populate it
  • CLI sub-command separator: pnpm run pipeline -- load --entity IONS? → Yes: one -- at the pnpm boundary, with the sub-command as the first positional argument passed to the script
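The CLI entry point only needs to peel off the first positional argument and route the rest to a handler. A sketch, assuming handlers are plain functions keyed by the six sub-commands listed in the success criteria; splitSubcommand, dispatch, and the handler table are hypothetical, and the leading -- check is defensive in case the package manager forwards the separator to the script.

```typescript
// Sub-commands listed in the plan's success criteria.
export const SUBCOMMANDS = ["extract", "load", "validate", "format", "seed", "run"] as const;
export type Subcommand = (typeof SUBCOMMANDS)[number];

export interface Dispatch {
  command: Subcommand;
  rest: string[];
}

// Splits argv (already sliced past node and the script path) into the
// sub-command and its remaining arguments. A leading "--" is dropped.
export function splitSubcommand(argv: readonly string[]): Dispatch {
  const args = argv[0] === "--" ? argv.slice(1) : argv.slice();
  const [first, ...rest] = args;
  if (first === undefined || !(SUBCOMMANDS as readonly string[]).includes(first)) {
    throw new Error(`usage: pipeline <${SUBCOMMANDS.join("|")}> [options] (got ${first ?? "nothing"})`);
  }
  return { command: first as Subcommand, rest };
}

// Hypothetical handler signature; real handlers would live in extract/, load/, etc.
type Handler = (args: string[]) => Promise<void> | void;

// Routes to the matching handler; Record<Subcommand, Handler> makes the
// compiler flag any sub-command that lacks a handler.
export async function dispatch(argv: readonly string[], handlers: Record<Subcommand, Handler>): Promise<void> {
  const { command, rest } = splitSubcommand(argv);
  await handlers[command](rest);
}
```

An entry file would then call dispatch(process.argv.slice(2), handlers), so pnpm run pipeline -- load --entity IONS reaches the load handler with --entity IONS as its arguments whether or not the separator is forwarded.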
