Workflow
End-to-end data flow from source xlsx files to outputs.
Overview
Pipeline Stages
1. Extract (xlsx → CSV)
Reads xlsx source files and writes canonical CSV snapshots with normalized column headers.
| Field | Value |
|---|---|
| Input | @source/raw/*.xlsx |
| Output | @source/clean/*.csv |
| Config | @source/config/{entity}-{year}.yaml |
| Command | pnpm run extract -- --entity IONS --year 2026 |
Column mappings are defined per year in the config files, which absorbs schema drift between years. CSV snapshots are frozen: an existing CSV is not overwritten unless --overwrite is passed.
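The per-year mapping can be pictured as a lookup from each year's raw headers to one canonical header set. The TypeScript sketch below assumes a flat "raw header to canonical name" object per year; the real YAML layout in @source/config is not shown here, and `mapHeaders`, `shouldWriteSnapshot`, and the sample headers are hypothetical.

```typescript
// Hypothetical shape of one year's mapping, e.g. parsed from
// @source/config/{entity}-{year}.yaml: raw xlsx header -> canonical CSV header.
type ColumnMap = Record<string, string>;

// Normalize a header for lookup: trim, collapse runs of whitespace, lowercase.
function normalizeKey(header: string): string {
  return header.trim().replace(/\s+/g, " ").toLowerCase();
}

// Rename raw headers to canonical names. Unmapped headers throw, so schema
// drift in a new year's workbook surfaces as an error instead of a bad column.
function mapHeaders(rawHeaders: string[], map: ColumnMap): string[] {
  const lookup = new Map<string, string>();
  for (const [raw, canonical] of Object.entries(map)) {
    lookup.set(normalizeKey(raw), canonical);
  }
  return rawHeaders.map((header) => {
    const canonical = lookup.get(normalizeKey(header));
    if (canonical === undefined) {
      throw new Error(`Unmapped column: "${header}"`);
    }
    return canonical;
  });
}

// Frozen-snapshot rule: write only when no CSV exists yet or --overwrite is set.
function shouldWriteSnapshot(csvExists: boolean, overwrite: boolean): boolean {
  return !csvExists || overwrite;
}

// Two years with different raw headers land on the same canonical schema.
const map2025: ColumnMap = { "Student Name": "student_name", Amount: "amount" };
const map2026: ColumnMap = { Student: "student_name", "Total Amount": "amount" };

console.log(mapHeaders([" Student   Name ", "Amount"], map2025).join(", ")); // student_name, amount
console.log(mapHeaders(["Student", "Total Amount"], map2026).join(", ")); // student_name, amount
console.log(shouldWriteSnapshot(true, false)); // false
```

Failing loudly on an unmapped header is a design choice in this sketch: it forces a config update for the new year rather than letting a renamed column silently drop out of the snapshot.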
See @packages/pipeline/AGENTS.md for CLI details.
2. Load (CSV → DuckDB)
Loads CSV snapshots into DuckDB raw tables without transformation.
| Field | Value |
|---|---|
| Input | @source/clean/*.csv |
| Output | atlas.db (tables: raw_transactions, raw_students, raw_targets, raw_organizations) |
| Command | pnpm run sync -- --entity IONS --year 2026 |
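Since Load copies rows without transformation, each raw table can be filled by a single DuckDB statement over its CSV snapshot. The sketch below builds that statement with DuckDB's read_csv_auto; the replace/append switch mirrors the --mode append flag listed under Quick Reference, but `buildLoadSql`, the CSV file name, and the exact SQL the loader issues are assumptions.

```typescript
type LoadMode = "replace" | "append";

// Quote a value as a SQL string literal by doubling embedded single quotes.
function sqlLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Hypothetical helper: build the DuckDB statement that copies one CSV snapshot
// into a raw table as-is (SELECT *, no casts or renames).
function buildLoadSql(table: string, csvPath: string, mode: LoadMode): string {
  // The table name is spliced in as an identifier, so only accept raw_* names.
  if (!/^raw_[a-z_]+$/.test(table)) {
    throw new Error(`Unexpected raw table name: ${table}`);
  }
  const source = `SELECT * FROM read_csv_auto(${sqlLiteral(csvPath)})`;
  return mode === "replace"
    ? `CREATE OR REPLACE TABLE ${table} AS ${source};`
    : `INSERT INTO ${table} ${source};`;
}

console.log(buildLoadSql("raw_students", "source/clean/IONS-2026-students.csv", "replace"));
// CREATE OR REPLACE TABLE raw_students AS SELECT * FROM read_csv_auto('source/clean/IONS-2026-students.csv');
```

Append mode relies on the CSV's column order matching the existing raw table, because INSERT ... SELECT * maps columns by position.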
Optional single-month load:
pnpm run sync -- --entity IONS --year 2026 --month 2
3. Seed (Registry → LibSQL)
Seeds LibSQL lookup tables with reference data from registry.md.
| Field | Value |
|---|---|
| Input | registry.md |
| Output | atlas-ops.db (groups, entities, units, channels, payment methods, etc.) |
| Command | pnpm run seed -- --entity IONS |
Idempotent: lookup rows use stable ULIDs and inserts use onConflictDoNothing, so the seed is safe to re-run.
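Re-running is safe because the IDs never change between runs, so a conflict on an existing primary key is simply skipped. The real seed applies onConflictDoNothing against LibSQL; the sketch below swaps in an in-memory Map so it runs standalone, and the row shape, `LookupTable` class, and sample ULIDs are hypothetical.

```typescript
// Hypothetical lookup row. The id is a stable ULID taken from the registry,
// not generated at seed time, so every run targets the same primary keys.
interface LookupRow {
  id: string;
  name: string;
}

// In-memory stand-in for a LibSQL table with a primary key on `id`.
class LookupTable {
  private readonly rows = new Map<string, LookupRow>();

  // Mimics INSERT ... ON CONFLICT DO NOTHING: rows whose id already exists
  // are skipped (not updated). Returns the number of rows actually inserted.
  insertOnConflictDoNothing(values: LookupRow[]): number {
    let inserted = 0;
    for (const row of values) {
      if (!this.rows.has(row.id)) {
        this.rows.set(row.id, row);
        inserted++;
      }
    }
    return inserted;
  }

  get size(): number {
    return this.rows.size;
  }
}

const channels: LookupRow[] = [
  { id: "01HZX3K9Q2V7M4N8P6R5T1W0YA", name: "Website" },
  { id: "01HZX3K9Q2V7M4N8P6R5T1W0YB", name: "Referral" },
];

const table = new LookupTable();
console.log(table.insertOnConflictDoNothing(channels)); // 2 (first run)
console.log(table.insertOnConflictDoNothing(channels)); // 0 (re-run is a no-op)
console.log(table.size); // 2
```

One consequence of DO NOTHING semantics: if a row's id already exists, editing that row's other fields in registry.md is not picked up by re-seeding alone.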
See @packages/db/AGENTS.md for schema details.
4. Transform (dbt)
Runs dbt models to transform raw tables through staging → intermediate → mart layers.
| Field | Value |
|---|---|
| Input | DuckDB raw tables |
| Output | DuckDB mart tables |
| Command | cd @python/analytics && uv run dbt run |
| Layer | Materialization | Purpose |
|---|---|---|
| Staging (stg_*) | view | Clean columns, deduplicate |
| Intermediate (int_*) | view | Joins, business logic, validation |
| Mart (mart_*) | table | Pre-aggregated, report-ready |
See analytics.md for mart schemas and column definitions.
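The naming convention makes each model's layer and materialization derivable from its prefix. A small check such as the hypothetical `classifyModel` below could flag misnamed models before dbt run. In dbt itself, materializations are normally set with `+materialized` in dbt_project.yml or a model's `config()` block; how this project configures them is not shown here.

```typescript
type Layer = "staging" | "intermediate" | "mart";
type Materialization = "view" | "table";

interface LayerRule {
  prefix: string;
  layer: Layer;
  materialized: Materialization;
}

// Prefixes and materializations as listed in the layer table.
const LAYER_RULES: readonly LayerRule[] = [
  { prefix: "stg_", layer: "staging", materialized: "view" },
  { prefix: "int_", layer: "intermediate", materialized: "view" },
  { prefix: "mart_", layer: "mart", materialized: "table" },
];

// Hypothetical lint: classify a dbt model by name prefix, rejecting unknown prefixes.
function classifyModel(modelName: string): LayerRule {
  const rule = LAYER_RULES.find((r) => modelName.startsWith(r.prefix));
  if (rule === undefined) {
    throw new Error(`Model "${modelName}" does not follow the stg_/int_/mart_ convention`);
  }
  return rule;
}

const mart = classifyModel("mart_revenue");
console.log(`${mart.layer} -> ${mart.materialized}`); // mart -> table
```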
5. Validate
Runs SQL checks against raw tables and outputs a JSON report.
| Field | Value |
|---|---|
| Input | DuckDB raw tables |
| Output | output/validation/{entity}-{year}-validation.json |
| Command | pnpm run validate -- --entity IONS --year 2026 |
Catches data quality issues before the dbt transforms run. Although numbered as stage 5, Validate runs after Load and before Transform.
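The report pairs each SQL check with its outcome. One plausible shape is sketched below, assuming each check query returns the offending rows (so zero rows means pass). The field names, check names, and `buildValidationReport` are hypothetical; only the output path pattern comes from the stage table.

```typescript
// Outcome of one SQL check, assuming the query selects offending rows.
interface CheckResult {
  name: string;
  table: string;
  failingRows: number;
}

interface ValidationReport {
  entity: string;
  year: number;
  passed: boolean;
  checks: Array<CheckResult & { status: "pass" | "fail" }>;
}

// Path pattern from the Validate stage: output/validation/{entity}-{year}-validation.json
function validationReportPath(entity: string, year: number): string {
  return `output/validation/${entity}-${year}-validation.json`;
}

// A check passes when it found no offending rows; the run passes only if every check does.
function buildValidationReport(entity: string, year: number, results: CheckResult[]): ValidationReport {
  const checks = results.map((r) => ({
    ...r,
    status: r.failingRows === 0 ? ("pass" as const) : ("fail" as const),
  }));
  return { entity, year, passed: checks.every((c) => c.status === "pass"), checks };
}

const report = buildValidationReport("IONS", 2026, [
  { name: "duplicate_transaction_ids", table: "raw_transactions", failingRows: 0 },
  { name: "students_missing_organization", table: "raw_students", failingRows: 3 },
]);
console.log(validationReportPath("IONS", 2026)); // output/validation/IONS-2026-validation.json
console.log(report.passed); // false
```

Serializing the report with JSON.stringify(report, null, 2) would produce the JSON file; a report with passed set to false is the cue to fix the sources before running Transform.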
6. Format (Marts → Report JSON)
Reads mart tables and assembles a structured report document.
| Field | Value |
|---|---|
| Input | DuckDB mart tables |
| Output | output/monthly/{YYYY-MM}-report.json |
| Command | pnpm run format -- --entity IONS --period 2026-02 |
Report sections:
- Revenue Comparison — actuals vs target, prior period, prior year, all-time best
- Key Comparison — order counts by customer type (New/Renewal/Alumni)
- Program Progress — per-product enrollment counts
- School Progress — organization × year matrix
- Channel Breakdown — leads/follow-ups/closings by channel
See glossary.md § Report Sections for term definitions.
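The section list suggests a document shape along the lines of the TypeScript types below. They are an illustrative assumption, not the schema @services/present actually reads. For instance, the Revenue Comparison figures could be turned into percentage deltas with a helper such as the hypothetical `revenueDeltas`.

```typescript
// Illustrative report shape; all field names are assumptions.
interface RevenueComparison {
  actual: number;
  target: number;
  priorPeriod: number;
  priorYear: number;
  allTimeBest: number;
}

type CustomerType = "New" | "Renewal" | "Alumni";

interface MonthlyReport {
  entity: string;
  period: string; // YYYY-MM
  revenue: RevenueComparison;
  keyComparison: Record<CustomerType, number>; // order counts
  programProgress: Array<{ product: string; enrollments: number }>;
  schoolProgress: { years: number[]; rows: Array<{ organization: string; counts: number[] }> };
  channelBreakdown: Array<{ channel: string; leads: number; followUps: number; closings: number }>;
}

// Percent change of `current` vs `baseline`, one decimal place; null when baseline is 0.
function percentChange(current: number, baseline: number): number | null {
  if (baseline === 0) return null;
  return Math.round(((current - baseline) / baseline) * 1000) / 10;
}

// Actuals compared against each benchmark in the Revenue Comparison section.
function revenueDeltas(r: RevenueComparison) {
  return {
    vsTarget: percentChange(r.actual, r.target),
    vsPriorPeriod: percentChange(r.actual, r.priorPeriod),
    vsPriorYear: percentChange(r.actual, r.priorYear),
    vsAllTimeBest: percentChange(r.actual, r.allTimeBest),
  };
}

console.log(revenueDeltas({ actual: 120, target: 100, priorPeriod: 150, priorYear: 0, allTimeBest: 240 }));
// → vsTarget: 20, vsPriorPeriod: -20, vsPriorYear: null, vsAllTimeBest: -50
```

Returning null for a zero baseline keeps a missing prior-year figure from rendering as an infinite or NaN percentage.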
Output Generation
PPTX + PDF (@services/present)
pnpm run present -- --entity IONS --period 2026-02
Reads report.json and generates slides using pptxgenjs. Optionally converts to PDF via the LibreOffice CLI.
| Field | Value |
|---|---|
| Input | output/monthly/{YYYY-MM}-report.json |
| Output | output/monthly/{YYYY-MM}-{ENTITY}.pptx (and .pdf when converted) |
See @services/present/AGENTS.md for slide structure.
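The optional PDF step can be driven by LibreOffice's headless converter, which writes a file with the input's base name and a .pdf extension into the chosen output directory. How @services/present actually invokes it is not documented here. The sketch assumes the `soffice` binary is on PATH and only builds the arguments, which could then be passed to execFile from node:child_process; `sofficePdfArgs` and `expectedPdfPath` are hypothetical helpers.

```typescript
// Arguments for a headless LibreOffice conversion of one PPTX file to PDF.
// Assumes the binary is invoked as `soffice` (on some systems it is `libreoffice`).
function sofficePdfArgs(pptxPath: string, outDir: string): string[] {
  return ["--headless", "--convert-to", "pdf", "--outdir", outDir, pptxPath];
}

// LibreOffice keeps the input's base name and swaps the extension, inside outDir.
function expectedPdfPath(pptxPath: string, outDir: string): string {
  const fileName = pptxPath.split("/").pop() ?? pptxPath;
  const stem = fileName.replace(/\.pptx$/i, "");
  return `${outDir.replace(/\/+$/, "")}/${stem}.pdf`;
}

const pptx = "output/monthly/2026-02-IONS.pptx";
console.log(["soffice", ...sofficePdfArgs(pptx, "output/monthly")].join(" "));
// soffice --headless --convert-to pdf --outdir output/monthly output/monthly/2026-02-IONS.pptx
console.log(expectedPdfPath(pptx, "output/monthly/")); // output/monthly/2026-02-IONS.pdf
```

Passing an argument array rather than a single shell string avoids quoting problems with paths that contain spaces.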
Slidev Decks (@services/slides)
pnpm run slides -- dev --entity IONS --period 2026-02 --unit WLC
pnpm run slides -- export --entity IONS --period 2026-02 --unit WLC
Data-driven Slidev decks built from Vue components. Exports to PDF, PPTX, or a static web build. Built artifacts are uploaded to R2 and served by the Slides Worker.
See @services/slides/AGENTS.md for CLI and Worker routes.
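Serving a built deck from R2 amounts to mapping a URL to an object key and streaming the object back. The sketch assumes a /decks/{ENTITY}/{YYYY-MM}/{UNIT}.{pdf|pptx} route and a matching key layout, both hypothetical (the real routes are in @services/slides/AGENTS.md). `ArtifactBucket` is a minimal stand-in for the subset of Cloudflare's R2 bucket binding used here: `get` resolving to an object with `body` and `httpEtag`, or null.

```typescript
// Minimal stand-in for the part of an R2 bucket binding this handler uses.
interface ArtifactObject {
  body: ConstructorParameters<typeof Response>[0]; // a ReadableStream for real R2 objects
  httpEtag: string;
}
interface ArtifactBucket {
  get(key: string): Promise<ArtifactObject | null>;
}

// Hypothetical route: /decks/{ENTITY}/{YYYY-MM}/{UNIT}.{pdf|pptx}
const DECK_ROUTE = /^\/decks\/([A-Z]+)\/(\d{4}-\d{2})\/([A-Z]+)\.(pdf|pptx)$/;

const CONTENT_TYPES: Record<string, string> = {
  pdf: "application/pdf",
  pptx: "application/vnd.openxmlformats-officedocument.presentationml.presentation",
};

// Map a request path to an R2 object key, or null if it is not a deck URL.
function artifactKey(pathname: string): string | null {
  const match = DECK_ROUTE.exec(pathname);
  if (match === null) return null;
  const [, entity, period, unit, ext] = match;
  return `${entity}/${period}/${unit}.${ext}`;
}

// Look up the artifact and stream it back with a content type and ETag; 404 otherwise.
async function handleRequest(request: Request, bucket: ArtifactBucket): Promise<Response> {
  const key = artifactKey(new URL(request.url).pathname);
  if (key === null) return new Response("Not found", { status: 404 });

  const object = await bucket.get(key);
  if (object === null) return new Response("Artifact not found", { status: 404 });

  const ext = key.slice(key.lastIndexOf(".") + 1);
  return new Response(object.body, {
    headers: { "content-type": CONTENT_TYPES[ext], etag: object.httpEtag },
  });
}
```

In a Worker, this could be wired up as `export default { fetch: (req, env) => handleRequest(req, env.ARTIFACTS) }`, where ARTIFACTS is a hypothetical R2 binding name.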
Live Dashboard (@services/dashboard)
pnpm run --filter @services/dashboard dev
TanStack Start (React SSR) app with real-time views. Server functions call @services/api, which queries Turso/LibSQL.
| Route | Section |
|---|---|
| / | Overview |
| /revenue | Revenue Comparison |
| /enrollments | Enrollment Breakdown |
| /programs | Program Progress |
| /schools | School Progress |
| /channels | Channel Breakdown |
| /records/people | People records list |
| /records/organizations | Organization records list |
See @services/dashboard/AGENTS.md for architecture.
Runtime Architecture
- Dashboard calls API Worker for all runtime data (no direct DB access)
- API Worker queries Turso (hosted LibSQL)
- Slides Worker serves report artifacts from R2
- All services share cookie-based auth (AUTH_* env vars)
See architecture.md § Runtime Boundary for details.
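The runtime boundary means a dashboard server function needs only the API Worker's URL and the caller's cookie, never database credentials. The hypothetical `buildApiCall` below shows that shape: it targets an assumed /revenue endpoint and forwards the incoming Cookie header unchanged. The real API routes, base URL, and cookie names are not documented here.

```typescript
// What a server function would hand to fetch(): a URL plus headers, no DB handle.
interface ApiCall {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper: build a call to the API Worker, forwarding the caller's
// auth cookie so the API Worker can apply the shared cookie-based auth.
function buildApiCall(
  apiBaseUrl: string,
  path: string,
  params: Record<string, string>,
  cookieHeader: string | null,
): ApiCall {
  const url = new URL(path, apiBaseUrl);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, value);
  }
  const headers: Record<string, string> = { accept: "application/json" };
  if (cookieHeader !== null) headers.cookie = cookieHeader;
  return { url: url.toString(), headers };
}

const call = buildApiCall("https://api.example.com", "/revenue", { entity: "IONS", period: "2026-02" }, "session=abc");
console.log(call.url); // https://api.example.com/revenue?entity=IONS&period=2026-02
```

A server function would then issue `fetch(call.url, { headers: call.headers })`; keeping request construction separate from the fetch makes it easy to test without a running API Worker.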
Quick Reference
Full pipeline for a monthly report:
# 1. Seed lookups (first time only)
pnpm run seed -- --entity IONS
# 2. Sync source data into DuckDB marts
pnpm run sync -- --entity IONS --year 2026
# 3. Publish monthly period to LibSQL
pnpm run publish -- --entity IONS --year 2026 --month 2
# 4. Generate report artifacts by period type
pnpm run report -- --entity IONS --year 2026 --month 2 --type monthly
pnpm run report -- --entity IONS --year 2026 --month 1-3 --type quarterly
pnpm run report -- --entity IONS --year 2026 --type yearly
Stage-level debug commands are still available when needed:
pnpm run extract -- --entity IONS --year 2026
pnpm run load -- --entity IONS --year 2026 --month 2 --mode append
pnpm run validate -- --entity IONS --year 2026
pnpm run transform -- --entity IONS --select mart_revenue
pnpm run format -- --entity IONS --period 2026-02
pnpm run present -- --report output/monthly/IONS-2026-02.json
See quickstart.md for prerequisites and setup.
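The report command's flags encode a set of months: a single month for monthly, a range such as 1-3 for quarterly, and the whole year when --month is omitted for yearly. The sketch below shows one way to expand them and to format the YYYY-MM string used by --period. `monthsFor`, `periodString`, and the range checks are assumptions about the CLI's parsing, not its actual code.

```typescript
type PeriodType = "monthly" | "quarterly" | "yearly";

// Hypothetical parsing: expand --type/--month into the months a report covers.
function monthsFor(type: PeriodType, month?: string): number[] {
  if (type === "yearly") {
    return Array.from({ length: 12 }, (_, i) => i + 1);
  }
  if (month === undefined) {
    throw new Error(`--month is required for ${type} reports`);
  }
  // Accept a single month ("2") or an inclusive range ("1-3").
  const match = /^(\d{1,2})(?:-(\d{1,2}))?$/.exec(month);
  if (match === null) {
    throw new Error(`Invalid --month value: ${month}`);
  }
  const start = Number(match[1]);
  const end = match[2] === undefined ? start : Number(match[2]);
  if (start < 1 || end > 12 || start > end) {
    throw new Error(`Month out of range: ${month}`);
  }
  const months = Array.from({ length: end - start + 1 }, (_, i) => start + i);
  const expected = type === "monthly" ? 1 : 3;
  if (months.length !== expected) {
    throw new Error(`A ${type} report needs ${expected} month(s), got ${months.length}`);
  }
  return months;
}

// Format year and month as the YYYY-MM string used by --period.
function periodString(year: number, month: number): string {
  return `${year}-${String(month).padStart(2, "0")}`;
}

console.log(monthsFor("quarterly", "1-3").map((m) => periodString(2026, m)).join(", "));
// 2026-01, 2026-02, 2026-03
```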