Atlas Plan

Workflow

End-to-end data flow from source xlsx files to outputs.


Overview


Pipeline Stages

1. Extract (xlsx → CSV)

Reads xlsx source files and writes canonical CSV snapshots with normalized column headers.

Input: @source/raw/*.xlsx
Output: @source/clean/*.csv
Config: @source/config/{entity}-{year}.yaml
Command: pnpm run extract -- --entity IONS --year 2026

Column mappings are defined per year in the config files, which absorbs schema drift between years. CSV snapshots are frozen: an existing file is not overwritten unless --overwrite is passed.
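The header normalization and freeze behavior described above can be sketched roughly as follows. This is an illustration only, not the pipeline's actual code: the mapping contents, `normalize_header`, and `write_snapshot` are hypothetical names, and the snake_case fallback for unmapped headers is an assumption.

```python
import csv
import re
from pathlib import Path

# Hypothetical per-year mapping; the real ones live in source/config/{entity}-{year}.yaml.
COLUMN_MAP_2026 = {"Student Name": "student_name", "Paid Amount": "amount"}


def normalize_header(raw: str, column_map: dict[str, str]) -> str:
    """Map a source header to its canonical name, falling back to snake_case (assumed)."""
    key = raw.strip()
    if key in column_map:
        return column_map[key]
    return re.sub(r"[^0-9a-z]+", "_", key.lower()).strip("_")


def write_snapshot(path: Path, header: list[str], rows: list[list[str]],
                   overwrite: bool = False) -> bool:
    """Write a CSV snapshot; return False and leave an existing file untouched unless overwrite is set."""
    if path.exists() and not overwrite:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    return True
```

Returning a boolean instead of raising keeps a re-run of the stage quiet when the snapshot already exists, which matches the "frozen unless --overwrite" wording.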

See @packages/pipeline/AGENTS.md for CLI details.


2. Load (CSV → DuckDB)

Loads CSV snapshots into DuckDB raw tables without transformation.

Input: @source/clean/*.csv
Output: atlas.db (raw_transactions, raw_students, raw_targets, raw_organizations)
Command: pnpm run sync -- --entity IONS --year 2026

Optional single-month load:

pnpm run sync -- --entity IONS --year 2026 --month 2

3. Seed (Registry → LibSQL)

Seeds LibSQL lookup tables with reference data from registry.md.

Output: atlas-ops.db (groups, entities, units, channels, payment methods, etc.)
Command: pnpm run seed -- --entity IONS

Idempotent — uses stable ULIDs and onConflictDoNothing. Safe to re-run.
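A minimal sketch of why fixed IDs plus a do-nothing conflict rule make the seed safe to re-run. Python's built-in sqlite3 stands in for LibSQL (a SQLite fork), and `ON CONFLICT ... DO NOTHING` is assumed to be the SQL behind onConflictDoNothing. The table layout, channel names, and ID values are hypothetical placeholders, not entries from registry.md.

```python
import sqlite3

# Hypothetical reference rows; the fixed strings stand in for stable ULIDs.
CHANNELS = [
    ("01HYPOTHETICALCHANNEL00001", "Referral"),
    ("01HYPOTHETICALCHANNEL00002", "Walk-in"),
]


def seed_channels(conn: sqlite3.Connection) -> int:
    """Insert reference rows, skipping existing primary keys; return how many rows were added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS channels (id TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )
    before = conn.execute("SELECT COUNT(*) FROM channels").fetchone()[0]
    conn.executemany(
        "INSERT INTO channels (id, name) VALUES (?, ?) ON CONFLICT (id) DO NOTHING",
        CHANNELS,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM channels").fetchone()[0]
    return after - before
```

Because the IDs never change between runs, a second call hits the primary-key conflict for every row and inserts nothing. Randomly generated IDs would instead create duplicates on each run.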

See @packages/db/AGENTS.md for schema details.


4. Transform (dbt)

Runs dbt models to transform raw tables through staging → intermediate → mart layers.

Input: DuckDB raw tables
Output: DuckDB mart tables
Command: cd @python/analytics && uv run dbt run

Layer                 Materialization  Purpose
Staging (stg_*)       view             Clean columns, deduplicate
Intermediate (int_*)  view             Joins, business logic, validation
Mart (mart_*)         table            Pre-aggregated, report-ready

See analytics.md for mart schemas and column definitions.


5. Validate

Runs SQL checks against raw tables and outputs a JSON report.

Input: DuckDB raw tables
Output: output/validation/{entity}-{year}-validation.json
Command: pnpm run validate -- --entity IONS --year 2026

Catches data quality issues before the dbt transforms run. Although it is listed as stage 5, run it after Load (stage 2) and before Transform (stage 4).
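The general shape of "SQL checks in, JSON report out" might look like the sketch below. It is not the actual validator: the check names, the `id` and `amount` columns, and the report layout are all assumptions, and Python's built-in sqlite3 stands in for DuckDB so the snippet runs without extra dependencies.

```python
import json
import sqlite3

# Hypothetical checks: each query counts offending rows, so 0 means the check passes.
CHECKS = {
    "transactions_missing_amount":
        "SELECT COUNT(*) FROM raw_transactions WHERE amount IS NULL",
    "transactions_duplicate_id":
        "SELECT COUNT(*) FROM ("
        "SELECT id FROM raw_transactions GROUP BY id HAVING COUNT(*) > 1)",
}


def run_checks(conn: sqlite3.Connection, checks: dict[str, str]) -> dict:
    """Run every check and collect the results into a JSON-serializable report."""
    results = []
    for name, sql in checks.items():
        failures = conn.execute(sql).fetchone()[0]
        results.append({"check": name, "failures": failures, "passed": failures == 0})
    return {"passed": all(r["passed"] for r in results), "checks": results}


def report_to_json(report: dict) -> str:
    """Serialize the report the way it might be written to the validation file."""
    return json.dumps(report, indent=2)
```

Expressing each check as a count of offending rows keeps the report uniform: a reader only needs to look for non-zero `failures` before moving on to Transform.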


6. Format (Marts → Report JSON)

Reads mart tables and assembles a structured report document.

Input: DuckDB mart tables
Output: output/monthly/{YYYY-MM}-report.json
Command: pnpm run format -- --entity IONS --period 2026-02
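As an illustration of how a --period value maps onto the output path pattern above, here is a small sketch. `report_path` and its validation rule are hypothetical; only the `{YYYY-MM}-report.json` pattern comes from this page.

```python
import re
from pathlib import Path

# A period is assumed to be a four-digit year and a two-digit month, e.g. "2026-02".
PERIOD_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])$")


def report_path(period: str, root: Path = Path("output/monthly")) -> Path:
    """Build the report JSON path for a YYYY-MM period, rejecting malformed periods."""
    if not PERIOD_RE.match(period):
        raise ValueError(f"period must look like YYYY-MM, got {period!r}")
    return root / f"{period}-report.json"
```

Rejecting a malformed period up front avoids writing a report under an unexpected file name that later stages would not find.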

Report sections:

  • Revenue Comparison — actuals vs target, prior period, prior year, all-time best
  • Key Comparison — order counts by customer type (New/Renewal/Alumni)
  • Program Progress — per-product enrollment counts
  • School Progress — organization × year matrix
  • Channel Breakdown — leads/follow-ups/closings by channel

See glossary.md § Report Sections for term definitions.


Output Generation

PPTX + PDF (@services/present)

pnpm run present -- --entity IONS --period 2026-02

Reads report.json and generates slides using pptxgenjs. Optionally converts to PDF via LibreOffice CLI.

Input: output/monthly/{YYYY-MM}-report.json
Output: output/monthly/{YYYY-MM}-{ENTITY}.pptx, .pdf

See @services/present/AGENTS.md for slide structure.


Slidev Decks (@services/slides)

pnpm run slides -- dev --entity IONS --period 2026-02 --unit WLC
pnpm run slides -- export --entity IONS --period 2026-02 --unit WLC

Data-driven Slidev decks with Vue components. Exports to PDF, PPTX, or static web. Built artifacts upload to R2 and are served via Worker.

See @services/slides/AGENTS.md for CLI and Worker routes.


Live Dashboard (@services/dashboard)

pnpm run --filter @services/dashboard dev

TanStack Start (React SSR) app with real-time views. Server functions call @services/api, which queries Turso/LibSQL.

Route                   Section
/                       Overview
/revenue                Revenue Comparison
/enrollments            Enrollment Breakdown
/programs               Program Progress
/schools                School Progress
/channels               Channel Breakdown
/records/people         People records list
/records/organizations  Organization records list

See @services/dashboard/AGENTS.md for architecture.


Runtime Architecture

  • Dashboard calls API Worker for all runtime data (no direct DB access)
  • API Worker queries Turso/LibSQL (remote LibSQL)
  • Slides Worker serves report artifacts from R2
  • All services share cookie-based auth (AUTH_* env vars)

See architecture.md § Runtime Boundary for details.


Quick Reference

Full pipeline for a monthly report:

# 1. Seed lookups (first time only)
pnpm run seed -- --entity IONS

# 2. Sync source data into DuckDB marts
pnpm run sync -- --entity IONS --year 2026

# 3. Publish monthly period to LibSQL
pnpm run publish -- --entity IONS --year 2026 --month 2

# 4. Generate report artifacts by period type
pnpm run report -- --entity IONS --year 2026 --month 2 --type monthly
pnpm run report -- --entity IONS --year 2026 --month 1-3 --type quarterly
pnpm run report -- --entity IONS --year 2026 --type yearly
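The report commands above pass --month either as a single month (2) or as a range (1-3). One way such a value could be expanded is sketched below. `parse_months` is a hypothetical helper, and treating the range as inclusive is an assumption based on the quarterly example.

```python
def parse_months(spec: str) -> list[int]:
    """Expand a month spec such as "2" or "1-3" into a list of month numbers."""
    start_text, sep, end_text = spec.partition("-")
    start = int(start_text)
    end = int(end_text) if sep else start
    if not (1 <= start <= end <= 12):
        raise ValueError(f"invalid month spec: {spec!r}")
    return list(range(start, end + 1))
```

Under that assumption, "1-3" expands to the three months of the first quarter and a bare "2" stays a single-month list.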

Stage-level debug commands still exist when needed:

pnpm run extract -- --entity IONS --year 2026
pnpm run load -- --entity IONS --year 2026 --month 2 --mode append
pnpm run validate -- --entity IONS --year 2026
pnpm run transform -- --entity IONS --select mart_revenue
pnpm run format -- --entity IONS --period 2026-02
pnpm run present -- --report output/monthly/IONS-2026-02.json

See quickstart.md for prerequisites and setup.
