Architecture
System architecture for Atlas. All terminology follows glossary.md. All data entities follow model.md and analytics.md.
Overview
Atlas is a local-first Modern Data Stack for multi-entity business reporting. It normalizes data from operational sources using an ELT pipeline, transforms raw data into analytical marts via dbt, and serves outputs through static presentation layers (PPTX/PDF and Slidev exports), an Evidence OSS BI service, plus a Cloudflare Worker dashboard that reads through an authenticated API backed by Turso.
Principles
| Principle | Description |
|---|---|
| Modern Data Stack | Modular, best-of-breed components for each layer. Each layer has a single responsibility and a clean interface to the next. |
| ELT over ETL | Raw source data is loaded first, untransformed. All business logic lives in the transform layer (dbt). Source data is always reprocessable. |
| Two-layer database | DuckDB for analytical workloads (aggregations, marts, reporting). LibSQL/Turso for operational workloads (individual record lookup, dashboard drill-down). |
| Declarative transforms | All business logic for reporting is expressed as SQL models in dbt. No imperative calculation code in the pipeline. |
| Adapter pattern | The Sync layer uses source adapters to decouple extraction from loading. Swapping xlsx for Odoo requires only a new adapter. |
| Schema as code | The LibSQL operational schema is defined in TypeScript via Drizzle ORM. The DuckDB analytical schema is defined in dbt models. Both are version-controlled. |
| Entity scoping | All data is scoped to a core_entity record. IONs and EPN never share data in queries. |
| Report as data | The Format layer reads dbt marts and assembles a structured report.json. The Present layer is a pure renderer — it applies no business logic. |
| Python where it makes sense | dbt-core and any data tooling that benefits from Python uses Python, managed via uv. Everything else is TypeScript with pnpm-managed tooling. |
Technology Stack
| Concern | Technology | Notes |
|---|---|---|
| Runtime | Node.js + pnpm | All TS scripts, packages, and services |
| Monorepo | Turborepo + pnpm workspaces | Task pipeline for build, lint, test |
| Language | TypeScript (primary), Python (dbt + data tooling) | Strict TS throughout; Python managed via uv |
| Python tooling | uv | Fast Python package manager; manages dbt venv |
| Analytical DB | DuckDB | In-process OLAP; native xlsx reading; hosts raw, staging, intermediate, mart layers |
| Operational DB | LibSQL (local) → Turso (remote) | SQLite-compatible; individual record lookup; dashboard drill-down |
| Operational ORM | Drizzle ORM (LibSQL dialect) | Type-safe queries; schema-as-code; migrations via Drizzle Kit |
| Transform layer | dbt-core + dbt-duckdb | Declarative SQL models; staging, intermediate, and mart layers |
| Extract + Load | SheetJS xlsx → CSV + DuckDB read_csv_auto | SheetJS extracts xlsx to canonical CSVs; DuckDB loads CSVs into raw layer |
| Odoo sync (future) | Odoo JSON-RPC via fetch | Same adapter interface as xlsx |
| Format | TypeScript + DuckDB node client | Reads marts, assembles report.json |
| Shared config | @config (TypeScript workspace package) | Entity/unit/brand themes shared across services |
| Present — PPTX | pptxgenjs | Programmatic slide generation |
| Present — PDF | LibreOffice CLI | Converts PPTX to PDF via headless LibreOffice |
| Present — Slidev | Slidev + Vue + UnoCSS + Playwright | Data-driven slides service with PDF/PPTX/web export |
| BI reporting service | Evidence OSS (@evidence-dev/evidence) | SQL + markdown site reading from both atlas.db and atlas-ops.db |
| API service | Cloudflare Workers + Drizzle (LibSQL) | Authenticated dashboard read API; Turso-first runtime data path |
| Dashboard | TanStack Start | SSR + client, file-based routing |
| Dashboard UI | shadcn/ui + Tailwind CSS | Component library |
| Dashboard data access | Server functions -> API Worker | Dashboard loaders call API routes and forward session cookies |
| Slides delivery | Cloudflare Worker + R2 | Auth-gated report paths (/reports/{entity}/{period-unit}/...) |
| Code quality | ESLint + Prettier + TypeScript strict | Shared configs from @core/lint and @core/typescript |
Repository Structure
ions/
├── @plan/ # Planning docs (not a workspace package)
│ ├── architecture.md
│ ├── analytics.md # DuckDB, dbt layers, mart definitions
│ ├── glossary.md
│ ├── model.md
│ ├── registry.md
│ ├── domains/
│ └── discussions/
│
├── @config/ # Shared entity/unit/brand definitions (framework-agnostic)
│ ├── brands/
│ │ ├── ions/
│ │ └── epn/
│ ├── entities.ts
│ ├── units.ts
│ ├── types.ts
│ └── index.ts
│
├── @core/ # Shared infra packages
│ ├── typescript/ # Shared tsconfig presets
│ ├── lint/ # Shared ESLint flat config
│ ├── ui/ # Shared shadcn/ui + Base UI component library
│ └── ai/ # AI CLI commands and sync functionality
│
├── @packages/ # Business-logic packages
│ ├── db/ # Drizzle schema + migrations (operational layer)
│ │ ├── @source/
│ │ │ ├── schema/
│ │ │ │ ├── core.ts # core_group, core_entity, core_unit
│ │ │ │ ├── db.ts # db_person, db_organization, db_account, db_relation
│ │ │ │ ├── catalog.ts # catalog_brand, catalog_item, catalog_item_variant, pricing
│ │ │ │ ├── commerce.ts # commerce_order + lookups
│ │ │ │ ├── finance.ts # finance_transaction + lookups
│ │ │ │ ├── marketing.ts # marketing_channel, marketing_agenda, marketing_activity
│ │ │ │ └── target.ts # target_item, target_metric
│ │ │ ├── client.ts # LibSQL client + Drizzle instance
│ │ │ └── index.ts
│ │ ├── drizzle.config.ts
│ │ ├── migrations/
│ │ └── package.json
│ │
│ └── pipeline/ # Unified pipeline CLI (seed/sync/publish/report + stage commands)
│ ├── @source/
│ │ ├── index.ts # CLI dispatcher (sub-commands)
│ │ ├── config.ts # @source/config loader + path normalization
│ │ ├── duck.ts # DuckDB helpers + query client
│ │ ├── seed.ts # Re-export seed from @packages/db
│ │ ├── extract/ # xlsx → canonical CSV snapshots (SheetJS)
│ │ ├── load/csv.ts # CSV → DuckDB raw loader (read_csv_auto)
│ │ ├── validate/ # SQL checks + JSON report
│ │ └── format/ # mart_* → report.json assembly
│ └── package.json
│
├── @python/ # Python workspaces
│ └── analytics/ # dbt project — analytical transform layer
│ ├── models/
│ │ ├── staging/ # stg_* — clean and rename raw columns
│ │ │ ├── stg_transactions.sql
│ │ │ ├── stg_students.sql
│ │ │ └── stg_targets.sql
│ │ ├── intermediate/ # int_* — join, classify, resolve FKs
│ │ │ ├── int_orders.sql
│ │ │ ├── int_enrollments.sql
│ │ │ └── audit_flagged_orders.sql
│ │ └── marts/ # mart_* — analysis-ready aggregations
│ │ ├── mart_revenue.sql
│ │ ├── mart_program_progress.sql
│ │ ├── mart_channel_marketing.sql
│ │ └── mart_school_progress.sql
│ ├── macros/
│ ├── tests/
│ ├── dbt_project.yml
│ ├── profiles.yml
│ └── pyproject.toml # uv project file
│
├── @services/ # Deployable apps
│ ├── present/ # PPTX + PDF generator
│ │ ├── @source/
│ │ │ ├── slides/
│ │ │ │ ├── cover.ts
│ │ │ │ ├── revenue-comparison.ts
│ │ │ │ ├── key-comparison.ts
│ │ │ │ ├── program-progress.ts
│ │ │ │ ├── school-progress.ts
│ │ │ │ └── channel-marketing.ts
│ │ │ ├── generator.ts
│ │ │ └── index.ts
│ │ └── package.json
│ │
│ ├── plan/ # Fumadocs documentation site (Cloudflare Worker)
│ │ └── ...
│ │
│ ├── api/ # Cloudflare Worker API (auth + dashboard data)
│ │ ├── src/
│ │ │ ├── worker.ts
│ │ │ ├── lib/
│ │ │ │ ├── auth.ts
│ │ │ │ └── db.ts
│ │ │ └── routes/
│ │ │ ├── auth.ts
│ │ │ └── dashboard.ts
│ │ ├── wrangler.jsonc
│ │ └── package.json
│ │
│ ├── slides/ # Slidev-based report presentation service
│ │ ├── @source/
│ │ │ ├── layouts/
│ │ │ ├── components/
│ │ │ ├── templates/
│ │ │ ├── reports/
│ │ │ └── styles/
│ │ ├── cli.ts
│ │ ├── scripts/
│ │ ├── src/ # Slides serving Worker entry + auth routes
│ │ ├── wrangler.jsonc
│ │ └── package.json
│ │
│ ├── evidence/ # Evidence OSS BI app (SQL + markdown)
│ │ ├── pages/
│ │ ├── sources/ # atlas (duckdb) + atlas_ops (sqlite)
│ │ ├── evidence.config.yaml
│ │ └── package.json
│ │
│ └── dashboard/ # TanStack Start web app
│ ├── src/
│ │ ├── routes/
│ │ │ ├── __root.tsx
│ │ │ ├── index.tsx
│ │ │ ├── _app.tsx
│ │ │ └── _app/
│ │ │ ├── revenue.tsx
│ │ │ ├── enrollments.tsx
│ │ │ ├── programs.tsx
│ │ │ ├── schools.tsx
│ │ │ ├── channels.tsx
│ │ │ ├── records.people.tsx
│ │ │ └── records.organizations.tsx
│ │ ├── components/
│ │ └── styles.css
│ ├── app.config.ts
│ ├── vite.config.ts
│ ├── tsconfig.json
│ ├── components.json
│ └── package.json
│
├── @source/ # Source data directory
│ ├── raw/ # Raw xlsx files — untouched input data
│ ├── clean/ # Canonical CSV snapshots (extracted from xlsx)
│ ├── invalid/ # Monthly QA exports from validate stage
│ └── config/ # Entity-year YAML configs (sheet → table mapping)
├── output/ # Generated reports and validation — gitignored
│ ├── monthly/
│ ├── weekly/
│ └── validation/
│
├── atlas.db # DuckDB file (analytical) — gitignored
├── atlas-ops.db # LibSQL file (operational) — gitignored
├── pnpm-lock.yaml
├── package.json
├── turbo.json
└── tsconfig.json

ELT Pipeline
Extract
Reads xlsx source files via SheetJS and writes canonical CSV snapshots to @source/clean/. Column headers are normalized to English at this stage. CSVs are frozen — they are not overwritten unless --overwrite is passed.
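The header-normalization step can be sketched as a lookup table with a slug fallback. The Indonesian-to-English pairs below are illustrative only; the project's actual mapping lives in the Extract step's reader.ts/schema.ts.

```typescript
// Illustrative header map — these example pairs are assumptions, not the
// project's real mapping (which lives in reader.ts / schema.ts).
const HEADER_MAP: Record<string, string> = {
  "Tanggal": "date",
  "Jumlah": "amount",
  "Nama Siswa": "student_name",
};

// Normalize a raw xlsx header: trim, look up the canonical English name,
// and fall back to a snake_case slug for unmapped columns.
function normalizeHeader(raw: string): string {
  const trimmed = raw.trim();
  return (
    HEADER_MAP[trimmed] ??
    trimmed.toLowerCase().replace(/[^a-z0-9]+/g, "_").replace(/^_|_$/g, "")
  );
}
```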
CLI usage:
pnpm run extract -- --entity IONS --year 2026

Load
Loads canonical CSV snapshots from @source/clean/ into the DuckDB raw layer using read_csv_auto. No business logic — raw layer preserves the canonical CSV schema.
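A minimal sketch of how the loader might assemble its DuckDB statement. The raw_ table prefix and file-naming convention are assumptions for illustration; the real loader is @packages/pipeline's load/csv.ts.

```typescript
// Build the CREATE TABLE ... AS SELECT statement the Load stage would issue.
// Path and raw_ naming conventions here are hypothetical.
function buildLoadSql(entity: string, table: string): string {
  const path = `@source/clean/${entity.toLowerCase()}-${table}.csv`;
  return `CREATE OR REPLACE TABLE raw_${table} AS SELECT * FROM read_csv_auto('${path}')`;
}
```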
CLI usage:
pnpm run sync -- --entity IONS --year 2026
pnpm run sync -- --entity IONS --year 2026 --month 2 --mode append

Validate
Runs typed SQL checks against DuckDB raw tables and emits a JSON validation report. The same stage also writes monthly invalid-row QA CSVs to @source/invalid/ ({entity}-{YYYY}-{MM}-students.csv and {entity}-{YYYY}-{MM}-transactions.csv), each tagged with reason-code metadata for audit and triage before dbt transforms. It additionally writes pipeline phase-comparison CSVs (@source/{entity}-students.csv and @source/{entity}-transactions.csv) with monthly mart-vs-clean percentages for the years 2023-2026.
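One way to picture a typed check is as a SQL predicate paired with the reason code that ends up in the invalid QA CSVs; the shape and names below are hypothetical.

```typescript
// Hypothetical shape for one validation check: a SQL query selecting
// offending rows, plus the reason code recorded for those rows.
type ValidationCheck = {
  table: string;
  reasonCode: string; // e.g. "MISSING_AMOUNT"
  failingRowsSql: string;
};

// Collect reason codes per table for the JSON validation report.
function summarize(checks: ValidationCheck[]): Record<string, string[]> {
  const out: Record<string, string[]> = {};
  for (const c of checks) (out[c.table] ??= []).push(c.reasonCode);
  return out;
}
```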
CLI usage:
pnpm run validate -- --entity IONS --year 2026

Transform (dbt)
Runs dbt models against DuckDB to produce clean, analysis-ready marts. Full layer definitions and mart schemas are in analytics.md.
CLI usage:
cd @python/analytics
uv run dbt run # run all models
uv run dbt run --select marts
uv run dbt test # run all tests
uv run dbt docs generate # generate lineage docs

Format
Reads dbt marts from DuckDB, applies period filtering and cross-mart assembly, and outputs report.json. Also returns data in-memory for direct consumption by the Present layer.
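The period filtering described here can be sketched as a pure function over mart rows, mirroring the --period and --unit CLI flags; the row shape is hypothetical.

```typescript
// Hypothetical mart row: each mart exposes a period column in YYYY-MM form.
type MartRow = { period: string; unit: string; value: number };

// Keep only the requested period, optionally restricted to a unit subset.
function filterMart(rows: MartRow[], period: string, units?: string[]): MartRow[] {
  return rows.filter(
    (r) => r.period === period && (!units || units.includes(r.unit))
  );
}
```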
Output type:
type Report = {
meta: {
entity: string // e.g. "IONS"
period: string // e.g. "2026-02"
generated_at: string // ISO timestamp
scope: string[] // unit codes included
}
units: Record<string, UnitReport>
}

CLI usage:
pnpm run format -- --entity IONS --period 2026-02
pnpm run format -- --entity IONS --period 2026-02 --unit TM,WLC_NON_ENGLISH

Present (PPTX service)
Consumes report.json and produces PPTX + PDF. No business logic — pure rendering.
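Because Present is a pure renderer, its core is a mapping from Report metadata to slide definitions. A sketch, with a hypothetical minimal slide model (the real service drives pptxgenjs from per-slide modules like cover.ts and revenue-comparison.ts):

```typescript
// Minimal, hypothetical slide model for illustration.
type Slide = { title: string; unit: string };
type ReportMeta = { entity: string; period: string; scope: string[] };

// One cover slide plus one revenue slide per unit in scope — layout
// decisions only, no business logic.
function planSlides(meta: ReportMeta): Slide[] {
  const cover: Slide = { title: `${meta.entity} — ${meta.period}`, unit: "*" };
  const perUnit = meta.scope.map((u) => ({ title: `Revenue — ${u}`, unit: u }));
  return [cover, ...perUnit];
}
```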
CLI usage:
pnpm run present -- --report output/monthly/2026-02-report.json
pnpm run present -- --entity IONS --period 2026-02 # runs Format then Present

Slides (Slidev service)
Consumes report.json (copied or scaffolded to @services/slides/@source/reports/{entity}/{period-unit}/data.json) and renders branded Slidev decks with custom layouts/components.
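The scaffold path convention above can be sketched as a small helper, assuming {period-unit} means the period and unit joined by a hyphen:

```typescript
// Build the data.json scaffold path:
// @services/slides/@source/reports/{entity}/{period-unit}/data.json
// Assumes {period-unit} = `${period}-${unit}`.
function reportDataPath(entity: string, period: string, unit: string): string {
  return `@services/slides/@source/reports/${entity}/${period}-${unit}/data.json`;
}
```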
CLI usage:
pnpm run slides -- dev --entity IONS --period 2026-02 --unit WLC
pnpm run slides -- export --entity IONS --period 2026-02 --unit WLC
pnpm run slides -- export --entity IONS --period 2026-02 --unit WLC --format pptx
pnpm run slides -- build --entity IONS --period 2026-02 --unit WLC

Database Architecture
Two Layers
| | DuckDB (atlas.db) | LibSQL (atlas-ops.db) |
|---|---|---|
| Role | Analytical | Operational |
| Contents | raw, staging, intermediate, mart layers | normalized tables from model.md |
| Schema defined in | dbt models (.sql files) | Drizzle ORM (TypeScript) |
| Query pattern | Aggregations, GROUP BY, window functions, PIVOT | Individual record lookup, FK joins |
| Used by | Format layer, present/slides artifact generation, Evidence analytics source | API Worker runtime reads (dashboard + auth-gated drill-down), Evidence operational source |
| Remote option | MotherDuck (future) | Turso |
Query routing in the dashboard
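The split follows the table above: aggregations go to DuckDB, record lookups to LibSQL. A sketch of that routing rule (names hypothetical):

```typescript
// Route by query shape — aggregate workloads hit the analytical DB,
// individual record lookups hit the operational DB.
type QueryKind = "aggregate" | "record-lookup";

function routeQuery(kind: QueryKind): "atlas.db (DuckDB)" | "atlas-ops.db (LibSQL)" {
  return kind === "aggregate" ? "atlas.db (DuckDB)" : "atlas-ops.db (LibSQL)";
}
```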
LibSQL environment config
# Local development
DATABASE_URL=file:atlas-ops.db
# Remote (Turso)
DATABASE_URL=libsql://<db-name>.turso.io
TURSO_AUTH_TOKEN=<token>

Migrations (LibSQL / Drizzle)
pnpm exec drizzle-kit generate # generate migration from schema changes
pnpm exec drizzle-kit migrate # apply to database
pnpm exec drizzle-kit push # push schema directly to dev DB

Dashboard
TanStack Start app with authenticated server functions that call the API Worker. Dashboard does not query runtime databases directly.
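The cookie-forwarding hop can be sketched as a request builder; the base URL is a placeholder, not the deployed API host.

```typescript
// Build the API request a dashboard server function would send, forwarding
// the caller's session cookie. Base URL is illustrative only.
function buildApiRequest(path: string, sessionCookie: string): Request {
  return new Request(`https://api.example.com${path}`, {
    headers: { cookie: sessionCookie },
  });
}
```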
Route to data source mapping:
| Route | Section | Data source |
|---|---|---|
| / | Overview | API /dashboard/overview -> Turso |
| /revenue | Revenue Breakdown | API /reports/revenue -> Turso |
| /enrollments | Enrollment Breakdown | API /reports/enrollments -> Turso |
| /programs | Program Breakdown | API /reports/programs -> Turso |
| /schools | School Breakdown | API /reports/schools -> Turso |
| /channels | Channel Breakdown | API /reports/channels -> Turso |
| /records/people | People records list | API /records/people -> Turso |
| /records/organizations | Organization records list | API /records/organizations -> Turso |
| /records/people/:id | Person detail | API /records/people/:id -> Turso |
| /records/organizations/:id | Organization detail | API /records/organizations/:id -> Turso |
Dashboard filter model:
- URL params use year + optional month (MMMM format, or All months for yearly mode).
- If unit is not set, dashboard aggregate pages default to all units.
Adapter Pattern
The pipeline uses a config-driven approach. Each entity-year has a YAML config in @source/config/ that maps xlsx sheets to raw table names and defines column schemas. The Extract step (SheetJS) normalizes Indonesian column headers to English, with optional per-sheet header overrides (header_row, merge_header_rows, data_start_row) for inconsistent workbook layouts. Future sources (Odoo, APIs) can be added as new extract sub-commands.
The Extract step's reader.ts and schema.ts are the only places in the codebase where Indonesian source terms are mapped to English. All other layers use English terms exclusively.
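The adapter seam can be sketched as an interface; the method names and return shape below are hypothetical, but the contract matches the text — every source yields canonical rows that Load, Validate, and dbt consume unchanged.

```typescript
// Hypothetical adapter contract: every source produces canonical rows
// (table name -> rows of cells) that the Load stage hands to DuckDB.
interface SourceAdapter {
  name: string;
  extract(entity: string, year: number): Promise<Record<string, string[][]>>;
}

// An Odoo adapter would implement the same contract over JSON-RPC,
// leaving the rest of the pipeline untouched.
const xlsxAdapter: SourceAdapter = {
  name: "xlsx",
  extract: async () => ({ transactions: [["date", "amount"]] }),
};
```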
Scope and Multi-tenancy
All operational tables in LibSQL include entity_id. All dbt models accept an entity variable for filtering. Queries never mix data across entities.
// LibSQL — entity scoping at query layer
const orders = await db.select().from(commerceOrder)
  .where(eq(commerceOrder.entityId, entityId))

-- dbt — entity variable passed at runtime
SELECT * FROM stg_transactions
WHERE entity_code = '{{ var("entity") }}'

Deployment
Phase 1 — Local
1) Bootstrap (one-time setup)
# Seed LibSQL lookup tables
pnpm run seed -- --entity IONS
# Optional: rebuild canonical CSV snapshots first
pnpm run extract -- --entity IONS --year 2026

2) Recurring monthly refresh
# Full monthly sync chain (extract -> load -> validate -> transform)
pnpm run sync -- --entity IONS --year 2026
# Optional month-focused refresh
pnpm run sync -- --entity IONS --year 2026 --month 2

3) Publish runtime data
# Push monthly operational snapshot to LibSQL/Turso
pnpm run publish -- --entity IONS --year 2026 --month 2

4) Generate report artifacts on demand
# Monthly report (JSON + PPTX + PDF)
pnpm run report -- --entity IONS --year 2026 --month 2 --type monthly
# Quarterly report (aggregated)
pnpm run report -- --entity IONS --year 2026 --month 1-3 --type quarterly
# Yearly report (aggregated)
pnpm run report -- --entity IONS --year 2026 --type yearly
# Optional: skip staleness warnings
pnpm run report -- --entity IONS --year 2026 --month 2 --type monthly --force

5) Stage-level debug commands (optional)
# Load CSV snapshots -> DuckDB raw layer (debug)
pnpm run pipeline -- load --entity IONS --year 2026
# Validate raw tables
pnpm run validate -- --entity IONS --year 2026
# Generate clean CSV QA + KPI workbook (`@source/{entity}-report.xlsx`)
pnpm run pipeline -- clean-report --entity IONS
# Run dbt transforms
pnpm run pipeline -- transform --entity IONS
# Assemble report JSON only
pnpm run format -- --entity IONS --period 2026-02
# Render PPTX/PDF from existing report JSON
pnpm run present -- --report output/monthly/IONS-2026-02.json
# Preview Slidev deck
pnpm run slides -- dev --entity IONS --period 2026-02 --unit WLC
# Export Slidev outputs
pnpm run slides -- export --entity IONS --period 2026-02 --unit WLC
pnpm run slides -- export --entity IONS --period 2026-02 --unit WLC --format pptx
# Start dashboard
pnpm run --filter @services/dashboard dev
# Refresh Evidence sources and run Evidence BI app
pnpm run evidence:sources
pnpm run evidence

Phase 2 — Remote DB
Replace DATABASE_URL with a Turso connection string for LibSQL. Replace DuckDB local file with MotherDuck for the analytical layer (optional).
Phase 3 — Cloudflare Worker runtime
Deploy @services/api, @services/dashboard, @services/slides, and @services/evidence as Cloudflare Workers. Dashboard runtime reads are served through API routes backed by Turso.
Phase 4 — GitHub Actions environments and secrets
Workflows deploy-plan.yml, deploy-dashboard-api.yml, and deploy-slides.yml use workflow_dispatch and require GitHub environments named dev and prod.
| Environment | Required GitHub secrets | Notes |
|---|---|---|
| dev | CLOUDFLARE_API_TOKEN | Used for non-production deploys (dev.* domains). |
| prod | CLOUDFLARE_API_TOKEN | Used for production deploys (*.atlas.prata.ma) and should require reviewer approval. |
Approval policy guidance:
- dev: no manual approval required.
- prod: require at least one maintainer reviewer before workflow execution.
Runtime Worker secrets/vars (AUTH_*, DATABASE_URL, TURSO_AUTH_TOKEN, etc.) are configured in Cloudflare Worker environments, not GitHub Actions secrets.
Phase 5 — Odoo Sync
Implement OdooAdapter. Raw Odoo data loads into DuckDB raw layer via the same adapter interface. dbt models are unchanged — they read from the raw layer regardless of source.
Resolved Decisions
Present Layer — Component Reuse Strategy
Decision: Data-only reuse. @services/present (PPTX/PDF), @services/slides (Slidev PDF/PPTX/Web), and @services/dashboard (React/Tailwind) consume the same Report type from report.json. No visual logic is shared — each layer is optimized for its medium. The PresentAdapter interface allows future backends (Google Slides, NotebookLM) to be added without changing the Format layer.