Architecture

System architecture for Atlas. All terminology follows glossary.md. All data entities follow model.md and analytics.md.


Overview

Atlas is a local-first Modern Data Stack for multi-entity business reporting. It normalizes data from operational sources using an ELT pipeline, transforms raw data into analytical marts via dbt, and serves outputs through static presentation layers (PPTX/PDF and Slidev exports), an Evidence OSS BI service, and a Cloudflare Worker dashboard that reads through an authenticated API backed by Turso.


Principles

| Principle | Description |
| --- | --- |
| Modern Data Stack | Modular, best-of-breed components for each layer. Each layer has a single responsibility and a clean interface to the next. |
| ELT over ETL | Raw source data is loaded first, untransformed. All business logic lives in the transform layer (dbt). Source data is always reprocessable. |
| Two-layer database | DuckDB for analytical workloads (aggregations, marts, reporting). LibSQL/Turso for operational workloads (individual record lookup, dashboard drill-down). |
| Declarative transforms | All business logic for reporting is expressed as SQL models in dbt. No imperative calculation code in the pipeline. |
| Adapter pattern | The Sync layer uses source adapters to decouple extraction from loading. Swapping xlsx for Odoo requires only a new adapter. |
| Schema as code | The LibSQL operational schema is defined in TypeScript via Drizzle ORM. The DuckDB analytical schema is defined in dbt models. Both are version-controlled. |
| Entity scoping | All data is scoped to a core_entity record. IONS and EPN never share data in queries. |
| Report as data | The Format layer reads dbt marts and assembles a structured report.json. The Present layer is a pure renderer — it applies no business logic. |
| Python where it makes sense | dbt-core and any data tooling that benefits from Python uses Python, managed via uv. Everything else is TypeScript with pnpm-managed tooling. |

Technology Stack

| Concern | Technology | Notes |
| --- | --- | --- |
| Runtime | Node.js + pnpm | All TS scripts, packages, and services |
| Monorepo | Turborepo + pnpm workspaces | Task pipeline for build, lint, test |
| Language | TypeScript (primary), Python (dbt + data tooling) | Strict TS throughout; Python managed via uv |
| Python tooling | uv | Fast Python package manager; manages dbt venv |
| Analytical DB | DuckDB | In-process OLAP; native xlsx reading; hosts raw, staging, intermediate, mart layers |
| Operational DB | LibSQL (local) → Turso (remote) | SQLite-compatible; individual record lookup; dashboard drill-down |
| Operational ORM | Drizzle ORM (LibSQL dialect) | Type-safe queries; schema-as-code; migrations via Drizzle Kit |
| Transform layer | dbt-core + dbt-duckdb | Declarative SQL models; staging, intermediate, and mart layers |
| Extract + Load | SheetJS xlsx → CSV + DuckDB read_csv_auto | SheetJS extracts xlsx to canonical CSVs; DuckDB loads CSVs into raw layer |
| Odoo sync (future) | Odoo JSON-RPC via fetch | Same adapter interface as xlsx |
| Format | TypeScript + DuckDB node client | Reads marts, assembles report.json |
| Shared config | @config (TypeScript workspace package) | Entity/unit/brand themes shared across services |
| Present — PPTX | pptxgenjs | Programmatic slide generation |
| Present — PDF | LibreOffice CLI | Converts PPTX to PDF via headless LibreOffice |
| Present — Slidev | Slidev + Vue + UnoCSS + Playwright | Data-driven slides service with PDF/PPTX/web export |
| BI reporting service | Evidence OSS (@evidence-dev/evidence) | SQL + markdown site reading from both atlas.db and atlas-ops.db |
| API service | Cloudflare Workers + Drizzle (LibSQL) | Authenticated dashboard read API; Turso-first runtime data path |
| Dashboard | TanStack Start | SSR + client, file-based routing |
| Dashboard UI | shadcn/ui + Tailwind CSS | Component library |
| Dashboard data access | Server functions -> API Worker | Dashboard loaders call API routes and forward session cookies |
| Slides delivery | Cloudflare Worker + R2 | Auth-gated report paths (/reports/{entity}/{period-unit}/...) |
| Code quality | ESLint + Prettier + TypeScript strict | Shared configs from @core/lint and @core/typescript |

Repository Structure

ions/
├── @plan/                              # Planning docs (not a workspace package)
│   ├── architecture.md
│   ├── analytics.md                    # DuckDB, dbt layers, mart definitions
│   ├── glossary.md
│   ├── model.md
│   ├── registry.md
│   ├── domains/
│   └── discussions/

├── @config/                            # Shared entity/unit/brand definitions (framework-agnostic)
│   ├── brands/
│   │   ├── ions/
│   │   └── epn/
│   ├── entities.ts
│   ├── units.ts
│   ├── types.ts
│   └── index.ts

├── @core/                              # Shared infra packages
│   ├── typescript/                     # Shared tsconfig presets
│   ├── lint/                           # Shared ESLint flat config
│   ├── ui/                             # Shared shadcn/ui + Base UI component library
│   └── ai/                             # AI CLI commands and sync functionality

├── @packages/                          # Business-logic packages
│   ├── db/                             # Drizzle schema + migrations (operational layer)
│   │   ├── @source/
│   │   │   ├── schema/
│   │   │   │   ├── core.ts             # core_group, core_entity, core_unit
│   │   │   │   ├── db.ts               # db_person, db_organization, db_account, db_relation
│   │   │   │   ├── catalog.ts          # catalog_brand, catalog_item, catalog_item_variant, pricing
│   │   │   │   ├── commerce.ts         # commerce_order + lookups
│   │   │   │   ├── finance.ts          # finance_transaction + lookups
│   │   │   │   ├── marketing.ts        # marketing_channel, marketing_agenda, marketing_activity
│   │   │   │   └── target.ts           # target_item, target_metric
│   │   │   ├── client.ts               # LibSQL client + Drizzle instance
│   │   │   └── index.ts
│   │   ├── drizzle.config.ts
│   │   ├── migrations/
│   │   └── package.json
│   │
│   └── pipeline/                       # Unified pipeline CLI (seed/sync/publish/report + stage commands)
│       ├── @source/
│       │   ├── index.ts                # CLI dispatcher (sub-commands)
│       │   ├── config.ts               # @source/config loader + path normalization
│       │   ├── duck.ts                 # DuckDB helpers + query client
│       │   ├── seed.ts                 # Re-export seed from @packages/db
│       │   ├── extract/                # xlsx → canonical CSV snapshots (SheetJS)
│       │   ├── load/csv.ts             # CSV → DuckDB raw loader (read_csv_auto)
│       │   ├── validate/               # SQL checks + JSON report
│       │   └── format/                 # mart_* → report.json assembly
│       └── package.json

├── @python/                            # Python workspaces
│   └── analytics/                      # dbt project — analytical transform layer
│       ├── models/
│       │   ├── staging/                # stg_* — clean and rename raw columns
│       │   │   ├── stg_transactions.sql
│       │   │   ├── stg_students.sql
│       │   │   └── stg_targets.sql
│       │   ├── intermediate/           # int_* — join, classify, resolve FKs
│       │   │   ├── int_orders.sql
│       │   │   ├── int_enrollments.sql
│       │   │   └── audit_flagged_orders.sql
│       │   └── marts/                  # mart_* — analysis-ready aggregations
│       │       ├── mart_revenue.sql
│       │       ├── mart_program_progress.sql
│       │       ├── mart_channel_marketing.sql
│       │       └── mart_school_progress.sql
│       ├── macros/
│       ├── tests/
│       ├── dbt_project.yml
│       ├── profiles.yml
│       └── pyproject.toml              # uv project file

├── @services/                          # Deployable apps
│   ├── present/                        # PPTX + PDF generator
│   │   ├── @source/
│   │   │   ├── slides/
│   │   │   │   ├── cover.ts
│   │   │   │   ├── revenue-comparison.ts
│   │   │   │   ├── key-comparison.ts
│   │   │   │   ├── program-progress.ts
│   │   │   │   ├── school-progress.ts
│   │   │   │   └── channel-marketing.ts
│   │   │   ├── generator.ts
│   │   │   └── index.ts
│   │   └── package.json
│   │
│   ├── plan/                           # Fumadocs documentation site (Cloudflare Worker)
│   │   └── ...
│   │
│   ├── api/                            # Cloudflare Worker API (auth + dashboard data)
│   │   ├── src/
│   │   │   ├── worker.ts
│   │   │   ├── lib/
│   │   │   │   ├── auth.ts
│   │   │   │   └── db.ts
│   │   │   └── routes/
│   │   │       ├── auth.ts
│   │   │       └── dashboard.ts
│   │   ├── wrangler.jsonc
│   │   └── package.json
│   │
│   ├── slides/                         # Slidev-based report presentation service
│   │   ├── @source/
│   │   │   ├── layouts/
│   │   │   ├── components/
│   │   │   ├── templates/
│   │   │   ├── reports/
│   │   │   └── styles/
│   │   ├── cli.ts
│   │   ├── scripts/
│   │   ├── src/                        # Slides serving Worker entry + auth routes
│   │   ├── wrangler.jsonc
│   │   └── package.json
│   │
│   ├── evidence/                       # Evidence OSS BI app (SQL + markdown)
│   │   ├── pages/
│   │   ├── sources/                    # atlas (duckdb) + atlas_ops (sqlite)
│   │   ├── evidence.config.yaml
│   │   └── package.json
│   │
│   └── dashboard/                      # TanStack Start web app
│       ├── src/
│       │   ├── routes/
│       │   │   ├── __root.tsx
│       │   │   ├── index.tsx
│       │   │   ├── _app.tsx
│       │   │   └── _app/
│       │   │       ├── revenue.tsx
│       │   │       ├── enrollments.tsx
│       │   │       ├── programs.tsx
│       │   │       ├── schools.tsx
│       │   │       ├── channels.tsx
│       │   │       ├── records.people.tsx
│       │   │       └── records.organizations.tsx
│       │   ├── components/
│       │   └── styles.css
│       ├── app.config.ts
│       ├── vite.config.ts
│       ├── tsconfig.json
│       ├── components.json
│       └── package.json

├── @source/                            # Source data directory
│   ├── raw/                            # Raw xlsx files — untouched input data
│   ├── clean/                          # Canonical CSV snapshots (extracted from xlsx)
│   ├── invalid/                        # Monthly QA exports from validate stage
│   └── config/                         # Entity-year YAML configs (sheet → table mapping)
├── output/                             # Generated reports and validation — gitignored
│   ├── monthly/
│   ├── weekly/
│   └── validation/

├── atlas.db                            # DuckDB file (analytical) — gitignored
├── atlas-ops.db                        # LibSQL file (operational) — gitignored
├── pnpm-lock.yaml
├── package.json
├── turbo.json
└── tsconfig.json

ELT Pipeline

Extract

Reads xlsx source files via SheetJS and writes canonical CSV snapshots to @source/clean/. Column headers are normalized to English at this stage. CSVs are frozen — they are not overwritten unless --overwrite is passed.

CLI usage:

pnpm run extract -- --entity IONS --year 2026
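
The header-normalization step can be sketched as a small mapping with a deterministic fallback. The entries below are illustrative placeholders, not the real mapping, which lives in the Extract step's reader.ts and schema.ts:

```typescript
// Illustrative sketch of Extract's header normalization. The Indonesian →
// English entries here are hypothetical examples only; the real mapping is
// defined in reader.ts / schema.ts.
const HEADER_MAP: Record<string, string> = {
  "Tanggal": "date",            // hypothetical example entry
  "Nama Siswa": "student_name", // hypothetical example entry
  "Jumlah": "amount",           // hypothetical example entry
};

// Unmapped headers fall back to a stable snake_case form so CSV columns
// stay predictable even when a sheet adds an unexpected column.
export function normalizeHeader(raw: string): string {
  const trimmed = raw.trim();
  return HEADER_MAP[trimmed] ?? trimmed.toLowerCase().replace(/\s+/g, "_");
}
```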

Load

Loads canonical CSV snapshots from @source/clean/ into the DuckDB raw layer using read_csv_auto. No business logic — raw layer preserves the canonical CSV schema.

CLI usage:

pnpm run sync -- --entity IONS --year 2026
pnpm run sync -- --entity IONS --year 2026 --month 2 --mode append
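
The loader's statement-building can be sketched as below; the raw-schema table naming and path conventions are assumptions, not the actual loader code:

```typescript
// Sketch of how the Load stage might build its DuckDB statement.
// read_csv_auto infers column types; the raw layer keeps the canonical
// CSV schema as-is, with no business logic applied.
export function buildRawLoadSql(table: string, csvPath: string): string {
  return `CREATE OR REPLACE TABLE raw.${table} AS SELECT * FROM read_csv_auto('${csvPath}')`;
}
```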

Validate

Runs typed SQL checks against DuckDB raw tables and emits a JSON validation report. The same stage also writes two sets of QA artifacts:

  • Monthly invalid QA CSVs to @source/invalid/ ({entity}-{YYYY}-{MM}-students.csv and {entity}-{YYYY}-{MM}-transactions.csv), tagged with reason-code metadata for audit and triage before dbt transforms.
  • Pipeline phase comparison CSVs (@source/{entity}-students.csv and @source/{entity}-transactions.csv) with a monthly percentage comparison (mart vs clean) covering years 2023-2026.

CLI usage:

pnpm run validate -- --entity IONS --year 2026
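
The real checks run as SQL against DuckDB raw tables; a row-level sketch of the reason-code idea (the codes and row shape here are hypothetical) might look like:

```typescript
// Hypothetical sketch of one reason-coded validation check. Reason codes
// and the transaction row shape are illustrative, not the real SQL checks.
type InvalidRow = { row: number; reason: string };

export function checkTransactions(
  rows: Array<{ amount: number | null; date: string | null }>,
): InvalidRow[] {
  const invalid: InvalidRow[] = [];
  rows.forEach((r, i) => {
    // One row can carry multiple reason codes, mirroring QA CSV metadata.
    if (r.amount == null || r.amount < 0) invalid.push({ row: i, reason: "BAD_AMOUNT" });
    if (!r.date) invalid.push({ row: i, reason: "MISSING_DATE" });
  });
  return invalid;
}
```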

Transform (dbt)

Runs dbt models against DuckDB to produce clean, analysis-ready marts. Full layer definitions and mart schemas are in analytics.md.

CLI usage:

cd @python/analytics

uv run dbt run            # run all models
uv run dbt run --select marts
uv run dbt test           # run all tests
uv run dbt docs generate  # generate lineage docs

Format

Reads dbt marts from DuckDB, applies period filtering and cross-mart assembly, and outputs report.json. Also returns data in-memory for direct consumption by the Present layer.

Output type:

type Report = {
  meta: {
    entity: string        // e.g. "IONS"
    period: string        // e.g. "2026-02"
    generated_at: string  // ISO timestamp
    scope: string[]       // unit codes included
  }
  units: Record<string, UnitReport>
}

CLI usage:

pnpm run format -- --entity IONS --period 2026-02
pnpm run format -- --entity IONS --period 2026-02 --unit TM,WLC_NON_ENGLISH
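
The assembly step can be sketched as grouping mart rows (already filtered to one entity and period) into the Report envelope above. The mart row shape is an assumption, and UnitReport is simplified to a plain row list for illustration:

```typescript
// Sketch of report.json assembly: group mart rows by unit and derive the
// meta block. MartRow and the unit-keyed grouping are assumptions; the
// real UnitReport shape is richer than a row list.
type MartRow = { unit_code: string; [metric: string]: unknown };
type SketchReport = {
  meta: { entity: string; period: string; generated_at: string; scope: string[] };
  units: Record<string, MartRow[]>;
};

export function assembleReport(entity: string, period: string, rows: MartRow[]): SketchReport {
  const units: Record<string, MartRow[]> = {};
  for (const row of rows) (units[row.unit_code] ??= []).push(row);
  return {
    meta: { entity, period, generated_at: new Date().toISOString(), scope: Object.keys(units) },
    units,
  };
}
```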

Present (PPTX service)

Consumes report.json and produces PPTX + PDF. No business logic — pure rendering.

CLI usage:

pnpm run present -- --report output/monthly/2026-02-report.json
pnpm run present -- --entity IONS --period 2026-02  # runs Format then Present
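
Because Present is a pure renderer, its core can be sketched as a mapping from report.json to an ordered slide list, with no numbers recomputed. Slide kinds mirror the files under @services/present/@source/slides/; the descriptor shape and per-unit ordering are assumptions:

```typescript
// Sketch: plan slides from a report without applying business logic.
// Only a subset of slide kinds is shown; the descriptor type is assumed.
type SlideDescriptor = { kind: string; unit?: string };

export function planSlides(report: { units: Record<string, unknown> }): SlideDescriptor[] {
  const slides: SlideDescriptor[] = [{ kind: "cover" }];
  for (const unit of Object.keys(report.units)) {
    // Each unit gets its own slide sequence; data comes straight from report.json.
    slides.push({ kind: "revenue-comparison", unit });
    slides.push({ kind: "program-progress", unit });
  }
  return slides;
}
```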

Slides (Slidev service)

Consumes report.json (copied or scaffolded to @services/slides/@source/reports/{entity}/{period-unit}/data.json) and renders branded Slidev decks with custom layouts/components.

CLI usage:

pnpm run slides -- dev --entity IONS --period 2026-02 --unit WLC
pnpm run slides -- export --entity IONS --period 2026-02 --unit WLC
pnpm run slides -- export --entity IONS --period 2026-02 --unit WLC --format pptx
pnpm run slides -- build --entity IONS --period 2026-02 --unit WLC

Database Architecture

Two Layers

|  | DuckDB (atlas.db) | LibSQL (atlas-ops.db) |
| --- | --- | --- |
| Role | Analytical | Operational |
| Contents | raw, staging, intermediate, mart layers | normalized tables from model.md |
| Schema defined in | dbt models (.sql files) | Drizzle ORM (TypeScript) |
| Query pattern | Aggregations, GROUP BY, window functions, PIVOT | Individual record lookup, FK joins |
| Used by | Format layer, present/slides artifact generation, Evidence analytics source | API Worker runtime reads (dashboard + auth-gated drill-down), Evidence operational source |
| Remote option | MotherDuck (future) | Turso |

Query routing in the dashboard

At runtime, all dashboard reads route through the API Worker to LibSQL/Turso. DuckDB serves artifact generation (Format, present/slides) and Evidence; the dashboard never queries it directly.

LibSQL environment config

# Local development
DATABASE_URL=file:atlas-ops.db

# Remote (Turso)
DATABASE_URL=libsql://<db-name>.turso.io
TURSO_AUTH_TOKEN=<token>

Migrations (LibSQL / Drizzle)

pnpm exec drizzle-kit generate   # generate migration from schema changes
pnpm exec drizzle-kit migrate    # apply to database
pnpm exec drizzle-kit push       # push schema directly to dev DB

Dashboard

TanStack Start app with authenticated server functions that call the API Worker. Dashboard does not query runtime databases directly.
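
A server function's request-building can be sketched as below; the apiBase default and header names are assumptions, and the real routes live in @services/api/src/routes/:

```typescript
// Sketch of a dashboard loader forwarding the session cookie to the API
// Worker, per the data-access pattern above. The base URL is a placeholder.
export function buildApiRequest(
  path: string,
  sessionCookie: string,
  apiBase = "https://api.example.com",
) {
  return {
    url: new URL(path, apiBase).toString(),
    // Forward the session cookie so the Worker can authenticate the read.
    headers: { cookie: sessionCookie, accept: "application/json" },
  };
}
```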

Route to data source mapping:

| Route | Section | Data source |
| --- | --- | --- |
| / | Overview | API /dashboard/overview -> Turso |
| /revenue | Revenue Breakdown | API /reports/revenue -> Turso |
| /enrollments | Enrollment Breakdown | API /reports/enrollments -> Turso |
| /programs | Program Breakdown | API /reports/programs -> Turso |
| /schools | School Breakdown | API /reports/schools -> Turso |
| /channels | Channel Breakdown | API /reports/channels -> Turso |
| /records/people | People records list | API /records/people -> Turso |
| /records/organizations | Organization records list | API /records/organizations -> Turso |
| /records/people/:id | Person detail | API /records/people/:id -> Turso |
| /records/organizations/:id | Organization detail | API /records/organizations/:id -> Turso |

Dashboard filter model:

  • URL params use year + optional month (MMMM format, or All months for yearly mode).
  • If unit is not set, dashboard aggregate pages default to all units.
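
The filter model above can be sketched as a parser; the exact query-param names and the returned shape are assumptions:

```typescript
// Sketch of dashboard filter parsing. Month arrives as a full month name
// (MMMM format); "All months" or a missing month means yearly mode, and a
// missing unit means aggregate across all units.
const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

export function parseFilters(params: URLSearchParams) {
  const month = params.get("month");
  return {
    year: Number(params.get("year")),
    month: month && month !== "All months" ? MONTHS.indexOf(month) + 1 : null,
    units: params.get("unit")?.split(",") ?? "all",
  };
}
```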

Adapter Pattern

The pipeline uses a config-driven approach. Each entity-year has a YAML config in @source/config/ that maps xlsx sheets to raw table names and defines column schemas. The Extract step (SheetJS) normalizes Indonesian column headers to English, with optional per-sheet header overrides (header_row, merge_header_rows, data_start_row) for inconsistent workbook layouts. Future sources (Odoo, APIs) can be added as new extract sub-commands.

The Extract step's reader.ts and schema.ts are the only places in the codebase where Indonesian source terms are mapped to English. All other layers use English terms exclusively.
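
The adapter contract implied above can be sketched as an interface; the method name, signature, and row shape are assumptions (today the xlsx path is implemented as pipeline sub-commands rather than a literal interface):

```typescript
// Sketch of a source adapter contract. A future Odoo source would provide
// another object with the same shape; extraction stays decoupled from loading.
export interface SourceAdapter {
  name: string;
  // Extract one logical table as rows keyed by normalized (English) columns.
  extract(entity: string, year: number, table: string): Promise<Record<string, string | number>[]>;
}

// Stub adapter demonstrating the contract; a real one would read xlsx here.
export const stubXlsxAdapter: SourceAdapter = {
  name: "xlsx",
  async extract() {
    return [{ date: "2026-02-01", amount: 100 }];
  },
};
```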


Scope and Multi-tenancy

All operational tables in LibSQL include entity_id. All dbt models accept an entity variable for filtering. Queries never mix data across entities.

// LibSQL — entity scoping at query layer
const orders = await db.select().from(commerceOrder)
  .where(eq(commerceOrder.entityId, entityId))

-- dbt — entity variable passed at runtime
SELECT * FROM stg_transactions
WHERE entity_code = '{{ var("entity") }}'

Deployment

Phase 1 — Local

1) Bootstrap (one-time setup)

# Seed LibSQL lookup tables
pnpm run seed -- --entity IONS

# Optional: rebuild canonical CSV snapshots first
pnpm run extract -- --entity IONS --year 2026

2) Recurring monthly refresh

# Full monthly sync chain (extract -> load -> validate -> transform)
pnpm run sync -- --entity IONS --year 2026

# Optional month-focused refresh
pnpm run sync -- --entity IONS --year 2026 --month 2

3) Publish runtime data

# Push monthly operational snapshot to LibSQL/Turso
pnpm run publish -- --entity IONS --year 2026 --month 2

4) Generate report artifacts on demand

# Monthly report (JSON + PPTX + PDF)
pnpm run report -- --entity IONS --year 2026 --month 2 --type monthly

# Quarterly report (aggregated)
pnpm run report -- --entity IONS --year 2026 --month 1-3 --type quarterly

# Yearly report (aggregated)
pnpm run report -- --entity IONS --year 2026 --type yearly

# Optional: skip staleness warnings
pnpm run report -- --entity IONS --year 2026 --month 2 --type monthly --force

5) Stage-level debug commands (optional)

# Load CSV snapshots -> DuckDB raw layer (debug)
pnpm run pipeline -- load --entity IONS --year 2026

# Validate raw tables
pnpm run validate -- --entity IONS --year 2026

# Generate clean CSV QA + KPI workbook (`@source/{entity}-report.xlsx`)
pnpm run pipeline -- clean-report --entity IONS

# Run dbt transforms
pnpm run pipeline -- transform --entity IONS

# Assemble report JSON only
pnpm run format -- --entity IONS --period 2026-02

# Render PPTX/PDF from existing report JSON
pnpm run present -- --report output/monthly/IONS-2026-02.json

# Preview Slidev deck
pnpm run slides -- dev --entity IONS --period 2026-02 --unit WLC

# Export Slidev outputs
pnpm run slides -- export --entity IONS --period 2026-02 --unit WLC
pnpm run slides -- export --entity IONS --period 2026-02 --unit WLC --format pptx

# Start dashboard
pnpm run --filter @services/dashboard dev

# Refresh Evidence sources and run Evidence BI app
pnpm run evidence:sources
pnpm run evidence

Phase 2 — Remote DB

Replace DATABASE_URL with a Turso connection string for LibSQL. Replace DuckDB local file with MotherDuck for the analytical layer (optional).

Phase 3 — Cloudflare Worker runtime

Deploy @services/api, @services/dashboard, @services/slides, and @services/evidence as Cloudflare Workers. Dashboard runtime reads are served through API routes backed by Turso.

Phase 4 — GitHub Actions environments and secrets

Workflows deploy-plan.yml, deploy-dashboard-api.yml, and deploy-slides.yml use workflow_dispatch and require GitHub environments named dev and prod.

| Environment | Required GitHub secrets | Notes |
| --- | --- | --- |
| dev | CLOUDFLARE_API_TOKEN | Used for non-production deploys (dev.* domains). |
| prod | CLOUDFLARE_API_TOKEN | Used for production deploys (*.atlas.prata.ma) and should require reviewer approval. |

Approval policy guidance:

  • dev: no manual approval required.
  • prod: require at least one maintainer reviewer before workflow execution.

Runtime Worker secrets/vars (AUTH_*, DATABASE_URL, TURSO_AUTH_TOKEN, etc.) are configured in Cloudflare Worker environments, not GitHub Actions secrets.

Phase 5 — Odoo Sync

Implement OdooAdapter. Raw Odoo data loads into DuckDB raw layer via the same adapter interface. dbt models are unchanged — they read from the raw layer regardless of source.
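
Odoo's external API accepts JSON-RPC calls to execute_kw; a payload builder for the future OdooAdapter might be sketched as below (database, credentials, and model names are placeholders):

```typescript
// Sketch of the JSON-RPC body a future OdooAdapter could send via fetch.
// The envelope follows Odoo's documented external API shape (service
// "object", method "execute_kw"); all argument values are placeholders.
export function buildOdooPayload(
  db: string,
  uid: number,
  password: string,
  model: string,
  method: string,
  args: unknown[],
) {
  return {
    jsonrpc: "2.0",
    method: "call",
    params: {
      service: "object",
      method: "execute_kw",
      args: [db, uid, password, model, method, args],
    },
    id: 1,
  };
}
```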


Resolved Decisions

Present Layer — Component Reuse Strategy

Decision: Data-only reuse. @services/present (PPTX/PDF), @services/slides (Slidev PDF/PPTX/Web), and @services/dashboard (React/Tailwind) consume the same Report type from report.json. No visual logic is shared — each layer is optimized for its medium. The PresentAdapter interface allows future backends (Google Slides, NotebookLM) to be added without changing the Format layer.
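
A sketch of what the PresentAdapter interface might look like; the method signature is an assumption, since only the interface name is specified here:

```typescript
// Sketch of a PresentAdapter contract: each backend renders the shared
// Report shape to an artifact without touching the Format layer. The
// render signature and output-path convention are assumptions.
export interface PresentAdapter {
  name: string;
  render(report: { meta: { entity: string; period: string } }, outDir: string): Promise<string>;
}

// Stub backend showing the contract; a real one would drive pptxgenjs,
// Slidev, or a future Google Slides/NotebookLM backend.
export const stubPptxAdapter: PresentAdapter = {
  name: "pptx",
  async render(report, outDir) {
    return `${outDir}/${report.meta.entity}-${report.meta.period}.pptx`;
  },
};
```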
