Architecture

System architecture for Atlas. All terminology follows glossary.md. All data entities follow model.md and analytics.md.


Overview

Atlas is a local-first Modern Data Stack for multi-entity business reporting. It normalizes data from operational sources using an ELT pipeline, transforms raw data into analytical marts via dbt, and serves outputs through static presentation layers (PPTX/PDF and Slidev exports), an Evidence OSS BI service, and a Cloudflare Worker dashboard that reads through an authenticated API backed by Turso.


Principles

| Principle | Description |
| --- | --- |
| Modern Data Stack | Modular, best-of-breed components for each layer. Each layer has a single responsibility and a clean interface to the next. |
| ELT over ETL | Raw source data is loaded first, untransformed. All business logic lives in the transform layer (dbt). Source data is always reprocessable. |
| Two-layer database | DuckDB for analytical workloads (aggregations, marts, reporting). LibSQL/Turso for operational workloads (individual record lookup, dashboard drill-down). |
| Declarative transforms | All business logic for reporting is expressed as SQL models in dbt. No imperative calculation code in the pipeline. |
| Adapter pattern | The Sync layer uses source adapters to decouple extraction from loading. Swapping xlsx for Odoo requires only a new adapter. |
| Schema as code | The LibSQL operational schema is defined in TypeScript via Drizzle ORM. The DuckDB analytical schema is defined in dbt models. Both are version-controlled. |
| Entity scoping | All data is scoped to a core_entity record. IONS and EPN never share data in queries. |
| Report as data | The Format layer reads dbt marts and assembles a structured report.json. The Present layer is a pure renderer — it applies no business logic. |
| Python where it makes sense | dbt-core and any data tooling that benefits from Python uses Python, managed via uv. Everything else is TypeScript with pnpm-managed tooling. |

Technology Stack

| Concern | Technology | Notes |
| --- | --- | --- |
| Runtime | Node.js + pnpm | All TS scripts, packages, and services |
| Monorepo | Turborepo + pnpm workspaces | Task pipeline for build, lint, test |
| Language | TypeScript (primary), Python (dbt + data tooling) | Strict TS throughout; Python managed via uv |
| Python tooling | uv | Fast Python package manager; manages dbt venv |
| Analytical DB | DuckDB | In-process OLAP; native xlsx reading; hosts raw, staging, intermediate, mart layers |
| Operational DB | LibSQL (local) → Turso (remote) | SQLite-compatible; individual record lookup; dashboard drill-down |
| Operational ORM | Drizzle ORM (LibSQL dialect) | Type-safe queries; schema-as-code; migrations via Drizzle Kit |
| Transform layer | dbt-core + dbt-duckdb | Declarative SQL models; staging, intermediate, and mart layers |
| Extract + Load | SheetJS xlsx → CSV + DuckDB read_csv_auto | SheetJS extracts xlsx to canonical CSVs; DuckDB loads CSVs into raw layer |
| Odoo sync (future) | Odoo JSON-RPC via fetch | Same adapter interface as xlsx |
| Format | TypeScript + DuckDB node client | Reads marts, assembles report.json |
| Shared config | @config (TypeScript workspace package) | Entity/unit/brand themes shared across services |
| Present — PPTX | pptxgenjs | Programmatic slide generation |
| Present — PDF | LibreOffice CLI | Converts PPTX to PDF via headless LibreOffice |
| Present — Slidev | Slidev + Vue + UnoCSS + Playwright | Data-driven slides service with PDF/PPTX/web export |
| BI reporting service | Evidence OSS (@evidence-dev/evidence) | SQL + markdown site reading from both atlas.db and atlas-ops.db |
| API service | Cloudflare Workers + Drizzle (LibSQL) | Authenticated dashboard read API; Turso-first runtime data path |
| Dashboard | TanStack Start | SSR + client, file-based routing |
| Dashboard UI | shadcn/ui + Tailwind CSS | Component library |
| Dashboard data access | Server functions -> API Worker | Dashboard loaders call API routes and forward session cookies |
| Slides delivery | Cloudflare Worker + R2 | Auth-gated report paths (/reports/{entity}/{period-unit}/...) |
| Code quality | ESLint + Prettier + TypeScript strict | Shared configs from @core/lint and @core/typescript |

Repository Structure

ions/
├── @plan/                              # Planning docs (not a workspace package)
│   ├── architecture.md
│   ├── analytics.md                    # DuckDB, dbt layers, mart definitions
│   ├── glossary.md
│   ├── model.md
│   ├── registry.md
│   ├── domains/
│   └── discussions/

├── @config/                            # Shared entity/unit/brand definitions (framework-agnostic)
│   ├── brands/
│   │   ├── ions/
│   │   └── epn/
│   ├── entities.ts
│   ├── units.ts
│   ├── types.ts
│   └── index.ts

├── @core/                              # Shared infra packages
│   ├── typescript/                     # Shared tsconfig presets
│   ├── lint/                           # Shared ESLint flat config
│   ├── ui/                             # Shared shadcn/ui + Base UI component library
│   └── ai/                             # AI CLI commands and sync functionality

├── @packages/                          # Business-logic packages
│   ├── db/                             # Drizzle schema + migrations (operational layer)
│   │   ├── @source/
│   │   │   ├── schema/
│   │   │   │   ├── core.ts             # core_group, core_entity, core_unit
│   │   │   │   ├── db.ts               # db_person, db_organization, db_account, db_relation
│   │   │   │   ├── catalog.ts          # catalog_brand, catalog_item, catalog_item_variant, pricing
│   │   │   │   ├── commerce.ts         # commerce_order + lookups
│   │   │   │   ├── finance.ts          # finance_transaction + lookups
│   │   │   │   ├── marketing.ts        # marketing_channel, marketing_agenda, marketing_activity
│   │   │   │   └── target.ts           # target_item, target_metric
│   │   │   ├── client.ts               # LibSQL client + Drizzle instance
│   │   │   └── index.ts
│   │   ├── drizzle.config.ts
│   │   ├── migrations/
│   │   └── package.json
│   │
│   └── pipeline/                       # Unified pipeline CLI (seed/sync/publish/report + stage commands)
│       ├── @source/
│       │   ├── index.ts                # CLI dispatcher (sub-commands)
│       │   ├── config.ts               # @source/config loader + path normalization
│       │   ├── duck.ts                 # DuckDB helpers + query client
│       │   ├── seed.ts                 # Re-export seed from @packages/db
│       │   ├── extract/                # xlsx → canonical CSV snapshots (SheetJS)
│       │   ├── load/csv.ts             # CSV → DuckDB raw loader (read_csv_auto)
│       │   ├── validate/               # SQL checks + JSON report
│       │   └── format/                 # mart_* → report.json assembly
│       └── package.json

├── @python/                            # Python workspaces
│   └── analytics/                      # dbt project — analytical transform layer
│       ├── models/
│       │   ├── staging/                # stg_* — clean and rename raw columns
│       │   │   ├── stg_transactions.sql
│       │   │   ├── stg_students.sql
│       │   │   └── stg_targets.sql
│       │   ├── intermediate/           # int_* — join, classify, resolve FKs
│       │   │   ├── int_orders.sql
│       │   │   ├── int_enrollments.sql
│       │   │   └── audit_flagged_orders.sql
│       │   └── marts/                  # mart_* — analysis-ready aggregations
│       │       ├── mart_revenue.sql
│       │       ├── mart_program_progress.sql
│       │       ├── mart_channel_marketing.sql
│       │       └── mart_school_progress.sql
│       ├── macros/
│       ├── tests/
│       ├── dbt_project.yml
│       ├── profiles.yml
│       └── pyproject.toml              # uv project file

├── @services/                          # Deployable apps
│   ├── present/                        # PPTX + PDF generator
│   │   ├── @source/
│   │   │   ├── slides/
│   │   │   │   ├── cover.ts
│   │   │   │   ├── revenue-comparison.ts
│   │   │   │   ├── key-comparison.ts
│   │   │   │   ├── program-progress.ts
│   │   │   │   ├── school-progress.ts
│   │   │   │   └── channel-marketing.ts
│   │   │   ├── generator.ts
│   │   │   └── index.ts
│   │   └── package.json
│   │
│   ├── plan/                           # Fumadocs documentation site (Cloudflare Worker)
│   │   └── ...
│   │
│   ├── api/                            # Cloudflare Worker API (auth + dashboard data)
│   │   ├── src/
│   │   │   ├── worker.ts
│   │   │   ├── lib/
│   │   │   │   ├── auth.ts
│   │   │   │   └── db.ts
│   │   │   └── routes/
│   │   │       ├── auth.ts
│   │   │       └── dashboard.ts
│   │   ├── wrangler.jsonc
│   │   └── package.json
│   │
│   ├── slides/                         # Slidev-based report presentation service
│   │   ├── @source/
│   │   │   ├── layouts/
│   │   │   ├── components/
│   │   │   ├── templates/
│   │   │   ├── reports/
│   │   │   └── styles/
│   │   ├── cli.ts
│   │   ├── scripts/
│   │   ├── src/                        # Slides serving Worker entry + auth routes
│   │   ├── wrangler.jsonc
│   │   └── package.json
│   │
│   ├── evidence/                       # Evidence OSS BI app (SQL + markdown)
│   │   ├── pages/
│   │   ├── sources/                    # atlas (duckdb) + atlas_ops (sqlite)
│   │   ├── evidence.config.yaml
│   │   └── package.json
│   │
│   └── dashboard/                      # TanStack Start web app
│       ├── src/
│       │   ├── routes/
│       │   │   ├── __root.tsx
│       │   │   ├── index.tsx
│       │   │   ├── _app.tsx
│       │   │   └── _app/
│       │   │       ├── revenue.tsx
│       │   │       ├── enrollments.tsx
│       │   │       ├── programs.tsx
│       │   │       ├── schools.tsx
│       │   │       ├── channels.tsx
│       │   │       ├── records.people.tsx
│       │   │       └── records.organizations.tsx
│       │   ├── components/
│       │   └── styles.css
│       ├── app.config.ts
│       ├── vite.config.ts
│       ├── tsconfig.json
│       ├── components.json
│       └── package.json

├── @source/                            # Source data directory
│   ├── raw/                            # Raw xlsx files — untouched input data
│   ├── clean/                          # Canonical CSV snapshots (extracted from xlsx)
│   ├── invalid/                        # Monthly QA exports from validate stage
│   └── config/                         # Entity-year YAML configs (sheet → table mapping)
├── output/                             # Generated reports and validation — gitignored
│   ├── monthly/
│   ├── weekly/
│   └── validation/

├── atlas.db                            # DuckDB file (analytical) — gitignored
├── atlas-ops.db                        # LibSQL file (operational) — gitignored
├── pnpm-lock.yaml
├── package.json
├── turbo.json
└── tsconfig.json

ELT Pipeline

Extract

Reads xlsx source files via SheetJS and writes canonical CSV snapshots to @source/clean/. Column headers are normalized to English at this stage. CSVs are frozen — they are not overwritten unless --overwrite is passed.

CLI usage:

pnpm run extract -- --entity IONS --year 2026
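
The header-normalization step can be sketched as a small mapping with a deterministic fallback. The entries below are illustrative placeholders, not the real mapping, which lives in the Extract step's reader.ts and schema.ts:

```typescript
// Illustrative sketch of Extract's header normalization. The Indonesian →
// English entries here are hypothetical examples only; the real mapping is
// defined in reader.ts / schema.ts.
const HEADER_MAP: Record<string, string> = {
  "Tanggal": "date",            // hypothetical example entry
  "Nama Siswa": "student_name", // hypothetical example entry
  "Jumlah": "amount",           // hypothetical example entry
};

// Unmapped headers fall back to a stable snake_case form so CSV columns
// stay predictable even when a sheet adds an unexpected column.
export function normalizeHeader(raw: string): string {
  const trimmed = raw.trim();
  return HEADER_MAP[trimmed] ?? trimmed.toLowerCase().replace(/\s+/g, "_");
}
```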

Load

Loads canonical CSV snapshots from @source/clean/ into the DuckDB raw layer using read_csv_auto. No business logic — raw layer preserves the canonical CSV schema.

CLI usage:

pnpm run sync -- --entity IONS --year 2026
pnpm run sync -- --entity IONS --year 2026 --month 2 --mode append
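
The loader's statement-building can be sketched as below; the raw-schema table naming and path conventions are assumptions, not the actual loader code:

```typescript
// Sketch of how the Load stage might build its DuckDB statement.
// read_csv_auto infers column types; the raw layer keeps the canonical
// CSV schema as-is, with no business logic applied.
export function buildRawLoadSql(table: string, csvPath: string): string {
  return `CREATE OR REPLACE TABLE raw.${table} AS SELECT * FROM read_csv_auto('${csvPath}')`;
}
```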

Validate

Runs typed SQL checks against DuckDB raw tables and emits a JSON validation report. The same stage also writes two sets of QA artifacts:

  • Monthly invalid QA CSVs to @source/invalid/ ({entity}-{YYYY}-{MM}-students.csv and {entity}-{YYYY}-{MM}-transactions.csv), tagged with reason-code metadata for audit and triage before dbt transforms.
  • Pipeline phase comparison CSVs (@source/{entity}-students.csv and @source/{entity}-transactions.csv) with a monthly percentage comparison (mart vs clean) covering years 2023-2026.

CLI usage:

pnpm run validate -- --entity IONS --year 2026
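
The real checks run as SQL against DuckDB raw tables; a row-level sketch of the reason-code idea (the codes and row shape here are hypothetical) might look like:

```typescript
// Hypothetical sketch of one reason-coded validation check. Reason codes
// and the transaction row shape are illustrative, not the real SQL checks.
type InvalidRow = { row: number; reason: string };

export function checkTransactions(
  rows: Array<{ amount: number | null; date: string | null }>,
): InvalidRow[] {
  const invalid: InvalidRow[] = [];
  rows.forEach((r, i) => {
    // One row can carry multiple reason codes, mirroring QA CSV metadata.
    if (r.amount == null || r.amount < 0) invalid.push({ row: i, reason: "BAD_AMOUNT" });
    if (!r.date) invalid.push({ row: i, reason: "MISSING_DATE" });
  });
  return invalid;
}
```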

Transform (dbt)

Runs dbt models against DuckDB to produce clean, analysis-ready marts. Full layer definitions and mart schemas are in analytics.md.

CLI usage:

cd @python/analytics

uv run dbt run            # run all models
uv run dbt run --select marts
uv run dbt test           # run all tests
uv run dbt docs generate  # generate lineage docs

Format

Reads dbt marts from DuckDB, applies period filtering and cross-mart assembly, and outputs report.json. Also returns data in-memory for direct consumption by the Present layer.

Output type:

type Report = {
  meta: {
    entity: string        // e.g. "IONS"
    period: string        // e.g. "2026-02"
    generated_at: string  // ISO timestamp
    scope: string[]       // unit codes included
  }
  units: Record<string, UnitReport>
}

CLI usage:

pnpm run format -- --entity IONS --period 2026-02
pnpm run format -- --entity IONS --period 2026-02 --unit TM,WLC_NON_ENGLISH
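
The assembly step can be sketched as grouping mart rows (already filtered to one entity and period) into the Report envelope above. The mart row shape is an assumption, and UnitReport is simplified to a plain row list for illustration:

```typescript
// Sketch of report.json assembly: group mart rows by unit and derive the
// meta block. MartRow and the unit-keyed grouping are assumptions; the
// real UnitReport shape is richer than a row list.
type MartRow = { unit_code: string; [metric: string]: unknown };
type SketchReport = {
  meta: { entity: string; period: string; generated_at: string; scope: string[] };
  units: Record<string, MartRow[]>;
};

export function assembleReport(entity: string, period: string, rows: MartRow[]): SketchReport {
  const units: Record<string, MartRow[]> = {};
  for (const row of rows) (units[row.unit_code] ??= []).push(row);
  return {
    meta: { entity, period, generated_at: new Date().toISOString(), scope: Object.keys(units) },
    units,
  };
}
```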

Present (PPTX service)

Consumes report.json and produces PPTX + PDF. No business logic — pure rendering.

CLI usage:

pnpm run present -- --report output/monthly/2026-02-report.json
pnpm run present -- --entity IONS --period 2026-02  # runs Format then Present
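
Because Present is a pure renderer, its core can be sketched as a mapping from report.json to an ordered slide list, with no numbers recomputed. Slide kinds mirror the files under @services/present/@source/slides/; the descriptor shape and per-unit ordering are assumptions:

```typescript
// Sketch: plan slides from a report without applying business logic.
// Only a subset of slide kinds is shown; the descriptor type is assumed.
type SlideDescriptor = { kind: string; unit?: string };

export function planSlides(report: { units: Record<string, unknown> }): SlideDescriptor[] {
  const slides: SlideDescriptor[] = [{ kind: "cover" }];
  for (const unit of Object.keys(report.units)) {
    // Each unit gets its own slide sequence; data comes straight from report.json.
    slides.push({ kind: "revenue-comparison", unit });
    slides.push({ kind: "program-progress", unit });
  }
  return slides;
}
```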

Slides (Slidev service)

Consumes report.json (copied or scaffolded to @services/slides/@source/reports/{entity}/{period-unit}/data.json) and renders branded Slidev decks with custom layouts/components.

CLI usage:

pnpm run slides -- dev --entity IONS --period 2026-02 --unit WLC
pnpm run slides -- export --entity IONS --period 2026-02 --unit WLC
pnpm run slides -- export --entity IONS --period 2026-02 --unit WLC --format pptx
pnpm run slides -- build --entity IONS --period 2026-02 --unit WLC

Database Architecture

Two Layers

|  | DuckDB (atlas.db) | LibSQL (atlas-ops.db) |
| --- | --- | --- |
| Role | Analytical | Operational |
| Contents | raw, staging, intermediate, mart layers | normalized tables from model.md |
| Schema defined in | dbt models (.sql files) | Drizzle ORM (TypeScript) |
| Query pattern | Aggregations, GROUP BY, window functions, PIVOT | Individual record lookup, FK joins |
| Used by | Format layer, present/slides artifact generation, Evidence analytics source | API Worker runtime reads (dashboard + auth-gated drill-down), Evidence operational source |
| Remote option | MotherDuck (future) | Turso |

Query routing in the dashboard

At runtime, all dashboard reads route through the API Worker to LibSQL/Turso. DuckDB serves artifact generation (Format, present/slides) and Evidence; the dashboard never queries it directly.

LibSQL environment config

# Local development
DATABASE_URL=file:atlas-ops.db

# Remote (Turso)
DATABASE_URL=libsql://<db-name>.turso.io
TURSO_AUTH_TOKEN=<token>

Migrations (LibSQL / Drizzle)

pnpm exec drizzle-kit generate   # generate migration from schema changes
pnpm exec drizzle-kit migrate    # apply to database
pnpm exec drizzle-kit push       # push schema directly to dev DB

Dashboard

TanStack Start app with authenticated server functions that call the API Worker. Dashboard does not query runtime databases directly.
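
A server function's request-building can be sketched as below; the apiBase default and header names are assumptions, and the real routes live in @services/api/src/routes/:

```typescript
// Sketch of a dashboard loader forwarding the session cookie to the API
// Worker, per the data-access pattern above. The base URL is a placeholder.
export function buildApiRequest(
  path: string,
  sessionCookie: string,
  apiBase = "https://api.example.com",
) {
  return {
    url: new URL(path, apiBase).toString(),
    // Forward the session cookie so the Worker can authenticate the read.
    headers: { cookie: sessionCookie, accept: "application/json" },
  };
}
```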

Route to data source mapping:

| Route | Section | Data source |
| --- | --- | --- |
| / | Overview | API /dashboard/overview -> Turso |
| /revenue | Revenue Breakdown | API /reports/revenue -> Turso |
| /enrollments | Enrollment Breakdown | API /reports/enrollments -> Turso |
| /programs | Program Breakdown | API /reports/programs -> Turso |
| /schools | School Breakdown | API /reports/schools -> Turso |
| /channels | Channel Breakdown | API /reports/channels -> Turso |
| /records/people | People records list | API /records/people -> Turso |
| /records/organizations | Organization records list | API /records/organizations -> Turso |
| /records/people/:id | Person detail | API /records/people/:id -> Turso |
| /records/organizations/:id | Organization detail | API /records/organizations/:id -> Turso |

Dashboard filter model:

  • URL params use year + optional month (MMMM format, or All months for yearly mode).
  • If unit is not set, dashboard aggregate pages default to all units.
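
The filter model above can be sketched as a parser; the exact query-param names and the returned shape are assumptions:

```typescript
// Sketch of dashboard filter parsing. Month arrives as a full month name
// (MMMM format); "All months" or a missing month means yearly mode, and a
// missing unit means aggregate across all units.
const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

export function parseFilters(params: URLSearchParams) {
  const month = params.get("month");
  return {
    year: Number(params.get("year")),
    month: month && month !== "All months" ? MONTHS.indexOf(month) + 1 : null,
    units: params.get("unit")?.split(",") ?? "all",
  };
}
```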

Adapter Pattern

The pipeline uses a config-driven approach. Each entity-year has a YAML config in @source/config/ that maps xlsx sheets to raw table names and defines column schemas. The Extract step (SheetJS) normalizes Indonesian column headers to English, with optional per-sheet header overrides (header_row, merge_header_rows, data_start_row) for inconsistent workbook layouts. Future sources (Odoo, APIs) can be added as new extract sub-commands.

The Extract step's reader.ts and schema.ts are the only places in the codebase where Indonesian source terms are mapped to English. All other layers use English terms exclusively.
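
The adapter contract implied above can be sketched as an interface; the method name, signature, and row shape are assumptions (today the xlsx path is implemented as pipeline sub-commands rather than a literal interface):

```typescript
// Sketch of a source adapter contract. A future Odoo source would provide
// another object with the same shape; extraction stays decoupled from loading.
export interface SourceAdapter {
  name: string;
  // Extract one logical table as rows keyed by normalized (English) columns.
  extract(entity: string, year: number, table: string): Promise<Record<string, string | number>[]>;
}

// Stub adapter demonstrating the contract; a real one would read xlsx here.
export const stubXlsxAdapter: SourceAdapter = {
  name: "xlsx",
  async extract() {
    return [{ date: "2026-02-01", amount: 100 }];
  },
};
```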


Scope and Multi-tenancy

All operational tables in LibSQL include entity_id. All dbt models accept an entity variable for filtering. Queries never mix data across entities.

// LibSQL — entity scoping at query layer
const orders = await db.select().from(commerceOrder)
  .where(eq(commerceOrder.entityId, entityId))

-- dbt — entity variable passed at runtime
SELECT * FROM stg_transactions
WHERE entity_code = '{{ var("entity") }}'

Deployment

Phase 1 — Local

1) Bootstrap (one-time setup)

# Seed LibSQL lookup tables
pnpm run seed -- --entity IONS

# Optional: rebuild canonical CSV snapshots first
pnpm run extract -- --entity IONS --year 2026

2) Recurring monthly refresh

# Full monthly sync chain (extract -> load -> validate -> transform)
pnpm run sync -- --entity IONS --year 2026

# Optional month-focused refresh
pnpm run sync -- --entity IONS --year 2026 --month 2

3) Publish runtime data

# Push monthly operational snapshot to LibSQL/Turso
pnpm run publish -- --entity IONS --year 2026 --month 2

4) Generate report artifacts on demand

# Monthly report (JSON + PPTX + PDF)
pnpm run report -- --entity IONS --year 2026 --month 2 --type monthly

# Quarterly report (aggregated)
pnpm run report -- --entity IONS --year 2026 --month 1-3 --type quarterly

# Yearly report (aggregated)
pnpm run report -- --entity IONS --year 2026 --type yearly

# Optional: skip staleness warnings
pnpm run report -- --entity IONS --year 2026 --month 2 --type monthly --force

5) Stage-level debug commands (optional)

# Load CSV snapshots -> DuckDB raw layer (debug)
pnpm run pipeline -- load --entity IONS --year 2026

# Validate raw tables
pnpm run validate -- --entity IONS --year 2026

# Generate clean CSV QA + KPI workbook (`@source/{entity}-report.xlsx`)
pnpm run pipeline -- clean-report --entity IONS

# Run dbt transforms
pnpm run pipeline -- transform --entity IONS

# Assemble report JSON only
pnpm run format -- --entity IONS --period 2026-02

# Render PPTX/PDF from existing report JSON
pnpm run present -- --report output/monthly/IONS-2026-02.json

# Preview Slidev deck
pnpm run slides -- dev --entity IONS --period 2026-02 --unit WLC

# Export Slidev outputs
pnpm run slides -- export --entity IONS --period 2026-02 --unit WLC
pnpm run slides -- export --entity IONS --period 2026-02 --unit WLC --format pptx

# Start dashboard
pnpm run --filter @services/dashboard dev

# Refresh Evidence sources and run Evidence BI app
pnpm run evidence:sources
pnpm run evidence

Phase 2 — Remote DB

Replace DATABASE_URL with a Turso connection string for LibSQL. Replace DuckDB local file with MotherDuck for the analytical layer (optional).

Phase 3 — Cloudflare Worker runtime

Deploy @services/api, @services/dashboard, @services/slides, and @services/evidence as Cloudflare Workers. Dashboard runtime reads are served through API routes backed by Turso.

Phase 4 — GitHub Actions environments and secrets

Workflows deploy-plan.yml, deploy-dashboard-api.yml, and deploy-slides.yml use workflow_dispatch and require GitHub environments named dev and prod.

| Environment | Required GitHub secrets | Notes |
| --- | --- | --- |
| dev | CLOUDFLARE_API_TOKEN | Used for non-production deploys (dev.* domains). |
| prod | CLOUDFLARE_API_TOKEN | Used for production deploys (*.atlas.prata.ma) and should require reviewer approval. |

Approval policy guidance:

  • dev: no manual approval required.
  • prod: require at least one maintainer reviewer before workflow execution.

Runtime Worker secrets/vars (AUTH_*, DATABASE_URL, TURSO_AUTH_TOKEN, etc.) are configured in Cloudflare Worker environments, not GitHub Actions secrets.

Phase 5 — Odoo Sync

Implement OdooAdapter. Raw Odoo data loads into DuckDB raw layer via the same adapter interface. dbt models are unchanged — they read from the raw layer regardless of source.
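
Odoo's external API accepts JSON-RPC calls to execute_kw; a payload builder for the future OdooAdapter might be sketched as below (database, credentials, and model names are placeholders):

```typescript
// Sketch of the JSON-RPC body a future OdooAdapter could send via fetch.
// The envelope follows Odoo's documented external API shape (service
// "object", method "execute_kw"); all argument values are placeholders.
export function buildOdooPayload(
  db: string,
  uid: number,
  password: string,
  model: string,
  method: string,
  args: unknown[],
) {
  return {
    jsonrpc: "2.0",
    method: "call",
    params: {
      service: "object",
      method: "execute_kw",
      args: [db, uid, password, model, method, args],
    },
    id: 1,
  };
}
```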


Resolved Decisions

Present Layer — Component Reuse Strategy

Decision: Data-only reuse. @services/present (PPTX/PDF), @services/slides (Slidev PDF/PPTX/Web), and @services/dashboard (React/Tailwind) consume the same Report type from report.json. No visual logic is shared — each layer is optimized for its medium. The PresentAdapter interface allows future backends (Google Slides, NotebookLM) to be added without changing the Format layer.
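
A sketch of what the PresentAdapter interface might look like; the method signature is an assumption, since only the interface name is specified here:

```typescript
// Sketch of a PresentAdapter contract: each backend renders the shared
// Report shape to an artifact without touching the Format layer. The
// render signature and output-path convention are assumptions.
export interface PresentAdapter {
  name: string;
  render(report: { meta: { entity: string; period: string } }, outDir: string): Promise<string>;
}

// Stub backend showing the contract; a real one would drive pptxgenjs,
// Slidev, or a future Google Slides/NotebookLM backend.
export const stubPptxAdapter: PresentAdapter = {
  name: "pptx",
  async render(report, outDir) {
    return `${outDir}/${report.meta.entity}-${report.meta.period}.pptx`;
  },
};
```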
