# Analytics
The analytics layer for Atlas. Covers the DuckDB analytical database, the dbt transform pipeline, and all mart definitions.
For the overall pipeline context see architecture.md. For the operational database schema see model.md.
## Overview
The analytics layer is a Modern Data Stack pipeline built on DuckDB and dbt-core. It implements the ELT pattern — raw source data is loaded first, then transformed through layered SQL models into analysis-ready marts that power report assembly (report.json) and presentation generation.
## DuckDB

### Role
DuckDB is the analytical engine. It runs in-process — no server required. All four transform layers (raw, staging, intermediate, mart) live in a single DuckDB file (atlas.db).
### Why DuckDB
| Capability | Usage in Atlas |
|---|---|
| CSV loading | `read_csv_auto()` loads canonical CSV snapshots extracted by the pipeline |
| In-process OLAP | Runs embedded in pipeline/format/presentation scripts without a database server |
| Fast aggregations | Revenue totals, year-over-year comparisons, cohort counts — vectorized execution |
| Full SQL | Window functions, CTEs, `PIVOT`, `UNPIVOT`, `QUALIFY` — used extensively in mart models |
| dbt compatible | dbt-duckdb adapter is the official DuckDB dbt backend |
| Local file | `atlas.db` — no network dependency for local development |
### Connection
```ts
import { DuckDBInstance } from '@duckdb/node-api'

const instance = await DuckDBInstance.create('atlas.db')
const conn = await instance.connect()
```

### Raw layer setup
Each xlsx source file is first extracted to canonical CSV snapshots (via SheetJS in @packages/pipeline), then loaded into DuckDB via read_csv_auto. Tables are created or replaced on each load run — the raw layer always reflects the latest source data.
```sql
-- Transactions ledger (from canonical CSV)
CREATE OR REPLACE TABLE raw_transactions AS
SELECT * FROM read_csv_auto(
  '@source/clean/ions-2026-transactions-*.csv'
);

-- Student database (from canonical CSV)
CREATE OR REPLACE TABLE raw_students AS
SELECT * FROM read_csv_auto(
  '@source/clean/ions-2026-students-*.csv'
);
```

## dbt Project
### Setup
The dbt project lives at @python/analytics/. It uses the dbt-duckdb adapter and is managed via uv.
```sh
# Install
cd @python/analytics
uv sync

# Run all models
uv run dbt run

# Run specific layer
uv run dbt run --select staging
uv run dbt run --select intermediate
uv run dbt run --select marts

# Test
uv run dbt test

# Generate and serve docs
uv run dbt docs generate
uv run dbt docs serve
```

### profiles.yml
```yaml
atlas:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: ../../atlas.db
      threads: 4
```

### dbt_project.yml
```yaml
name: atlas
version: '1.0.0'
profile: atlas
model-paths: ['models']
test-paths: ['tests']

models:
  atlas:
    staging:
      +materialized: view
    intermediate:
      +materialized: view
    marts:
      +materialized: table
```

Staging and intermediate models are materialized as views (no storage cost, always fresh). Mart models are materialized as tables (pre-aggregated, fast reads for the Format and presentation layers).
## Transform Layers

### Layer 1 — Raw (raw_*)
Loaded from canonical CSV snapshots in @source/clean/. Column names are already in English (normalized during the extract step). No type casting, no filtering at this stage. Validation-stage QA exports write monthly flagged rows to @source/invalid/ with reason metadata while preserving raw tables unchanged.
| Table | Source file | Source sheet |
|---|---|---|
| raw_transactions | LAPORAN EVALUASI IONs [year] TM-WLC.xlsx | Monthly ledger sheets (JANUARI, FEBRUARI, …) |
| raw_students | DATABASE SISWA IONS [year] TM-WLC.xlsx | Monthly enrollment sheets (JANUARI26, FEBRUARI26, …) |
| raw_organizations | DATABASE SISWA IONS [year] TM-WLC.xlsx | REFERENSI sheet |
| raw_targets | LAPORAN EVALUASI IONs [year] TM-WLC.xlsx | EVALUASI DATA SISWA sheet |
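The `*` in the snapshot paths above is a filesystem glob: `read_csv_auto` reads every matching monthly file into a single table. A minimal sketch of the matching rule, using Python's stdlib `fnmatch` (the filenames here are hypothetical, for illustration only):

```python
from fnmatch import fnmatch

# Hypothetical monthly snapshot filenames produced by the extract step
snapshots = [
    "ions-2026-transactions-januari.csv",
    "ions-2026-transactions-februari.csv",
    "ions-2026-students-januari.csv",
]

# The transactions pattern picks up every monthly ledger snapshot,
# and skips the student snapshots that share the same directory
pattern = "ions-2026-transactions-*.csv"
matched = [name for name in snapshots if fnmatch(name, pattern)]
print(matched)  # the two transactions files, in list order
```

Because the raw tables are created with `CREATE OR REPLACE`, re-running the load after a new monthly snapshot lands simply widens the glob's match set.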
### Layer 2 — Staging (stg_*)
Light cleaning only. Column names are already in English (canonical CSV headers). Types cast. Rows deduplicated. No joins, no business logic.
Naming rule: one staging model per raw table. raw_transactions → stg_transactions.
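The staging contract (cast, deduplicate, nothing else) can be sketched in plain Python. Here `try_cast_decimal` mirrors DuckDB's `TRY_CAST`, which yields NULL instead of raising on a bad value, and the seen-set mirrors the keep-first `ROW_NUMBER() = 1` dedup; the rows are illustrative, not real data:

```python
from decimal import Decimal, InvalidOperation

def try_cast_decimal(value):
    # Mirror TRY_CAST semantics: None (NULL) instead of an error on bad input
    try:
        return Decimal(value)
    except (InvalidOperation, TypeError):
        return None

rows = [
    {"receipt": "R-001", "amount": "1500000"},
    {"receipt": "R-001", "amount": "1500000"},  # duplicate receipt, dropped
    {"receipt": "R-002", "amount": "n/a"},      # uncastable amount -> None
]

seen, staged = set(), []
for row in rows:
    if row["receipt"] in seen:
        continue  # keep the first occurrence only
    seen.add(row["receipt"])
    staged.append({
        "receipt_number": row["receipt"],
        "amount": try_cast_decimal(row["amount"]),
    })
```

Note that business rules (e.g. dropping zero-amount rows) do not belong here; those come in the intermediate layer.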
#### stg_transactions
Raw CSV columns are already English canonical names. Staging applies deduplication and type casting.
```sql
SELECT
  receipt AS receipt_number,
  TRY_CAST(date AS DATE) AS date,
  unit AS unit_code,
  program AS product_name,
  class AS variant_name,
  intake AS period_label,
  student AS student_name,
  TRY_CAST(amount AS DECIMAL(18,2)) AS amount,
  channel AS payment_channel,
  method AS payment_method_name,
  type AS customer_type_name,
  status AS payment_status_name,
  category AS product_category,
  barsheet AS batch_ref
FROM raw_transactions
WHERE receipt IS NOT NULL
QUALIFY ROW_NUMBER() OVER (PARTITION BY receipt ORDER BY date) = 1
```

#### stg_students
```sql
SELECT
  student AS full_name,
  date AS enrollment_date,
  organization AS organization_name,
  unit AS unit_code,
  age_cluster,
  course_type AS course_type_name,
  program AS product_name,
  class AS variant_name,
  intake AS period_label,
  source AS channel_name,
  type AS customer_type_name,
  payment AS payment_channel,
  sex AS sex_code,
  id_raw
FROM raw_students
WHERE student IS NOT NULL
```

#### stg_targets
```sql
SELECT
  "TENANT" AS unit_code,
  TRY_CAST("TARGET REVENUE" AS DECIMAL(18,2)) AS revenue_target,
  TRY_CAST("TARGET SISWA STANDAR" AS DECIMAL) AS student_target,
  TRY_CAST("REALISASI REVENUE" AS DECIMAL(18,2)) AS revenue_actuals,
  TRY_CAST("REALISASI SISWA RIIL" AS INTEGER) AS student_actuals
FROM raw_targets
WHERE "TENANT" IS NOT NULL
```

### Layer 3 — Intermediate (int_*)
Business logic applied. Joins across staging tables. FK resolution. Customer type classification. No aggregation.
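The classification logic in this layer is plain label lookup. A stdlib sketch of the period-label split and customer-type mapping that int_orders expresses in SQL (period labels follow the Indonesian month-name convention used in the source sheets):

```python
# Indonesian month names -> month numbers, as in the int_orders CASE expression
MONTHS = {
    "JANUARI": 1, "FEBRUARI": 2, "MARET": 3, "APRIL": 4,
    "MEI": 5, "JUNI": 6, "JULI": 7, "AGUSTUS": 8,
    "SEPTEMBER": 9, "OKTOBER": 10, "NOVEMBER": 11, "DESEMBER": 12,
}

# Customer-type codes -> normalized labels
CUSTOMER_TYPES = {"B": "New", "L": "Renewal", "A": "Alumni"}

def parse_period_label(label: str) -> tuple[int, int]:
    """'JANUARI 2026' -> (2026, 1), mirroring the SPLIT_PART + CASE logic."""
    month_name, year = label.split(" ", 1)
    return int(year), MONTHS[month_name]
```

An unrecognized month or type code raises a `KeyError` here; the SQL `CASE` without `ELSE` instead yields NULL, which is why downstream quality gates check for NULL `period_month` and `customer_type`.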
#### int_orders
Joins transactions with students to produce one row per order with all relevant dimensions.
```sql
WITH base AS (
  SELECT
    t.receipt_number,
    t.date,
    t.unit_code,
    t.product_name,
    t.variant_name,
    t.amount,
    t.payment_channel,
    t.payment_method_name,
    t.payment_status_name,
    t.customer_type_name,
    t.product_category,
    t.batch_ref,
    s.organization_name,
    s.age_cluster,
    s.course_type_name,
    s.channel_name,
    s.sex_code,
    s.full_name AS student_name,
    -- Extract period year and month from period_label
    CAST(SPLIT_PART(t.period_label, ' ', 2) AS INTEGER) AS period_year,
    CASE SPLIT_PART(t.period_label, ' ', 1)
      WHEN 'JANUARI' THEN 1 WHEN 'FEBRUARI' THEN 2
      WHEN 'MARET' THEN 3 WHEN 'APRIL' THEN 4
      WHEN 'MEI' THEN 5 WHEN 'JUNI' THEN 6
      WHEN 'JULI' THEN 7 WHEN 'AGUSTUS' THEN 8
      WHEN 'SEPTEMBER' THEN 9 WHEN 'OKTOBER' THEN 10
      WHEN 'NOVEMBER' THEN 11 WHEN 'DESEMBER' THEN 12
    END AS period_month,
    -- Normalize customer type (qualified with t.: stg_students also has customer_type_name)
    CASE t.customer_type_name
      WHEN 'B' THEN 'New'
      WHEN 'L' THEN 'Renewal'
      WHEN 'A' THEN 'Alumni'
    END AS customer_type,
    -- Normalize unit code (qualified with t.: stg_students also has unit_code)
    CASE t.unit_code
      WHEN 'WLC INGGRIS' THEN 'WLC_ENGLISH'
      WHEN 'WLC NON INGGRIS' THEN 'WLC_NON_ENGLISH'
      ELSE t.unit_code
    END AS unit_code_normalized
  FROM stg_transactions t
  LEFT JOIN stg_students s
    ON t.student_name = s.full_name
    AND t.period_label = s.period_label
)
-- Quality gate: is_valid + invalid_reason computed downstream
SELECT * FROM base
WHERE receipt_number IS NOT NULL
```

#### int_enrollments
Filters int_orders to Enrollment-type orders only (excludes book sales, event fees, etc.).
```sql
SELECT *
FROM int_orders
WHERE product_category NOT IN ('BUKU REAL', 'BUKU WLC', 'BUKU MANDARIN', 'EVENT - WLC',
                               'EVENT - ICC', 'SIMULASI JLPT', 'SIMULASI HSK',
                               'KIRIM - REAL', 'KIRIM - TM', 'TOP UP POIN')
  AND amount > 0
```

### Layer 4 — Marts (mart_*)
Pre-aggregated, analysis-ready tables. One mart per report section. All mart models are materialized as tables for fast Format and presentation-layer reads.
Runtime boundary note:

- Dashboard runtime no longer reads DuckDB marts directly.
- Dashboard data is served by @services/api from Turso/LibSQL.
- DuckDB marts remain the source for report generation and presentation workflows.
#### mart_revenue
Powers the Revenue Comparison slide and dashboard revenue view.
```sql
SELECT
  unit_code_normalized AS unit_code,
  period_year,
  period_month,
  SUM(amount) AS revenue_actuals,
  COUNT(DISTINCT receipt_number) AS transaction_count,
  -- Period label for the presentation layer
  period_year || '-' || LPAD(period_month::VARCHAR, 2, '0') AS period
FROM int_orders
GROUP BY unit_code_normalized, period_year, period_month
```

Joined with targets at query time (in the Format layer):
```sql
SELECT
  r.*,
  t.revenue_target,
  t.revenue_target - r.revenue_actuals AS gap_to_target,
  ROUND(r.revenue_actuals / t.revenue_target * 100, 2) AS achievement_pct
FROM mart_revenue r
LEFT JOIN stg_targets t
  ON r.unit_code = t.unit_code
WHERE r.period_year = {{ var("year") }}
  AND r.period_month = {{ var("month") }}
  AND r.unit_code = '{{ var("unit") }}'
```

#### mart_program_progress
Powers the Key Comparison and Program Progress slides.
```sql
SELECT
  unit_code_normalized AS unit_code,
  product_name,
  product_category,
  course_type_name,
  period_year,
  period_month,
  customer_type,
  COUNT(DISTINCT receipt_number) AS order_count,
  COUNT(DISTINCT student_name) AS student_count,
  SUM(amount) AS revenue
FROM int_enrollments
GROUP BY
  unit_code_normalized,
  product_name,
  product_category,
  course_type_name,
  period_year,
  period_month,
  customer_type
```

#### mart_channel_marketing
Powers the Channel Marketing slide and dashboard channels view.
```sql
SELECT
  unit_code_normalized AS unit_code,
  channel_name,
  period_year,
  period_month,
  customer_type,
  COUNT(DISTINCT receipt_number) AS closings,
  COUNT(DISTINCT student_name) AS students,
  ROUND(
    COUNT(DISTINCT receipt_number) * 100.0 /
    SUM(COUNT(DISTINCT receipt_number)) OVER (
      PARTITION BY unit_code_normalized, period_year, period_month
    ), 2
  ) AS contribution_pct
FROM int_enrollments
WHERE channel_name IS NOT NULL
GROUP BY
  unit_code_normalized,
  channel_name,
  period_year,
  period_month,
  customer_type
```

#### mart_school_progress
Powers the School Progress slide — shows which organizations produced students each year.
```sql
SELECT
  unit_code_normalized AS unit_code,
  organization_name,
  period_year,
  COUNT(DISTINCT student_name) AS student_count
FROM int_enrollments
WHERE organization_name IS NOT NULL
GROUP BY
  unit_code_normalized,
  organization_name,
  period_year
```

Pivoted at query time in the Format layer to produce the year × organization matrix shown in the presentation.
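That reshape is mechanical: one row per organization, one column per year, zero-filled where an organization produced no students. A stdlib sketch of the pivot (rows are illustrative, not real data):

```python
from collections import defaultdict

# Illustrative mart_school_progress rows for one unit
rows = [
    {"organization_name": "SMA 1", "period_year": 2025, "student_count": 12},
    {"organization_name": "SMA 1", "period_year": 2026, "student_count": 18},
    {"organization_name": "SMA 2", "period_year": 2026, "student_count": 7},
]

# Collect counts keyed by (organization, year)
matrix = defaultdict(dict)
for r in rows:
    matrix[r["organization_name"]][r["period_year"]] = r["student_count"]

# One ordered column per year, 0 where the organization had no students
years = sorted({r["period_year"] for r in rows})
table = {org: [counts.get(y, 0) for y in years] for org, counts in matrix.items()}
```

In SQL the same shape can come from DuckDB's `PIVOT`, which is one of the features listed in the Why DuckDB table above.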
## dbt Tests
Each model has built-in dbt tests defined in schema.yml files.
```yaml
# models/staging/schema.yml
models:
  - name: stg_transactions
    columns:
      - name: receipt_number
        tests:
          - not_null
          - unique
      - name: amount
        tests:
          - not_null
      - name: unit_code
        tests:
          - accepted_values:
              values: ['REAL', 'WLC INGGRIS', 'WLC NON INGGRIS', 'TM', 'NP', 'ICC', 'ICA', 'DTC', 'IONs']
  - name: stg_targets
    columns:
      - name: unit_code
        tests:
          - not_null
          - unique
```

## Lineage
Full data lineage from source to mart:

xlsx workbooks → canonical CSV snapshots (@source/clean/) → raw_* → stg_* → int_* → mart_* → report.json → presentation