Atlas Plan
Plans003 2026 02 20 Transform Layer

Phase 1: Python/dbt Scaffold

  • Purpose: Set up the Python project, dbt configuration, and directory structure

T-001 - Create @packages/transform scaffold

Create all configuration files: pyproject.toml, requirements.txt, profiles.yml, dbt_project.yml, and directory structure.

  • Status: completed
  • Priority: P0
  • Dependencies: none

Acceptance

  • @packages/transform/pyproject.toml exists with [project] name atlas-transform, requires dbt-core and dbt-duckdb
  • requirements.txt lists dbt-core>=1.8 and dbt-duckdb>=1.8
  • profiles.yml targets dev → DuckDB at ../../atlas.db, 4 threads
  • dbt_project.yml: name atlas, staging +materialized: view, intermediate +materialized: view, marts +materialized: table
  • Directory structure created: models/staging/, models/intermediate/, models/marts/, tests/
  • uv venv && uv pip install -r requirements.txt succeeds
  • uv run dbt debug connects to atlas.db without error

Files

  • @packages/transform/pyproject.toml
  • @packages/transform/requirements.txt
  • @packages/transform/profiles.yml
  • @packages/transform/dbt_project.yml

Phase 2: Raw Source Declarations

  • Purpose: Register raw tables as dbt sources for lineage tracking and ref() usage

T-002 - Create sources.yml

Declare all 5 raw tables as dbt sources so they can be referenced with {{ source('raw', 'raw_transactions') }} in staging models.

  • Status: completed
  • Priority: P0
  • Dependencies: T-001

Acceptance

  • models/staging/sources.yml exists
  • Declares source raw with database atlas, schema main
  • Tables declared: raw_transactions, raw_students, raw_organizations, raw_targets, raw_marketing_activity
  • Each table has a description

Files

  • @packages/transform/models/staging/sources.yml

Phase 3: Staging Models

  • Purpose: Clean and rename raw columns to English; handle per-year column variations via COALESCE

T-003 - stg_transactions.sql

Staging model for LAPORAN EVALUASI monthly sheets. Handles 2026 new columns (NO. KWITANSI, NAMA KELAS) and deduplicates on receipt number.

  • Status: completed
  • Priority: P0
  • Dependencies: T-002

Acceptance

  • Selects from {{ source('raw', 'raw_transactions') }}
  • receipt_number: uses NO. KWITANSI when present, with deterministic fallback key for prior-year rows without receipt numbers
  • variant_name: COALESCE("NAMA KELAS", "PROGRAM") — 2026 has NAMA KELAS, prior years use PROGRAM
  • All other columns mapped per analytics.md stg_transactions spec
  • TRY_CAST on JUMLAHamount (DECIMAL 18,2)
  • TRY_CAST on TANGGALdate (DATE)
  • WHERE "NO. KWITANSI" IS NOT NULL OR "PROGRAM" IS NOT NULL to filter empty rows
  • QUALIFY ROW_NUMBER() OVER (PARTITION BY receipt_number ORDER BY date) = 1 deduplication

Files

  • @packages/transform/models/staging/stg_transactions.sql

T-004 - stg_students.sql

Staging model for DATABASE SISWA monthly sheets. Handles 2026 split of BULAN/INTAKEBULAN + INTAKE, renamed TANGGAL column, and added JENIS KELAMIN.

  • Status: completed
  • Priority: P0
  • Dependencies: T-002

Acceptance

  • Selects from {{ source('raw', 'raw_students') }}
  • period_label: COALESCE("INTAKE", "BULAN/INTAKE") — 2026 has separate INTAKE col; prior years use combined
  • enrollment_date: TRY_CAST(COALESCE("TANGGAL", "TANGGAL/HARI/JAM") AS DATE)
  • sex_code: "JENIS KELAMIN" (NULL for 2023 which lacks this column)
  • All other columns per analytics.md stg_students spec
  • WHERE "NAMA" IS NOT NULL

Files

  • @packages/transform/models/staging/stg_students.sql

T-005 - stg_targets.sql

Staging model for the EVALUASI DATA SISWA sheet (2026+) and fallback for prior years without this sheet.

  • Status: completed
  • Priority: P1
  • Dependencies: T-002

Acceptance

  • Selects from {{ source('raw', 'raw_targets') }}
  • All columns per analytics.md stg_targets spec with TRY_CAST for all numeric columns
  • WHERE "TENANT" IS NOT NULL

Files

  • @packages/transform/models/staging/stg_targets.sql

T-006 - stg_marketing_activity.sql

Staging model for REKAP monthly sheets. Maps channel lead columns with resilient fallbacks; weekly decomposition is deferred until raw REKAP headers are stabilized.

  • Status: completed
  • Priority: P1
  • Dependencies: T-002

Acceptance

  • Selects from {{ source('raw', 'raw_marketing_activity') }}
  • All columns per analytics.md stg_marketing_activity spec
  • Uses tenant fallback when TENANT is absent in raw sheets and filters non-data rows

Files

  • @packages/transform/models/staging/stg_marketing_activity.sql

T-007 - Staging schema tests (schema.yml)

Define dbt schema tests for all staging models: not_null, unique, accepted_values.

  • Status: completed
  • Priority: P1
  • Dependencies: T-003, T-004, T-005, T-006

Acceptance

  • models/staging/schema.yml exists
  • stg_transactions: receipt_number not_null; amount not_null; unit_code accepted_values (all known unit codes)
  • stg_students: full_name not_null; unit_code accepted_values
  • stg_targets: unit_code not_null, unique
  • uv run dbt test --select staging passes

Files

  • @packages/transform/models/staging/schema.yml

Phase 4: Intermediate Models

  • Purpose: Apply business logic — joins, customer type classification, FK resolution, enrollment filtering

T-008 - int_orders.sql

Joins stg_transactions × stg_students to produce one enriched row per transaction. Normalizes unit codes and customer types. Uses receipt number join where available.

  • Status: completed
  • Priority: P0
  • Dependencies: T-003, T-004

Acceptance

  • LEFT JOINs stg_students on receipt_number (where not null) OR student_name match
  • unit_code_normalized: CASE mapping WLC INGGRISWLC_ENGLISH, WLC NON INGGRISWLC_NON_ENGLISH, else as-is
  • customer_type: CASE mapping BNew, LRenewal, AAlumni
  • period_year and period_month extracted from period_label using Indonesian month name CASE
  • All columns from both staging models included
  • WHERE receipt_number IS NOT NULL

Files

  • @packages/transform/models/intermediate/int_orders.sql

T-009 - int_enrollments.sql

Filters int_orders to enrollment-type rows only. Excludes book sales, event fees, zero-amount rows.

  • Status: completed
  • Priority: P0
  • Dependencies: T-008

Acceptance

  • Selects from {{ ref('int_orders') }}
  • Excludes all non-enrollment product_category values per analytics.md exclusion list
  • Excludes amount <= 0
  • All int_orders columns passed through unchanged

Files

  • @packages/transform/models/intermediate/int_enrollments.sql

Phase 5: Mart Models

  • Purpose: Pre-aggregate to analysis-ready tables powering the Format layer and dashboard

T-010 - mart_revenue.sql

Revenue totals per unit per period. Joined with targets at query time in the Format layer (not in the mart itself).

  • Status: completed
  • Priority: P0
  • Dependencies: T-009

Acceptance

  • Groups by unit_code_normalized, period_year, period_month
  • revenue_actuals: SUM(amount)
  • transaction_count: COUNT(DISTINCT receipt_number)
  • period: formatted YYYY-MM string
  • Selects from {{ ref('int_orders') }} (not just enrollments — revenue includes all order types)

Files

  • @packages/transform/models/marts/mart_revenue.sql

T-011 - mart_program_progress.sql

Student and order counts per program per period, broken down by customer type. Powers Key Comparison and Program Progress slides.

  • Status: completed
  • Priority: P0
  • Dependencies: T-009

Acceptance

  • Groups by unit_code_normalized, product_name, product_category, course_type_name, period_year, period_month, customer_type
  • order_count: COUNT(DISTINCT receipt_number)
  • student_count: COUNT(DISTINCT student_name)
  • revenue: SUM(amount)
  • Selects from {{ ref('int_enrollments') }}

Files

  • @packages/transform/models/marts/mart_program_progress.sql

T-012 - mart_channel_marketing.sql

Closings and student counts per channel per period with contribution percentage. Powers Channel Marketing slide.

  • Status: completed
  • Priority: P0
  • Dependencies: T-009

Acceptance

  • Groups by unit_code_normalized, channel_name, period_year, period_month, customer_type
  • closings: COUNT(DISTINCT receipt_number)
  • students: COUNT(DISTINCT student_name)
  • contribution_pct: window function over unit × period partition, rounded to 2 decimal places
  • WHERE channel_name IS NOT NULL
  • Selects from {{ ref('int_enrollments') }}

Files

  • @packages/transform/models/marts/mart_channel_marketing.sql

T-013 - mart_school_progress.sql

Student counts per organization per year (annual, not monthly). Powers School Progress slide.

  • Status: completed
  • Priority: P0
  • Dependencies: T-009

Acceptance

  • Groups by unit_code_normalized, organization_name, period_year
  • student_count: COUNT(DISTINCT student_name)
  • WHERE organization_name IS NOT NULL
  • No period_month grouping — school analysis is annual

Files

  • @packages/transform/models/marts/mart_school_progress.sql

T-014 - Mart schema tests

Define schema tests for mart models.

  • Status: completed
  • Priority: P1
  • Dependencies: T-010, T-011, T-012, T-013

Acceptance

  • models/marts/schema.yml exists
  • mart_revenue: unit_code not_null, revenue_actuals not_null
  • mart_program_progress: unit_code not_null, order_count not_null
  • mart_channel_marketing: unit_code not_null, channel_name not_null
  • mart_school_progress: unit_code not_null, organization_name not_null
  • uv run dbt test --select marts passes

Files

  • @packages/transform/models/marts/schema.yml

Phase 6: Verification

  • Purpose: Run the full dbt pipeline against real data and confirm all models and tests pass

T-015 - Full dbt run and test verification

Run the complete dbt pipeline against real 2026 data loaded by Plan 002. Confirm all models compile and run, all tests pass.

  • Status: completed
  • Priority: P0
  • Dependencies: T-007, T-009, T-014

Acceptance

  • uv run dbt run completes without errors (all 10 models: 4 staging, 2 intermediate, 4 marts)
  • uv run dbt test passes all tests
  • mart_revenue has rows with real unit codes (TM, WLC_ENGLISH, WLC_NON_ENGLISH, etc.)
  • mart_program_progress has rows with real program names
  • mart_channel_marketing rows include KBM, Sosial Media channels
  • mart_school_progress rows include organization names from 2026 data

T-016 - Update AGENTS.md with transform commands

Add transform commit scope and dbt CLI usage to AGENTS.md.

  • Status: completed
  • Priority: P1
  • Dependencies: T-015

Acceptance

  • AGENTS.md commit scopes list includes transform
  • AGENTS.md ## Commands section shows dbt usage (uv run dbt run, uv run dbt test)

Files

  • AGENTS.md

On this page