Atlas Plan

Completed

T-001: Create @plan/audit/ and sheet index — done in planning session
T-002: Create per-file audit snapshots — done in planning session
T-003: Fix ions-2026.yaml
T-004: Fix ions-2025.yaml and ions-2024.yaml
T-005: Fix ions-2023.yaml
T-006: Define canonical CSV schema and extend SourceConfig type
T-007: Implement extract/reader.ts and extract/referensi.ts
T-008: Implement extract/writer.ts
T-009: Implement extract/index.ts and wire CLI
T-010: Extract all years to @source/clean/ and commit
T-011: Refactor load/csv.ts to read from @source/clean/
T-012: Implement validate/checks.ts
T-013: Implement validate/index.ts and wire CLI
T-014: Remove stg_marketing_activity and raw_marketing_activity
T-015: Add is_valid and invalid_reason to int_orders.sql
T-016: Filter int_enrollments to is_valid = true
T-017: Create audit_flagged_orders dbt view
T-018: Add dbt schema tests for is_valid columns
T-019a: Update extract step — amount formatting, date ISO, identity → id_raw
T-019b: Simplify staging SQL — direct English refs, identity split, period parsing
T-019c: Final end-to-end verification

In Progress

None

Cancelled

T-019: Superseded by T-019a/b/c (all completed)

Learnings

2023 is the ONLY year with a 2-row merged header in transactions.
xlsx autoFilter is display-only — read ALL rows regardless of hidden state.
stg_marketing_activity was a dead model. Removed with zero downstream impact.
raw_students bloat was caused by OLAH/SUMMARY sheets being loaded. Fixed via CSV layer.
raw_targets had 702 garbage columns due to wrong header row. Fixed via merge_header_rows config.
Staging SQL column name mismatch: stg_transactions/stg_students look up Indonesian names (TENANT, NAMA SISWA, JUMLAH) but raw tables now have English canonical names (unit, student, amount). All mart models are 0-row as a result. T-019b fixes this.
amount in CSVs is "100,000" (thousands-separator comma). TRY_CAST fails silently. Fix: T-019a strips commas in extract → plain "100000" string.
188 students appear with >1 distinct identity value across time. Causes: 93 formatting differences, 85 data entry errors, 9 genuine KTP/Kartu Pelajar transitions (student turned 17).
Identity columns decided: id_raw in CSV (raw string); id_ktp, id_kp, id_sim, id_pass, id_nis parsed in staging SQL. No merging/deduplication — separate rows. dim_students deferred.
birth_date in students is a text string ("18 Desember 2003") — NOT a machine date. Do not apply date formatting to it.
Period parsing (JANUARI 2026 → year/month integers) moved to staging SQL (not CSV).
Unit code normalization and customer type translation (B→New) stay in staging SQL.
Clean CSV boundary: amount (plain number), date (ISO), id_raw (raw string). That's all.
Direct canonical-column refs in staging remove the alias mismatch and restore populated marts/format outputs.
For mixed-year verification, sync must use append mode on subsequent years (--mode append) or later loads replace earlier raw data.
dbt accepted-values tests surfaced mixed-case Real variants, so unit normalization now runs in stg_transactions too.
Final closure tasks can be evidenced directly from merged PR artifacts (#19) and merge SHA traces in Progress logs.

Questions

None — all resolved.

Plan complete. No remaining work.

Completed

In Progress

Cancelled

Learnings

Questions

Next

On this page