Progress
2026-02-21 21:43 - T-003
Overview: Updated the 2026 source config for the targets sheet's merged headers, referensi typing, and full REKAP removal.
Completed:
- fix(pipeline): set `merge_header_rows` and `data_start_row` for 2026 targets sheet
- refactor(pipeline): remove 2026 marketing activity config block
- refactor(pipeline): add explicit `type` field for 2026 file configs
Files:
- @source/config/ions-2026.yaml
2026-02-21 21:43 - T-004
Overview: Cleaned 2024 and 2025 YAML configs by removing REKAP entries and normalizing typed organizations entries.
Completed:
- refactor(pipeline): remove marketing activity config from 2024 and 2025 YAML files
- refactor(pipeline): add explicit `type` metadata to transactions/students/organizations for 2024 and 2025
Files:
- @source/config/ions-2025.yaml
- @source/config/ions-2024.yaml
2026-02-21 21:43 - T-005
Overview: Fixed 2023 transaction header structure to use merged rows and removed all marketing REKAP config.
Completed:
- fix(pipeline): configure 2023 transactions with `merge_header_rows: [6, 7]`
- fix(pipeline): map 2023 merged transaction sub-columns (`PROGRAM`, `INTAKE`, `NAMA SISWA`, `JUMLAH`)
- refactor(pipeline): remove 2023 marketing activity config section entirely
Files:
- @source/config/ions-2023.yaml
2026-02-21 21:43 - T-006
Overview: Added canonical extract schema constants and extended SourceConfig typing for merged headers and referensi file type.
Completed:
- feat(pipeline): add canonical CSV column definitions in extract schema module
- type(pipeline): extend `SourceFileConfig` with `type`, `merge_header_rows`, and `data_start_row`
- test(pipeline): verify TypeScript integrity with `pnpm --filter @packages/pipeline test:type`
Files:
- @packages/pipeline/@source/extract/schema.ts
- @packages/pipeline/@source/config.ts
2026-02-21 22:04 - T-007
Overview: Implemented SheetJS readers for regular/merged-header sheets and REFERENSI unpivot extraction with canonical row outputs.
Completed:
- feat(pipeline): add `readSheet` with single-row and two-row merged header support
- feat(pipeline): add students `T/T/L` split into `birth_place` and `birth_date`
- feat(pipeline): add `readReferensiSheet` matrix unpivot for organizations
- test(pipeline): add extract reader tests for single-row header, merged header, missing columns, birth split, and referensi unpivot
Decisions:
- Keep mapping generic by inverting YAML source labels and using canonical header sets from `schema.ts`.
- For merged headers with ambiguous sub-labels (e.g. duplicated labels), fall back to outer-label candidates by column position.
Learnings:
- In `REFERENSI`, `ORGANISASI` can be positioned at column D while age cluster labels remain in column B; the parser must not assume they share a column.
Files:
- @packages/pipeline/@source/extract/reader.ts
- @packages/pipeline/@source/extract/referensi.ts
- @packages/pipeline/@source/tests/extract.test.ts
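The two-row merged-header resolution described in the decisions above can be sketched roughly as follows. This is a hypothetical illustration inferred from this entry, not the actual `readSheet` code: outer labels carry forward across merged cells, and a sub-label on the second header row wins when present.

```typescript
// Hypothetical sketch of two-row merged-header flattening (not the shipped
// readSheet implementation). Outer labels from row 1 are carried forward
// across merged (empty) cells; a non-empty sub-label from row 2 takes
// precedence, otherwise the carried outer label is used for that column.
function flattenHeaders(outer: (string | null)[], sub: (string | null)[]): string[] {
  const result: string[] = [];
  let carried = "";
  for (let col = 0; col < outer.length; col++) {
    if (outer[col]) carried = outer[col]!; // merged cell: value only in first column
    result.push(sub[col] ? sub[col]! : carried);
  }
  return result;
}
```

A falsy outer cell is exactly what SheetJS yields for the trailing columns of a merged range, which is why the carry-forward step is needed at all.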
2026-02-21 22:04 - T-008
Overview: Implemented CSV writer with frozen-snapshot overwrite guard and proper CSV escaping semantics.
Completed:
- feat(pipeline): add `writeCsv` with default no-overwrite behavior
- feat(pipeline): add RFC-compatible CSV escaping for commas, quotes, and newlines
- test(pipeline): add writer tests for overwrite guard, quoting edge cases, and header-only files
Files:
- @packages/pipeline/@source/extract/writer.ts
- @packages/pipeline/@source/tests/writer.test.ts
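The escaping semantics above follow the usual RFC 4180 convention, which a minimal sketch can capture (the shipped `writeCsv` may differ in detail, e.g. in how it joins rows or handles non-string cells):

```typescript
// Minimal RFC 4180-style field escaping sketch, assuming string input.
// A field is quoted only when it contains a comma, a double quote, or a
// newline; embedded double quotes are doubled inside the quoted field.
function escapeCsvField(value: string): string {
  if (/[",\r\n]/.test(value)) {
    return `"${value.replace(/"/g, '""')}"`;
  }
  return value;
}
```

Quoting only when necessary keeps the frozen snapshots diff-friendly, since unchanged plain fields stay byte-identical across extractions.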
2026-02-21 22:04 - T-009
Overview: Replaced extract stub with full orchestrator, wired CLI extract command, and verified normal/overwrite execution.
Completed:
- feat(pipeline): implement `extractAll(config, { cleanDir, overwrite })` orchestration
- feat(pipeline): implement filename resolution for monthly, targets period-label month, and organizations outputs
- feat(pipeline): wire `extract` CLI path with `--overwrite` flag support
- test(pipeline): extend CLI arg parsing tests for overwrite flag
- chore(pipeline): run extraction CLI for 2026 in skip mode and overwrite mode to validate behavior
Learnings:
- Root extraction command routes through `pnpm --filter @packages/pipeline start -- extract`, and boolean flags need explicit parsing in CLI token handling.
- Current 2026 organizations extraction yields 25 rows after referensi parser correction.
Files:
- @packages/pipeline/@source/extract/index.ts
- @packages/pipeline/@source/index.ts
- @packages/pipeline/@source/tests/config.test.ts
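The boolean-flag learning above amounts to: presence of the token sets the flag, and the token must be consumed so it is not mistaken for a positional argument. A hedged sketch (the real CLI parser is likely more general, and `parseExtractArgs` is an illustrative name):

```typescript
// Illustrative boolean-flag handling in CLI token parsing, per the learning
// above. "--overwrite" is a presence flag: seeing the token sets it to true
// and removes it from the stream; all other tokens pass through as
// positional arguments.
function parseExtractArgs(tokens: string[]): { overwrite: boolean; rest: string[] } {
  const rest: string[] = [];
  let overwrite = false;
  for (const tok of tokens) {
    if (tok === "--overwrite") overwrite = true;
    else rest.push(tok);
  }
  return { overwrite, rest };
}
```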
2026-02-21 22:15 - T-011
Overview: Refactored load step to read frozen CSV snapshots from @source/clean/ and removed marketing activity loading.
Completed:
- refactor(pipeline): replace xlsx `read_xlsx` load path with `read_csv_auto` temp-table loading
- refactor(pipeline): align CSV filename resolution with extract conventions for monthly, targets, and organizations files
- refactor(pipeline): switch CLI load default source to csv and wire `syncCsv`
- fix(pipeline): drop stale `raw_marketing_activity` and `raw_referensi` tables during sync bootstrap
- chore(pipeline): verify `pnpm run sync -- --entity IONS --year 2026` and validate raw table row counts/columns
Learnings:
- `COUNT(*)` from DuckDB can surface as bigint/string depending on context; row-count utilities must normalize types before comparisons.
- Header-only CSV handling requires explicit empty-sheet guards, otherwise replace mode can zero out destination tables.
Files:
- @packages/pipeline/@source/load/csv.ts
- @packages/pipeline/@source/index.ts
- @packages/pipeline/@source/duck.ts
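The `COUNT(*)` learning above can be made concrete with a small normalizer. This is a sketch under the assumption stated in the learning (the value may arrive as `number`, `bigint`, or `string` depending on the client path); `toRowCount` is an illustrative name, not the actual utility:

```typescript
// Normalize a DuckDB COUNT(*) result to a plain number before comparison.
// Depending on the client code path, the value can surface as a JS number,
// a bigint, or a decimal string, so each representation is coerced.
function toRowCount(value: number | bigint | string): number {
  if (typeof value === "bigint") return Number(value);
  if (typeof value === "string") return Number.parseInt(value, 10);
  return value;
}
```

Note that `Number(bigint)` is safe here because realistic row counts sit far below `Number.MAX_SAFE_INTEGER`.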
2026-02-21 22:17 - Amendment
Overview: Added explicit boundary evidence for TypeScript verification coverage across completed extract/load tasks.
Changes:
- boundary: Recorded that `pnpm --filter @packages/pipeline test:type` was run after TypeScript changes spanning T-007 through T-011.
- task: Compliance documentation updated to capture boundary proof for tasks that previously referenced tests without explicit `test:type` mention.
Rationale:
- Sub-agent checkpoint flagged missing explicit per-task boundary evidence despite passing typechecks.
- Capturing explicit command evidence in append-only history improves compliance traceability across sessions.
2026-02-21 22:19 - T-012
Overview: Implemented typed validation checks catalog for raw-layer quality rules with unit tests.
Completed:
- feat(pipeline): add `ValidationCheck` type and `VALIDATION_CHECKS` definitions in validate checks module
- feat(pipeline): include required empty, schema, bloat, year, unknown unit, and negative amount checks
- test(pipeline): add validate checks tests for severity mapping and core SQL clause coverage
- test(pipeline): verify `pnpm --filter @packages/pipeline test:type` and `pnpm --filter @packages/pipeline test`
Files:
- @packages/pipeline/@source/validate/checks.ts
- @packages/pipeline/@source/tests/validate.test.ts
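A checks catalog of this kind is typically a typed array of declarative entries. The shape below is an assumption for illustration only; the field names and the sample check are not taken from the actual module:

```typescript
// Illustrative shape for a typed validation-checks catalog (field names are
// assumptions, not the real module's contract). Each entry declares an id,
// a human-readable description, a severity, and the SQL that selects
// offending rows, so a runner can execute them uniformly.
type Severity = "error" | "warning";

interface ValidationCheck {
  id: string;
  description: string;
  severity: Severity;
  sql: string; // query returning the rows that violate the rule
}

const VALIDATION_CHECKS: ValidationCheck[] = [
  {
    id: "negative_amount",
    description: "Transaction amounts must not be negative",
    severity: "error",
    sql: "SELECT * FROM raw_transactions WHERE amount < 0",
  },
];
```

Keeping checks as data rather than code is what lets the orchestrator of T-013 run them all in one loop and emit a structured report.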
2026-02-21 22:21 - T-013
Overview: Implemented validation runner orchestration with grouped console output, JSON report writing, and CLI integration.
Completed:
- feat(pipeline): add `validateAll` orchestrator to execute all checks against DuckDB and collect structured results
- feat(pipeline): write JSON report to `output/validation/{entity}-{year}-validation.json`
- feat(pipeline): wire validate command in CLI with `--year` + config loading and exit code behavior
- test(pipeline): verify `pnpm --filter @packages/pipeline test:type` and `pnpm --filter @packages/pipeline test`
- chore(pipeline): run `pnpm run validate -- --entity IONS --year 2026` and confirm report generation
Files:
- @packages/pipeline/@source/validate/index.ts
- @packages/pipeline/@source/index.ts
2026-02-21 22:24 - T-014
Overview: Removed dead marketing staging source/model and validated dbt graph health.
Completed:
- refactor(analytics): delete `stg_marketing_activity` model
- refactor(analytics): remove `raw_marketing_activity` source declaration
- test(analytics): run `uv run dbt run` successfully after removal
- test(analytics): run `uv run dbt test` successfully after removal
Learnings:
- Current 2026 dataset has zero non-null `channel_name` rows in `int_enrollments`, so `mart_channel_marketing` materializes with 0 rows but remains structurally healthy.
Files:
- @python/analytics/models/staging/stg_marketing_activity.sql (deleted)
- @python/analytics/models/staging/sources.yml
2026-02-21 22:26 - T-015
Overview: Added is_valid and invalid_reason flags in int_orders with ordered data-quality rule evaluation.
Completed:
- feat(analytics): add ordered validity CASE logic for null/invalid period, invalid amount, and unknown unit
- feat(analytics): add `invalid_reason` reason codes aligned to validity checks
- test(analytics): run `uv run dbt run --select int_orders` and verify no null `is_valid`
Files:
- @python/analytics/models/intermediate/int_orders.sql
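The ordered rule evaluation above means the first failing rule determines `invalid_reason`, mirroring a SQL `CASE` expression's top-to-bottom evaluation. A TypeScript rendering of that logic (the actual model is SQL, and the rule and reason-code names here are illustrative assumptions):

```typescript
// TypeScript rendering of ordered validity rules: the first failing rule
// wins, like a SQL CASE evaluated top to bottom. Returns the reason code,
// or null when the row is valid (is_valid = true).
interface OrderRow {
  period: string | null;
  amount: number | null;
  unit: string | null;
}

function invalidReason(row: OrderRow, knownUnits: Set<string>): string | null {
  if (!row.period) return "null_or_invalid_period";
  if (row.amount == null || row.amount < 0) return "invalid_amount";
  if (!row.unit || !knownUnits.has(row.unit)) return "unknown_unit";
  return null;
}
```

Fixing the rule order matters: a row failing several rules gets one deterministic reason code, which keeps downstream audit counts stable.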
2026-02-21 22:26 - T-016
Overview: Filtered intermediate enrollments to valid orders only.
Completed:
- refactor(analytics): add `AND is_valid = true` to `int_enrollments`
- test(analytics): run `uv run dbt run` and `uv run dbt test` after validity filter rollout
Files:
- @python/analytics/models/intermediate/int_enrollments.sql
2026-02-21 22:26 - T-017
Overview: Added audit_flagged_orders view model for invalid order inspection.
Completed:
- feat(analytics): create `audit_flagged_orders` view from `int_orders` invalid rows
- docs(analytics): add intermediate schema metadata for new audit model
- test(analytics): run `uv run dbt run --select audit_flagged_orders`
Files:
- @python/analytics/models/intermediate/audit_flagged_orders.sql
- @python/analytics/models/intermediate/schema.yml
2026-02-21 22:26 - T-018
Overview: Added schema tests to enforce validity flag contract.
Completed:
- test(analytics): add `not_null` test for `int_orders.is_valid`
- test(analytics): add conditional `not_null` test for `invalid_reason` when `is_valid = false`
- test(analytics): run `uv run dbt run && uv run dbt test` with new tests passing
Files:
- @python/analytics/models/intermediate/schema.yml
2026-02-21 22:26 - Amendment
Overview: Added explicit Phase 6 boundary verification evidence for full dbt run/test coverage across T-014 through T-018.
Changes:
- boundary: Recorded that `uv run dbt run && uv run dbt test` was executed after Phase 6 model/test changes were completed.
- task: Supplemented targeted task-level verification with end-of-phase full-suite validation evidence.
Rationale:
- Compliance review flagged delayed per-task boundary evidence even though full validation passed by phase end.
- Explicit phase-level run/test evidence preserves strict auditability without rewriting append-only task entries.
2026-02-21 22:28 - T-019
Overview: Completed all end-to-end verification steps except format-content validation; task is blocked pending boundary decision.
Completed:
- test(pipeline): verified extract frozen-snapshot skip behavior
- test(pipeline): verified CSV sync row counts and absence of `raw_marketing_activity`
- test(pipeline): verified validate command output and JSON report generation
- docs(*): updated root/pipeline/analytics AGENTS files for extract/validate/is_valid changes
Blockers:
- `pnpm run format -- --entity IONS --period 2026-02` produces an empty `units` payload because mart models are zero-row.
- Resolving likely requires edits to `stg_transactions.sql` and `stg_students.sql` alias handling, which is restricted by an Ask-first boundary in Plan.md.
Files:
- AGENTS.md
- @packages/pipeline/AGENTS.md
- @python/analytics/AGENTS.md
2026-02-22 11:24 - T-019a
Overview: Implemented extract-level amount/date normalization and identity column rename, then re-extracted all years.
Completed:
- feat(pipeline): add `formatAmount` and `formatDate` in extract reader, apply to transaction amount/date and student datetime fields
- refactor(pipeline): rename students canonical `identity` column to `id_raw` in schema and alias mapping
- test(pipeline): add unit coverage for amount/date formatting edge cases in `extract.test.ts`
- test(pipeline): run `pnpm --filter @packages/pipeline test:type` and `pnpm --filter @packages/pipeline test`
- chore(data): run extract overwrite for 2023, 2024, 2025, 2026 and refresh `@source/clean/` snapshots
- verify(data): spot-check 2026 transactions (`amount` plain numeric, `date` ISO) and students header (`id_raw`)
Pending:
- Commit updated CSV snapshots (awaiting explicit user commit request)
Files:
- @packages/pipeline/@source/extract/reader.ts
- @packages/pipeline/@source/extract/schema.ts
- @packages/pipeline/@source/tests/extract.test.ts
- @source/clean/*.csv
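The date normalization verified above ("`date` ISO") can be sketched as a slash-date to ISO conversion. This is a hedged illustration only: the assumed `DD/MM/YYYY` input pattern is a guess about the source sheets, and the shipped `formatDate` may accept more formats:

```typescript
// Hypothetical sketch of date normalization to ISO 8601 (YYYY-MM-DD),
// assuming DD/MM/YYYY input. Unrecognized values pass through unchanged so
// the extract step never destroys data it cannot interpret.
function formatDate(raw: string): string {
  const m = raw.trim().match(/^(\d{1,2})\/(\d{1,2})\/(\d{4})$/);
  if (!m) return raw; // leave unrecognized values untouched
  const [, d, mo, y] = m;
  return `${y}-${mo.padStart(2, "0")}-${d.padStart(2, "0")}`;
}
```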
2026-02-22 11:24 - T-019b
Overview: Rewrote staging SQL to use direct canonical columns, moved period/customer parsing into staging, and split student identity fields.
Completed:
- refactor(analytics): replace dynamic raw-column macros with direct canonical refs in `stg_transactions` and `stg_students`
- feat(analytics): add `period_year`, `period_month`, and translated `customer_type` outputs in staging
- feat(analytics): add identity passthrough/split columns (`id_raw`, `id_ktp`, `id_kp`, `id_sim`, `id_pass`, `id_nis`) in `stg_students`
- refactor(analytics): simplify `int_orders` to consume staging period/customer fields and pass through identity columns
- docs(analytics): add identity columns to `models/intermediate/schema.yml`
- test(analytics): run `uv run dbt run && uv run dbt test` with marts populated and tests passing
Files:
- @python/analytics/models/staging/stg_transactions.sql
- @python/analytics/models/staging/stg_students.sql
- @python/analytics/models/intermediate/int_orders.sql
- @python/analytics/models/intermediate/schema.yml
2026-02-22 11:24 - T-019c
Overview: Completed end-to-end verification across extract/load/validate/dbt/format with mixed-year raw loading and updated analytics docs.
Completed:
- test(pipeline): run `pnpm run sync -- --entity IONS --year 2026` and verify raw-layer loads from updated CSVs
- test(pipeline): run `pnpm run validate -- --entity IONS --year 2026` and confirm warning-only validation report
- test(analytics): run dbt full build/test cycles after sync updates; confirm non-zero `int_orders`, `int_enrollments`, and marts
- test(format): run `pnpm run format -- --entity IONS --period 2026-02` and verify populated monthly report output
- test(data): load 2026 then append 2023 (`--mode append`) and verify non-zero historical 2023 rows in marts
- docs(analytics): update `@python/analytics/AGENTS.md` with `stg_students` identity split guidance
Files:
- @python/analytics/AGENTS.md
- output/monthly/2026-02-report.json
- output/validation/ions-2026-validation.json
2026-02-22 11:37 - T-010
Overview: Finalized frozen CSV extraction deliverable by committing all refreshed @source/clean/ snapshots in granular data commits and merging to main.
Completed:
- data(pipeline): committed 2023 transaction and student CSV snapshots
- data(pipeline): committed 2024 transaction and student CSV snapshots
- data(pipeline): committed 2025 transaction and student CSV snapshots
- data(pipeline): committed 2026 transaction and student CSV snapshots
- chore(git): pushed branch `feat/data-quality-remediation`, created PR #19, and merged via rebase to `main`
Evidence:
- PR: https://github.com/prata-ma/atlas/pull/19
- Merge commit on `main`: `1b49ab5d9b084bfbbb488612074b28835a0bf714`
Files:
- @source/clean/*.csv
2026-02-22 11:37 - T-019a
Overview: Closed the remaining T-019a completion gate by committing refreshed CSV outputs and shipping extract/schema/test updates to main.
Completed:
- feat(pipeline): committed extract normalization (`formatAmount`, `formatDate`) and `id_raw` mapping changes
- test(pipeline): committed updated extract tests for amount/date normalization
- data(pipeline): committed all 100 regenerated CSV snapshots produced by `--overwrite`
- chore(git): delivered via PR #19 merged to `main`
Evidence:
- PR: https://github.com/prata-ma/atlas/pull/19
- Merge commit on `main`: `1b49ab5d9b084bfbbb488612074b28835a0bf714`
Files:
- @packages/pipeline/@source/extract/reader.ts
- @packages/pipeline/@source/extract/schema.ts
- @packages/pipeline/@source/tests/extract.test.ts
- @source/clean/*.csv