diff --git a/.opencode/skills/dbt-test/SKILL.md b/.opencode/skills/dbt-test/SKILL.md index 1d8fb4a733..3c3cdf82db 100644 --- a/.opencode/skills/dbt-test/SKILL.md +++ b/.opencode/skills/dbt-test/SKILL.md @@ -81,6 +81,8 @@ altimate-dbt build --model # build + test together ## Unit Test Workflow +**For automated unit test generation, use the `dbt-unit-tests` skill instead.** It analyzes model SQL, generates type-correct mock data, and assembles complete YAML automatically. + See [references/unit-test-guide.md](references/unit-test-guide.md) for the full unit test framework. ### Quick Pattern diff --git a/.opencode/skills/dbt-unit-tests/SKILL.md b/.opencode/skills/dbt-unit-tests/SKILL.md new file mode 100644 index 0000000000..5dc4498d46 --- /dev/null +++ b/.opencode/skills/dbt-unit-tests/SKILL.md @@ -0,0 +1,209 @@ +--- +name: dbt-unit-tests +description: Generate dbt unit tests automatically for any model. Analyzes SQL logic (CASE/WHEN, JOINs, window functions, NULLs), creates type-correct mock inputs from manifest schema, and assembles complete YAML. Use when a user says "generate tests", "add unit tests", "test this model", or "test coverage" for dbt models. 
+---
+
+# dbt Unit Test Generation
+
+## Requirements
+**Agent:** builder or migrator (requires file write access)
+**Tools used:** dbt_unit_test_gen, dbt_manifest, dbt_lineage, altimate_core_validate, altimate_core_testgen, bash (runs `altimate-dbt` commands), read, glob, write, edit
+
+## When to Use This Skill
+
+**Use when the user wants to:**
+- Generate unit tests for a dbt model
+- Add test coverage to an existing model
+- Create mock data for testing
+- Do test-driven development (TDD) for dbt
+- Verify CASE/WHEN logic, NULL handling, JOIN behavior, or aggregation correctness
+- Test incremental model logic
+
+**Do NOT use for:**
+- Adding schema tests (not_null, unique, accepted_values) -> use `dbt-test`
+- Creating or modifying model SQL -> use `dbt-develop`
+- Writing descriptions -> use `dbt-docs`
+- Debugging build failures -> use `dbt-troubleshoot`
+
+## The Iron Rules
+
+1. **Never guess expected outputs.** Compute them by running SQL against mock data when possible. If you cannot run SQL, clearly mark expected outputs as placeholders that need verification.
+2. **Never skip upstream dependencies.** Every ref() and source() the model touches MUST have a mock input. Miss one and the test won't compile.
+3. **Use sql format for ephemeral models.** Dict format fails silently for ephemeral upstreams.
+4. **Never weaken a test to make it pass.** If the test fails, the model logic may be wrong. Investigate before changing expected values.
+5. **Compile before committing.** Always run `altimate-dbt test --model <model>` to verify tests compile and execute.
+
+## Core Workflow: Analyze -> Generate -> Refine -> Validate -> Write
+
+### Phase 1: Analyze the Model
+
+Before generating any tests, deeply understand the model:
+
+```bash
+# 1. Ensure manifest is compiled
+altimate-dbt compile --model <model>
+
+# 2. Read the model SQL
+read <model_file>
+
+# 3. 
Parse the manifest for dependencies
+dbt_unit_test_gen(manifest_path: "target/manifest.json", model: "<model>")
+```
+
+**What to look for:**
+- Which upstream refs/sources does this model depend on?
+- What SQL constructs need testing? (CASE/WHEN, JOINs, window functions, aggregations)
+- What edge cases exist? (NULLs, empty strings, zero values, boundary dates)
+- Is this an incremental model? (needs `is_incremental` override tests)
+- Are any upstream models ephemeral? (need sql format)
+
+### Phase 2: Generate Tests
+
+The `dbt_unit_test_gen` tool does the heavy lifting:
+
+```text
+dbt_unit_test_gen(
+  manifest_path: "target/manifest.json",
+  model: "fct_orders",
+  max_scenarios: 5
+)
+```
+
+This returns:
+- Complete YAML with mock inputs and expected outputs
+- Semantic context: model/column descriptions, column lineage, compiled SQL
+- List of anti-patterns that informed edge case generation
+- Warnings about ephemeral deps, missing columns, etc.
+
+**If the tool reports missing columns** (placeholder rows in the YAML), discover them:
+```bash
+altimate-dbt columns --model <model>
+altimate-dbt columns-source --source <source> --table <table>
+```
+Then update the generated YAML with real column names.
+
+### Phase 3: Refine Expected Outputs
+
+**This is the critical step that differentiates good tests from bad ones.**
+
+The tool generates placeholder expected outputs based on column types. You MUST refine them:
+
+**Option A: Compute by running SQL (preferred)**
+```bash
+# Run the model against mock data to get actual output
+altimate-dbt test --model <model>
+# If the test fails, the error shows actual vs expected — use actual as expected
+```
+
+**Option B: Manual computation**
+Read the model SQL carefully and mentally execute it against the mock inputs.
+For each test case:
+1. Look at the mock input rows
+2. Trace through the SQL logic (CASE/WHEN branches, JOINs, aggregations)
+3. 
Write the correct expected output
+
+**Option C: Use the warehouse (most accurate)**
+```bash
+# Build a CTE query with mock data and run the model SQL against it
+altimate-dbt execute --query "WITH mock_stg_orders AS (SELECT 1 AS order_id, 100.00 AS amount) SELECT * FROM (<compiled model SQL>) sub"
+```
+
+### Phase 4: Validate
+
+```bash
+# 1. Run the unit tests
+altimate-dbt test --model <model>
+
+# 2. If tests fail, read the error carefully
+#    - Compilation error? Missing ref, wrong column name, type mismatch
+#    - Assertion error? Expected output doesn't match actual
+
+# 3. Fix and retry (max 3 iterations)
+```
+
+### Phase 5: Write to File
+
+Place unit tests in one of these locations (match project convention):
+- `models/<folder>/_unit_tests.yml` (dedicated file)
+- `models/<folder>/schema.yml` (append to existing)
+
+```bash
+# Check existing convention
+glob models/**/*unit_test*.yml models/**/*schema*.yml
+
+# Write or append
+edit <file>   # if file exists
+write <file>  # if creating new
+```
+
+## Test Case Categories
+
+### Happy Path (always generate)
+Standard inputs that exercise the main logic path. 2 rows minimum.
+
+### NULL Handling
+Set nullable columns to NULL in the last row. Verify COALESCE/NVL/IFNULL behavior.
+
+### Boundary Values
+Zero amounts, empty strings, epoch dates, MAX values. Tests robustness.
+
+### Edge Cases
+- Division by zero (if model divides)
+- Non-matching JOINs (LEFT JOIN with no match)
+- Single-row aggregation
+- Duplicate key handling
+
+### Incremental
+For incremental models only. Use `overrides.macros.is_incremental: true` to test the incremental path.
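+
+A minimal sketch of the incremental category (column names here are illustrative; see references/incremental-testing.md for full patterns):
+
+```yaml
+unit_tests:
+  - name: test_fct_orders_incremental
+    model: fct_orders
+    overrides:
+      macros:
+        is_incremental: true
+    given:
+      - input: this # existing target table state
+        rows:
+          - { order_id: 1, updated_at: "2024-01-14" }
+      - input: ref('stg_orders')
+        rows:
+          - { order_id: 1, updated_at: "2024-01-14" } # old, filtered out
+          - { order_id: 2, updated_at: "2024-01-15" } # new, processed
+    expect:
+      rows:
+        - { order_id: 2, updated_at: "2024-01-15" }
+```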
+
+## Common Mistakes
+
+| Mistake | Fix |
+|---------|-----|
+| Missing a ref() in given | Parse manifest for ALL depends_on nodes |
+| Wrong column names in mock data | Use manifest columns, not guesses |
+| Wrong data types | Use schema catalog types |
+| Expected output is just mock input | Actually compute the transformation |
+| Dict format for ephemeral model | Use `format: sql` with raw SQL |
+| Not testing NULL path in COALESCE | Add null_handling test case |
+| Hardcoded dates with current_timestamp | Use overrides.macros to mock timestamps |
+| Testing trivial pass-through | Skip models with no logic |
+
+## YAML Format Reference
+
+```yaml
+unit_tests:
+  - name: test_<model>_<scenario>
+    description: "What this test verifies"
+    model: <model_name>
+    overrides: # optional
+      macros:
+        is_incremental: true # for incremental models
+      vars:
+        run_date: "2024-01-15" # for date-dependent logic
+    given:
+      - input: ref('upstream_model')
+        rows:
+          - { col1: value1, col2: value2 }
+      - input: source('source_name', 'table_name')
+        rows:
+          - { col1: value1 }
+      - input: ref('ephemeral_model')
+        format: sql
+        rows: |
+          SELECT 1 AS id, 'test' AS name
+          UNION ALL
+          SELECT 2 AS id, 'other' AS name
+    expect:
+      rows:
+        - { output_col1: expected1, output_col2: expected2 }
+```
+
+## Reference Guides
+
+| Guide | Use When |
+|-------|----------|
+| [references/unit-test-yaml-spec.md](references/unit-test-yaml-spec.md) | Full YAML specification and format details |
+| [references/edge-case-patterns.md](references/edge-case-patterns.md) | Catalog of edge cases by SQL construct |
+| [references/incremental-testing.md](references/incremental-testing.md) | Testing incremental models |
+| [references/altimate-dbt-commands.md](references/altimate-dbt-commands.md) | Full CLI reference |
diff --git a/.opencode/skills/dbt-unit-tests/references/altimate-dbt-commands.md b/.opencode/skills/dbt-unit-tests/references/altimate-dbt-commands.md
new file mode 100644
index 0000000000..8109ac84d2
--- /dev/null
+++ 
b/.opencode/skills/dbt-unit-tests/references/altimate-dbt-commands.md
@@ -0,0 +1,66 @@
+# altimate-dbt Command Reference
+
+All dbt operations use the `altimate-dbt` CLI. Output is JSON to stdout; logs go to stderr.
+
+```bash
+altimate-dbt <command> [args...]
+altimate-dbt <command> [args...] --format text # Human-readable output
+```
+
+## First-Time Setup
+
+```bash
+altimate-dbt init # Auto-detect project root
+altimate-dbt init --project-root /path # Explicit root
+altimate-dbt init --python-path /path # Override Python
+altimate-dbt doctor # Verify setup
+altimate-dbt info # Project name, adapter, root
+```
+
+## Build & Run
+
+```bash
+altimate-dbt build # full project build (compile + run + test)
+altimate-dbt build --model <model> [--downstream] # build a single model
+altimate-dbt run --model <model> [--downstream] # materialize only
+altimate-dbt test --model <model> # run tests only
+```
+
+## Compile
+
+```bash
+altimate-dbt compile --model <model>
+altimate-dbt compile-query --query "SELECT * FROM {{ ref('stg_orders') }}" [--model <model>]
+```
+
+## Execute SQL
+
+```bash
+altimate-dbt execute --query "SELECT count(*) FROM {{ ref('orders') }}" --limit 100
+```
+
+## Schema & DAG
+
+```bash
+altimate-dbt columns --model <model> # column names and types
+altimate-dbt columns-source --source <source> --table <table> # source table columns
+altimate-dbt column-values --model <model> --column <column> # sample values
+altimate-dbt children --model <model> # downstream models
+altimate-dbt parents --model <model> # upstream models
+```
+
+## Packages
+
+```bash
+altimate-dbt deps # install packages.yml
+altimate-dbt add-packages --packages dbt-utils,dbt-expectations
+```
+
+## Error Handling
+
+All errors return JSON with `error` and `fix` fields:
+```json
+{ "error": "dbt-core is not installed", "fix": "Install it: python3 -m pip install dbt-core" }
+```
+
+Run `altimate-dbt doctor` as the first diagnostic step for any failure.
diff --git a/.opencode/skills/dbt-unit-tests/references/edge-case-patterns.md b/.opencode/skills/dbt-unit-tests/references/edge-case-patterns.md new file mode 100644 index 0000000000..a12d5226e8 --- /dev/null +++ b/.opencode/skills/dbt-unit-tests/references/edge-case-patterns.md @@ -0,0 +1,189 @@ +# Edge Case Patterns by SQL Construct + +## CASE/WHEN + +**What to test:** Every branch, including ELSE/default. + +```yaml +# Test the TRUE branch +- { status: "completed", amount: 100 } +# Expected: { category: "done" } + +# Test the FALSE/ELSE branch +- { status: "unknown", amount: 100 } +# Expected: { category: "other" } + +# Test NULL input +- { status: null, amount: 100 } +# Expected: depends on whether NULL matches any WHEN +``` + +**Common bugs:** +- NULL doesn't match `WHEN status = 'active'` — it falls to ELSE +- Multiple WHEN clauses: first match wins, test ordering + +## COALESCE / NVL / IFNULL + +**What to test:** NULL in each position. + +```yaml +# COALESCE(a, b, c) — test a=NULL +- { a: null, b: "fallback", c: "default" } +# Expected: { result: "fallback" } + +# COALESCE(a, b, c) — test a=NULL, b=NULL +- { a: null, b: null, c: "default" } +# Expected: { result: "default" } + +# All non-null +- { a: "primary", b: "fallback", c: "default" } +# Expected: { result: "primary" } +``` + +## JOINs + +**What to test:** Matching rows, non-matching rows, NULL join keys. 
+ +```yaml +# LEFT JOIN — matching row +orders: [{ order_id: 1, customer_id: 1 }] +customers: [{ customer_id: 1, name: "Alice" }] +# Expected: { order_id: 1, name: "Alice" } + +# LEFT JOIN — no match (customer missing) +orders: [{ order_id: 2, customer_id: 99 }] +customers: [{ customer_id: 1, name: "Alice" }] +# Expected: { order_id: 2, name: null } + +# JOIN with NULL key +orders: [{ order_id: 3, customer_id: null }] +customers: [{ customer_id: 1, name: "Alice" }] +# Expected: depends on join type +``` + +**Common bugs:** +- INNER JOIN drops rows when key is NULL or missing +- Fan-out: duplicate keys in right table multiply left rows + +## Window Functions + +**What to test:** Ordering, partitioning, boundary rows. + +```yaml +# ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) +- { customer_id: 1, order_date: "2024-01-01", amount: 50 } +- { customer_id: 1, order_date: "2024-01-15", amount: 75 } +- { customer_id: 2, order_date: "2024-01-10", amount: 30 } +# Expected: +# { customer_id: 1, order_date: "2024-01-01", row_num: 1 } +# { customer_id: 1, order_date: "2024-01-15", row_num: 2 } +# { customer_id: 2, order_date: "2024-01-10", row_num: 1 } +``` + +**What to test for LAG/LEAD:** +- First/last row in partition (LAG returns NULL for first row) +- Single-row partition + +## Aggregations (GROUP BY) + +**What to test:** Multiple groups, single group, empty group. + +```yaml +# SUM(amount) GROUP BY customer_id +- { customer_id: 1, amount: 50 } +- { customer_id: 1, amount: 25 } +- { customer_id: 2, amount: 100 } +# Expected: +# { customer_id: 1, total_amount: 75 } +# { customer_id: 2, total_amount: 100 } + +# Single row group +- { customer_id: 3, amount: 10 } +# Expected: { customer_id: 3, total_amount: 10 } + +# NULL in aggregated column +- { customer_id: 4, amount: null } +# Expected: { customer_id: 4, total_amount: null } # SUM of NULLs = NULL +``` + +## Division + +**What to test:** Normal, divide by zero, NULL. 
+ +```yaml +# amount / quantity +- { amount: 100, quantity: 4 } +# Expected: { unit_price: 25 } + +# Divide by zero +- { amount: 100, quantity: 0 } +# Expected: depends — NULL, error, or COALESCE fallback? + +# NULL divisor +- { amount: 100, quantity: null } +# Expected: { unit_price: null } +``` + +## Date/Timestamp Logic + +**What to test:** Boundaries, NULL dates, timezone edge cases. + +```yaml +# DATEDIFF or date filtering +- { event_date: "2024-01-01" } # start of year +- { event_date: "2024-12-31" } # end of year +- { event_date: "2024-02-29" } # leap year +- { event_date: null } # NULL date +``` + +## Type Coercion + +**What to test:** Implicit casts that may fail. + +```yaml +# String that looks like a number +- { amount_str: "100.50" } # CAST to DECIMAL +- { amount_str: "not_a_number" } # should this fail? +- { amount_str: "" } # empty string cast +- { amount_str: null } # NULL cast +``` + +## Incremental Models + +**What to test:** Full refresh vs incremental path. + +```yaml +# Test 1: Full refresh (is_incremental = false, default) +# All rows processed + +# Test 2: Incremental (is_incremental = true) +unit_tests: + - name: test_incremental_new_rows_only + model: fct_orders + overrides: + macros: + is_incremental: true + given: + - input: this # existing table state + rows: + - { order_id: 1, updated_at: "2024-01-14" } + - input: ref('stg_orders') + rows: + - { order_id: 1, updated_at: "2024-01-14" } # old, should be skipped + - { order_id: 2, updated_at: "2024-01-15" } # new, should be processed + expect: + rows: + - { order_id: 2, updated_at: "2024-01-15" } +``` + +## Empty Inputs + +**What to test:** Model behavior when upstream has zero rows. 
+ +```yaml +given: + - input: ref('stg_orders') + rows: [] +expect: + rows: [] # or specific default behavior +``` diff --git a/.opencode/skills/dbt-unit-tests/references/incremental-testing.md b/.opencode/skills/dbt-unit-tests/references/incremental-testing.md new file mode 100644 index 0000000000..f4b60ad766 --- /dev/null +++ b/.opencode/skills/dbt-unit-tests/references/incremental-testing.md @@ -0,0 +1,109 @@ +# Testing Incremental dbt Models + +Incremental models have two code paths controlled by `{% if is_incremental() %}`. Both paths must be tested. + +## The Two Paths + +```sql +SELECT * FROM {{ ref('stg_orders') }} + +{% if is_incremental() %} + -- Incremental path: only process new rows + WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }}) +{% endif %} +``` + +## Test 1: Full Refresh + +```yaml +unit_tests: + - name: test_fct_orders_full_refresh + description: "Full refresh processes all rows" + model: fct_orders + # No overrides needed — is_incremental defaults to false + given: + - input: ref('stg_orders') + rows: + - { order_id: 1, amount: 100, updated_at: "2024-01-10" } + - { order_id: 2, amount: 200, updated_at: "2024-01-15" } + expect: + rows: + - { order_id: 1, amount: 100, updated_at: "2024-01-10" } + - { order_id: 2, amount: 200, updated_at: "2024-01-15" } +``` + +## Test 2: Incremental — New Rows Only + +```yaml +unit_tests: + - name: test_fct_orders_incremental_new_only + description: "Incremental run only processes rows newer than existing max" + model: fct_orders + overrides: + macros: + is_incremental: true + given: + - input: this # mock the existing target table + rows: + - { order_id: 1, amount: 100, updated_at: "2024-01-10" } + - input: ref('stg_orders') + rows: + - { order_id: 1, amount: 100, updated_at: "2024-01-10" } # old + - { order_id: 2, amount: 200, updated_at: "2024-01-15" } # new + expect: + rows: + - { order_id: 2, amount: 200, updated_at: "2024-01-15" } +``` + +## Test 3: Incremental — Updated Rows + +If your model 
uses `unique_key` for merge/upsert: + +```yaml +unit_tests: + - name: test_fct_orders_incremental_update + description: "Updated rows are captured in incremental run" + model: fct_orders + overrides: + macros: + is_incremental: true + given: + - input: this + rows: + - { order_id: 1, amount: 100, updated_at: "2024-01-10" } + - input: ref('stg_orders') + rows: + - { order_id: 1, amount: 150, updated_at: "2024-01-15" } # updated + expect: + rows: + - { order_id: 1, amount: 150, updated_at: "2024-01-15" } +``` + +## Test 4: Incremental — Empty Source + +```yaml +unit_tests: + - name: test_fct_orders_incremental_no_new_data + description: "No new rows when source has nothing newer" + model: fct_orders + overrides: + macros: + is_incremental: true + given: + - input: this + rows: + - { order_id: 1, amount: 100, updated_at: "2024-01-15" } + - input: ref('stg_orders') + rows: + - { order_id: 1, amount: 100, updated_at: "2024-01-10" } # older + expect: + rows: [] +``` + +## Key Points + +1. **Always mock `this`** when testing incremental path — it represents the existing target table +2. **Set `is_incremental: true`** in overrides.macros to activate the incremental code path +3. **Test both paths** — full refresh AND incremental +4. **Include overlap rows** — rows that exist in both `this` and source to verify filtering +5. **Test the merge key** — if `unique_key` is set, verify upsert behavior diff --git a/.opencode/skills/dbt-unit-tests/references/unit-test-yaml-spec.md b/.opencode/skills/dbt-unit-tests/references/unit-test-yaml-spec.md new file mode 100644 index 0000000000..da43b1238e --- /dev/null +++ b/.opencode/skills/dbt-unit-tests/references/unit-test-yaml-spec.md @@ -0,0 +1,174 @@ +# dbt Unit Test YAML Specification + +Available in dbt-core 1.8+ (released mid-2024). + +## Top-Level Structure + +Unit tests are defined under the `unit_tests:` key in any YAML file within your dbt project (typically `schema.yml` or `_unit_tests.yml`). 
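+
+For orientation, a complete minimal unit test looks like this (the `fct_orders` model and its columns are illustrative):
+
+```yaml
+unit_tests:
+  - name: test_fct_orders_happy_path
+    description: "Order totals are computed from quantity and unit price"
+    model: fct_orders
+    given:
+      - input: ref('stg_orders')
+        rows:
+          - { order_id: 1, quantity: 3, unit_price: 100 }
+    expect:
+      rows:
+        - { order_id: 1, order_total: 300 }
+```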
+ +```yaml +unit_tests: + - name: # required, snake_case + description: # optional but recommended + model: # required, the model being tested + given: # required, mock input data + expect: # required, expected output rows + overrides: # optional, macro/var overrides + config: # optional, test configuration + tags: # optional, for filtering +``` + +## Input Formats + +### Dict Format (default, preferred) + +```yaml +given: + - input: ref('stg_orders') + rows: + - { order_id: 1, amount: 100.00, status: "completed" } + - { order_id: 2, amount: null, status: "pending" } +``` + +**Rules:** +- Only include columns that the model actually uses +- Column names must match the upstream model exactly +- Use `null` for NULL values (not empty string) +- Dates as strings: `"2024-01-15"` +- Timestamps as strings: `"2024-01-15 10:30:00"` +- Booleans: `true` / `false` +- Numbers: no quotes (`100.00`, not `"100.00"`) + +### SQL Format (required for ephemeral models) + +```yaml +given: + - input: ref('ephemeral_model') + format: sql + rows: | + SELECT 1 AS id, 'test' AS name + UNION ALL + SELECT 2 AS id, 'other' AS name +``` + +**When to use SQL format:** +- Upstream model is materialized as `ephemeral` +- Complex data types that dict can't represent +- Need to use SQL functions in mock data + +### Empty Input + +```yaml +given: + - input: ref('stg_orders') + rows: [] +``` + +Tests behavior with no input rows (empty table). 
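+
+A single `given` block can mix input types and formats; every upstream the model touches needs an entry (names here are illustrative):
+
+```yaml
+given:
+  - input: ref('stg_orders') # model ref, dict format
+    rows:
+      - { order_id: 1, customer_id: 1, amount: 100.00 }
+  - input: source('raw', 'customers') # source table, dict format
+    rows:
+      - { customer_id: 1, name: "Alice" }
+  - input: ref('int_discounts') # ephemeral upstream, sql format
+    format: sql
+    rows: |
+      SELECT 1 AS order_id, 0.10 AS discount_pct
+```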
+ +## Expected Output + +```yaml +expect: + rows: + - { order_id: 1, net_revenue: 85.00 } + - { order_id: 2, net_revenue: 50.00 } +``` + +**Rules:** +- Only include columns you want to assert on (subset is OK) +- Row order matters — rows are compared positionally +- Use exact values for numeric assertions +- `null` to assert NULL output + +## Overrides + +### Macro Overrides + +```yaml +overrides: + macros: + is_incremental: true # boolean + current_timestamp: "2024-01-15 00:00:00" # string +``` + +Common macros to override: +- `is_incremental` — test incremental vs full-refresh path +- `current_timestamp` / `current_date` — deterministic date testing + +### Variable Overrides + +```yaml +overrides: + vars: + run_date: "2024-01-15" + lookback_days: 30 +``` + +## Input Sources + +### ref() — Model references + +```yaml +- input: ref('model_name') +``` + +### source() — Source table references + +```yaml +- input: source('source_name', 'table_name') +``` + +### this — Self-reference for incremental models + +```yaml +- input: this + rows: + - { order_id: 1, updated_at: "2024-01-14" } +``` + +Used with `overrides.macros.is_incremental: true` to mock the existing table state. + +## Configuration + +Tags can be set at the top level (sibling of `config`) or nested under `config`: + +```yaml +unit_tests: + - name: test_example + model: fct_orders + tags: ["unit-test", "revenue"] + # ... 
rest of test
+```
+
+Or via config:
+
+```yaml
+unit_tests:
+  - name: test_example
+    model: fct_orders
+    config:
+      tags: ["unit-test", "revenue"]
+```
+
+## Naming Conventions
+
+- Test names: `test_<model>_<scenario>`
+- Examples:
+  - `test_fct_orders_happy_path`
+  - `test_fct_orders_null_discount`
+  - `test_fct_orders_zero_quantity`
+  - `test_fct_orders_incremental_new_rows`
+
+## Running Unit Tests
+
+```bash
+dbt test --select test_type:unit # all unit tests
+dbt test --select fct_orders,test_type:unit # unit tests for one model
+dbt build --select +fct_orders # build + all tests
+```
+
+## Official Documentation
+
+- https://docs.getdbt.com/docs/build/unit-tests
+- https://docs.getdbt.com/reference/resource-properties/unit-tests
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3585406287..b76572c22d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,15 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [Unreleased]
+
+### Added
+
+- **Automated dbt unit test generation** — new `dbt_unit_test_gen` tool and `/dbt-unit-tests` skill for generating dbt unit tests (v1.8+) from a compiled manifest. Parses `manifest.json` via the shared `parseManifest()` helper, uses `dbtLineage()` for column lineage, detects testable SQL constructs (CASE/WHEN, JOINs, NULLs, window functions, division, incremental), and assembles complete YAML via the `yaml` library. Includes `input: this` mock for incremental models, `format: sql` fallback for ephemeral deps (even with no known columns), cross-database support via `database` param in `schema.inspect`, deterministic test names (no `Date.now()`), and rich `UnitTestContext` (descriptions, lineage, compiled SQL) for LLM-driven test value refinement. Handles seeds and snapshots as first-class ref() deps. Warns when upstream deps cannot be resolved.
(#673) +- **Manifest parse cache** — `loadRawManifest()` helper caches by path+mtime; `parseManifest()` and `dbtLineage()` both go through it, so a 128MB manifest is read and parsed once per request instead of once per call. Benefits any tool that makes multiple manifest-backed calls in sequence. +- **`description` on `DbtModelInfo`/`DbtSourceInfo`** — surfaces model and source descriptions from schema.yml in the parsed manifest result, enabling downstream tools to provide richer semantic context to the LLM. +- **`adapter_type` on `DbtManifestResult`** — exposes the dbt adapter type (snowflake, bigquery, etc.) from manifest metadata for dialect auto-detection. + ## [0.5.20] - 2026-04-09 ### Added diff --git a/docs/docs/configure/skills.md b/docs/docs/configure/skills.md index 259c1fe775..7b682b70bf 100644 --- a/docs/docs/configure/skills.md +++ b/docs/docs/configure/skills.md @@ -81,7 +81,8 @@ altimate ships with built-in skills for common data engineering tasks. Type `/` | `/query-optimize` | Query optimization suggestions | | `/data-viz` | Interactive data visualization and dashboards | | `/dbt-develop` | dbt model development and scaffolding | -| `/dbt-test` | dbt test generation | +| `/dbt-test` | dbt schema test generation | +| `/dbt-unit-tests` | Automated dbt unit test generation (v1.8+) | | `/dbt-docs` | dbt documentation generation | | `/dbt-analyze` | dbt project analysis | | `/dbt-troubleshoot` | dbt issue diagnosis | diff --git a/docs/docs/data-engineering/tools/dbt-tools.md b/docs/docs/data-engineering/tools/dbt-tools.md index 89f62e7c95..3cf12e9de9 100644 --- a/docs/docs/data-engineering/tools/dbt-tools.md +++ b/docs/docs/data-engineering/tools/dbt-tools.md @@ -72,6 +72,69 @@ Source Freshness: --- +## dbt_unit_test_gen + +Generate dbt unit tests (v1.8+) from a compiled manifest. Analyzes model SQL for testable logic (CASE/WHEN, JOINs, NULLs, window functions, division, incremental), generates type-correct mock inputs, and assembles complete YAML. 
+ +```text +> dbt_unit_test_gen --manifest_path target/manifest.json --model fct_orders --max_scenarios 5 + +Unit Test Gen: 4 test(s) for fct_orders + +=== Unit Test Generation Summary === +Model: fct_orders +Description: Daily order totals by order ID +Materialization: table +Upstream dependencies: 2 +Tests generated: 4 + +=== Upstream Dependencies === + +ref('stg_orders') + Staged orders from raw source + Columns: + order_id (INTEGER) — Primary key for orders + quantity (INTEGER) — Number of items ordered + unit_price (NUMERIC) — Price per unit in USD + +=== Column Lineage (output ← inputs) === + order_total ← stg_orders.quantity, stg_orders.unit_price + +=== YAML (paste into schema.yml) === +unit_tests: + - name: test_fct_orders_happy_path + description: Verify correct output for standard input data + model: fct_orders + given: + - input: ref('stg_orders') + rows: + - { order_id: 1, quantity: 3, unit_price: 100 } + - { order_id: 2, quantity: 1, unit_price: 50 } + expect: + rows: + - { order_id: 1, order_total: 300 } + - { order_id: 2, order_total: 50 } + # ... null_handling, edge_case, incremental tests +``` + +**Parameters:** +- `manifest_path` (required): Path to compiled `manifest.json` (run `dbt compile` first) +- `model` (required): Model name or unique_id (e.g. 
`fct_orders` or `model.project.fct_orders`) +- `dialect` (optional): SQL dialect override (auto-detected from manifest adapter_type) +- `max_scenarios` (optional, default 3): Maximum number of test scenarios to generate + +**What it generates:** +- **Scenarios:** `happy_path`, `null_handling` (for CASE/COALESCE), `edge_case` (for JOINs, window functions, division), `incremental` (for incremental models with `input: this` mock) +- **Mock data:** Type-correct values from dialect-aware type mapping (Snowflake, BigQuery, Postgres, Redshift, Databricks, DuckDB, MySQL) +- **Dependencies:** Handles `ref()` for models/seeds/snapshots, `source()` for raw tables, `format: sql` for ephemeral models +- **Context:** Returns model/column descriptions, column lineage, and compiled SQL for the LLM to refine test values + +**Skill:** `/dbt-unit-tests` — 5-phase workflow (Analyze → Generate → Refine → Validate → Write) with reference guides for YAML spec, edge-case patterns, and incremental testing. + +**Important:** The tool generates scaffold tests with type-correct placeholder values. The LLM skill layer refines expected outputs by running SQL against mock data — always review and verify before committing. + +--- + ## altimate-dbt CLI `altimate-dbt` is a standalone CLI for dbt workflows. It auto-detects your dbt project directory, Python environment, and adapter type (Snowflake, BigQuery, Databricks, Redshift, etc.). @@ -104,6 +167,27 @@ All commands provide friendly error diagnostics with actionable fix suggestions ## dbt Skills +### /dbt-unit-tests + +Automated dbt unit test generation (v1.8+). Uses `dbt_unit_test_gen` to produce scaffold YAML, then refines expected outputs by reading the compiled SQL and running it against the mock data. 
+ +```text +You: /dbt-unit-tests fct_orders + +> dbt_unit_test_gen --manifest_path target/manifest.json --model fct_orders +> altimate-dbt test --select fct_orders + +Generated 4 unit tests for fct_orders: + ✓ test_fct_orders_happy_path + ✓ test_fct_orders_null_handling + ✓ test_fct_orders_edge_case_1 (division) + ✓ test_fct_orders_incremental + +All tests passing. YAML written to models/marts/_unit_tests.yml. +``` + +Workflow: Analyze → Generate → Refine → Validate → Write. See [reference guides](https://github.com/AltimateAI/altimate-code/tree/main/.opencode/skills/dbt-unit-tests/references) for edge-case patterns and incremental testing. + ### /generate-tests Auto-generate dbt test definitions from table metadata. diff --git a/docs/docs/data-engineering/tools/index.md b/docs/docs/data-engineering/tools/index.md index 5df590cc31..246768a129 100644 --- a/docs/docs/data-engineering/tools/index.md +++ b/docs/docs/data-engineering/tools/index.md @@ -8,7 +8,7 @@ altimate has 100+ specialized tools organized by function. 
 | [Schema Tools](schema-tools.md) | 7 tools | Inspection, search, PII detection, tagging, diffing |
 | [FinOps Tools](finops-tools.md) | 8 tools | Cost analysis, warehouse sizing, unused resources, RBAC |
 | [Lineage Tools](lineage-tools.md) | 1 tool | Column-level lineage tracing with confidence scoring |
-| [dbt Tools](dbt-tools.md) | 2 tools + 5 skills | Run, manifest parsing, test generation, scaffolding, `altimate-dbt` CLI |
+| [dbt Tools](dbt-tools.md) | 3 tools + 6 skills | Run, manifest parsing, unit test generation, scaffolding, `altimate-dbt` CLI |
 | [Warehouse Tools](warehouse-tools.md) | 6 tools | Environment scanning, connection management, discovery, testing |
 | [Altimate Memory](memory-tools.md) | 3 tools | Persistent cross-session memory for warehouse config, conventions, and preferences |
 | [Training](../training/index.md) | 3 tools + 3 skills | Correct the agent once, it remembers forever, your team inherits it |
diff --git a/packages/opencode/src/altimate/native/connections/register.ts b/packages/opencode/src/altimate/native/connections/register.ts
index ef8ac86861..867ebd6d3e 100644
--- a/packages/opencode/src/altimate/native/connections/register.ts
+++ b/packages/opencode/src/altimate/native/connections/register.ts
@@ -384,6 +384,10 @@ register("schema.inspect", async (params: SchemaInspectParams): Promise
+/**
+ * Find a model node by name or unique_id.
+ * Only matches nodes where resource_type === "model".
+ */
+export function findModel(nodes: Record<string, any>, model: string): any | null {
+  if (model in nodes && nodes[model]?.resource_type === "model") return nodes[model]
+  for (const [, node] of Object.entries(nodes)) {
+    if (node.resource_type !== "model") continue
+    if (node.name === model) return node
+  }
+  return null
+}
+
+/**
+ * Get the unique_id for a model (by name or unique_id lookup).
+ * Only matches nodes where resource_type === "model".
+ */
+export function getUniqueId(nodes: Record<string, any>, model: string): string | undefined {
+  if (model in nodes && nodes[model]?.resource_type === "model") return model
+  for (const [nodeId, node] of Object.entries(nodes)) {
+    if (node.resource_type === "model" && node.name === model) return nodeId
+  }
+  return undefined
+}
+
+/**
+ * Detect SQL dialect from manifest metadata.adapter_type.
+ */
+export function detectDialect(manifest: any): string {
+  const adapter = manifest.metadata?.adapter_type || ""
+  const dialectMap: Record<string, string> = {
+    snowflake: "snowflake",
+    bigquery: "bigquery",
+    databricks: "databricks",
+    spark: "spark",
+    postgres: "postgres",
+    redshift: "redshift",
+    duckdb: "duckdb",
+    clickhouse: "clickhouse",
+    mysql: "mysql",
+    sqlserver: "tsql",
+    trino: "trino",
+  }
+  return dialectMap[adapter] || adapter || "snowflake"
+}
+
+/**
+ * Build a schema context object from upstream dependency nodes.
+ * Returns the { tables, version } format expected by core.Schema.fromJson().
+ */
+export function buildSchemaContext(
+  nodes: Record<string, any>,
+  sources: Record<string, any>,
+  upstreamIds: string[],
+): Record<string, any> | null {
+  const tables: Record<string, any> = {}
+
+  for (const uid of upstreamIds) {
+    const node = nodes[uid] || sources[uid]
+    if (!node) continue
+
+    const tableName = node.alias || node.name || ""
+    if (!tableName) continue
+
+    const columnsDict = node.columns || {}
+    if (Object.keys(columnsDict).length === 0) continue
+
+    const cols = Object.entries(columnsDict).map(([colName, col]: [string, any]) => ({
+      name: col.name || colName,
+      type: col.data_type || col.type || "",
+    }))
+
+    if (cols.length > 0) {
+      tables[tableName] = { columns: cols }
+    }
+  }
+
+  if (Object.keys(tables).length === 0) return null
+  return { tables, version: "1" }
+}
+
+/**
+ * Extract typed ModelColumn[] from a raw node's columns dict.
+ */
+export function extractColumns(columnsDict: Record<string, any>): ModelColumn[] {
+  return Object.entries(columnsDict).map(([colName, col]: [string, any]) => ({
+    name: col.name || colName,
+    data_type: col.data_type || col.type || "",
+    description: col.description || undefined,
+  }))
+}
+
+/**
+ * List model names from manifest nodes (for error messages).
+ */
+export function listModelNames(nodes: Record<string, any>): string[] {
+  return Object.values(nodes)
+    .filter((n: any) => n.resource_type === "model")
+    .map((n: any) => n.name)
+}
diff --git a/packages/opencode/src/altimate/native/dbt/lineage.ts b/packages/opencode/src/altimate/native/dbt/lineage.ts
index 8136fba1b8..49b4cf9a96 100644
--- a/packages/opencode/src/altimate/native/dbt/lineage.ts
+++ b/packages/opencode/src/altimate/native/dbt/lineage.ts
@@ -4,9 +4,9 @@
  * Ported from Python altimate_engine.dbt.lineage.
  */
-import * as fs from "fs"
 import * as core from "@altimateai/altimate-core"
 import type { DbtLineageParams, DbtLineageResult } from "../types"
+import { loadRawManifest, findModel, getUniqueId, detectDialect, buildSchemaContext } from "./helpers"
 
 /**
  * Compute column-level lineage for a dbt model.
@@ -22,17 +22,15 @@ export function dbtLineage(params: DbtLineageParams): DbtLineageResult { confidence_factors: factors, }) - if (!fs.existsSync(params.manifest_path)) { - return emptyResult(["Manifest file not found"]) - } - let manifest: any try { - const raw = fs.readFileSync(params.manifest_path, "utf-8") - manifest = JSON.parse(raw) + manifest = loadRawManifest(params.manifest_path) } catch (e) { return emptyResult([`Failed to parse manifest: ${e}`]) } + if (!manifest) { + return emptyResult(["Manifest file not found"]) + } const nodes = manifest.nodes || {} const sources = manifest.sources || {} @@ -50,10 +48,7 @@ export function dbtLineage(params: DbtLineageParams): DbtLineageResult { } // Detect dialect - let dialect = params.dialect - if (!dialect) { - dialect = detectDialect(manifest, modelNode) - } + const dialect = params.dialect || detectDialect(manifest) // Build schema context from upstream dependencies const upstreamIds: string[] = modelNode.depends_on?.nodes || [] @@ -78,70 +73,3 @@ export function dbtLineage(params: DbtLineageParams): DbtLineageResult { confidence_factors: rawLineage.error ? 
[String(rawLineage.error)] : [], } } - -function findModel(nodes: Record, model: string): any | null { - if (model in nodes) return nodes[model] - for (const [, node] of Object.entries(nodes)) { - if (node.resource_type !== "model") continue - if (node.name === model) return node - } - return null -} - -function getUniqueId(nodes: Record, model: string): string | undefined { - if (model in nodes) return model - for (const [nodeId, node] of Object.entries(nodes)) { - if (node.resource_type === "model" && node.name === model) return nodeId - } - return undefined -} - -function detectDialect(manifest: any, modelNode: any): string { - const metadata = manifest.metadata || {} - const adapter = metadata.adapter_type || "" - if (adapter) { - const dialectMap: Record = { - snowflake: "snowflake", - bigquery: "bigquery", - databricks: "databricks", - spark: "spark", - postgres: "postgres", - redshift: "redshift", - duckdb: "duckdb", - clickhouse: "clickhouse", - } - return dialectMap[adapter] || adapter - } - return "snowflake" -} - -function buildSchemaContext( - nodes: Record, - sources: Record, - upstreamIds: string[], -): Record | null { - const tables: Record = {} - - for (const uid of upstreamIds) { - const node = nodes[uid] || sources[uid] - if (!node) continue - - const tableName = node.alias || node.name || "" - if (!tableName) continue - - const columnsDict = node.columns || {} - if (Object.keys(columnsDict).length === 0) continue - - const cols = Object.entries(columnsDict).map(([colName, col]: [string, any]) => ({ - name: col.name || colName, - type: col.data_type || col.type || "", - })) - - if (cols.length > 0) { - tables[tableName] = { columns: cols } - } - } - - if (Object.keys(tables).length === 0) return null - return { tables, version: "1" } -} diff --git a/packages/opencode/src/altimate/native/dbt/manifest.ts b/packages/opencode/src/altimate/native/dbt/manifest.ts index 3680ae3f4a..9e08254cb7 100644 --- 
a/packages/opencode/src/altimate/native/dbt/manifest.ts
+++ b/packages/opencode/src/altimate/native/dbt/manifest.ts
@@ -4,7 +4,6 @@
  * Ported from Python altimate_engine.dbt.manifest.
  */
-import * as fs from "fs"
 import type {
   DbtManifestParams,
   DbtManifestResult,
@@ -13,8 +12,7 @@ import type {
   DbtTestInfo,
   ModelColumn,
 } from "../types"
-
-const LARGE_MANIFEST_BYTES = 50 * 1024 * 1024 // 50 MB
+import { loadRawManifest } from "./helpers"
 
 function extractColumns(columnsDict: Record<string, any>): ModelColumn[] {
   return Object.entries(columnsDict).map(([colName, col]) => ({
@@ -26,12 +24,17 @@ function extractColumns(columnsDict: Record<string, any>): ModelColumn[] {
 
 /**
  * Parse a dbt manifest.json and extract model, source, and node information.
+ *
+ * Uses the shared `loadRawManifest` helper which caches by path+mtime, so
+ * repeated calls (e.g. parseManifest → dbtLineage) don't re-read large files.
  */
 export async function parseManifest(params: DbtManifestParams): Promise<DbtManifestResult> {
   const emptyResult: DbtManifestResult = {
     models: [],
     sources: [],
     tests: [],
+    seeds: [],
+    snapshots: [],
     source_count: 0,
     model_count: 0,
     test_count: 0,
@@ -39,56 +42,40 @@ export async function parseManifest(params: DbtManifestParams): Promise<DbtManifestResult> {
-  let raw: string
-  try {
-    const stat = await fs.promises.stat(params.path)
-    if (stat.size > LARGE_MANIFEST_BYTES) {
-      // Log warning but continue
-    }
-    raw = await fs.promises.readFile(params.path, "utf-8")
-  } catch {
-    return emptyResult
-  }
-
   let manifest: any
   try {
-    manifest = JSON.parse(raw)
+    manifest = loadRawManifest(params.path)
   } catch {
     return emptyResult
   }
-
-  if (typeof manifest !== "object" || manifest === null) {
-    return emptyResult
-  }
+  if (!manifest) return emptyResult
 
   const nodes = manifest.nodes || {}
   const sourcesDict = manifest.sources || {}
 
   const models: DbtModelInfo[] = []
   const tests: DbtTestInfo[] = []
+  const seeds: DbtModelInfo[] = []
+  const snapshots: DbtModelInfo[] = []
   let testCount = 0
-  let snapshotCount = 0
-  let seedCount = 0
 
   for (const [nodeId, node] of Object.entries(nodes)) {
     const resourceType = node.resource_type
-    if
(resourceType === "model") { - const dependsOnNodes = node.depends_on?.nodes || [] - const columns = extractColumns(node.columns || {}) - models.push({ + if (resourceType === "model" || resourceType === "seed" || resourceType === "snapshot") { + const info: DbtModelInfo = { unique_id: nodeId, name: node.name || "", + description: node.description || undefined, schema_name: node.schema || undefined, database: node.database || undefined, materialized: node.config?.materialized || undefined, - depends_on: dependsOnNodes, - columns, - }) + depends_on: node.depends_on?.nodes || [], + columns: extractColumns(node.columns || {}), + } + if (resourceType === "model") models.push(info) + else if (resourceType === "seed") seeds.push(info) + else snapshots.push(info) } else if (resourceType === "test") { testCount++ tests.push({ @@ -96,10 +83,6 @@ export async function parseManifest(params: DbtManifestParams): Promise => { + return generateDbtUnitTests(params) +}) + } // end registerAll // Auto-register on module load diff --git a/packages/opencode/src/altimate/native/dbt/unit-tests.ts b/packages/opencode/src/altimate/native/dbt/unit-tests.ts new file mode 100644 index 0000000000..bf2e72cdec --- /dev/null +++ b/packages/opencode/src/altimate/native/dbt/unit-tests.ts @@ -0,0 +1,580 @@ +/** + * dbt unit test generator. + * + * Pipeline: + * 1. Parse manifest (reuses helpers) → model, deps, columns, descriptions + * 2. Column lineage (reuses dbtLineage) → input→output mapping + * 3. Keyword-based scenario detection → which test categories to generate + * 4. Type-correct mock data → placeholder rows for the LLM to refine + * 5. YAML assembly (via `yaml` library) → ready to paste into schema.yml + * + * The tool generates scaffold tests with type-correct placeholder values. + * The LLM skill layer refines values by reading the compiled SQL, column + * descriptions, and lineage to craft rows that target specific logic branches. 
+ */
+
+import YAML from "yaml"
+import { call as dispatcherCall } from "../dispatcher"
+import type {
+  DbtUnitTestGenParams,
+  DbtUnitTestGenResult,
+  DbtModelInfo,
+  DbtSourceInfo,
+  ModelColumn,
+  UnitTestCase,
+  UnitTestContext,
+  UnitTestMockInput,
+} from "../types"
+import { parseManifest } from "./manifest"
+import { dbtLineage } from "./lineage"
+
+// ---------------------------------------------------------------------------
+// Constants
+// ---------------------------------------------------------------------------
+
+const DEFAULT_MAX_SCENARIOS = 3
+
+/** Sample values by broad data type category. */
+const MOCK_VALUES: Record<string, unknown[]> = {
+  integer: [1, 2, 3, 0, -1],
+  float: [10.5, 25.0, 0.0, -5.5, 100.99],
+  string: ["alpha", "beta", "gamma", "", "test_value"],
+  boolean: [true, false, true],
+  date: ["2024-01-15", "2024-06-30", "2023-12-31"],
+  timestamp: ["2024-01-15 10:30:00", "2024-06-30 23:59:59", "2023-12-31 00:00:00"],
+  numeric: [100.00, 50.00, 0.00, -25.00, 999.99],
+}
+
+/**
+ * Map SQL type names (across dialects) to mock value categories.
+ * Covers Snowflake, BigQuery, Postgres, Redshift, Databricks, DuckDB, MySQL.
+ */
+const TYPE_MAP: Record<string, string> = {
+  int: "integer", integer: "integer", bigint: "integer", smallint: "integer",
+  tinyint: "integer", int64: "integer", int32: "integer",
+  number: "numeric", numeric: "numeric", decimal: "numeric",
+  float: "float", double: "float", float64: "float", real: "float",
+  varchar: "string", string: "string", text: "string", char: "string",
+  character: "string", "character varying": "string",
+  boolean: "boolean", bool: "boolean",
+  date: "date",
+  timestamp: "timestamp", timestamp_ntz: "timestamp", timestamp_ltz: "timestamp",
+  timestamp_tz: "timestamp", datetime: "timestamp",
+}
+
+// ---------------------------------------------------------------------------
+// Upstream dependency resolution (from parseManifest output, no raw manifest)
+// ---------------------------------------------------------------------------
+
+interface UpstreamDep {
+  unique_id: string
+  name: string
+  source_name?: string
+  schema_name?: string
+  database?: string
+  description?: string
+  resource_type: "model" | "source" | "seed" | "snapshot"
+  materialized?: string
+  columns: ModelColumn[]
+}
+
+function resolveUpstream(
+  upstreamIds: string[],
+  models: DbtModelInfo[],
+  sources: DbtSourceInfo[],
+  seeds: DbtModelInfo[],
+  snapshots: DbtModelInfo[],
+): UpstreamDep[] {
+  // Map each unique_id to its info + resource_type.
+  // Seeds, snapshots, and models all use ref() so they share handling.
+  const typedMap = new Map<string, { info: DbtModelInfo; kind: "model" | "seed" | "snapshot" }>()
+  for (const m of models) typedMap.set(m.unique_id, { info: m, kind: "model" })
+  for (const s of seeds) typedMap.set(s.unique_id, { info: s, kind: "seed" })
+  for (const s of snapshots) typedMap.set(s.unique_id, { info: s, kind: "snapshot" })
+  const sourceMap = new Map<string, DbtSourceInfo>(sources.map((s) => [s.unique_id, s]))
+
+  const result: UpstreamDep[] = []
+  for (const uid of upstreamIds) {
+    const entry = typedMap.get(uid)
+    if (entry) {
+      result.push({
+        unique_id: uid,
+        name: entry.info.name,
+        schema_name: entry.info.schema_name,
+        database: entry.info.database,
+        description: entry.info.description,
+        resource_type: entry.kind,
+        materialized: entry.info.materialized,
+        columns: entry.info.columns,
+      })
+      continue
+    }
+    const source = sourceMap.get(uid)
+    if (source) {
+      result.push({
+        unique_id: uid,
+        name: source.name,
+        source_name: source.source_name,
+        schema_name: source.schema_name,
+        database: source.database,
+        description: source.description,
+        resource_type: "source",
+        columns: source.columns,
+      })
+    }
+  }
+  return result
+}
+
+function depRef(dep: UpstreamDep): string {
+  // Models, seeds, and snapshots all use ref(); only sources use source()
+  return dep.resource_type === "source"
+    ? `source('${dep.source_name}', '${dep.name}')`
+    : `ref('${dep.name}')`
+}
+
+// ---------------------------------------------------------------------------
+// Column enrichment (warehouse fallback when manifest has no columns)
+// ---------------------------------------------------------------------------
+
+/**
+ * Enrich deps that have no manifest columns by querying the warehouse.
+ * Uses schema.inspect (dialect-agnostic, no dbt subprocess needed).
+ * Runs in parallel across all deps. All calls are best-effort.
+ *
+ * If both manifest and schema.inspect return nothing, the generated test
+ * will have placeholder rows. The skill layer can then call
+ * `altimate-dbt columns --model <model>` via bash to discover columns
+ * through dbt's own adapter (which handles venv/pyenv/conda resolution).
+ */
+async function enrichColumns(deps: UpstreamDep[]): Promise<void> {
+  await Promise.all(
+    deps.map(async (dep) => {
+      if (dep.materialized === "ephemeral" || dep.columns.length > 0) return
+
+      const tableName = dep.name
+      if (!tableName) return
+
+      try {
+        const r = await dispatcherCall("schema.inspect", {
+          table: tableName,
+          schema_name: dep.schema_name,
+          ...(dep.database && { database: dep.database }),
+        })
+        if (r.columns?.length) {
+          dep.columns = r.columns.map((c) => ({
+            name: c.name, data_type: c.data_type, description: undefined,
+          }))
+        }
+      } catch { /* warehouse unavailable — will use placeholder rows */ }
+    }),
+  )
+}
+
+// ---------------------------------------------------------------------------
+// Main entry point
+// ---------------------------------------------------------------------------
+
+export async function generateDbtUnitTests(
+  params: DbtUnitTestGenParams,
+): Promise<DbtUnitTestGenResult> {
+  const warnings: string[] = []
+  const antiPatterns: string[] = []
+  const maxScenarios = params.max_scenarios ?? DEFAULT_MAX_SCENARIOS
+
+  // 1. Parse manifest via existing parseManifest() — no raw manifest reading
+  const manifest = await parseManifest({ path: params.manifest_path })
+  if (manifest.model_count === 0 && manifest.source_count === 0) {
+    return failResult(params.model, "Manifest file not found or invalid. Run `dbt compile` first.")
+  }
+
+  // 2. Find model in parsed manifest
+  const model = manifest.models.find(
+    (m) => m.name === params.model || m.unique_id === params.model,
+  )
+  if (!model) {
+    return failResult(params.model, `Model '${params.model}' not found in manifest. Available models: ${manifest.models.slice(0, 10).map((m) => m.name).join(", ")}`)
+  }
+
+  // 3. Get compiled SQL + lineage via existing dbtLineage() (reads manifest once, cached)
+  const dialect = params.dialect || manifest.adapter_type || undefined
+  const lineageResult = dbtLineage({ manifest_path: params.manifest_path, model: params.model, dialect })
+  const compiledSql = lineageResult.compiled_sql || ""
+  if (!compiledSql) {
+    return failResult(model.name, "No compiled SQL found. Run `dbt compile` first, then retry.")
+  }
+
+  // 4. Extract lineage map from lineage result
+  let lineageMap: Record<string, string[]> = {}
+  if (lineageResult.confidence !== "low") {
+    lineageMap = extractLineageMap(lineageResult.raw_lineage)
+  } else {
+    warnings.push("Column lineage analysis failed — generating tests without lineage context")
+  }
+
+  // 5. Resolve upstream deps (models, sources, seeds, snapshots)
+  const upstreamDeps = resolveUpstream(
+    model.depends_on, manifest.models, manifest.sources, manifest.seeds, manifest.snapshots,
+  )
+  // Warn if any deps couldn't be resolved (e.g., unknown resource types like
+  // semantic_model.*, or deps missing from the manifest). This prevents the
+  // generated YAML from silently missing required `given` inputs.
+  const resolvedIds = new Set(upstreamDeps.map((d) => d.unique_id))
+  const unresolved = model.depends_on.filter((id) => !resolvedIds.has(id))
+  if (unresolved.length > 0) {
+    warnings.push(
+      `Could not resolve ${unresolved.length} upstream dep(s) — generated YAML may be missing mock inputs: ${unresolved.join(", ")}`,
+    )
+  }
+  const materialized = model.materialized || "view"
+
+  // 6. Enrich columns from warehouse (parallel, best-effort)
+  await enrichColumns(upstreamDeps)
+
+  // 7. Anti-patterns via existing sql.optimize
+  try {
+    // Build schema context from upstream dep columns
+    const schemaContext = buildSchemaContextFromDeps(upstreamDeps)
+    const r = await dispatcherCall("sql.optimize", { sql: compiledSql, dialect, schema_context: schemaContext ?? undefined })
+    for (const ap of r.anti_patterns || []) { if (ap.message) antiPatterns.push(ap.message) }
+  } catch { /* non-critical */ }
+
+  // 8. Detect scenarios from SQL keywords
+  const scenarios = detectScenarios(compiledSql, materialized)
+
+  // 9. Ephemeral deps
+  const ephemeralDeps = new Set<string>()
+  for (const dep of upstreamDeps) {
+    if (dep.materialized === "ephemeral") {
+      ephemeralDeps.add(dep.unique_id)
+      warnings.push(`Upstream '${dep.name}' is ephemeral — using sql format for its mock input`)
+    }
+  }
+
+  // 10. Output columns (manifest → warehouse fallback)
+  let outputColumns = model.columns
+  if (outputColumns.length === 0) {
+    try {
+      const r = await dispatcherCall("schema.inspect", { table: model.name, schema_name: model.schema_name, ...(model.database && { database: model.database }) })
+      if (r.columns?.length) outputColumns = r.columns.map((c: any) => ({ name: c.name, data_type: c.data_type, description: undefined }))
+    } catch { /* model may not be materialized yet */ }
+  }
+
+  // 11. Generate test cases
+  const tests = buildTests(model.name, upstreamDeps, scenarios, outputColumns, ephemeralDeps, maxScenarios)
+
+  // 12. YAML
+  const yaml = assembleYaml(model.name, tests)
+
+  // 13. Semantic context for LLM refinement
+  const context: UnitTestContext = {
+    model_description: model.description,
+    compiled_sql: compiledSql,
+    column_lineage: lineageMap,
+    upstream: upstreamDeps.map((d) => ({
+      name: d.name, ref: depRef(d), description: d.description, columns: d.columns,
+    })),
+    output_columns: outputColumns,
+  }
+
+  return {
+    success: true,
+    model_name: model.name,
+    model_unique_id: model.unique_id,
+    materialized,
+    dependency_count: model.depends_on.length,
+    tests,
+    yaml,
+    context,
+    anti_patterns: antiPatterns,
+    warnings,
+  }
+}
+
+/** Build schema context from resolved upstream deps (for sql.optimize).
*/
+function buildSchemaContextFromDeps(deps: UpstreamDep[]): Record<string, any> | null {
+  const tables: Record<string, any> = {}
+  for (const dep of deps) {
+    if (dep.columns.length === 0) continue
+    tables[dep.name] = {
+      columns: dep.columns.map((c) => ({ name: c.name, type: c.data_type || "" })),
+    }
+  }
+  if (Object.keys(tables).length === 0) return null
+  return { tables, version: "1" }
+}
+
+// ---------------------------------------------------------------------------
+// Lineage extraction
+// ---------------------------------------------------------------------------
+
+function extractLineageMap(rawLineage: Record<string, any>): Record<string, string[]> {
+  const map: Record<string, string[]> = {}
+  try {
+    const dict = (rawLineage.column_dict || rawLineage.columns || {}) as Record<string, any>
+    for (const [col, srcList] of Object.entries(dict)) {
+      if (Array.isArray(srcList)) {
+        map[col] = srcList.map((s: any) =>
+          `${s.source_table || s.table || "?"}.${s.source_column || s.column || "?"}`,
+        )
+      }
+    }
+  } catch { /* ignore */ }
+  return map
+}
+
+// ---------------------------------------------------------------------------
+// Scenario detection (simple keyword checks — LLM reads SQL for details)
+// ---------------------------------------------------------------------------
+
+interface Scenario {
+  category: string
+  description: string
+  mockStyle: "happy_path" | "null_edge" | "boundary"
+  rowCount: number
+}
+
+/**
+ * Detect which test scenarios to generate based on SQL keyword presence.
+ * Intentionally simple — the LLM skill layer reads the compiled SQL
+ * directly for nuanced logic analysis. This just determines the scaffold.
+ */
+function detectScenarios(sql: string, materialized: string): Scenario[] {
+  // Strip SQL comments AND string literals to avoid false positives
+  // (e.g., "-- old/code", "'2024/01/15'", "'a/b'" matching division).
+ const cleaned = sql + .replace(/--.*$/gm, "") // line comments + .replace(/\/\*[\s\S]*?\*\//g, "") // block comments + .replace(/'(?:[^'\\]|\\.|'')*'/g, "''") // single-quoted strings + .replace(/"(?:[^"\\]|\\.|"")*"/g, '""') // double-quoted identifiers/strings + const upper = cleaned.toUpperCase() + const scenarios: Scenario[] = [ + { category: "happy_path", description: "Verify correct output for standard input data", mockStyle: "happy_path", rowCount: 2 }, + ] + + if (/\bCASE\b/.test(upper) || /\bCOALESCE\b/.test(upper) || /\bNVL\b/.test(upper) || /\bIFNULL\b/.test(upper)) { + scenarios.push({ category: "null_handling", description: "Verify NULL/conditional handling", mockStyle: "null_edge", rowCount: 2 }) + } + if (/\bJOIN\b/.test(upper)) { + scenarios.push({ category: "edge_case", description: "Verify JOIN behavior with non-matching rows", mockStyle: "boundary", rowCount: 2 }) + } + if (/\bGROUP\s+BY\b/.test(upper) || /\bOVER\s*\(/.test(upper)) { + scenarios.push({ category: "edge_case", description: "Verify aggregation/window with multiple rows", mockStyle: "happy_path", rowCount: 3 }) + } + // Division detection — match `/` between two "operand-like" tokens. + // An operand is: identifier, dotted identifier (a.b), function call like + // SUM(...), CAST(... AS ...), COALESCE(...), NULLIF(...), or parenthesized + // expression. String literals are already stripped above. + // We deliberately exclude `/*` (block comment open, already stripped) and + // `//` (some dialects use it but not in compiled dbt SQL). 
+  const operand = /(?:\w+\s*\([^)]*\)|\w+(?:\.\w+)?|\([^)]*\))/.source
+  const divisionRegex = new RegExp(`${operand}\\s*\\/(?!\\*|\\/)\\s*${operand}`)
+  if (divisionRegex.test(cleaned)) {
+    scenarios.push({ category: "edge_case", description: "Verify divide-by-zero protection", mockStyle: "boundary", rowCount: 2 })
+  }
+  if (materialized === "incremental") {
+    scenarios.push({ category: "incremental", description: "Verify incremental logic processes only new rows", mockStyle: "happy_path", rowCount: 2 })
+  }
+
+  return scenarios
+}
+
+// ---------------------------------------------------------------------------
+// Mock data generation
+// ---------------------------------------------------------------------------
+
+function mockValueForType(dataType: string, rowIndex: number): unknown {
+  const normalized = (dataType || "string").toLowerCase().replace(/\(.*\)/, "").trim()
+  const values = MOCK_VALUES[TYPE_MAP[normalized] || "string"] || MOCK_VALUES.string!
+  return values[rowIndex % values.length]
+}
+
+function isKeyColumn(name: string): boolean {
+  const l = name.toLowerCase()
+  return l.endsWith("_id") || l === "id" || l.endsWith("_key") || l === "key"
+}
+
+function boundaryValue(dataType: string): unknown {
+  const cat = TYPE_MAP[(dataType || "string").toLowerCase().replace(/\(.*\)/, "").trim()] || "string"
+  const map: Record<string, unknown> = {
+    integer: 0, float: 0.0, numeric: 0.0, string: "", boolean: false,
+    date: "1970-01-01", timestamp: "1970-01-01 00:00:00",
+  }
+  return map[cat] ?? null
+}
+
+function generateRows(
+  columns: ModelColumn[],
+  rowCount: number,
+  style: "happy_path" | "null_edge" | "boundary" | "empty",
+): Record<string, unknown>[] {
+  if (style === "empty") return []
+  const rows: Record<string, unknown>[] = []
+  for (let i = 0; i < rowCount; i++) {
+    const row: Record<string, unknown> = {}
+    for (const col of columns) {
+      if (style === "null_edge" && i === rowCount - 1 && !isKeyColumn(col.name)) {
+        row[col.name] = null
+      } else if (style === "boundary" && i === rowCount - 1) {
+        row[col.name] = boundaryValue(col.data_type)
+      } else {
+        row[col.name] = mockValueForType(col.data_type, i)
+      }
+    }
+    rows.push(row)
+  }
+  return rows
+}
+
+// ---------------------------------------------------------------------------
+// Test case generation
+// ---------------------------------------------------------------------------
+
+function buildTests(
+  modelName: string,
+  deps: UpstreamDep[],
+  scenarios: Scenario[],
+  outputColumns: ModelColumn[],
+  ephemeralDeps: Set<string>,
+  maxScenarios: number,
+): UnitTestCase[] {
+  // Preserve the incremental scenario even when truncating to maxScenarios.
+  // Otherwise SQL with enough non-incremental triggers (JOIN + CASE + division)
+  // would push the incremental test out of the capped window, losing the
+  // `input: this` mock entirely for incremental models.
+  const capped = scenarios.slice(0, maxScenarios)
+  const incremental = scenarios.find((s) => s.category === "incremental")
+  if (incremental && !capped.includes(incremental)) {
+    // Replace the last non-happy-path scenario with the incremental one.
+    // Happy path is always first and must be kept.
+    capped[capped.length - 1] = incremental
+  }
+
+  return capped.map((scenario, idx) => {
+    // Build the scenario suffix first, then truncate the model-name portion
+    // so the suffix is always preserved (prevents collisions for long names).
+    const suffix = `_${scenario.category}${idx > 0 ?
`_${idx}` : ""}` + const prefix = "test_" + const maxLen = 64 + const modelBudget = maxLen - prefix.length - suffix.length + const truncatedModel = modelName.length > modelBudget ? modelName.slice(0, Math.max(1, modelBudget)) : modelName + const testName = sanitizeName(`${prefix}${truncatedModel}${suffix}`) + + const given: UnitTestMockInput[] = deps.map((dep) => { + const input = depRef(dep) + const isEphemeral = ephemeralDeps.has(dep.unique_id) + + if (dep.columns.length === 0) { + // Ephemeral models MUST use format: sql — dict format crashes dbt test + if (isEphemeral) { + return { + input, rows: [], format: "sql" as const, + sql: "SELECT 1 AS _placeholder -- REPLACE_WITH_ACTUAL_COLUMNS", + } + } + return { input, rows: [{ _placeholder: "REPLACE_WITH_ACTUAL_COLUMNS" }] } + } + + const rows = generateRows(dep.columns, scenario.rowCount, scenario.mockStyle) + + if (isEphemeral) { + return { + input, + rows: [], + format: "sql" as const, + sql: rows.map((row) => + `SELECT ${Object.entries(row).map(([k, v]) => `${formatSqlLiteral(v)} AS ${quoteIdent(k)}`).join(", ")}`, + ).join("\nUNION ALL\n"), + } + } + return { input, rows } + }) + + const expect_rows = outputColumns.length > 0 + ? generateRows(outputColumns, scenario.rowCount, scenario.mockStyle) + : [{ _note: "REPLACE — run `dbt test` to compute expected output" }] + + const test: UnitTestCase = { + name: testName, + description: scenario.description, + category: scenario.category, + target_logic: scenario.description, + given, + expect_rows, + } + + if (scenario.category === "incremental") { + test.overrides = { macros: { is_incremental: true } } + // dbt's incremental path references {{ this }} — must mock the existing target table + test.given.push({ + input: "this", + rows: outputColumns.length > 0 + ? 
generateRows(outputColumns, 1, "happy_path")
+        : [{ _placeholder: "REPLACE_WITH_EXISTING_TABLE_STATE" }],
+      })
+    }
+
+    return test
+  })
+}
+
+function sanitizeName(name: string): string {
+  return name.toLowerCase().replace(/[^a-z0-9_]/g, "_").replace(/_+/g, "_").replace(/^_|_$/g, "").slice(0, 64)
+}
+
+function formatSqlLiteral(value: unknown): string {
+  if (value === null || value === undefined) return "NULL"
+  if (typeof value === "string") return `'${value.replace(/'/g, "''")}'`
+  if (typeof value === "boolean") return value ? "TRUE" : "FALSE"
+  return String(value)
+}
+
+/**
+ * Quote a column identifier with double quotes (ANSI SQL / dbt standard).
+ * Handles reserved keywords (`select`, `order`, `group`), mixed case, and
+ * names with special characters. Escapes embedded double quotes.
+ */
+function quoteIdent(name: string): string {
+  return `"${name.replace(/"/g, '""')}"`
+}
+
+// ---------------------------------------------------------------------------
+// YAML assembly (uses `yaml` library — no hand-built string concatenation)
+// ---------------------------------------------------------------------------
+
+export function assembleYaml(modelName: string, tests: UnitTestCase[]): string {
+  const doc = {
+    unit_tests: tests.map((test) => {
+      const entry: Record<string, unknown> = {
+        name: test.name,
+        description: test.description,
+        model: modelName,
+      }
+
+      if (test.overrides) entry.overrides = test.overrides
+
+      entry.given = test.given.map((input) => {
+        if (input.format === "sql" && input.sql) {
+          return { input: input.input, format: "sql", rows: input.sql }
+        }
+        return { input: input.input, rows: input.rows }
+      })
+
+      entry.expect = { rows: test.expect_rows }
+      return entry
+    }),
+  }
+
+  return YAML.stringify(doc, { lineWidth: 0 })
+}
+
+// ---------------------------------------------------------------------------
+// Failure helper
+// ---------------------------------------------------------------------------
+
+function failResult(modelName: string,
error: string): DbtUnitTestGenResult { + return { + success: false, model_name: modelName, materialized: undefined, + dependency_count: 0, tests: [], yaml: "", anti_patterns: [], warnings: [], error, + } +} diff --git a/packages/opencode/src/altimate/native/types.ts b/packages/opencode/src/altimate/native/types.ts index 16a7f4e062..a6d8a90441 100644 --- a/packages/opencode/src/altimate/native/types.ts +++ b/packages/opencode/src/altimate/native/types.ts @@ -104,6 +104,8 @@ export interface SqlOptimizeResult { export interface SchemaInspectParams { table: string schema_name?: string + /** Database/catalog name — needed for cross-database queries (Snowflake, BigQuery) */ + database?: string warehouse?: string } @@ -172,6 +174,7 @@ export interface ModelColumn { export interface DbtModelInfo { unique_id: string name: string + description?: string schema_name?: string database?: string materialized?: string @@ -182,6 +185,7 @@ export interface DbtModelInfo { export interface DbtSourceInfo { unique_id: string name: string + description?: string source_name: string schema_name?: string database?: string @@ -198,11 +202,102 @@ export interface DbtManifestResult { models: DbtModelInfo[] sources: DbtSourceInfo[] tests: DbtTestInfo[] + /** Seeds parsed from the manifest (extracted like models for ref() resolution) */ + seeds: DbtModelInfo[] + /** Snapshots parsed from the manifest (extracted like models for ref() resolution) */ + snapshots: DbtModelInfo[] source_count: number model_count: number test_count: number snapshot_count: number seed_count: number + /** Adapter type from manifest metadata (e.g. 
"snowflake", "bigquery") */ + adapter_type?: string +} + +// --- dbt Unit Test Generation --- + +export interface DbtUnitTestGenParams { + /** Path to dbt manifest.json (must be compiled first) */ + manifest_path: string + /** Model name or unique_id to generate tests for */ + model: string + /** SQL dialect override (auto-detected from manifest if omitted) */ + dialect?: string + /** Number of test scenarios to generate (default: 3) */ + max_scenarios?: number +} + +/** A single mock input for a ref() or source() dependency */ +export interface UnitTestMockInput { + /** e.g. ref('stg_orders') or source('raw', 'orders') */ + input: string + /** Mock rows in dict format */ + rows: Record[] + /** Use sql format instead of dict (required for ephemeral models) */ + format?: "dict" | "sql" + /** Raw SQL when format is "sql" */ + sql?: string +} + +/** A single generated unit test case */ +export interface UnitTestCase { + /** Test name (snake_case, descriptive) */ + name: string + /** Human-readable description of what this test verifies */ + description: string + /** Category: happy_path, null_handling, edge_case, boundary, incremental */ + category: string + /** Which logic branch or SQL construct this test targets */ + target_logic: string + /** Mock inputs for upstream dependencies */ + given: UnitTestMockInput[] + /** Expected output rows */ + expect_rows: Record[] + /** Macro overrides (e.g., is_incremental) */ + overrides?: { + macros?: Record + vars?: Record + } +} + +/** Semantic context about the model and its lineage for LLM-assisted refinement */ +export interface UnitTestContext { + /** Model-level description from schema.yml */ + model_description?: string + /** Compiled SQL of the model under test */ + compiled_sql: string + /** Column lineage: output_col → ["input_table.input_col", ...] */ + column_lineage: Record + /** Upstream dependency context: name, description, columns with descriptions */ + upstream: Array<{ + name: string + ref: string // e.g. 
"ref('stg_orders')" or "source('raw', 'orders')" + description?: string + columns: Array<{ name: string; data_type: string; description?: string }> + }> + /** Output column descriptions */ + output_columns: Array<{ name: string; data_type: string; description?: string }> +} + +export interface DbtUnitTestGenResult { + success: boolean + model_name: string + model_unique_id?: string + materialized?: string + /** Number of upstream dependencies */ + dependency_count: number + /** Generated test cases */ + tests: UnitTestCase[] + /** Complete YAML output ready to paste into schema.yml */ + yaml: string + /** Semantic context for LLM-assisted test refinement */ + context?: UnitTestContext + /** SQL anti-patterns that informed edge case generation */ + anti_patterns: string[] + /** Warnings (e.g., missing compiled SQL, ephemeral deps) */ + warnings: string[] + error?: string } // --- Warehouse --- @@ -983,6 +1078,7 @@ export const BridgeMethods = { "dbt.run": {} as { params: DbtRunParams; result: DbtRunResult }, "dbt.manifest": {} as { params: DbtManifestParams; result: DbtManifestResult }, "dbt.lineage": {} as { params: DbtLineageParams; result: DbtLineageResult }, + "dbt.unit_test_gen": {} as { params: DbtUnitTestGenParams; result: DbtUnitTestGenResult }, "warehouse.list": {} as { params: WarehouseListParams; result: WarehouseListResult }, "warehouse.test": {} as { params: WarehouseTestParams; result: WarehouseTestResult }, "warehouse.add": {} as { params: WarehouseAddParams; result: WarehouseAddResult }, diff --git a/packages/opencode/src/altimate/prompts/builder.txt b/packages/opencode/src/altimate/prompts/builder.txt index d4a880869a..6816abbbf9 100644 --- a/packages/opencode/src/altimate/prompts/builder.txt +++ b/packages/opencode/src/altimate/prompts/builder.txt @@ -123,7 +123,8 @@ Skills are specialized workflows that compose multiple tools. 
Invoke them proact | Skill | Invoke When | |-------|-------------| | `/dbt-develop` | User wants to create, modify, or scaffold dbt models (staging, intermediate, marts, incremental). Always use for model creation. | -| `/dbt-test` | User wants to add tests (schema tests, unit tests, data quality checks). Also auto-generates edge-case tests via `altimate_core_testgen`. | +| `/dbt-test` | User wants to add schema tests (not_null, unique, relationships, accepted_values) or debug a failing test. | +| `/dbt-unit-tests` | User wants to generate dbt unit tests (v1.8+) — mock inputs + expected outputs for testing model logic. Uses `dbt_unit_test_gen` to scaffold YAML from compiled manifest. | | `/dbt-docs` | User wants to document models — column descriptions, model descriptions, doc blocks in schema.yml. | | `/dbt-troubleshoot` | Something is broken — compilation errors, runtime failures, wrong data, slow builds. Uses `altimate_core_fix` and `sql_fix` for auto-repair. | | `/dbt-analyze` | User wants to understand impact before shipping — downstream consumers, breaking changes, blast radius. Uses `dbt_lineage` for column-level analysis. | diff --git a/packages/opencode/src/altimate/tools/dbt-unit-test-gen.ts b/packages/opencode/src/altimate/tools/dbt-unit-test-gen.ts new file mode 100644 index 0000000000..6fbad97c21 --- /dev/null +++ b/packages/opencode/src/altimate/tools/dbt-unit-test-gen.ts @@ -0,0 +1,186 @@ +// altimate_change start — dbt unit test generation tool +import z from "zod" +import { Tool } from "../../tool/tool" +import { Dispatcher } from "../native" +import type { DbtUnitTestGenResult } from "../native/types" + +export const DbtUnitTestGenTool = Tool.define("dbt_unit_test_gen", { + description: + "Generate dbt unit tests for a model. Parses manifest to extract dependencies, analyzes SQL for testable logic (CASE/WHEN, NULLs, JOINs, window functions), generates type-correct mock inputs, and assembles complete YAML ready to paste into schema.yml. 
Requires a compiled manifest (run `dbt compile` first).", + parameters: z.object({ + manifest_path: z + .string() + .describe("Path to compiled dbt manifest.json (e.g. target/manifest.json)"), + model: z + .string() + .describe("Model name (e.g. 'fct_orders') or unique_id (e.g. 'model.project.fct_orders')"), + dialect: z + .string() + .optional() + .describe("SQL dialect override (auto-detected from manifest if omitted)"), + max_scenarios: z + .number() + .optional() + .describe("Maximum number of test scenarios to generate (default: 3)"), + }), + async execute(args, ctx) { + try { + const result = await Dispatcher.call("dbt.unit_test_gen", { + manifest_path: args.manifest_path, + model: args.model, + dialect: args.dialect, + max_scenarios: args.max_scenarios, + }) + + if (!result.success) { + return { + title: "Unit Test Gen: FAILED", + metadata: { + success: false, + model_name: result.model_name, + error: result.error, + }, + output: `Failed to generate unit tests: ${result.error}`, + } + } + + return { + title: `Unit Test Gen: ${result.tests.length} test(s) for ${result.model_name}`, + metadata: { + success: true, + model_name: result.model_name, + model_unique_id: result.model_unique_id, + materialized: result.materialized, + test_count: result.tests.length, + dependency_count: result.dependency_count, + anti_pattern_count: result.anti_patterns.length, + warning_count: result.warnings.length, + }, + output: formatOutput(result), + } + } catch (e) { + const msg = e instanceof Error ? 
e.message : String(e) + return { + title: "Unit Test Gen: ERROR", + metadata: { success: false, error: msg }, + output: `Failed: ${msg}`, + } + } + }, +}) + +function formatOutput(result: DbtUnitTestGenResult): string { + const lines: string[] = [] + + // Summary + lines.push("=== Unit Test Generation Summary ===") + lines.push(`Model: ${result.model_name}`) + if (result.context?.model_description) { + lines.push(`Description: ${result.context.model_description}`) + } + if (result.materialized) lines.push(`Materialization: ${result.materialized}`) + lines.push(`Upstream dependencies: ${result.dependency_count}`) + lines.push(`Tests generated: ${result.tests.length}`) + + // Semantic context — helps the LLM refine test values + if (result.context) { + const ctx = result.context + + // Upstream dependency context with descriptions + if (ctx.upstream.length > 0) { + lines.push("") + lines.push("=== Upstream Dependencies ===") + for (const up of ctx.upstream) { + lines.push(`\n${up.ref}`) + if (up.description) lines.push(` ${up.description}`) + const described = up.columns.filter((c) => c.description) + if (described.length > 0) { + lines.push(" Columns:") + for (const col of up.columns) { + const desc = col.description ? 
` — ${col.description}` : "" + lines.push(` ${col.name} (${col.data_type || "?"})${desc}`) + } + } else if (up.columns.length > 0) { + lines.push(` Columns: ${up.columns.map((c) => `${c.name} (${c.data_type || "?"})`).join(", ")}`) + } + } + } + + // Column lineage — which inputs drive which outputs + const lineageEntries = Object.entries(ctx.column_lineage) + if (lineageEntries.length > 0) { + lines.push("") + lines.push("=== Column Lineage (output ← inputs) ===") + for (const [outputCol, sources] of lineageEntries) { + lines.push(` ${outputCol} ← ${sources.join(", ")}`) + } + } + + // Output column descriptions + const describedOutputs = ctx.output_columns.filter((c) => c.description) + if (describedOutputs.length > 0) { + lines.push("") + lines.push("=== Output Columns ===") + for (const col of ctx.output_columns) { + const desc = col.description ? ` — ${col.description}` : "" + lines.push(` ${col.name} (${col.data_type || "?"})${desc}`) + } + } + } + + // Warnings + if (result.warnings.length > 0) { + lines.push("") + lines.push("=== Warnings ===") + for (const w of result.warnings) { + lines.push(`- ${w}`) + } + } + + // Anti-patterns that informed test generation + if (result.anti_patterns.length > 0) { + lines.push("") + lines.push("=== Anti-patterns detected (edge cases generated) ===") + for (const ap of result.anti_patterns) { + lines.push(`- ${ap}`) + } + } + + // Test case descriptions + lines.push("") + lines.push("=== Generated Test Cases ===") + for (const test of result.tests) { + lines.push(`\n--- ${test.name} [${test.category}] ---`) + lines.push(` ${test.description}`) + lines.push(` Target: ${test.target_logic}`) + lines.push(` Inputs: ${test.given.length} upstream ref(s)`) + lines.push(` Expected rows: ${test.expect_rows.length}`) + if (test.overrides) { + if (test.overrides.macros) { + lines.push(` Macro overrides: ${JSON.stringify(test.overrides.macros)}`) + } + if (test.overrides.vars) { + lines.push(` Var overrides: 
${JSON.stringify(test.overrides.vars)}`) + } + } + } + + // YAML output + lines.push("") + lines.push("=== YAML (paste into schema.yml or _unit_tests.yml) ===") + lines.push("") + lines.push(result.yaml) + + // Next steps + lines.push("") + lines.push("=== Next Steps ===") + lines.push("1. Review the generated YAML — adjust expected output values if needed") + lines.push("2. The expected outputs are placeholder values based on column types") + lines.push("3. For accurate expected outputs, run the model SQL against the mock data:") + lines.push(" altimate-dbt test --model <model>") + lines.push("4. If tests fail, use the error message to fix expected values") + lines.push("5. Add the YAML to your schema.yml or a dedicated _unit_tests.yml file") + + return lines.join("\n") +} +// altimate_change end diff --git a/packages/opencode/src/tool/registry.ts b/packages/opencode/src/tool/registry.ts index 075291248f..1a19066168 100644 --- a/packages/opencode/src/tool/registry.ts +++ b/packages/opencode/src/tool/registry.ts @@ -47,6 +47,9 @@ import { WarehouseDiscoverTool } from "../altimate/tools/warehouse-discover" import { McpDiscoverTool } from "../altimate/tools/mcp-discover" import { DbtManifestTool } from "../altimate/tools/dbt-manifest" +// altimate_change start - import dbt unit test generation tool +import { DbtUnitTestGenTool } from "../altimate/tools/dbt-unit-test-gen" +// altimate_change end import { DbtProfilesTool } from "../altimate/tools/dbt-profiles" import { DbtLineageTool } from "../altimate/tools/dbt-lineage" import { SchemaIndexTool } from "../altimate/tools/schema-index" @@ -223,6 +226,9 @@ export namespace ToolRegistry { // altimate_change end DbtManifestTool, + // altimate_change start - register dbt unit test generation tool + DbtUnitTestGenTool, + // altimate_change end DbtProfilesTool, DbtLineageTool, SchemaIndexTool, diff --git a/packages/opencode/test/altimate/dbt-unit-test-gen.test.ts b/packages/opencode/test/altimate/dbt-unit-test-gen.test.ts new file 
mode 100644 index 0000000000..b711b722ba --- /dev/null +++ b/packages/opencode/test/altimate/dbt-unit-test-gen.test.ts @@ -0,0 +1,588 @@ +import { describe, test, expect } from "bun:test" +import fs from "fs" +import path from "path" +import YAML from "yaml" +import { tmpdir } from "../fixture/fixture" +import { generateDbtUnitTests, assembleYaml } from "../../src/altimate/native/dbt/unit-tests" +import type { UnitTestCase } from "../../src/altimate/native/types" + +// --------------------------------------------------------------------------- +// Helpers — each test uses `await using tmp = await tmpdir()` for its own +// disposable tmpdir. No suite-level state. +// --------------------------------------------------------------------------- + +/** Write a manifest JSON into the given tmp dir and return its absolute path. */ +function writeManifestTo(dirPath: string, content: object | string): string { + const p = path.join(dirPath, "manifest.json") + fs.writeFileSync(p, typeof content === "string" ? content : JSON.stringify(content)) + return p +} + +function makeManifest(overrides?: { + modelName?: string; materialized?: string; compiledSql?: string + modelColumns?: Record<string, { name: string; data_type: string; description?: string }>; upstreamName?: string + upstreamColumns?: Record<string, { name: string; data_type: string; description?: string }>; upstreamMaterialized?: string + adapterType?: string; sources?: Record<string, unknown> +}) { + const o = overrides || {} + const modelName = o.modelName ?? "fct_orders" + const upstreamName = o.upstreamName ?? "stg_orders" + const proj = "my_project" + return { + metadata: { dbt_version: "1.8.0", adapter_type: o.adapterType ?? "snowflake" }, + nodes: { + [`model.${proj}.${modelName}`]: { + resource_type: "model", name: modelName, schema: "analytics", + config: { materialized: o.materialized ?? "table" }, + depends_on: { nodes: [`model.${proj}.${upstreamName}`] }, + columns: o.modelColumns ?? { + order_id: { name: "order_id", data_type: "INTEGER" }, + order_total: { name: "order_total", data_type: "NUMERIC" }, + }, + compiled_code: o.compiledSql ?? 
`SELECT order_id, quantity * unit_price AS order_total FROM ${upstreamName}`, + }, + [`model.${proj}.${upstreamName}`]: { + resource_type: "model", name: upstreamName, schema: "staging", + config: { materialized: o.upstreamMaterialized ?? "view" }, + depends_on: { nodes: [] }, + columns: o.upstreamColumns ?? { + order_id: { name: "order_id", data_type: "INTEGER" }, + quantity: { name: "quantity", data_type: "INTEGER" }, + unit_price: { name: "unit_price", data_type: "NUMERIC" }, + }, + }, + }, + sources: o.sources ?? {}, + } +} + +// --------------------------------------------------------------------------- +// generateDbtUnitTests +// --------------------------------------------------------------------------- + +describe("generateDbtUnitTests", () => { + test("returns error when manifest file does not exist", async () => { + const r = await generateDbtUnitTests({ manifest_path: "/tmp/nonexistent.json", model: "fct_orders" }) + expect(r.success).toBe(false) + expect(r.error).toContain("Manifest file not found") + }) + + test("returns error when model not found", async () => { + await using tmp = await tmpdir() + const r = await generateDbtUnitTests({ manifest_path: writeManifestTo(tmp.path, makeManifest()), model: "nope" }) + expect(r.success).toBe(false) + expect(r.error).toContain("not found in manifest") + }) + + test("returns error when compiled SQL is missing", async () => { + await using tmp = await tmpdir() + const m = makeManifest({ compiledSql: "" }) + const key = Object.keys(m.nodes).find((k) => k.includes("fct_orders"))! 
+ ;(m.nodes as any)[key].compiled_code = "" + const r = await generateDbtUnitTests({ manifest_path: writeManifestTo(tmp.path, m), model: "fct_orders" }) + expect(r.success).toBe(false) + expect(r.error).toContain("No compiled SQL found") + }) + + test("generates happy path test for simple model", async () => { + await using tmp = await tmpdir() + const r = await generateDbtUnitTests({ manifest_path: writeManifestTo(tmp.path, makeManifest()), model: "fct_orders" }) + expect(r.success).toBe(true) + expect(r.model_name).toBe("fct_orders") + expect(r.materialized).toBe("table") + expect(r.dependency_count).toBe(1) + expect(r.tests.length).toBeGreaterThanOrEqual(1) + expect(r.tests[0].category).toBe("happy_path") + expect(r.tests[0].given.length).toBe(1) + expect(r.tests[0].given[0].input).toBe("ref('stg_orders')") + expect(r.tests[0].given[0].rows.length).toBeGreaterThan(0) + }) + + test("YAML output is valid and parseable", async () => { + await using tmp = await tmpdir() + const r = await generateDbtUnitTests({ manifest_path: writeManifestTo(tmp.path, makeManifest()), model: "fct_orders" }) + expect(r.yaml).toBeTruthy() + // Round-trip: parse the generated YAML and verify structure + const parsed = YAML.parse(r.yaml) + expect(parsed.unit_tests).toBeArray() + expect(parsed.unit_tests[0].name).toContain("fct_orders") + expect(parsed.unit_tests[0].model).toBe("fct_orders") + expect(parsed.unit_tests[0].given).toBeArray() + expect(parsed.unit_tests[0].expect.rows).toBeArray() + }) + + test("detects CASE/WHEN and generates null_handling test", async () => { + await using tmp = await tmpdir() + const r = await generateDbtUnitTests({ + manifest_path: writeManifestTo(tmp.path, makeManifest({ + compiledSql: `SELECT order_id, CASE WHEN status = 'done' THEN amount ELSE 0 END AS net FROM stg_orders`, + })), + model: "fct_orders", + }) + expect(r.success).toBe(true) + expect(r.tests.length).toBeGreaterThan(1) + expect(r.tests.map((t) => t.category)).toContain("null_handling") + 
}) + + test("detects division and generates boundary test", async () => { + await using tmp = await tmpdir() + const r = await generateDbtUnitTests({ + manifest_path: writeManifestTo(tmp.path, makeManifest({ + compiledSql: `SELECT order_id, amount / quantity AS unit_price FROM stg_orders`, + })), + model: "fct_orders", + }) + expect(r.success).toBe(true) + expect(r.tests.map((t) => t.category)).toContain("edge_case") + }) + + test("generates incremental test with input: this mock", async () => { + await using tmp = await tmpdir() + const r = await generateDbtUnitTests({ + manifest_path: writeManifestTo(tmp.path, makeManifest({ materialized: "incremental" })), + model: "fct_orders", + max_scenarios: 5, + }) + expect(r.success).toBe(true) + const inc = r.tests.find((t) => t.category === "incremental") + expect(inc).toBeDefined() + expect(inc!.overrides?.macros?.is_incremental).toBe(true) + // Must include input: this for existing table state + const thisInput = inc!.given.find((g) => g.input === "this") + expect(thisInput).toBeDefined() + expect(thisInput!.rows.length).toBeGreaterThan(0) + }) + + test("ephemeral deps with no columns use sql format, not dict", async () => { + await using tmp = await tmpdir() + const r = await generateDbtUnitTests({ + manifest_path: writeManifestTo(tmp.path, makeManifest({ + upstreamMaterialized: "ephemeral", + upstreamColumns: {}, // no columns known + })), + model: "fct_orders", + }) + expect(r.success).toBe(true) + // The ephemeral dep should use sql format even with no columns + const ephInput = r.tests[0].given.find((g) => g.format === "sql") + expect(ephInput).toBeDefined() + expect(ephInput!.sql).toBeDefined() + }) + + test("resolves seed dependencies via ref()", async () => { + await using tmp = await tmpdir() + const m = makeManifest() + const key = Object.keys(m.nodes).find((k) => k.includes("fct_orders"))! 
+ ;(m.nodes as any)[key].depends_on.nodes = ["seed.my_project.country_codes"] + ;(m.nodes as any)["seed.my_project.country_codes"] = { + resource_type: "seed", + name: "country_codes", + schema: "seeds", + config: { materialized: "seed" }, + depends_on: { nodes: [] }, + columns: { code: { name: "code", data_type: "VARCHAR" }, name: { name: "name", data_type: "VARCHAR" } }, + } + const r = await generateDbtUnitTests({ manifest_path: writeManifestTo(tmp.path, m), model: "fct_orders" }) + expect(r.success).toBe(true) + expect(r.dependency_count).toBe(1) + // Seed should resolve as ref(), not source() + expect(r.tests[0].given[0].input).toBe("ref('country_codes')") + expect(r.tests[0].given[0].rows.length).toBeGreaterThan(0) + }) + + test("warns when upstream deps cannot be resolved", async () => { + await using tmp = await tmpdir() + const m = makeManifest() + const key = Object.keys(m.nodes).find((k) => k.includes("fct_orders"))! + // Add an unresolvable dep — semantic_model.* is a real dbt resource type + // that parseManifest doesn't extract (and we don't support) + ;(m.nodes as any)[key].depends_on.nodes.push("semantic_model.my_project.orders_sm") + const r = await generateDbtUnitTests({ manifest_path: writeManifestTo(tmp.path, m), model: "fct_orders" }) + expect(r.success).toBe(true) + expect(r.warnings.some((w) => w.includes("Could not resolve") && w.includes("semantic_model"))).toBe(true) + }) + + test("resolves snapshot dependencies via ref()", async () => { + await using tmp = await tmpdir() + const m = makeManifest() + const key = Object.keys(m.nodes).find((k) => k.includes("fct_orders"))! 
+ ;(m.nodes as any)[key].depends_on.nodes = ["snapshot.my_project.orders_snapshot"] + ;(m.nodes as any)["snapshot.my_project.orders_snapshot"] = { + resource_type: "snapshot", + name: "orders_snapshot", + schema: "snapshots", + config: { materialized: "snapshot" }, + depends_on: { nodes: [] }, + columns: { order_id: { name: "order_id", data_type: "INTEGER" }, status: { name: "status", data_type: "VARCHAR" } }, + } + const r = await generateDbtUnitTests({ manifest_path: writeManifestTo(tmp.path, m), model: "fct_orders" }) + expect(r.success).toBe(true) + expect(r.dependency_count).toBe(1) + expect(r.tests[0].given[0].input).toBe("ref('orders_snapshot')") + expect(r.tests[0].given[0].rows.length).toBeGreaterThan(0) + }) + + test("long model names preserve scenario suffix (no truncation collision)", async () => { + await using tmp = await tmpdir() + // 70-char model name — longer than 64-char test name limit + const longName = "fct_this_is_a_very_long_model_name_that_will_definitely_exceed_limits" + const r = await generateDbtUnitTests({ + manifest_path: writeManifestTo(tmp.path, makeManifest({ + modelName: longName, + compiledSql: `SELECT order_id, CASE WHEN x=1 THEN 'a' END, a/b FROM stg_orders`, + })), + model: longName, + max_scenarios: 5, + }) + expect(r.success).toBe(true) + // All test names should be unique — no collisions from truncation + const names = r.tests.map((t) => t.name) + expect(new Set(names).size).toBe(names.length) + // Scenario suffixes should be preserved + expect(names.some((n) => n.endsWith("_happy_path"))).toBe(true) + expect(names.some((n) => n.includes("null_handling") || n.includes("edge_case"))).toBe(true) + }) + + test("division in string literals does not trigger boundary scenario", async () => { + await using tmp = await tmpdir() + // The SQL has '/' only inside a string literal — should NOT trigger division edge case + const r = await generateDbtUnitTests({ + manifest_path: writeManifestTo(tmp.path, makeManifest({ + compiledSql: 
`SELECT order_id, '2024/01/15' AS date_str FROM stg_orders`, + })), + model: "fct_orders", + max_scenarios: 5, + }) + expect(r.success).toBe(true) + // Only happy_path should be generated (no division → no boundary test) + expect(r.tests.length).toBe(1) + expect(r.tests[0].category).toBe("happy_path") + }) + + test("test names are deterministic across runs", async () => { + await using tmp = await tmpdir() + const manifestPath = writeManifestTo(tmp.path, makeManifest({ + compiledSql: `SELECT order_id, CASE WHEN x=1 THEN 'a' ELSE 'b' END, a/b FROM stg_orders`, + })) + const r1 = await generateDbtUnitTests({ manifest_path: manifestPath, model: "fct_orders", max_scenarios: 5 }) + const r2 = await generateDbtUnitTests({ manifest_path: manifestPath, model: "fct_orders", max_scenarios: 5 }) + expect(r1.tests.map((t) => t.name)).toEqual(r2.tests.map((t) => t.name)) + expect(r1.yaml).toEqual(r2.yaml) + }) + + test("uses sql format for ephemeral upstream models", async () => { + await using tmp = await tmpdir() + const r = await generateDbtUnitTests({ + manifest_path: writeManifestTo(tmp.path, makeManifest({ upstreamMaterialized: "ephemeral" })), + model: "fct_orders", + }) + expect(r.success).toBe(true) + expect(r.warnings.some((w) => w.includes("ephemeral"))).toBe(true) + const sqlInput = r.tests[0].given.find((g) => g.format === "sql") + expect(sqlInput).toBeDefined() + expect(sqlInput!.sql).toContain("SELECT") + }) + + test("handles source() dependencies", async () => { + await using tmp = await tmpdir() + const m = makeManifest() + const key = Object.keys(m.nodes).find((k) => k.includes("fct_orders"))! 
+ ;(m.nodes as any)[key].depends_on.nodes = ["source.my_project.raw.orders"] + m.sources = { + "source.my_project.raw.orders": { + name: "orders", source_name: "raw", resource_type: "source", schema: "raw_data", + columns: { order_id: { name: "order_id", data_type: "INTEGER" } }, + }, + } + const r = await generateDbtUnitTests({ manifest_path: writeManifestTo(tmp.path, m), model: "fct_orders" }) + expect(r.success).toBe(true) + expect(r.tests[0].given.find((g) => g.input.includes("source("))).toBeDefined() + }) + + test("respects max_scenarios", async () => { + await using tmp = await tmpdir() + const r = await generateDbtUnitTests({ + manifest_path: writeManifestTo(tmp.path, makeManifest({ + compiledSql: `SELECT order_id, CASE WHEN x=1 THEN 'a' ELSE 'b' END, amount/qty FROM stg_orders`, + })), + model: "fct_orders", + max_scenarios: 2, + }) + expect(r.tests.length).toBeLessThanOrEqual(2) + }) + + test("handles multiple upstream dependencies", async () => { + await using tmp = await tmpdir() + const m = makeManifest() + const key = Object.keys(m.nodes).find((k) => k.includes("fct_orders"))! 
+ ;(m.nodes as any)[key].depends_on.nodes.push("model.my_project.dim_customers") + ;(m.nodes as any)["model.my_project.dim_customers"] = { + resource_type: "model", name: "dim_customers", + config: { materialized: "table" }, depends_on: { nodes: [] }, + columns: { customer_id: { name: "customer_id", data_type: "INTEGER" } }, + } + const r = await generateDbtUnitTests({ manifest_path: writeManifestTo(tmp.path, m), model: "fct_orders" }) + expect(r.dependency_count).toBe(2) + expect(r.tests[0].given.length).toBe(2) + }) + + test("model lookup by unique_id works", async () => { + await using tmp = await tmpdir() + const r = await generateDbtUnitTests({ manifest_path: writeManifestTo(tmp.path, makeManifest()), model: "model.my_project.fct_orders" }) + expect(r.success).toBe(true) + expect(r.model_name).toBe("fct_orders") + }) + + test("handles invalid JSON manifest", async () => { + await using tmp = await tmpdir() + const r = await generateDbtUnitTests({ manifest_path: writeManifestTo(tmp.path, "{{not json}}"), model: "fct_orders" }) + expect(r.success).toBe(false) + }) + + test("test names are valid identifiers and unique", async () => { + await using tmp = await tmpdir() + const r = await generateDbtUnitTests({ + manifest_path: writeManifestTo(tmp.path, makeManifest({ + compiledSql: `SELECT order_id, CASE WHEN x=1 THEN 'a' ELSE 'b' END, a/b FROM stg_orders`, + })), + model: "fct_orders", + max_scenarios: 5, + }) + for (const t of r.tests) { + expect(t.name).toMatch(/^[a-z_][a-z0-9_]*$/) + expect(t.name.length).toBeLessThanOrEqual(64) + } + expect(new Set(r.tests.map((t) => t.name)).size).toBe(r.tests.length) + }) +}) + +// --------------------------------------------------------------------------- +// assembleYaml — round-trip YAML validation +// --------------------------------------------------------------------------- + +describe("assembleYaml", () => { + test("produces parseable YAML with correct structure", () => { + const tests: UnitTestCase[] = [{ + name: 
"test_happy", description: "Happy path", category: "happy_path", + target_logic: "arithmetic", + given: [{ input: "ref('stg_orders')", rows: [{ order_id: 1, qty: 3, price: 10.0 }] }], + expect_rows: [{ order_id: 1, total: 30.0 }], + }] + const yaml = assembleYaml("fct_orders", tests) + const parsed = YAML.parse(yaml) + expect(parsed.unit_tests).toHaveLength(1) + expect(parsed.unit_tests[0].name).toBe("test_happy") + expect(parsed.unit_tests[0].model).toBe("fct_orders") + expect(parsed.unit_tests[0].given[0].input).toBe("ref('stg_orders')") + expect(parsed.unit_tests[0].given[0].rows[0].order_id).toBe(1) + expect(parsed.unit_tests[0].expect.rows[0].total).toBe(30.0) + }) + + test("handles ephemeral sql format", () => { + const tests: UnitTestCase[] = [{ + name: "test_eph", description: "Ephemeral", category: "happy_path", + target_logic: "passthrough", + given: [{ + input: "ref('eph')", rows: [], format: "sql", + sql: "SELECT 1 AS id, 'test' AS name\nUNION ALL\nSELECT 2 AS id, 'other' AS name", + }], + expect_rows: [{ id: 1 }], + }] + const yaml = assembleYaml("my_model", tests) + const parsed = YAML.parse(yaml) + expect(parsed.unit_tests[0].given[0].format).toBe("sql") + expect(parsed.unit_tests[0].given[0].rows).toContain("SELECT 1 AS id") + }) + + test("handles macro overrides for incremental", () => { + const tests: UnitTestCase[] = [{ + name: "test_inc", description: "Incremental", category: "incremental", + target_logic: "incremental", + given: [{ input: "ref('src')", rows: [{ id: 1 }] }], + expect_rows: [{ id: 1 }], + overrides: { macros: { is_incremental: true } }, + }] + const yaml = assembleYaml("fct", tests) + const parsed = YAML.parse(yaml) + expect(parsed.unit_tests[0].overrides.macros.is_incremental).toBe(true) + }) + + test("handles null values", () => { + const tests: UnitTestCase[] = [{ + name: "test_null", description: "Nulls", category: "null_handling", + target_logic: "COALESCE", + given: [{ input: "ref('src')", rows: [{ id: 1, discount: null }] 
}], + expect_rows: [{ id: 1, net: 100.0 }], + }] + const yaml = assembleYaml("fct", tests) + const parsed = YAML.parse(yaml) + expect(parsed.unit_tests[0].given[0].rows[0].discount).toBeNull() + }) + + test("handles date strings correctly", () => { + const tests: UnitTestCase[] = [{ + name: "test_date", description: "Dates", category: "happy_path", + target_logic: "date", + given: [{ input: "ref('src')", rows: [{ id: 1, dt: "2024-01-15", ts: "2024-01-15 10:30:00" }] }], + expect_rows: [{ id: 1, dt: "2024-01-15" }], + }] + const yaml = assembleYaml("fct", tests) + const parsed = YAML.parse(yaml) + expect(parsed.unit_tests[0].given[0].rows[0].dt).toBe("2024-01-15") + expect(parsed.unit_tests[0].given[0].rows[0].ts).toBe("2024-01-15 10:30:00") + }) + + test("multiple tests produce valid YAML", () => { + const tests: UnitTestCase[] = [ + { name: "test_a", description: "A", category: "happy_path", target_logic: "x", + given: [{ input: "ref('s')", rows: [{ id: 1 }] }], expect_rows: [{ id: 1 }] }, + { name: "test_b", description: "B", category: "edge_case", target_logic: "y", + given: [{ input: "ref('s')", rows: [{ id: 0 }] }], expect_rows: [{ id: 0 }] }, + ] + const yaml = assembleYaml("m", tests) + const parsed = YAML.parse(yaml) + expect(parsed.unit_tests).toHaveLength(2) + }) + + test("empty tests array", () => { + const yaml = assembleYaml("m", []) + const parsed = YAML.parse(yaml) + expect(parsed.unit_tests).toEqual([]) + }) + + test("booleans and var overrides", () => { + const tests: UnitTestCase[] = [{ + name: "test_vars", description: "Vars", category: "happy_path", target_logic: "x", + given: [{ input: "ref('s')", rows: [{ id: 1, active: true, deleted: false }] }], + expect_rows: [{ id: 1 }], + overrides: { vars: { run_date: "2024-01-15", lookback: 30 } }, + }] + const yaml = assembleYaml("m", tests) + const parsed = YAML.parse(yaml) + expect(parsed.unit_tests[0].given[0].rows[0].active).toBe(true) + 
expect(parsed.unit_tests[0].given[0].rows[0].deleted).toBe(false) + expect(parsed.unit_tests[0].overrides.vars.run_date).toBe("2024-01-15") + expect(parsed.unit_tests[0].overrides.vars.lookback).toBe(30) + }) +}) + +// --------------------------------------------------------------------------- +// Context: descriptions and lineage +// --------------------------------------------------------------------------- + +describe("context: descriptions and lineage", () => { + test("includes model description", async () => { + await using tmp = await tmpdir() + const m = makeManifest() + const key = Object.keys(m.nodes).find((k) => k.includes("fct_orders"))! + ;(m.nodes as any)[key].description = "Daily order totals" + const r = await generateDbtUnitTests({ manifest_path: writeManifestTo(tmp.path, m), model: "fct_orders" }) + expect(r.context?.model_description).toBe("Daily order totals") + }) + + test("includes upstream descriptions", async () => { + await using tmp = await tmpdir() + const m = makeManifest() + const key = Object.keys(m.nodes).find((k) => k.includes("stg_orders"))! 
+ ;(m.nodes as any)[key].description = "Staged orders" + const r = await generateDbtUnitTests({ manifest_path: writeManifestTo(tmp.path, m), model: "fct_orders" }) + expect(r.context?.upstream[0].description).toBe("Staged orders") + expect(r.context?.upstream[0].ref).toBe("ref('stg_orders')") + }) + + test("includes column descriptions", async () => { + await using tmp = await tmpdir() + const r = await generateDbtUnitTests({ + manifest_path: writeManifestTo(tmp.path, makeManifest({ + upstreamColumns: { + order_id: { name: "order_id", data_type: "INTEGER", description: "PK" }, + unit_price: { name: "unit_price", data_type: "NUMERIC", description: "USD price" }, + }, + modelColumns: { + order_id: { name: "order_id", data_type: "INTEGER" }, + order_total: { name: "order_total", data_type: "NUMERIC", description: "qty * price" }, + }, + })), + model: "fct_orders", + }) + expect(r.context?.upstream[0].columns.find((c) => c.name === "unit_price")?.description).toBe("USD price") + expect(r.context?.output_columns.find((c) => c.name === "order_total")?.description).toBe("qty * price") + }) + + test("includes compiled SQL", async () => { + await using tmp = await tmpdir() + const sql = "SELECT order_id, quantity * unit_price AS order_total FROM stg_orders" + const r = await generateDbtUnitTests({ + manifest_path: writeManifestTo(tmp.path, makeManifest({ compiledSql: sql })), + model: "fct_orders", + }) + expect(r.context?.compiled_sql).toBe(sql) + }) + + test("source deps use source() ref format", async () => { + await using tmp = await tmpdir() + const m = makeManifest() + const key = Object.keys(m.nodes).find((k) => k.includes("fct_orders"))! 
+    ;(m.nodes as any)[key].depends_on.nodes = ["source.my_project.raw.orders"]
+    m.sources = {
+      "source.my_project.raw.orders": {
+        name: "orders", source_name: "raw", resource_type: "source",
+        description: "Raw Shopify orders", schema: "raw_data",
+        columns: { order_id: { name: "order_id", data_type: "INTEGER" } },
+      },
+    }
+    const r = await generateDbtUnitTests({ manifest_path: writeManifestTo(tmp.path, m), model: "fct_orders" })
+    expect(r.context?.upstream[0].ref).toBe("source('raw', 'orders')")
+    expect(r.context?.upstream[0].description).toBe("Raw Shopify orders")
+  })
+})
+
+// ---------------------------------------------------------------------------
+// Mock data type handling
+// ---------------------------------------------------------------------------
+
+describe("mock data type handling", () => {
+  test("generates correct types for various columns", async () => {
+    await using tmp = await tmpdir()
+    const r = await generateDbtUnitTests({
+      manifest_path: writeManifestTo(tmp.path, makeManifest({
+        upstreamColumns: {
+          id: { name: "id", data_type: "INTEGER" },
+          name: { name: "name", data_type: "VARCHAR" },
+          active: { name: "active", data_type: "BOOLEAN" },
+          dt: { name: "dt", data_type: "DATE" },
+          ts: { name: "ts", data_type: "TIMESTAMP" },
+          score: { name: "score", data_type: "FLOAT" },
+        },
+      })),
+      model: "fct_orders",
+    })
+    const row = r.tests[0].given[0].rows[0]
+    expect(typeof row.id).toBe("number")
+    expect(Number.isInteger(row.id)).toBe(true)
+    expect(typeof row.name).toBe("string")
+    expect(typeof row.active).toBe("boolean")
+    expect(typeof row.dt).toBe("string")
+    expect(typeof row.ts).toBe("string")
+    // score is declared FLOAT above; assert it too so the fixture is fully covered
+    expect(typeof row.score).toBe("number")
+  })
+
+  test("null_edge scenario has nulls in non-key columns", async () => {
+    await using tmp = await tmpdir()
+    const r = await generateDbtUnitTests({
+      manifest_path: writeManifestTo(tmp.path, makeManifest({
+        compiledSql: `SELECT order_id, COALESCE(discount, 0) AS d FROM stg_orders`,
+        upstreamColumns: {
+          order_id: { name: "order_id",
data_type: "INTEGER" },
+          discount: { name: "discount", data_type: "NUMERIC" },
+        },
+      })),
+      model: "fct_orders",
+    })
+    const nullTest = r.tests.find((t) => t.category === "null_handling")
+    // COALESCE in the compiled SQL should yield a null_handling test; fail loudly if missing
+    expect(nullTest).toBeDefined()
+    const lastRow = nullTest!.given[0].rows[nullTest!.given[0].rows.length - 1]
+    expect(lastRow.discount).toBeNull()
+    expect(lastRow.order_id).not.toBeNull()
+  })
+})
diff --git a/packages/opencode/test/mcp/oauth-browser.test.ts b/packages/opencode/test/mcp/oauth-browser.test.ts
index 58644776fa..669523600e 100644
--- a/packages/opencode/test/mcp/oauth-browser.test.ts
+++ b/packages/opencode/test/mcp/oauth-browser.test.ts
@@ -188,8 +188,16 @@ test("BrowserOpenFailed event is NOT published when open() succeeds", async () =
   // Run authenticate with a timeout to avoid waiting forever for the callback
   const authPromise = MCP.authenticate("test-oauth-server-2").catch(() => undefined)

-  // The source code waits 500ms to detect browser-open failures.
-  // Allow enough time for that plus event propagation.
+  // Poll for open() to be called (instead of a fixed sleep, which is flaky
+  // on slow CI runners). The source code waits up to 500ms after calling
+  // open() to detect browser-open failures, so we need open() to be called
+  // AND the detection window to complete before asserting.
+  const pollStart = Date.now()
+  while (openCalledWith === undefined && Date.now() - pollStart < 5000) {
+    await new Promise((resolve) => setTimeout(resolve, 20))
+  }
+  // Let the full 500ms detection window elapse, plus a small propagation
+  // buffer, so any BrowserOpenFailed event will have fired before we assert.
+  await new Promise((resolve) => setTimeout(resolve, 600))

   // Stop the callback server and cancel any pending auth