feat(ai): AI Testing Framework — consolidation staging branch [6/8 → master] by ianwhitedeveloper · Pull Request #411 · paralleldrive/riteway

ianwhitedeveloper · 2026-02-19T15:45:20Z

Context

This is the staging branch for the structured consolidation of draft PR #394 — the Riteway AI Testing Framework. Per Eric's consolidation request, PR #394 (80+ commits, 104 files, ~21K lines, ~60% docs/planning) is being decomposed into 8 small, focused PRs — one module per PR, in dependency order — each with functional requirements and unit tests, ruthlessly reviewed before merging here.

Current status: 6 of 8 PRs merged. PR 7 is in review; PR 8 is a draft.

This branch is NOT ready to merge to master until all 8 PRs are merged and a final review passes.

Epic

Enable `riteway ai <promptfile>`: a CLI command that reads SudoLang test files, delegates execution to AI agents, and outputs results in TAP format. It treats prompts as first-class testable units and supports configurable runs, pass thresholds, parallel execution, and rich TAP markdown output.

Full requirements: tasks/2026-01-22-riteway-ai-testing-framework.md
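The pass-threshold idea reduces to a ceiling: an assertion passes when at least ceil(runs × threshold) of its runs pass, and the PR 3 commit notes describe computing this inline with `Math.ceil` inside `aggregatePerAssertionResults`. The sketch below is a minimal illustration under stated assumptions, not riteway's code: the threshold is assumed to be a percentage, each assertion is assumed to carry one boolean per run, and plain `RangeError` checks stand in for the Zod validation the real module uses.

```javascript
// Hypothetical sketch, not riteway's aggregation.js. Assumes `threshold`
// is a percentage (0-100) and each assertion records one boolean per run.
// Plain checks stand in for the Zod schema the real module uses.
const aggregatePerAssertionResults = ({ perAssertionResults, runs, threshold }) => {
  if (!Number.isInteger(runs) || runs < 1) {
    throw new RangeError(`runs must be a positive integer, got ${runs}`);
  }
  if (!Number.isFinite(threshold) || threshold < 0 || threshold > 100) {
    throw new RangeError(`threshold must be between 0 and 100, got ${threshold}`);
  }
  // Round up so that e.g. 75% of 3 runs requires 3 passes, not 2.25.
  const requiredPasses = Math.ceil((runs * threshold) / 100);

  const assertions = perAssertionResults.map(({ requirement, outcomes }) => {
    const passCount = outcomes.filter(Boolean).length;
    return { requirement, passCount, requiredPasses, passed: passCount >= requiredPasses };
  });

  // every() on an empty array is true, so zero assertions pass vacuously.
  return { passed: assertions.every(({ passed }) => passed), assertions };
};
```

Rounding up errs on the strict side: a fractional requirement never lets an assertion pass with fewer successes than the threshold implies.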

Why Not Cherry-Pick or Rebase PR #394?

- 80+ commits interleave multiple modules — no clean per-module slices
- Duplicate commits from prior rebases make cherry-pick impractical
- ~60% of changed files are docs/planning that must stay out of production PRs
- Circular dependency (ai-runner.js ↔ test-extractor.js) needed to be resolved first

Approach: create fresh branches from this consolidation base, copy files over from the feature branch, fix WIP issues during consolidation, and review each PR independently before merging it here.

Dependency Graph (module architecture)

```
ai-errors.js  (leaf)       constants.js  (leaf)
    ↓                           ↓
agent-parser.js  ←  ai-errors      [note: debug-logger removed in PR 5 cleanup]
extraction-parser.js  ←  ai-errors
execute-agent.js  ←  ai-errors, agent-parser
aggregation.js  ←  ai-errors, constants
    ↓
agent-config.js  ←  ai-errors                 [PR 4]
validation.js  ←  ai-errors                   [PR 4]
    ↓
test-extractor.js  ←  execute-agent           [PR 5]
ai-runner.js  ←  all prior                    [PR 5]
    ↓
test-output.js                                [PR 6]
ai-command.js  ←  all prior                   [PR 6]
bin/riteway.js  (modifications)               [PR 6]
    ↓
e2e.test.js  +  fixtures  +  config           [PR 7]
    ↓
agent-config.js (outputFormat + registry)     [PR 8]
ai-init.js  ←  agent-config                   [PR 8]
bin/riteway.js  (ai init subcommand)          [PR 8]
```

No cycles. Every module has a colocated test file.
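One way to check the "No cycles" claim mechanically is a depth-first topological sort over the import edges. This is a minimal sketch: the edges are transcribed from the dependency graph, and `ai-runner.js`'s "all prior" is approximated by listing the earlier modules. Re-adding the old `test-extractor.js` → `ai-runner.js` import makes it throw, which is the cycle that extracting `execute-agent.js` broke.

```javascript
// Edges transcribed from the diagram: each module maps to what it imports.
// "ai-runner.js ← all prior" is approximated by listing the earlier modules.
const moduleGraph = {
  'ai-errors.js': [],
  'constants.js': [],
  'agent-parser.js': ['ai-errors.js'],
  'extraction-parser.js': ['ai-errors.js'],
  'execute-agent.js': ['ai-errors.js', 'agent-parser.js'],
  'aggregation.js': ['ai-errors.js', 'constants.js'],
  'agent-config.js': ['ai-errors.js'],
  'validation.js': ['ai-errors.js'],
  'test-extractor.js': ['execute-agent.js'],
  'ai-runner.js': [
    'test-extractor.js', 'execute-agent.js', 'extraction-parser.js',
    'aggregation.js', 'agent-config.js', 'validation.js',
  ],
};

// Depth-first topological sort: returns modules dependencies-first, or throws
// with the offending path as soon as it revisits a module still on the stack.
const topoSort = graph => {
  const state = new Map(); // module -> 'visiting' | 'done'
  const order = [];
  const visit = (node, path) => {
    if (state.get(node) === 'done') return;
    if (state.get(node) === 'visiting') {
      const cycle = [...path.slice(path.indexOf(node)), node];
      throw new Error(`Cycle: ${cycle.join(' -> ')}`);
    }
    state.set(node, 'visiting');
    (graph[node] ?? []).forEach(dep => visit(dep, [...path, node]));
    state.set(node, 'done');
    order.push(node);
  };
  Object.keys(graph).forEach(node => visit(node, []));
  return order;
};
```

The resulting order is also a valid merge order for the topic PRs: every module appears after everything it imports.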

8-PR Progress

| # | PR | Files | Status |
|---|----|-------|--------|
| 1 | Foundation — Error Types + Constants | `ai-errors.js`, `constants.js` + tests | ✅ Merged (#407) |
| 2 | Utilities — Concurrency Limiter + TAP YAML | `limit-concurrency.js`, `tap-yaml.js` + tests (debug-logger added then removed in PR 5 cleanup) | ✅ Merged (#408) |
| 3 | Parsers + Execute Agent | `agent-parser`, `extraction-parser`, `aggregation`, `execute-agent` + tests | ✅ Merged (#409) |
| 4 | Config + Validation | `agent-config`, `validation` + tests + fixtures | ✅ Merged (#410) |
| 5 | Test Extractor + Core Runner | `test-extractor`, `ai-runner` + tests; debug-logger removed across all modules | ✅ Merged (#416) |
| 6 | Test Output + CLI Integration | `test-output`, `ai-command`, `bin/riteway` + tests | ✅ Merged (#420) |
| 7 | E2E Tests + Fixtures + Config | `e2e.test.js`, fixtures, vitest config; `test-extractor.js` post-consolidation fixups | 🔍 In review (#421) |
| 8 | `outputFormat` strategy + `riteway ai init` | `execute-agent`, `agent-config` (outputFormat + registry), `ai-init.js` (new), `bin/riteway` (init subcommand), README | 📝 Draft (#423) |

Current test count: 190 passing (PRs 1–6 merged). PR 7 adds 6 E2E tests (npm run test:e2e); PR 8 brings unit tests to 211.

WIP Issues From Original PR (13 total — all resolved)

| # | Issue | Status |
|---|-------|--------|
| 1 | `for (const` loops in tests | ✅ Zero instances — resolved |
| 2 | agent-config schema comment verbose | ✅ Resolved in PR 4 |
| 3 | Fixtures README outdated | ✅ Resolved in PRs 7 & 8 |
| 4 | `formatMedia` dead code | ✅ Removed in PR 6 |
| 5 | `test-output.js` dead call | ✅ Removed with #4 in PR 6 |
| 6 | Redundant test comments | ✅ None found in PRs 1–6 |
| 7 | `Try(() => fn(args))` syntax | ✅ Valid — no change |
| 8 | ai-runner logger coupling | ✅ debug-logger removed entirely in PR 5 cleanup |
| 9 | `unwrapRawEnvelope` duplication | ✅ Resolved in PR 3 (shared `unwrapEnvelope`) |
| 10 | Cursor agent `--trust` flag | ✅ Resolved in PR 4 |
| 11 | Hardcoded defaults in tests | ✅ Explicit per TDD rules |
| 12 | Error handling/Zod placement | ✅ `z.prettifyError` inline in PR 5; `AgentConfig*` errors in PR 5 |
| 13 | Re-exports in `test-extractor.js` | ✅ Removed in PR 5 |

Architectural Questions (surfaced in PR 4 — both resolved in PR 8)

1. Built-in agent configs hardcode third-party CLI flags

Resolved: `riteway ai init` (PR 8) writes all built-in configs to `riteway.agent-config.json`. Teams that need stable CLI flags own their copy of that config file; the library built-ins remain for first-run convenience.

2. parseOutput function can't live in a JSON config file

Resolved: PR 8 replaces the `parseOutput` function with a declarative `outputFormat` string (`'json' | 'ndjson' | 'text'`) in all agent configs and in `agentConfigFileSchema`. `execute-agent.js` maps format names to parsers via a lookup table, so the config is now fully serializable.
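A lookup table of this kind might look like the following. It is a sketch, not riteway's `execute-agent.js`: the parser bodies and the `parseAgentOutput` name are hypothetical, and the real parsers also unwrap agent-specific envelopes.

```javascript
// Hypothetical parsers for illustration; riteway's real ones (in
// agent-parser.js) also unwrap agent-specific envelopes.
const parseJson = stdout => JSON.parse(stdout);
const parseNdjson = stdout =>
  stdout
    .split('\n')
    .filter(line => line.trim() !== '')
    .map(line => JSON.parse(line));
const parseText = stdout => stdout;

// The lookup table: config files name a format, code owns the function.
const parsersByFormat = new Map([
  ['json', parseJson],
  ['ndjson', parseNdjson],
  ['text', parseText],
]);

const parseAgentOutput = ({ outputFormat, stdout }) => {
  const parse = parsersByFormat.get(outputFormat);
  if (!parse) {
    const known = [...parsersByFormat.keys()].join(', ');
    throw new TypeError(`Unknown outputFormat "${outputFormat}" (expected one of: ${known})`);
  }
  return parse(stdout);
};
```

Because the config carries only a string, it survives a `JSON.stringify`/`JSON.parse` round trip unchanged, whereas `JSON.stringify` silently drops a function-valued `parseOutput` property.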

Merge Plan

- Each topic PR targets this branch (not master)
- Agent + human review before each merge
- When all 8 are merged here and tests are green: final review, then PR this → master

ericelliott · 2026-02-20T05:10:10Z

I'm okay with the strategies here.

* feat(ai): add error types, constants, and Zod schemas (PR 1/7)

  Foundation layer for the AI testing framework. Introduces structured error handling via error-causes and runtime-validated configuration constants via Zod schemas. Updates eslint ecmaVersion to 2022 to support numeric separators and optional chaining used throughout the framework source.

  Files:
  - source/ai-errors.js — named error types (ParseError, ValidationError, etc.)
  - source/ai-errors.test.js — full coverage for error descriptors and createError
  - source/constants.js — defaults, constraints, and Zod schemas
  - source/constants.test.js — 26 tests covering all schemas and boundaries
  - eslint.config.js — bump ecmaVersion 2017 → 2022 (prerequisite)
  - package.json — add error-causes and zod to production dependencies

  Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(config): bring working configs from feature branch

  Adds vitest.config.js e2e exclusion (source/e2e.test.js uses Riteway/Tape, not Vitest) alongside the eslint ecmaVersion 2022 bump already in place. Both changes are sourced from the working feature branch.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ai): address PR 1 review findings
  - constants.js: lazy process.cwd() default (z.string().default(() => process.cwd())) prevents stale value when cwd changes after module load
  - constants.js: add concurrencyMax (50) to constraints + enforce in concurrencySchema
  - constants.js: remove JSDoc from internal constants (not public API)
  - constants.test.js: add full aiTestOptionsSchema coverage (valid input, missing filePath, empty filePath, invalid agent, lazy cwd default, optional agentConfigPath)
  - constants.test.js: add concurrencySchema upper-bound tests
  - ai-errors.test.js: replace for..of loops with test.each (one named test per case)
  - ai-errors.test.js: expand createError integration to cover two error types
  - ai-errors.test.js: replace typeof handleAIErrors check with behavioral routing tests
  - ai-errors.js: remove forward-reference comment (extraction-parser.js not yet in scope)
  - eslint.config.js: Object.assign -> spread operator

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
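The lazy `process.cwd()` fix matters because a default computed at module load freezes whatever directory the process started in. Zod's `.default()` accepts a function that is re-run each time a default is needed, which is what `z.string().default(() => process.cwd())` relies on. The plain-JavaScript sketch below shows the same difference without Zod; the function names are hypothetical.

```javascript
// Eager: process.cwd() runs once, when this module is first evaluated.
const eagerDefaults = { cwd: process.cwd() };
const resolveOptionsEager = (options = {}) => ({ ...eagerDefaults, ...options });

// Lazy: the default expression runs on every call, only when cwd is omitted.
// This is the behaviour z.string().default(() => process.cwd()) gives a schema.
const resolveOptionsLazy = ({ cwd = process.cwd(), ...rest } = {}) => ({ ...rest, cwd });
```

After a `process.chdir()`, the eager version still reports the load-time directory, while the lazy version reports the current one.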

… 2/7] (#408)

* feat(ai): Utilities — Debug Logger, Concurrency Limiter, TAP YAML [PR 2/7]
  - Add createDebugLogger: console + file logging with buffer/flush
  - Add limitConcurrency: sliding-window async concurrency limiter
  - Add parseTAPYAML: parse judge agent TAP YAML diagnostic blocks
  - Add limit-concurrency.test.js (missing from PR #394)
  - Apply js.mdc cleanup: flush loop → single write, for-of → reduce pipeline
  - Replace @paralleldrive/cuid2 (not in deps) with mkdtempSync in debug-logger.test.js

  Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(ai): apply PR 2 review suggestions
  - Collapse formatMessage to concise arrow expression
  - Add comment to limit-concurrency for-of loop (justified async pattern)
  - Add flush no-op test when logFile is not configured
  - Use vi.useFakeTimers() in concurrency-cap test for determinism

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
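A sliding-window limiter keeps up to `limit` tasks running and starts the next one as soon as any slot frees up, rather than waiting for a whole batch to finish. The sketch below assumes a hypothetical `limitConcurrency(tasks, limit)` signature taking async thunks. It also reflects two later review notes (the `RangeError` guard for non-positive limits, and fail-fast behaviour), but it is not the module's actual code.

```javascript
// Hypothetical signature: run async thunks with at most `limit` in flight,
// resolving to their results in input order.
const limitConcurrency = async (tasks, limit) => {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError(`limit must be a positive integer, got ${limit}`);
  }
  const results = new Array(tasks.length);
  let next = 0;
  let failed = false;

  // Each worker claims the next unstarted task as soon as its current one
  // settles, so the window slides instead of waiting on whole batches.
  const worker = async () => {
    while (!failed && next < tasks.length) {
      const index = next++;
      try {
        results[index] = await tasks[index]();
      } catch (error) {
        failed = true; // fail fast: other workers stop claiming new tasks
        throw error;
      }
    }
  };

  // Promise.all rejects with the first error; tasks already running are
  // not cancelled, they simply finish in the background.
  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
};
```

Because the function is `async`, an invalid limit surfaces as a rejected promise rather than a synchronous throw.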

Add agent-parser, extraction-parser, aggregation, and execute-agent modules with full unit test coverage.

- agent-parser: parseStringResult, parseOpenCodeNDJSON, unwrapEnvelope (new shared export), unwrapAgentResult. Shared unwrapEnvelope breaks duplication between agent-parser and execute-agent (WIP fix #9).
- extraction-parser: parseExtractionResult with multi-strategy JSON parsing (direct, markdown fence, pre-parsed object), and resolveImportPaths for prompt file resolution.
- aggregation: normalizeJudgment, calculateRequiredPasses, aggregatePerAssertionResults with Zod validation.
- execute-agent: extracted from ai-runner.js to break the circular dependency (ai-runner ↔ test-extractor). Logger injected at executeAgent call site rather than created inside spawnProcess (WIP fix #8). Uses shared unwrapEnvelope from agent-parser.
- Test files use test.each for all table-driven cases per convention.

164 tests pass, 0 lint errors, TypeScript checks pass.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ai): address PR 3 code review findings
  - aggregation.js: validate once in aggregatePerAssertionResults — capture the Zod-validated result and compute Math.ceil inline, eliminating the redundant second schema parse inside calculateRequiredPasses
  - aggregation.js: remove misleading optional chaining (raw?.passed etc.) after the null-guard throw; use plain property access
  - agent-parser.js: replace acc.push() with [...acc, text] in reduce accumulator to prefer immutability per JS style guide
  - agent-parser.test.js: drop redundant "parsed object:" prefix from unwrapEnvelope test.each given fields; remove duplicate standalone "no result key" test that overlapped with test.each row
  - aggregation.test.js: remove redundant export-existence assertion for normalizeJudgment; add empty perAssertionResults edge case (vacuous truth — every() on [] returns true)
  - execute-agent.test.js: strengthen parseOutput test to verify stdout and logger are threaded through as expected (documents WIP fix #8)

  Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ai): address PR 3 author review findings
  - aggregation.js: rename `raw` param to `judgeResponse` and fold into single options object for normalizeJudgment; removes the two-argument signature (breaking change, callers updated)
  - aggregation.js: remove calculateRequiredPasses — math is inlined in aggregatePerAssertionResults, eliminating double schema parse
  - aggregation.test.js: remove calculateRequiredPasses describe block; fix Try() usage (direct fn ref, not arrow wrapper); update all normalizeJudgment call sites to new single-options signature
  - execute-agent.js: extract magic number 500 to maxOutputPreviewLength constant (camelCase per javascript.mdc); applied to all 3 truncation sites
  - execute-agent.test.js: replace try/catch antipatterns with await Try(); add Try import from riteway.js
  - extraction-parser.test.js: strengthen weak typeof assertions to check specific fields; strengthen cause !== undefined to cause.name === SyntaxError

  151 tests pass, 0 lint errors, TypeScript clean.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ai): address PR 3 follow-up review findings
  - constants.js: rename calculateRequiredPassesSchema to aggregationParamsSchema — name now reflects what the schema validates (aggregation input params) rather than the deleted calculateRequiredPasses function; update all import sites
  - aggregation.test.js: add 6 missing Zod validation edge cases for aggregatePerAssertionResults (zero runs, negative runs, non-integer runs, NaN runs, negative threshold, NaN threshold) — coverage gap introduced when calculateRequiredPasses and its tests were removed; all cases now exercised via aggregatePerAssertionResults test.each

  157 tests pass, 0 lint errors, TypeScript clean.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(test): complete PR review remediation

  🐛 - Remove weak instanceof Error assertions
  🔄 - Add threshold calculation verification tests

  Tests now verify threshold-based pass/fail logic directly.
  164 tests passing, 0 lint errors, TypeScript clean.

  Co-authored-by: Ian White <ian.white.developer@gmail.com>

* fix(ai): remove implementation detail from test
  - execute-agent.test.js: remove logger type assertion from parseOutput test — typeof checks violate tdd.mdc:64 and logger threading is an implementation detail; the three remaining assertions (call count, stdout arg, parsed result) collectively verify correct integration

  164 tests pass, 0 lint errors, TypeScript clean.

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
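The three parsing strategies listed for `parseExtractionResult` (pre-parsed object, direct JSON, markdown fence) can be tried from cheapest to most permissive. This is a hypothetical sketch named `parseAgentJson` to avoid implying riteway's signature; the real function also validates the parsed shape and throws riteway's own error types rather than `TypeError`/`SyntaxError`.

```javascript
// Matches a markdown code fence (three backticks, optional "json" tag) and
// captures its body. \x60 is the backtick, spelled out so the pattern can
// live inside a markdown document without closing the surrounding fence.
const fencedJsonPattern = /\x60{3}(?:json)?[ \t]*\r?\n([\s\S]*?)\r?\n?\x60{3}/;

// Hypothetical sketch in the spirit of parseExtractionResult's strategies.
const parseAgentJson = raw => {
  // Strategy 1: the agent wrapper already handed back an object.
  if (raw !== null && typeof raw === 'object') return raw;
  if (typeof raw !== 'string') {
    throw new TypeError(`Expected a string or object, got ${typeof raw}`);
  }
  // Strategy 2: the whole response is JSON.
  try {
    return JSON.parse(raw);
  } catch (directError) {
    // Strategy 3: JSON wrapped in a markdown code fence inside prose.
    const match = raw.match(fencedJsonPattern);
    if (!match) {
      throw new SyntaxError('No JSON found in agent response', { cause: directError });
    }
    return JSON.parse(match[1]);
  }
};
```

Keeping the original `JSON.parse` failure as `cause` preserves the most specific diagnostic when no fence is present either.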

- test(ai-errors): remove error-causes API tests; keep only handleAIErrors behavioral routing (ericelliott/janhesters #407)
- test(constants): remove defaults/constraints value-only blocks; replace tautological expected: defaults.X with literals (ericelliott #407)
- fix(debug-logger): rename writeToFile→bufferEntry, process→logProcess export; add logFile type guard; circular ref safety in formatMessage; command() rest params; improved JSDoc (janhesters #408)
- test(debug-logger): onTestFinished for all teardown; add circular ref and logFile TypeError tests; flush no-op debug:false (janhesters #408)
- fix(limit-concurrency): guard non-positive limit with RangeError; onTestFinished for fake timer teardown; document fail-fast (janhesters #408)
- test(agent-parser): replace partial assertions with full expected values including ndjsonLength (janhesters #409)
- test(extraction-parser): replace 4x multi-assert blocks with single full-object assertions (janhesters #409)

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(agent-parser): use full expected values
  - Replace JSON.stringify comparisons with direct object assertions
  - Collapse 4 partial error.cause assertions into single full-object assert in parseOpenCodeNDJSON error test
  - Expand partial error.cause?.name assertion to full cause object in unwrapAgentResult error test

  Addresses Jan's PR #409 comment: deterministic functions should assert the complete expected value, not individual properties.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* test(ai): replace partial assertions with full expected values

  Per Jan's review: deterministic functions should assert the complete expected value, not individual properties.
  - extraction-parser: collapse ExtractionParseError and ExtractionValidationError cause assertions to full objects; comment .name usage (SyntaxError sets it as own property)
  - tap-yaml: consolidate per-property result asserts to full objects; collapse error cause to single full-object assert; remove redundant typeof score check
  - execute-agent: collapse AgentProcessError, TimeoutError, ParseError cause assertions to full objects; comment 3-deep .cause chain
  - aggregation: collapse ValidationError and ParseError test.each cause assertions to full objects; comment .constructor.name (ZodError does not set .name as own property); remove standalone normalizeJudgment ParseError test made redundant by test.each

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

- agent-config: getAgentConfig() for claude/opencode/cursor agents
- agent-config: loadAgentConfig() reads + validates JSON config files
- validation: validateFilePath() guards against path traversal
- validation: verifyAgentAuthentication() smoke-tests agent availability
- fixtures: test-agent-config.json, invalid-agent-config.txt, no-command-agent-config.json

WIP fixes applied:
- #2: replace verbose schema JSDoc with single-line YAGNI comment
- #10: add --trust flag to cursor agent args for non-interactive execution

182 tests passing (19 new: 10 agent-config + 9 validation).

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ai): address PR 4 code review findings
  - add direct unit tests for formatZodError (4 cases, both code paths)
  - simplify parseJson: remove unnecessary currying → plain two-arg fn
  - remove spurious await on synchronous parseJson call
  - convert multi-line string concat to template literal in validation.js
  - rename misleading test: 'uses default timeout' → 'succeeds without explicit timeout argument'

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
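A common way to implement a path-traversal guard like `validateFilePath()` is to resolve the candidate against a base directory and reject it if the relative path climbs out. This is a sketch with a hypothetical `{ filePath, baseDir }` signature and a `RangeError`; it does not resolve symlinks, which a stricter guard would handle with `fs.realpath`.

```javascript
import path from 'node:path';

// Hypothetical signature: resolve filePath against baseDir and refuse
// anything that lands outside it. Symlinks are not followed here.
const validateFilePath = ({ filePath, baseDir }) => {
  const root = path.resolve(baseDir);
  const resolved = path.resolve(root, filePath);
  const relative = path.relative(root, resolved);
  const escapesRoot =
    relative === '..' ||
    relative.startsWith(`..${path.sep}`) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  if (escapesRoot) {
    throw new RangeError(`Path escapes ${root}: ${filePath}`);
  }
  return resolved;
};
```

Checking for `..` plus a separator, rather than a bare `..` prefix, keeps legitimate names such as `..notes/` inside the base directory accepted.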

- ai-errors: export allNoop helper for exhaustive handleAIErrors tests
- extraction-parser: spread ...ValidationError instead of bare name string
- tap-yaml: spread ...ParseError instead of bare name string
- constants.test: replace safeParse leakage with parse()+Try(); trim to behavioral tests only
- aggregation.test: full-object assertions; fix duplicate test.each labels
- extraction-parser.test: add resolveImportPaths success/error tests; non-object branch test
- execute-agent.test: add spawn failure, malformed JSON fallback, ParseError envelope tests
- ai-errors.test.js: removed; handleAIErrors routing already covered by agent-parser, aggregation, extraction-parser, and tap-yaml test suites

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
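The `allNoop` pattern keeps exhaustive error handling cheap in tests: the router demands a handler for every error type, and a test spreads `allNoop` before overriding the one branch it asserts on. The router below is a hypothetical stand-in written in plain JavaScript, not the `error-causes` API or riteway's `handleAIErrors`, and its three error names are a subset chosen for illustration.

```javascript
// Hypothetical stand-in for an exhaustive error router; this is not the
// error-causes API, and the error names are a subset picked for illustration.
const errorNames = ['ParseError', 'ValidationError', 'TimeoutError'];

const createErrorRouter = names => handlers => {
  const missing = names.filter(name => typeof handlers[name] !== 'function');
  if (missing.length > 0) {
    throw new TypeError(`Missing handlers for: ${missing.join(', ')}`);
  }
  return error => {
    const name = error?.cause?.name;
    if (!Object.hasOwn(handlers, name)) throw error; // unknown: rethrow as-is
    return handlers[name](error);
  };
};

const routeAIErrors = createErrorRouter(errorNames);

// One no-op per error type. Spreading it first keeps the handler map
// exhaustive while a test overrides only the branch it cares about.
const allNoop = Object.fromEntries(errorNames.map(name => [name, () => {}]));

// Usage: only ParseError is observed; the other branches are no-ops.
const seen = [];
const route = routeAIErrors({ ...allNoop, ParseError: error => seen.push(error.cause.name) });
route(new Error('bad JSON', { cause: { name: 'ParseError' } }));
// seen is now ['ParseError']
```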

debug/logFile/logger params were never in the formal requirements, never exposed via the CLI, and never tested end-to-end. logFile was UAT scaffolding already broken in two places. Removing the abstraction simplifies every public signature and eliminates logger threading.

- Delete debug-logger.js and debug-logger.test.js (−417 lines)
- Drop debug/logFile/logger params from execute-agent, agent-parser, aggregation, extraction-parser, validation public signatures
- Convert user-visible progress messages to console.log/console.warn
- Delete internal diagnostic noise throughout
- Remove debug/debugLog fields from constants.js defaults and schema
- Extract truncateOutput helper in execute-agent.js (eliminates duplication)
- Convert resolveImportPaths to named params { importPaths, projectRoot }
- Replace manual zodError.issues mapping with z.prettifyError in aggregation
- Full expected-value assertions and allNoop spread pattern in test files

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
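Centralising truncation in one helper means the 500-character limit (`maxOutputPreviewLength`, introduced in PR 3) lives in one place instead of three call sites. A minimal sketch; the function body and suffix text are assumptions, not riteway's implementation.

```javascript
const maxOutputPreviewLength = 500;

// Hypothetical helper: cap agent stdout/stderr quoted in error messages.
// The suffix wording is an assumption, not riteway's actual format.
const truncateOutput = (output, maxLength = maxOutputPreviewLength) =>
  output.length <= maxLength
    ? output
    : `${output.slice(0, maxLength)}... (${output.length - maxLength} more characters)`;
```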

- Add AgentConfigReadError, AgentConfigParseError, AgentConfigValidationError to ai-errors.js
- Update agent-config.js to use specific AgentConfig* error types; z.prettifyError() inline
- Update agent-config.test.js to use handleAIErrors routing pattern throughout
- Add test-extractor.js: buildExtractionPrompt, buildResultPrompt, buildJudgePrompt, extractTests
- Add ai-runner.js: runAITests, verifyAgentAuthentication
- Add @paralleldrive/cuid2 dependency for hermetic test temp-dir naming

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

- Add source/test-output.js: formatTAP, recordTestOutput, openInBrowser with open package dependency
- Add source/ai-command.js: parseAIArgs, runAICommand, formatAssertionReport; remove debug/debugLog params, use z.prettifyError instead of formatZodError
- Update bin/riteway.js: add riteway ai <file> subcommand with exhaustive handleAIErrors (all 12 error types)
- Add tests for all new modules (42 new tests)
- Fix tap-yaml.js TS: JSDoc cast on reduce initial value
- Remove formatMedia dead code (WIP #4/#5)
- Remove generateLogFilePath (debug-logger removed in PR 5)

Made-with: Cursor
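TAP version 13 output consists of a version line, a plan (`1..N`), one `ok`/`not ok` line per test, and optional indented YAML diagnostics between `---` and `...`. The sketch below shows a `formatTAP`-style function under an assumed per-assertion input shape; riteway's real `formatTAP` and the fields it reports may differ.

```javascript
// Hypothetical input shape: one entry per assertion, already aggregated.
// Output follows TAP version 13: header, plan, then one ok / not ok line
// per test, with an indented YAML block (--- ... ...) for failures.
const formatTAP = results => {
  const testLines = results.flatMap(({ description, passed, passCount, runs }, index) => {
    const line = `${passed ? 'ok' : 'not ok'} ${index + 1} - ${description}`;
    return passed
      ? [line]
      : [line, '  ---', `  passCount: ${passCount}`, `  runs: ${runs}`, '  ...'];
  });
  return ['TAP version 13', `1..${results.length}`, ...testLines].join('\n');
};
```

Emitting the plan up front is valid TAP and lets consumers detect a run that stopped early.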

ianwhitedeveloper mentioned this pull request Feb 21, 2026

fix(ai): Retroactive review remediation — PR 1-3 findings #412

Merged

ianwhitedeveloper and others added 6 commits February 25, 2026 14:39

ianwhitedeveloper force-pushed the ai-testing-framework-implementation-consolidation branch from a96a49d to 6c81837 February 25, 2026 20:39

ianwhitedeveloper and others added 2 commits February 26, 2026 11:57

ianwhitedeveloper mentioned this pull request Feb 27, 2026

test(aidd-fix): add unit tests for /aidd-fix skill prompts paralleldrive/aidd#104

Open

ianwhitedeveloper changed the title ~~feat(ai): AI Testing Framework — consolidation staging branch [0/7 → master]~~ feat(ai): AI Testing Framework — consolidation staging branch [6/8 → master] Mar 4, 2026

ianwhitedeveloper wants to merge 9 commits into master from ai-testing-framework-implementation-consolidation
