feat(ai): AI Testing Framework - consolidation staging branch [6/8 → master] #411
Draft
ianwhitedeveloper wants to merge 9 commits into master
Collaborator comment: I'm okay with the strategies here.
* feat(ai): add error types, constants, and Zod schemas (PR 1/7)
Foundation layer for the AI testing framework. Introduces structured error handling via error-causes and runtime-validated configuration constants via Zod schemas. Updates eslint ecmaVersion to 2022 to support numeric separators and optional chaining used throughout the framework source.
Files:
- source/ai-errors.js: named error types (ParseError, ValidationError, etc.)
- source/ai-errors.test.js: full coverage for error descriptors and createError
- source/constants.js: defaults, constraints, and Zod schemas
- source/constants.test.js: 26 tests covering all schemas and boundaries
- eslint.config.js: bump ecmaVersion 2017 → 2022 (prerequisite)
- package.json: add error-causes and zod to production dependencies
Co-authored-by: Cursor <cursoragent@cursor.com>
* chore(config): bring working configs from feature branch
Adds vitest.config.js e2e exclusion (source/e2e.test.js uses Riteway/Tape, not Vitest) alongside the eslint ecmaVersion 2022 bump already in place. Both changes are sourced from the working feature branch.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(ai): address PR 1 review findings
- constants.js: lazy process.cwd() default (z.string().default(() => process.cwd())) prevents stale value when cwd changes after module load
- constants.js: add concurrencyMax (50) to constraints + enforce in concurrencySchema
- constants.js: remove JSDoc from internal constants (not public API)
- constants.test.js: add full aiTestOptionsSchema coverage (valid input, missing filePath, empty filePath, invalid agent, lazy cwd default, optional agentConfigPath)
- constants.test.js: add concurrencySchema upper-bound tests
- ai-errors.test.js: replace for..of loops with test.each (one named test per case)
- ai-errors.test.js: expand createError integration to cover two error types
- ai-errors.test.js: replace typeof handleAIErrors check with behavioral routing tests
- ai-errors.js: remove forward-reference comment (extraction-parser.js not yet in scope)
- eslint.config.js: Object.assign -> spread operator
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
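The lazy process.cwd() default in the review fix above is easier to see next to its eager counterpart. The following is a minimal sketch in plain JavaScript rather than Zod; applyDefaults, withEagerCwd, withLazyCwd, and the projectRoot key are hypothetical names used only for illustration and are not taken from constants.js.

```javascript
// Hypothetical helper (not from the PR): a default may be a plain value or a
// zero-argument function that is called each time the default is needed.
const resolveDefault = (fallback) =>
  typeof fallback === 'function' ? fallback() : fallback;

const applyDefaults = (defaults) => (options = {}) => {
  // Only resolve defaults for keys the caller left undefined.
  const resolved = Object.fromEntries(
    Object.entries(defaults)
      .filter(([key]) => options[key] === undefined)
      .map(([key, fallback]) => [key, resolveDefault(fallback)]),
  );
  return { ...options, ...resolved };
};

// Eager: process.cwd() is evaluated once, when this line runs (module load).
const withEagerCwd = applyDefaults({ projectRoot: process.cwd() });

// Lazy: process.cwd() is evaluated on every call that needs the default.
const withLazyCwd = applyDefaults({ projectRoot: () => process.cwd() });
```

If the process changes directory after this module loads, withEagerCwd() keeps reporting the old directory while withLazyCwd() reports the current one, which is the stale-value problem the commit describes.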
… 2/7] (#408)
* feat(ai): Utilities - Debug Logger, Concurrency Limiter, TAP YAML [PR 2/7]
- Add createDebugLogger: console + file logging with buffer/flush
- Add limitConcurrency: sliding-window async concurrency limiter
- Add parseTAPYAML: parse judge agent TAP YAML diagnostic blocks
- Add limit-concurrency.test.js (missing from PR #394)
- Apply js.mdc cleanup: flush loop → single write, for-of → reduce pipeline
- Replace @paralleldrive/cuid2 (not in deps) with mkdtempSync in debug-logger.test.js
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(ai): apply PR 2 review suggestions
- Collapse formatMessage to concise arrow expression
- Add comment to limit-concurrency for-of loop (justified async pattern)
- Add flush no-op test when logFile is not configured
- Use vi.useFakeTimers() in concurrency-cap test for determinism
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
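For readers unfamiliar with the pattern, a sliding-window concurrency limiter can be sketched as below. This is an illustration only: the real limitConcurrency signature is not shown in this PR, so the (tasks, limit) shape, the ordered results array, and the error message are assumptions.

```javascript
// Sketch only. Assumed signature: (tasks, limit), where tasks is an array of
// functions that return promises. Results keep the order of the tasks.
const limitConcurrency = async (tasks, limit) => {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new RangeError(`limit must be a positive integer, got ${limit}`);
  }

  const results = new Array(tasks.length);
  const inFlight = new Set();

  // for-of with await inside: each iteration may pause until a slot frees up.
  for (const [index, task] of tasks.entries()) {
    // Start the task; on success, store its value and leave the window.
    const running = Promise.resolve()
      .then(task)
      .then((value) => {
        results[index] = value;
        inFlight.delete(running);
      });
    inFlight.add(running);

    // Window is full: wait for any task to settle before starting another.
    // A rejection surfaces here, so no further tasks are started (fail-fast).
    if (inFlight.size >= limit) {
      await Promise.race(inFlight);
    }
  }

  // Wait for the tasks still in the window.
  await Promise.all(inFlight);
  return results;
};
```

The window never holds more than limit promises, so at most limit tasks run at once, and a new task starts as soon as any running one finishes rather than waiting for a whole batch.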
Add agent-parser, extraction-parser, aggregation, and execute-agent modules with full unit test coverage.
- agent-parser: parseStringResult, parseOpenCodeNDJSON, unwrapEnvelope (new shared export), unwrapAgentResult. Shared unwrapEnvelope breaks duplication between agent-parser and execute-agent (WIP fix #9).
- extraction-parser: parseExtractionResult with multi-strategy JSON parsing (direct, markdown fence, pre-parsed object), and resolveImportPaths for prompt file resolution.
- aggregation: normalizeJudgment, calculateRequiredPasses, aggregatePerAssertionResults with Zod validation.
- execute-agent: extracted from ai-runner.js to break the circular dependency (ai-runner ↔ test-extractor). Logger injected at executeAgent call site rather than created inside spawnProcess (WIP fix #8). Uses shared unwrapEnvelope from agent-parser.
- Test files use test.each for all table-driven cases per convention.
164 tests pass, 0 lint errors, TypeScript checks pass.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(ai): address PR 3 code review findings
- aggregation.js: validate once in aggregatePerAssertionResults (capture the Zod-validated result and compute Math.ceil inline), eliminating the redundant second schema parse inside calculateRequiredPasses
- aggregation.js: remove misleading optional chaining (raw?.passed etc.) after the null-guard throw; use plain property access
- agent-parser.js: replace acc.push() with [...acc, text] in reduce accumulator to prefer immutability per JS style guide
- agent-parser.test.js: drop redundant "parsed object:" prefix from unwrapEnvelope test.each given fields; remove duplicate standalone "no result key" test that overlapped with test.each row
- aggregation.test.js: remove redundant export-existence assertion for normalizeJudgment; add empty perAssertionResults edge case (vacuous truth: every() on [] returns true)
- execute-agent.test.js: strengthen parseOutput test to verify stdout and logger are threaded through as expected (documents WIP fix #8)
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(ai): address PR 3 author review findings
- aggregation.js: rename `raw` param to `judgeResponse` and fold into single options object for normalizeJudgment; removes the two-argument signature (breaking change, callers updated)
- aggregation.js: remove calculateRequiredPasses; math is inlined in aggregatePerAssertionResults, eliminating double schema parse
- aggregation.test.js: remove calculateRequiredPasses describe block; fix Try() usage (direct fn ref, not arrow wrapper); update all normalizeJudgment call sites to new single-options signature
- execute-agent.js: extract magic number 500 to maxOutputPreviewLength constant (camelCase per javascript.mdc); applied to all 3 truncation sites
- execute-agent.test.js: replace try/catch antipatterns with await Try(); add Try import from riteway.js
- extraction-parser.test.js: strengthen weak typeof assertions to check specific fields; strengthen cause !== undefined to cause.name === SyntaxError
151 tests pass, 0 lint errors, TypeScript clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(ai): address PR 3 follow-up review findings
- constants.js: rename calculateRequiredPassesSchema to aggregationParamsSchema (the name now reflects what the schema validates, aggregation input params, rather than the deleted calculateRequiredPasses function); update all import sites
- aggregation.test.js: add 6 missing Zod validation edge cases for aggregatePerAssertionResults (zero runs, negative runs, non-integer runs, NaN runs, negative threshold, NaN threshold), closing the coverage gap introduced when calculateRequiredPasses and its tests were removed; all cases now exercised via aggregatePerAssertionResults test.each
157 tests pass, 0 lint errors, TypeScript clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(test): complete PR review remediation
🐛 - Remove weak instanceof Error assertions
🔄 - Add threshold calculation verification tests
Tests now verify threshold-based pass/fail logic directly
164 tests passing, 0 lint errors, TypeScript clean
Co-authored-by: Ian White <ian.white.developer@gmail.com>
* fix(ai): remove implementation detail from test
- execute-agent.test.js: remove logger type assertion from parseOutput test; typeof checks violate tdd.mdc:64 and logger threading is an implementation detail; the three remaining assertions (call count, stdout arg, parsed result) collectively verify correct integration
164 tests pass, 0 lint errors, TypeScript clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
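The threshold-based pass/fail logic and the edge cases listed above (zero, negative, non-integer, and NaN runs; negative and NaN thresholds; empty results) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the runResults field, and in particular the reading of threshold as a percentage from 0 to 100. The actual module validates with Zod; this sketch uses manual checks instead.

```javascript
// Sketch only. Assumptions (not confirmed by the PR text): threshold is a
// percentage from 0 to 100, and each assertion carries one boolean per run.
const requiredPasses = ({ runs, threshold }) => {
  if (!Number.isInteger(runs) || runs <= 0) {
    throw new RangeError(`runs must be a positive integer, got ${runs}`);
  }
  if (!Number.isFinite(threshold) || threshold < 0 || threshold > 100) {
    throw new RangeError(`threshold must be between 0 and 100, got ${threshold}`);
  }
  // Multiply before dividing so integer thresholds stay exact.
  return Math.ceil((runs * threshold) / 100);
};

const aggregate = ({ perAssertionResults, runs, threshold }) => {
  // Validate and compute the pass count needed once, up front.
  const needed = requiredPasses({ runs, threshold });
  const assertions = perAssertionResults.map(({ requirement, runResults }) => {
    const passCount = runResults.filter(Boolean).length;
    return { requirement, passCount, passed: passCount >= needed };
  });
  // every() on an empty array is true, so zero assertions pass vacuously.
  return { passed: assertions.every(({ passed }) => passed), assertions };
};
```

With runs: 4 and threshold: 75, an assertion needs 3 passing runs; an assertion that passes 2 of 4 runs fails and makes the overall result fail.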
- test(ai-errors): remove error-causes API tests; keep only handleAIErrors behavioral routing (ericelliott/janhesters #407)
- test(constants): remove defaults/constraints value-only blocks; replace tautological expected: defaults.X with literals (ericelliott #407)
- fix(debug-logger): rename writeToFile→bufferEntry, process→logProcess export; add logFile type guard; circular ref safety in formatMessage; command() rest params; improved JSDoc (janhesters #408)
- test(debug-logger): onTestFinished for all teardown; add circular ref and logFile TypeError tests; flush no-op debug:false (janhesters #408)
- fix(limit-concurrency): guard non-positive limit with RangeError; onTestFinished for fake timer teardown; document fail-fast (janhesters #408)
- test(agent-parser): replace partial assertions with full expected values including ndjsonLength (janhesters #409)
- test(extraction-parser): replace 4x multi-assert blocks with single full-object assertions (janhesters #409)
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(agent-parser): use full expected values
- Replace JSON.stringify comparisons with direct object assertions
- Collapse 4 partial error.cause assertions into single full-object assert in parseOpenCodeNDJSON error test
- Expand partial error.cause?.name assertion to full cause object in unwrapAgentResult error test
Addresses Jan's PR #409 comment: deterministic functions should assert the complete expected value, not individual properties.
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(ai): replace partial assertions with full expected values
Per Jan's review: deterministic functions should assert the complete expected value, not individual properties.
- extraction-parser: collapse ExtractionParseError and ExtractionValidationError cause assertions to full objects; comment .name usage (SyntaxError sets it as own property)
- tap-yaml: consolidate per-property result asserts to full objects; collapse error cause to single full-object assert; remove redundant typeof score check
- execute-agent: collapse AgentProcessError, TimeoutError, ParseError cause assertions to full objects; comment 3-deep .cause chain
- aggregation: collapse ValidationError and ParseError test.each cause assertions to full objects; comment .constructor.name (ZodError does not set .name as own property); remove standalone normalizeJudgment ParseError test made redundant by test.each
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
- agent-config: getAgentConfig() for claude/opencode/cursor agents
- agent-config: loadAgentConfig() reads + validates JSON config files
- validation: validateFilePath() guards against path traversal
- validation: verifyAgentAuthentication() smoke-tests agent availability
- fixtures: test-agent-config.json, invalid-agent-config.txt, no-command-agent-config.json
WIP fixes applied:
- #2: replace verbose schema JSDoc with single-line YAGNI comment
- #10: add --trust flag to cursor agent args for non-interactive execution
182 tests passing (19 new: 10 agent-config + 9 validation).
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(ai): address PR 4 code review findings
- add direct unit tests for formatZodError (4 cases, both code paths)
- simplify parseJson: remove unnecessary currying → plain two-arg fn
- remove spurious await on synchronous parseJson call
- convert multi-line string concat to template literal in validation.js
- rename misleading test: 'uses default timeout' → 'succeeds without explicit timeout argument'
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
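A path traversal guard of the kind validateFilePath() provides can be sketched with Node's path module. The real signature and error type are not shown in this PR, so the (filePath, baseDir) parameters, the returned resolved path, and the plain Error below are assumptions for illustration.

```javascript
const path = require('node:path');

// Sketch only. Assumed signature: (filePath, baseDir), returning the
// resolved absolute path or throwing when the path leaves baseDir.
const validateFilePath = (filePath, baseDir) => {
  const root = path.resolve(baseDir);
  const resolved = path.resolve(root, filePath);
  const relative = path.relative(root, resolved);

  // A path that escapes root is reached via "..", or is absolute
  // (for example, a different drive on Windows).
  const escapes =
    relative === '..' ||
    relative.startsWith(`..${path.sep}`) ||
    path.isAbsolute(relative);

  if (escapes) {
    throw new Error(`File path escapes base directory: ${filePath}`);
  }
  return resolved;
};
```

Comparing the relative path rather than string prefixes avoids treating a sibling such as /project-other as being inside /project. This sketch does not resolve symlinks, so a stricter guard would need an extra realpath step.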
- ai-errors: export allNoop helper for exhaustive handleAIErrors tests
- extraction-parser: spread ...ValidationError instead of bare name string
- tap-yaml: spread ...ParseError instead of bare name string
- constants.test: replace safeParse leakage with parse()+Try(); trim to behavioral tests only
- aggregation.test: full-object assertions; fix duplicate test.each labels
- extraction-parser.test: add resolveImportPaths success/error tests; non-object branch test
- execute-agent.test: add spawn failure, malformed JSON fallback, ParseError envelope tests
- ai-errors.test.js: removed; handleAIErrors routing already covered by agent-parser, aggregation, extraction-parser, and tap-yaml test suites
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
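The multi-strategy JSON parsing that extraction-parser is described as using (direct, markdown fence, pre-parsed object) might look roughly like the sketch below. The function name parseJsonLoosely, the fence pattern, and the error messages are illustrative assumptions; the real parseExtractionResult is not shown in this PR and throws its own named error types.

```javascript
// Built from parts so this sample contains no literal fence marker.
const fence = '`'.repeat(3);
const fencePattern = new RegExp(
  fence + '(?:json)?\\s*\\n([\\s\\S]*?)\\n\\s*' + fence,
);

// Sketch only: name and error messages are hypothetical.
const parseJsonLoosely = (input) => {
  // Strategy: already-parsed object (or array), return as is.
  if (input !== null && typeof input === 'object') return input;

  if (typeof input !== 'string') {
    throw new TypeError(`Expected string or object, got ${typeof input}`);
  }

  // Strategy: the whole string is JSON.
  try {
    return JSON.parse(input);
  } catch {
    // Not direct JSON; fall through to the fence strategy.
  }

  // Strategy: JSON inside a markdown code fence.
  const match = input.match(fencePattern);
  if (match) {
    try {
      return JSON.parse(match[1]);
    } catch (cause) {
      throw new SyntaxError('Fenced block is not valid JSON', { cause });
    }
  }

  throw new SyntaxError('No JSON found in input');
};
```

Trying the cheapest strategy first keeps the common case fast, and attaching the original SyntaxError as cause preserves the detail that the tests above assert on.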
a96a49d to 6c81837
debug/logFile/logger params were never in the formal requirements, never
exposed via the CLI, and never tested end-to-end. logFile was UAT
scaffolding already broken in two places. Removing the abstraction
simplifies every public signature and eliminates logger threading.
- Delete debug-logger.js and debug-logger.test.js (−417 lines)
- Drop debug/logFile/logger params from execute-agent, agent-parser,
aggregation, extraction-parser, validation public signatures
- Convert user-visible progress messages to console.log/console.warn
- Delete internal diagnostic noise throughout
- Remove debug/debugLog fields from constants.js defaults and schema
- Extract truncateOutput helper in execute-agent.js (eliminates duplication)
- Convert resolveImportPaths to named params { importPaths, projectRoot }
- Replace manual zodError.issues mapping with z.prettifyError in aggregation
- Full expected-value assertions and allNoop spread pattern in test files
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
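As a small illustration of the truncateOutput extraction mentioned above, a shared helper built on the maxOutputPreviewLength constant (500, per the earlier PR 3 commit) could look like this. The suffix format and the optional maxLength parameter are assumptions, not taken from execute-agent.js.

```javascript
const maxOutputPreviewLength = 500;

// Sketch only: the suffix wording is an assumption, not taken from the PR.
const truncateOutput = (output, maxLength = maxOutputPreviewLength) =>
  output.length <= maxLength
    ? output
    : `${output.slice(0, maxLength)}... (${output.length - maxLength} more characters)`;
```

Routing every truncation site through one helper means the limit and the suffix format only need to change in one place.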
- Add AgentConfigReadError, AgentConfigParseError, AgentConfigValidationError to ai-errors.js
- Update agent-config.js to use specific AgentConfig* error types; z.prettifyError() inline
- Update agent-config.test.js to use handleAIErrors routing pattern throughout
- Add test-extractor.js: buildExtractionPrompt, buildResultPrompt, buildJudgePrompt, extractTests
- Add ai-runner.js: runAITests, verifyAgentAuthentication
- Add @paralleldrive/cuid2 dependency for hermetic test temp-dir naming
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
- Add source/test-output.js: formatTAP, recordTestOutput, openInBrowser with open package dependency
- Add source/ai-command.js: parseAIArgs, runAICommand, formatAssertionReport; remove debug/debugLog params, use z.prettifyError instead of formatZodError
- Update bin/riteway.js: add riteway ai <file> subcommand with exhaustive handleAIErrors (all 12 error types)
- Add tests for all new modules (42 new tests)
- Fix tap-yaml.js TS: JSDoc cast on reduce initial value
- Remove formatMedia dead code (WIP #4/#5)
- Remove generateLogFilePath (debug-logger removed in PR 5)
Made-with: Cursor
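To make the TAP output concrete, here is a minimal sketch of a TAP formatter. The input shape ({ description, passed, passCount, runs }) and the pass-rate comment line are hypothetical; the real formatTAP in source/test-output.js may take different arguments and emit richer output.

```javascript
// Sketch only: the input shape is an assumption made for illustration.
const formatTAP = (assertions) =>
  [
    'TAP version 13',
    ...assertions.flatMap(({ description, passed, passCount, runs }, index) => [
      // Test line: "ok" or "not ok", a 1-based test number, then a description.
      `${passed ? 'ok' : 'not ok'} ${index + 1} - ${description}`,
      // Lines starting with "#" are TAP comments (diagnostics).
      `# pass rate: ${passCount}/${runs}`,
    ]),
    // Plan line: the range of test numbers that were run.
    `1..${assertions.length}`,
  ].join('\n');
```

For two assertions, one passing 3 of 3 runs and one passing 1 of 3, this produces an "ok 1" line, a "not ok 2" line, a pass-rate comment after each, and a closing "1..2" plan.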
Context
This is the staging branch for the structured consolidation of draft PR #394 — the Riteway AI Testing Framework. Per Eric's consolidation request, PR #394 (80+ commits, 104 files, ~21K lines, ~60% docs/planning) is being decomposed into 8 small, focused PRs — one module per PR, in dependency order — each with functional requirements and unit tests, ruthlessly reviewed before merging here.
Current status: 6 of 8 PRs merged. PR 7 is in review; PR 8 is draft.
This branch is NOT ready to merge to master until all 8 PRs are merged and a final review passes.
Epic
Enable riteway ai <promptfile>, a CLI command that reads SudoLang test files, delegates execution to AI agents, and outputs results in TAP format. Treats prompts as first-class testable units, supporting configurable runs, pass thresholds, parallel execution, and rich TAP markdown output.
Full requirements: tasks/2026-01-22-riteway-ai-testing-framework.md
Why Not Cherry-Pick or Rebase PR #394?
- … (ai-runner.js ↔ test-extractor.js) needed to be resolved first
Approach: Fresh branches from this consolidation base, copy files from the feature branch, fix WIP issues during consolidation, review each PR independently before merging here.
Dependency Graph (module architecture)
No cycles. Every module has a colocated test file.
8-PR Progress
- PR 1: ai-errors.js, constants.js + tests
- PR 2: limit-concurrency.js, tap-yaml.js + tests (debug-logger added then removed in PR 5 cleanup)
- PR 3: agent-parser, extraction-parser, aggregation, execute-agent + tests
- PR 4: agent-config, validation + tests + fixtures
- PR 5: test-extractor, ai-runner + tests; debug-logger removed across all modules
- PR 6: test-output, ai-command, bin/riteway + tests
- PR 7: e2e.test.js, fixtures, vitest config; test-extractor.js post-consolidation fixups
- PR 8: outputFormat strategy + riteway ai init; execute-agent, agent-config (outputFormat + registry), ai-init.js (new), bin/riteway (init subcommand), README
Current test count: 190 passing (PRs 1–6 merged). PR 7 adds 6 E2E tests (npm run test:e2e); PR 8 brings unit tests to 211.
WIP Issues From Original PR (13 total, all resolved)
- for (const loops in tests
- formatMedia dead code
- test-output.js dead call
- Try(() => fn(args)) syntax
- unwrapRawEnvelope duplication
- … unwrapEnvelope)
- --trust flag
- z.prettifyError inline in PR 5; AgentConfig* errors in PR 5
- test-extractor.js
Architectural Questions (surfaced in PR 4, both resolved in PR 8)
1. Built-in agent configs hardcode third-party CLI flags
Resolved: riteway ai init (PR 8) writes all built-in configs to riteway.agent-config.json. Teams who want stability own their config file. Library built-ins remain for first-run convenience.
2. parseOutput function can't live in a JSON config file
Resolved: PR 8 replaces parseOutput: fn with a declarative outputFormat: 'json' | 'ndjson' | 'text' string in all agent configs and the agentConfigFileSchema. execute-agent.js maps format names to parsers via a lookup table. Config is now fully serializable.
Merge Plan