fix(framework): split Output TypeVar into Output and Expected #243
Merged
Abhijeet Prasad (AbhiPrasad) merged 3 commits into main on Apr 10, 2026
Andrew Kent (realark) approved these changes on Apr 9, 2026.
…240)

The `Eval` generic `Output` parameter was shared across three positions: the task return type, `EvalCase.expected`, and the scorer arguments. When the expected data type differs from the task output (e.g. assertion specs vs. model output), type checkers reject the call because `Output` can't unify to a single type.

Introduce a separate `Expected` TypeVar so `data` binds `Expected` and `task` binds `Output` independently.

Add a `test_types` nox session that runs pyright, mypy, and pytest on `py/src/braintrust/type_tests/`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
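A minimal sketch of the split, using hypothetical stand-ins (`EvalCase`, `run_eval`, `Contains`, `shout`, `contains_all`) rather than the real braintrust `Eval` signature. With one shared TypeVar, the `Contains` spec in `data` and the `str` returned by `task` would both have to solve the same `Output`, which is the unification failure described above. With separate `Output` and `Expected`, each is solved from its own argument and only the scorer sees both.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Sequence, TypeVar

Input = TypeVar("Input")
Output = TypeVar("Output")
Expected = TypeVar("Expected")


@dataclass
class EvalCase(Generic[Input, Expected]):
    # Hypothetical stand-in: `expected` binds Expected, not Output.
    input: Input
    expected: Expected


def run_eval(
    data: Sequence[EvalCase[Input, Expected]],
    task: Callable[[Input], Output],
    scorer: Callable[[Input, Output, Expected], float],
) -> List[float]:
    # Expected is solved from `data` and Output from `task`, independently;
    # only the scorer receives both.
    return [scorer(case.input, task(case.input), case.expected) for case in data]


@dataclass
class Contains:
    # Hypothetical assertion spec: substrings the task output must contain.
    substrings: List[str]


def shout(text: str) -> str:
    # Toy task: the "model output" is a plain string.
    return text.upper()


def contains_all(case_input: str, output: str, expected: Contains) -> float:
    # Fraction of required substrings that appear in the output.
    hits = sum(s in output for s in expected.substrings)
    return hits / len(expected.substrings)


# Output = str and Expected = Contains: two different types in one call.
scores = run_eval(
    data=[EvalCase(input="hello world", expected=Contains(["HELLO", "WORLD"]))],
    task=shout,
    scorer=contains_all,
)
```

Here `scores` is `[1.0]`, since both required substrings appear in `"HELLO WORLD"`.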
Pin pyright==1.1.408 and mypy==1.20.0 so that new upstream type-checker releases with stricter checks can't make CI flaky.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Branch updated from 9747351 to dbe6a10.
`Evaluator` is now `Generic[Input, Output, Expected]` after the TypeVar split. Python 3.10 enforces the number of generic parameters at runtime, so the two-parameter `Evaluator[Any, Any]` in `server.py` raised a `TypeError` on import.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
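The runtime failure can be reproduced with a small stand-in class. `Evaluator` below is a hypothetical three-parameter generic, not the real braintrust class. Subscripting it with the wrong number of arguments raises `TypeError` when the subscript expression is evaluated, for example at import time for a module-level alias.

```python
from typing import Any, Generic, TypeVar, get_args

Input = TypeVar("Input")
Output = TypeVar("Output")
Expected = TypeVar("Expected")


class Evaluator(Generic[Input, Output, Expected]):
    """Hypothetical stand-in for a class with three type parameters."""


# The three-parameter form is accepted at runtime.
FullyTyped = Evaluator[Any, Any, Any]
fully_typed_args = get_args(FullyTyped)  # (Any, Any, Any)

# The old two-parameter form raises as soon as the subscript is evaluated.
try:
    Evaluator[Any, Any]
except TypeError as exc:
    arity_error = str(exc)
else:
    arity_error = None
```

The fix in `server.py` therefore amounts to supplying all three parameters, as in `Evaluator[Any, Any, Any]`.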
Branch updated from 342a58f to e619a9f.
resolves #240
The `Eval` generic `Output` parameter was shared across three positions: task return type, `EvalCase.expected`, and scorer args. When the expected data type differs from the task output (e.g. assertion specs vs. model output), type checkers reject the call because `Output` can't unify.

Introduce a separate `Expected` TypeVar so `data` binds `Expected` and `task` binds `Output` independently. Add a `test_types` nox session that runs pyright, mypy, and pytest on `py/src/braintrust/type_tests/`.