Summary
Add a configuration option to automatically push eval result artifacts to a git repository after each eval run. The agent creates a PR (not auto-merge) so a human still reviews and merges.
Motivation
Currently eval results live in .agentv/results/runs/ locally and are lost unless manually committed. For reproducibility and historical comparison, results should be automatically pushed to a dedicated repo (e.g., EntityProcess/agentv-evals).
This is similar to Entire.io's approach, where evaluation artifacts are versioned and browsable.
Design
Configuration
In .agentv/config.yaml or agentv.config.yaml:
results:
  export:
    repo: EntityProcess/agentv-evals   # GitHub repo path
    path: autopilot-dev/runs           # Directory within the repo
    auto_push: true                    # Enable auto-push after each run
    branch_prefix: eval-results        # Branch naming prefix
Workflow
After agentv eval run completes, if auto_push is enabled:
- Clone/fetch the target repo
- Create a branch: eval-results/<timestamp>
- Copy result artifacts to the configured path
- Commit with a structured message including eval summary (pass/fail counts, mean score)
- Push branch and create a draft PR with results summary in the body
- Human reviews and merges the PR
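The steps above could be sketched as an ordered command plan. This is a minimal illustration, not the AgentV implementation: the use of the GitHub CLI (gh) for the draft PR, the HTTPS clone URL, the checkout_dir name, and the names ExportConfig and plan_export are all assumptions. It also assumes the run ID (for example 2026-03-29T01-15-06-826Z) doubles as the branch timestamp; that form is convenient because git ref names cannot contain colons. The sketch always clones; a real version might fetch into an existing checkout instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportConfig:
    """Hypothetical mirror of the results.export section."""
    repo: str           # e.g. "EntityProcess/agentv-evals"
    path: str           # directory within the target repo
    branch_prefix: str  # e.g. "eval-results"


def plan_export(cfg: ExportConfig, run_id: str, title: str, body: str,
                checkout_dir: str = "export-checkout") -> list[list[str]]:
    """Return the commands for one export, in order, without running them."""
    branch = f"{cfg.branch_prefix}/{run_id}"
    return [
        ["git", "clone", f"https://github.com/{cfg.repo}.git", checkout_dir],
        ["git", "-C", checkout_dir, "checkout", "-b", branch],
        # Copying artifacts into cfg.path happens between these two steps
        # (a plain file copy, so it is not listed as a command).
        ["git", "-C", checkout_dir, "add", cfg.path],
        ["git", "-C", checkout_dir, "commit", "-m", title],
        ["git", "-C", checkout_dir, "push", "-u", "origin", branch],
        # --draft keeps the PR unmerged until a human reviews it.
        ["gh", "pr", "create", "--draft", "--repo", cfg.repo,
         "--head", branch, "--title", title, "--body", body],
    ]
```

Returning the plan as data instead of executing it keeps the sketch testable and would make a dry-run mode straightforward; each entry can be passed to subprocess.run(cmd, check=True) when it is time to execute.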
PR Format
feat(results): ad-explore claude-cli — 3/3 PASS (1.000)
## Results
| Test | Score | Status |
|---|---|---|
| discovers-existing-implementation | 1.000 | PASS |
| finds-all-consumers | 1.000 | PASS |
| structured-summary | 1.000 | PASS |
Run: 2026-03-29T01-15-06-826Z
Target: claude-cli
Eval: evals/autopilot-dev/ad-explore.eval.yaml
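One way the title and body shown above might be generated from per-test results is sketched below. The names CaseResult, pr_title, and pr_body are hypothetical, as is the FAIL label for failing tests (the example only shows PASS). The blank line after the table is also an assumption, added so that the Run line is not parsed as a table row.

```python
from dataclasses import dataclass

DASH = "\u2014"  # the dash separating target and summary in the example title


@dataclass(frozen=True)
class CaseResult:
    """Hypothetical per-test record."""
    name: str
    score: float
    passed: bool


def pr_title(eval_name: str, target: str, results: list[CaseResult]) -> str:
    """Build a title like 'feat(results): <eval> <target> <dash> n/m PASS (mean)'."""
    passed = sum(r.passed for r in results)
    mean = sum(r.score for r in results) / len(results) if results else 0.0
    return (f"feat(results): {eval_name} {target} {DASH} "
            f"{passed}/{len(results)} PASS ({mean:.3f})")


def pr_body(results: list[CaseResult], run_id: str, target: str,
            eval_path: str) -> str:
    """Render the results table followed by the run metadata lines."""
    lines = ["## Results", "| Test | Score | Status |", "|---|---|---|"]
    for r in results:
        status = "PASS" if r.passed else "FAIL"  # FAIL label is an assumption
        lines.append(f"| {r.name} | {r.score:.3f} | {status} |")
    lines += ["", f"Run: {run_id}", f"Target: {target}", f"Eval: {eval_path}"]
    return "\n".join(lines)
```

The same title string could serve as the commit message, since the workflow asks for a structured message with pass/fail counts and mean score.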
Acceptance Signals
- config.yaml supports results.export section
- After eval run, artifacts are pushed to configured repo as a PR
- PR includes structured results summary
- Human must merge — no auto-merge
- Works with agentv eval run and agentv pipeline commands
- Graceful fallback if repo is not accessible (warning, not error)
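The first and last signals could be checked with something like the sketch below, which reads results.export from an already-parsed config and turns an export failure into a warning. The function names, the choice to treat auto_push as off by default, and the default branch_prefix of eval-results are assumptions for illustration; the issue does not specify defaults.

```python
import warnings
from typing import Callable, Optional


def read_export_settings(config: dict) -> Optional[dict]:
    """Return results.export settings if auto-push is enabled, else None."""
    export = (config.get("results") or {}).get("export") or {}
    if not export.get("auto_push", False):  # assumed default: disabled
        return None
    missing = [key for key in ("repo", "path") if not export.get(key)]
    if missing:
        warnings.warn(f"results.export is missing {missing}; skipping auto-push")
        return None
    # Assumed default prefix; values from the config override it.
    return {"branch_prefix": "eval-results", **export}


def push_with_fallback(settings: dict, push: Callable[[dict], None]) -> bool:
    """Run push(settings); on failure, warn and return False instead of raising."""
    try:
        push(settings)
        return True
    except Exception as exc:  # an export failure should not fail the eval run
        warnings.warn(f"could not export results to {settings['repo']}: {exc}")
        return False
```

Because the push step is passed in as a callable, both agentv eval run and agentv pipeline could share the same wrapper around whatever export routine is eventually implemented.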
Non-Goals
- Auto-merging PRs (human review required)
- Real-time streaming of results
- Dashboard integration (separate concern — feat: AgentV Studio — eval management platform with historical trends, quality gates, and orchestration #563)
Related
- EntityProcess/agentv-evals — current manual results repo
- feat: AgentV Studio — eval management platform with historical trends, quality gates, and orchestration #563 — Studio eval management platform
- refactor(pipeline): align output paths, drop target from path, rename to subagent mode #801 — artifact structure standardization