
feat: auto-push eval results to configurable git repo (needs design) #826

@christso


Summary

Add a configuration option to automatically push eval result artifacts to a git repository after each eval run. The export opens a PR rather than merging directly, so a human still reviews and merges the results.

Motivation

Currently, eval results live locally in .agentv/results/runs/ and are lost unless someone commits them manually. For reproducibility and historical comparison, results should be pushed automatically to a dedicated repo (e.g., EntityProcess/agentv-evals).

This is similar to Entire.io's approach, in which evaluation artifacts are versioned and browsable.

Design

Configuration

In .agentv/config.yaml or agentv.config.yaml:

```yaml
results:
  export:
    repo: EntityProcess/agentv-evals     # GitHub repo path
    path: autopilot-dev/runs             # Directory within the repo
    auto_push: true                      # Enable auto-push after each run
    branch_prefix: eval-results          # Branch naming prefix
```
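A minimal TypeScript sketch of how the results.export section might be validated once the YAML file has been parsed into a plain object. The ResultsExportConfig type, the parseExportConfig function, the camelCase field names, and the defaults (auto_push off, branch_prefix "eval-results") are assumptions for illustration, not part of AgentV's existing API.

```typescript
// Hypothetical shape of the proposed `results.export` section after YAML parsing.
interface ResultsExportConfig {
  repo: string;         // GitHub "owner/name"
  path: string;         // directory within the target repo
  autoPush: boolean;    // push after each eval run
  branchPrefix: string; // branch naming prefix
}

// Validates an already-parsed `results.export` value. Returns undefined when the
// section is absent so the caller can skip exporting. Defaults are assumptions.
function parseExportConfig(raw: unknown): ResultsExportConfig | undefined {
  if (raw === undefined || raw === null) return undefined;
  if (typeof raw !== "object") throw new Error("results.export must be a mapping");
  const obj = raw as Record<string, unknown>;

  const repo = obj.repo;
  if (typeof repo !== "string" || !/^[\w.-]+\/[\w.-]+$/.test(repo)) {
    throw new Error(`results.export.repo must look like "owner/name", got ${String(repo)}`);
  }

  const path = obj.path ?? "";
  if (typeof path !== "string") throw new Error("results.export.path must be a string");

  const autoPush = obj.auto_push ?? false;
  if (typeof autoPush !== "boolean") throw new Error("results.export.auto_push must be a boolean");

  const branchPrefix = obj.branch_prefix ?? "eval-results";
  if (typeof branchPrefix !== "string" || branchPrefix.length === 0) {
    throw new Error("results.export.branch_prefix must be a non-empty string");
  }

  return {
    repo,
    path: path.replace(/^\/+|\/+$/g, ""), // strip leading/trailing slashes
    autoPush,
    branchPrefix,
  };
}
```

Returning undefined for a missing section lets the eval run skip exporting entirely, and checking the owner/name shape up front surfaces a typo before any git command runs.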

Workflow

After agentv eval run completes, if auto_push is enabled:

  1. Clone the target repo (or fetch it, if a clone already exists)
  2. Create a branch named <branch_prefix>/<timestamp>, e.g. eval-results/<timestamp>
  3. Copy the result artifacts to the configured path
  4. Commit with a structured message that includes an eval summary (pass/fail counts, mean score)
  5. Push the branch and open a draft PR with the results summary in the body
  6. A human reviews and merges the PR
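The workflow above could be planned as a list of argv-style commands before anything is executed, which makes the git and gh calls easy to log, dry-run, or test. This is a sketch under stated assumptions: the type names, the toRunId, commitMessage, and planExport helpers, the shallow HTTPS clone, one subdirectory per run under the configured path, and the use of the GitHub CLI (gh pr create --draft --fill) to open the PR are all illustrative rather than decisions made in this issue.

```typescript
// Hypothetical types for illustration; not AgentV's real data model.
interface ExportTarget {
  repo: string;         // GitHub "owner/name"
  path: string;         // directory within the target repo
  branchPrefix: string; // e.g. "eval-results"
}

interface RunSummary {
  runId: string; // ref-safe timestamp, e.g. "2026-03-29T01-15-06-826Z"
  passed: number;
  total: number;
  meanScore: number;
}

interface Step {
  cwd: string;    // directory the command runs in
  argv: string[]; // program and arguments, passed without a shell
}

// ":" is not allowed in git ref names, so the ISO timestamp's ":" and "."
// are replaced with "-" before it is used in a branch name.
function toRunId(date: Date): string {
  return date.toISOString().replace(/[:.]/g, "-");
}

// Structured commit message: one-line subject, then the eval summary.
function commitMessage(s: RunSummary): string {
  const mean = s.meanScore.toFixed(3);
  return [
    `eval results: ${s.passed}/${s.total} passed, mean ${mean}`,
    "",
    `Run: ${s.runId}`,
    `Passed: ${s.passed}`,
    `Failed: ${s.total - s.passed}`,
    `Mean score: ${mean}`,
  ].join("\n");
}

// Plans the export without running anything. resultsDir should be an
// absolute path because the copy step runs inside workDir.
function planExport(t: ExportTarget, s: RunSummary, resultsDir: string, workDir: string): Step[] {
  const branch = `${t.branchPrefix}/${s.runId}`;
  // Assumption: each run gets its own subdirectory under the configured path.
  const dest = [t.path, s.runId].filter((p) => p.length > 0).join("/");
  return [
    { cwd: ".", argv: ["git", "clone", "--depth", "1", `https://github.com/${t.repo}.git`, workDir] },
    { cwd: workDir, argv: ["git", "checkout", "-b", branch] },
    { cwd: workDir, argv: ["mkdir", "-p", dest] },
    { cwd: workDir, argv: ["cp", "-R", `${resultsDir}/.`, dest] },
    { cwd: workDir, argv: ["git", "add", "--", dest] },
    { cwd: workDir, argv: ["git", "commit", "-m", commitMessage(s)] },
    { cwd: workDir, argv: ["git", "push", "-u", "origin", branch] },
    // --draft keeps the PR open for human review; --fill reuses the commit subject and body.
    { cwd: workDir, argv: ["gh", "pr", "create", "--draft", "--fill"] },
  ];
}
```

Passing arguments as arrays rather than one shell string avoids quoting bugs with the multi-line commit message. Because --fill takes the PR title and body from the single commit, the summary written into the commit also lands in the PR description.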

PR Format

```markdown
feat(results): ad-explore claude-cli — 3/3 PASS (1.000)

## Results
| Test | Score | Status |
|---|---|---|
| discovers-existing-implementation | 1.000 | PASS |
| finds-all-consumers | 1.000 | PASS |
| structured-summary | 1.000 | PASS |

Run: 2026-03-29T01-15-06-826Z
Target: claude-cli
Eval: evals/autopilot-dev/ad-explore.eval.yaml
```
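One way to render the PR title and body in the format above from per-test results. The TestResult and RunInfo shapes and the renderPr function are hypothetical; the mean score is assumed to be the arithmetic mean of the per-test scores, which matches the 1.000 shown for three tests that each scored 1.000.

```typescript
// Hypothetical shapes for illustration; field names are not AgentV's real types.
interface TestResult {
  id: string;     // test ID, e.g. "finds-all-consumers"
  score: number;  // score in [0, 1]
  passed: boolean;
}

interface RunInfo {
  evalName: string; // e.g. "ad-explore"
  target: string;   // e.g. "claude-cli"
  runId: string;    // e.g. "2026-03-29T01-15-06-826Z"
  evalFile: string; // path of the eval definition
}

function renderPr(info: RunInfo, tests: TestResult[]): { title: string; body: string } {
  const passed = tests.filter((t) => t.passed).length;
  // Arithmetic mean of per-test scores; 0 for an empty run to avoid NaN.
  const mean = tests.length === 0 ? 0 : tests.reduce((sum, t) => sum + t.score, 0) / tests.length;

  // \u2014 reproduces the dash separator used in the title format above.
  const title = `feat(results): ${info.evalName} ${info.target} \u2014 ${passed}/${tests.length} PASS (${mean.toFixed(3)})`;

  // Escape "|" so a test ID cannot break the Markdown table.
  const rows = tests.map(
    (t) => `| ${t.id.replace(/\|/g, "\\|")} | ${t.score.toFixed(3)} | ${t.passed ? "PASS" : "FAIL"} |`,
  );

  const body = [
    "## Results",
    "| Test | Score | Status |",
    "|---|---|---|",
    ...rows,
    "",
    `Run: ${info.runId}`,
    `Target: ${info.target}`,
    `Eval: ${info.evalFile}`,
  ].join("\n");

  return { title, body };
}
```

Keeping title and body generation in one pure function makes the PR format easy to snapshot-test independently of git.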

Acceptance Signals

  • config.yaml supports a results.export section
  • After an eval run, artifacts are pushed to the configured repo as a PR
  • The PR includes a structured results summary
  • A human must merge the PR; there is no auto-merge
  • Works with both the agentv eval run and agentv pipeline commands
  • Graceful fallback if the repo is not accessible (a warning, not an error)
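The graceful-fallback signal implies that an export failure must never fail the eval run itself. A minimal sketch of that policy, assuming a hypothetical doExport callback that performs the push and PR creation and resolves to the PR URL; exportResultsSafely and the ExportOutcome type are likewise illustrative names. Any error is reported through warn and returned as a "warning" outcome instead of being rethrown.

```typescript
// Hypothetical outcome type; the eval run's exit status should not depend on it.
type ExportOutcome =
  | { status: "skipped" }
  | { status: "exported"; prUrl: string }
  | { status: "warning"; message: string };

// Runs the export after an eval run without ever rejecting.
// `doExport` is a hypothetical callback that pushes the branch and returns the PR URL.
async function exportResultsSafely(
  autoPush: boolean,
  doExport: () => Promise<string>,
  warn: (msg: string) => void = (m) => console.warn(m),
): Promise<ExportOutcome> {
  if (!autoPush) return { status: "skipped" };
  try {
    const prUrl = await doExport();
    return { status: "exported", prUrl };
  } catch (err) {
    // Inaccessible repo, auth failure, network error: warn and carry on.
    const reason = err instanceof Error ? err.message : String(err);
    const message = `results export skipped: ${reason}`;
    warn(message);
    return { status: "warning", message };
  }
}
```

Returning a tagged outcome instead of throwing lets both agentv eval run and agentv pipeline report the export result the same way while keeping their own exit codes.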

Non-Goals

Related

Metadata


Assignees: No one assigned
Labels: No labels
Type: No type
Project status: Backlog
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
