feat: GitHub Actions provider for remote eval execution #860

@christso

Objective

Add a github-actions provider kind that treats GitHub Actions as a remote compute backend for LLM invocations. API keys stay in GitHub Actions secrets — AgentV orchestrates dispatch and result collection.

Motivation

  • Teams manage API keys centrally in GitHub secrets, not on developer machines
  • CI/CD-driven eval workflows shouldn't require local API keys
  • Natural audit trail via workflow run logs and git history

Proposed Design

Batch-oriented architecture

Rather than one workflow per prompt (~15-30s spin-up overhead each), the provider batches all eval prompts into a single workflow run:

  1. AgentV serializes prompts into a JSON artifact
  2. Triggers workflow_dispatch on the runner repo
  3. The workflow runs all LLM calls in parallel using repo secrets
  4. Results are pushed to a results branch (e.g. eval-results/<batch-id>)
  5. AgentV pulls the results branch to collect JSONL output
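
Step 1 and the branch naming in step 4 could be sketched as below. This is only an illustration: the payload layout (`BatchPayload`), the per-prompt `index` field, and the reduced `ProviderRequest` shape are assumptions, not AgentV's actual types.

```typescript
// Hypothetical, simplified request shape; AgentV's real ProviderRequest may carry more fields.
interface ProviderRequest {
  prompt: string;
}

// Assumed layout of the JSON handed to the runner workflow.
interface BatchPayload {
  batch_id: string;
  prompts: { index: number; prompt: string }[];
}

// Serialize requests, tagging each prompt with its position so results
// can be matched back to requests regardless of completion order.
function buildBatchPayload(batchId: string, requests: ProviderRequest[]): string {
  const payload: BatchPayload = {
    batch_id: batchId,
    prompts: requests.map((r, index) => ({ index, prompt: r.prompt })),
  };
  return JSON.stringify(payload);
}

// Branch name following the eval-results/<batch-id> convention from step 4.
function resultsBranchFor(prefix: string, batchId: string): string {
  return `${prefix}/${batchId}`;
}
```

Carrying an explicit index per prompt keeps result collection independent of the order in which the parallel LLM calls finish.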

Target configuration

targets:
  - name: cloud-eval-runner
    provider: github-actions
    repo: myorg/eval-runner
    workflow: run-evals.yml
    results_branch: eval-results   # or: results_artifact: true
    poll_interval: 10s
    timeout: 10m
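
A dependency-free sketch of how this target might be validated before dispatch. The `GitHubActionsTarget` type, `parseDuration`, and `validateTarget` are hypothetical names for illustration; the actual schema would presumably live alongside the other target schemas.

```typescript
// Hypothetical TypeScript shape mirroring the YAML target above.
interface GitHubActionsTarget {
  name: string;
  provider: "github-actions";
  repo: string; // "owner/name"
  workflow: string; // workflow file name, e.g. run-evals.yml
  results_branch?: string;
  results_artifact?: boolean;
  poll_interval: string; // e.g. "10s"
  timeout: string; // e.g. "10m"
}

// Convert "500ms" / "10s" / "10m" / "1h" style durations to milliseconds.
function parseDuration(value: string): number {
  const match = /^(\d+)(ms|s|m|h)$/.exec(value.trim());
  if (!match) throw new Error(`Invalid duration: ${value}`);
  const unitMs: Record<string, number> = { ms: 1, s: 1000, m: 60000, h: 3600000 };
  return Number(match[1]) * unitMs[match[2]];
}

// Return a list of problems; an empty list means the target looks valid.
function validateTarget(t: GitHubActionsTarget): string[] {
  const errors: string[] = [];
  if (!/^[^/\s]+\/[^/\s]+$/.test(t.repo)) errors.push("repo must be owner/name");
  if (!t.results_branch && !t.results_artifact) {
    errors.push("set results_branch or results_artifact");
  }
  if (parseDuration(t.poll_interval) >= parseDuration(t.timeout)) {
    errors.push("poll_interval must be shorter than timeout");
  }
  return errors;
}
```

With the example target above, `validateTarget` returns an empty list, since `10s` is shorter than `10m` and `results_branch` is set.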

Provider implementation

New provider class implementing the existing Provider interface:

class GitHubActionsProvider implements Provider {
  async invoke(request: ProviderRequest): Promise<ProviderResponse> { /* ... */ }
  supportsBatch = true;
  async invokeBatch(requests: ProviderRequest[]): Promise<ProviderResponse[]> {
    // 1. Serialize requests to JSON
    // 2. gh workflow run <workflow> -f batch_id=<id> -f prompts_artifact=<ref>
    // 3. Poll gh run list / gh run watch for completion
    // 4. Download results (artifact or git pull)
    // 5. Parse JSONL into ProviderResponse[]
  }
}
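
Steps 2 and 5 of `invokeBatch` could look roughly like this. It is a sketch under assumptions: the `ProviderResponse` shape and the JSONL record fields (`index`, `text`, `error`) are hypothetical, and the input name `prompts_artifact` is taken from the runner-side workflow template. The `-R` and `-f` flags are the `gh workflow run` options for selecting a repository and passing string inputs.

```typescript
// Hypothetical response shape; AgentV's real ProviderResponse may differ.
interface ProviderResponse {
  text: string;
  error?: string;
}

// Assumed JSONL record written by the runner: one line per prompt.
interface ResultRecord {
  index: number;
  text?: string;
  error?: string;
}

// Build the argument list for `gh workflow run` (to be passed to a process spawner).
function buildDispatchArgs(repo: string, workflow: string, batchId: string, promptsRef: string): string[] {
  return [
    "workflow", "run", workflow,
    "-R", repo,
    "-f", `batch_id=${batchId}`,
    "-f", `prompts_artifact=${promptsRef}`,
  ];
}

// Parse JSONL and return responses in request order, failing loudly on gaps.
function parseResults(jsonl: string, expected: number): ProviderResponse[] {
  const out: (ProviderResponse | undefined)[] = new Array(expected).fill(undefined);
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank lines and a trailing newline
    const rec = JSON.parse(line) as ResultRecord;
    if (rec.index < 0 || rec.index >= expected) {
      throw new Error(`Unexpected index ${rec.index}`);
    }
    out[rec.index] = { text: rec.text ?? "", error: rec.error };
  }
  return out.map((r, i) => {
    if (!r) throw new Error(`Missing result for prompt ${i}`);
    return r;
  });
}
```

Passing the arguments as an array rather than a single shell string avoids quoting problems when the batch ID or reference contains unusual characters.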

Runner-side workflow template

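# .github/workflows/run-evals.yml
name: Run Eval Batch
on:
  workflow_dispatch:
    inputs:
      batch_id:
        required: true
      prompts_artifact:
        required: true

permissions:
  contents: write   # required to push the results branch

jobs:
  run:
    runs-on: ubuntu-latest
    env:
      # Pass inputs via env instead of interpolating them into shell commands
      BATCH_ID: ${{ inputs.batch_id }}
      # Assumption: run-batch.js reads the prompts reference from this variable
      PROMPTS_ARTIFACT: ${{ inputs.prompts_artifact }}
    steps:
      - uses: actions/checkout@v4
      - name: Run LLM batch
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: node run-batch.js "$BATCH_ID"
      - name: Push results
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git checkout -B "eval-results/$BATCH_ID"
          git add results/
          git commit -m "Eval results for batch $BATCH_ID"
          git push origin "eval-results/$BATCH_ID"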
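
The template invokes run-batch.js without showing it. As a rough sketch of what the core of that script might do, written in TypeScript: `callModel`, `runAll`, `toJsonl`, and the record fields are all hypothetical names, and a real runner would also need to read the prompts and write the output under results/.

```typescript
// Outcome of one LLM call; failures are recorded instead of aborting the batch.
type Outcome = { ok: true; text: string } | { ok: false; error: string };

// Run every prompt concurrently. callModel is a hypothetical stand-in for
// whichever SDK call the runner makes with the secrets from the workflow env.
async function runAll(
  prompts: string[],
  callModel: (prompt: string) => Promise<string>,
): Promise<Outcome[]> {
  return Promise.all(
    prompts.map((p) =>
      callModel(p).then(
        (text): Outcome => ({ ok: true, text }),
        (err): Outcome => ({ ok: false, error: String(err) }),
      ),
    ),
  );
}

// One JSON object per line, keyed by prompt position (assumed record format).
function toJsonl(outcomes: Outcome[]): string {
  return outcomes
    .map((o, index) => JSON.stringify(o.ok ? { index, text: o.text } : { index, error: o.error }))
    .map((line) => line + "\n")
    .join("");
}
```

Recording per-prompt errors in the output keeps one failed call from discarding the rest of the batch. Unbounded parallelism may hit provider rate limits, so a concurrency cap is probably worth adding for large batches.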

Design Decisions

Decision          | Choice                               | Rationale
Result delivery   | Git push to branch                   | Simpler consumption, natural audit trail, no artifact expiry
Batch granularity | All prompts in one workflow          | Avoids per-prompt spin-up overhead
Input delivery    | Artifact reference for large batches | workflow_dispatch inputs have size limits
Auth              | gh CLI (user's GitHub token)         | Simple; GitHub App token for CI environments
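
The input-delivery decision implies a size check before dispatch. A minimal sketch, assuming a hypothetical helper name and threshold: GitHub's documentation lists 65,535 characters as the maximum workflow_dispatch inputs payload at the time of writing, and the default below leaves headroom for the other inputs.

```typescript
// Assumed default; chosen to stay under the documented 65,535-character
// cap on the workflow_dispatch inputs payload, not a tuned value.
const DEFAULT_INLINE_LIMIT = 60000;

// True when the serialized prompts are too large to send as a dispatch input
// and should be delivered by reference instead.
function needsArtifactReference(promptsJson: string, limit: number = DEFAULT_INLINE_LIMIT): boolean {
  return promptsJson.length > limit;
}
```

For example, `needsArtifactReference(JSON.stringify(prompts))` would gate whether the provider uploads the payload separately and passes only a reference.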

Existing architecture support

  • Provider interface — just implement invoke() / invokeBatch() and register
  • Provider registry — factory pattern, add to createBuiltinProviderRegistry()
  • Batch support: supportsBatch + invokeBatch() already in the type system
  • Config schemas — add Zod schema to targets.ts
  • No core changes needed — evaluation pipeline is provider-agnostic

Tradeoffs

Pros:

  • API keys never leave GitHub Actions secrets
  • Natural fit for team/org key management
  • Audit trail via workflow logs + git commits
  • Can use larger runners for heavy batches

Cons:

  • ~15-30s workflow spin-up latency (acceptable for batch evals)
  • Two-repo coordination (agentv config + runner repo)
  • GitHub Actions minutes cost
  • Large batch payloads need artifact workaround

Acceptance signals

  • github-actions provider kind registered and functional
  • Batch dispatch + result collection working end-to-end
  • Reusable workflow template published for runner side
  • Config schema with validation in targets.ts
  • Works with gh CLI auth (no extra token setup)

Non-goals

  • Real-time streaming of individual prompt responses
  • Supporting self-hosted runners (can be added later)
  • Replacing local providers — this is an additional option

Research

Full proposal: agentevals-research/research/proposals/github-actions-provider.md
