Description
Objective
Add a github-actions provider kind that treats GitHub Actions as a remote compute backend for LLM invocations. API keys stay in GitHub Actions secrets — AgentV orchestrates dispatch and result collection.
Motivation
- Teams manage API keys centrally in GitHub secrets, not on developer machines
- CI/CD-driven eval workflows shouldn't require local API keys
- Natural audit trail via workflow run logs and git history
Proposed Design
Batch-oriented architecture
Rather than one workflow per prompt (~15-30s spin-up overhead each), the provider batches all eval prompts into a single workflow run:
- AgentV serializes prompts into a JSON artifact
- Triggers workflow_dispatch on the runner repo
- The GitHub Action runs all LLM calls in parallel using repo secrets
- Results are pushed to a results branch (e.g. eval-results/<batch-id>)
- AgentV pulls the results branch to collect JSONL output
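The serialization and collection steps above can be sketched as follows. This is a minimal illustration, not AgentV's implementation: `BatchRequest`, `BatchResult`, `serializeBatch`, `resultsBranch`, and `parseResults` are hypothetical names, and the real `ProviderRequest`/`ProviderResponse` types are likely richer. It assumes each JSONL line carries the `id` of the request it answers.

```typescript
// Hypothetical, simplified shapes; the real AgentV types are not shown in this issue.
interface BatchRequest {
  id: string;
  prompt: string;
}

interface BatchResult {
  id: string;
  output: string;
}

// Serialize all prompts into one JSON document tagged with the batch id.
function serializeBatch(batchId: string, requests: BatchRequest[]): string {
  return JSON.stringify({ batch_id: batchId, prompts: requests });
}

// Name of the branch the workflow is expected to push, e.g. eval-results/<batch-id>.
function resultsBranch(prefix: string, batchId: string): string {
  return `${prefix}/${batchId}`;
}

// Parse JSONL (one JSON object per line) and return results in request order.
function parseResults(jsonl: string, requests: BatchRequest[]): BatchResult[] {
  const byId = new Map<string, BatchResult>();
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // tolerate blank lines and a trailing newline
    const parsed = JSON.parse(trimmed) as BatchResult;
    byId.set(parsed.id, parsed);
  }
  return requests.map((req) => {
    const result = byId.get(req.id);
    if (result === undefined) {
      throw new Error(`missing result for request ${req.id}`);
    }
    return result;
  });
}
```

Matching by `id` rather than by line position means the runner can finish its parallel calls in any order without scrambling the responses.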
Target configuration
targets:
  - name: cloud-eval-runner
    provider: github-actions
    repo: myorg/eval-runner
    workflow: run-evals.yml
    results_branch: eval-results  # or: results_artifact: true
    poll_interval: 10s
    timeout: 10m
Provider implementation
New provider class implementing the existing Provider interface:
class GitHubActionsProvider implements Provider {
  async invoke(request: ProviderRequest): Promise<ProviderResponse> { /* ... */ }
  supportsBatch = true;
  async invokeBatch(requests: ProviderRequest[]): Promise<ProviderResponse[]> {
    // 1. Serialize requests to JSON
    // 2. gh workflow run <workflow> -f batch_id=<id> -f prompts=<json>
    // 3. Poll gh run list / gh run watch for completion
    // 4. Download results (artifact or git pull)
    // 5. Parse JSONL into ProviderResponse[]
  }
}
Runner-side workflow template
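# .github/workflows/run-evals.yml
name: Run Eval Batch
on:
  workflow_dispatch:
    inputs:
      batch_id:
        required: true
      prompts_artifact:
        required: true
jobs:
  run:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # the job pushes the results branch
    steps:
      - uses: actions/checkout@v4
      - name: Run LLM batch
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: node run-batch.js '${{ inputs.batch_id }}'
      - name: Push results
        run: |
          # A commit identity is required before `git commit` on a fresh runner
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git checkout -B eval-results/${{ inputs.batch_id }}
          git add results/
          # Without a commit, the pushed branch would not contain the results
          git commit -m "Eval results for batch ${{ inputs.batch_id }}"
          git push origin eval-results/${{ inputs.batch_id }}
Design Decisions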
| Decision | Choice | Rationale |
|---|---|---|
| Result delivery | Git push to branch | Simpler consumption, natural audit trail, no artifact expiry |
| Batch granularity | All prompts in one workflow | Avoids per-prompt spin-up overhead |
| Input delivery | Artifact reference for large batches | workflow_dispatch inputs have size limits |
| Auth | gh CLI (user's GitHub token) | Simple; GitHub App token for CI environments |
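The input-delivery decision can be illustrated with a small helper that builds the argument list for `gh workflow run`. This is a sketch under assumptions: `buildDispatchArgs`, `DispatchOptions`, and `artifactRef` are hypothetical, and the 60,000-character default is a placeholder chosen to stay under the 65,535-character cap that GitHub's documentation gives for `workflow_dispatch` inputs (worth re-checking against the current docs). The workflow must declare whichever input name is sent.

```typescript
// Placeholder threshold; assumed to sit below GitHub's documented inputs payload cap.
const DEFAULT_MAX_INLINE_CHARS = 60_000;

interface DispatchOptions {
  repo: string;        // e.g. "myorg/eval-runner"
  workflow: string;    // e.g. "run-evals.yml"
  batchId: string;
  promptsJson: string; // serialized batch
  artifactRef: string; // hypothetical pointer to where a large payload was uploaded
}

// Build argv for `gh` (to be passed without a shell, so no quoting is needed).
function buildDispatchArgs(
  opts: DispatchOptions,
  maxInlineChars: number = DEFAULT_MAX_INLINE_CHARS,
): string[] {
  const args = [
    "workflow", "run", opts.workflow,
    "--repo", opts.repo,
    "-f", `batch_id=${opts.batchId}`,
  ];
  if (opts.promptsJson.length <= maxInlineChars) {
    // Small batch: send the prompts inline as a workflow input.
    args.push("-f", `prompts=${opts.promptsJson}`);
  } else {
    // Large batch: send only a reference; the runner fetches the payload itself.
    args.push("-f", `prompts_artifact=${opts.artifactRef}`);
  }
  return args;
}
```

The returned array could be handed to something like Node's `execFile("gh", args)`, which avoids shell interpolation of the JSON payload.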
Existing architecture support
- Provider interface: just implement invoke()/invokeBatch() and register
- Provider registry: factory pattern, add to createBuiltinProviderRegistry()
- Batch support: supportsBatch + invokeBatch() already in the type system
- Config schemas: add Zod schema to targets.ts
- No core changes needed: evaluation pipeline is provider-agnostic
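To make the config-schema item concrete, here is a dependency-free sketch of the validation such a schema would perform on the target configuration. It is a stand-in written in plain TypeScript, not the Zod schema itself: `GitHubActionsTarget`, `parseDuration`, and `parseTarget` are hypothetical names, the fallback values are assumptions taken from the example config, and the `results_artifact: true` alternative is left out for brevity.

```typescript
interface GitHubActionsTarget {
  name: string;
  provider: "github-actions";
  repo: string;
  workflow: string;
  resultsBranch: string;
  pollIntervalMs: number;
  timeoutMs: number;
}

// Convert "10s" / "10m" style durations to milliseconds.
function parseDuration(text: string): number {
  const match = /^(\d+)(ms|s|m|h)$/.exec(text.trim());
  if (match === null) throw new Error(`invalid duration: ${text}`);
  const unitMs: Record<string, number> = { ms: 1, s: 1_000, m: 60_000, h: 3_600_000 };
  return Number(match[1]) * unitMs[match[2]];
}

// Validate a raw target entry (as loaded from YAML) and normalize its fields.
function parseTarget(raw: Record<string, unknown>): GitHubActionsTarget {
  const str = (key: string, fallback?: string): string => {
    const value = raw[key] ?? fallback;
    if (typeof value !== "string" || value === "") {
      throw new Error(`${key} must be a non-empty string`);
    }
    return value;
  };
  if (raw["provider"] !== "github-actions") {
    throw new Error("provider must be github-actions");
  }
  const repo = str("repo");
  if (!/^[^/\s]+\/[^/\s]+$/.test(repo)) {
    throw new Error("repo must look like owner/name");
  }
  return {
    name: str("name"),
    provider: "github-actions",
    repo,
    workflow: str("workflow"),
    resultsBranch: str("results_branch", "eval-results"), // assumed default
    pollIntervalMs: parseDuration(str("poll_interval", "10s")), // assumed default
    timeoutMs: parseDuration(str("timeout", "10m")), // assumed default
  };
}
```

Normalizing durations to milliseconds at parse time keeps the provider code free of unit handling.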
Tradeoffs
Pros:
- API keys never leave GitHub Actions secrets
- Natural fit for team/org key management
- Audit trail via workflow logs + git commits
- Can use larger runners for heavy batches
Cons:
- ~15-30s workflow spin-up latency (acceptable for batch evals)
- Two-repo coordination (agentv config + runner repo)
- GitHub Actions minutes cost
- Large batch payloads need artifact workaround
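The spin-up latency listed under Cons mostly shows up in how the provider waits for the run. A rough sketch of a polling loop driven by `poll_interval` and `timeout` follows. Everything here is illustrative: `waitForRun`, `PollOptions`, and the three-value `RunStatus` are hypothetical simplifications, and `checkStatus` stands in for whatever call reads the run state (for instance a wrapper around `gh run view <run-id> --json status`).

```typescript
// Simplified status set; real workflow runs can report additional states.
type RunStatus = "queued" | "in_progress" | "completed";

interface PollOptions {
  pollIntervalMs: number;
  timeoutMs: number;
  // Injectable so the loop can be exercised without real timers.
  sleep?: (ms: number) => Promise<void>;
  now?: () => number;
}

// Poll until the run completes; resolves with the number of status checks made.
async function waitForRun(
  checkStatus: () => Promise<RunStatus>,
  opts: PollOptions,
): Promise<number> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const now = opts.now ?? (() => Date.now());
  const deadline = now() + opts.timeoutMs;
  let polls = 0;
  for (;;) {
    polls += 1;
    if ((await checkStatus()) === "completed") return polls;
    // Give up if the next poll would land after the deadline.
    if (now() + opts.pollIntervalMs > deadline) {
      throw new Error(`workflow run did not complete within ${opts.timeoutMs}ms`);
    }
    await sleep(opts.pollIntervalMs);
  }
}
```

With a 10s interval, the 15-30s spin-up costs only a few extra status checks per batch, which is why the overhead is tolerable when amortized over many prompts.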
Acceptance signals
- github-actions provider kind registered and functional
- Batch dispatch + result collection working end-to-end
- Reusable workflow template published for runner side
- Config schema with validation in targets.ts
- Works with gh CLI auth (no extra token setup)
Non-goals
- Real-time streaming of individual prompt responses
- Supporting self-hosted runners (can be added later)
- Replacing local providers — this is an additional option
Research
Full proposal: agentevals-research/research/proposals/github-actions-provider.md