feat: GitHub Actions provider for remote eval execution #860

## Objective

Add a `github-actions` provider kind that treats GitHub Actions as a remote compute backend for LLM invocations. API keys stay in GitHub Actions secrets; AgentV only orchestrates dispatch and result collection.

## Motivation

- Teams manage API keys centrally in GitHub secrets, not on developer machines
- CI/CD-driven eval workflows shouldn't require local API keys
- Natural audit trail via workflow run logs and git history

## Proposed Design

### Batch-oriented architecture

Rather than one workflow per prompt (~15-30s spin-up overhead each), the provider batches all eval prompts into a **single workflow run**:

1. AgentV serializes prompts into a JSON artifact
2. Triggers `workflow_dispatch` on the runner repo
3. The workflow runs all LLM calls in parallel, reading API keys from repo secrets
4. Results are pushed to a results branch (e.g. `eval-results/<batch-id>`)
5. AgentV pulls the results branch to collect JSONL output
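
Steps 1 and 5 are the two ends of the round trip. Because the runner executes calls in parallel, JSONL lines may come back in any order, so the sketch below assumes each prompt is tagged with a stable `id` and results are matched by that key. The `BatchPrompt` and `BatchResult` shapes and the `serializeBatch` / `parseResults` functions are hypothetical; the real wire format is not specified in this proposal.

```typescript
// Hypothetical wire format for one prompt in the batch payload.
interface BatchPrompt {
  id: string; // stable key used to match results back to requests
  prompt: string;
}

// Hypothetical shape of one JSONL line written by the runner.
interface BatchResult {
  id: string;
  output?: string;
  error?: string;
}

// Step 1: serialize all prompts into a single JSON document for the runner.
function serializeBatch(batchId: string, prompts: string[]): string {
  const items: BatchPrompt[] = prompts.map((prompt, i) => ({ id: `${batchId}-${i}`, prompt }));
  return JSON.stringify({ batchId, prompts: items });
}

// Step 5: parse JSONL output and return results in the original request order.
function parseResults(batchId: string, count: number, jsonl: string): BatchResult[] {
  const byId = new Map<string, BatchResult>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank or trailing lines
    const result = JSON.parse(line) as BatchResult;
    byId.set(result.id, result);
  }
  return Array.from({ length: count }, (_, i) => {
    const id = `${batchId}-${i}`;
    // A missing line becomes an explicit error instead of shifting later results.
    return byId.get(id) ?? { id, error: "no result returned by runner" };
  });
}
```

Matching by ID rather than by line position keeps a single dropped or reordered line from silently misattributing every later response.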

### Target configuration

```yaml
targets:
  - name: cloud-eval-runner
    provider: github-actions
    repo: myorg/eval-runner
    workflow: run-evals.yml
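    results_branch: eval-results   # prefix; each batch lands on eval-results/<batch-id>
    # results_artifact: true       # alternative: collect results from a workflow artifact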
    poll_interval: 10s
    timeout: 10m
```
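
`poll_interval` and `timeout` are human-readable durations, so the provider has to parse and validate them before dispatching anything. The sketch below is a hand-rolled stand-in for the Zod schema that would live in `targets.ts`; it assumes durations use a single `s`/`m`/`h` suffix and that exactly one of `results_branch` or `results_artifact` is set. `GitHubActionsTarget`, `parseDuration`, and `parseGitHubActionsTarget` are hypothetical names.

```typescript
interface GitHubActionsTarget {
  name: string;
  provider: "github-actions";
  repo: string; // "owner/name"
  workflow: string; // workflow file name, e.g. run-evals.yml
  resultsBranch?: string; // assumed mutually exclusive with resultsArtifact
  resultsArtifact?: boolean;
  pollIntervalMs: number;
  timeoutMs: number;
}

const UNIT_MS: Record<string, number> = { s: 1_000, m: 60_000, h: 3_600_000 };

// Parse "10s" / "10m" / "1h" into milliseconds.
function parseDuration(value: string): number {
  const match = /^(\d+)([smh])$/.exec(value.trim());
  if (!match) throw new Error(`invalid duration: ${value}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

function parseGitHubActionsTarget(raw: Record<string, unknown>): GitHubActionsTarget {
  const str = (key: string): string => {
    const v = raw[key];
    if (typeof v !== "string" || v === "") throw new Error(`${key} must be a non-empty string`);
    return v;
  };
  if (raw.provider !== "github-actions") throw new Error("provider must be github-actions");
  const repo = str("repo");
  if (!/^[\w.-]+\/[\w.-]+$/.test(repo)) throw new Error(`repo must be owner/name: ${repo}`);
  // Assumption: exactly one result-delivery mode must be chosen.
  const hasBranch = typeof raw.results_branch === "string";
  const hasArtifact = raw.results_artifact === true;
  if (hasBranch === hasArtifact) throw new Error("set exactly one of results_branch or results_artifact");
  const pollIntervalMs = parseDuration(str("poll_interval"));
  const timeoutMs = parseDuration(str("timeout"));
  if (pollIntervalMs >= timeoutMs) throw new Error("poll_interval must be shorter than timeout");
  return {
    name: str("name"),
    provider: "github-actions",
    repo,
    workflow: str("workflow"),
    resultsBranch: hasBranch ? (raw.results_branch as string) : undefined,
    resultsArtifact: hasArtifact || undefined,
    pollIntervalMs,
    timeoutMs,
  };
}
```

Rejecting a bad duration or an ambiguous delivery mode at config load time is cheaper than discovering it after a workflow run has already consumed Actions minutes.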

### Provider implementation

New provider class implementing the existing `Provider` interface:

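```typescript
class GitHubActionsProvider implements Provider {
  readonly supportsBatch = true;

  // Single invocations go through the batch path (one workflow run per call).
  async invoke(request: ProviderRequest): Promise<ProviderResponse> {
    const [response] = await this.invokeBatch([request]);
    return response;
  }

  async invokeBatch(requests: ProviderRequest[]): Promise<ProviderResponse[]> {
    // 1. Serialize requests to JSON (tagged with stable IDs) and upload as an artifact
    // 2. gh workflow run <workflow> --repo <repo> -f batch_id=<id> -f prompts_artifact=<ref>
    // 3. Locate the run (gh run list) and wait for completion (gh run watch / gh run view)
    // 4. Download results (git fetch eval-results/<id>, or artifact download)
    // 5. Parse JSONL into ProviderResponse[], in the same order as `requests`
    throw new Error("GitHubActionsProvider.invokeBatch: not yet implemented");
  }
}
```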
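
Step 3 needs a bounded wait so a stuck or queued run cannot hang the eval indefinitely. Below is a minimal sketch of a poll loop driven by the target's `poll_interval` and `timeout`. The `checkStatus` callback stands in for however the provider queries the run; in practice it could shell out to `gh run view <run-id> --repo <repo> --json status,conclusion`. `RunStatus` and `pollUntilComplete` are hypothetical names.

```typescript
// Subset of the fields returned by `gh run view --json status,conclusion`.
interface RunStatus {
  status: string; // e.g. "queued", "in_progress", "completed"
  conclusion: string | null; // e.g. "success", "failure"; empty until completed
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the run completes, failing fast on a non-success conclusion
// and giving up once the configured timeout would be exceeded.
async function pollUntilComplete(
  checkStatus: () => Promise<RunStatus>,
  pollIntervalMs: number,
  timeoutMs: number,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const { status, conclusion } = await checkStatus();
    if (status === "completed") {
      if (conclusion !== "success") throw new Error(`workflow run finished with conclusion: ${conclusion}`);
      return;
    }
    if (Date.now() + pollIntervalMs > deadline) throw new Error(`timed out after ${timeoutMs}ms`);
    await sleep(pollIntervalMs);
  }
}
```

Injecting `checkStatus` keeps the loop testable without network access and lets the same code serve either a `gh` subprocess or a direct REST client.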

### Runner-side workflow template

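```yaml
# .github/workflows/run-evals.yml
name: Run Eval Batch
on:
  workflow_dispatch:
    inputs:
      batch_id:
        description: Unique ID for this eval batch
        required: true
        type: string
      prompts_artifact:
        description: Reference to the serialized prompt batch
        required: true
        type: string

permissions:
  contents: write   # required to push the results branch

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run LLM batch
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          # Pass inputs through env, not inline ${{ }} in `run`, to avoid shell injection
          BATCH_ID: ${{ inputs.batch_id }}
          PROMPTS_ARTIFACT: ${{ inputs.prompts_artifact }}   # read by run-batch.js
        run: node run-batch.js "$BATCH_ID"
      - name: Push results
        env:
          BATCH_ID: ${{ inputs.batch_id }}
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git checkout -B "eval-results/$BATCH_ID"
          git add results/
          git commit -m "Eval results for batch $BATCH_ID"
          git push origin "eval-results/$BATCH_ID"
```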
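
The runner issues all LLM calls in parallel, but firing every request at once can trip provider rate limits on large batches. A minimal sketch of a bounded worker pool that a runner script could use follows (the template invokes `run-batch.js`, which might be compiled from TypeScript like this). `mapWithConcurrency` and the `task` callback are hypothetical; the real model call would go inside `task`.

```typescript
type Outcome<R> = { ok: true; value: R } | { ok: false; error: string };

// Run `task` over every item with at most `limit` calls in flight, returning
// results in input order. A failed call is captured as an error entry so one
// bad prompt does not abort the whole batch.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<Outcome<R>[]> {
  const results: Outcome<R>[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item; safe because JS runs one callback at a time

  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await task(items[index]) };
      } catch (err) {
        results[index] = { ok: false, error: err instanceof Error ? err.message : String(err) };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Recording per-item failures instead of rejecting the whole batch matches the collection side: each JSONL line can carry either an output or an error.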

## Design Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Result delivery | Git push to branch | Simpler consumption, natural audit trail, no artifact expiry |
| Batch granularity | All prompts in one workflow | Avoids per-prompt spin-up overhead |
| Input delivery | Artifact reference for large batches | `workflow_dispatch` inputs have size limits |
| Auth | `gh` CLI (user's GitHub token) | Simple for local use; a GitHub App token covers CI environments |
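
The input-delivery row implies a size check at dispatch time. A minimal sketch follows. The 65,535-character figure is the limit GitHub documents for the total `workflow_dispatch` inputs payload at the time of writing; treat it as an assumption to re-verify. The runner template above only declares `batch_id` and `prompts_artifact`, so the inline path here additionally assumes a hypothetical optional `prompts` input. `needsArtifact`, `buildDispatchArgs`, `PromptDelivery`, and the safety margin are hypothetical; the `gh workflow run --repo ... -f key=value` form is the real CLI syntax.

```typescript
// Assumed limit on the combined size of workflow_dispatch inputs (verify against GitHub docs).
const DISPATCH_INPUT_LIMIT = 65_535;
// Hypothetical headroom for batch_id and the input names themselves.
const SAFETY_MARGIN = 1_024;

type PromptDelivery = { kind: "inline"; json: string } | { kind: "artifact"; ref: string };

// String.length counts UTF-16 code units, an approximation of GitHub's character count.
function needsArtifact(json: string): boolean {
  return json.length > DISPATCH_INPUT_LIMIT - SAFETY_MARGIN;
}

// argv for `gh workflow run`, meant to be spawned without a shell so JSON quoting is safe.
function buildDispatchArgs(repo: string, workflow: string, batchId: string, delivery: PromptDelivery): string[] {
  const args = ["workflow", "run", workflow, "--repo", repo, "-f", `batch_id=${batchId}`];
  if (delivery.kind === "inline") {
    args.push("-f", `prompts=${delivery.json}`); // assumes an optional `prompts` input on the runner
  } else {
    args.push("-f", `prompts_artifact=${delivery.ref}`);
  }
  return args;
}
```

Spawning `gh` with an argv array rather than a shell string avoids re-quoting JSON that contains quotes or spaces.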

## Existing architecture support

- **Provider interface** — just implement `invoke()` / `invokeBatch()` and register
- **Provider registry** — factory pattern, add to `createBuiltinProviderRegistry()`
- **Batch support** — `supportsBatch` + `invokeBatch()` already in the type system
- **Config schemas** — add Zod schema to `targets.ts`
- **No core changes needed** — evaluation pipeline is provider-agnostic
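
The registration bullet relies on a factory pattern: a provider kind string maps to a constructor, so the evaluation pipeline never branches on `github-actions` itself. The sketch below is self-contained and every type in it is a hypothetical simplification; the `Provider`, `ProviderRequest`, and `ProviderResponse` shapes, the `ProviderRegistry` class, and `createRegistry` stand in for AgentV's real `createBuiltinProviderRegistry()`, whose signature may differ.

```typescript
// Simplified stand-ins for AgentV's types (hypothetical shapes).
interface ProviderRequest { prompt: string }
interface ProviderResponse { output: string }
interface Provider {
  readonly supportsBatch: boolean;
  invoke(request: ProviderRequest): Promise<ProviderResponse>;
}

type ProviderFactory = (config: Record<string, unknown>) => Provider;

class ProviderRegistry {
  private readonly factories = new Map<string, ProviderFactory>();

  register(kind: string, factory: ProviderFactory): this {
    if (this.factories.has(kind)) throw new Error(`provider kind already registered: ${kind}`);
    this.factories.set(kind, factory);
    return this;
  }

  create(kind: string, config: Record<string, unknown>): Provider {
    const factory = this.factories.get(kind);
    if (!factory) throw new Error(`unknown provider kind: ${kind}`);
    return factory(config);
  }
}

// Placeholder provider so the sketch runs; the real one would dispatch to GitHub Actions.
class StubGitHubActionsProvider implements Provider {
  readonly supportsBatch = true;
  private readonly repo: string;
  constructor(repo: string) {
    this.repo = repo;
  }
  async invoke(request: ProviderRequest): Promise<ProviderResponse> {
    return { output: `would dispatch "${request.prompt}" to ${this.repo}` };
  }
}

// Registering the new kind is one call; callers only ever see `Provider`.
function createRegistry(): ProviderRegistry {
  return new ProviderRegistry().register(
    "github-actions",
    (config) => new StubGitHubActionsProvider(String(config.repo)),
  );
}
```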

## Tradeoffs

**Pros:**
- API keys never leave GitHub Actions secrets
- Natural fit for team/org key management
- Audit trail via workflow logs + git commits
- Can use larger runners for heavy batches

**Cons:**
- ~15-30s workflow spin-up latency (acceptable for batch evals)
- Two-repo coordination (agentv config + runner repo)
- GitHub Actions minutes cost
- Large batch payloads need artifact workaround

## Acceptance signals

- [ ] `github-actions` provider kind registered and functional
- [ ] Batch dispatch + result collection working end-to-end
- [ ] Reusable workflow template published for runner side
- [ ] Config schema with validation in `targets.ts`
- [ ] Works with `gh` CLI auth (no extra token setup)

## Non-goals

- Real-time streaming of individual prompt responses
- Supporting self-hosted runners (can be added later)
- Replacing local providers — this is an additional option

## Research

Full proposal: [agentevals-research/research/proposals/github-actions-provider.md](https://github.com/agentevals/agentevals-research/blob/main/research/proposals/github-actions-provider.md)
