# Epic: Testing Infrastructure & Strategy Overhaul

Agreed outcomes from [Discussion #711](https://github.com/generative-computing/mellea/discussions/711) and the 2026-03-23 planning call with @ajbozarth, @planetf1, @jakelorocco, and @avinash2692. cc @psschwei for further planning, @avinash2692 regarding Bluevela nightlies.

## Key Decisions

**Two-dimensional marker taxonomy:** granularity (`unit`, `integration`, `e2e`, `qualitative`) × backend (`ollama`, `huggingface`, `vllm`, `openai`, `watsonx`, `litellm`, etc.), plus resource markers (`requires_gpu`, `requires_heavy_ram`, `requires_gpu_isolation`).
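
As a rough illustration of how this taxonomy could be registered, here is a minimal `conftest.py` sketch. The marker names are taken from the list above; the descriptions and the helper `marker_lines()` are assumptions for illustration, not the agreed implementation. `pytest_configure` and `config.addinivalue_line("markers", ...)` are standard pytest hooks.

```python
# conftest.py (sketch): register both marker dimensions plus resource markers
# so that `pytest --strict-markers` rejects misspelled marker names.
# Marker names come from the epic; every description is assumed wording.

GRANULARITY = {
    "unit": "isolated logic, no model backend (assumed description)",
    "integration": "components wired together (assumed description)",
    "e2e": "full path against a real backend (assumed description)",
    "qualitative": "judges model output quality (assumed description)",
}

# The epic lists these and ends with "etc.", so this tuple is not exhaustive.
BACKENDS = ("ollama", "huggingface", "vllm", "openai", "watsonx", "litellm")

RESOURCES = {
    "requires_gpu": "needs a GPU (assumed description)",
    "requires_heavy_ram": "needs a large amount of RAM (assumed description)",
    "requires_gpu_isolation": "must not share the GPU (assumed description)",
}


def marker_lines() -> list[str]:
    """Build the 'name: description' lines pytest expects (hypothetical helper)."""
    lines = [f"{name}: {desc}" for name, desc in GRANULARITY.items()]
    lines += [f"{name}: test targets the {name} backend" for name in BACKENDS]
    lines += [f"{name}: {desc}" for name, desc in RESOURCES.items()]
    return lines


def pytest_configure(config):
    # Standard pytest hook; registers each marker on the session config.
    for line in marker_lines():
        config.addinivalue_line("markers", line)
```

A test would then carry one marker from each dimension it belongs to, for example `e2e` together with `ollama`, and optionally a resource marker.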

| Tier | Trigger | Budget | What runs |
| ---- | ------- | ------ | --------- |
| **Pre-commit** | Every commit | <60 s | Lint + type checking only |
| **Local dev** | Ad-hoc | <5 min | All tests matching available backends/resources |
| **PR CI** | Every push | <15 min | Unit + integration + Ollama e2e |
| **Nightly CI** | Scheduled | ~60 min | Every test, no exceptions (Bluevela, full GPU) |
| **Pre-release** | Manual | ~90 min | Manual trigger of nightly suite |
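
One way to read the tier table is as a mapping from tier to a pytest `-m` marker expression. The sketch below is an assumption about how that mapping might look: the function `marker_expression`, the tier keys, and the choice to express "available backends" as a deselection filter are all hypothetical, and resource markers are left out for brevity.

```python
# Sketch: translate a tier from the table into a `pytest -m` expression.
# Tier keys and this function are hypothetical; marker names come from the epic.

BACKENDS = ("ollama", "huggingface", "vllm", "openai", "watsonx", "litellm")


def marker_expression(tier, available_backends=()):
    """Return a `-m` expression, "" for no filter, or None if no tests run."""
    if tier == "pre-commit":
        # Lint + type checking only: no test selection at all.
        return None
    if tier == "pr-ci":
        # Unit + integration + Ollama e2e.
        return "unit or integration or (e2e and ollama)"
    if tier in ("nightly", "pre-release"):
        # Every test, no exceptions: no marker filter.
        return ""
    if tier == "local-dev":
        # Deselect tests whose backend is not available on this machine.
        missing = [b for b in BACKENDS if b not in available_backends]
        return " and ".join(f"not {b}" for b in missing)
    raise ValueError(f"unknown tier: {tier!r}")


if __name__ == "__main__":
    print(marker_expression("pr-ci"))
    print(marker_expression("local-dev", available_backends=["ollama"]))
```

The resulting string would be passed as `pytest -m "<expression>"`; an empty string means pytest runs without a marker filter.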

**Principles:** split e2e into integration + e2e pairs (don't just downgrade); parametrise across backends; fix root causes over workarounds; catalog minimal default models with overrides; scope covers both tests and examples; docs updated with every change.
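
To make the "catalog minimal default models with overrides" principle concrete, here is a small sketch of a per-backend catalog with an environment-variable override. Everything specific in it is an assumption: the model names are placeholders rather than real defaults, and the `TEST_MODEL_<BACKEND>` variable naming and the `resolve_model` helper are hypothetical.

```python
import os

# Placeholder catalog: these are NOT real model names, just stand-ins for
# "the smallest model that works for this backend".
DEFAULT_MODELS = {
    "ollama": "placeholder-small-ollama-model",
    "huggingface": "placeholder-small-hf-model",
}


def resolve_model(backend, env=None):
    """Return the override for `backend` if set, else the catalog default.

    The TEST_MODEL_<BACKEND> variable name is a hypothetical convention.
    """
    env = os.environ if env is None else env
    override = env.get(f"TEST_MODEL_{backend.upper()}")
    if override:
        return override
    return DEFAULT_MODELS[backend]


if __name__ == "__main__":
    print(resolve_model("ollama", env={}))
    print(resolve_model("ollama", env={"TEST_MODEL_OLLAMA": "my-local-model"}))
```

A catalog like this could also feed backend parametrisation, for instance by generating one `pytest.param` per catalog entry with the matching backend marker attached.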

## Work Items

| # | Issue | Summary |
| - | ----- | ------- |
| 1a | #727 | Granularity marker taxonomy and tiered timeouts |
| 1b | #728 | Backend & resource marker audit (children: #622, #539, #629, #634) |
| 2a | #729 | Split e2e tests into integration + e2e pairs |
| 2b | #730 | Parametrise and consolidate backend-specific tests |
| 3a | #731 | Environment diagnostic, pre-flight checks & reporting (children: #574, #349) |
| 3b | #732 | Model consolidation and flexibility (children: #359) |
| 4 | #733 | CI parallelisation and dynamic test selection (see also #451) |
| 5 | #734 | On-demand nightly test runs for PRs |
| 6 | #735 | Semantic assertions & recording for qualitative tests (children: #692) |
| 7 | #736 | Backend resource cleanup post-PR #721 |
| 8 | #737 | Test results & coverage reporting |
| 9 | #738 | Notebook testing (children: #89) |
| 10 | #739 | Pre-commit & type checking (children: #456) |

## Related Issues

**Expected to close with PR #721** (`cleanup_gpu_backend()`): #630, #625, #620, #699. Residual cleanup tracked in #736.

**Flaky tests — addressed by #735** (semantic assertions): #398, #384, #628, #684, #121.

**Not in scope:** #691, #496, #347, #267 — remain standalone.


Epic: Testing Infrastructure & Strategy Overhaul #726
