0.2.1b7 #14

Merged
a11smiles merged 12 commits into main from build-deploy
Apr 9, 2026

a11smiles (Collaborator) commented Apr 9, 2026

Build QA resilience and benchmark tooling

  • Fixed QA remediation re-entry bug where failed stages were skipped instead of retried on re-run
  • Added full stage retry: when QA remediation exhausts all attempts, the stage is cleaned and regenerated from scratch with prior QA findings injected (controlled by _MAX_FULL_STAGE_ATTEMPTS)
  • Hardened QA checklist: response_export_values mandatory on every resource, terraform output -state= flag check, UUID hex validation
  • Front-loaded cross-stage dependency no-dead-code directive in generation prompts
  • Added agent-level service filtering (stage_services on AgentContext) to reduce false positive anti-pattern warnings
  • Eliminated ReDoS risk in transform handlers by extracting shared _find_azapi_blocks() brace-counting helper
  • Updated PRU multiplier table from GitHub Copilot docs (6 new models including claude-opus-4.6-fast at 30 PRU)
  • Added version bump checklist to CLAUDE.md requiring PRU refresh
  • Benchmark run: 19 stages scored against 14 benchmarks, updated extract.py for full retry support, generated individual/overall HTML reports and PDF
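
The full stage retry described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `_MAX_FULL_STAGE_ATTEMPTS` and its value of 2 come from the commit message, while `_MAX_REMEDIATION_ATTEMPTS`, `QAResult`, `build_stage_with_retry`, and the four callables are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Name and value taken from this PR's commit message.
_MAX_FULL_STAGE_ATTEMPTS = 2
# Hypothetical: the remediation limit per attempt is not stated in the PR.
_MAX_REMEDIATION_ATTEMPTS = 3


@dataclass
class QAResult:
    """Hypothetical container for one QA pass."""
    passed: bool
    findings: List[str] = field(default_factory=list)


def build_stage_with_retry(
    generate: Callable[[List[str]], str],
    run_qa: Callable[[str], QAResult],
    remediate: Callable[[str, List[str]], str],
    clean: Callable[[], None],
) -> bool:
    """Return True once QA passes, False when every full attempt is used up."""
    prior_findings: List[str] = []
    for _ in range(_MAX_FULL_STAGE_ATTEMPTS):
        clean()  # drop artifacts so the stage is regenerated from scratch
        artifact = generate(prior_findings)  # inject earlier QA findings
        for attempt in range(_MAX_REMEDIATION_ATTEMPTS + 1):
            result = run_qa(artifact)
            if result.passed:
                return True
            prior_findings = result.findings
            if attempt < _MAX_REMEDIATION_ATTEMPTS:
                artifact = remediate(artifact, result.findings)
    return False
```

The outer loop bounds the number of clean regenerations, and the inner loop bounds remediation within one attempt, so the total work per stage stays finite even when QA never passes.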

a11smiles added 12 commits April 6, 2026 18:04
17 new tests in tests/stages/test_build_session_reentry.py covering:
- Validating stage with files → QA re-run
- Validating without layer field → QA re-run (Stage 16 bug)
- Validating QA pass → status advances
- Validating QA fail → build stops
- Validating QA pass → cascade downstream to pending
- App-layer validating → QA re-run
- Generating re-entry → artifact cleanup
- BuildState cascade_downstream_pending
- BuildState status transitions (mark_generating/validating/generated)
- BuildState get_pending/validating/generated queries

Bug fix: Branch C (no design changes) now checks for validating stages
in addition to pending. Previously, a restart with only validating
stages would say 'up to date' and skip generation entirely.
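
A minimal sketch of the Branch C check described in this fix, assuming stage status is tracked per stage number. The status names mirror those in the test list above (pending, generating, validating, generated); `StageStatus`, `stages_needing_work`, and `is_up_to_date` are hypothetical names, not the identifiers in build_session.py.

```python
from enum import Enum
from typing import Dict, List


class StageStatus(Enum):
    PENDING = "pending"
    GENERATING = "generating"
    VALIDATING = "validating"
    GENERATED = "generated"


def stages_needing_work(stages: Dict[int, StageStatus]) -> List[int]:
    """Stages a restart must still process when the design is unchanged."""
    # Before the fix, only PENDING would have been considered here.
    unfinished = {StageStatus.PENDING, StageStatus.VALIDATING}
    return sorted(n for n, status in stages.items() if status in unfinished)


def is_up_to_date(stages: Dict[int, StageStatus]) -> bool:
    """True only when no stage is pending or still awaiting QA."""
    return not stages_needing_work(stages)
```

With this shape, a build whose only unfinished stage is in the validating state is no longer reported as up to date.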
… push

Fix: renamed test_build_session_reentry.py → test_build_session.py
to match source file structure (one test file per source file).

tests/stages/test_policy_resolver.py (32 tests):
- Auto-accept, interactive accept/override/regenerate, mixed resolutions,
  fix instruction building, rule ID extraction

tests/stages/test_escalation.py (38 tests):
- 4-level escalation chain, auto-escalation timeout, blocker management,
  state persistence, report formatting

tests/stages/test_qa_router.py (24 tests):
- QA routing with early returns, diagnosis, token tracking, knowledge
  contribution, blocker recording, error text handling

tests/stages/test_backlog_push.py (56 tests):
- GitHub/DevOps push, auth checks, body formatting, parent linking,
  label handling, error paths
…/deploy stages, state persistence

tests/stages/test_deploy_helpers.py (55): CLI path resolution, deploy
  env, TF secret scanning, secret resolution, login, deployment context
tests/stages/test_knowledge_contributor.py (32): namespace resolution,
  gap detection, submission with label retry, auth checks
tests/stages/test_build_stage.py (17): guard validation, state
  transitions, reset, dry-run routing, template matching
tests/stages/test_deploy_stage.py (16): routing logic, state transitions
tests/stages/test_deploy_state.py (53): build state sync, orphan
  handling, legacy fallback, rollback ordering, audit logging
tests/stages/test_discovery_state.py (60): legacy migration, exchange
  updates, image stripping, item CRUD, context hash
tests/stages/test_backlog_state.py (31): item management, push status,
  context hash, conversation tracking

Total: 3448 tests passing across all tiers.

Coverage improvements:
- backlog_session.py: 63% → 99%
- design_stage.py: 67% → 89%
- discovery.py: 64% → 86%
- deploy_session.py: 77% → 87%
- build_session.py: 78% → 77% (largest file, needs more)

New test files:
- tests/stages/test_deploy_session.py (32 tests)
- tests/stages/test_design_stage.py (27 tests)
- tests/stages/test_discovery.py (36 tests)
- tests/stages/test_backlog_session.py (45 tests)
- tests/stages/test_build_session.py (+75 appended = 92 total)

TDD memory updated: tests satisfy business rules, not code.
Covers: policy regen path, review loop, fallback deployment plan,
PE stage injection, diff architectures, plan adjustment, transforms
debug logging, QA remediation writeback, stage advisory, execute
with retry/continuation, slash commands, DNS zone notes, deployment
plan derivation, design change branches, output key extraction,
affected stage identification, file content collection.

3787 tests passing. All 5 target files now above 85%.

Move 48 test files from tests/test_*.py to subdirectories that mirror
the source tree (tests/agents/, tests/ai/, tests/governance/, etc.).
Merge tests into existing mirrored files where both existed. Six root
files remain for root-level source modules (custom, telemetry, tracking,
debug_log, requirements). 3644 tests passing.
…cates

Migrated flat test files to 1:1 test-to-source directory structure,
merged split test files, and removed ~114 duplicate tests across 10 files.
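
The 1:1 test-to-source layout can be expressed as a simple path mapping. The sketch below assumes a package root of `src/pkg`, which is a placeholder since the real root is not named in this PR; `mirrored_test_path` is likewise a hypothetical helper.

```python
from pathlib import PurePosixPath


def mirrored_test_path(source_path: str, package_root: str = "src/pkg") -> str:
    """Map a source module to its mirrored test file under tests/."""
    # package_root is a placeholder; the project's real root is not given.
    rel = PurePosixPath(source_path).relative_to(package_root)
    return str(PurePosixPath("tests") / rel.parent / f"test_{rel.name}")
```

For example, `mirrored_test_path("src/pkg/stages/build_session.py")` gives `tests/stages/test_build_session.py`, and a root-level module such as `src/pkg/telemetry.py` maps to `tests/test_telemetry.py`, which matches the root files kept for root-level source modules.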
…RU update

- Fix QA remediation re-entry bug: mark_stage_generated -> mark_stage_validating
  in remediation loop so failed stages are retried on re-run instead of skipped
- Add full stage retry (_MAX_FULL_STAGE_ATTEMPTS=2): when QA remediation exhausts
  all attempts, clean artifacts and regenerate from scratch with prior QA findings
  injected into the generation prompt
- Harden QA checklist: response_export_values mandatory on every azapi_resource,
  deploy.sh -state= flag check, UUID hex validation
- Front-load cross-stage dependency no-dead-code directive before architecture context
- Backfill PRU multiplier table from GitHub Copilot docs (raptor-mini, gemini-2.5-pro,
  gpt-5.2-codex, gpt-5.3-codex, claude-opus-4.6-fast at 30 PRU)
- 14 new tests (3 re-entry, 5 checklist/prompt, 6 full stage retry)

- Add stage_services field to AgentContext, populated by _agent_build_context()
  and passed through _apply_governance_check() to reduce false positive
  anti-pattern warnings for irrelevant service namespaces
- Extract _find_azapi_blocks() shared brace-counting helper and rewrite
  _add_response_export_values, _add_resource_group_parent_id, and
  _remove_private_endpoint_resources to eliminate nested-quantifier regex
- 5 new tests (2 service filtering, 3 brace counting safety)

- Updated extract.py to handle full stage retry (uses last task prompt per stage)
- Simplified INSTRUCTIONS.md extraction section to reference extract.py directly
- Individual run report: benchmarks/2026-04-08-14-40-57.html (19 stages, 14 benchmarks)
- Updated overall.html trends dashboard with run #2 data
- Generated PDF report with 29 charts (overall + 14 factor + 14 trend)
- Updated generate_pdf.py data section with new scores and two-point trend history
- Removed stale test run 2026-03-31-11-16-46.html
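
The brace-counting approach mentioned for _find_azapi_blocks() could look roughly like the sketch below. It is not the helper from this PR: `find_azapi_blocks` and `_HEADER` are stand-in names, the header pattern is an assumption about how the blocks are declared, and the scan does not special-case braces inside strings or comments.

```python
import re
from typing import List, Tuple

# Assumed block header. The pattern has no nested quantifiers, so it
# avoids the catastrophic backtracking that causes ReDoS.
_HEADER = re.compile(r'resource\s+"azapi_resource"\s+"[^"\n]*"\s*\{')


def find_azapi_blocks(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of each azapi_resource block, end exclusive."""
    blocks: List[Tuple[int, int]] = []
    pos = 0
    while True:
        match = _HEADER.search(text, pos)
        if match is None:
            break
        depth = 1  # the header match already consumed the opening brace
        i = match.end()
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth != 0:
            break  # unbalanced input: stop rather than guess at a boundary
        blocks.append((match.start(), i))
        pos = i
    return blocks
```

Each character is visited at most once per block, so the scan is linear in the input size, and a transform can slice `text[start:end]` to edit a single block without a regex spanning nested braces.
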
mock_run.return_value = self._mock_success(wi_id=10)
result = push_devops_feature("myorg", "myproj", {"title": "Infra Setup"})
assert result["id"] == 10
assert "dev.azure.com" in result["url"]

assert "az acr build" in script
assert "az containerapp update" in script
assert "myregistry.azurecr.io" in script
}
result = deploy_state.format_outputs()
assert "endpoint" in result
assert "https://app.com" in result
a11smiles merged commit afb8821 into main on Apr 9, 2026
20 of 21 checks passed
2 participants