0.2.1b7 by a11smiles · Pull Request #14 · Azure/az-prototype

a11smiles · 2026-04-09T14:10:07Z

Build QA resilience and benchmark tooling

Fixed QA remediation re-entry bug where failed stages were skipped instead of retried on re-run
Added full stage retry: when QA remediation exhausts all attempts, the stage is cleaned and regenerated from scratch with prior QA findings injected (controlled by _MAX_FULL_STAGE_ATTEMPTS)
Hardened QA checklist: response_export_values mandatory on every resource, terraform output -state= flag check, UUID hex validation
Front-loaded cross-stage dependency no-dead-code directive in generation prompts
Added agent-level service filtering (stage_services on AgentContext) to reduce false positive anti-pattern warnings
Eliminated ReDoS risk in transform handlers by extracting shared _find_azapi_blocks() brace-counting helper
Updated PRU multiplier table from GitHub Copilot docs (6 new models including claude-opus-4.6-fast at 30 PRU)
Added version bump checklist to CLAUDE.md requiring PRU refresh
Benchmark run: 19 stages scored against 14 benchmarks, updated extract.py for full retry support, generated individual/overall HTML reports and PDF
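
The full stage retry described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `generate`, `run_qa`, `remediate`, `clean_artifacts`, and `_MAX_REMEDIATION_ATTEMPTS` are hypothetical names, and only `_MAX_FULL_STAGE_ATTEMPTS` (set to 2 in the commit notes) comes from this PR.

```python
from typing import Callable, List

_MAX_FULL_STAGE_ATTEMPTS = 2   # value stated in the PR commit message
_MAX_REMEDIATION_ATTEMPTS = 3  # hypothetical; the PR does not state this limit


def build_stage_with_retry(
    generate: Callable[[List[str]], str],
    run_qa: Callable[[str], List[str]],
    remediate: Callable[[str, List[str]], str],
    clean_artifacts: Callable[[], None],
) -> bool:
    """Return True once QA passes, False if every full attempt is exhausted."""
    prior_findings: List[str] = []
    for _ in range(_MAX_FULL_STAGE_ATTEMPTS):
        # Regenerate from scratch, injecting findings from the previous attempt.
        artifact = generate(prior_findings)
        findings = run_qa(artifact)
        for _ in range(_MAX_REMEDIATION_ATTEMPTS):
            if not findings:
                return True
            artifact = remediate(artifact, findings)
            findings = run_qa(artifact)
        if not findings:
            return True
        # Remediation exhausted: clean up and carry findings into the next attempt.
        prior_findings = findings
        clean_artifacts()
    return False
```

The design point is that the second attempt does not start blind: the findings that remediation could not fix are passed back into generation.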

17 new tests in tests/stages/test_build_session_reentry.py covering:
- Validating stage with files → QA re-run
- Validating without layer field → QA re-run (Stage 16 bug)
- Validating QA pass → status advances
- Validating QA fail → build stops
- Validating QA pass → cascade downstream to pending
- App-layer validating → QA re-run
- Generating re-entry → artifact cleanup
- BuildState cascade_downstream_pending
- BuildState status transitions (mark_generating/validating/generated)
- BuildState get_pending/validating/generated queries

Bug fix: Branch C (no design changes) now checks for validating stages in addition to pending. Previously, a restart with only validating stages would say 'up to date' and skip generation entirely.
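
The status transitions and the Branch C fix can be illustrated with a toy model. This is a sketch only: the real BuildState is not shown in this PR page, so the storage layout, the `stages_with` and `needs_work` helpers, and the cascade semantics (later stages reset to pending) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BuildState:
    """Toy stand-in for the real BuildState; the storage layout is assumed."""

    status: Dict[int, str] = field(default_factory=dict)  # stage number -> status

    def mark_stage_generating(self, stage: int) -> None:
        self.status[stage] = "generating"

    def mark_stage_validating(self, stage: int) -> None:
        self.status[stage] = "validating"

    def mark_stage_generated(self, stage: int) -> None:
        self.status[stage] = "generated"

    def stages_with(self, wanted: str) -> List[int]:
        # Hypothetical query helper standing in for get_pending/validating/generated.
        return sorted(s for s, st in self.status.items() if st == wanted)

    def cascade_downstream_pending(self, stage: int) -> None:
        # Assumed semantics: every later stage must be rebuilt.
        for s in self.status:
            if s > stage:
                self.status[s] = "pending"

    def needs_work(self) -> bool:
        # Branch C fix: validating stages count as unfinished, not only pending ones.
        return bool(self.stages_with("pending") or self.stages_with("validating"))
```

With only a validating stage present, `needs_work()` returns True, which is the case that previously reported 'up to date'.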

… push

Fix: renamed test_build_session_reentry.py → test_build_session.py to match source file structure (one test file per source file).

tests/stages/test_policy_resolver.py (32 tests):
- Auto-accept, interactive accept/override/regenerate, mixed resolutions, fix instruction building, rule ID extraction

tests/stages/test_escalation.py (38 tests):
- 4-level escalation chain, auto-escalation timeout, blocker management, state persistence, report formatting

tests/stages/test_qa_router.py (24 tests):
- QA routing with early returns, diagnosis, token tracking, knowledge contribution, blocker recording, error text handling

tests/stages/test_backlog_push.py (56 tests):
- GitHub/DevOps push, auth checks, body formatting, parent linking, label handling, error paths

…/deploy stages, state persistence

- tests/stages/test_deploy_helpers.py (55): CLI path resolution, deploy env, TF secret scanning, secret resolution, login, deployment context
- tests/stages/test_knowledge_contributor.py (32): namespace resolution, gap detection, submission with label retry, auth checks
- tests/stages/test_build_stage.py (17): guard validation, state transitions, reset, dry-run routing, template matching
- tests/stages/test_deploy_stage.py (16): routing logic, state transitions
- tests/stages/test_deploy_state.py (53): build state sync, orphan handling, legacy fallback, rollback ordering, audit logging
- tests/stages/test_discovery_state.py (60): legacy migration, exchange updates, image stripping, item CRUD, context hash
- tests/stages/test_backlog_state.py (31): item management, push status, context hash, conversation tracking

Total: 3448 tests passing across all tiers.

Coverage improvements:
- backlog_session.py: 63% → 99%
- design_stage.py: 67% → 89%
- discovery.py: 64% → 86%
- deploy_session.py: 77% → 87%
- build_session.py: 78% → 77% (largest file, needs more)

New test files:
- tests/stages/test_deploy_session.py (32 tests)
- tests/stages/test_design_stage.py (27 tests)
- tests/stages/test_discovery.py (36 tests)
- tests/stages/test_backlog_session.py (45 tests)
- tests/stages/test_build_session.py (+75 appended = 92 total)

TDD memory updated: tests satisfy business rules, not code.

Covers: policy regen path, review loop, fallback deployment plan, PE stage injection, diff architectures, plan adjustment, transforms debug logging, QA remediation writeback, stage advisory, execute with retry/continuation, slash commands, DNS zone notes, deployment plan derivation, design change branches, output key extraction, affected stage identification, file content collection. 3787 tests passing. All 5 target files now above 85%.

Move 48 test files from tests/test_*.py to subdirectories that mirror the source tree (tests/agents/, tests/ai/, tests/governance/, etc.). Merge tests into existing mirrored files where both existed. Six root files remain for root-level source modules (custom, telemetry, tracking, debug_log, requirements). 3644 tests passing.

…cates Migrated flat test files to 1:1 test-to-source directory structure, merged split test files, and removed ~114 duplicate tests across 10 files.

…RU update

- Fix QA remediation re-entry bug: mark_stage_generated -> mark_stage_validating in remediation loop so failed stages are retried on re-run instead of skipped
- Add full stage retry (_MAX_FULL_STAGE_ATTEMPTS=2): when QA remediation exhausts all attempts, clean artifacts and regenerate from scratch with prior QA findings injected into the generation prompt
- Harden QA checklist: response_export_values mandatory on every azapi_resource, deploy.sh -state= flag check, UUID hex validation
- Front-load cross-stage dependency no-dead-code directive before architecture context
- Backfill PRU multiplier table from GitHub Copilot docs (raptor-mini, gemini-2.5-pro, gpt-5.2-codex, gpt-5.3-codex, claude-opus-4.6-fast at 30 PRU)
- 14 new tests (3 re-entry, 5 checklist/prompt, 6 full stage retry)
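
The PR does not show how the UUID hex validation is performed. One plausible reading is a check that flags UUID-shaped strings containing non-hexadecimal characters, as generated code sometimes does. A minimal sketch under that assumption, with `find_invalid_uuids` as a hypothetical name:

```python
import re
from typing import List

# Candidate: anything shaped like 8-4-4-4-12 alphanumeric groups.
_UUID_SHAPED = re.compile(
    r"\b[0-9A-Za-z]{8}(?:-[0-9A-Za-z]{4}){3}-[0-9A-Za-z]{12}\b"
)
# Valid: the same shape restricted to hexadecimal digits.
_UUID_HEX = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def find_invalid_uuids(text: str) -> List[str]:
    """Return UUID-shaped strings that contain non-hex characters."""
    return [m for m in _UUID_SHAPED.findall(text) if not _UUID_HEX.fullmatch(m)]
```

Both patterns use only fixed-width quantifiers, so matching time stays linear in the input length.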

- Add stage_services field to AgentContext, populated by _agent_build_context() and passed through _apply_governance_check() to reduce false positive anti-pattern warnings for irrelevant service namespaces
- Extract _find_azapi_blocks() shared brace-counting helper and rewrite _add_response_export_values, _add_resource_group_parent_id, and _remove_private_endpoint_resources to eliminate nested-quantifier regex
- 5 new tests (2 service filtering, 3 brace counting safety)
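
To illustrate why brace counting avoids the ReDoS risk of a nested-quantifier regex, here is a minimal sketch of the idea. The actual signature and behavior of `_find_azapi_blocks()` are not shown in this PR page, so the function below (`find_azapi_blocks_sketch`) and its return type are assumptions. It also ignores unbalanced braces inside string literals or comments, which a production helper might need to handle.

```python
from typing import List, Tuple


def find_azapi_blocks_sketch(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of each `resource "azapi_resource"` block."""
    marker = 'resource "azapi_resource"'
    spans: List[Tuple[int, int]] = []
    pos = 0
    while True:
        start = text.find(marker, pos)
        if start == -1:
            break
        open_idx = text.find("{", start)
        if open_idx == -1:
            break
        depth = 0
        end = -1
        # Single forward scan: each character is visited once, so no backtracking.
        for i in range(open_idx, len(text)):
            ch = text[i]
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        if end == -1:
            break  # unbalanced braces: stop rather than guess
        spans.append((start, end))
        pos = end
    return spans
```

Each transform can then operate on the returned slices instead of matching whole blocks with a single pattern.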

- Updated extract.py to handle full stage retry (uses last task prompt per stage)
- Simplified INSTRUCTIONS.md extraction section to reference extract.py directly
- Individual run report: benchmarks/2026-04-08-14-40-57.html (19 stages, 14 benchmarks)
- Updated overall.html trends dashboard with run #2 data
- Generated PDF report with 29 charts (overall + 14 factor + 14 trend)
- Updated generate_pdf.py data section with new scores and two-point trend history
- Removed stale test run 2026-03-31-11-16-46.html

tests/stages/test_backlog_push.py

+        mock_run.return_value = self._mock_success(wi_id=10)
+        result = push_devops_feature("myorg", "myproj", {"title": "Infra Setup"})
+        assert result["id"] == 10
+        assert "dev.azure.com" in result["url"]


tests/stages/test_deploy_helpers.py

+
+        assert "az acr build" in script
+        assert "az containerapp update" in script
+        assert "myregistry.azurecr.io" in script


tests/stages/test_deploy_state.py

+        }
+        result = deploy_state.format_outputs()
+        assert "endpoint" in result
+        assert "https://app.com" in result


a11smiles added 12 commits April 6, 2026 18:04

Consolidate test suite: migrate to mirrored directories, remove dupli…

597f9ed

Fix deploy guard test: patch check_az_login at import site, not source

15f02e1
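
The distinction behind this fix (patching at the import site rather than the source) is general `unittest.mock` behavior: `patch` must target the name where it is looked up. The sketch below builds two throwaway in-memory modules to demonstrate it. The module names `helpers_sketch` and `stage_sketch` and the `guard` function are hypothetical; only `check_az_login` is named in the commit title.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical "source" module that defines check_az_login.
helpers = types.ModuleType("helpers_sketch")
exec("def check_az_login():\n    return False\n", helpers.__dict__)
sys.modules["helpers_sketch"] = helpers

# Hypothetical "import site": it binds its own reference at import time.
stage = types.ModuleType("stage_sketch")
exec(
    "from helpers_sketch import check_az_login\n"
    "def guard():\n"
    "    return check_az_login()\n",
    stage.__dict__,
)
sys.modules["stage_sketch"] = stage

# Patching the source leaves stage_sketch's own binding untouched.
with patch("helpers_sketch.check_az_login", return_value=True):
    patched_at_source = stage.guard()

# Patching the import site replaces the name that guard() actually resolves.
with patch("stage_sketch.check_az_login", return_value=True):
    patched_at_import_site = stage.guard()
```

Here `patched_at_source` stays False while `patched_at_import_site` is True, which is why a guard test that patches only the defining module can still call the real function.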

Expound on full stage retry rationale in HISTORY.rst

05dccb3

github-advanced-security bot found potential problems Apr 9, 2026


a11smiles merged commit afb8821 into main Apr 9, 2026
20 of 21 checks passed

a11smiles merged 12 commits into main from build-deploy

2 participants
