Merged
17 new tests in tests/stages/test_build_session_reentry.py covering:
- Validating stage with files → QA re-run
- Validating without layer field → QA re-run (Stage 16 bug)
- Validating QA pass → status advances
- Validating QA fail → build stops
- Validating QA pass → cascade downstream to pending
- App-layer validating → QA re-run
- Generating re-entry → artifact cleanup
- BuildState cascade_downstream_pending
- BuildState status transitions (mark_generating/validating/generated)
- BuildState get_pending/validating/generated queries

Bug fix: Branch C (no design changes) now checks for validating stages in addition to pending. Previously, a restart with only validating stages would say 'up to date' and skip generation entirely.
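The Branch C fix can be illustrated with a minimal sketch. The class and function names below (`BuildStateSketch`, `stages_with`, `has_work_on_restart`) are hypothetical stand-ins rather than the project's actual API; only the status names and `cascade_downstream_pending` come from the commit message, and the cascade behavior shown here is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BuildStateSketch:
    """Hypothetical stand-in for BuildState; only stage statuses are modeled."""

    # Maps stage number -> "pending", "generating", "validating", or "generated".
    statuses: dict[int, str] = field(default_factory=dict)

    def stages_with(self, status: str) -> list[int]:
        """Return the stage numbers currently in the given status, in order."""
        return sorted(n for n, s in self.statuses.items() if s == status)

    def cascade_downstream_pending(self, stage: int) -> None:
        """Assumed behavior: reset every later stage to pending."""
        for n in self.statuses:
            if n > stage:
                self.statuses[n] = "pending"


def has_work_on_restart(state: BuildStateSketch) -> bool:
    """Branch C sketch: no design changes, so decide from stage statuses alone."""
    # Checking only "pending" would report a build whose remaining stages
    # are all "validating" as up to date, which is the bug described above.
    return bool(state.stages_with("pending") or state.stages_with("validating"))
```

With this shape, a state such as `{1: "generated", 2: "validating"}` is treated as unfinished, so QA is re-run instead of the build being reported as up to date.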
… push

Fix: renamed test_build_session_reentry.py → test_build_session.py to match source file structure (one test file per source file).

tests/stages/test_policy_resolver.py (32 tests):
- Auto-accept, interactive accept/override/regenerate, mixed resolutions, fix instruction building, rule ID extraction

tests/stages/test_escalation.py (38 tests):
- 4-level escalation chain, auto-escalation timeout, blocker management, state persistence, report formatting

tests/stages/test_qa_router.py (24 tests):
- QA routing with early returns, diagnosis, token tracking, knowledge contribution, blocker recording, error text handling

tests/stages/test_backlog_push.py (56 tests):
- GitHub/DevOps push, auth checks, body formatting, parent linking, label handling, error paths
…/deploy stages, state persistence

- tests/stages/test_deploy_helpers.py (55): CLI path resolution, deploy env, TF secret scanning, secret resolution, login, deployment context
- tests/stages/test_knowledge_contributor.py (32): namespace resolution, gap detection, submission with label retry, auth checks
- tests/stages/test_build_stage.py (17): guard validation, state transitions, reset, dry-run routing, template matching
- tests/stages/test_deploy_stage.py (16): routing logic, state transitions
- tests/stages/test_deploy_state.py (53): build state sync, orphan handling, legacy fallback, rollback ordering, audit logging
- tests/stages/test_discovery_state.py (60): legacy migration, exchange updates, image stripping, item CRUD, context hash
- tests/stages/test_backlog_state.py (31): item management, push status, context hash, conversation tracking

Total: 3448 tests passing across all tiers.
Coverage improvements:
- backlog_session.py: 63% → 99%
- design_stage.py: 67% → 89%
- discovery.py: 64% → 86%
- deploy_session.py: 77% → 87%
- build_session.py: 78% → 77% (largest file, needs more)

New test files:
- tests/stages/test_deploy_session.py (32 tests)
- tests/stages/test_design_stage.py (27 tests)
- tests/stages/test_discovery.py (36 tests)
- tests/stages/test_backlog_session.py (45 tests)
- tests/stages/test_build_session.py (+75 appended = 92 total)

TDD memory updated: tests satisfy business rules, not code.
Covers: policy regen path, review loop, fallback deployment plan, PE stage injection, diff architectures, plan adjustment, transforms debug logging, QA remediation writeback, stage advisory, execute with retry/continuation, slash commands, DNS zone notes, deployment plan derivation, design change branches, output key extraction, affected stage identification, file content collection. 3787 tests passing. All 5 target files now above 85%.
Move 48 test files from tests/test_*.py to subdirectories that mirror the source tree (tests/agents/, tests/ai/, tests/governance/, etc.). Merge tests into existing mirrored files where both existed. Six root files remain for root-level source modules (custom, telemetry, tracking, debug_log, requirements). 3644 tests passing.
…cates

Migrated flat test files to 1:1 test-to-source directory structure, merged split test files, and removed ~114 duplicate tests across 10 files.
…RU update

- Fix QA remediation re-entry bug: mark_stage_generated -> mark_stage_validating in remediation loop so failed stages are retried on re-run instead of skipped
- Add full stage retry (_MAX_FULL_STAGE_ATTEMPTS=2): when QA remediation exhausts all attempts, clean artifacts and regenerate from scratch with prior QA findings injected into the generation prompt
- Harden QA checklist: response_export_values mandatory on every azapi_resource, deploy.sh -state= flag check, UUID hex validation
- Front-load cross-stage dependency no-dead-code directive before architecture context
- Backfill PRU multiplier table from GitHub Copilot docs (raptor-mini, gemini-2.5-pro, gpt-5.2-codex, gpt-5.3-codex, claude-opus-4.6-fast at 30 PRU)
- 14 new tests (3 re-entry, 5 checklist/prompt, 6 full stage retry)
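The full stage retry described above might look roughly like the sketch below. Only the constant `_MAX_FULL_STAGE_ATTEMPTS = 2` comes from the commit message; the function name and the three callables are hypothetical hooks, and treating the constant as the total number of attempts (rather than the number of retries) is an assumption.

```python
from __future__ import annotations

from typing import Callable

_MAX_FULL_STAGE_ATTEMPTS = 2  # value taken from the commit message


def run_stage_with_full_retry(
    generate: Callable[[list[str]], str],
    run_qa_with_remediation: Callable[[str], list[str]],
    clean_artifacts: Callable[[], None],
) -> tuple[bool, int]:
    """Return (passed, attempts_used). All three callables are hypothetical.

    generate(prior_findings) produces stage output, with earlier QA findings
    available for injection into the prompt. run_qa_with_remediation(output)
    returns the findings still open after remediation, or an empty list.
    """
    prior_findings: list[str] = []
    for attempt in range(1, _MAX_FULL_STAGE_ATTEMPTS + 1):
        if attempt > 1:
            # Remediation was exhausted on the previous attempt: start clean.
            clean_artifacts()
        output = generate(prior_findings)
        findings = run_qa_with_remediation(output)
        if not findings:
            return True, attempt
        # Carry the unresolved findings into the next generation prompt.
        prior_findings = findings
    return False, _MAX_FULL_STAGE_ATTEMPTS
```

Passing the hooks in as callables keeps the retry policy testable with simple fakes, which is one plausible way the six full stage retry tests could be structured.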
- Add stage_services field to AgentContext, populated by _agent_build_context() and passed through _apply_governance_check() to reduce false positive anti-pattern warnings for irrelevant service namespaces
- Extract _find_azapi_blocks() shared brace-counting helper and rewrite _add_response_export_values, _add_resource_group_parent_id, and _remove_private_endpoint_resources to eliminate nested-quantifier regex
- 5 new tests (2 service filtering, 3 brace counting safety)
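As an illustration of the brace-counting approach, here is a minimal sketch of what a helper like _find_azapi_blocks() could do. The signature, return type, and header pattern are assumptions, not the project's actual implementation, and the sketch deliberately ignores braces that appear inside string literals or comments.

```python
from __future__ import annotations

import re

# Header match only; each quantifier applies to a single token, so there are
# no nested quantifiers that could backtrack catastrophically.
_HEADER = re.compile(r'resource\s+"azapi_resource"\s+"[^"]*"\s*\{')


def find_azapi_blocks(text: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of each azapi_resource block, end exclusive."""
    spans: list[tuple[int, int]] = []
    pos = 0
    while True:
        match = _HEADER.search(text, pos)
        if match is None:
            break
        depth = 1  # the header regex consumed the opening brace
        i = match.end()
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth != 0:
            break  # unbalanced braces: stop rather than guess at a block end
        spans.append((match.start(), i))
        pos = i
    return spans
```

Callers that rewrite blocks can slice `text[start:end]` for each span and apply edits from the last span to the first so that earlier offsets stay valid.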
- Updated extract.py to handle full stage retry (uses last task prompt per stage)
- Simplified INSTRUCTIONS.md extraction section to reference extract.py directly
- Individual run report: benchmarks/2026-04-08-14-40-57.html (19 stages, 14 benchmarks)
- Updated overall.html trends dashboard with run #2 data
- Generated PDF report with 29 charts (overall + 14 factor + 14 trend)
- Updated generate_pdf.py data section with new scores and two-point trend history
- Removed stale test run 2026-03-31-11-16-46.html
mock_run.return_value = self._mock_success(wi_id=10)
result = push_devops_feature("myorg", "myproj", {"title": "Infra Setup"})
assert result["id"] == 10
assert "dev.azure.com" in result["url"]

assert "az acr build" in script
assert "az containerapp update" in script
assert "myregistry.azurecr.io" in script

}
result = deploy_state.format_outputs()
assert "endpoint" in result
assert "https://app.com" in result
Build QA resilience and benchmark tooling