From aacb67ca93357eaefd37712eb9648810716b172f Mon Sep 17 00:00:00 2001 From: Diego Mauricio Lagos Date: Sun, 12 Apr 2026 12:48:05 +0200 Subject: [PATCH 01/10] Enhance governance and operational skills documentation across Azure, GCP, and GitHub - Updated guardrail maps for Azure and GCP to clarify control patterns and exception handling. - Revised anti-patterns to common mistakes in Azure, GCP, and GitHub skills, emphasizing validation and operational evidence. - Expanded validation sections to ensure clarity on governance scope and operational proof requirements. - Introduced new evidence patterns for monitoring, backup, and restore expectations in Azure and GCP. - Improved decision-making frameworks in strategic skills for GitHub, focusing on enterprise and repo-model choices. - Added detailed examples for safe rollout units and billing ownership in GCP and GitHub structures. --- .../skills/internal-aws-governance/SKILL.md | 25 +++++++--- .../references/guardrail-map.md | 25 ++++++++++ .../skills/internal-aws-operations/SKILL.md | 25 +++++++--- .../references/validation-and-evidence.md | 25 ++++++++++ .../SKILL.md | 25 +++++++--- .../references/control-surface-map.md | 27 +++++++++++ .../skills/internal-aws-strategic/SKILL.md | 26 +++++++---- .../references/lens-playbook.md | 46 +++++++++++++++++++ .../skills/internal-azure-governance/SKILL.md | 25 +++++++--- .../references/guardrail-map.md | 25 ++++++++++ .../skills/internal-azure-operations/SKILL.md | 25 +++++++--- .../references/validation-and-evidence.md | 24 ++++++++++ .../SKILL.md | 25 +++++++--- .../references/topology-map.md | 27 +++++++++++ .../skills/internal-gcp-governance/SKILL.md | 25 +++++++--- .../references/guardrail-map.md | 25 ++++++++++ .../skills/internal-gcp-operations/SKILL.md | 25 +++++++--- .../references/validation-and-evidence.md | 24 ++++++++++ .../SKILL.md | 25 +++++++--- .../references/topology-map.md | 35 ++++++++++++++ .../skills/internal-gcp-strategic/SKILL.md | 26 +++++++---- .../references/lens-playbook.md | 46 +++++++++++++++++++ .../internal-github-governance/SKILL.md | 25 +++++++--- .../references/guardrail-map.md | 25 ++++++++++ .../internal-github-operations/SKILL.md | 25 +++++++--- .../references/validation-and-evidence.md | 24 ++++++++++ .../skills/internal-github-strategic/SKILL.md | 26 +++++++---- .../references/lens-playbook.md | 46 +++++++++++++++++++ 28 files changed, 676 insertions(+), 101 deletions(-) diff --git a/.github/skills/internal-aws-governance/SKILL.md b/.github/skills/internal-aws-governance/SKILL.md index 5d53688..2651db6 100644 --- a/.github/skills/internal-aws-governance/SKILL.md +++ b/.github/skills/internal-aws-governance/SKILL.md @@ -74,10 +74,21 @@ For broader asks, return: - `internal-aws-operations` Use when the next need is preflight, reporting, validation, or operational evidence after the governance design is chosen. -## Anti-patterns - -- using SCPs as if they grant access -- answering a governance question without naming scope -- mixing org-wide guardrails and in-account authorization into one vague recommendation -- proposing break-glass access without boundaries or audit expectations -- recommending rollout without simulation or staged validation when the blast radius is high +## Common mistakes + +| Mistake | Why it matters | Instead | +| --- | --- | --- | +| Using SCPs as if they grant access | Preventive controls get mistaken for execution permissions | Pair SCP guidance with the required IAM grant path and keep their roles distinct | +| Answering a governance question without naming scope | Root, OU, and account-level controls behave very differently | State the exact governance scope before recommending a mechanism | +| Mixing org-wide guardrails and in-account authorization into one vague recommendation | Reviewers cannot see which control prevents versus grants | Separate the org-level mechanism from the account-level authorization design | +| Proposing break-glass access without boundaries or audit expectations | Emergency access becomes a standing privilege with weak accountability | Define who can invoke it, how it is bounded, and what audit evidence must exist | +| Recommending rollout without simulation or staged validation when the blast radius is high | A wide deny or trust failure can interrupt platform operations | Use simulation, targeted rollout, and explicit rollback triggers before widening scope | +| Treating permission boundaries as a replacement for trust design | Delegation is still too broad even if identity policies are constrained | Use permission boundaries to limit delegated builders and trust policies to control who can assume the role | + +## Validation + +- Confirm the governance scope is explicit: root, OU, account set, or single account. +- Confirm the recommended mechanism is clear about whether it prevents, grants, or constrains permissions. +- Confirm trust boundaries and exception paths are explicit when human or workload access crosses account boundaries. +- Confirm staged validation or simulation is named before high-blast-radius rollout. +- Confirm the answer says when operational proof should move to `internal-aws-operations`. diff --git a/.github/skills/internal-aws-governance/references/guardrail-map.md b/.github/skills/internal-aws-governance/references/guardrail-map.md index 88ff6b8..24fa3b3 100644 --- a/.github/skills/internal-aws-governance/references/guardrail-map.md +++ b/.github/skills/internal-aws-governance/references/guardrail-map.md @@ -25,3 +25,28 @@ Use this reference when the user needs a clearer split between AWS governance su - Structure chooses placement. - Governance chooses permissions and guardrails. - Operations verifies that the chosen governance behaves as intended after rollout. + +## Placement patterns + +| Scenario | Prefer | Why | +| --- | --- | --- | +| Block risky services or regions across many accounts | SCP at root or OU | Central preventive control with explicit blast radius | +| Allow a workload role to use only the resources in one account | IAM policy plus trust policy | Grants the action at the execution boundary | +| Let platform engineers create roles without giving them unrestricted permissions | Permission boundary plus scoped creation role | Limits delegated builders without replacing trust design | +| Standardize required tags for cost or ownership workflows | Tag policy plus enforcement in deployment paths | Keeps metadata expectations central but still operationally enforceable | + +## Trust-boundary examples + +| Need | Primary control | Review note | +| --- | --- | --- | +| Human access from an external IdP into AWS accounts | Federation plus tightly scoped assume-role paths | Keep identity-source trust separate from account authorization | +| CI or automation assuming deployment roles across accounts | Trust policy constrained to named principals and conditions | Make environment scope and break-glass path explicit | +| Shared security tooling reading logs across accounts | Resource policy or role assumption with read-only scope | Prefer narrow data-access roles over broad admin trust | + +## Exception patterns with audit expectations + +| Exception type | Pattern | Audit expectation | +| --- | --- | --- | +| Temporary break-glass for incident response | Time-bounded role path with explicit approver and logging | Record who approved, who assumed the role, and when access ended | +| Service team needs an OU-level SCP carve-out | Targeted OU or account exception with expiry review | Record the business reason, compensating controls, and review date | +| Automation cannot yet satisfy a required tag or policy condition | Narrow deployment exception with compensating report | Track affected accounts or resources and the closure plan | diff --git a/.github/skills/internal-aws-operations/SKILL.md b/.github/skills/internal-aws-operations/SKILL.md index 46d4f4a..484f0a4 100644 --- a/.github/skills/internal-aws-operations/SKILL.md +++ b/.github/skills/internal-aws-operations/SKILL.md @@ -74,10 +74,21 @@ For broader asks, return: - `internal-aws-governance` Use when the operations question is actually about IAM, SCP, trust, or guardrail design rather than validation. -## Anti-patterns - -- treating monitoring as proof that restore or recovery works -- skipping preflight for high-blast-radius rollout -- reporting only control intent without operational evidence -- mixing validation advice with new governance design instead of keeping the boundary clear -- giving a DR answer without making the business criticality assumption visible +## Common mistakes + +| Mistake | Why it matters | Instead | +| --- | --- | --- | +| Treating monitoring as proof that restore or recovery works | Healthy telemetry does not prove recovery viability | Keep backup posture, restore proof, and DR validation as separate evidence lines | +| Skipping preflight for high-blast-radius rollout | Access, logging, or automation regressions are discovered too late | Define preflight checks, rollback trigger, and owner before rollout starts | +| Reporting only control intent without operational evidence | The platform looks compliant on paper but not in practice | Record what was observed in CloudTrail, Config, logs, or recovery tests | +| Mixing validation advice with new governance design instead of keeping the boundary clear | The answer stops being a reliable operations owner | Keep new guardrail design in `internal-aws-governance` and validate the chosen design here | +| Giving a DR answer without making the business criticality assumption visible | Recovery effort may be overbuilt or underbuilt | State the assumed RTO, RPO, or criticality before recommending the evidence path | +| Treating one successful rollout wave as proof for all scopes | Wider OUs, regions, or accounts can still fail differently | Widen only after the first safe unit is validated and recorded | + +## Validation + +- Confirm the answer distinguishes confirmed evidence from inferred evidence. +- Confirm preflight checks, rollback trigger, and rollout unit are explicit for risky changes. +- Confirm backup proof and restore proof are treated as separate validation paths when state exists. +- Confirm the main operational signals are named for the affected surface, not as a generic checklist. +- Confirm DR or continuity notes are included only when business criticality or recovery posture is actually in scope. diff --git a/.github/skills/internal-aws-operations/references/validation-and-evidence.md b/.github/skills/internal-aws-operations/references/validation-and-evidence.md index 9cd7245..6414953 100644 --- a/.github/skills/internal-aws-operations/references/validation-and-evidence.md +++ b/.github/skills/internal-aws-operations/references/validation-and-evidence.md @@ -28,3 +28,28 @@ Use this reference when the base skill needs a deeper operational checklist. BC/DR stays optional here as well. Load it when the rollout affects continuity expectations, recovery posture, or business-critical platform capability. + +## Preflight evidence by rollout stage + +| Rollout stage | Evidence to collect before widening | +| --- | --- | +| First account or first OU | IAM simulation or access check, logging still arriving, automation still able to deploy or operate | +| Delegated admin or shared-service activation | Service ownership confirmed, central logs visible, failure path and rollback owner confirmed | +| Broad OU or region expansion | Prior wave observations recorded, unexpected denies investigated, alerting and escalation path confirmed | + +## Backup versus restore proof patterns + +| Need | Acceptable proof | Not enough on its own | +| --- | --- | --- | +| Backup posture exists | Scheduled backups, retention policy, backup job success, protected resource inventory | A statement that backup is enabled | +| Restore is viable | Recent restore test, recovery time observed, application or data integrity verified | Backup job success without a restore exercise | +| DR assumptions are credible | Recovery workflow exercised for the scoped critical service or control plane | Monitoring green after normal operations | + +## AWS signals that confirm intended state + +| Surface | Signals to check | What they confirm | +| --- | --- | --- | +| Access and governance rollout | CloudTrail events, policy simulation result, expected role assumption path | The control still permits intended operations and records who used it | +| Configuration posture | AWS Config evaluations, conformance-pack state, remediation outcome | Preventive and detective controls still align with the intended baseline | +| Logging and observability | Central log delivery, CloudWatch alarms, service health or metric continuity | The rollout did not break visibility on the affected surface | +| Recovery posture | Backup job history, restore test output, runbook execution notes | The stated recovery assumption has real evidence behind it | diff --git a/.github/skills/internal-aws-organization-structure/SKILL.md b/.github/skills/internal-aws-organization-structure/SKILL.md index 5f6e7a3..a3ee974 100644 --- a/.github/skills/internal-aws-organization-structure/SKILL.md +++ b/.github/skills/internal-aws-organization-structure/SKILL.md @@ -80,10 +80,21 @@ For broader asks, return: - `internal-aws-operations` Use when the structure is accepted and the next need is validation, monitoring, backup, or operational evidence. -## Anti-patterns - -- treating the management account as the default operating account -- mixing payer responsibility with day-to-day operational ownership without making the reason explicit -- proposing OU or account layouts without a rollout scope -- hiding global-resource or cross-region blast radius in StackSets discussions -- using structure answers to sneak in IAM or SCP design without separating the concerns +## Common mistakes + +| Mistake | Why it matters | Instead | +| --- | --- | --- | +| Treating the management account as the default operating account | It increases blast radius and weakens separation of duties | Keep the management account minimal and prefer delegated administrator accounts when AWS supports them | +| Mixing payer responsibility with day-to-day operational ownership without making the reason explicit | Finance and platform controls drift together and are harder to change safely | State the financial owner and the operational owner separately | +| Proposing OU or account layouts without a rollout scope | Structural changes become hard to stage or roll back | Name the smallest safe rollout unit: account, OU, or region set | +| Hiding global-resource or cross-region blast radius in StackSets discussions | Failures spread further than the rollout plan suggests | Make regional scope, global resources, and rollback boundaries explicit | +| Using structure answers to sneak in IAM or SCP design without separating the concerns | The lane boundary blurs and review gets weaker | Keep placement decisions in this skill and hand guardrail logic to `internal-aws-governance` | +| Recommending shared services placement without naming service ownership | Central accounts become generic dumping grounds | State which platform capability lives centrally and which workload teams still own their execution accounts | + +## Validation + +- Confirm the placement model is explicit: management account, delegated administrator, shared-services account, or member account. +- Confirm the smallest safe rollout unit is named and matches the proposed structural change. +- Confirm blast radius is explicit for OU moves, delegated admin changes, StackSets rollout, or regional topology shifts. +- Confirm financial ownership and operational ownership are separated when both appear in the answer. +- Confirm the next handoff is clear when the user now needs guardrails or operational validation. diff --git a/.github/skills/internal-aws-organization-structure/references/control-surface-map.md b/.github/skills/internal-aws-organization-structure/references/control-surface-map.md index 574754d..825766e 100644 --- a/.github/skills/internal-aws-organization-structure/references/control-surface-map.md +++ b/.github/skills/internal-aws-organization-structure/references/control-surface-map.md @@ -33,3 +33,30 @@ Use this reference when turning a structural AWS question into the right control - Delegated administrator accounts are still member accounts, so SCPs still apply to them. - StackSets with service-managed permissions do not deploy stacks into the management account. - Global IAM or S3 naming collisions matter more in multi-region StackSets than they do in single-account templates. + +## Starter account and OU patterns + +| Pattern | When it fits | Watch for | +| --- | --- | --- | +| Minimal foundation: management, log archive, security tooling, shared services, workload OUs | Early multi-account platforms that need clear separation without a deep OU tree | Do not overload shared services with workload execution or exception access | +| Environment-oriented workload OUs: `prod`, `nonprod`, plus platform accounts | Teams share a common control posture and rollout cadence by environment | Keep deployment-path differences out of OU names when the real split is risk or residency | +| Business-unit OUs with centralized platform accounts | Large organizations need ownership boundaries first and technical standardization second | Make sure central platform capabilities still have a clear delegated admin model | +| Regulated-segment OU alongside general workloads | A subset of accounts needs stronger residency, logging, or approval controls | Keep the regulated segment justified by requirements, not by vague "special" status | + +## Delegated administrator placement heuristics + +| Question | Prefer | Reason | +| --- | --- | --- | +| Does AWS support delegated admin for this service? | Delegated administrator account | Reduces management-account usage and tightens day-to-day blast radius | +| Does the service operate as a platform capability across many accounts? | A dedicated platform or security account | Keeps service ownership separate from workload accounts | +| Does the service need close alignment with billing or org control actions? | Management account only when AWS requires it | Avoids making the management account the default operator surface | +| Does the service have strong data-sensitivity or incident-response coupling? | Security or logging account with explicit ownership | Keeps investigation and evidence flows separate from application operations | + +## Safe rollout-unit examples + +| Structural change | Start with | Widen after | +| --- | --- | --- | +| New delegated admin activation | One non-critical OU or one service-owned account set | Service behavior, logging, and guardrails are confirmed | +| OU realignment for workloads | One workload family with a documented rollback path | SCP impact, automation paths, and billing visibility are validated | +| StackSets baseline rollout | One account in one region or one low-risk OU | Global-resource effects and failure handling are observed | +| Shared-services account introduction | One platform capability with named consumers | Ownership, network reachability, and operational evidence are proven | diff --git a/.github/skills/internal-aws-strategic/SKILL.md b/.github/skills/internal-aws-strategic/SKILL.md index 10a0b1b..8679e53 100644 --- a/.github/skills/internal-aws-strategic/SKILL.md +++ b/.github/skills/internal-aws-strategic/SKILL.md @@ -140,11 +140,21 @@ Include: - `internal-terraform`, `internal-script-python`, `internal-script-bash` Use when the decision is settled and implementation begins. -## Anti-patterns - -- forcing a full multi-lens analysis for a small question -- treating BC/DR as mandatory for every answer -- recommending a direction without current-source verification when freshness matters -- confusing decision support with implementation guidance -- expanding into tool selection when the user did not ask for it -- giving generic best-practice advice without context, tradeoff, or cost implication +## Common mistakes + +| Mistake | Why it matters | Instead | +| --- | --- | --- | +| Forcing a full multi-lens analysis for a small question | The answer gets heavy without improving the decision | Start with the smallest useful lens set and widen only if risk or ambiguity justifies it | +| Treating BC/DR as mandatory for every answer | Continuity concerns drown out the actual decision | Activate BC/DR only when recovery posture materially changes the recommendation | +| Recommending a direction without current-source verification when freshness matters | AWS support boundaries, limits, or service behavior may have changed | Call out the freshness dependency and route to `internal-aws-mcp-research` when it can change the decision | +| Confusing decision support with implementation guidance | The user loses the strategic framing they asked for | Keep the answer at decision level and hand off only after the direction is chosen | +| Expanding into tool or IaC selection when the user did not ask for it | The response drifts from AWS platform tradeoffs into execution detail | Keep the recommendation centered on the AWS choice, not the delivery tooling | +| Giving generic best-practice advice without context, tradeoff, or cost implication | Generic guidance is hard to act on and easy to misapply | Tie the recommendation to assumptions, viable options, and cost-value consequences | + +## Validation + +- Confirm the decision statement is explicit and narrow enough that the next owner is obvious. +- Confirm assumptions, active lenses, and the main tradeoff are named instead of implied. +- Confirm the recommendation includes reversibility or blast-radius guidance when the choice is hard to unwind. +- Confirm cost-value or operational impact is called out when it materially changes the recommendation. +- Confirm the answer states when freshness matters and whether `internal-aws-mcp-research` should be used. diff --git a/.github/skills/internal-aws-strategic/references/lens-playbook.md b/.github/skills/internal-aws-strategic/references/lens-playbook.md index 17dbb59..b7493dc 100644 --- a/.github/skills/internal-aws-strategic/references/lens-playbook.md +++ b/.github/skills/internal-aws-strategic/references/lens-playbook.md @@ -25,3 +25,49 @@ Use this reference when the user wants more depth than the base skill should loa - Stay in `Quick answer` mode when one option is clearly better and the user asked a narrow question. - Upgrade to `Decision note` when at least two viable options exist. - Upgrade to `Deep analysis` only when the user asks for it or the risk profile justifies it. + +## Worked AWS decision shapes + +### Landing-zone or control-plane choice + +| Situation | Frame first | Recommendation shape | +| --- | --- | --- | +| New multi-account platform with centralized controls | `organization-structure`, `governance` | Compare a thin management account plus delegated administrators against a more centralized operating model, then state the smallest safe rollout unit | +| Existing single-account estate moving into Organizations | `organization-structure`, `blast radius` | Focus on migration sequencing, shared-services placement, and how guardrails will land without locking out current operators | +| Control Tower versus lighter custom control-plane direction | `governance`, `operations` | State what managed guardrails buy, what flexibility is lost, and where current-fact validation is required before committing | + +### Identity or delegated-access choice + +| Situation | Frame first | Recommendation shape | +| --- | --- | --- | +| Central team needs day-to-day operation of an integrated service | `identity and access`, `governance` | Compare delegated administrator patterns against management-account operation, then call out the trust and audit implications | +| Workload teams need cross-account delivery access | `identity and access`, `blast radius` | Compare direct broad roles against narrower environment-scoped roles and name the rollback impact if trust is too wide | +| Federation model is still undecided | `governance`, `compliance` | Keep the answer at trust-boundary and operating-model level, not IdP implementation detail | + +### Cost-sensitive platform choice + +| Situation | Frame first | Recommendation shape | +| --- | --- | --- | +| Shared services could live centrally or per account | `FinOps`, `operations` | Compare central efficiency against tenant isolation and operational burden, then say when duplication is worth the spend | +| Multi-region posture is being considered mainly for resilience | `FinOps`, `BC/DR` | Make the continuity target explicit before recommending the extra cost or complexity | +| Organization-wide baseline tooling is under review | `FinOps`, `maintainability` | Compare managed-service convenience against steady-state platform ownership cost | + +## Decision note pattern + +Use this when the question is too consequential for a quick answer but does not need a full deep analysis. + +1. Decision statement: what AWS choice is being made. +2. Assumptions: what current state, constraints, or timelines the recommendation depends on. +3. Viable options: usually two or three realistic AWS-local paths. +4. Recommendation: which option wins and why. +5. Tradeoffs and blast radius: what gets better, what gets harder, and what is hard to reverse. +6. Validation note: what current-fact check, proof, or next-owner handoff is still required. + +## When to stay quick answer versus upgrade to a decision note + +| Stay in `Quick answer` when | Upgrade to `Decision note` when | +| --- | --- | +| One option is clearly better and the downside is local | At least two AWS-local options are still viable | +| The choice does not alter the organization, trust, or recovery posture | The choice changes account layout, delegated access, or continuity expectations | +| The answer can stay within one lens without hiding material risk | A second lens changes the recommendation or the risk statement | +| Freshness is not the deciding factor | Current AWS behavior, support boundaries, or limits could change the outcome | diff --git a/.github/skills/internal-azure-governance/SKILL.md b/.github/skills/internal-azure-governance/SKILL.md index e9efeac..051604b 100644 --- a/.github/skills/internal-azure-governance/SKILL.md +++ b/.github/skills/internal-azure-governance/SKILL.md @@ -76,10 +76,21 @@ For broader asks, return: - `awesome-copilot-azure-role-selector` Use as depth support when the governance boundary is already clear and the next need is least-privilege role selection, custom-role fallback, or assignment artifacts such as CLI commands and Bicep snippets. -## Anti-patterns - -- treating Azure Policy as if it grants access -- answering a governance question without naming scope -- mixing tenant-wide guardrails and subscription-level authorization into one vague recommendation -- proposing emergency access without boundaries, audit expectations, or privileged-access posture -- recommending rollout without staged validation when the blast radius is high +## Common mistakes + +| Mistake | Why it matters | Instead | +| --- | --- | --- | +| Treating Azure Policy as if it grants access | Preventive controls get confused with authorization paths | Pair Policy guidance with the RBAC or identity model that actually grants access | +| Answering a governance question without naming scope | Management-group, subscription, and resource scopes behave differently | State the exact scope before recommending a mechanism | +| Mixing tenant-wide guardrails and subscription-level authorization into one vague recommendation | Reviewers cannot see what prevents versus what grants | Separate Policy or PIM posture from RBAC assignments and workload identity design | +| Proposing emergency access without boundaries, audit expectations, or privileged-access posture | Break-glass becomes a standing exception instead of controlled elevation | Define who can elevate, how long it lasts, and what evidence must exist | +| Recommending rollout without staged validation when the blast radius is high | Wide RBAC or Policy errors can block operations quickly | Use scoped rollout, compliance checks, and explicit rollback triggers | +| Treating managed identities as a reason to skip scope design | Identity becomes secretless but still over-privileged | Keep identity type and authorization scope as separate decisions | + +## Validation + +- Confirm the governance scope is explicit: management group, subscription set, or single subscription. +- Confirm the recommended mechanism is clear about whether it prevents, grants, or constrains privileged access. +- Confirm identity boundaries and exception paths are explicit for human and workload access. +- Confirm staged rollout validation is named before high-blast-radius Policy, RBAC, or PIM changes. +- Confirm the answer says when operational proof should move to `internal-azure-operations` and when least-privilege role depth should move to `awesome-copilot-azure-role-selector`. diff --git a/.github/skills/internal-azure-governance/references/guardrail-map.md b/.github/skills/internal-azure-governance/references/guardrail-map.md index ca500dd..a84fed2 100644 --- a/.github/skills/internal-azure-governance/references/guardrail-map.md +++ b/.github/skills/internal-azure-governance/references/guardrail-map.md @@ -25,3 +25,28 @@ Use this reference when the user needs a clearer split between Azure governance - Structure chooses placement. - Governance chooses permissions and guardrails. - Operations verifies that the chosen governance behaves as intended after rollout. + +## Azure control patterns + +| Scenario | Prefer | Why | +| --- | --- | --- | +| Enforce allowed locations, SKUs, or network posture across many subscriptions | Azure Policy or initiative at management-group scope | Central preventive or detective guardrail with visible inheritance | +| Grant engineers or groups access to operate resources | RBAC role assignments with explicit scope | Keeps authorization tied to real ownership boundaries | +| Limit standing privilege for sensitive operations | PIM or PAM posture | Elevation becomes time-bound, reviewable, and auditable | +| Remove secrets from workload-to-service access | Managed identity or federation pattern | Keeps workload identity separate from human access grants | + +## Identity and federation examples + +| Need | Primary control | Review note | +| --- | --- | --- | +| Workload in Azure needs access to Azure resources | Managed identity plus scoped RBAC | Identity type does not replace least-privilege scope design | +| External CI system deploys into Azure | Federation plus narrow RBAC scope | Keep token trust and resource authorization as separate review points | +| Human operators need elevated production access | PIM-backed role path with approval and duration controls | Keep emergency and routine elevation paths distinct | + +## Exception patterns with audit expectations + +| Exception type | Pattern | Audit expectation | +| --- | --- | --- | +| Policy exception for a subset of subscriptions | Scoped exemption with reason, owner, and expiry or review date | Record business reason, compensating controls, and revalidation date | +| Temporary elevated operator access | PIM assignment with time bound and approval path | Record approver, duration, and activity evidence | +| Workload cannot yet use managed identity | Time-bounded secret-based fallback with closure plan | Track affected app, owner, rotation path, and migration deadline | diff --git a/.github/skills/internal-azure-operations/SKILL.md b/.github/skills/internal-azure-operations/SKILL.md index a3edd40..5b3397b 100644 --- a/.github/skills/internal-azure-operations/SKILL.md +++ b/.github/skills/internal-azure-operations/SKILL.md @@ -76,10 +76,21 @@ For broader asks, return: - `awesome-copilot-azure-resource-health-diagnose` Use as depth support when the Azure resource is already identified and the next need is deep health diagnosis, log and telemetry analysis, or a remediation plan for that specific resource. -## Anti-patterns - -- treating monitoring as proof that restore or recovery works -- skipping preflight for high-blast-radius rollout -- reporting only control intent without operational evidence -- mixing validation advice with new governance design instead of keeping the boundary clear -- giving a DR answer without making the business criticality assumption visible +## Common mistakes + +| Mistake | Why it matters | Instead | +| --- | --- | --- | +| Treating monitoring as proof that restore or recovery works | Healthy dashboards do not prove recovery viability | Keep monitoring evidence, backup proof, and restore proof as separate lines | +| Skipping preflight for high-blast-radius rollout | Policy, identity, or connectivity failures surface too late | Define rollout unit, preflight checks, rollback trigger, and owner before rollout | +| Reporting only control intent without operational evidence | The platform appears compliant without proof that it works | Record what Azure Monitor, Log Analytics, backup, or compliance signals actually showed | +| Mixing validation advice with new governance design instead of keeping the boundary clear | The operations skill stops being a reliable validation owner | Keep new Policy or RBAC design in `internal-azure-governance` and validate it here | +| Giving a DR answer without making the business criticality assumption visible | Recovery guidance can be overbuilt or incomplete | State the assumed criticality, RTO, or RPO before recommending the validation path | +| Treating one successful rollout wave as proof for all subscriptions or regions | Wider inheritance, network, or residency paths can still fail differently | Validate the first safe unit and widen only after recording real evidence | + +## Validation + +- Confirm the answer distinguishes confirmed evidence from inferred evidence. +- Confirm preflight checks, rollback trigger, and rollout unit are explicit for risky changes. +- Confirm Azure Monitor or Log Analytics signals are named for the affected surface, not as a generic checklist. +- Confirm backup proof and restore proof are treated as separate validation paths when state exists. +- Confirm DR or continuity notes are included only when business criticality or recovery posture is actually in scope. diff --git a/.github/skills/internal-azure-operations/references/validation-and-evidence.md b/.github/skills/internal-azure-operations/references/validation-and-evidence.md index 5f16f14..4e98ce5 100644 --- a/.github/skills/internal-azure-operations/references/validation-and-evidence.md +++ b/.github/skills/internal-azure-operations/references/validation-and-evidence.md @@ -28,3 +28,27 @@ Use this reference when the base skill needs a deeper operational checklist. BC/DR stays optional here as well. Load it when the rollout affects continuity expectations, recovery posture, or business-critical platform capability. + +## Azure Monitor and Log Analytics evidence patterns + +| Surface | Signals to check | What they confirm | +| --- | --- | --- | +| Identity or RBAC rollout | Sign-in or activity signals, denied action evidence, successful intended operations | The access model still permits intended work and surfaces regressions | +| Policy rollout | Compliance state, remediation outcome, and exceptions still scoped correctly | Guardrails applied as expected without unintended drift | +| Platform topology or shared services | Health, logs, and alert continuity for the affected control plane | The rollout did not break core visibility or routing | + +## Backup versus restore proof expectations + +| Need | Acceptable proof | Not enough on its own | +| --- | --- | --- | +| Backup posture exists | Protected resource inventory, backup policy attachment, recent job success | A claim that backup is enabled | +| Restore is viable | Restore exercise, observed recovery time, integrity check after recovery | Backup success without a restore test | +| DR posture is credible | Site Recovery or equivalent continuity exercise for the scoped service | General health signals during normal operations | + +## Stage-aware rollout evidence + +| Rollout stage | Evidence to collect before widening | +| --- | --- | +| First management group or subscription set | Inheritance behaves as expected, monitoring is still present, rollback owner confirmed | +| First landing-zone or platform slice | Connectivity, automation, and alerting still work under the new design | +| Broad subscription or region expansion | Prior wave observations recorded, regressions investigated, escalation path confirmed | diff --git a/.github/skills/internal-azure-organization-structure/SKILL.md b/.github/skills/internal-azure-organization-structure/SKILL.md index fa2a74a..77464ab 100644 --- a/.github/skills/internal-azure-organization-structure/SKILL.md +++ b/.github/skills/internal-azure-organization-structure/SKILL.md @@ -78,10 +78,21 @@ For broader asks, return: - `internal-azure-operations` Use when the structure is accepted and the next need is validation, monitoring, backup, or operational evidence. -## Anti-patterns - -- proposing hierarchy or subscription layouts without a rollout scope -- mixing landing-zone placement and RBAC design into one vague answer -- treating network topology as an operations concern instead of a structure concern -- using structure answers to sneak in Policy or RBAC design without separating the concerns -- ignoring region or residency implications when they materially shape layout +## Common mistakes + +| Mistake | Why it matters | Instead | +| --- | --- | --- | +| Proposing hierarchy or subscription layouts without a rollout scope | Tenant and hierarchy changes are hard to unwind if staged poorly | Name the smallest safe rollout unit: management group, subscription set, or region set | +| Mixing landing-zone placement and RBAC design into one vague answer | Structure and governance review get blurred together | Keep placement here and move authorization or guardrails to `internal-azure-governance` | +| Treating network topology as an operations concern instead of a structure concern | Connectivity design decisions get delayed until after layout is fixed | Keep hub-spoke, Virtual WAN, and regional topology in the structure lane | +| Using structure answers to sneak in Policy or RBAC design without separating the concerns | The ownership boundary becomes unreliable | State where the capability lives and hand off what controls or permissions apply | +| Ignoring region or residency implications when they materially shape layout | Subscription or landing-zone placement can violate real requirements | Make sovereignty, region pairing, and continuity assumptions explicit | +| Recommending platform subscriptions without naming their operating purpose | Platform estates become catch-all containers with unclear ownership | State whether the subscription is for connectivity, identity, management, or shared services | + +## Validation + +- Confirm the placement model is explicit: management group, subscription family, landing zone, or platform topology. +- Confirm the smallest safe rollout unit is named and matches the proposed structural change. +- Confirm region or residency implications are explicit when they shape hierarchy, subscription, or topology choices. +- Confirm platform ownership and workload ownership are separated when both appear in the recommendation. +- Confirm the next handoff is clear when the user now needs governance controls or operational proof. diff --git a/.github/skills/internal-azure-organization-structure/references/topology-map.md b/.github/skills/internal-azure-organization-structure/references/topology-map.md index 37f044b..6e4afb8 100644 --- a/.github/skills/internal-azure-organization-structure/references/topology-map.md +++ b/.github/skills/internal-azure-organization-structure/references/topology-map.md @@ -33,3 +33,30 @@ Use this reference when turning a structural Azure question into the right contr - Management groups shape policy and RBAC inheritance scope, but they do not replace subscription-level ownership. - Landing-zone design should keep platform topology separate from workload-by-workload implementation detail. - Regional placement is a structural concern when it changes connectivity, sovereignty, or continuity assumptions. + +## Starter management-group and subscription patterns + +| Pattern | When it fits | Watch for | +| --- | --- | --- | +| Platform MG plus workload MG split | A central platform team needs clear separation from application ownership | Do not hide production versus non-production risk boundaries inside one flat workload group | +| Environment-first subscription families | Teams share controls and rollout cadence by environment | Keep environment naming aligned to operating reality, not only to billing labels | +| Regulated or sovereign segment | A subset of workloads needs distinct residency, approval, or connectivity posture | Keep the segment requirement explicit so it does not become a vague exception bucket | +| Shared connectivity and management subscriptions | Platform services need stable ownership outside workload subscriptions | Keep service purpose and landing-zone expectations explicit | + +## Landing-zone placement heuristics + +| Question | Prefer | Reason | +| --- | --- | --- | +| Does the capability provide shared connectivity or central policy plumbing? | Platform landing zone or dedicated platform subscription | Keeps shared controls separate from workload delivery | +| Does the capability exist only for one workload or product boundary? | Workload landing zone or workload subscription | Avoids centralizing application-specific ownership | +| Does residency or regulated access change the operating model? | Dedicated hierarchy or landing-zone segment | Prevents mixing incompatible policy and connectivity assumptions | +| Does the change affect many subscriptions at once? | Management-group level placement with staged rollout | Keeps inheritance and blast radius visible | + +## Safe rollout-unit examples + +| Structural change | Start with | Widen after | +| --- | --- | --- | +| New management-group branch | One low-risk subscription family | Inheritance, policy scope, and operational ownership are confirmed | +| Landing-zone baseline update | One landing zone or one environment slice | Connectivity, automation, and rollback behavior are observed | +| Platform subscription introduction | One shared capability with named consumers | Ownership, routing, and dependency impact are validated | +| Region or residency split | One workload set with explicit fallback | Connectivity, sovereignty controls, and continuity assumptions are proven | diff --git a/.github/skills/internal-gcp-governance/SKILL.md b/.github/skills/internal-gcp-governance/SKILL.md index a4a9379..d6f1179 100644 --- a/.github/skills/internal-gcp-governance/SKILL.md +++ b/.github/skills/internal-gcp-governance/SKILL.md @@ -73,10 +73,21 @@ For broader asks, return: - `internal-gcp-operations` Use when the next need is preflight, reporting, validation, inventory, or operational evidence after the governance design is chosen. -## Anti-patterns - -- treating Org Policy as if it grants access -- answering a governance question without naming scope -- mixing org-wide guardrails and project-level authorization into one vague recommendation -- proposing service account or emergency access without boundaries or audit expectations -- recommending rollout without staged validation when the blast radius is high +## Common mistakes + +| Mistake | Why it matters | Instead | +| --- | --- | --- | +| Treating Org Policy as if it grants access | Preventive governance gets confused with authorization | Pair Org Policy guidance with the IAM path that actually grants access | +| Answering a governance question without naming scope | Org, folder, and project controls behave differently | State the exact governance scope before recommending a mechanism | +| Mixing org-wide guardrails and project-level authorization into one vague recommendation | Reviewers cannot see what prevents versus what grants | Separate Org Policy or inheritance posture from IAM bindings and workload identity design | +| Proposing service account or emergency access without boundaries or audit expectations | Privilege becomes durable and hard to review | Define who can use it, how it is bounded, and what evidence must exist | +| Recommending rollout without staged validation when the blast radius is high | A wide deny or identity failure can interrupt platform operations | Use targeted rollout, explicit rollback, and verification before widening scope | +| Treating workload identity federation as a reason to skip service-account boundary design | Keys may disappear but privilege can still stay too broad | Keep identity mechanism and authorization scope as separate decisions | + +## Validation + +- Confirm the governance scope is explicit: org, folder set, or project set. +- Confirm the recommended mechanism is clear about whether it prevents, grants, or constrains workload identity. +- Confirm service-account and human-access boundaries are explicit when access crosses project or folder boundaries. +- Confirm staged rollout validation is named before high-blast-radius Org Policy or IAM changes. +- Confirm the answer says when operational proof should move to `internal-gcp-operations`. diff --git a/.github/skills/internal-gcp-governance/references/guardrail-map.md b/.github/skills/internal-gcp-governance/references/guardrail-map.md index 88aadb0..c0ebe9c 100644 --- a/.github/skills/internal-gcp-governance/references/guardrail-map.md +++ b/.github/skills/internal-gcp-governance/references/guardrail-map.md @@ -25,3 +25,28 @@ Use this reference when the user needs a clearer split between GCP governance su - Structure chooses placement. - Governance chooses permissions and guardrails. - Operations verifies that the chosen governance behaves as intended after rollout. + +## GCP control patterns + +| Scenario | Prefer | Why | +| --- | --- | --- | +| Restrict risky services, locations, or default behaviors across many projects | Org Policy at org or folder scope | Central preventive control with visible inheritance | +| Grant operators or workloads access to act in one project set | IAM binding model with explicit scope | Keeps authorization tied to real ownership boundaries | +| Remove external workload keys from delivery systems | Workload identity federation | Keeps trust externalization separate from resource authorization | +| Limit service-account sprawl and privilege creep | Service account boundary design plus scoped IAM | Keeps workload identities purpose-built and reviewable | + +## Service-account and federation examples + +| Need | Primary control | Review note | +| --- | --- | --- | +| External CI system deploys into GCP | Federation plus narrow project or folder IAM scope | Keep token trust and resource authorization as separate review points | +| Shared automation spans many projects | Purpose-built service account per automation boundary | Avoid one universal service account with broad permissions | +| Human operators need elevated access during incidents | Time-bounded emergency path with explicit approval and logging | Keep routine admin access separate from incident access | + +## Exception patterns with audit expectations + +| Exception type | Pattern | Audit expectation | +| --- | --- | --- | +| Org Policy exception for a small project set | Scoped exception with reason, owner, and review date | Record business reason, compensating controls, and revalidation date | +| Temporary fallback to service-account keys | Time-bounded exception with rotation and migration plan | Track affected workload, owner, and closure deadline | +| Broader IAM grant required during migration | Narrowly scoped temporary binding with rollback note | Record who approved it, where it applies, and when it expires | diff --git a/.github/skills/internal-gcp-operations/SKILL.md b/.github/skills/internal-gcp-operations/SKILL.md index d4adcaa..21e3020 100644 --- a/.github/skills/internal-gcp-operations/SKILL.md +++ b/.github/skills/internal-gcp-operations/SKILL.md @@ -74,10 +74,21 @@ For broader asks, return: - `internal-gcp-governance` Use when the operations question is actually about IAM, workload identity, Org Policy, or guardrail design rather than validation. -## Anti-patterns - -- treating monitoring as proof that restore or recovery works -- skipping preflight for high-blast-radius rollout -- reporting only control intent without operational evidence -- mixing validation advice with new governance design instead of keeping the boundary clear -- giving a DR answer without making the business criticality assumption visible +## Common mistakes + +| Mistake | Why it matters | Instead | +| --- | --- | --- | +| Treating monitoring as proof that restore or recovery works | Healthy metrics do not prove recovery viability | Keep monitoring evidence, backup proof, and restore proof as separate lines | +| Skipping preflight for high-blast-radius rollout | IAM, Org Policy, or Shared VPC regressions surface too late | Define rollout unit, preflight checks, rollback trigger, and owner before rollout | +| Reporting only control intent without operational evidence | The platform appears compliant without proof that it works | Record what Monitoring, Logging, asset inventory, or recovery exercises actually showed | +| Mixing validation advice with new governance design instead of keeping the boundary clear | The operations skill stops being a reliable validation owner | Keep new Org Policy or IAM design in `internal-gcp-governance` and validate it here | +| Giving a DR answer without making the business criticality assumption visible | Recovery guidance can be overbuilt or incomplete | State the assumed criticality, RTO, or RPO before recommending the evidence path | +| Treating one successful rollout wave as proof for all folders or projects | Wider inheritance or network paths can still fail differently | Validate the first safe unit and widen only after recording real evidence | + +## Validation + +- Confirm the answer distinguishes confirmed evidence from inferred evidence. +- Confirm preflight checks, rollback trigger, and rollout unit are explicit for risky changes. +- Confirm Monitoring, Logging, and inventory signals are named for the affected surface, not as a generic checklist. +- Confirm backup proof and restore proof are treated as separate validation paths when state exists. +- Confirm DR or continuity notes are included only when business criticality or recovery posture is actually in scope. diff --git a/.github/skills/internal-gcp-operations/references/validation-and-evidence.md b/.github/skills/internal-gcp-operations/references/validation-and-evidence.md index 41865a0..aed92cc 100644 --- a/.github/skills/internal-gcp-operations/references/validation-and-evidence.md +++ b/.github/skills/internal-gcp-operations/references/validation-and-evidence.md @@ -29,3 +29,27 @@ Use this reference when the base skill needs a deeper operational checklist. BC/DR stays optional here as well. Load it when the rollout affects continuity expectations, recovery posture, or business-critical platform capability. + +## Monitoring and inventory evidence patterns + +| Surface | Signals to check | What they confirm | +| --- | --- | --- | +| IAM or Org Policy rollout | Expected actions still succeed, denied actions are visible, audit records exist | The control still permits intended work and surfaces regressions | +| Shared VPC or topology change | Connectivity and logging still work for the scoped projects | The structure change did not silently break shared networking | +| Asset inventory and reporting | Inventory still shows the intended projects, identities, and control surfaces | The rollout did not create untracked drift | + +## Backup versus restore proof expectations + +| Need | Acceptable proof | Not enough on its own | +| --- | --- | --- | +| Backup posture exists | Protected resource inventory, policy attachment, recent backup success | A statement that backup is enabled | +| Restore is viable | Restore exercise, observed recovery time, integrity verification after recovery | Backup success without a restore test | +| DR posture is credible | Recovery workflow exercised for the scoped critical service or control plane | General health signals during normal operations | + +## Stage-aware rollout evidence + +| Rollout stage | Evidence to collect before widening | +| --- | --- | +| First folder or project set | Inheritance behaves as expected, monitoring is still present, rollback owner confirmed | +| First Shared VPC or central-service slice | Connectivity, audit logs, and ownership paths still behave as intended | +| Broad project or region expansion | Prior wave observations recorded, regressions investigated, escalation path confirmed | diff --git a/.github/skills/internal-gcp-organization-structure/SKILL.md b/.github/skills/internal-gcp-organization-structure/SKILL.md index 0f33602..09da838 100644 --- a/.github/skills/internal-gcp-organization-structure/SKILL.md +++ b/.github/skills/internal-gcp-organization-structure/SKILL.md @@ -79,10 +79,21 @@ For broader asks, return: - `internal-gcp-operations` Use when the structure is accepted and the next need is validation, monitoring, backup, inventory, or operational evidence. -## Anti-patterns - -- proposing org or project layouts without a rollout scope -- mixing Shared VPC placement and IAM design into one vague answer -- treating billing layout as an afterthought when it changes ownership or blast radius -- using structure answers to sneak in Org Policy or IAM design without separating the concerns -- ignoring region or residency implications when they materially shape layout +## Common mistakes + +| Mistake | Why it matters | Instead | +| --- | --- | --- | +| Proposing org or project layouts without a rollout scope | Structural changes are hard to unwind if staged poorly | Name the smallest safe rollout unit: folder, project set, or region set | +| Mixing Shared VPC placement and IAM design into one vague answer | Structure and governance review get blurred together | Keep host and service project placement here and move access design to `internal-gcp-governance` | +| Treating billing layout as an afterthought when it changes ownership or blast radius | Finance and platform decisions drift together and become hard to review | Make billing ownership explicit when it differs from project or folder ownership | +| Using structure answers to sneak in Org Policy or IAM design without separating the concerns | The lane boundary becomes unreliable | State where the capability lives and hand off what controls or permissions apply | +| Ignoring region or residency implications when they materially shape layout | Project or Shared VPC placement can violate real requirements | Make sovereignty, region choice, and continuity assumptions explicit | +| Recommending central host projects without naming who operates them | Shared infrastructure becomes a vague platform bucket | State the host-project owner and which service projects depend on it | + +## Validation + +- Confirm the placement model is explicit: org branch, folder set, project family, Shared VPC host, or billing boundary. +- Confirm the smallest safe rollout unit is named and matches the proposed structural change. +- Confirm region or residency implications are explicit when they shape folder, project, or network placement. +- Confirm billing ownership and platform ownership are separated when both appear in the recommendation. +- Confirm the next handoff is clear when the user now needs governance controls or operational proof. diff --git a/.github/skills/internal-gcp-organization-structure/references/topology-map.md b/.github/skills/internal-gcp-organization-structure/references/topology-map.md index 6ed59a5..28b9ec7 100644 --- a/.github/skills/internal-gcp-organization-structure/references/topology-map.md +++ b/.github/skills/internal-gcp-organization-structure/references/topology-map.md @@ -33,3 +33,38 @@ Use this reference when turning a structural GCP question into the right control - Org and folder hierarchy shape policy inheritance scope, but they do not replace project-level ownership. - Shared VPC is a structure decision first; IAM and firewall posture come later in governance. - Regional placement is a structural concern when it changes connectivity, sovereignty, or continuity assumptions. + +## Starter folder and project patterns + +| Pattern | When it fits | Watch for | +| --- | --- | --- | +| Platform folder plus workload folders | A central platform team needs clear separation from product teams | Do not hide environment or residency differences inside one flat workload folder | +| Environment-first project families | Teams share controls and rollout cadence by environment | Keep the project purpose explicit so billing and operations remain clear | +| Regulated workload segment | A subset of projects needs stronger residency or approval posture | Keep the regulated segment justified by requirements, not vague exception status | +| Shared services projects separate from application projects | Platform services need stable ownership outside product delivery | Name which shared capabilities belong there and which do not | + +## Shared VPC placement heuristics + +| Question | Prefer | Reason | +| --- | --- | --- | +| Does the network serve many workloads across one operating boundary? | Central host project with named service projects | Keeps network ownership clear while preserving workload project ownership | +| Does the workload need isolated network administration for regulation or autonomy? | Dedicated host project or separate topology segment | Avoids forcing incompatible controls into one network boundary | +| Does the change affect many projects at once? | Folder-level rollout and one low-risk service-project set first | Keeps inheritance and blast radius visible | +| Does the host project also carry platform services? | Separate network and shared-service ownership unless clearly justified | Prevents one project from becoming the default home for everything | + +## Billing ownership examples + +| Situation | Useful pattern | Review note | +| --- | --- | --- | +| Central platform funds shared controls | Billing ownership separate from workload project ownership | Make chargeback or showback assumptions explicit | +| Business units own spend but share core network | Business-unit project ownership plus central Shared VPC service | Keep the dependency and support model explicit | +| Regulated workloads need isolated financial reporting | Dedicated billing boundary or reporting model | Do not let compliance segmentation drift from actual ownership | + +## Safe rollout-unit examples + +| Structural change | Start with | Widen after | +| --- | --- | --- | +| New folder branch | One low-risk project family | Inheritance, automation, and rollback behavior are confirmed | +| Shared VPC introduction | One host project with one low-risk service-project set | Connectivity, logging, and ownership paths are proven | +| Billing ownership realignment | One product or environment slice | Chargeback, approvals, and automation still behave as intended | +| Region or residency split | One workload set with explicit fallback | Connectivity and sovereignty assumptions are validated | diff --git a/.github/skills/internal-gcp-strategic/SKILL.md b/.github/skills/internal-gcp-strategic/SKILL.md index 0b8469f..13e2273 100644 --- a/.github/skills/internal-gcp-strategic/SKILL.md +++ b/.github/skills/internal-gcp-strategic/SKILL.md @@ -140,11 +140,21 @@ Include: - `internal-terraform`, `internal-script-python`, `internal-script-bash` Use when the decision is settled and implementation begins. -## Anti-patterns - -- forcing a full multi-lens analysis for a small question -- treating BC/DR as mandatory for every answer -- recommending a direction without current-source verification when freshness matters -- confusing decision support with implementation guidance -- expanding into tool selection when the user did not ask for it -- giving generic best-practice advice without context, tradeoff, or cost implication +## Common mistakes + +| Mistake | Why it matters | Instead | +| --- | --- | --- | +| Forcing a full multi-lens analysis for a small question | The answer becomes heavier than the decision requires | Start with the smallest useful lens set and widen only if risk or ambiguity justifies it | +| Treating BC/DR as mandatory for every answer | Continuity language crowds out the actual GCP tradeoff | Activate BC/DR only when regional continuity or recovery posture changes the recommendation | +| Recommending a direction without current-source verification when freshness matters | Product limits, Org Policy behavior, or regional support may have changed | Call out the freshness dependency and say which Google Cloud facts still need current verification | +| Confusing decision support with implementation guidance | The user loses the strategic framing they asked for | Keep the answer at decision level and hand off only after the direction is chosen | +| Expanding into tool or automation selection when the user did not ask for it | The response drifts from platform tradeoffs into delivery detail | Keep the recommendation centered on the GCP choice, not the tooling | +| Giving generic best-practice advice without context, tradeoff, or cost implication | Generic advice is hard to act on and easy to misapply | Tie the recommendation to assumptions, viable options, and cost-value consequences | + +## Validation + +- Confirm the decision statement is explicit and narrow enough that the next owner is obvious. +- Confirm assumptions, active lenses, and the main tradeoff are named instead of implied. +- Confirm the recommendation includes reversibility or blast-radius guidance when the choice is hard to unwind. +- Confirm cost-value or operational impact is called out when it materially changes the recommendation. +- Confirm the answer states when freshness matters and which current Google Cloud fact still needs validation. diff --git a/.github/skills/internal-gcp-strategic/references/lens-playbook.md b/.github/skills/internal-gcp-strategic/references/lens-playbook.md index 6099467..3c38b73 100644 --- a/.github/skills/internal-gcp-strategic/references/lens-playbook.md +++ b/.github/skills/internal-gcp-strategic/references/lens-playbook.md @@ -25,3 +25,49 @@ Use this reference when the user wants more depth than the base skill should loa - Stay in `Quick answer` mode when one option is clearly better and the user asked a narrow question. - Upgrade to `Decision note` when at least two viable options exist. - Upgrade to `Deep analysis` only when the user asks for it or the risk profile justifies it. + +## Worked GCP decision shapes + +### Org or Shared VPC choice + +| Situation | Frame first | Recommendation shape | +| --- | --- | --- | +| New platform needs clear enterprise segmentation | `organization-structure`, `governance` | Compare a simpler folder and project model against a more segmented operating model, then name the smallest safe rollout unit | +| Shared VPC ownership is undecided | `organization-structure`, `operations` | Compare central network ownership against product-aligned host projects and call out the operational burden tradeoff | +| Existing flat project estate needs stronger control boundaries | `organization-structure`, `blast radius` | Focus on migration sequencing, billing ownership, and how Org Policy inheritance will land safely | + +### Identity or workload-access choice + +| Situation | Frame first | Recommendation shape | +| --- | --- | --- | +| External delivery system needs access into GCP | `identity and access`, `governance` | Compare federation against key-based fallbacks and make the trust and audit boundary explicit | +| Service accounts have grown too broad | `governance`, `blast radius` | Compare shared service accounts against narrower purpose-built identities and name the rollback impact | +| Human and workload access patterns are being mixed | `identity and access`, `compliance` | Separate operator access from workload identity before recommending the control stack | + +### Cost-sensitive platform choice + +| Situation | Frame first | Recommendation shape | +| --- | --- | --- | +| Shared platform services could be centralized or duplicated per environment | `FinOps`, `operations` | Compare central efficiency against environment isolation and operational overhead | +| Multi-region posture is under consideration mainly for resilience | `FinOps`, `BC/DR` | Make the continuity target explicit before recommending the extra cost or complexity | +| Project-per-service versus larger shared projects is under review | `FinOps`, `maintainability` | Compare management overhead, isolation value, and chargeback clarity | + +## Decision note pattern + +Use this when the question is too consequential for a quick answer but does not need a full deep analysis. + +1. Decision statement: what GCP choice is being made. +2. Assumptions: what current state, scale, or compliance constraints matter. +3. Viable options: usually two or three realistic GCP-local paths. +4. Recommendation: which option wins and why. +5. Tradeoffs and blast radius: what gets better, what gets harder, and what is difficult to reverse. +6. Validation note: which current fact, proof, or handoff is still required. + +## When to stay quick answer versus upgrade to a decision note + +| Stay in `Quick answer` when | Upgrade to `Decision note` when | +| --- | --- | +| One option is clearly better and the downside is local | At least two GCP-local options are still viable | +| The choice does not alter org, trust, or recovery posture | The choice changes folders, projects, Shared VPC layout, or continuity expectations | +| The answer can stay within one lens without hiding material risk | A second lens changes the recommendation or the risk statement | +| Freshness is not the deciding factor | Current Google Cloud behavior, support boundaries, or limits could change the outcome | diff --git a/.github/skills/internal-github-governance/SKILL.md b/.github/skills/internal-github-governance/SKILL.md index b226f70..fbe9eef 100644 --- a/.github/skills/internal-github-governance/SKILL.md +++ b/.github/skills/internal-github-governance/SKILL.md @@ -71,10 +71,21 @@ For broader asks, return: - `internal-github-operations` Use when the next need is preflight, audit evidence, drift checks, or validation after the governance design is chosen. -## Anti-patterns - -- treating permissions as if they automatically imply the right branch or environment guardrails -- answering a governance question without naming scope -- mixing org-wide guardrails and repository-level exceptions into one vague recommendation -- proposing OIDC, Apps, or secret posture without naming trust boundaries -- recommending rollout without staged validation when the blast radius is high +## Common mistakes + +| Mistake | Why it matters | Instead | +| --- | --- | --- | +| Treating permissions as if they automatically imply the right branch or environment guardrails | Authorization and workflow controls get conflated | Pair permissions with the rulesets, environments, or approvals that actually constrain risky actions | +| Answering a governance question without naming scope | Enterprise, organization, repository, and environment scopes behave differently | State the exact scope before recommending a mechanism | +| Mixing org-wide guardrails and repository-level exceptions into one vague recommendation | Reviewers cannot see what is standard versus exceptional | Separate the baseline control from the exception path and its owner | +| Proposing OIDC, Apps, or secret posture without naming trust boundaries | Automation trust becomes broad and hard to audit | Name what actor gets access, what boundary limits it, and what evidence must exist | +| Recommending rollout without staged validation when the blast radius is high | Ruleset, permission, or environment errors can block delivery quickly | Use scoped rollout, validation, and explicit rollback triggers before widening | +| Treating Copilot governance as identical to general repository permissions | Policy, licensing, and visibility needs may differ | Keep Copilot entitlement and governance posture explicit when they diverge from repo permissions | + +## Validation + +- Confirm the governance scope is explicit: enterprise, organization, repository set, or environment. +- Confirm the recommended mechanism is clear about whether it prevents, grants, or constrains automation. +- Confirm trust boundaries are explicit for Apps, Actions, OIDC, secrets, or Copilot policy choices. +- Confirm staged rollout validation is named before high-blast-radius ruleset, permission, or environment changes. +- Confirm the answer says when operational proof should move to `internal-github-operations`. diff --git a/.github/skills/internal-github-governance/references/guardrail-map.md b/.github/skills/internal-github-governance/references/guardrail-map.md index b1fae03..1723f74 100644 --- a/.github/skills/internal-github-governance/references/guardrail-map.md +++ b/.github/skills/internal-github-governance/references/guardrail-map.md @@ -25,3 +25,28 @@ Use this reference when the user needs a clearer split between GitHub governance - Strategic absorbs light enterprise, org, and repo-shape decisions. - Governance chooses permissions and guardrails. - Operations verifies that the chosen governance behaves as intended after rollout. + +## GitHub control patterns + +| Scenario | Prefer | Why | +| --- | --- | --- | +| Enforce merge, review, and branch standards broadly | Rulesets or branch protection at the appropriate scope | Central preventive guardrail with explicit inheritance | +| Limit what automations may do in repositories | GitHub App or scoped Actions permissions | Keeps automation trust and authorization reviewable | +| Protect production delivery steps | Environments plus approvals and secret boundaries | Separates release control from daily development activity | +| Reduce static cloud secrets in workflows | OIDC posture with cloud-side least privilege | Keeps federation separate from repository permissions | + +## Trust-boundary examples + +| Need | Primary control | Review note | +| --- | --- | --- | +| Repository automation needs repo-level write operations | GitHub App with narrow repository permissions | Keep installation scope and token privileges explicit | +| Workflow needs cloud access without long-lived credentials | OIDC trust plus environment or branch guardrails | Review both the GitHub trust boundary and the cloud-side role scope | +| Reusable workflow needs elevated deployment rights | Environment approval plus scoped workflow permissions | Avoid giving every workflow the same broad token surface | + +## Exception patterns with audit expectations + +| Exception type | Pattern | Audit expectation | +| --- | --- | --- | +| Ruleset exception for a subset of repositories | Scoped exception with owner, reason, and review date | Record why the exception exists, where it applies, and when it will be rechecked | +| Temporary environment-bypass or elevated automation access | Time-bounded exception with explicit approver and rollback note | Record who approved it, how long it lasts, and what activity occurred | +| Copilot governance carve-out for a pilot group | Narrow org or repo set with policy note and review point | Record the pilot scope, entitlement reason, and follow-up decision date | diff --git a/.github/skills/internal-github-operations/SKILL.md b/.github/skills/internal-github-operations/SKILL.md index 8e03c4d..59fb26f 100644 --- a/.github/skills/internal-github-operations/SKILL.md +++ b/.github/skills/internal-github-operations/SKILL.md @@ -70,10 +70,21 @@ For broader asks, return: - `internal-github-governance` Use when the operations question is actually about rulesets, permissions, OIDC, secret posture, or guardrail design rather than validation. -## Anti-patterns - -- treating workflow success as proof that permissions or guardrails are correct -- skipping preflight for high-blast-radius rollout -- reporting only intended policy without operational evidence -- mixing validation advice with new governance design instead of keeping the boundary clear -- giving a continuity answer without making the build or release criticality assumption visible +## Common mistakes + +| Mistake | Why it matters | Instead | +| --- | --- | --- | +| Treating workflow success as proof that permissions or guardrails are correct | A single green run can hide excessive privilege or missing failure paths | Check expected permission boundaries, audit trails, and negative cases as separate signals | +| Skipping preflight for high-blast-radius rollout | Ruleset, token, runner, or environment regressions surface too late | Define rollout unit, preflight checks, rollback trigger, and owner before rollout | +| Reporting only intended policy without operational evidence | Governance looks correct on paper without proof that delivery still works | Record what workflows, runners, and audit surfaces actually showed | +| Mixing validation advice with new governance design instead of keeping the boundary clear | The operations skill stops being a reliable validation owner | Keep new guardrail design in `internal-github-governance` and validate it here | +| Giving a continuity answer without making the build or release criticality assumption visible | Continuity guidance can be overbuilt or incomplete | State the assumed pipeline or release criticality before recommending the evidence path | +| Treating one successful rollout wave as proof for all repositories or environments | Wider repo sets or runner groups can still fail differently | Validate the first safe unit and widen only after recording real evidence | + +## Validation + +- Confirm the answer distinguishes confirmed evidence from inferred evidence. +- Confirm preflight checks, rollback trigger, and rollout unit are explicit for risky changes. +- Confirm workflow, runner, and audit signals are named for the affected surface, not as a generic checklist. +- Confirm permission proof and operational proof are treated as separate validation paths. +- Confirm continuity notes are included only when build, release, or repository criticality is actually in scope. diff --git a/.github/skills/internal-github-operations/references/validation-and-evidence.md b/.github/skills/internal-github-operations/references/validation-and-evidence.md index b151317..62965c4 100644 --- a/.github/skills/internal-github-operations/references/validation-and-evidence.md +++ b/.github/skills/internal-github-operations/references/validation-and-evidence.md @@ -28,3 +28,27 @@ Use this reference when the base skill needs a deeper operational checklist. Continuity stays optional here as well. Load it when the rollout affects build, release, or repository continuity expectations. + +## Runner health evidence patterns + +| Surface | Signals to check | What they confirm | +| --- | --- | --- | +| Hosted or self-hosted runner rollout | Queue time, runner availability, job execution success, failed startup or registration events | Runner capacity and health still match the operating assumptions | +| Environment or protected deployment path | Approval flow, environment access, deployment job outcome | Release controls still allow intended delivery | +| Repository or org workflow change | Workflow run outcome, token scope behavior, audit events | The rollout did not silently widen or break automation behavior | + +## Workflow-permission validation + +| Need | Acceptable proof | Not enough on its own | +| --- | --- | --- | +| Workflow still has the intended permissions | Successful expected action plus evidence that denied actions remain denied | A single green run without checking token scope | +| Environment controls still work | Approval, secret access, and deployment path behave as designed | Deployment success without confirming who could trigger it | +| Automation trust is still constrained | Audit records plus scoped actor behavior match the design | The absence of visible failures | + +## Audit and drift follow-up patterns + +| Rollout stage | Evidence to collect before widening | +| --- | --- | +| First repository or environment | Audit trail exists, permission behavior matches expectation, rollback owner confirmed | +| First runner group or automation boundary | Queue health, registration health, and failure handling still work | +| Broad organization rollout | Prior wave observations recorded, drift checks reviewed, regressions investigated | diff --git a/.github/skills/internal-github-strategic/SKILL.md b/.github/skills/internal-github-strategic/SKILL.md index babda0c..25e8e51 100644 --- a/.github/skills/internal-github-strategic/SKILL.md +++ b/.github/skills/internal-github-strategic/SKILL.md @@ -141,11 +141,21 @@ Include: - `internal-github-actions`, `internal-script-python`, `internal-script-bash` Use when the decision is settled and implementation begins. -## Anti-patterns - -- forcing a full multi-lens analysis for a small question -- treating BC/DR as mandatory for every answer -- recommending a direction without current-source verification when freshness matters -- confusing decision support with implementation guidance -- expanding into tool selection when the user did not ask for it -- forcing GitHub into a cloud-provider structure pattern when the boundary is weaker +## Common mistakes + +| Mistake | Why it matters | Instead | +| --- | --- | --- | +| Forcing a full multi-lens analysis for a small question | The answer gets heavy without improving the decision | Start with the smallest useful lens set and widen only if risk or ambiguity justifies it | +| Treating BC/DR as mandatory for every answer | Continuity concerns can crowd out the actual GitHub operating-model choice | Activate BC/DR only when build, release, or repository continuity materially changes the recommendation | +| Recommending a direction without current-source verification when freshness matters | Product boundaries, permission models, or licensing limits may have changed | Call out the freshness dependency and say which GitHub fact still needs current verification | +| Confusing decision support with implementation guidance | The user loses the strategic framing they asked for | Keep the answer at decision level and hand off only after the direction is chosen | +| Expanding into tool selection when the user did not ask for it | The response drifts from GitHub tradeoffs into execution detail | Keep the recommendation centered on the platform choice, not the delivery tooling | +| Forcing GitHub into a cloud-provider structure pattern when the boundary is weaker | The strategic lane gets distorted and a fake structure owner pressure appears | Keep light enterprise, org, and repo-shape decisions inside strategic unless a real boundary emerges | + +## Validation + +- Confirm the decision statement is explicit and narrow enough that the next owner is obvious. +- Confirm assumptions, active lenses, and the main tradeoff are named instead of implied. +- Confirm the recommendation includes reversibility or blast-radius guidance when the choice is hard to unwind. +- Confirm viable options are compared in GitHub-local terms such as repo model, Apps trust, runners, or Copilot posture. +- Confirm the answer states when freshness matters and which current GitHub fact still needs validation. diff --git a/.github/skills/internal-github-strategic/references/lens-playbook.md b/.github/skills/internal-github-strategic/references/lens-playbook.md index 0b85f38..d59951a 100644 --- a/.github/skills/internal-github-strategic/references/lens-playbook.md +++ b/.github/skills/internal-github-strategic/references/lens-playbook.md @@ -25,3 +25,49 @@ Use this reference when the user wants more depth than the base skill should loa - Stay in `Quick answer` mode when one option is clearly better and the user asked a narrow question. - Upgrade to `Decision note` when at least two viable options exist. - Upgrade to `Deep analysis` only when the user asks for it or the risk profile justifies it. + +## Worked GitHub decision shapes + +### Enterprise or repo-model choice + +| Situation | Frame first | Recommendation shape | +| --- | --- | --- | +| Team is choosing mono-repo versus multi-repo | `maintainability`, `governance` | Compare developer workflow, ruleset scope, and ownership blast radius before recommending a model | +| Organization layout or repository ownership is unclear | `governance`, `blast radius` | Compare centralized versus delegated ownership and name the smallest safe rollout unit | +| Enterprise-level GitHub rollout is still forming | `governance`, `FinOps` | Focus on entitlement, operating boundaries, and support model before implementation detail | + +### Apps, automation trust, or runner choice + +| Situation | Frame first | Recommendation shape | +| --- | --- | --- | +| Automation could run as GitHub App, Actions token, or external integration | `security`, `governance` | Compare trust boundary, permission surface, and operational burden, then say which path is easier to audit | +| Runner platform is undecided | `runner model`, `operations` | Compare managed convenience against fleet ownership and continuity expectations | +| OIDC posture is being considered mainly to remove secrets | `security`, `governance` | Keep trust design, scope, and rollout risk explicit before implementation details | + +### Copilot rollout or licensing choice + +| Situation | Frame first | Recommendation shape | +| --- | --- | --- | +| Copilot rollout is limited by budget or policy | `Copilot`, `FinOps` | Compare broad rollout against staged enablement and name the governance implications | +| A team wants to pilot Copilot features in a narrower scope | `Copilot`, `governance` | State how policy, visibility, and exception handling will be kept explicit | +| Enterprise wants a platform-wide enablement direction | `Copilot`, `compliance` | Keep entitlement decisions separate from repo-permission design | + +## Decision note pattern + +Use this when the question is too consequential for a quick answer but does not need a full deep analysis. + +1. Decision statement: what GitHub choice is being made. +2. Assumptions: what current org model, runner posture, compliance needs, or licensing limits matter. +3. Viable options: usually two or three realistic GitHub-local paths. +4. Recommendation: which option wins and why. +5. Tradeoffs and blast radius: what gets better, what gets harder, and what is difficult to reverse. +6. Validation note: which current fact, proof, or handoff is still required. + +## When to stay quick answer versus upgrade to a decision note + +| Stay in `Quick answer` when | Upgrade to `Decision note` when | +| --- | --- | +| One option is clearly better and the downside is local | At least two GitHub-local options are still viable | +| The choice does not alter trust, continuity, or repo ownership posture | The choice changes repo model, Apps trust, runner model, or Copilot posture | +| The answer can stay within one lens without hiding material risk | A second lens changes the recommendation or the risk statement | +| Freshness is not the deciding factor | Current GitHub behavior, permissions, or product limits could change the outcome | From 13fd1016642c710d570937c5dd40afbebe8ae436 Mon Sep 17 00:00:00 2001 From: Diego Mauricio Lagos Date: Sun, 12 Apr 2026 12:48:10 +0200 Subject: [PATCH 02/10] feat: enhance GitHub Actions documentation and templates for composite actions --- ...al-github-action-composite.instructions.md | 6 +- .../internal-github-actions.instructions.md | 9 ++- .../internal-github-action-composite/SKILL.md | 29 +++++++- .../references/action-readme-template.md | 40 ++++++++++ .../references/multi-step-template.md | 49 ++++++++++++ .../references/output-forwarding-pattern.md | 48 ++++++++++++ .../references/testing-pattern.md | 41 ++++++++++ .../references/versioning-strategy.md | 21 ++++++ .../skills/internal-github-actions/SKILL.md | 34 ++++++--- .../references/auth-snippets.md | 15 +++- .../references/caching-and-artifacts.md | 54 ++++++++++++++ .../references/reusable-workflow-template.md | 57 ++++++++++++++ .../references/reuse-decision-tree.md | 20 +++++ .../security-hardening-checklist.md | 26 +++++++ .../references/workflow-example.md | 74 +++++++++++++++++-- .../references/workflow-patterns-catalog.md | 58 +++++++++++++++ 16 files changed, 554 insertions(+), 27 deletions(-) create mode 100644 .github/skills/internal-github-action-composite/references/action-readme-template.md create mode 100644 .github/skills/internal-github-action-composite/references/multi-step-template.md create mode 100644 .github/skills/internal-github-action-composite/references/output-forwarding-pattern.md create mode 100644 .github/skills/internal-github-action-composite/references/testing-pattern.md create mode 100644 .github/skills/internal-github-action-composite/references/versioning-strategy.md create mode 100644 .github/skills/internal-github-actions/references/caching-and-artifacts.md create mode 100644 .github/skills/internal-github-actions/references/reusable-workflow-template.md create mode 100644 .github/skills/internal-github-actions/references/reuse-decision-tree.md create mode 100644 .github/skills/internal-github-actions/references/security-hardening-checklist.md create mode 100644 .github/skills/internal-github-actions/references/workflow-patterns-catalog.md diff --git a/.github/instructions/internal-github-action-composite.instructions.md b/.github/instructions/internal-github-action-composite.instructions.md index 4c9fff5..5fe6ac0 100644 --- a/.github/instructions/internal-github-action-composite.instructions.md +++ b/.github/instructions/internal-github-action-composite.instructions.md @@ -10,8 +10,12 @@ applyTo: "**/actions/**/action.y*ml" - Keep only composite-specific rules here. ## Composite-specific rules -- Define explicit `inputs` and validate required values early. +- Define explicit `inputs` and `outputs`, and keep published names stable for existing callers. +- Validate required values early and fail before the main logic runs. - Pass `${{ inputs.* }}` through `env:` before shell usage. +- Use `$GITHUB_OUTPUT` and `$GITHUB_ENV` for multi-step coordination instead of ad hoc temp files when shell steps need to share state. - Keep `shell: bash` explicit and start shell blocks with `set -euo pipefail`. - Move longer logic into dedicated scripts instead of large inline `run:` blocks. +- Document inputs, outputs, and a minimal usage example next to the action. +- Keep a lightweight happy-path validation path before release, such as a smoke workflow or fixture-based script. - Preserve backward-compatible input and output contracts when modifying an existing composite action. diff --git a/.github/instructions/internal-github-actions.instructions.md b/.github/instructions/internal-github-actions.instructions.md index 1e63385..ec885bd 100644 --- a/.github/instructions/internal-github-actions.instructions.md +++ b/.github/instructions/internal-github-actions.instructions.md @@ -14,6 +14,8 @@ applyTo: "**/workflows/**,**/actions/**/action.y*ml" - Start with `contents: read` and add write scopes only when the job requires them. - Avoid `pull_request_target` for untrusted code. - Pass secrets only through `secrets.*` or protected environments; never hardcode them in `env`. +- For production deployments, use protected `environment:` gates with required reviewers instead of relying on branch conditions alone. +- Treat self-hosted runners as trusted infrastructure and scope them to the repositories, runner groups, and network access they actually need. ## Family baseline - Use clear English step names and deterministic outputs. @@ -22,9 +24,10 @@ applyTo: "**/workflows/**,**/actions/**/action.y*ml" - Prefer reusable workflows (`workflow_call`) for repeated job orchestration inside one repository. - Prefer smaller jobs with explicit `needs` over monolithic workflows when phases are logically separate. - Use `if` conditions deliberately for branch, event, and environment-specific execution. -- Keep cache and artifact usage explicit, deterministic, and scoped to real reuse. -- Use self-hosted runners only for justified hardware, network, or cost reasons, and note the security and maintenance tradeoff. +- Keep cache keys deterministic from lockfiles, tool versions, or other stable inputs instead of timestamps or branch-only entropy. +- Set explicit artifact `retention-days` when artifacts bridge review, release, or deploy stages. +- Validate `workflow_dispatch` free-form inputs before shell, deploy, or infrastructure steps consume them. ## Use the skill for deeper guidance -- Load `.github/skills/internal-github-actions/SKILL.md` for workflow-vs-reusable-vs-composite decisions, reusable workflow patterns, and examples. +- Load `.github/skills/internal-github-actions/SKILL.md` for workflow-vs-reusable-vs-composite decisions, reusable workflow templates, cache and artifact patterns, and workflow hardening checklists. - Keep this instruction as the auto-loaded baseline; keep authoring depth and examples in the skill. diff --git a/.github/skills/internal-github-action-composite/SKILL.md b/.github/skills/internal-github-action-composite/SKILL.md index 91c5fb0..0532d21 100644 --- a/.github/skills/internal-github-action-composite/SKILL.md +++ b/.github/skills/internal-github-action-composite/SKILL.md @@ -9,6 +9,7 @@ description: Use when creating or modifying a reusable GitHub composite action u - Create a new reusable composite action under `.github/actions/`. - Modify an existing composite action and preserve compatibility. - Deepen a GitHub Actions task that has already been classified as composite-action authoring. +- Add or revise composite-action documentation, outputs, testing guidance, or release discipline. ## Relationship to the umbrella skill - `internal-github-actions` is the default entry point for GitHub Actions authoring. @@ -28,12 +29,19 @@ description: Use when creating or modifying a reusable GitHub composite action u - Pass expression inputs via `env:` instead of interpolating `${{ }}` directly in `run:`. - Keep `shell: bash` explicit on composite steps. - Start shell blocks with `set -euo pipefail`. +- Forward caller-visible values through action `outputs:` mapped from `$GITHUB_OUTPUT`. +- Use `$GITHUB_ENV` only for step-to-step state inside the action. - Extract long shell logic into a dedicated script early. - Preserve backward compatibility when modifying existing inputs or outputs. -## Minimal template +## Reference map -Load `references/minimal-template.md` when you need the starter `action.yml` shape. Keep the initial template small, validate required inputs early, and move longer logic into a script instead of growing a large inline `run:` block. +- Load [minimal template](references/minimal-template.md) for the smallest safe starter `action.yml`. +- Load [multi-step template](references/multi-step-template.md) when the action coordinates validation, shared state, and caller-visible outputs across several steps. +- Load [output forwarding pattern](references/output-forwarding-pattern.md) when a step result must become a documented action output. +- Load [testing pattern](references/testing-pattern.md) for smoke workflow, failure-path, and contract checks. +- Load [action README template](references/action-readme-template.md) to document inputs, outputs, side effects, and usage. +- Load [versioning strategy](references/versioning-strategy.md) when the action is published or compatibility-sensitive. ## Common mistakes @@ -44,6 +52,8 @@ Load `references/minimal-template.md` when you need the starter `action.yml` sha | Large inline `run:` blocks | They are hard to read, lint, and review | Extract the logic into a dedicated script early | | No input validation before the main logic | Failures surface later with weaker error messages | Validate required inputs in the first step and fail fast | | Forgetting `shell: bash` on composite steps | Runner defaults can differ and change behavior | Keep `shell: bash` explicit | +| Hiding caller-visible values in `$GITHUB_ENV` or temp files | Callers cannot read the value and the contract stays implicit | Forward the value through `$GITHUB_OUTPUT` and `outputs:` | +| Leaving outputs or usage undocumented | Consumers guess the contract and misuse the action | Keep a minimal README with inputs, outputs, side effects, and an example | | Breaking an existing input or output contract | Callers can fail without a clear migration path | Add new fields compatibly and preserve the old contract where possible | ## Cross-references @@ -51,9 +61,20 @@ Load `references/minimal-template.md` when you need the starter `action.yml` sha - **internal-github-actions** (`.github/skills/internal-github-actions/SKILL.md`): for the umbrella GitHub Actions lane and reuse-pattern selection. - **internal-script-bash** (`.github/skills/internal-script-bash/SKILL.md`): for extracted shell scripts inside the action. +## Validation checklist + +- [ ] Inputs and outputs are explicit. +- [ ] Required inputs are validated in the first step. +- [ ] Multi-step state uses `$GITHUB_OUTPUT` and `$GITHUB_ENV` intentionally. +- [ ] Longer logic lives in a dedicated script. +- [ ] The README example matches the real inputs and outputs. +- [ ] A happy-path smoke validation exists. +- [ ] Existing input and output names remain compatible, or the breaking change is versioned deliberately. + ## Validation -- Inputs are explicit and validated early. +- Inputs and outputs are explicit and validated early. - Shell code uses `set -euo pipefail`, quoted variables, and explicit `shell: bash`. -- Longer logic moves into a dedicated script instead of staying inline. +- A smoke workflow or equivalent fixture validates the happy path. +- Caller-visible outputs are forwarded through documented `outputs:` mappings. - Existing input and output contracts remain backward compatible. diff --git a/.github/skills/internal-github-action-composite/references/action-readme-template.md b/.github/skills/internal-github-action-composite/references/action-readme-template.md new file mode 100644 index 0000000..e0cc79a --- /dev/null +++ b/.github/skills/internal-github-action-composite/references/action-readme-template.md @@ -0,0 +1,40 @@ +# Action README Template + +Use this template when a composite action needs a minimal, repeatable consumer contract. + +````md +# + +One sentence on what the action does and what it does not do. + +## Inputs + +| Name | Required | Description | +| --- | --- | --- | +| `source-directory` | Yes | Directory to package. | + +## Outputs + +| Name | Description | +| --- | --- | +| `archive-path` | Archive path produced by the action. | + +## Side effects + +- Writes an archive under `$RUNNER_TEMP`. +- Requires `tar` on the runner. + +## Example + +```yaml +steps: + - uses: ./.github/actions/ + with: + source-directory: dist +``` + +## Compatibility notes + +- Existing input and output names remain stable within a major release. +- Document breaking changes and migration steps in release notes. +```` diff --git a/.github/skills/internal-github-action-composite/references/multi-step-template.md b/.github/skills/internal-github-action-composite/references/multi-step-template.md new file mode 100644 index 0000000..bc9d88e --- /dev/null +++ b/.github/skills/internal-github-action-composite/references/multi-step-template.md @@ -0,0 +1,49 @@ +# Multi-Step Composite Template + +Use this starter when a composite action must validate input, share state across steps, and expose a caller-visible output. + +```yaml +name: Package Directory +description: Validate input, create an archive, and expose the archive path +inputs: + source-directory: + description: Directory to package + required: true +outputs: + archive-path: + description: Archive path for later workflow steps + value: ${{ steps.bundle.outputs.archive-path }} +runs: + using: composite + steps: + - name: Validate source directory + shell: bash + env: + SOURCE_DIRECTORY: ${{ inputs.source-directory }} + run: | + set -euo pipefail + if [[ ! -d "$SOURCE_DIRECTORY" ]]; then + echo "source-directory does not exist: $SOURCE_DIRECTORY" >&2 + exit 1 + fi + echo "SOURCE_DIRECTORY=$SOURCE_DIRECTORY" >> "$GITHUB_ENV" + - name: Define archive path + id: bundle + shell: bash + run: | + set -euo pipefail + archive_path="$RUNNER_TEMP/source.tgz" + echo "archive-path=$archive_path" >> "$GITHUB_OUTPUT" + echo "ARCHIVE_PATH=$archive_path" >> "$GITHUB_ENV" + - name: Create archive + shell: bash + run: | + set -euo pipefail + tar -czf "$ARCHIVE_PATH" -C "$SOURCE_DIRECTORY" . +``` + +Keep the shape intentional: + +- use `$GITHUB_ENV` for intra-action state and `$GITHUB_OUTPUT` for caller-visible data +- give output-producing steps stable `id` values +- extract longer build or deploy logic into a script once the action becomes orchestration-heavy diff --git a/.github/skills/internal-github-action-composite/references/output-forwarding-pattern.md b/.github/skills/internal-github-action-composite/references/output-forwarding-pattern.md new file mode 100644 index 0000000..f651b3c --- /dev/null +++ b/.github/skills/internal-github-action-composite/references/output-forwarding-pattern.md @@ -0,0 +1,48 @@ +# Output Forwarding Pattern + +Use this pattern when a composite action computes a value in one step and needs to expose it to callers as an action output. + +```yaml +name: Normalize Target +description: Validate and forward a deployment target +inputs: + target: + description: Deployment target + required: true +outputs: + normalized-target: + description: Lowercase target value for caller workflows + value: ${{ steps.normalize.outputs.normalized-target }} +runs: + using: composite + steps: + - name: Validate input + shell: bash + env: + TARGET: ${{ inputs.target }} + run: | + set -euo pipefail + if [[ -z "$TARGET" ]]; then + echo "target is required" >&2 + exit 1 + fi + - name: Normalize target + id: normalize + shell: bash + env: + TARGET: ${{ inputs.target }} + run: | + set -euo pipefail + value="$(printf '%s' "$TARGET" | tr '[:upper:]' '[:lower:]')" + echo "normalized-target=$value" >> "$GITHUB_OUTPUT" + echo "NORMALIZED_TARGET=$value" >> "$GITHUB_ENV" +``` + +Keep the contract stable: + +- declare the public output in `outputs:` +- give the producing step an `id` +- write caller-visible key-value pairs to `$GITHUB_OUTPUT` +- use `$GITHUB_ENV` only for step-to-step state inside the action + +Do not forward values through temp files when `$GITHUB_OUTPUT` and `$GITHUB_ENV` already express the contract clearly. diff --git a/.github/skills/internal-github-action-composite/references/testing-pattern.md b/.github/skills/internal-github-action-composite/references/testing-pattern.md new file mode 100644 index 0000000..d7d88b2 --- /dev/null +++ b/.github/skills/internal-github-action-composite/references/testing-pattern.md @@ -0,0 +1,41 @@ +# Composite Action Testing Pattern + +Use a small layered strategy instead of assuming the action works because `action.yml` looks correct. + +## 1. Smoke workflow + +Run the action from a workflow that checks out the repository, calls the local action, and asserts one success-path output. + +```yaml +jobs: + smoke: + runs-on: ubuntu-latest + steps: + - name: Check out repository + # actions/checkout@v6.0.2 + # https://github.com/actions/checkout/releases/tag/v6.0.2 + uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd + - name: Run local action + id: action + uses: ./.github/actions/package-directory + with: + source-directory: tests/fixtures/package-directory + - name: Assert archive output + shell: bash + env: + ARCHIVE_PATH: ${{ steps.action.outputs.archive-path }} + run: | + set -euo pipefail + test -n "$ARCHIVE_PATH" + test -f "$ARCHIVE_PATH" +``` + +## 2. Failure-path check + +- Trigger at least one invalid input case and assert the action fails early with a clear error. + +## 3. Contract check + +- Keep README inputs and outputs in sync with `action.yml`. +- Re-run smoke checks when adding or renaming any input or output. +- Treat output name changes as compatibility changes, not as harmless refactors. diff --git a/.github/skills/internal-github-action-composite/references/versioning-strategy.md b/.github/skills/internal-github-action-composite/references/versioning-strategy.md new file mode 100644 index 0000000..5fda33e --- /dev/null +++ b/.github/skills/internal-github-action-composite/references/versioning-strategy.md @@ -0,0 +1,21 @@ +# Versioning Strategy + +Use versioning to communicate compatibility, not as a substitute for SHA pinning. + +## Rules + +- Treat the full commit SHA as the immutable security pin in caller workflows. +- Use release tags as human-readable labels and move major tags only within a compatibility line. +- Cut a new major when inputs, outputs, required tools, or externally visible behavior break callers. +- Keep deprecated inputs or outputs long enough for callers to migrate when practical. +- Document compatibility and migration notes in the action README and release notes. +- When publishing examples, pair the human tag with the pinned SHA in comments instead of telling callers to pin only the tag. + +## Compatibility questions before release + +- Can existing callers keep the same `with:` block? +- Do output names and meanings stay the same? +- Are failure messages or side effects materially different? +- Does the action now require extra permissions or tools on the runner? + +If the answer to any of these is no, treat the change as a versioning event rather than a silent cleanup. diff --git a/.github/skills/internal-github-actions/SKILL.md b/.github/skills/internal-github-actions/SKILL.md index 11283a9..591de6b 100644 --- a/.github/skills/internal-github-actions/SKILL.md +++ b/.github/skills/internal-github-actions/SKILL.md @@ -10,6 +10,7 @@ description: Use when authoring or revising GitHub Actions workflows, reusable w - Create or modify reusable workflows exposed through `workflow_call`. - Decide whether repeated step logic should stay inline, move to a reusable workflow, or move to a composite action. - Add CI/CD jobs such as build, test, lint, release, or deployment. +- Review whether a workflow should stay inline, move to `workflow_call`, or extract script-first logic before introducing another reusable layer. Use `internal-devops-core-principles` when the question is delivery strategy, release model, or operating-model health rather than workflow authoring. @@ -19,6 +20,18 @@ Use `internal-devops-core-principles` when the question is delivery strategy, re - Pin every third-party action to a full-length SHA with adjacent release comment. - Keep `permissions` least-privilege. - Keep step names and logs in English. +- Validate `workflow_dispatch` free-form inputs before shell or deploy steps consume them. + +## Reference map + +- Load [auth snippets](references/auth-snippets.md) for AWS, Azure, and GCP OIDC examples. +- Load [workflow example](references/workflow-example.md) for a compact validated deploy archetype. +- Load [reusable workflow template](references/reusable-workflow-template.md) when the repeated unit is one or more jobs. +- Load [workflow patterns catalog](references/workflow-patterns-catalog.md) for matrix, scheduled, environment-gated, and reusable-workflow shapes. +- Load [caching and artifacts](references/caching-and-artifacts.md) for deterministic cache keys and reviewed artifact handoffs. +- Load [reuse decision tree](references/reuse-decision-tree.md) when inline steps, scripts, reusable workflows, and composite actions all look plausible. +- Load [security hardening checklist](references/security-hardening-checklist.md) when the workflow touches deployment, secrets, self-hosted runners, or untrusted events. +- Load `.github/skills/internal-github-action-composite/SKILL.md` when the reusable unit becomes step-level or contract-sensitive. ## Choose the right reuse pattern @@ -27,23 +40,20 @@ Use `internal-devops-core-principles` when the question is delivery strategy, re | Simple pipeline in one repository | Standard workflow | | Repeated job orchestration inside one repository | Reusable workflow (`workflow_call`) | | Shared step logic across repositories or many workflows | Composite action | +| Mostly shell or language-specific commands with thin orchestration | Script called from the workflow | | Composite action authoring or contract changes | Load `internal-github-action-composite` | -## Auth and workflow examples - -- Load `references/auth-snippets.md` for AWS, Azure, and GCP OIDC snippets. -- Load `references/workflow-example.md` for the minimal workflow skeleton. - ## Common mistakes | Mistake | Why it matters | Instead | |---|---|---| | Using `permissions: write-all` or omitting permissions entirely | Grants the token maximum access and widens the blast radius of a compromised step | Declare only the permissions the job needs | | Pinning actions by tag (`@v4`) instead of SHA | Tags are mutable and can be retargeted upstream | Pin to a full-length commit SHA with a release comment | -| Using `secrets.GITHUB_TOKEN` for cross-repo operations | The token is scoped to the current repository only | Use a GitHub App or PAT with minimal scope | | Long-lived cloud credentials in secrets instead of OIDC | Static credentials can leak and do not expire automatically | Configure OIDC federation for AWS, Azure, or GCP | | Missing `environment` protection on production deploys | Anyone who can push to the branch can deploy to production | Add an environment with required reviewers | -| Running `terraform apply` without a plan artifact | Plan drift can occur between review and deployment | Save the plan in the PR job and load it in the deploy job | +| Letting `workflow_dispatch` inputs flow directly into shell or deploy steps | Free-form input becomes a control path without validation | Validate and normalize inputs in an early step or job | +| Using timestamp-driven or branch-only cache keys | Cache hits become noisy, stale, or misleading across runs | Key caches from lockfiles, tool versions, and other stable inputs | +| Uploading artifacts without explicit name or retention | Review and deploy handoffs become ambiguous and harder to clean up | Name artifacts deliberately and set `retention-days` | | Duplicating steps across workflows instead of reusable workflow or composite action | Maintenance burden grows with every copy | Extract to a reusable workflow in one repo or a composite action across repos | ## Cross-references @@ -56,11 +66,15 @@ Use `internal-devops-core-principles` when the question is delivery strategy, re - [ ] OIDC configured for cloud auth. - [ ] All third-party actions pinned by full SHA with release comments. - [ ] `permissions` minimized per job. -- [ ] Environment protection enabled for production. -- [ ] Validation steps included where the workflow changes infrastructure or releases. +- [ ] `workflow_dispatch` inputs validated before shell, deploy, or infrastructure steps use them. +- [ ] Cache keys are deterministic and artifacts set explicit `retention-days` when they bridge jobs. +- [ ] Environment protection and `concurrency` are enabled for production deploys. +- [ ] Self-hosted runner use is justified and scoped. +- [ ] The reuse choice has been checked against `references/reuse-decision-tree.md`. ## Validation - `actionlint` on workflow files, if available. - Verify there is no `permissions: write-all` and no missing permissions block where least privilege matters. -- Verify all `uses:` lines reference full SHAs instead of tags. +- Verify all third-party `uses:` lines reference full SHAs instead of tags. +- Re-check that every referenced local guide resolves before treating the skill update as complete. diff --git a/.github/skills/internal-github-actions/references/auth-snippets.md b/.github/skills/internal-github-actions/references/auth-snippets.md index 1563f65..f7441d1 100644 --- a/.github/skills/internal-github-actions/references/auth-snippets.md +++ b/.github/skills/internal-github-actions/references/auth-snippets.md @@ -7,7 +7,10 @@ permissions: contents: read steps: - - uses: aws-actions/configure-aws-credentials@ + - name: Configure AWS credentials + # aws-actions/configure-aws-credentials@v6.1.0 + # https://github.com/aws-actions/configure-aws-credentials/releases/tag/v6.1.0 + uses: aws-actions/configure-aws-credentials@ec61189d14ec14c8efccab744f656cffd0e33f37 with: role-to-assume: ${{ secrets.AWS_ROLE_ARN }} aws-region: eu-south-1 @@ -25,7 +28,10 @@ permissions: contents: read steps: - - uses: azure/login@ + - name: Sign in to Azure + # Azure/login@v3.0.0 + # https://github.com/Azure/login/releases/tag/v3.0.0 + uses: Azure/login@93381592711f247e165c389ebb30b596c84cdc48 with: client-id: ${{ secrets.AZURE_CLIENT_ID }} tenant-id: ${{ secrets.AZURE_TENANT_ID }} @@ -44,7 +50,10 @@ permissions: contents: read steps: - - uses: google-github-actions/auth@ + - name: Authenticate to Google Cloud + # google-github-actions/auth@v3.0.0 + # https://github.com/google-github-actions/auth/releases/tag/v3.0.0 + uses: google-github-actions/auth@7c6bc770dae815cd3e89ee6cdf493a5fab2cc093 with: workload_identity_provider: ${{ secrets.GCP_WIF_PROVIDER }} service_account: ${{ secrets.GCP_SERVICE_ACCOUNT }} diff --git a/.github/skills/internal-github-actions/references/caching-and-artifacts.md b/.github/skills/internal-github-actions/references/caching-and-artifacts.md new file mode 100644 index 0000000..2de4a6a --- /dev/null +++ b/.github/skills/internal-github-actions/references/caching-and-artifacts.md @@ -0,0 +1,54 @@ +# Caching and Artifacts + +Use cache for reproducible dependency reuse across runs. Use artifacts for explicit handoffs or human review between jobs. + +| Need | Prefer | Why | +| --- | --- | --- | +| Restore dependency state across runs | Cache | Keyed reuse is automatic and disposable | +| Pass reviewed output to another job or stage | Artifact | Named handoff with retention and download semantics | +| Keep logs or plans for human inspection | Artifact | Retention and download are part of the contract | + +## Deterministic cache example + +```yaml +- name: Cache npm data + # actions/cache@v5.0.4 + # https://github.com/actions/cache/releases/tag/v5.0.4 + uses: actions/cache@668228422ae6a00e4ad889ee87cd7109ec5666a7 + with: + path: ~/.npm + key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }} + restore-keys: | + npm-${{ runner.os }}- +``` + +Use stable inputs like lockfiles, tool versions, or build configuration. Do not use timestamps or raw `github.run_id` in cache keys. + +## Artifact handoff example + +```yaml +- name: Upload reviewed plan + # actions/upload-artifact@v7.0.1 + # https://github.com/actions/upload-artifact/releases/tag/v7.0.1 + uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a + with: + name: terraform-plan-${{ github.sha }} + path: plan.tfplan + retention-days: 7 + if-no-files-found: error + +- name: Download reviewed plan + # actions/download-artifact@v8.0.1 + # https://github.com/actions/download-artifact/releases/tag/v8.0.1 + uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c + with: + name: terraform-plan-${{ github.sha }} + path: ./artifacts +``` + +Keep artifact usage explicit: + +- name artifacts for the exact handoff they represent +- set `retention-days` deliberately instead of inheriting defaults +- upload only reviewed or reusable outputs, not hidden mutable state +- prefer job outputs or reusable workflows when the handoff is small and immediate diff --git a/.github/skills/internal-github-actions/references/reusable-workflow-template.md b/.github/skills/internal-github-actions/references/reusable-workflow-template.md new file mode 100644 index 0000000..e241b61 --- /dev/null +++ b/.github/skills/internal-github-actions/references/reusable-workflow-template.md @@ -0,0 +1,57 @@ +# Reusable Workflow Template + +Use this starter when multiple workflows in one repository need the same job orchestration through `workflow_call`. + +```yaml +name: reusable-validate + +on: + workflow_call: + inputs: + working-directory: + description: Directory to validate + required: false + type: string + default: . + outputs: + normalized-working-directory: + description: Sanitized directory forwarded to caller jobs + value: ${{ jobs.validate.outputs.working-directory }} + +permissions: + contents: read + +jobs: + validate: + runs-on: ubuntu-latest + timeout-minutes: 15 + outputs: + working-directory: ${{ steps.normalize.outputs.working-directory }} + steps: + - name: Check out repository + # actions/checkout@v6.0.2 + # https://github.com/actions/checkout/releases/tag/v6.0.2 + uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd + - name: Normalize caller input + id: normalize + env: + WORKING_DIRECTORY: ${{ inputs.working-directory }} + run: | + set -euo pipefail + if [[ ! -d "$WORKING_DIRECTORY" ]]; then + echo "working-directory does not exist: $WORKING_DIRECTORY" >&2 + exit 1 + fi + echo "working-directory=$WORKING_DIRECTORY" >> "$GITHUB_OUTPUT" + - name: Run validation + working-directory: ${{ steps.normalize.outputs.working-directory }} + run: make test +``` + +Use a reusable workflow when: + +- the unit of reuse is one or more jobs rather than a step sequence +- the callee needs its own `permissions`, `runs-on`, or `concurrency` +- the caller should pass typed inputs or consume workflow outputs + +Prefer a composite action instead when the reusable unit is step-level logic inside a job. diff --git a/.github/skills/internal-github-actions/references/reuse-decision-tree.md b/.github/skills/internal-github-actions/references/reuse-decision-tree.md new file mode 100644 index 0000000..e11cbd8 --- /dev/null +++ b/.github/skills/internal-github-actions/references/reuse-decision-tree.md @@ -0,0 +1,20 @@ +# Reuse Decision Tree + +Start with the smallest reusable unit that solves the repetition. + +1. Is the repeated logic only one or two adjacent steps in a single workflow? + Keep it inline. +2. Is the repeated logic mostly shell or language-specific commands that workflows call the same way? + Extract a repository script and keep the workflow thin. +3. Does the reusable unit need its own jobs, runners, permissions, or concurrency? + Use a reusable workflow. +4. Does the reusable unit stay inside one job and need to expose step outputs? + Use a composite action. +5. Does the candidate reuse span several repositories or carry a compatibility contract for many callers? + Prefer a composite action with documented inputs, outputs, and versioning. + +Sanity checks before extracting: + +- if the abstraction saves fewer than a couple of real call sites, keep it local +- if the reuse choice changes trust boundaries or secret flow, prefer the more explicit contract +- if reusable workflow and composite action both look plausible, default to the unit of reuse: jobs versus steps diff --git a/.github/skills/internal-github-actions/references/security-hardening-checklist.md b/.github/skills/internal-github-actions/references/security-hardening-checklist.md new file mode 100644 index 0000000..7fab48e --- /dev/null +++ b/.github/skills/internal-github-actions/references/security-hardening-checklist.md @@ -0,0 +1,26 @@ +# Workflow Security Hardening Checklist + +Use this checklist when a workflow touches deployment, secrets, self-hosted runners, or untrusted contribution paths. + +## Event and token boundary + +- [ ] Event choice avoids `pull_request_target` for untrusted code. +- [ ] `permissions` are declared explicitly and no job uses `write-all`. +- [ ] `id-token: write` appears only on jobs that actually use OIDC. +- [ ] `workflow_dispatch` inputs are validated before they influence shell or deployment logic. + +## Supply chain and execution + +- [ ] Third-party actions are pinned to full SHAs with adjacent release comments and URLs. +- [ ] `docker://` references and workflow container images are pinned by digest. +- [ ] Shell steps quote variables and avoid evaluating user-controlled input. +- [ ] Caches and artifacts do not smuggle unreviewed build output into privileged jobs. + +## Runner and secret posture + +- [ ] Self-hosted runners are restricted to trusted repositories or runner groups. +- [ ] Secrets are scoped to the narrowest job or protected environment that needs them. +- [ ] Forked or untrusted jobs do not gain access to deployment credentials. +- [ ] Production deploys use protected environments with reviewers. + +Route organization-wide rulesets, runner governance, and GitHub App policy questions to `.github/skills/internal-github-governance/SKILL.md`. diff --git a/.github/skills/internal-github-actions/references/workflow-example.md b/.github/skills/internal-github-actions/references/workflow-example.md index 41c1166..fc7d888 100644 --- a/.github/skills/internal-github-actions/references/workflow-example.md +++ b/.github/skills/internal-github-actions/references/workflow-example.md @@ -1,17 +1,79 @@ -# Minimal Workflow Example +# Workflow Archetype: Validated Manual Deploy + +Use this starter when a workflow needs a manual entrypoint, input validation, and protected deployment gates without turning into a one-size-fits-all catalog. ```yaml -name: CI -on: [pull_request] +name: deploy-service + +on: + workflow_dispatch: + inputs: + target: + description: Deployment target + required: true + type: choice + options: + - staging + - production permissions: contents: read - id-token: write + deployments: write + +concurrency: + group: deploy-${{ github.ref }}-${{ inputs.target }} + cancel-in-progress: false jobs: validate: runs-on: ubuntu-latest + timeout-minutes: 10 + outputs: + target: ${{ steps.target.outputs.value }} + steps: + - name: Check out repository + # actions/checkout@v6.0.2 + # https://github.com/actions/checkout/releases/tag/v6.0.2 + uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd + - name: Validate dispatch input + id: target + env: + TARGET: ${{ inputs.target }} + run: | + set -euo pipefail + case "$TARGET" in + staging|production) ;; + *) + echo "Unsupported target: $TARGET" >&2 + exit 1 + ;; + esac + echo "value=$TARGET" >> "$GITHUB_OUTPUT" + - name: Run pre-deploy checks + run: make test + + deploy: + needs: validate + if: ${{ needs.validate.outputs.target != '' }} + runs-on: ubuntu-latest + timeout-minutes: 20 + environment: + name: ${{ needs.validate.outputs.target }} steps: - - uses: actions/checkout@ - - run: terraform fmt -check -recursive + - name: Check out repository + # actions/checkout@v6.0.2 + # https://github.com/actions/checkout/releases/tag/v6.0.2 + uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd + - name: Deploy + env: + TARGET: ${{ needs.validate.outputs.target }} + run: | + set -euo pipefail + ./scripts/deploy.sh "$TARGET" ``` + +Adapt this archetype by: + +- adding `workflow_call` when the same deploy orchestration is reused inside the repository +- introducing artifacts with explicit `retention-days` only when a reviewed handoff is required +- moving repeated step logic into a composite action instead of expanding the deploy job inline diff --git a/.github/skills/internal-github-actions/references/workflow-patterns-catalog.md b/.github/skills/internal-github-actions/references/workflow-patterns-catalog.md new file mode 100644 index 0000000..72bb176 --- /dev/null +++ b/.github/skills/internal-github-actions/references/workflow-patterns-catalog.md @@ -0,0 +1,58 @@ +# Workflow Patterns Catalog + +Use this catalog when the baseline example is too small but a repository-specific one-off workflow would still repeat known shapes. + +## Matrix validation + +```yaml +strategy: + fail-fast: false + matrix: + os: [ubuntu-latest, macos-latest] + runtime: ['18', '20'] +``` + +- Use when the same checks must run across versions or platforms. +- Keep the steps identical across matrix entries and vary only the inputs. +- Add `max-parallel` when runners or external quotas are limited. + +## Reusable workflow delegation + +```yaml +jobs: + validate: + uses: ./.github/workflows/reusable-validate.yml + with: + working-directory: services/api +``` + +- Use when the reusable unit is one or more jobs. +- Keep caller inputs small and typed. +- Expose only the outputs that downstream jobs actually need. + +## Environment-gated manual deploy + +```yaml +concurrency: + group: deploy-${{ github.ref }}-${{ inputs.target }} + cancel-in-progress: false + +environment: + name: production +``` + +- Validate the target in an earlier job and deploy only from the normalized value. +- Keep deployment permissions and secrets scoped to the deploy job. +- Use protected environments instead of bespoke approval logic in shell steps. + +## Scheduled housekeeping + +```yaml +on: + schedule: + - cron: '17 3 * * 1' +``` + +- Make scheduled jobs idempotent and safe to rerun. +- Add explicit `timeout-minutes` and narrow `permissions`. +- Avoid destructive mutations unless the workflow revalidates state and emits clear logs or notifications. From 54683cdab798aba53326b384311f1a7c400fbe97 Mon Sep 17 00:00:00 2001 From: Diego Mauricio Lagos Date: Sun, 12 Apr 2026 13:09:25 +0200 Subject: [PATCH 03/10] refactor: update Copilot and GitHub Actions instructions for clarity and lean guidance --- .github/copilot-instructions.md | 3 +++ .github/instructions/internal-github-actions.instructions.md | 1 + LESSONS.md | 5 ++--- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 0224e25..11b4bb1 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -21,6 +21,7 @@ You are an expert software and platform engineer. Protect correctness, security, - `internal-sync-*` assets stay sync-specific and must not become second canonical homes for repository-wide policy. - When repository-wide defaults change, update `AGENTS.md` first, then refresh this file, then realign narrower governance assets that reference the change. +- When source-managed guidance from this repository is mirrored into consumer repositories, phrase source-side rules conditionally so they remain true in targets and do not imply that the target repository is the source of truth. - `.github/local-copilot-overrides.md` may override synced defaults from `AGENTS.md` or this file only when the exception makes the conflict, scope, reason, and required disclosure explicit. - If `.github/local-copilot-overrides.md` exists but declares no active overrides, keep following the synced baseline. - When following a local override instead of the synced baseline, say that a consumer-local exception is in effect and cite `.github/local-copilot-overrides.md`. @@ -59,7 +60,9 @@ You are an expert software and platform engineer. Protect correctness, security, - Keep business logic separated from I/O and infrastructure concerns. - Apply only the instruction files relevant to the files being changed. - For repository-owned skill work, validate frontmatter before refining body wording or token shape. +- For source-side repository-owned standards work that deepens parallel skill families, stage planning in `tmp/superpowers/`, make anti-scope explicit, and close parity gaps in existing `Common mistakes`, `Validation`, and current reference depth before adding optional new skills, validators, or shared assets. - Keep repository-owned skill `description:` lines trigger-first, and do not rewrite a working route during token optimization unless improving retrieval is the explicit goal. +- For provider-specific cloud skills, keep guidance provider-native and omit cross-cloud comparison or provider-selection content when provider choice is already upstream of skill activation. - Prefer `references/` over new `scripts/` for static checklists, lookup tables, and starter templates; add scripts only when the workflow is deterministic, repeated, and execution-heavy. - Keep Python tests under the repository-root `tests/` tree with mirrored source paths, and make Bash wrappers runnable with internal defaults plus optional overrides. - Run the applicable validation that actually exists for the files you changed. diff --git a/.github/instructions/internal-github-actions.instructions.md b/.github/instructions/internal-github-actions.instructions.md index ec885bd..8866962 100644 --- a/.github/instructions/internal-github-actions.instructions.md +++ b/.github/instructions/internal-github-actions.instructions.md @@ -30,4 +30,5 @@ applyTo: "**/workflows/**,**/actions/**/action.y*ml" ## Use the skill for deeper guidance - Load `.github/skills/internal-github-actions/SKILL.md` for workflow-vs-reusable-vs-composite decisions, reusable workflow templates, cache and artifact patterns, and workflow hardening checklists. +- Keep this instruction lean because it is part of the source-managed baseline mirrored into consumer repositories; add new always-on GitHub Actions depth only when downstream reuse justifies that sync cost, and keep advanced patterns, examples, and decision support in the skill references. - Keep this instruction as the auto-loaded baseline; keep authoring depth and examples in the skill. diff --git a/LESSONS.md b/LESSONS.md index 8e2f357..1d978d9 100644 --- a/LESSONS.md +++ b/LESSONS.md @@ -18,6 +18,5 @@ This file retains durable lessons discovered while completing tasks in this repo | Date | Lesson | Status | Intended canonical target | | --- | --- | --- | --- | -| 2026-04-12 | For repo-owned standards work that deepens parallel skill sets, split planning into staged working documents under `tmp/superpowers/`, make anti-scope explicit, and sequence in-place parity work (`Common mistakes`, `Validation`, existing reference depth) before optional new skills, validators, or shared assets. | Pending | .github/copilot-instructions.md | -| 2026-04-12 | For provider-specific cloud skills, keep guidance provider-native and omit cross-cloud comparison or cloud-selection content when provider choice is already upstream of skill activation. | Pending | .github/copilot-instructions.md | -| 2026-04-12 | In source-of-truth guidance repositories, judge new GitHub Actions depth against downstream reuse needs, but keep auto-loaded instructions lean and move advanced patterns into skill references. | Pending | .github/copilot-instructions.md | + +No pending lessons currently. From a30463da5cfd523942331c6638f27f67651114e6 Mon Sep 17 00:00:00 2001 From: Diego Mauricio Lagos Date: Sun, 12 Apr 2026 13:23:55 +0200 Subject: [PATCH 04/10] refactor: enhance .editorconfig for improved coding style consistency and clarity --- .editorconfig | 63 +++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 53 insertions(+), 10 deletions(-) diff --git a/.editorconfig b/.editorconfig index 88cb251..3458b55 100644 --- a/.editorconfig +++ b/.editorconfig @@ -1,30 +1,73 @@ -# EditorConfig is awesome: http://EditorConfig.org -# Uses editorconfig to maintain consistent coding styles +# EditorConfig is awesome: https://editorconfig.org/ +# Keep shared editing defaults predictable across source, tests, and synced repos. -# top-most EditorConfig file root = true -# Unix-style newlines with a newline ending every file [*] charset = utf-8 end_of_line = lf -indent_size = 2 indent_style = space +indent_size = 2 insert_final_newline = true -max_line_length = 80 trim_trailing_whitespace = true -[*.{tf,tfvars}] +[*.{py,pyi}] +indent_size = 4 +max_line_length = 88 + +[*.{sh,bash,zsh}] indent_size = 2 -indent_style = space +max_line_length = 100 + +[*.{tf,tfvars,hcl}] +indent_size = 2 +max_line_length = 100 + +[*.{yml,yaml}] +indent_size = 2 +max_line_length = 120 + +[*.{json,jsonc,toml}] +indent_size = 2 +max_line_length = 100 + +[*.{cfg,conf,ini}] +indent_size = 2 +max_line_length = 100 + +[.editorconfig] +indent_size = 2 +max_line_length = 100 [*.md] max_line_length = 0 trim_trailing_whitespace = false [Makefile] -tab_width = 2 indent_style = tab +indent_size = 4 +tab_width = 4 -[COMMIT_EDITMSG] +[*.mk] +indent_style = tab +indent_size = 4 +tab_width = 4 + +[.gitignore] +max_line_length = 0 + +[.gitattributes] +max_line_length = 0 + +[.env] max_line_length = 0 +trim_trailing_whitespace = false + +[.env.*] +max_line_length = 0 +trim_trailing_whitespace = false + +[COMMIT_EDITMSG] +insert_final_newline = false +max_line_length = 72 +trim_trailing_whitespace = false From 1aa610dc1fa679330933e4f85aefc7e8fa783a1c Mon Sep 17 00:00:00 2001 From: Diego Mauricio Lagos Date: Sun, 12 Apr 2026 13:24:00 +0200 Subject: [PATCH 05/10] refactor: update isort configuration to use black profile for Python files --- .pre-commit-config.yaml | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index c767b1b..fd482dc 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -71,15 +71,18 @@ repos: exclude: '^\.github/skills/' # Python + - repo: https://github.com/pycqa/isort + rev: 8.0.1 # isort 8.0.1 + hooks: + - id: isort + args: + - --profile + - black + files: ^(scripts|tests)/.*\.py$ + - repo: https://github.com/psf/black rev: 26.3.1 # black 26.3.1 hooks: - id: black language_version: python3 files: ^(scripts|tests)/.*\.py$ - - - repo: https://github.com/pycqa/isort - rev: 8.0.1 # isort 8.0.1 - hooks: - - id: isort - files: ^(scripts|tests)/.*\.py$ From 0bcd08591f34c8bcfae5f698b4393523e29b18ee Mon Sep 17 00:00:00 2001 From: Diego Mauricio Lagos Date: Sun, 12 Apr 2026 13:24:08 +0200 Subject: [PATCH 06/10] refactor: align repository hygiene files and enhance sync contract for shared configurations --- .github/CHANGELOG.md | 2 + ...-global-copilot-configs-into-repo.agent.md | 4 +- .github/scripts/lib/shared.py | 3 + .github/scripts/lib/syncing.py | 3 + .../SKILL.md | 4 +- .../references/sync-contract.md | 5 +- .../github/scripts/lib/test_fingerprinting.py | 12 +++- .../scripts/lib/test_internal_skills.py | 9 ++- tests/github/scripts/lib/test_shared.py | 18 +++-- tests/test_sync_and_token_risks.py | 71 +++++++++++++++++++ 10 files changed, 116 insertions(+), 15 deletions(-) diff --git a/.github/CHANGELOG.md b/.github/CHANGELOG.md index 3d581cd..7dc9784 100644 --- a/.github/CHANGELOG.md +++ b/.github/CHANGELOG.md @@ -7,6 +7,8 @@ Use this format for new updates: - Include file/path scope when useful. ## 2026-04-12 +- Aligned `.pre-commit-config.yaml` and expanded `.editorconfig` with file-type defaults for Python, shell, Terraform/HCL, YAML, JSON/TOML, Markdown, Make, and local config files so the repo and synced consumers get a practical editor baseline without the formatter ping-pong that left `pre-commit` failing with no visible git diff. +- Expanded the cross-repository sync baseline to include `.editorconfig`, `.pre-commit-config.yaml`, and `.github/workflows/terraform-pre-commit.yml`, then updated the sync agent/skill contract and sync planner tests to keep that scope explicit and narrow. - Renamed the workflow skill from `internal-cicd-workflow` to `internal-github-actions`, renamed `internal-github-composite-action` to `internal-github-action-composite`, and realigned the GitHub Actions instructions so the umbrella instruction is the family baseline while the composite instruction now carries only composite-specific delta guidance. - Added a retained-learning governance contract: root `AGENTS.md` now defines repository-root `LESSONS.md` as a non-canonical ledger for durable lessons learned during completed tasks, `.github/copilot-instructions.md` projects the same behavior into native Copilot flows, `INTERNAL_CONTRACT.md` source-governs the invariant, and the new `LESSONS.md` file records retained lessons with canonical-owner pointers. - Slimmed the retained-learning section in root `AGENTS.md` so the bridge keeps only strategic ownership and boundary language while `.github/copilot-instructions.md` remains the detailed operational projection. diff --git a/.github/agents/internal-sync-global-copilot-configs-into-repo.agent.md b/.github/agents/internal-sync-global-copilot-configs-into-repo.agent.md index 7e58f93..8f7fe84 100644 --- a/.github/agents/internal-sync-global-copilot-configs-into-repo.agent.md +++ b/.github/agents/internal-sync-global-copilot-configs-into-repo.agent.md @@ -1,6 +1,6 @@ --- name: internal-sync-global-copilot-configs-into-repo -description: Use this agent when aligning a consumer repository to the managed GitHub Copilot baseline from this standards repository. Keep the paired sync skill as the reusable sync-procedure owner, preserve target `local-*` extensions plus any `.github/local-copilot-overrides.md` layer, and keep root-guidance files aligned to their separate ownership layers. +description: Use this agent when aligning a consumer repository to the managed GitHub Copilot baseline from this standards repository, plus the explicitly shared repository-hygiene files declared by the paired sync skill. Keep the paired sync skill as the reusable sync-procedure owner, preserve target `local-*` extensions plus any `.github/local-copilot-overrides.md` layer, and keep root-guidance files aligned to their separate ownership layers. tools: ["read", "edit", "search", "execute", "web", "agent"] agents: [] --- @@ -42,7 +42,7 @@ Treat this agent plus `.github/skills/internal-agent-sync-global-copilot-configs - Start in `plan` by default. Move to `apply` only on explicit request and only when the plan is conflict-safe. - Keep target assumptions narrow and let the paired skill own the mirrored-scope and plan-file details. - Preserve target `local-*` assets plus any consumer-owned `.github/local-copilot-overrides.md` file, exclude repository-owned `internal-sync-*` resources from mirroring, and keep root-guidance files layered according to the paired skill contract. -- Sync GitHub Copilot assets only unless the user explicitly expands scope. +- Sync only the managed cross-repository baseline declared by the paired skill contract; do not expand beyond that scope unless the user explicitly asks for more. - Do not restate reusable sync procedure in this agent; when the contract drifts, update the paired skill first and then realign this agent. ## Routing diff --git a/.github/scripts/lib/shared.py b/.github/scripts/lib/shared.py index 6f37207..646a839 100644 --- a/.github/scripts/lib/shared.py +++ b/.github/scripts/lib/shared.py @@ -26,6 +26,8 @@ CONSUMER_SYNC_EXCLUDED_PREFIX = "internal-sync-" MANAGED_ROOT_FILES = ( "AGENTS.md", + ".editorconfig", + ".pre-commit-config.yaml", ".github/copilot-instructions.md", ".github/copilot-code-review-instructions.md", ".github/copilot-commit-message-instructions.md", @@ -33,6 +35,7 @@ ".github/DEPRECATION.md", ".github/repo-profiles.yml", ) +MANAGED_WORKFLOW_FILES = (".github/workflows/terraform-pre-commit.yml",) LOCAL_COPILOT_OVERRIDES_PATH = ".github/local-copilot-overrides.md" INVENTORY_PATH = ".github/INVENTORY.md" diff --git a/.github/scripts/lib/syncing.py b/.github/scripts/lib/syncing.py index 6038bc6..1d5f9d0 100644 --- a/.github/scripts/lib/syncing.py +++ b/.github/scripts/lib/syncing.py @@ -11,6 +11,7 @@ INVENTORY_PATH, LOCAL_COPILOT_OVERRIDES_PATH, MANAGED_ROOT_FILES, + MANAGED_WORKFLOW_FILES, SyncOperation, SyncPlan, action_sort_key, @@ -197,6 +198,7 @@ def build_sync_plan(source_root: Path, target_root: Path) -> SyncPlan: def discover_source_sync_files(root: Path) -> set[str]: files = {relative_path for relative_path in MANAGED_ROOT_FILES if (root / relative_path).exists()} + files.update(relative_path for relative_path in MANAGED_WORKFLOW_FILES if (root / relative_path).exists()) files.update(all_files_under(root, ".github/agents")) files.update(all_files_under(root, ".github/instructions")) files.update(all_files_under(root, MANAGED_SKILL_DIR)) @@ -209,6 +211,7 @@ def discover_source_sync_files(root: Path) -> set[str]: def discover_target_managed_files(root: Path) -> set[str]: files = {relative_path for relative_path in MANAGED_ROOT_FILES if (root / relative_path).exists()} + files.update(relative_path for relative_path in MANAGED_WORKFLOW_FILES if (root / relative_path).exists()) if (root / INVENTORY_PATH).exists(): files.add(INVENTORY_PATH) files.update(all_files_under(root, ".github/agents")) diff --git a/.github/skills/internal-agent-sync-global-copilot-configs-into-repo/SKILL.md b/.github/skills/internal-agent-sync-global-copilot-configs-into-repo/SKILL.md index 8cd770d..d407c16 100644 --- a/.github/skills/internal-agent-sync-global-copilot-configs-into-repo/SKILL.md +++ b/.github/skills/internal-agent-sync-global-copilot-configs-into-repo/SKILL.md @@ -1,6 +1,6 @@ --- name: internal-agent-sync-global-copilot-configs-into-repo -description: Use when aligning a consumer repository to this repository's managed GitHub Copilot baseline, including mirror planning, apply runs, drift checks, and preservation of target `local-*` assets plus any `.github/local-copilot-overrides.md` layer. +description: Use when aligning a consumer repository to this repository's managed GitHub Copilot baseline plus the explicitly shared repository-hygiene files, including mirror planning, apply runs, drift checks, and preservation of target `local-*` assets plus any `.github/local-copilot-overrides.md` layer. --- # Internal Agent Sync Global Copilot Configs Into Repo @@ -15,6 +15,7 @@ The paired agent should not restate default mode handling, preserved `local-*` b - Align a consumer repository with the managed GitHub Copilot baseline from this repository. - Refresh target `AGENTS.md`, `.github/copilot-instructions.md`, and `.github/INVENTORY.md` to the current bridge model after mirroring. +- Refresh shared repository-hygiene files that are part of the managed sync baseline, currently `.editorconfig`, `.pre-commit-config.yaml`, and `.github/workflows/terraform-pre-commit.yml`. - Preserve or review a target `.github/local-copilot-overrides.md` file that locally overrides the synced baseline. - Run or interpret `.github/scripts/sync_copilot_catalog.sh` or `.github/scripts/sync_copilot_catalog.py`. - Audit source-target drift before or after a sync. @@ -27,6 +28,7 @@ The paired agent should not restate default mode handling, preserved `local-*` b - Exclude source resources named `internal-sync-*` from consumer mirroring and remove any target copies of those resources during `apply`. - Do not mirror a source `.github/local-copilot-overrides.md`; it stays consumer-owned even when the source repository has one. - Keep root guidance layered: `AGENTS.md` is the bridge, `.github/copilot-instructions.md` is the repo-wide projection, `.github/local-copilot-overrides.md` is the consumer-local exception layer, and `.github/INVENTORY.md` is the live catalog. +- Mirror only the explicitly shared repository-hygiene files declared in `references/sync-contract.md`; do not widen workflow or root-file mirroring implicitly. - Ensure the target repository `.gitignore` contains an ignore rule for `tmp/superpowers/`. - Prefer the bundled sync automation when it matches the requested mode instead of re-deriving the workflow manually. - Keep detailed operating rules in `references/sync-contract.md` instead of re-expanding them in the agent body. diff --git a/.github/skills/internal-agent-sync-global-copilot-configs-into-repo/references/sync-contract.md b/.github/skills/internal-agent-sync-global-copilot-configs-into-repo/references/sync-contract.md index 99118d9..889189c 100644 --- a/.github/skills/internal-agent-sync-global-copilot-configs-into-repo/references/sync-contract.md +++ b/.github/skills/internal-agent-sync-global-copilot-configs-into-repo/references/sync-contract.md @@ -7,17 +7,20 @@ Use this reference when the paired agent or this skill needs the exact sync rule Mirror these source-managed paths into the consumer repository: - `AGENTS.md` +- `.editorconfig` +- `.pre-commit-config.yaml` - `.github/copilot-instructions.md` - `.github/copilot-code-review-instructions.md` - `.github/copilot-commit-message-instructions.md` - `.github/security-baseline.md` - `.github/DEPRECATION.md` - `.github/repo-profiles.yml` +- `.github/workflows/terraform-pre-commit.yml` - `.github/agents/**` - `.github/instructions/**` - `.github/skills/**`, including bundled `references/`, `assets/`, `scripts/`, `agents/`, and licenses -Do not sync `README.md`, changelogs, workflows, templates, or bootstrap helpers unless the user explicitly expands scope. +Do not sync `README.md`, changelogs, other workflows, templates, or bootstrap helpers unless the user explicitly expands scope. Do not sync consumer-facing resources whose file or directory name starts with `internal-sync-`; those remain source-only operational controls for the standards repository. ## Target Rules diff --git a/tests/github/scripts/lib/test_fingerprinting.py b/tests/github/scripts/lib/test_fingerprinting.py index 13b8f38..564c391 100644 --- a/tests/github/scripts/lib/test_fingerprinting.py +++ b/tests/github/scripts/lib/test_fingerprinting.py @@ -3,9 +3,15 @@ import json from pathlib import Path -from lib.fingerprinting import (build_fingerprint, build_manifest, - collect_files, diff_manifests, load_manifest, - normalize_content, render_diff_text) +from lib.fingerprinting import ( + build_fingerprint, + build_manifest, + collect_files, + diff_manifests, + load_manifest, + normalize_content, + render_diff_text, +) def write_file(path: Path, content: str) -> None: diff --git a/tests/github/scripts/lib/test_internal_skills.py b/tests/github/scripts/lib/test_internal_skills.py index 99a9ff9..82fe0a2 100644 --- a/tests/github/scripts/lib/test_internal_skills.py +++ b/tests/github/scripts/lib/test_internal_skills.py @@ -2,9 +2,12 @@ from pathlib import Path -from lib.internal_skills import (detect_internal_skill_findings, - markdown_targets, resolve_reference, - validate_internal_skill) +from lib.internal_skills import ( + detect_internal_skill_findings, + markdown_targets, + resolve_reference, + validate_internal_skill, +) def write_file(path: Path, content: str) -> None: diff --git a/tests/github/scripts/lib/test_shared.py b/tests/github/scripts/lib/test_shared.py index 0d031e3..bc1f139 100644 --- a/tests/github/scripts/lib/test_shared.py +++ b/tests/github/scripts/lib/test_shared.py @@ -3,11 +3,19 @@ from pathlib import Path import pytest -from lib.shared import (Finding, all_files_under, dedupe_preserve_order, - find_repo_root, finding_sort_key, - is_consumer_sync_excluded_path, is_local_asset, - path_list, resolve_markdown_target, split_frontmatter, - strip_frontmatter) +from lib.shared import ( + Finding, + all_files_under, + dedupe_preserve_order, + find_repo_root, + finding_sort_key, + is_consumer_sync_excluded_path, + is_local_asset, + path_list, + resolve_markdown_target, + split_frontmatter, + strip_frontmatter, +) def write_file(path: Path, content: str) -> None: diff --git a/tests/test_sync_and_token_risks.py b/tests/test_sync_and_token_risks.py index 9fabc95..4b77b92 100644 --- a/tests/test_sync_and_token_risks.py +++ b/tests/test_sync_and_token_risks.py @@ -175,6 +175,77 @@ def test_build_sync_plan_accepts_existing_tmp_superpowers_gitignore_entry( assert plan.generated_gitignore == "node_modules/\ntmp/superpowers/\n" +def test_build_sync_plan_includes_shared_repo_hygiene_files_only( + tmp_path: Path, +) -> None: + source_root = tmp_path / "source" + target_root = tmp_path / "target" + + write_file(source_root / "AGENTS.md", "# AGENTS\nsource\n") + write_file(source_root / ".github/copilot-instructions.md", "# Copilot\nsource\n") + write_file(source_root / ".editorconfig", "root = true\n") + write_file(source_root / ".pre-commit-config.yaml", "repos: []\n") + write_file( + source_root / ".github/workflows/terraform-pre-commit.yml", + "name: terraform-pre-commit\n", + ) + write_file(target_root / "AGENTS.md", "# AGENTS\ntarget\n") + write_file(target_root / ".github/copilot-instructions.md", "# Copilot\ntarget\n") + write_file(target_root / ".editorconfig", "root = false\n") + write_file(target_root / ".pre-commit-config.yaml", "repos:\n - repo: old\n") + write_file( + target_root / ".github/workflows/terraform-pre-commit.yml", + "name: old-terraform-pre-commit\n", + ) + write_file(target_root / ".github/workflows/local-only.yml", "name: local-only\n") + + plan = build_sync_plan(source_root, target_root) + actions = {(operation.action, operation.path) for operation in plan.operations} + planned_paths = {operation.path for operation in plan.operations} + + assert ("update", ".editorconfig") in actions + assert ("update", ".pre-commit-config.yaml") in actions + assert ("update", ".github/workflows/terraform-pre-commit.yml") in actions + assert ".github/workflows/local-only.yml" not in planned_paths + + +def test_apply_sync_plan_mirrors_shared_repo_hygiene_files( + tmp_path: Path, +) -> None: + source_root = tmp_path / "source" + target_root = tmp_path / "target" + + write_file(source_root / "AGENTS.md", "# AGENTS\nsource\n") + write_file(source_root / ".github/copilot-instructions.md", "# Copilot\nsource\n") + write_file(source_root / ".editorconfig", "root = true\n") + write_file(source_root / ".pre-commit-config.yaml", "repos: []\n") + write_file( + source_root / ".github/workflows/terraform-pre-commit.yml", + "name: terraform-pre-commit\n", + ) + write_file(target_root / "AGENTS.md", "# AGENTS\ntarget\n") + write_file(target_root / ".github/copilot-instructions.md", "# Copilot\ntarget\n") + write_file(target_root / ".editorconfig", "root = false\n") + write_file(target_root / ".pre-commit-config.yaml", "repos:\n - repo: old\n") + write_file( + target_root / ".github/workflows/terraform-pre-commit.yml", + "name: old-terraform-pre-commit\n", + ) + + plan = build_sync_plan(source_root, target_root) + apply_sync_plan(plan) + + assert (target_root / ".editorconfig").read_text( + encoding="utf-8" + ) == "root = true\n" + assert (target_root / ".pre-commit-config.yaml").read_text( + encoding="utf-8" + ) == "repos: []\n" + assert (target_root / ".github/workflows/terraform-pre-commit.yml").read_text( + encoding="utf-8" + ) == "name: terraform-pre-commit\n" + + def test_detect_token_risks_reports_bridge_overlap(tmp_path: Path) -> None: repeated_lines = "\n".join( [ From 5cfe9b64fcacf8c5ed6ffd8048f3217abd228bd0 Mon Sep 17 00:00:00 2001 From: Diego Mauricio Lagos Date: Sun, 12 Apr 2026 13:42:14 +0200 Subject: [PATCH 07/10] refactor: enhance pre-commit workflow with caching and directory setup --- .github/workflows/terraform-pre-commit.yml | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/.github/workflows/terraform-pre-commit.yml b/.github/workflows/terraform-pre-commit.yml index 662b12c..66255bb 100644 --- a/.github/workflows/terraform-pre-commit.yml +++ b/.github/workflows/terraform-pre-commit.yml @@ -37,6 +37,7 @@ jobs: timeout-minutes: 30 env: # Local-only validation: Terraform must not reach remote backends. + PRE_COMMIT_CACHE_DIR: ${{ runner.temp }}/pre-commit-cache TF_INPUT: "0" TF_IN_AUTOMATION: "1" defaults: @@ -49,6 +50,18 @@ jobs: with: persist-credentials: false + - name: Restore pre-commit cache + # actions/cache@v5.0.4 + # https://github.com/actions/cache/releases/tag/v5.0.4 + uses: actions/cache@668228422ae6a00e4ad889ee87cd7109ec5666a7 + with: + path: ${{ env.PRE_COMMIT_CACHE_DIR }} + key: pre-commit-${{ runner.os }}-v1-${{ hashFiles('.pre-commit-config.yaml') }} + + - name: Ensure pre-commit cache directory + run: | + mkdir -p "${PRE_COMMIT_CACHE_DIR}" + - name: Pull pre-commit container image run: | docker pull "${PRE_COMMIT_IMAGE}" @@ -61,8 +74,10 @@ jobs: run: | docker run --rm \ -e USERID="$(id -u):$(id -g)" \ + -e PRE_COMMIT_HOME=/pre-commit-cache \ -e TF_INPUT \ -e TF_IN_AUTOMATION \ + -v "${PRE_COMMIT_CACHE_DIR}:/pre-commit-cache" \ -v "${PWD}:/lint" \ -w /lint \ "${PRE_COMMIT_IMAGE}" run \ From 7f60da322417526ab65fa0d8b1bcb78d2c9b51e9 Mon Sep 17 00:00:00 2001 From: Diego Mauricio Lagos Date: Sun, 12 Apr 2026 14:05:07 +0200 Subject: [PATCH 08/10] refactor: standardize pre-commit cache directory usage across workflow --- .github/workflows/terraform-pre-commit.yml | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/.github/workflows/terraform-pre-commit.yml b/.github/workflows/terraform-pre-commit.yml index 66255bb..c4b0ef2 100644 --- a/.github/workflows/terraform-pre-commit.yml +++ b/.github/workflows/terraform-pre-commit.yml @@ -37,7 +37,6 @@ jobs: timeout-minutes: 30 env: # Local-only validation: Terraform must not reach remote backends. - PRE_COMMIT_CACHE_DIR: ${{ runner.temp }}/pre-commit-cache TF_INPUT: "0" TF_IN_AUTOMATION: "1" defaults: @@ -55,10 +54,12 @@ jobs: # https://github.com/actions/cache/releases/tag/v5.0.4 uses: actions/cache@668228422ae6a00e4ad889ee87cd7109ec5666a7 with: - path: ${{ env.PRE_COMMIT_CACHE_DIR }} + path: ${{ runner.temp }}/pre-commit-cache key: pre-commit-${{ runner.os }}-v1-${{ hashFiles('.pre-commit-config.yaml') }} - name: Ensure pre-commit cache directory + env: + PRE_COMMIT_CACHE_DIR: ${{ runner.temp }}/pre-commit-cache run: | mkdir -p "${PRE_COMMIT_CACHE_DIR}" @@ -71,6 +72,8 @@ jobs: docker run --rm --entrypoint cat "${PRE_COMMIT_IMAGE}" /usr/bin/tools_versions_info - name: Run pre-commit in container + env: + PRE_COMMIT_CACHE_DIR: ${{ runner.temp }}/pre-commit-cache run: | docker run --rm \ -e USERID="$(id -u):$(id -g)" \ From 8b1c7df0dec2cc65127246af8b77bf49f0c6420b Mon Sep 17 00:00:00 2001 From: Diego Mauricio Lagos Date: Sun, 12 Apr 2026 14:07:46 +0200 Subject: [PATCH 09/10] refactor: enhance documentation for GitHub Actions and Copilot instructions with emphasis on context and validation --- .github/copilot-instructions.md | 4 ++++ .../internal-github-actions.instructions.md | 3 +++ .../skills/internal-github-actions/SKILL.md | 4 ++++ .../references/caching-and-artifacts.md | 21 +++++++++++++++++++ AGENTS.md | 2 ++ 5 files changed, 34 insertions(+) diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 11b4bb1..fd4fae9 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -59,6 +59,7 @@ You are an expert software and platform engineer. Protect correctness, security, - Prefer the simplest correct change. - Keep business logic separated from I/O and infrastructure concerns. - Apply only the instruction files relevant to the files being changed. +- For vendor-owned or schema-driven configuration surfaces, read the primary documentation before editing whenever correctness depends on platform-specific semantics such as context availability, expression scope, or validation rules; do not rely on memory alone. - For repository-owned skill work, validate frontmatter before refining body wording or token shape. - For source-side repository-owned standards work that deepens parallel skill families, stage planning in `tmp/superpowers/`, make anti-scope explicit, and close parity gaps in existing `Common mistakes`, `Validation`, and current reference depth before adding optional new skills, validators, or shared assets. - Keep repository-owned skill `description:` lines trigger-first, and do not rewrite a working route during token optimization unless improving retrieval is the explicit goal. @@ -79,8 +80,11 @@ You are an expert software and platform engineer. Protect correctness, security, - Whenever work reveals a new durable lesson, regardless of whether the task is in planning, review, debugging, or implementation, check whether it was already codified in repository resources when discovered. - Also treat a repeated or consequential misapplication of an already-codified repository rule as a lesson when the correction is likely to prevent the same mistake in future work. +- When a validator, IDE, schema check, or runtime error overturns an earlier assumption, immediately re-check whether that correction is durable enough to retain or codify. +- Before finalizing such a correction, read the primary documentation for the relevant platform or schema instead of relying on memory or partial recall. - Before editing repository-root `LESSONS.md`, read its current on-disk contents and treat them as the source of truth for in-progress local lessons, including uncommitted rows already present on disk. - When a durable lesson is clear and still uncodified, append one concise, reusable row to the pending table in `LESSONS.md` instead of waiting for task completion; do not regenerate, reorder, or rewrite unrelated rows. +- If you decide not to record a lesson after such a correction, make that decision explicit in the completion report with a short reason. - Treat `LESSONS.md` as a learning ledger, not as canonical policy. Do not dump transient notes, full debugging timelines, sensitive content, or conversational noise into it. - Preserve unrelated existing lessons in `LESSONS.md`, including local uncommitted ones already on disk. - If a lesson is later disproven, narrowed, deduplicated, or codified elsewhere in the same task, update or remove only that lesson's row before completion. diff --git a/.github/instructions/internal-github-actions.instructions.md b/.github/instructions/internal-github-actions.instructions.md index 8866962..f3fdc65 100644 --- a/.github/instructions/internal-github-actions.instructions.md +++ b/.github/instructions/internal-github-actions.instructions.md @@ -24,6 +24,9 @@ applyTo: "**/workflows/**,**/actions/**/action.y*ml" - Prefer reusable workflows (`workflow_call`) for repeated job orchestration inside one repository. - Prefer smaller jobs with explicit `needs` over monolithic workflows when phases are logically separate. - Use `if` conditions deliberately for branch, event, and environment-specific execution. +- Before making or validating workflow changes that depend on expression scope, context usage, or key-specific rules, read GitHub's official workflow syntax and context-availability documentation; do not rely on memory. +- Do not place runner-derived paths such as `runner.temp` in workflow-root `env` or `jobs..env`; resolve them in step-level keys that allow `runner`, or derive them from default runner environment variables inside `run`. +- Treat IDE, parser, `actionlint`, and queue-time errors such as `Unrecognized named-value` as mandatory documentation-check triggers. - Keep cache keys deterministic from lockfiles, tool versions, or other stable inputs instead of timestamps or branch-only entropy. - Set explicit artifact `retention-days` when artifacts bridge review, release, or deploy stages. - Validate `workflow_dispatch` free-form inputs before shell, deploy, or infrastructure steps consume them. diff --git a/.github/skills/internal-github-actions/SKILL.md b/.github/skills/internal-github-actions/SKILL.md index 591de6b..daafe42 100644 --- a/.github/skills/internal-github-actions/SKILL.md +++ b/.github/skills/internal-github-actions/SKILL.md @@ -20,6 +20,7 @@ Use `internal-devops-core-principles` when the question is delivery strategy, re - Pin every third-party action to a full-length SHA with adjacent release comment. - Keep `permissions` least-privilege. - Keep step names and logs in English. +- Read GitHub's official workflow syntax and context-availability documentation before making or validating changes that depend on expression scope, context usage, or key-specific rules; do not rely on memory. - Validate `workflow_dispatch` free-form inputs before shell or deploy steps consume them. ## Reference map @@ -52,6 +53,7 @@ Use `internal-devops-core-principles` when the question is delivery strategy, re | Long-lived cloud credentials in secrets instead of OIDC | Static credentials can leak and do not expire automatically | Configure OIDC federation for AWS, Azure, or GCP | | Missing `environment` protection on production deploys | Anyone who can push to the branch can deploy to production | Add an environment with required reviewers | | Letting `workflow_dispatch` inputs flow directly into shell or deploy steps | Free-form input becomes a control path without validation | Validate and normalize inputs in an early step or job | +| Using `runner.temp` or other runner-scoped contexts in workflow-root `env` or `jobs..env` | The workflow fails validation before it even queues | Use `runner` only in keys that allow it, or derive the path from runner environment variables inside `run` | | Using timestamp-driven or branch-only cache keys | Cache hits become noisy, stale, or misleading across runs | Key caches from lockfiles, tool versions, and other stable inputs | | Uploading artifacts without explicit name or retention | Review and deploy handoffs become ambiguous and harder to clean up | Name artifacts deliberately and set `retention-days` | | Duplicating steps across workflows instead of reusable workflow or composite action | Maintenance burden grows with every copy | Extract to a reusable workflow in one repo or a composite action across repos | @@ -67,6 +69,7 @@ Use `internal-devops-core-principles` when the question is delivery strategy, re - [ ] All third-party actions pinned by full SHA with release comments. - [ ] `permissions` minimized per job. - [ ] `workflow_dispatch` inputs validated before shell, deploy, or infrastructure steps use them. +- [ ] Context usage matches GitHub's context-availability rules for the specific workflow keys involved. - [ ] Cache keys are deterministic and artifacts set explicit `retention-days` when they bridge jobs. - [ ] Environment protection and `concurrency` are enabled for production deploys. - [ ] Self-hosted runner use is justified and scoped. @@ -75,6 +78,7 @@ Use `internal-devops-core-principles` when the question is delivery strategy, re ## Validation - `actionlint` on workflow files, if available. +- When expressions use non-global contexts, compare each workflow key against GitHub's official context-availability table. - Verify there is no `permissions: write-all` and no missing permissions block where least privilege matters. - Verify all third-party `uses:` lines reference full SHAs instead of tags. - Re-check that every referenced local guide resolves before treating the skill update as complete. diff --git a/.github/skills/internal-github-actions/references/caching-and-artifacts.md b/.github/skills/internal-github-actions/references/caching-and-artifacts.md index 2de4a6a..4173ecb 100644 --- a/.github/skills/internal-github-actions/references/caching-and-artifacts.md +++ b/.github/skills/internal-github-actions/references/caching-and-artifacts.md @@ -24,6 +24,27 @@ Use cache for reproducible dependency reuse across runs. Use artifacts for expli Use stable inputs like lockfiles, tool versions, or build configuration. Do not use timestamps or raw `github.run_id` in cache keys. +When a cache path needs a runner-scoped location such as `runner.temp`, resolve it in a step key that allows the `runner` context. Do not put `runner.temp` in workflow-root `env` or `jobs..env`. + +## Runner temp cache example + +```yaml +- name: Restore pre-commit cache + # actions/cache@v5.0.4 + # https://github.com/actions/cache/releases/tag/v5.0.4 + uses: actions/cache@668228422ae6a00e4ad889ee87cd7109ec5666a7 + with: + path: ${{ runner.temp }}/pre-commit-cache + key: pre-commit-${{ runner.os }}-${{ hashFiles('.pre-commit-config.yaml') }} + +- name: Use cached path in a shell step + env: + PRE_COMMIT_CACHE_DIR: ${{ runner.temp }}/pre-commit-cache + run: | + mkdir -p "${PRE_COMMIT_CACHE_DIR}" + echo "Using cache at ${PRE_COMMIT_CACHE_DIR}" +``` + ## Artifact handoff example ```yaml diff --git a/AGENTS.md b/AGENTS.md index e751c95..883dba6 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -72,6 +72,8 @@ This file is the stable entrypoint for the repository instruction architecture. - Root `LESSONS.md` is the repository learning ledger for durable lessons discovered during repository work, regardless of phase. - Record or codify a durable lesson as soon as it becomes clear enough to be reusable; do not wait for task completion only because the work is still in planning, review, debugging, or implementation. +- When a validator, IDE, schema check, or runtime error overturns an earlier implementation assumption, re-evaluate retained learning immediately instead of treating the correction as task-local by default. +- When correctness depends on vendor-owned workflow semantics, schema constraints, or context availability, read the primary documentation before editing or asserting that a change is valid. - Keep `LESSONS.md` non-canonical. It must not replace `AGENTS.md`, `.github/copilot-instructions.md`, scoped instructions, skills, or agents. - Keep `LESSONS.md` append-preserving by default: preserve unrelated rows already on disk, including local uncommitted lessons, and change a specific row only when that same lesson is being codified, disproven, narrowed, or deduplicated. - Durable corrections to repeated or consequential misapplication of existing repository rules may also be retained as lessons. From d33d0970da5f0111e5f5c32bafb40ae0662e4e0d Mon Sep 17 00:00:00 2001 From: Diego Mauricio Lagos Date: Sun, 12 Apr 2026 14:07:50 +0200 Subject: [PATCH 10/10] refactor: clarify entry rules in lessons documentation for better guidance --- LESSONS.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/LESSONS.md b/LESSONS.md index 1d978d9..fb8f5e7 100644 --- a/LESSONS.md +++ b/LESSONS.md @@ -7,6 +7,8 @@ This file retains durable lessons discovered while completing tasks in this repo - Before editing this file, read its current on-disk contents and treat them as the source of truth for in-progress local lessons, including local uncommitted rows already present on disk. - Record only lessons that were not already codified in repository resources at the time they were learned. - Also record durable corrections to repeated or consequential misapplication of already-codified repository rules when that correction is likely to prevent future mistakes. +- When a validator, IDE, schema check, or runtime error overturns an earlier assumption, re-check immediately whether the correction is durable enough to retain until it is codified or deliberately dropped. +- Before deciding whether to retain, codify, or drop such a correction, read the relevant primary documentation instead of relying on memory alone. - Keep only stable, reusable, repository-relevant lessons. - Exclude secrets, transient debugging notes, raw conversation logs, and task-local noise. - Keep new or still-uncodified lessons in the pending table until they are codified or deliberately dropped.