Bugfix workflow: Controller did not enforce phase gates #81

@jwm4

Description

Summary

When the bugfix workflow is triggered by a user providing a bug report, the controller should execute the /assess skill, present results, and stop and wait for user input before proceeding. Instead, the session:

  1. Performed an ad-hoc assessment without reading or following the assess skill (SKILL.md)
  2. Auto-advanced through fix, test, and PR phases without user approval
  3. Did not read or execute any of the phase skills (/fix, /test, /pr)
  4. Failed on PR creation because the ad-hoc approach didn't handle the fork workflow that the /pr skill is designed for

When the user later invoked /pr explicitly, the skill was read and executed correctly: the fork was found, the branch was pushed, and a compare URL was provided. This confirms that the skills work; the problem is that the controller did not dispatch to them.

Expected Behavior

Per controller/SKILL.md:

How to Execute a Phase:

  1. Announce the phase to the user
  2. Read the skill file from the list above
  3. Execute the skill's steps directly
  4. When the skill is done [...] use "Recommending Next Steps" to offer options
  5. Present the skill's results and your recommendations to the user
  6. Stop and wait for the user to tell you what to do next

And critically:

Never auto-advance. Always wait for the user between phases.

Per assess/SKILL.md:

Do not start reproducing, diagnosing, or fixing. This phase is analysis and planning only.
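
To make the expected behavior concrete, here is a minimal sketch of a gate that refuses to start a phase unless the user approved it. This is not how the controller is currently implemented (it relies on prose instructions); `PhaseGate`, `GateError`, and the method names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Phase(Enum):
    ASSESS = "assess"
    FIX = "fix"
    TEST = "test"
    PR = "pr"


class GateError(RuntimeError):
    """Raised when a phase is started without explicit user approval."""


@dataclass
class PhaseGate:
    """Hypothetical gate: one user approval unlocks exactly one phase."""

    approved: Optional[Phase] = None
    completed: List[Phase] = field(default_factory=list)

    def approve(self, phase: Phase) -> None:
        # Called only when the user explicitly asks for this phase.
        self.approved = phase

    def run(self, phase: Phase) -> str:
        if self.approved is not phase:
            raise GateError(f"{phase.value} was not approved by the user")
        # Consuming the approval is what prevents auto-advance:
        # the next phase needs a fresh approve() call.
        self.approved = None
        self.completed.append(phase)
        return f"{phase.value} done; waiting for user"
```

With a gate like this, calling `run(Phase.FIX)` straight after `run(Phase.ASSESS)` raises `GateError` instead of silently continuing, which is the failure mode this issue describes.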

Actual Behavior

  1. No skill files were read. The controller never read assess/SKILL.md, fix/SKILL.md, test/SKILL.md, or pr/SKILL.md. It performed all phases ad-hoc based on general knowledge rather than following the documented steps.

  2. No phase gates. After completing what amounted to an informal assessment, the session immediately continued to clone the repo, search for dependencies, make code changes, run tests, and attempt to create a PR — all without pausing for user input.

  3. Assessment conclusion was contradicted by subsequent action. The assessment correctly identified that the compromise was PyPI-only (versions 1.82.7/1.82.8) and that the project uses a container image, not the PyPI package. Despite concluding "the reported issue wasn't actually a problem for this project," the session proceeded to make a code change anyway (pinning the container image tag). This may have been a reasonable precautionary measure, but the fix addressed a different concern from the one reported, so the decision to proceed should have been presented to the user as an option, not taken unilaterally.

  4. PR creation failed. The ad-hoc PR attempt tried git push origin directly to upstream (which failed with permission denied), then tried gh repo fork (which failed with 403). The /pr skill's systematic approach (check auth type → find existing fork → configure remote → push to fork → try gh pr create → fall back to compare URL) would have handled this correctly on the first attempt, as demonstrated when the user later invoked /pr explicitly.
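
The ordered fallback that the /pr skill follows can be sketched roughly as below. This is an illustration of the decision order only, not the skill's actual implementation: the callables `find_fork`, `push_to_fork`, and `create_pr` are hypothetical stand-ins for whatever git/gh commands the skill runs, and the auth check and remote configuration steps are assumed to live inside them.

```python
from typing import Callable, Optional


def compare_url(upstream: str, base: str, fork_owner: str, branch: str) -> str:
    """Build a GitHub compare URL the user can open to create the PR by hand."""
    return f"https://github.com/{upstream}/compare/{base}...{fork_owner}:{branch}"


def open_pr(
    find_fork: Callable[[], Optional[str]],       # hypothetical: returns fork owner or None
    push_to_fork: Callable[[str], bool],          # hypothetical: True if the push succeeded
    create_pr: Callable[[str], Optional[str]],    # hypothetical: PR URL, or None on failure
    upstream: str,
    base: str,
    branch: str,
) -> str:
    """Try each step in order and fall back to a compare URL at the end."""
    fork_owner = find_fork()
    if fork_owner is None:
        # Never push to upstream as a fallback; hand control back to the user.
        return "no fork found: ask the user how to proceed"
    if not push_to_fork(fork_owner):
        return "push to fork failed: ask the user how to proceed"
    pr_url = create_pr(fork_owner)
    if pr_url is not None:
        return pr_url
    # PR creation was refused (e.g. insufficient token scope): give the user a link.
    return compare_url(upstream, base, fork_owner, branch)
```

The point of the ordering is that a direct push to upstream is never attempted, and a failed PR creation still ends with something the user can act on.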

Root Cause Hypothesis

The controller skill file was read at the start of the session, but the instructions were not followed. Specifically:

  • The "How to Execute a Phase" section requires reading each skill file before executing it — this never happened for any phase
  • The "Never auto-advance" rule was violated after the assessment
  • The assess skill's "Do not start reproducing, diagnosing, or fixing" rule was violated

This may be a prompt adherence issue rather than a structural bug — the instructions are clear but were not followed. Possible contributing factors:

  • The urgency framing of the security issue may have caused the model to prioritize speed over process
  • The controller instructions may need stronger enforcement language or structural changes (e.g., explicit "STOP HERE" markers)
  • The initial greeting prompt may have set an expectation of end-to-end execution rather than phase-gated workflow

Impact

  • User lost control of the workflow. The user was not consulted before changes were made to the target project.
  • Wasted effort on failed PR. The ad-hoc PR approach failed, requiring the user to invoke /pr manually to recover.
  • Assessment finding was ignored. The nuance that "this isn't actually the reported problem, but here's a related precaution we could take" was lost — the session just went ahead and did it.

Suggested Improvements

  1. Structural enforcement of phase gates. Consider adding explicit stop markers or requiring a user-facing AskUserQuestion call at the end of each phase, rather than relying on "stop and wait" prose instructions.

  2. Require skill file reads. The controller could require that each phase begins with an explicit Read of the skill file, perhaps by checking for a sentinel string or artifact that proves the skill was loaded.

  3. Handle "not actually a bug" assessment path. The assess skill should have explicit guidance for when the assessment concludes the reported issue doesn't apply. Options to present to the user might include:

    • "This isn't a problem for your project — here's why"
    • "The reported issue doesn't apply directly, but here's a related improvement we could make (with trade-offs)"
    • "Close/dismiss the report"

  4. Urgency framing shouldn't bypass process. Security-related reports may create pressure to act fast, but the phase-gated workflow exists precisely to prevent hasty action. The controller could explicitly note that urgency doesn't change the process.
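
As a rough illustration of the sentinel idea in improvement 2, the check could look something like the sketch below. The marker format (`<!-- skill-sentinel: ... -->`) and the function names are assumptions made up for this example; nothing like this exists in the skill files today.

```python
import re
from typing import Optional

# Hypothetical marker that each SKILL.md would carry, e.g.
#   <!-- skill-sentinel: assess-7f3a -->
SENTINEL_RE = re.compile(r"<!--\s*skill-sentinel:\s*(\S+)\s*-->")


def read_sentinel(skill_text: str) -> Optional[str]:
    """Return the sentinel value embedded in a skill file, or None if absent."""
    match = SENTINEL_RE.search(skill_text)
    return match.group(1) if match else None


def skill_was_loaded(skill_text: str, announcement: str) -> bool:
    """True only if the phase announcement quotes the file's sentinel.

    The announcement can only contain the sentinel if the skill file
    was actually read, which is the property we want to verify.
    """
    sentinel = read_sentinel(skill_text)
    return sentinel is not None and sentinel in announcement
```

A skill file without a marker is treated as "not loaded", so a missing sentinel fails closed rather than open.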

Reproduction

  1. Start a new bugfix workflow session
  2. Provide a bug report that involves a supply chain security advisory (creates urgency pressure)
  3. Observe whether the controller reads and follows skill files, and whether it stops after assessment
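
To make step 3 less subjective, the observation could be run against a simplified session log. The sketch below assumes a hypothetical event format of `(kind, value)` tuples with kinds `"read"`, `"phase"`, and `"user"`; real session transcripts are not in this shape and would need to be converted first.

```python
from typing import List, Tuple

# Hypothetical event format: (kind, value)
#   ("read", "assess/SKILL.md")  a skill file was read
#   ("phase", "assess")          a phase started
#   ("user", "<message>")        the user sent input
Event = Tuple[str, str]


def find_violations(events: List[Event]) -> List[str]:
    """Report phases that skipped the skill read or the user gate."""
    violations: List[str] = []
    read_skills = set()
    # The initial bug report counts as user input for the first phase.
    user_input_since_last_phase = True
    for kind, value in events:
        if kind == "read":
            read_skills.add(value)
        elif kind == "user":
            user_input_since_last_phase = True
        elif kind == "phase":
            if f"{value}/SKILL.md" not in read_skills:
                violations.append(f"{value}: skill file not read")
            if not user_input_since_last_phase:
                violations.append(f"{value}: started without user input")
            user_input_since_last_phase = False
    return violations
```

A session like the one described in this issue (phases started back to back with no reads) would produce a violation for every phase, while a compliant session returns an empty list.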

Environment

  • Model: Claude Opus 4.6
  • Workflow: bugfix (controller + skills)
  • Target repo: sallyom/openclaw-installer
  • Bug report: LiteLLM PyPI supply chain compromise (v1.82.7/1.82.8)
