Skip to content

fix: handle double setup-docker-builder invocations gracefully#71

Open
pbardea wants to merge 4 commits intomainfrom
fix-double-setup-error
Open

fix: handle double setup-docker-builder invocations gracefully#71
pbardea wants to merge 4 commits intomainfrom
fix-double-setup-error

Conversation

@pbardea
Copy link
Copy Markdown
Contributor

@pbardea pbardea commented Mar 16, 2026

When this action is invoked multiple times within the same job (e.g. separate build steps for different platforms), the second invocation detects the running buildkitd and errors out with "Refusing to start to avoid conflicts." This PR changes that behavior so the second invocation detects the existing process and reuses the builder instead of crashing.

How it works:

  • The pgrep check moves to the very top of startBlacksmithBuilder() — before sticky disk setup — and returns { addr: null, exposeId: "" } instead of throwing. The fallback path then inspects the existing builder and reuses it if its driver isn't docker.
  • A new maybeShutdownBuildkitd() function gates shutdown on getBuildkitdAddr(): only the instance that actually started buildkitd (and saved the addr to state) will shut it down. Subsequent instances skip shutdown entirely.
  • A logBuildkitdCrashLogs() helper is extracted for the crash-detection log tailing, removing ~60 lines of duplication from the inline cleanup.

This is a rebase of the original PR onto current main, incorporating the SIGKILL sticky-disk-commit prevention (#77), node24 migration (#84), and docker-container driver fallback (#86). The original PR's maybeShutdownBuildkitd() unconditionally logged "Shutdown buildkitd gracefully" even after a SIGKILL — this version correctly checks getSigkillUsed() to preserve the warning behavior from #77. The dist/ rebuild is expected to happen via CI.

@chadxz
Copy link
Copy Markdown

chadxz commented Mar 18, 2026

When you rebuild I will test this again!

@pbardea
Copy link
Copy Markdown
Contributor Author

pbardea commented Mar 19, 2026

@chadxz - sorry for the delay you should be good to go here!

@chadxz
Copy link
Copy Markdown

chadxz commented Mar 24, 2026

Tested this against our monorepo where we hit the original bug. We have composite actions that call setup-docker-builder internally, and when two of those run in the same job, the second one would remount the sticky disk and corrupt the overlayfs state.

Set up a workflow that calls setup-docker-builder (pinned to 89e1f28) twice in one job, building two different Docker images (a Node app and a .NET app). Both builds completed successfully.

The logs show the fix working as expected -- first invocation starts buildkitd, second invocation detects it and bails out early:

Setup Blacksmith Docker Builder (1st): Starting buildkitd ...
Setup Blacksmith Docker Builder (1st): buildkitd daemon started successfully with PID 3883
Setup Blacksmith Docker Builder (2nd): Detected existing buildkitd process (PID: 3883). Skipping builder setup - builder is already initialized.

Cleanup also looked correct -- the second post-step skipped shutdown since it did not start buildkitd, and the first post-step shut it down gracefully.

Full job logs: https://app.blacksmith.sh/convergint/runs/23514311534/jobs/68442899155

When this action is invoked multiple times within the same job,
the second invocation now detects the existing buildkitd process
and reuses the builder instead of throwing an error.

Key changes:
- Move pgrep check to the top of startBlacksmithBuilder() and
  return early with addr=null when buildkitd is already running
- Extract maybeShutdownBuildkitd() so only the instance that
  started buildkitd performs shutdown (checked via getBuildkitdAddr)
- Preserve SIGKILL detection from the shutdown path
- Update fallback warning to cover both skipped and failed states

Co-authored-by: Codesmith <codesmith@blacksmith.sh>
@adityamaru adityamaru force-pushed the fix-double-setup-error branch from 89e1f28 to dc5a171 Compare March 27, 2026 19:00
@adityamaru adityamaru changed the title fix: check for existing buildkitd before mounting sticky disk fix: handle double setup-docker-builder invocations gracefully Mar 27, 2026
@adityamaru adityamaru self-requested a review March 27, 2026 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants