Skip to content

Add --with-heartbeat flag for heartbeat logging during long-running tasks#714

Merged
Alphasite merged 1 commit intocloudfoundry:mainfrom
ay901246:task-heartbeat-logger
Mar 17, 2026
Merged

Add --with-heartbeat flag for heartbeat logging during long-running tasks#714
Alphasite merged 1 commit intocloudfoundry:mainfrom
ay901246:task-heartbeat-logger

Conversation

@ay901246
Copy link
Contributor

@ay901246 ay901246 commented Mar 11, 2026

Rev 5 (per reviewer feedback from @Alphasite)

Verified it still works:

ay901246@L14Q6V7XGG hello-errand-release % ../bosh -d my-errand-deployment run-errand hello-errand --with-heartbeat=5
Using environment '10.0.16.5' as client 'admin'

Using deployment 'my-errand-deployment'

Task 185735
Task 185735 | 23:13:37 | Task state: queued

Task 185735 | 23:13:37 | Preparing deployment: Preparing deployment
Task 185735 | 23:13:38 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 185735 | 23:13:38 | Preparing deployment: Preparing deployment (00:00:01)
Task 185735 | 23:13:38 | Creating missing vms: errand-runner/e53c6176-c712-4180-81b8-3088c47ef53f (0)
Task 185735 | 23:13:42 | Task state: processing (5s elapsed)
Task 185735 | 23:13:48 | Task state: processing (11s elapsed)
Task 185735 | 23:13:53 | Task state: processing (16s elapsed)
Task 185735 | 23:13:59 | Task state: processing (22s elapsed)
  • Removed sessionImpl() — session() now returns Session directly; the errand case type-asserts to *SessionImpl
  • Merged HeartbeatReporter into TaskReporter — TaskHeartbeat is now part of the core interface with a no-op on NoopTaskReporter; withHeartbeatInterval <= 0 gates output
  • Renamed heartbeatInterval -> withHeartbeatInterval and EnableHeartbeat -> EnableWithHeartbeat for clarity
  • Removed custom fakeHeartbeatReporter test mock — tests now use counterfeiter-generated FakeTaskReporter directly
  • Removed "does not implement HeartbeatReporter" test (no longer applicable)
  • Regenerated counterfeiter fakes for director and ui/task

Files changed: cmd/cmd.go, director/interfaces.go, director/noop_reporters.go, director/task_client_request.go, director/task_client_request_test.go, ui/task/interfaces.go, ui/task/reporter.go, ui/task/reporter_test.go + regenerated fakes

Rev 4

  • Rename --poll-state -> --with-heartbeat

Usage update:

bosh run-errand smoke_tests --with-heartbeat
bosh run-errand smoke_tests --with-heartbeat=10

Rev 3

  • Removed description from tests, task response, and no longer passing it through heartbeat reporter

Revision 2 (per reviewer feedback)

Removed --poll-state from bosh task — scoped to bosh run-errand only
Removed task description from heartbeat output to avoid redundancy with the event log
Kept elapsed time for operator visibility
Heartbeat output is now minimal:

Task 185528 | 16:16:23 | Task state: processing (5s elapsed)
Files changed in this revision: cmd/cmd.go, cmd/opts/opts.go, cmd/opts/opts_test.go, ui/task/reporter.go, ui/task/reporter_test.go (net -16 lines)

Summary

Related to issue #715
Errands and other long-running BOSH tasks produce no CLI output while the Agent executes the errand script. This silence causes CI/CD systems (Concourse, GitHub Actions, Jenkins) to kill the process due to inactivity timeouts, and leaves human operators unsure whether the task is still running.

This PR adds an opt-in --poll-state flag to bosh run-errand and bosh task that prints periodic heartbeat status lines while a task is in the processing or queued state. The flag is fully backward-compatible — without it, behavior is identical to today.

Usage

Default 30s interval
bosh run-errand smoke_tests --poll-state

Custom 10s interval
bosh run-errand smoke_tests --poll-state=10

Reconnect to an in-progress task
bosh task 185528 --poll-state

Output format

Heartbeat lines match the existing BOSH event log format:
Task 185528 | 16:16:18 | Task state: queued
Task 185528 | 16:16:23 | Task state: processing (5s elapsed) (run errand hello-errand from deployment my-errand-deployment)

Architecture

  • No Director changes required: Uses existing task API fields (state, description, started_at).
  • Optional interface (HeartbeatReporter): Extends TaskReporter via safe type assertion — does not break any existing code.
  • Timestamp-based throttling: Handled in ReporterImpl — no time.Sleep in the UI layer, the WaitForCompletion loop spins at its normal rate and the reporter only prints when the interval has passed.
  • Thread-safe: Uses sync.Mutex for concurrent access to heartbeat state.
  • Output goes through boshui.UI: Ensures --json mode works correctly.

Files changed

File Change
director/interfaces.go New HeartbeatReporter interface
director/task_client_request.go Emit heartbeat on each poll cycle; added description/started_at to taskShortResp
ui/task/reporter.go EnableHeartbeat() + TaskHeartbeat() with throttling
cmd/opts/opts.go --poll-state flag on RunErrandOpts and TaskOpts
cmd/cmd.go Wire flag to enable heartbeat on reporters
cmd/session.go Expose taskReporter for run-errand path

Test plan

Unit tests (included)

  • Heartbeat not printed when disabled
  • Queued task output without elapsed time
  • Processing task with startedAt shows elapsed time
  • Processing task without startedAt (value 0) — no elapsed, no "queued" confusion
  • Throttling: multiple calls within interval produce only one output
  • WaitForCompletion emits heartbeats when reporter implements HeartbeatReporter
  • WaitForCompletion does not emit when reporter does not implement the interface
  • No heartbeats for tasks that immediately finish
  • Opts struct tags for PollState on both TaskOpts and RunErrandOpts

@aramprice aramprice moved this from Inbox to Pending Review | Discussion in Foundational Infrastructure Working Group Mar 12, 2026
@aramprice aramprice requested a review from rkoster March 12, 2026 16:06
@ay901246
Copy link
Contributor Author

Discussed with the team offline - the general use case is subscribe and watch for users, so there is likely no useful use case here for having to reconnect via tasks where the user wouldn't be a power-user who could do something else that already exists - pushed out revision 2 to reflect the removal of description (kept it in the task response, though), and --poll-state from task.

@github-project-automation github-project-automation bot moved this from Pending Review | Discussion to Waiting for Changes | Open for Contribution in Foundational Infrastructure Working Group Mar 12, 2026
@selzoc
Copy link
Member

selzoc commented Mar 12, 2026

Note - once this is merged, we also need to update https://github.com/cloudfoundry/docs-bosh/blob/master/content/cli-v2.md

@ay901246 ay901246 force-pushed the task-heartbeat-logger branch from 3d887a0 to ae86fc9 Compare March 12, 2026 21:25
selzoc
selzoc previously approved these changes Mar 12, 2026
@github-project-automation github-project-automation bot moved this from Waiting for Changes | Open for Contribution to Pending Merge | Prioritized in Foundational Infrastructure Working Group Mar 12, 2026
@Alphasite
Copy link
Contributor

Can we take a look at the name?

@selzoc selzoc requested a review from Alphasite March 13, 2026 22:05
@ay901246 ay901246 force-pushed the task-heartbeat-logger branch from ae86fc9 to cab07db Compare March 13, 2026 22:07
@ay901246 ay901246 changed the title Add --poll-state flag for heartbeat logging during long-running tasks Add --with-heartbeat flag for heartbeat logging during long-running tasks Mar 13, 2026
Alphasite
Alphasite previously approved these changes Mar 13, 2026
Copy link
Contributor

@Alphasite Alphasite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some structural nits, but nothing awful. If you want to defer this to a seperate pr, feel free.

@ay901246
Copy link
Contributor Author

Some structural nits, but nothing awful. If you want to defer this to a seperate pr, feel free.

I think if we can agree on changing point 1 to explicitly cast rather than utilize a method to return the concrete implementation, but also agree on leaving 2 and 3, then I can make these changes and we can approve/push.

If you disagree, let's hop on a quick call Monday to hash it out, in case I'm misunderstanding.

…rrands

Errands produce no CLI output while the Agent executes the script.
This silence causes CI/CD systems to kill the process due to inactivity
timeouts and leaves operators unsure whether the task is still running.

Add an opt-in --with-heartbeat flag to `bosh run-errand` that prints
periodic heartbeat status lines while a task is processing or queued.

  bosh run-errand smoke_tests --with-heartbeat
  bosh run-errand smoke_tests --with-heartbeat=10

Output:
  Task 185528 | 16:16:23 | Task state: processing (5s elapsed)

No Director changes required — uses existing task API fields (state,
started_at). TaskHeartbeat is part of the TaskReporter interface with
a no-op default on NoopTaskReporter; withHeartbeatInterval gates
whether output is actually emitted. Throttling is handled in the
reporter via timestamp comparison.

Made-with: Cursor
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in --with-heartbeat flag to bosh run-errand that prints periodic heartbeat status lines (task state + elapsed time) while a task is running, preventing CI/CD timeout kills during long-running errands.

Changes:

  • Added TaskHeartbeat method to the TaskReporter interface and implemented it in ReporterImpl with time-based throttling
  • Added --with-heartbeat flag to RunErrandOpts with a default 30s interval
  • Wired the flag through SessionImpl to enable heartbeat on the task reporter

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Show a summary per file
File Description
ui/task/interfaces.go Added TaskHeartbeat to Reporter interface
ui/task/reporter.go Implemented TaskHeartbeat with throttling and EnableWithHeartbeat
ui/task/reporter_test.go Tests for heartbeat enable/disable, throttling, elapsed time
director/interfaces.go Added TaskHeartbeat to TaskReporter interface
director/noop_reporters.go No-op implementation of TaskHeartbeat
director/task_client_request.go Added StartedAt to taskShortResp, emit heartbeat in poll loop
director/task_client_request_test.go Tests for heartbeat emission during wait
cmd/opts/opts.go Added WithHeartbeat flag to RunErrandOpts
cmd/opts/opts_test.go Test for flag struct tags
cmd/cmd.go Wired heartbeat flag, changed session() to return *SessionImpl
cmd/session.go Exposed taskReporter field on SessionImpl
director/directorfakes/fake_task_reporter.go Regenerated counterfeiter fake
ui/task/taskfakes/fake_reporter.go Regenerated counterfeiter fake

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}

taskReporter := boshuit.NewReporter(c.ui, true)
c.taskReporter = boshuit.NewReporter(c.ui, true)
Copy link
Contributor

@Alphasite Alphasite Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is never used except directly below, it probably doesnt need to be a property

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reporter is created inside Director() and wired into the director's HTTP client chain. We access the same instance afterward to enable heartbeats.

		if opts.WithHeartbeat != nil && sess.taskReporter != nil {
			sess.taskReporter.EnableWithHeartbeat(time.Duration(*opts.WithHeartbeat) * time.Second)
		}

Storing it as a field avoids changing Director()'s signature (which is called from ~10 places). Open to changing if there's a better way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh. It’s fine. Sorry if I misunderstood something. I figured this was just leftover from some ai thing.

Copy link
Contributor

@Alphasite Alphasite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modulo a small nit, lgtm. ship it

@Alphasite Alphasite merged commit 9e93907 into cloudfoundry:main Mar 17, 2026
27 checks passed
@github-project-automation github-project-automation bot moved this from Pending Merge | Prioritized to Done in Foundational Infrastructure Working Group Mar 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

7 participants