Add --with-heartbeat flag for heartbeat logging during long-running tasks#714
Conversation
|
Discussed with the team offline - the general use case is |
|
Note - once this is merged, we also need to update https://github.com/cloudfoundry/docs-bosh/blob/master/content/cli-v2.md |
3d887a0 to
ae86fc9
Compare
|
Can we take a look at the name? |
ae86fc9 to
cab07db
Compare
Alphasite
left a comment
There was a problem hiding this comment.
Some structural nits, but nothing awful. If you want to defer this to a seperate pr, feel free.
I think if we can agree on changing point 1 to explicitly cast rather than utilize a method to return the concrete implementation, but also agree on leaving 2 and 3, then I can make these changes and we can approve/push. If you disagree, let's hop on a quick call Monday to hash it out, in case I'm misunderstanding. |
cab07db to
de52eea
Compare
…rrands Errands produce no CLI output while the Agent executes the script. This silence causes CI/CD systems to kill the process due to inactivity timeouts and leaves operators unsure whether the task is still running. Add an opt-in --with-heartbeat flag to `bosh run-errand` that prints periodic heartbeat status lines while a task is processing or queued. bosh run-errand smoke_tests --with-heartbeat bosh run-errand smoke_tests --with-heartbeat=10 Output: Task 185528 | 16:16:23 | Task state: processing (5s elapsed) No Director changes required — uses existing task API fields (state, started_at). TaskHeartbeat is part of the TaskReporter interface with a no-op default on NoopTaskReporter; withHeartbeatInterval gates whether output is actually emitted. Throttling is handled in the reporter via timestamp comparison. Made-with: Cursor
de52eea to
6f192ee
Compare
There was a problem hiding this comment.
Pull request overview
Adds an opt-in --with-heartbeat flag to bosh run-errand that prints periodic heartbeat status lines (task state + elapsed time) while a task is running, preventing CI/CD timeout kills during long-running errands.
Changes:
- Added
TaskHeartbeatmethod to theTaskReporterinterface and implemented it inReporterImplwith time-based throttling - Added
--with-heartbeatflag toRunErrandOptswith a default 30s interval - Wired the flag through
SessionImplto enable heartbeat on the task reporter
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
ui/task/interfaces.go |
Added TaskHeartbeat to Reporter interface |
ui/task/reporter.go |
Implemented TaskHeartbeat with throttling and EnableWithHeartbeat |
ui/task/reporter_test.go |
Tests for heartbeat enable/disable, throttling, elapsed time |
director/interfaces.go |
Added TaskHeartbeat to TaskReporter interface |
director/noop_reporters.go |
No-op implementation of TaskHeartbeat |
director/task_client_request.go |
Added StartedAt to taskShortResp, emit heartbeat in poll loop |
director/task_client_request_test.go |
Tests for heartbeat emission during wait |
cmd/opts/opts.go |
Added WithHeartbeat flag to RunErrandOpts |
cmd/opts/opts_test.go |
Test for flag struct tags |
cmd/cmd.go |
Wired heartbeat flag, changed session() to return *SessionImpl |
cmd/session.go |
Exposed taskReporter field on SessionImpl |
director/directorfakes/fake_task_reporter.go |
Regenerated counterfeiter fake |
ui/task/taskfakes/fake_reporter.go |
Regenerated counterfeiter fake |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| taskReporter := boshuit.NewReporter(c.ui, true) | ||
| c.taskReporter = boshuit.NewReporter(c.ui, true) |
There was a problem hiding this comment.
nit: this is never used except directly below, it probably doesnt need to be a property
There was a problem hiding this comment.
The reporter is created inside Director() and wired into the director's HTTP client chain. We access the same instance afterward to enable heartbeats.
if opts.WithHeartbeat != nil && sess.taskReporter != nil {
sess.taskReporter.EnableWithHeartbeat(time.Duration(*opts.WithHeartbeat) * time.Second)
}
Storing it as a field avoids changing Director()'s signature (which is called from ~10 places). Open to changing if there's a better way.
There was a problem hiding this comment.
Eh. It’s fine. Sorry if I misunderstood something. I figured this was just leftover from some ai thing.
Alphasite
left a comment
There was a problem hiding this comment.
modulo a small nit, lgtm. ship it
Rev 5 (per reviewer feedback from @Alphasite)
Verified it still works:
Files changed: cmd/cmd.go, director/interfaces.go, director/noop_reporters.go, director/task_client_request.go, director/task_client_request_test.go, ui/task/interfaces.go, ui/task/reporter.go, ui/task/reporter_test.go + regenerated fakes
Rev 4
Usage update:
Rev 3
descriptionfrom tests, task response, and no longer passing it through heartbeat reporterRevision 2 (per reviewer feedback)
Removed --poll-state from bosh task — scoped to bosh run-errand only
Removed task description from heartbeat output to avoid redundancy with the event log
Kept elapsed time for operator visibility
Heartbeat output is now minimal:
Task 185528 | 16:16:23 | Task state: processing (5s elapsed)
Files changed in this revision: cmd/cmd.go, cmd/opts/opts.go, cmd/opts/opts_test.go, ui/task/reporter.go, ui/task/reporter_test.go (net -16 lines)
Summary
Related to issue #715
Errands and other long-running BOSH tasks produce no CLI output while the Agent executes the errand script. This silence causes CI/CD systems (Concourse, GitHub Actions, Jenkins) to kill the process due to inactivity timeouts, and leaves human operators unsure whether the task is still running.
This PR adds an opt-in
--poll-stateflag tobosh run-errandandbosh taskthat prints periodic heartbeat status lines while a task is in theprocessingorqueuedstate. The flag is fully backward-compatible — without it, behavior is identical to today.Usage
Default 30s interval
bosh run-errand smoke_tests --poll-stateCustom 10s interval
bosh run-errand smoke_tests --poll-state=10Reconnect to an in-progress task
bosh task 185528 --poll-stateOutput format
Heartbeat lines match the existing BOSH event log format:
Task 185528 | 16:16:18 | Task state: queuedTask 185528 | 16:16:23 | Task state: processing (5s elapsed) (run errand hello-errand from deployment my-errand-deployment)Architecture
state,description,started_at).HeartbeatReporter): ExtendsTaskReportervia safe type assertion — does not break any existing code.ReporterImpl— notime.Sleepin the UI layer, theWaitForCompletionloop spins at its normal rate and the reporter only prints when the interval has passed.sync.Mutexfor concurrent access to heartbeat state.boshui.UI: Ensures--jsonmode works correctly.Files changed
director/interfaces.goHeartbeatReporterinterfacedirector/task_client_request.godescription/started_attotaskShortRespui/task/reporter.goEnableHeartbeat()+TaskHeartbeat()with throttlingcmd/opts/opts.go--poll-stateflag onRunErrandOptsandTaskOptscmd/cmd.gocmd/session.gotaskReporterforrun-errandpathTest plan
Unit tests (included)
startedAtshows elapsed timestartedAt(value 0) — no elapsed, no "queued" confusionWaitForCompletionemits heartbeats when reporter implementsHeartbeatReporterWaitForCompletiondoes not emit when reporter does not implement the interfacePollStateon bothTaskOptsandRunErrandOpts