fix: Handle errors from event stream callbacks by tpoliaw · Pull Request #1302 · DiamondLightSource/blueapi

tpoliaw · 2025-12-12T15:52:49Z

If a callback raises an exception, it shouldn't prevent other callbacks
receiving the same event. Instead, catch any exceptions and re-raise
them after all callbacks have been called as a single ExceptionGroup.
This allows the task to be aborted if any of the callbacks fail but
still allows callbacks that require the "plan failed" events to run.

codecov · 2025-12-12T15:59:18Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.19%. Comparing base (90e5546) to head (9b996ef).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1302      +/-   ##
==========================================
+ Coverage   95.17%   95.19%   +0.01%     
==========================================
  Files          43       43              
  Lines        3111     3119       +8     
==========================================
+ Hits         2961     2969       +8     
  Misses        150      150

tpoliaw · 2025-12-12T18:29:08Z

The main symptom of this was that if a scan failed due to, e.g., a stomp connection error, then even if it reconnected, any subsequent scans would fail: the call to unsubscribe the tiled writer callback would never be made, and tiled would return 409 Conflict errors when two tiled writers tried to write the same events.
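
A rough sketch of how that kind of leak can arise, under the assumption that delivery stops at the first exception. The `subscribers` registry, `fail_fast_publish`, `broken_publisher` and `cleanup` are all hypothetical stand-ins, not blueapi or tiled APIs.

```python
from typing import Any, Callable

# Hypothetical registry of active subscriptions, keyed by an id
subscribers: dict[int, Callable[[Any], None]] = {}


def fail_fast_publish(callbacks: list[Callable[[Any], None]], event: Any) -> None:
    # Assumed old behaviour: the first exception stops delivery to later callbacks
    for callback in callbacks:
        callback(event)


def broken_publisher(event: Any) -> None:
    # Stands in for a callback that fails on a connection error
    raise ConnectionError("stomp connection lost")


def cleanup(event: Any) -> None:
    # Stands in for whatever unsubscribes the writer once the plan has ended
    subscribers.pop(1, None)


subscribers[1] = lambda event: None  # stands in for the tiled writer callback

try:
    fail_fast_publish([broken_publisher, cleanup], "plan failed")
except ConnectionError:
    pass

# cleanup never ran, so the old writer is still subscribed when the next scan starts
leaked = 1 in subscribers
```

With the writer from the failed scan still registered, the next scan would add a second one, which matches the duplicate-write conflict described above.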

abbiemery

Demo'd in person

abbiemery · 2026-01-13T13:43:21Z

As it stands, this will keep a scan running even if rabbitmq is down, so we will lose data from the nexus file. Looking instead at a different way to get the errors to shut things down correctly.

Not desired behaviour.

abbiemery

I have played with this and it does make the error state server side nicer. Client side instead of a 500 error you now get this:

Not sure how much information this should contain or if this is better than the latter but I'm happy to deal with niceness discussions later.

My only complaint, and this is a general complaint I guess rather than one for this PR, is that it is confusing having two levels of callbacks. We should probably think about how we want to deal with people wanting callbacks, run_engine or otherwise, as this won't help with the same issue if it occurs within the run_engine.

If a callback raises an exception, it shouldn't prevent other callbacks receiving the same event. It should also not raise the exception at the call site that published the event.

tpoliaw requested a review from a team as a code owner December 12, 2025 15:52

tpoliaw force-pushed the failsafe-event-handling branch from 6eda306 to 73206be Compare December 12, 2025 16:43

tpoliaw changed the title ~~Ignore (but log) errors from event stream callbacks~~ fix: Handle errors from event stream callbacks Dec 12, 2025

tpoliaw mentioned this pull request Dec 12, 2025

run_task/blueapi hangs until it times out #1230

Open

tpoliaw force-pushed the failsafe-event-handling branch from 73206be to 67c37d1 Compare January 6, 2026 14:40

abbiemery previously approved these changes Jan 6, 2026

tpoliaw force-pushed the failsafe-event-handling branch 3 times, most recently from deed376 to 7d6cd80 Compare January 13, 2026 14:38

tpoliaw requested a review from abbiemery January 13, 2026 14:39

abbiemery mentioned this pull request Feb 25, 2026

Failed RabbitMQ Connection on Kubernetes not Handled #459

Open

abbiemery force-pushed the failsafe-event-handling branch from 7d6cd80 to 3d95c43 Compare February 27, 2026 15:47

abbiemery linked an issue Feb 27, 2026 that may be closed by this pull request

run_task/blueapi hangs until it times out #1230

Open

abbiemery approved these changes Mar 2, 2026

tpoliaw added 3 commits March 2, 2026 11:19

Ignore (but log) errors from event stream callbacks

6487dd2

If a callback raises an exception, it shouldn't prevent other callbacks receiving the same event. It should also not raise the exception at the call site that published the event.

Test event callback exception handling

b235cdd

Wrap callback exceptions into single group

9b996ef

abbiemery force-pushed the failsafe-event-handling branch from 3d95c43 to 9b996ef Compare March 2, 2026 11:19

tpoliaw wants to merge 3 commits into main from failsafe-event-handling
