feat(circuit-breaker): add CircuitBreaker Tower middleware #855

Open
Mattbusel wants to merge 3 commits into tower-rs:master from Mattbusel:feat/circuit-breaker

@Mattbusel

Problem

Tower has no built-in circuit breaker. PR #102 was closed during migration in 2019 with a note to pick it back up — it never was. Users building on reqwest, hyper, or tonic are forced to either write their own or pull in a separate crate just for this pattern.

Without this primitive, a failing backend invites retry storms: requests pile up, timeouts accumulate, and memory and in-flight task counts grow without bound. A circuit breaker cuts this off at the source.

Solution

This PR adds tower::circuit_breaker — a three-state machine implemented as a standard Tower Service<Request> + Layer.

States

Closed ──(N consecutive failures)──► Open
Open   ──(timeout elapsed)─────────► HalfOpen  (one probe allowed)
HalfOpen ──(success rate ≥ threshold)► Closed
HalfOpen ──(probe fails)────────────► Open
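
As a rough illustration of the transitions above, here is a minimal std-only sketch. It is not the code in this PR: the `Breaker` type, its method names, and the explicit `now` parameter are hypothetical, and it simplifies HalfOpen so that a single successful probe closes the circuit instead of tracking a success rate.

```rust
use std::time::{Duration, Instant};

/// The three circuit states from the diagram (field names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Closed { consecutive_failures: u32 },
    Open { opened_at: Instant },
    HalfOpen,
}

/// Hypothetical stand-in for the PR's state machine.
pub struct Breaker {
    state: State,
    failure_threshold: u32,
    open_timeout: Duration,
}

impl Breaker {
    pub fn new(failure_threshold: u32, open_timeout: Duration) -> Self {
        Breaker {
            state: State::Closed { consecutive_failures: 0 },
            failure_threshold,
            open_timeout,
        }
    }

    pub fn state(&self) -> State {
        self.state
    }

    /// Returns true if a request may pass at `now`.
    /// Performs the Open -> HalfOpen transition once the timeout has elapsed.
    pub fn allow(&mut self, now: Instant) -> bool {
        match self.state {
            State::Closed { .. } | State::HalfOpen => true,
            State::Open { opened_at } => {
                if now.duration_since(opened_at) >= self.open_timeout {
                    self.state = State::HalfOpen;
                    true
                } else {
                    false
                }
            }
        }
    }

    /// Records the outcome of a request that was allowed through.
    pub fn record(&mut self, ok: bool, now: Instant) {
        self.state = match (self.state, ok) {
            // Any success while Closed resets the failure streak.
            (State::Closed { .. }, true) => State::Closed { consecutive_failures: 0 },
            (State::Closed { consecutive_failures }, false) => {
                let n = consecutive_failures + 1;
                if n >= self.failure_threshold {
                    State::Open { opened_at: now }
                } else {
                    State::Closed { consecutive_failures: n }
                }
            }
            // Simplification: one good probe closes, one bad probe reopens.
            (State::HalfOpen, true) => State::Closed { consecutive_failures: 0 },
            (State::HalfOpen, false) => State::Open { opened_at: now },
            // Outcomes that arrive while Open leave the state unchanged.
            (open @ State::Open { .. }, _) => open,
        };
    }
}
```

Passing `now` explicitly keeps the sketch deterministic to test; a real middleware would more likely read the clock internally.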

Usage

use std::time::Duration;
use tower::ServiceBuilder;
use tower::circuit_breaker::CircuitBreakerLayer;

let svc = ServiceBuilder::new()
    .layer(CircuitBreakerLayer::new(
        5,                        // open after 5 consecutive failures
        0.8,                      // close when 80 % of probes succeed
        Duration::from_secs(30),  // wait 30 s before sending a probe
    ))
    .service_fn(my_backend_call);

Key design decisions

  • try_read() in poll() — circuit gate check is non-blocking; wakes and yields rather than blocking the executor if the write lock is held during a state transition.
  • Window cleared on HalfOpen — stale failure history from before the outage is discarded so the recovery success rate is calculated only from post-recovery probes.
  • CircuitError<E> — wraps the inner error type; CircuitError::Open signals a rejected-without-calling case so callers can distinguish "backend failed" from "circuit open".
  • reset() method — allows operator-driven forced close (e.g. after confirming backend is healthy).
  • Feature flag circuit-breaker = ["tokio/sync", "tokio/time", "pin-project-lite"] — zero cost if unused.
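
To make the `CircuitError<E>` point concrete, a wrapper of this kind might look like the sketch below. Only the `Open` variant is named in this description; the `Inner` variant name, the `Display` messages, and the `is_rejected` helper are assumptions for illustration.

```rust
use std::error::Error;
use std::fmt;

/// Sketch of an error wrapper that separates "circuit open" from "backend failed".
#[derive(Debug)]
pub enum CircuitError<E> {
    /// The request was rejected without calling the inner service.
    Open,
    /// The inner service was called and returned an error (hypothetical variant name).
    Inner(E),
}

impl<E: fmt::Display> fmt::Display for CircuitError<E> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CircuitError::Open => write!(f, "circuit open: request rejected"),
            CircuitError::Inner(e) => write!(f, "inner service error: {}", e),
        }
    }
}

impl<E: Error + 'static> Error for CircuitError<E> {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            CircuitError::Open => None,
            CircuitError::Inner(e) => Some(e),
        }
    }
}

/// Hypothetical caller-side helper: true when the backend was never contacted.
pub fn is_rejected<E>(err: &CircuitError<E>) -> bool {
    matches!(err, CircuitError::Open)
}
```

A caller could use a check like `is_rejected` to fail fast or fall back to a cached response, and reserve retries for the `Inner` case.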

Files changed

| File | Description |
| --- | --- |
| tower/src/circuit_breaker/mod.rs | Module root + docs |
| tower/src/circuit_breaker/layer.rs | CircuitBreakerLayer |
| tower/src/circuit_breaker/service.rs | CircuitBreaker<S> + state machine + tests |
| tower/src/circuit_breaker/future.rs | ResponseFuture with non-blocking gate |
| tower/src/lib.rs | #[cfg(feature = "circuit-breaker")] pub mod circuit_breaker |
| tower/Cargo.toml | Feature flag + added to full |

Tests

Two inline tests in service.rs:

  • closed_passes_requests_through — baseline happy path
  • opens_after_failure_threshold — verifies Open state rejects with CircuitError::Open

Designed and implemented by Matthew Busel.

Three-state machine (Closed → Open → HalfOpen) with configurable
failure threshold, success-rate recovery, and probe timeout.

- CircuitBreakerLayer for ServiceBuilder ergonomics
- CircuitBreaker<S> implements Service<Request>
- ResponseFuture: non-blocking gate check via try_read()
- Automatic HalfOpen transition after timeout elapses
- Clears result window on HalfOpen so recovery rate reflects
  only post-recovery probes, not stale failure history
- Full test coverage for open/close/recovery paths

…Debug

- Replace tokio::sync::RwLock with std::sync::Mutex — state updates
  now happen synchronously in poll() and poll_ready(), eliminating
  the tokio::spawn-inside-poll anti-pattern
- Circuit gate check moved to poll_ready() where Tower expects it;
  call() only wraps the inner future in ResponseFuture
- ResponseFuture::poll updates state inline on Ready — no allocation
  or task spawn, correct under Tower's single-threaded test executor
- Suppress missing_debug_implementations for CircuitBreaker<S> since
  S is an unconstrained generic (same pattern as tower::Timeout<S>)
- cargo fmt applied
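
For readers following the locking change, the sketch below shows the general shape of a synchronous gate check guarded by `std::sync::Mutex`, written against `std::task::Poll` only so that it stands alone. The `Gate`, `Shared`, and `CircuitOpen` names are hypothetical, and returning `Ready(Err(..))` while open is an assumption; the actual `poll_ready` in this PR may behave differently.

```rust
use std::sync::{Arc, Mutex};
use std::task::Poll;
use std::time::{Duration, Instant};

/// Error returned when the gate rejects a request (hypothetical name).
#[derive(Debug, PartialEq)]
pub struct CircuitOpen;

struct Shared {
    opened_at: Option<Instant>,
    open_timeout: Duration,
}

/// Cloneable handle to shared circuit state, as a service clone would hold.
#[derive(Clone)]
pub struct Gate {
    shared: Arc<Mutex<Shared>>,
}

impl Gate {
    pub fn new(open_timeout: Duration) -> Self {
        Gate {
            shared: Arc::new(Mutex::new(Shared { opened_at: None, open_timeout })),
        }
    }

    /// Marks the circuit as open starting at `now`.
    pub fn trip(&self, now: Instant) {
        self.shared.lock().unwrap().opened_at = Some(now);
    }

    /// Synchronous gate check in the style of `poll_ready`.
    /// The lock is held only for this short critical section and never across an await.
    pub fn poll_gate(&self, now: Instant) -> Poll<Result<(), CircuitOpen>> {
        let mut shared = self.shared.lock().unwrap();
        let opened_at = shared.opened_at;
        match opened_at {
            // Still inside the open window: reject immediately.
            Some(t) if now.duration_since(t) < shared.open_timeout => {
                Poll::Ready(Err(CircuitOpen))
            }
            // Timeout elapsed: let traffic through again.
            Some(_) => {
                shared.opened_at = None;
                Poll::Ready(Ok(()))
            }
            None => Poll::Ready(Ok(())),
        }
    }
}
```

Because the critical section is a few field reads and writes, no task spawn or async lock is needed, which is the property the commit message above is after.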

@seanmonstar
Collaborator

Thanks for restarting this! Just a couple thoughts as I look through it:

  • The tower::retry::budget module has some similar ideas. It's the opinion of the original maintainers that a retry policy without a budget is a bad practice.
  • I'm used to "circuit breaker" describing a broad concept: detecting that something in the system is overloaded and stopping requests. There are various mechanisms people may use: a failure monitor is one way, but it can also be a simple shared on-off switch that is triggered by other parts of the system.

@Mattbusel
Author

Good points, thanks.

On Budget, I see them as complementary layers: Budget governs retry-worthiness, circuit breaker gates all traffic (including first attempts) when the backend is down. They compose rather than overlap. Happy to add a note in the docs making that distinction explicit.

On the broader abstraction, agreed. I'll refactor to a Policy trait so the current consecutive-failure logic becomes one implementation (ConsecutiveFailures), and the service is generic over CircuitBreaker<S, P: Policy>. That opens the door for manual switches, latency-based triggers, etc. without changing the middleware shape.

Will push an updated draft.

…ionship

- Extract CircuitPolicy trait (on_success, on_failure, should_probe, on_half_open)
- Move ConsecutiveFailures into policy.rs as the built-in implementation
- CircuitBreaker<S, P> generic over CircuitPolicy; SharedState<P> replaces State
- CircuitBreakerLayer<P> with ::new() and ::with_policy() constructors
- ResponseFuture<F, T, E, P> delegates outcome reporting to the policy
- Document Send/Sync expectations on CircuitPolicy and CircuitBreaker structs
- Document budget vs circuit breaker relationship in mod.rs and policy.rs
- Add custom_policy_is_accepted test
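
A trait with the four method names listed above might look roughly like the following. The signatures are guesses, not the ones in the commit: here `on_failure` returns whether the circuit should open, `on_success` returns whether it may close, and `should_probe` receives how long the circuit has been open. `NeverTrip` is a hypothetical custom policy in the spirit of the `custom_policy_is_accepted` test.

```rust
use std::time::Duration;

/// Sketch of a pluggable policy; method names come from the commit message,
/// signatures and return-value meanings are assumptions.
pub trait CircuitPolicy {
    /// Record a success; returns true if the circuit may close.
    fn on_success(&mut self) -> bool;
    /// Record a failure; returns true if the circuit should open.
    fn on_failure(&mut self) -> bool;
    /// Whether a probe may be sent after the circuit has been open for `open_for`.
    fn should_probe(&self, open_for: Duration) -> bool;
    /// Called on entering HalfOpen so stale history can be discarded.
    fn on_half_open(&mut self);
}

/// Built-in style policy: trip after N consecutive failures.
pub struct ConsecutiveFailures {
    failure_threshold: u32,
    open_timeout: Duration,
    consecutive_failures: u32,
}

impl ConsecutiveFailures {
    pub fn new(failure_threshold: u32, open_timeout: Duration) -> Self {
        ConsecutiveFailures { failure_threshold, open_timeout, consecutive_failures: 0 }
    }
}

impl CircuitPolicy for ConsecutiveFailures {
    fn on_success(&mut self) -> bool {
        self.consecutive_failures = 0;
        true
    }
    fn on_failure(&mut self) -> bool {
        self.consecutive_failures += 1;
        self.consecutive_failures >= self.failure_threshold
    }
    fn should_probe(&self, open_for: Duration) -> bool {
        open_for >= self.open_timeout
    }
    fn on_half_open(&mut self) {
        self.consecutive_failures = 0;
    }
}

/// Hypothetical custom policy that never opens the circuit.
pub struct NeverTrip;

impl CircuitPolicy for NeverTrip {
    fn on_success(&mut self) -> bool { true }
    fn on_failure(&mut self) -> bool { false }
    fn should_probe(&self, _open_for: Duration) -> bool { true }
    fn on_half_open(&mut self) {}
}

/// Generic code only sees the trait, so any policy can be plugged in.
pub fn failures_until_trip<P: CircuitPolicy>(policy: &mut P, max: u32) -> Option<u32> {
    (1..=max).find(|_| policy.on_failure())
}
```

A manual on-off switch or a latency-based trigger would be further implementations of the same trait, leaving the middleware shape unchanged.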
@Mattbusel
Author

Pushed ddb88ba: the CircuitPolicy trait is extracted and the budget relationship is documented. Ready for another look.
