Timeouts cascade across microservices

Scenario

Service C slows down. B waits 30s for C, A waits 30s for B, the gateway waits 60s for A—users see 504s and every tier’s thread pools are full of blocked waiters. Retries double the load. One slow leaf turns into a system-wide outage. You need a timeout budget, aligned retries, and circuit breakers so failures fail fast at the right layer.

Why — threads wait the full timeout at every hop

A timeout cascade happens when several services each hold resources (threads, connections) for their full configured timeout while waiting on a slow or failed service deeper in the call chain. The user sees one slow request; internally, every hop in an N-hop chain has capacity blocked for its full timeout, and retry traffic amplifies the blast radius. The result can be a metastable failure: one that persists after the root cause heals.
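
To put rough numbers on the blocked capacity, the sketch below applies Little's law (average in-flight requests = arrival rate × time each request is held). The request rate and pool size are made-up values for illustration, not measurements from any real system.

```python
def blocked_threads(requests_per_second: float, hold_seconds: float) -> float:
    """Little's law: average number of requests in flight (threads held)."""
    return requests_per_second * hold_seconds


# Hypothetical numbers: 50 req/s into one service with a 200-thread pool.
RATE = 50.0
POOL_SIZE = 200

healthy = blocked_threads(RATE, 0.1)   # dependency answers in 100 ms
stalled = blocked_threads(RATE, 30.0)  # every call waits the full 30 s timeout
seconds_to_exhaust = POOL_SIZE / RATE  # time until the pool is full once calls stall

print(f"healthy: about {healthy:.0f} threads busy")
print(f"stalled: about {stalled:.0f} threads needed, pool has {POOL_SIZE}")
print(f"pool is exhausted after about {seconds_to_exhaust:.0f} s")
```

The same arithmetic applies at every hop that waits on the slow service, which is why one slow leaf can fill the pools of every tier above it.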

Common mistakes

Mistake | Effect
Same 30s timeout everywhere | Each hop can consume 30s; with retries the edge can wait minutes
Gateway timeout < sum of internal timeouts | 504 to client while backends are still working
Retry on timeout without idempotency | Duplicate work (see: idempotency)
Retry storm on 503 | Load multiplier (see: 429 guide)
No deadline propagation | Downstream does not know the caller has 200ms left
No breaker on slow dependency | Callers keep hitting a sick service (see: circuit breaker)

What — see the cascade in traces

  1. Waterfall shows stacked blocking: B’s HTTP client span to C is 29s, A’s call to B is 29s, and there is little CPU work in either (see: distributed trace).
  2. Thread dumps: many threads sit in a socket read to the same downstream host, and the pool is exhausted.
  3. Error pattern: a mix of client timeouts, ReadTimeoutException inside services, and 504 Gateway Timeout at the edge.
  4. Retry metrics: client and gateway retry counts spike with the incident.
  5. Map timeout config: build a table per service covering connect vs read timeout, gateway timeout, and JDBC timeout (often configured separately).

How — timeout budget and resilience stack

1. Deadline budget (end-to-end)

Start at the edge with max user-facing latency (e.g. 3s). Subtract overhead per hop.

Edge (gateway):     3000 ms total
  Service A:        2500 ms (must finish before gateway gives up)
    Service B:        2000 ms
      Service C:      1500 ms
        DB:           500 ms statement timeout

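Each hop’s timeout must be strictly less than its caller’s remaining budget, not equal to it, so the caller still has time to handle the failure and respond before its own deadline expires.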
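
The budget above can be derived mechanically. This is a minimal sketch that assumes a fixed 500 ms margin per hop (the margin and the service names are illustrative); it reproduces the A/B/C numbers shown. The DB statement timeout is a separate cap and is not computed here.

```python
from __future__ import annotations


def allocate_budget(edge_ms: int, hops: list[str], margin_ms: int) -> dict[str, int]:
    """Give each hop its caller's timeout minus a fixed margin."""
    budget: dict[str, int] = {}
    remaining = edge_ms
    for hop in hops:
        remaining -= margin_ms
        if remaining <= 0:
            # The chain is too deep for this edge budget and margin.
            raise ValueError(f"no budget left for {hop}")
        budget[hop] = remaining
    return budget


plan = allocate_budget(3000, ["service-a", "service-b", "service-c"], margin_ms=500)
for name, timeout_ms in plan.items():
    print(f"{name}: {timeout_ms} ms")
```

A fixed margin is the simplest choice; a real allocation would more likely be based on measured per-hop latency.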

2. Propagate remaining time
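
One way to propagate remaining time is to carry it in a request header and have each hop rebuild a local deadline from it. The sketch below assumes a hypothetical header name, `X-Deadline-Remaining-Ms`, and a 50 ms margin; neither is a standard. gRPC carries deadlines natively, so this pattern matters mainly for plain HTTP calls.

```python
import time

# Hypothetical header name; use whatever your platform standardizes on.
DEADLINE_HEADER = "X-Deadline-Remaining-Ms"


class Deadline:
    """Tracks how much of the end-to-end budget is left for this request."""

    def __init__(self, budget_ms: int, clock=time.monotonic_ns):
        self._clock = clock  # injectable so tests can control time
        self._expires_at_ns = clock() + budget_ms * 1_000_000

    def remaining_ms(self) -> int:
        return max(0, (self._expires_at_ns - self._clock()) // 1_000_000)

    def expired(self) -> bool:
        return self.remaining_ms() == 0

    def outgoing_headers(self, margin_ms: int = 50) -> dict:
        """Header for the next hop: what is left, minus a margin for the
        network round trip and this hop's own response handling."""
        return {DEADLINE_HEADER: str(max(0, self.remaining_ms() - margin_ms))}

    @classmethod
    def from_headers(cls, headers: dict, default_ms: int, clock=time.monotonic_ns):
        """Adopt the caller's remaining time, never exceeding the local default."""
        raw = headers.get(DEADLINE_HEADER, "")
        budget_ms = min(int(raw), default_ms) if raw.isdigit() else default_ms
        return cls(budget_ms, clock)
```

A service would call `Deadline.from_headers(...)` when a request arrives, check `expired()` before starting expensive work, and attach `outgoing_headers()` to every downstream call.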

3. Connect vs read timeout

connectTimeout: 1-2s    # fail fast if host dead
readTimeout:    min(remainingBudget, per-hop cap)
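
As a sketch of the rule above, the helper below turns a remaining budget into a (connect, read) pair. The 1 s connect timeout and 2 s per-hop cap are assumed values. The Python `requests` library, for example, accepts such a pair as `timeout=(connect, read)`; note that many HTTP clients apply the read timeout to each socket read rather than to the whole response, so an overall deadline check is still needed.

```python
from __future__ import annotations

CONNECT_TIMEOUT_S = 1.0  # assumed value inside the 1-2 s range above
READ_CAP_S = 2.0         # assumed per-hop cap


def timeouts_for_call(remaining_budget_s: float) -> tuple[float, float]:
    """Return (connect_timeout, read_timeout) in seconds for one downstream call."""
    if remaining_budget_s <= 0:
        # Do not start a call that cannot finish in time.
        raise TimeoutError("deadline already exceeded; skipping downstream call")
    connect = min(CONNECT_TIMEOUT_S, remaining_budget_s)
    read = min(remaining_budget_s, READ_CAP_S)
    return (connect, read)


print(timeouts_for_call(5.0))
print(timeouts_for_call(0.4))
```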

4. Retries (strict rules)
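
A minimal sketch of retry rules consistent with this guide: retry only idempotent operations, back off with jitter, and never sleep past the remaining deadline. The function name, parameters, and defaults are illustrative assumptions, and `send` and `remaining_ms` are caller-supplied callables. Treating PUT and DELETE as safe to retry assumes the service actually implements them idempotently.

```python
import random
import time

# Methods that HTTP semantics define as idempotent (TRACE omitted).
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}


def call_with_retries(method, send, remaining_ms, max_attempts=3,
                      base_delay_ms=100, sleep=time.sleep, rng=random.random):
    """Call send(); retry timeouts only for idempotent methods.

    remaining_ms is a callable returning the milliseconds left in the
    request's deadline; sleep and rng are injectable for testing.
    """
    attempts = max_attempts if method.upper() in IDEMPOTENT_METHODS else 1
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts (or a non-idempotent call)
            # Exponential backoff with full jitter.
            delay_ms = rng() * base_delay_ms * 2 ** (attempt - 1)
            if delay_ms >= remaining_ms():
                raise  # waiting would outlive the deadline; fail now
            sleep(delay_ms / 1000.0)
```

Retrying at only one layer (for example, the caller closest to the failing dependency) keeps the load multiplier small; retrying at every hop multiplies it.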

5. Bulkhead + circuit breaker (per dependency)

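Limit concurrent calls to each dependency (bulkhead) and open a circuit breaker when its slow-call or failure rate crosses a threshold. Callers then fail fast instead of occupying every thread for the full timeout window (see: circuit breaker guide).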
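
The sketch below combines both ideas for a single dependency. It is deliberately simplified: it opens on consecutive failures rather than a slow-call or failure rate, and its half-open state does not limit trial calls to exactly one. The class and parameter names are assumptions; in production a maintained library (Resilience4j on the JVM, for instance) is the more likely choice.

```python
import threading
import time


class BreakerOpen(Exception):
    """Raised when the breaker is open and the call is rejected immediately."""


class BulkheadFull(Exception):
    """Raised when the concurrency limit for this dependency is reached."""


class GuardedDependency:
    def __init__(self, max_concurrent=10, failure_threshold=5,
                 open_seconds=30.0, clock=time.monotonic):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._failure_threshold = failure_threshold
        self._open_seconds = open_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the breaker is closed

    def call(self, fn):
        with self._lock:
            if self._opened_at is not None:
                if self._clock() - self._opened_at < self._open_seconds:
                    raise BreakerOpen("failing fast; dependency marked unhealthy")
                # Cool-down elapsed: allow traffic again, but one more
                # failure re-opens the breaker.
                self._opened_at = None
                self._failures = self._failure_threshold - 1
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull("too many concurrent calls to this dependency")
        try:
            result = fn()
        except Exception:
            with self._lock:
                self._failures += 1
                if self._failures >= self._failure_threshold:
                    self._opened_at = self._clock()
            raise
        else:
            with self._lock:
                self._failures = 0
            return result
        finally:
            self._slots.release()
```

Using one `GuardedDependency` instance per downstream service keeps a sick dependency from consuming the threads needed for healthy ones.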

6. Cancel work when client disconnects

Servlet/async: abort downstream calls when the client closes the connection, which frees threads earlier.
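
The same idea in asyncio terms, as a rough analogue of the servlet case: race the downstream call against a disconnect signal and cancel whichever loses. The `client_disconnected` event is hypothetical; how a disconnect is detected depends on the web framework in use.

```python
import asyncio


async def handle_request(call_downstream, client_disconnected: asyncio.Event):
    """Run the downstream call, but cancel it if the client goes away first.

    call_downstream is a coroutine function; client_disconnected is an
    event the (hypothetical) framework sets when the connection closes.
    """
    work = asyncio.create_task(call_downstream())
    gone = asyncio.create_task(client_disconnected.wait())
    done, _pending = await asyncio.wait(
        {work, gone}, return_when=asyncio.FIRST_COMPLETED
    )
    if work in done:
        gone.cancel()
        return work.result()
    # Client left: stop the downstream call instead of waiting for it.
    work.cancel()
    await asyncio.gather(work, return_exceptions=True)  # let cancellation finish
    return None
```

Cancellation only helps if the downstream client honors it by closing or releasing its connection, so it is worth confirming that behavior for the HTTP client in use.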

7. JDBC and messaging

Reference chain (502 guide)

client_timeout > gateway_timeout > service_read_timeout > db_timeout

Document this chain in the repo README or platform standards (see: 502 healthy pods).
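
Documentation drifts, so the chain can also be checked automatically, for example in CI. This sketch assumes the timeouts have already been collected into an ordered list from outermost to innermost; the layer names and values are illustrative. It enforces the strict "less than, not equal" rule from the deadline budget section.

```python
from __future__ import annotations


def check_timeout_chain(timeouts_ms: list[tuple[str, int]]) -> list[str]:
    """Return one message per inner timeout that is not shorter than its caller's."""
    problems = []
    for (outer, outer_ms), (inner, inner_ms) in zip(timeouts_ms, timeouts_ms[1:]):
        if inner_ms >= outer_ms:
            problems.append(
                f"{inner} ({inner_ms} ms) must be < {outer} ({outer_ms} ms)"
            )
    return problems


# Illustrative configuration, ordered from the outermost caller inward.
chain = [("client", 5000), ("gateway", 3000), ("service_read", 2500), ("db", 500)]
for problem in check_timeout_chain(chain):
    print(problem)
```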

Verify

  1. Integration test: slow C → A returns within gateway budget; does not wait 30s×3.
  2. Load test with retry disabled vs enabled—measure amplification.
  3. Trace shows decreasing deadline per hop.
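
For the retry load test, it helps to know the worst case before measuring. If each of n hops retries a failed call up to r times, one user request can produce up to (1 + r)^n calls at the leaf. The sketch below computes that bound and the measured ratio; the function names and request counts are illustrative.

```python
def worst_case_leaf_calls(retries_per_hop: int, retrying_hops: int) -> int:
    """Upper bound on leaf calls per user request when every hop retries."""
    return (1 + retries_per_hop) ** retrying_hops


def measured_amplification(leaf_requests: int, edge_requests: int) -> float:
    """Ratio of requests seen at the leaf to requests sent at the edge."""
    if edge_requests <= 0:
        raise ValueError("edge_requests must be positive")
    return leaf_requests / edge_requests


# Gateway, A, and B each retry twice: up to 3 * 3 * 3 calls reach C.
print(worst_case_leaf_calls(retries_per_hop=2, retrying_hops=3))
# Hypothetical load-test counters: 100 edge requests, 2700 seen at C.
print(measured_amplification(leaf_requests=2700, edge_requests=100))
```

Comparing the measured ratio with retries disabled and enabled shows how close the system gets to the worst case under failure.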

Interview one-liner

“I set an end-to-end deadline at the edge and give each hop a shorter timeout than its caller, propagate remaining time, limit retries to idempotent ops with backoff, and use bulkheads and circuit breakers so one slow service does not hold every thread for the full 30 seconds.”
