Timeouts cascade across microservices
Scenario
Service C slows down. B waits 30s for C, A waits 30s for B, the gateway waits 60s for A—users see 504s and every tier’s thread pools are full of blocked waiters. Retries double the load. One slow leaf turns into a system-wide outage. You need a timeout budget, aligned retries, and circuit breakers so failures fail fast at the right layer.
After reading, you should be able to:
- Explain why independent 30s timeouts per hop do not add up safely.
- Design a deadline budget from edge to leaf services.
- Pair timeouts with bulkheads, limited retries, and idempotency.
- Read traces for stacked waits and fix the deepest slow span first.
Why — threads wait the full timeout at every hop
A timeout cascade happens when many services each hold resources (threads, connections) for their full configured timeout while waiting on a deeper slow or failed service. The user sees one slow request; internally you have N × timeout worth of blocked capacity, and retry traffic amplifies the blast radius. The result can be a metastable failure: one that persists after the root cause heals.
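To put rough numbers on "N × timeout worth of blocked capacity", the sketch below applies Little's Law (in-flight requests = arrival rate × time in system) and a worst-case retry multiplier. The traffic figures are hypothetical, chosen only to illustrate the arithmetic.

```python
def blocked_threads(requests_per_second: float, wait_seconds: float) -> float:
    """Little's Law: average in-flight requests = arrival rate x time in system."""
    return requests_per_second * wait_seconds


def retry_amplification(retries_per_hop: int, hops: int) -> int:
    """Worst-case leaf calls per user request when every hop retries every failure."""
    return (1 + retries_per_hop) ** hops


# Hypothetical load: 50 req/s arriving at a hop whose downstream is hung.
print(blocked_threads(50, 30))    # threads held when the timeout is 30 s
print(blocked_threads(50, 1.5))   # threads held when the timeout is 1.5 s
print(retry_amplification(2, 3))  # 2 retries at each of 3 hops
```

Under these assumed numbers a 30 s timeout pins 1,500 threads where a 1.5 s timeout pins 75, and two retries at each of three hops can turn one user request into 27 calls at the leaf.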
Common mistakes
| Mistake | Effect |
|---|---|
| Same 30s timeout everywhere | Each hop can block for 30s; with retries the edge waits minutes |
| Gateway timeout < sum of internal timeouts | 504 to client while backends still working |
| Retry on timeout without idempotency | Duplicate work — idempotency |
| Retry storm on 503 | Load multiplier — 429 guide |
| No deadline propagation | Downstream does not know caller has 200ms left |
| No breaker on slow dependency | Keep hitting sick service — circuit breaker |
What — see the cascade in traces
- Waterfall shows stacked blocking — B’s HTTP client span to C is 29s; A’s call to B is 29s; little CPU work — distributed trace.
- Thread dumps — many threads in socket read to same downstream host — pool exhausted.
- Error pattern: mix of timeouts, 504, ReadTimeoutException, 504 Gateway Timeout at edge.
- Retry metrics: client/gateway retry count spikes with incident.
- Map timeout config — table per service: connect vs read timeout, gateway, JDBC (often separate).
How — timeout budget and resilience stack
1. Deadline budget (end-to-end)
Start at the edge with max user-facing latency (e.g. 3s). Subtract overhead per hop.
Edge (gateway): 3000 ms total
Service A: 2500 ms (must finish before gateway gives up)
Service B: 2000 ms
Service C: 1500 ms
DB: 500 ms statement timeout
Each hop’s timeout must be less than its caller’s remaining budget—not equal.
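The budget above can be derived mechanically. The following is a minimal sketch that assumes a fixed margin is subtracted at every hop; the function name and the 500 ms margin are illustrative choices, and the DB statement timeout is still picked separately as a slice of the leaf's budget.

```python
from __future__ import annotations


def allocate_budget(edge_ms: int, hops: list[str], margin_ms: int) -> dict[str, int]:
    """Give each hop its caller's budget minus a fixed margin (strictly smaller)."""
    budgets: dict[str, int] = {}
    remaining = edge_ms
    for name in hops:
        remaining -= margin_ms
        if remaining <= 0:
            raise ValueError(f"no budget left for {name}")
        budgets[name] = remaining
    return budgets


budgets = allocate_budget(3000, ["service_a", "service_b", "service_c"], margin_ms=500)
print(budgets)  # service_a: 2500, service_b: 2000, service_c: 1500
```

A fixed margin is the simplest policy; a real system might instead size each margin from measured per-hop overhead.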
2. Propagate remaining time
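- gRPC: the deadline travels with the call; reuse the incoming call context for outbound calls so it propagates (automatic in some language stacks, explicit in others).
- HTTP: custom header X-Deadline-Ms or use service mesh (Envoy) timeout per route.
- Before outbound call: remaining = deadline - now(); set client timeout to remaining.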
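For the HTTP case, the remaining-time calculation might look like the sketch below. It assumes X-Deadline-Ms carries an absolute deadline in epoch milliseconds, which the text does not specify; the helper names are hypothetical, and now_ms is injectable only to make the logic testable.

```python
import time

DEADLINE_HEADER = "X-Deadline-Ms"  # header name from the text; semantics are assumed


def _now_ms(now_ms=None) -> int:
    return int(time.time() * 1000) if now_ms is None else now_ms


def remaining_ms(headers: dict, default_budget_ms: int, now_ms=None) -> int:
    """Milliseconds left before the caller's deadline (0 if already past)."""
    raw = headers.get(DEADLINE_HEADER)
    if raw is None:
        # First hop, or a caller that does not propagate: fall back to a default.
        return default_budget_ms
    return max(0, int(raw) - _now_ms(now_ms))


def outbound_headers(headers: dict, default_budget_ms: int, now_ms=None) -> dict:
    """Forward the same absolute deadline, creating one if none arrived."""
    deadline = headers.get(DEADLINE_HEADER)
    if deadline is None:
        deadline = _now_ms(now_ms) + default_budget_ms
    return {DEADLINE_HEADER: str(int(deadline))}
```

An absolute deadline depends on reasonably synchronized clocks between services. Sending the remaining duration instead, and subtracting elapsed time at each hop, avoids that dependency at the cost of ignoring network transit time.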
3. Connect vs read timeout
connectTimeout: 1-2s   # fail fast if host dead
readTimeout: min(remainingBudget, per-hop cap)
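As a sketch of how those two values could be computed per call, the hypothetical helper below returns a (connect, read) pair in seconds. That tuple shape matches what some HTTP clients accept (the Python requests library takes timeout=(connect, read), for example); the default caps are assumptions.

```python
def client_timeouts(remaining_budget_s: float,
                    per_hop_cap_s: float,
                    connect_cap_s: float = 1.0) -> tuple:
    """Return (connect, read) timeouts in seconds for one outbound call."""
    if remaining_budget_s <= 0:
        # Nothing left to spend: fail now instead of making a doomed call.
        raise TimeoutError("deadline already exceeded; skip the call")
    read = min(remaining_budget_s, per_hop_cap_s)
    # Connecting should never be allowed longer than the whole read budget.
    connect = min(connect_cap_s, read)
    return (connect, read)


print(client_timeouts(5.0, per_hop_cap_s=2.0))  # plenty of budget: caps apply
print(client_timeouts(0.2, per_hop_cap_s=2.0))  # 200 ms left: budget applies
```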
4. Retries (strict rules)
- Max 1–2 retries; exponential backoff + jitter.
- Only idempotent operations or with Idempotency-Key.
- Do not retry if deadline nearly exhausted.
- Do not retry timeouts on POST without dedupe.
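These rules can be combined in one wrapper. The sketch below is an assumption-laden illustration rather than a library API: call_with_retries and its parameters are hypothetical, the idempotent flag stands in for "safe method or request carries an Idempotency-Key", and the deadline is a value on the monotonic clock.

```python
import random
import time


def call_with_retries(op, *, idempotent, deadline, max_retries=2,
                      base_delay=0.1, min_attempt_s=0.05,
                      clock=time.monotonic, sleep=time.sleep):
    """Run op(), retrying timeouts only when it is safe and the budget allows."""
    attempt = 0
    while True:
        try:
            return op()
        except TimeoutError:
            # Never retry non-idempotent work, and cap the number of retries.
            if not idempotent or attempt >= max_retries:
                raise
            # Exponential backoff with full jitter: random delay up to the cap.
            delay = random.uniform(0, base_delay * (2 ** attempt))
            # Skip the retry if the deadline leaves no room for another attempt.
            if clock() + delay + min_attempt_s > deadline:
                raise
            sleep(delay)
            attempt += 1
```

A caller would pass deadline=time.monotonic() + remaining_seconds, so the retry loop and the per-call timeout draw from the same budget.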
5. Bulkhead + circuit breaker (per dependency)
Limit concurrent calls to each dependency and open the breaker when its slow-call or failure rate crosses a threshold. This stops a sick dependency from occupying every thread for the full timeout window (see the circuit breaker guide).
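A deliberately simplified sketch of the two mechanisms working together follows. The class and exception names are hypothetical, and the breaker counts consecutive failures rather than a slow-call or failure rate over a window, which a production library would normally track.

```python
import threading
import time


class BulkheadFull(Exception):
    """Raised when the dependency already has max_concurrent calls in flight."""


class CircuitOpen(Exception):
    """Raised while the breaker is open and calls are rejected immediately."""


class GuardedDependency:
    def __init__(self, max_concurrent, failure_threshold, open_seconds,
                 clock=time.monotonic):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._failure_threshold = failure_threshold
        self._open_seconds = open_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, op):
        with self._lock:
            if self._opened_at is not None:
                if self._clock() - self._opened_at < self._open_seconds:
                    raise CircuitOpen("dependency marked unhealthy")
                # Cool-down elapsed: allow a trial call; one failure reopens.
                self._opened_at = None
                self._failures = self._failure_threshold - 1
        # Bulkhead: reject instead of queueing when all slots are taken.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull("too many concurrent calls")
        try:
            result = op()
        except Exception:
            with self._lock:
                self._failures += 1
                if self._failures >= self._failure_threshold:
                    self._opened_at = self._clock()
            raise
        else:
            with self._lock:
                self._failures = 0
            return result
        finally:
            self._slots.release()
```

One instance per downstream dependency keeps a slow service C from consuming the slots that calls to healthy dependencies need. After the cool-down this sketch may admit several trial calls at once; a stricter half-open state would admit exactly one.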
6. Cancel work when client disconnects
Servlet/async: abort downstream calls when client closes connection—frees threads earlier.
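The same idea in an async runtime can be sketched with Python's asyncio: race the downstream call against a disconnect signal and cancel whichever loses. How the disconnect is detected is framework-specific, so the client_gone event here is an assumed stand-in, and call_downstream simulates a slow dependency.

```python
import asyncio


async def call_downstream(log: list) -> str:
    """Stand-in for a slow downstream call; records when it gets cancelled."""
    try:
        await asyncio.sleep(30)
        return "response"
    except asyncio.CancelledError:
        log.append("downstream cancelled")
        raise


async def handle_request(client_gone: asyncio.Event, log: list):
    """Race the downstream call against the client-disconnect signal."""
    downstream = asyncio.create_task(call_downstream(log))
    disconnect = asyncio.create_task(client_gone.wait())
    done, pending = await asyncio.wait(
        {downstream, disconnect}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # frees the downstream call as soon as the client leaves
    await asyncio.gather(*pending, return_exceptions=True)
    return downstream.result() if downstream in done else None


async def demo() -> list:
    log = []
    client_gone = asyncio.Event()
    request = asyncio.create_task(handle_request(client_gone, log))
    await asyncio.sleep(0.01)  # the request is now in flight
    client_gone.set()          # simulate the client closing the connection
    await request
    return log


print(asyncio.run(demo()))
```

Cancellation only helps if the downstream client honors it; a blocking socket read on a pooled thread still needs its own read timeout.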
7. JDBC and messaging
- queryTimeout / Hikari connectionTimeout within DB slice of budget.
- Kafka consumer: max.poll.interval.ms aligned with processing time.
Reference chain (502 guide)
client_timeout > gateway_timeout > service_read_timeout > db_timeout
Document in repo README or platform standards — see 502 healthy pods.
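If the chain is documented as data, it can also be checked in CI. The sketch below uses the strict form of the rule (each layer shorter than its caller, not equal); the function name and the sample values, including the 5000 ms client timeout, are hypothetical.

```python
from __future__ import annotations


def check_timeout_chain(timeouts_ms: dict[str, int], order: list[str]) -> list[str]:
    """Return violations of the outer > inner rule; an empty list means valid."""
    problems = []
    for outer, inner in zip(order, order[1:]):
        if timeouts_ms[outer] <= timeouts_ms[inner]:
            problems.append(
                f"{outer} ({timeouts_ms[outer]} ms) must exceed "
                f"{inner} ({timeouts_ms[inner]} ms)"
            )
    return problems


ORDER = ["client", "gateway", "service_read", "db"]
good = {"client": 5000, "gateway": 3000, "service_read": 2500, "db": 500}
bad = {"client": 5000, "gateway": 3000, "service_read": 30000, "db": 500}

print(check_timeout_chain(good, ORDER))
print(check_timeout_chain(bad, ORDER))
```

The second configuration reports the gateway/service_read pair, which is the "504 to client while backends still working" case from the mistakes table.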
Verify
- Integration test: slow C → A returns within gateway budget; does not wait 30s×3.
- Load test with retry disabled vs enabled—measure amplification.
- Trace shows decreasing deadline per hop.
Interview one-liner
“I set an end-to-end deadline at the edge and give each hop a shorter timeout than its caller, propagate remaining time, limit retries to idempotent ops with backoff, and use bulkheads and circuit breakers so one slow service does not hold every thread for the full 30 seconds.”