Timeouts cascade across microservices

Scenario

Service C slows down. B waits 30s for C, A waits 30s for B, and the gateway waits 60s for A. Users see 504s, and every tier’s thread pool fills with blocked waiters. Retries multiply the load on the service that is already struggling. One slow leaf turns into a system-wide outage. You need a timeout budget, aligned retries, and circuit breakers so that requests fail fast at the right layer.

After reading, you should be able to:

Explain why independent 30s timeouts per hop do not add up safely.
Design a deadline budget from edge to leaf services.
Pair timeouts with bulkheads, limited retries, and idempotency.
Read traces for stacked waits and fix the deepest slow span first.

Why — threads wait the full timeout at every hop

A timeout cascade happens when many services each hold resources (threads, connections) for their full configured timeout while waiting on a deeper slow or failed service. The user sees one slow request; internally you have N × timeout worth of blocked capacity, because each of the N hops on the call path holds a thread or connection until its own timeout fires. Retry traffic amplifies the blast radius, and the result can be a metastable failure that persists after the root cause heals.
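To make the blocked-capacity claim concrete, here is a back-of-the-envelope sketch using Little's law (average in-flight requests = arrival rate × time each request is held). The request rate and pool size are made-up numbers for illustration, not measurements from any real system.

```python
def blocked_threads(requests_per_second: float, wait_seconds: float) -> float:
    """Little's law: average in-flight requests = arrival rate x time in system."""
    return requests_per_second * wait_seconds


# Hypothetical tier: 50 req/s arriving, 200 worker threads available.
POOL_SIZE = 200

for label, wait in [("healthy, 0.1s per call", 0.1), ("C hung, 30s timeout", 30.0)]:
    needed = blocked_threads(50, wait)
    status = "ok" if needed <= POOL_SIZE else "pool exhausted"
    print(f"{label}: about {needed:.0f} threads needed -> {status}")
```

With the same traffic, moving from a 0.1s wait to a 30s wait raises the thread demand by a factor of 300, which is why a pool sized for normal latency fills almost immediately.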

Common mistakes

Mistake	Effect
Same 30s timeout everywhere	Each hop can consume 30s; with retries stacked on top, the edge can wait minutes
Gateway timeout < sum of internal timeouts	504 to the client while backends are still working
Retry on timeout without idempotency	Duplicate work (see the idempotency guide)
Retry storm on 503	Load multiplier (see the 429 guide)
No deadline propagation	Downstream does not know the caller has only 200ms left
No breaker on slow dependency	Callers keep hitting a sick service (see the circuit breaker guide)

What — see the cascade in traces

Distributed trace waterfall shows stacked blocking: B’s HTTP client span to C is 29s, A’s call to B is 29s, and there is little CPU work in between.
Thread dumps: many threads sit in socket read against the same downstream host, and the pool is exhausted.
Error pattern: a mix of client-side read timeouts (for example ReadTimeoutException) inside the system and 504 Gateway Timeout at the edge.
Retry metrics: client and gateway retry counts spike with the incident.
Timeout config map: build a table per service covering connect vs read timeout, gateway timeout, and JDBC timeouts (often configured separately).
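One way to find the deepest slow span is to compute each span's self time (its duration minus the time spent in its direct children) and look for the largest value. The sketch below assumes a simplified, hypothetical span shape (`name`, `parent`, `duration_ms`) and sequential child calls; real tracing backends expose richer data and parallel children would need interval arithmetic instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    name: str
    parent: Optional[str]  # name of the parent span, None for the root
    duration_ms: float


def self_times(spans):
    """Self time = span duration minus the total duration of its direct children."""
    child_total = {}
    for s in spans:
        if s.parent is not None:
            child_total[s.parent] = child_total.get(s.parent, 0.0) + s.duration_ms
    return {s.name: s.duration_ms - child_total.get(s.name, 0.0) for s in spans}


# Illustrative trace: every hop looks slow, but only the leaf does the waiting.
spans = [
    Span("gateway -> A", None, 29_900),
    Span("A -> B", "gateway -> A", 29_500),
    Span("B -> C", "A -> B", 29_000),
    Span("C -> DB query", "B -> C", 28_800),
]

times = self_times(spans)
culprit = max(times, key=times.get)
print(culprit, times[culprit])
```

The outer spans have large durations but small self times, so fixing them first would not help; the span with the largest self time is where the wait actually happens.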

How — timeout budget and resilience stack

1. Deadline budget (end-to-end)

Start at the edge with max user-facing latency (e.g. 3s). Subtract overhead per hop.

Edge (gateway):     3000 ms total
  Service A:        2500 ms (must finish before gateway gives up)
    Service B:        2000 ms
      Service C:      1500 ms
        DB:           500 ms statement timeout

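Each hop’s timeout must be strictly less than its caller’s remaining budget, not equal to it. If the two are equal, the caller gives up at the same moment as the callee and has no time left to return a clean error or fallback.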
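The subtraction can be sketched as a small helper. The function name `hop_budgets` and the fixed 500 ms overhead per hop are assumptions chosen to reproduce the example numbers above; in practice the overhead would come from measured network and processing time.

```python
def hop_budgets(edge_ms: int, hops: list, overhead_ms: int) -> dict:
    """Give each hop the caller's budget minus a fixed overhead (illustrative)."""
    budgets = {}
    remaining = edge_ms
    for hop in hops:
        remaining -= overhead_ms
        if remaining <= 0:
            raise ValueError(f"no budget left for {hop}; shorten the chain or raise the edge limit")
        budgets[hop] = remaining
    return budgets


print(hop_budgets(3000, ["Service A", "Service B", "Service C"], overhead_ms=500))
```

Raising an error when the budget runs out is deliberate: a call chain that cannot fit inside the edge limit is a design problem, and it is better to find it at configuration time than during an incident.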

2. Propagate remaining time

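gRPC: the deadline travels with each call, but whether a server forwards it to its own outbound calls depends on the language. Some stacks do this by default when the incoming context is passed on (for example Java and Go); others need it enabled or passed explicitly, so verify it per service.
HTTP: use a custom header such as X-Deadline-Ms, or a service mesh (Envoy) timeout per route.
Before each outbound call: remaining = deadline - now(); set the client timeout to remaining, and fail immediately if remaining is zero or negative.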
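A minimal sketch of the HTTP variant follows. It assumes that `X-Deadline-Ms` carries the remaining milliseconds rather than an absolute timestamp (which sidesteps clock skew between hosts); the text above does not specify this, and the helper names, the 50 ms safety margin, and the `DeadlineExceeded` exception are all hypothetical.

```python
import time

HEADER = "X-Deadline-Ms"  # assumption: value is the REMAINING budget in milliseconds


class DeadlineExceeded(Exception):
    """Raised when there is no budget left for another outbound call."""


def deadline_from_headers(headers, default_ms, now=time.monotonic):
    """Turn the inbound header into an absolute deadline on the local monotonic clock."""
    remaining_ms = float(headers.get(HEADER, default_ms))
    return now() + remaining_ms / 1000.0


def outbound_call_settings(deadline, safety_margin_ms=50, now=time.monotonic):
    """Return (client timeout in seconds, headers to forward) for the next hop."""
    remaining_ms = (deadline - now()) * 1000.0 - safety_margin_ms
    if remaining_ms <= 0:
        raise DeadlineExceeded("caller's budget is already spent; fail fast")
    return remaining_ms / 1000.0, {HEADER: str(int(remaining_ms))}


# Usage: an inbound request arrives with 2000 ms left.
deadline = deadline_from_headers({"X-Deadline-Ms": "2000"}, default_ms=3000)
timeout_s, forward_headers = outbound_call_settings(deadline)
print(timeout_s, forward_headers)
```

The safety margin keeps a little time in reserve so this hop can still return an error to its caller after the downstream call times out.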

3. Connect vs read timeout

connectTimeout: 1-2s    # fail fast if host dead
readTimeout:    min(remainingBudget, per-hop cap)
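The rule above could be expressed as a helper that returns a `(connect, read)` pair. The function name and default values are assumptions for illustration. The tuple shape matches what the `requests` library accepts as `timeout=(connect, read)`; note that its read timeout bounds the wait between bytes rather than the whole response, so an overall deadline check is still needed.

```python
def client_timeouts(remaining_budget_s: float, per_hop_cap_s: float,
                    connect_timeout_s: float = 1.0) -> tuple:
    """Return (connect, read) timeouts in seconds for one outbound call."""
    # Read timeout: never more than what the caller has left, never more than the cap.
    read = min(remaining_budget_s, per_hop_cap_s)
    # Connect timeout: short by default, and never longer than the read timeout.
    connect = min(connect_timeout_s, read)
    return (connect, read)


print(client_timeouts(remaining_budget_s=0.8, per_hop_cap_s=2.0))
print(client_timeouts(remaining_budget_s=5.0, per_hop_cap_s=2.0))
```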

4. Retries (strict rules)

Max 1–2 retries; exponential backoff + jitter.
Only idempotent operations or with Idempotency-Key.
Do not retry if deadline nearly exhausted.
Do not retry timeouts on POST without dedupe.
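The four rules can be combined in one retry wrapper, sketched below. Everything here is illustrative: `send` is a hypothetical zero-argument callable that performs the request and raises `TimeoutError` on timeout, the header lookup is a plain case-sensitive dict check, and the thresholds are placeholder values.

```python
import random
import time

IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}


def may_retry(method: str, headers: dict) -> bool:
    """Retry only idempotent methods, or requests that carry an Idempotency-Key."""
    return method.upper() in IDEMPOTENT_METHODS or "Idempotency-Key" in headers


def call_with_retries(send, method, headers, deadline, max_retries=2,
                      base_delay_s=0.1, min_attempt_s=0.2,
                      now=time.monotonic, sleep=time.sleep):
    """Call send(); on timeout, retry with exponential backoff and full jitter."""
    attempt = 0
    while True:
        try:
            return send()
        except TimeoutError:
            if attempt >= max_retries or not may_retry(method, headers):
                raise
            # Full jitter: random delay between 0 and base * 2^attempt.
            delay = random.uniform(0, base_delay_s * 2 ** attempt)
            # Skip the retry if the deadline leaves too little time for a real attempt.
            if deadline - now() - delay < min_attempt_s:
                raise
            sleep(delay)
            attempt += 1
```

A POST without an `Idempotency-Key` fails on the first timeout instead of being resent, which matches the last rule in the list.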

5. Bulkhead + circuit breaker (per dependency)

Limit concurrent calls per dependency and open the breaker on a high slow-call or failure rate. This stops a sick dependency from occupying every thread for the full timeout window (see the circuit breaker guide).
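The combination can be sketched in a few lines. This is a deliberately simplified illustration, not a production breaker: it counts consecutive failures instead of tracking a slow-call or failure rate, its half-open state admits any caller rather than a single probe, and the class and parameter names are invented for the example. A maintained resilience library is usually the better choice.

```python
import threading
import time


class Rejected(Exception):
    """Raised immediately instead of waiting on a fenced-off dependency."""


class GuardedDependency:
    def __init__(self, max_concurrent=10, failure_threshold=5,
                 open_seconds=30.0, now=time.monotonic):
        self._slots = threading.BoundedSemaphore(max_concurrent)  # bulkhead
        self._lock = threading.Lock()
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed
        self._threshold = failure_threshold
        self._open_seconds = open_seconds
        self._now = now

    def call(self, fn):
        with self._lock:
            if self._opened_at is not None:
                if self._now() - self._opened_at < self._open_seconds:
                    raise Rejected("circuit open")
                # Cool-down elapsed: go half-open; one more failure reopens the circuit.
                self._opened_at = None
                self._failures = self._threshold - 1
        # Bulkhead: refuse at once rather than queueing behind slow calls.
        if not self._slots.acquire(blocking=False):
            raise Rejected("bulkhead full")
        try:
            result = fn()
        except Exception:
            with self._lock:
                self._failures += 1
                if self._failures >= self._threshold:
                    self._opened_at = self._now()
            raise
        else:
            with self._lock:
                self._failures = 0
            return result
        finally:
            self._slots.release()
```

Both rejection paths return in microseconds, so a caller thread is never parked for the full timeout once the dependency is known to be unhealthy or saturated.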

6. Cancel work when client disconnects

Servlet/async: abort downstream calls when the client closes the connection; this frees threads earlier.
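The same idea can be illustrated with Python's `asyncio` in place of the Servlet async API. In this sketch, `client_disconnected` is a hypothetical event that the web framework would set when the socket closes, and `handle_request` races it against the downstream call and the request budget.

```python
import asyncio


async def handle_request(call_downstream, client_disconnected: asyncio.Event,
                         budget_s: float):
    """Run the downstream call, but cancel it if the client leaves or the budget ends."""
    downstream = asyncio.create_task(call_downstream())
    gone = asyncio.create_task(client_disconnected.wait())
    try:
        done, _ = await asyncio.wait({downstream, gone}, timeout=budget_s,
                                     return_when=asyncio.FIRST_COMPLETED)
        if downstream in done:
            return downstream.result()
        return None  # client disconnected or budget exhausted
    finally:
        # Cancel whatever is still pending and wait for the cancellation to land.
        for task in (downstream, gone):
            task.cancel()
        await asyncio.gather(downstream, gone, return_exceptions=True)


async def demo():
    cancelled = []

    async def slow_c():
        try:
            await asyncio.sleep(30)  # stands in for a hung call to Service C
        except asyncio.CancelledError:
            cancelled.append(True)
            raise

    disconnected = asyncio.Event()
    asyncio.get_running_loop().call_later(0.05, disconnected.set)
    result = await handle_request(slow_c, disconnected, budget_s=3.0)
    return result, cancelled


print(asyncio.run(demo()))
```

The downstream call is cancelled about 50 ms in, when the simulated client disconnects, instead of holding its resources for the full 30 seconds.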

7. JDBC and messaging

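JDBC queryTimeout and Hikari connectionTimeout (the wait for a pooled connection) must both fit within the DB slice of the budget.
Kafka consumer: set max.poll.interval.ms above the worst-case processing time for one poll batch; otherwise the consumer is removed from the group and a rebalance starts.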

Reference chain (502 guide)

client_timeout > gateway_timeout > service_read_timeout > db_timeout

Document the chain in the repo README or platform standards (see the “502 healthy pods” guide).
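A chain like this is easy to check automatically, for example in CI against the documented values. The sketch below assumes the timeouts are available as a list of hypothetical `(name, milliseconds)` pairs ordered from outermost to innermost, and applies the stricter rule from the budget section that an inner timeout must be shorter than its caller's, not equal to it.

```python
def check_timeout_chain(timeouts_ms):
    """Return a list of violations; an empty list means the chain is consistent."""
    problems = []
    for (outer, outer_ms), (inner, inner_ms) in zip(timeouts_ms, timeouts_ms[1:]):
        if inner_ms >= outer_ms:
            problems.append(
                f"{inner} ({inner_ms} ms) must be shorter than {outer} ({outer_ms} ms)"
            )
    return problems


# Illustrative values only.
good = [("client", 5000), ("gateway", 3000), ("service_read", 2500), ("db", 500)]
bad = [("gateway", 3000), ("service_read", 30000), ("db", 500)]

print(check_timeout_chain(good))
print(check_timeout_chain(bad))
```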

Verify

Integration test: with C slowed down, A returns within the gateway budget and does not wait 30s × 3.
Load test with retries disabled vs enabled; measure the amplification.
Trace shows a decreasing deadline at each hop.
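Before running the load test, it helps to know the worst-case amplification to compare against. If every layer retries independently, the attempts multiply: one user request can reach the leaf up to the product of (1 + retries) across all layers. The helper below is a rough upper bound under that assumption, with illustrative retry counts.

```python
def worst_case_leaf_attempts(retries_per_layer: list) -> int:
    """Upper bound on leaf calls per user request when every layer retries."""
    total = 1
    for retries in retries_per_layer:
        total *= 1 + retries
    return total


# Gateway, A, and B each retry twice: 3 * 3 * 3 attempts can reach C.
print(worst_case_leaf_attempts([2, 2, 2]))  # 27
# Retry in only one layer, once: at most 2 attempts reach C.
print(worst_case_leaf_attempts([0, 0, 1]))  # 2
```

This is one reason to retry at a single layer where possible: the bound grows multiplicatively with every additional layer that retries.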

Interview one-liner

“I set an end-to-end deadline at the edge and give each hop a shorter timeout than its caller, propagate remaining time, limit retries to idempotent ops with backoff, and use bulkheads and circuit breakers so one slow service does not hold every thread for the full 30 seconds.”