Clients retry on 429 and make the outage worse

Scenario

You rate-limit to protect the API. Clients receive 429 Too Many Requests and retry immediately, often in sync, so effective load climbs to 2–10× the intended RPS. The service never recovers, and everyone sees errors. This retry storm is a classic metastable failure: even after the original spike passes, the retry traffic alone keeps the system overloaded. It is closely related to timeout cascades and to brief unresponsive windows.
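
Where the 2–10× figure comes from can be seen with a rough model (an assumption, not a measurement): suppose each attempt is rejected with probability p and a client makes up to R retries. Attempt k+1 happens only if the first k were rejected, so each logical request generates on average:

```latex
E[\text{attempts}] = \sum_{k=0}^{R} p^{k} = \frac{1 - p^{R+1}}{1 - p} \quad (p < 1),
\qquad E[\text{attempts}] \to R + 1 \ \text{as}\ p \to 1
```

With R = 3 and nearly every attempt rejected, load is about 4× the intended RPS; with R = 9 immediate retries, about 10×. The model still understates the problem, because the extra load raises p, which is exactly the feedback loop that keeps a metastable system stuck.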

After reading, you should be able to:

Design server 429 responses with Retry-After and clear limits.
Configure clients: backoff, jitter, max retries, respect 429 semantics.
Pair retries with idempotency and circuit breakers.
Detect retry amplification in metrics during incidents.

Why — retries turn protection into overload

Rate limiting protects your service by rejecting excess requests quickly and cheaply. If every rejected client retries immediately, and many clients share the same backoff schedule, traffic stays above capacity. A 429 means “slow down”; ignoring it defeats the limiter. Mobile apps, SDKs, gateways, and batch jobs may all retry automatically unless explicitly disciplined.

429 vs 503 vs 500 (client behavior)

Code	Meaning	Typical client action
429	Rate limit / quota exceeded	Back off for at least `Retry-After`, then retry
503	Unavailable / overload	Retry with backoff and jitter; honor `Retry-After` if present (naive retries risk a storm)
500	Server error	Retry only if idempotent

What — detect a retry storm

High 429 rate, moderate unique clients: the same clients are hammering the API.
Gateway access logs: the same client_id or IP sends many requests per second right after receiving 429s.
RPS above sustainable capacity even with limiting enabled: retries are multiplying the load.
Retry-After ignored: client metrics show fixed-interval retries regardless of the header value.
Downstream synchronized retries: a job scheduler retries all failed rows at once after a batch hits 429.
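
The gateway-log signal can be checked with a short script. This sketch assumes access-log lines have already been parsed into (timestamp_seconds, client_id, status) tuples sorted by time; the function name retry_hammerers, the 1-second window, and the threshold of 5 are illustrative assumptions, not part of any gateway's tooling.

```python
from collections import Counter


def retry_hammerers(log, window_s=1.0, threshold=5):
    """Return {client_id: count} for clients that keep sending right after a 429.

    `log` holds (timestamp_s, client_id, status) tuples sorted by timestamp
    (an assumed shape for parsed gateway access logs). A request counts as a
    follow-up if it arrives within `window_s` of that client's most recent 429.
    """
    last_429 = {}
    follow_ups = Counter()
    for ts, client, status in log:
        if client in last_429 and ts - last_429[client] <= window_s:
            follow_ups[client] += 1
        if status == 429:
            last_429[client] = ts
    return {client: n for client, n in follow_ups.items() if n >= threshold}
```

A well-behaved client shows up with few or no follow-ups, because it waits out Retry-After before sending again; a client that retries in a tight loop accumulates one follow-up per request.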

How — server and client design

Server: rate limit correctly

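Limit per API key, tenant, or IP, not only globally, so one noisy client cannot consume everyone's capacity (fairness).
Return 429 (not 503) when a limit is exceeded; it tells the client to throttle rather than signaling that the service is down.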
Include headers:

HTTP/1.1 429 Too Many Requests
Retry-After: 2
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1716123456

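Implement the limiter in a gateway (Kong, Envoy), as a token bucket in Redis, or in the app with Bucket4j. Whichever you choose, keep the rejection path fast and cheap. The X-RateLimit-* headers are a common convention rather than a formal standard; in the example above, X-RateLimit-Reset is a Unix timestamp in seconds.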
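
A minimal in-process sketch of the token-bucket approach, assuming a single server instance (once several instances must enforce one shared limit, the state belongs in Redis or the gateway). TokenBucket and check are hypothetical names, not Bucket4j's or any gateway's API. Mapping bucket capacity onto X-RateLimit-Limit is also an assumption, since APIs define that header differently, and X-RateLimit-Reset is omitted here.

```python
import math
import time


class TokenBucket:
    """Per-key token bucket: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.state = {}  # key -> (tokens, time of last update)

    def check(self, key):
        """Consume one token for `key`; return (allowed, response_headers)."""
        now = self.clock()
        tokens, last = self.state.get(key, (self.capacity, now))
        # Refill for the time elapsed since this key was last seen
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        headers = {"X-RateLimit-Limit": str(self.capacity)}
        if tokens >= 1:
            self.state[key] = (tokens - 1, now)
            headers["X-RateLimit-Remaining"] = str(int(tokens - 1))
            return True, headers
        self.state[key] = (tokens, now)
        # Whole seconds until one token is available (Retry-After takes integer seconds)
        headers["X-RateLimit-Remaining"] = "0"
        headers["Retry-After"] = str(math.ceil((1 - tokens) / self.rate))
        return False, headers
```

A request handler would call check(api_key) and, when allowed is False, return 429 with the returned headers. Rejection costs one dictionary lookup and a little arithmetic, which is the cheap path the limiter needs.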

Client: retry policy (required)

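# Python sketch; send() is a hypothetical callable returning (status, headers)
import random
import time

def call_with_retries(send, idempotent, max_attempts=3, base=1.0, cap=30.0):
    for attempt in range(max_attempts):  # max_attempts counts the first try
        status, headers = send()
        if status == 429:
            # Retry-After in seconds (assumes the delta-seconds form); cap at 60s
            wait = min(float(headers.get("Retry-After", base)), 60.0)
            wait += random.uniform(0, wait * 0.2)  # jitter only adds, never undercuts
        elif status in (500, 502, 503, 504) and idempotent:
            # exponential backoff with full jitter, capped at `cap` seconds
            wait = random.uniform(0, min(cap, base * 2 ** attempt))
        else:
            return status  # success, other 4xx, or non-idempotent 5xx: no retry
        if attempt + 1 < max_attempts:
            time.sleep(wait)
    return status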
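
A Retry-After value may be either a number of seconds or an HTTP-date (RFC 9110), so the client's parsing step should handle both. A minimal sketch, assuming the same 60-second cap as the policy above; parse_retry_after is a hypothetical helper, while parsedate_to_datetime is from Python's standard email.utils module.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None, cap=60.0):
    """Return seconds to wait for a Retry-After header value, clamped to [0, cap]."""
    value = value.strip()
    if value.isdigit():
        seconds = float(value)  # delta-seconds form, e.g. "2"
    else:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
        if now is None:
            now = datetime.now(timezone.utc)
        seconds = (when - now).total_seconds()
    # A date in the past means "retry now"; a huge value is capped
    return max(0.0, min(seconds, cap))
```

Clamping matters on both ends: a date in the past should not produce a negative sleep, and a misconfigured server should not park a client for hours.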

Rules of thumb

Rule	Why
Exponential backoff + jitter	Desynchronize clients
Cap max retries	Stop infinite loops
Respect `Retry-After`	Server tells you minimum wait
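Idempotency-Key on POST retries	Lets the server deduplicate a retried write (see the idempotency guide)
Circuit breaker after repeated 429	Stops sending while the quota is exhausted (see the circuit breaker guide)
No per-branch retries in parallel fan-out	Retry once per logical operation; retrying at every branch multiplies load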
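
One way to apply the "circuit breaker after repeated 429" rule, sketched under assumptions: RateLimitBreaker is a hypothetical class, not a library API, and the threshold and cooldown values are illustrative. This simplified version lets traffic through again after the cooldown and re-opens on the first further 429, rather than admitting exactly one probe.

```python
import time


class RateLimitBreaker:
    """Opens after `threshold` consecutive 429s.

    After `cooldown` seconds it lets requests through again (half-open);
    a single further 429 re-opens it, any other response closes it.
    """

    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.consecutive_429 = 0
        self.opened_at = None  # None means closed

    def allow_request(self):
        if self.opened_at is None:
            return True
        # Half-open once the cooldown has elapsed
        return self.clock() - self.opened_at >= self.cooldown

    def record(self, status):
        if status == 429:
            self.consecutive_429 += 1
            if self.consecutive_429 >= self.threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed probe
        else:
            self.consecutive_429 = 0
            self.opened_at = None  # any non-429 response closes the breaker
```

A caller checks allow_request() before sending and calls record(status) afterwards; when allow_request() returns False, the caller fails fast or queues the work instead of adding another attempt to the pile.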

SDK / platform guidance

Document client retry policy in API docs; provide reference implementation.
GraphQL/mobile: debounce user double-submits at the UI layer.
Batch workers: spread retries over minutes, not seconds.
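
For the batch-worker rule, a sketch of spreading retries: instead of re-queuing every failed row at the same moment, each row gets its own randomized retry time. schedule_retries, the 10-minute window, and the 30-second floor are hypothetical choices; a floor taken from the batch's Retry-After would be a natural substitute.

```python
import random


def schedule_retries(failed_ids, now, window_s=600.0, min_delay_s=30.0, rng=random):
    """Map each failed row id to its own retry time (epoch seconds).

    Every row waits at least `min_delay_s`, plus a uniform random offset within
    `window_s`, so retries trickle in over minutes instead of arriving together.
    """
    return {
        row_id: now + min_delay_s + rng.uniform(0.0, window_s)
        for row_id in failed_ids
    }
```

With 1,000 failed rows and a 600-second window, the retry rate averages under two rows per second instead of a single burst of 1,000.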

Incident response

Identify top retrying clients; contact owners to disable or fix backoff.
Apply temporarily stricter limits at the edge and shed non-critical routes.
Scale out only if capacity is the real constraint. If retries are the multiplier, scaling treats the symptom while clients keep multiplying the load.

Verify

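Load test: run clients with a bad retry policy and clients with a good one against the same limit; with the good policy, offered load should stay at or under the limit.
During an incident, the 429 rate should drop once the offending clients back off.
Metrics: retry_count per client_id stays bounded by the configured maximum.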
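
A real load test needs real traffic, but a toy simulation shows the effect it is looking for. The model below is an assumption-heavy sketch: every client needs one successful request, the limiter admits `capacity` requests per tick and rejects the rest with 429, and the two backoff functions are hypothetical stand-ins for a bad and a good client policy.

```python
import random


def simulate(n_clients, capacity, backoff, seed=0, max_ticks=10_000):
    """Toy model: return total requests sent (successes plus 429s).

    `backoff(failures, rng)` returns how many ticks a rejected client waits
    before its next attempt.
    """
    rng = random.Random(seed)
    next_try = [0] * n_clients  # tick of each client's next attempt
    failures = [0] * n_clients
    done = [False] * n_clients
    total = 0
    for tick in range(max_ticks):
        if all(done):
            break
        senders = [i for i in range(n_clients) if not done[i] and next_try[i] == tick]
        total += len(senders)
        rng.shuffle(senders)  # the limiter admits an arbitrary subset
        for rank, i in enumerate(senders):
            if rank < capacity:
                done[i] = True
            else:  # rejected with 429
                failures[i] += 1
                next_try[i] = tick + backoff(failures[i], rng)
    return total


def immediate(failures, rng):
    return 1  # bad policy: retry on the very next tick


def jittered_exponential(failures, rng):
    return rng.randint(1, min(2 ** failures, 64))  # good policy: capped backoff + jitter
```

With 100 clients and a capacity of 10 per tick, immediate retries send exactly 550 requests (100 + 90 + ... + 10). The jittered policy sends fewer, because clients that are still waiting do not add to the rejected load on the next tick.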

Interview one-liner

“On the server I return 429 with Retry-After and per-tenant limits; on the client I use exponential backoff with jitter, cap retries, only retry idempotent calls, and open a circuit after sustained 429—so rate limiting protects the service instead of triggering a retry storm.”