Clients retry on 429 and make the outage worse
Scenario
You rate-limit to protect the API. Clients receive 429 Too Many Requests, immediately retry—often in sync—and effective load becomes 2–10× intended RPS. The service never recovers; everyone sees errors. This retry storm is a classic metastable failure, related to timeout cascades and brief unresponsive windows.
After reading, you should be able to:
- Design server 429 responses with `Retry-After` and clear limits.
- Configure clients: backoff, jitter, max retries, and respect for 429 semantics.
- Pair retries with idempotency and circuit breakers.
- Detect retry amplification in metrics during incidents.
Why — retries turn protection into overload
Rate limiting protects your service by rejecting excess requests quickly and cheaply. If every rejected client retries immediately, and many clients share the same backoff schedule, traffic stays above capacity. 429 means “slow down”; ignoring it defeats the limiter. Mobile apps, SDKs, gateways, and batch jobs may all retry automatically unless their retry behavior is explicitly constrained.
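To illustrate why a shared schedule keeps traffic above capacity, here is a toy simulation. The numbers are illustrative assumptions: 1,000 clients are rejected at t = 0 and each retries once. In one case every client waits exactly 1 s. In the other, each client waits a random delay spread over 0 to 2 s. Counting arrivals per 100 ms shows the fixed schedule recreating the whole spike, while jitter flattens it.

```python
import random
from collections import Counter

def peak_per_bucket(retry_times_s, bucket_s=0.1):
    """Largest number of retries that land in any one bucket_s-wide time slot."""
    counts = Counter(int(t / bucket_s) for t in retry_times_s)
    return max(counts.values())

N_CLIENTS = 1000                       # illustrative: all rejected at t = 0
rng = random.Random(42)                # seeded so the run is repeatable

same_schedule = [1.0] * N_CLIENTS      # everyone waits exactly 1 s
jittered = [rng.uniform(0.0, 2.0) for _ in range(N_CLIENTS)]  # spread over 0-2 s

print("fixed 1 s delay, peak per 100 ms:", peak_per_bucket(same_schedule))  # 1000
print("jittered delay, peak per 100 ms:", peak_per_bucket(jittered))
```

With the fixed delay, the limiter sees the same 1,000-request burst one second later. With jitter, the same retries arrive as roughly 20 much smaller waves.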
429 vs 503 (client behavior)
| Code | Meaning | Typical client action |
|---|---|---|
| 429 | Rate limit / quota exceeded | Back off for at least `Retry-After` |
| 503 | Unavailable / overloaded | Retry with backoff and jitter (storm risk) |
| 500 | Server error | Retry only if idempotent |
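The table can be encoded as a small decision function. This is a sketch, not a standard. It follows the client retry policy in this guide, where every retried 5xx (including 503) requires an idempotent call. The set of transient status codes is an assumption to adapt to your API.

```python
from enum import Enum

class Action(Enum):
    WAIT_FOR_RETRY_AFTER = "back off for at least Retry-After, then retry"
    BACKOFF_AND_RETRY = "exponential backoff with jitter, then retry"
    DO_NOT_RETRY = "surface the error to the caller"

# Assumed set of transient server errors; adjust for your API.
TRANSIENT_5XX = {500, 502, 503, 504}

def retry_action(status: int, idempotent: bool) -> Action:
    """Map an error status code to a client action (call only for failures)."""
    if status == 429:
        return Action.WAIT_FOR_RETRY_AFTER  # rate limited: slow down, then retry
    if status in TRANSIENT_5XX:
        # Retrying a non-idempotent call may apply it twice, so only retry when safe.
        return Action.BACKOFF_AND_RETRY if idempotent else Action.DO_NOT_RETRY
    return Action.DO_NOT_RETRY  # other errors (e.g. 400, 404) will not fix themselves

print(retry_action(429, idempotent=False).name)  # WAIT_FOR_RETRY_AFTER
print(retry_action(500, idempotent=False).name)  # DO_NOT_RETRY
```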
What — detect a retry storm
- High 429 rate but only a moderate number of unique clients: the same clients are hammering the API.
- Gateway access logs: the same `client_id` / IP sends many requests per second after receiving 429s.
- RPS above sustainable capacity even after limiting is enabled: retries are multiplying the load.
- Retry headers: `Retry-After` is ignored, and fixed-interval retries are visible in client metrics.
- Downstream sync retries: a job scheduler retries all failed rows at once after a batch 429.
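Here is a minimal sketch of the access-log check. It assumes the logs have already been parsed into `(timestamp_s, client_id, status)` tuples, which is a hypothetical layout; real gateway log formats differ. For each client, it counts requests sent within a short window after that client's previous 429. The 10 s window and the threshold of 20 are illustrative values, not recommendations.

```python
from collections import defaultdict

def retry_storm_suspects(records, window_s=10.0, threshold=20):
    """Return {client_id: count} of requests sent within window_s of that
    client's most recent 429, keeping only clients at or above threshold."""
    last_429 = {}                   # client_id -> timestamp of its latest 429
    after_429 = defaultdict(int)    # client_id -> requests sent soon after a 429
    for ts, client, status in sorted(records):   # process in time order
        prev = last_429.get(client)
        if prev is not None and ts - prev <= window_s:
            after_429[client] += 1
        if status == 429:
            last_429[client] = ts
    return {c: n for c, n in after_429.items() if n >= threshold}

# Hypothetical sample: one client retries every 100 ms, one waits 15 s.
logs = [(i * 0.1, "batch-job-7", 429) for i in range(30)]
logs += [(0.0, "mobile-app", 429), (15.0, "mobile-app", 200)]
print(retry_storm_suspects(logs))  # {'batch-job-7': 29}
```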
How — server and client design
Server: rate limit correctly
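- Limit per API key, tenant, or IP, not only globally, so one noisy client cannot consume everyone's capacity (fairness).
- Return 429 (not 503) when a limit is exceeded. It tells the client to throttle, rather than signaling that the service is down.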
- Include headers:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 2
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1716123456
```
Implement this at the gateway (Kong, Envoy), with a Redis-backed token bucket, or in the application with Bucket4j. Keep the rejection path fail-fast and cheap.
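For intuition, here is a per-key token bucket in a few lines. It is a single-process, in-memory stand-in for what Kong, Envoy, Redis, or Bucket4j would do for you, and it does not use any of their APIs. `capacity`, `refill_per_s`, the injectable clock, and the header values are illustrative assumptions. When a key runs out of tokens, the sketch returns 429 with `Retry-After` set to the whole number of seconds until the next token.

```python
import math
import time

class TokenBucketLimiter:
    """Per-key token bucket: bursts up to `capacity`, refilled at `refill_per_s` tokens/s."""

    def __init__(self, capacity, refill_per_s, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.clock = clock
        self.buckets = {}  # key -> (tokens, time of last update)

    def check(self, key):
        """Return (status, headers) for one request from `key`."""
        now = self.clock()
        tokens, last = self.buckets.get(key, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_s)
        if tokens >= 1:
            self.buckets[key] = (tokens - 1, now)
            return 200, {"X-RateLimit-Limit": str(self.capacity),
                         "X-RateLimit-Remaining": str(int(tokens - 1))}
        self.buckets[key] = (tokens, now)
        # Seconds until one whole token is available, rounded up to an integer.
        retry_after = math.ceil((1 - tokens) / self.refill_per_s)
        return 429, {"Retry-After": str(retry_after),
                     "X-RateLimit-Limit": str(self.capacity),
                     "X-RateLimit-Remaining": "0"}

fake_now = [0.0]  # manual clock so the example is deterministic
limiter = TokenBucketLimiter(capacity=2, refill_per_s=0.5, clock=lambda: fake_now[0])
print([limiter.check("tenant-a")[0] for _ in range(3)])  # [200, 200, 429]
print(limiter.check("tenant-a")[1]["Retry-After"])       # 2
fake_now[0] = 2.0
print(limiter.check("tenant-a")[0])                      # 200 (one token refilled)
print(limiter.check("tenant-b")[0])                      # 200 (separate bucket)
```

Keeping one bucket per key gives the per-tenant fairness described above. A shared store such as Redis would be needed once several server instances enforce the same limit.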
Client: retry policy (required)
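```python
# Client retry policy sketch. `send` is a hypothetical zero-argument callable that
# returns a response object with `.status` and a dict-like `.headers`.
import random
import time

def call_with_retries(send, idempotent, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        resp = send()
        if resp.status == 429:
            # Retry-After assumed to be in seconds here; cap the wait at 60 s.
            wait = min(float(resp.headers.get("Retry-After", 1)), 60.0)
            wait += random.uniform(0, wait * 0.2)  # add up to 20% jitter
        elif resp.status in (500, 502, 503, 504) and idempotent:
            # Exponential backoff with full jitter: base 1 s, cap 30 s.
            wait = random.uniform(0, min(30.0, 2 ** (attempt - 1)))
        else:
            return resp  # success, or an error that must not be retried
        if attempt == max_attempts:
            return resp  # retries exhausted; surface the last error
        time.sleep(wait)
```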
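`Retry-After` can carry either a number of seconds or an HTTP-date (RFC 9110). A client that handles only one form will wait the wrong amount when it gets the other. Below is a minimal Python sketch of a parser for both forms. It uses the standard library's `email.utils.parsedate_to_datetime` for the date form. The 1 s default for a missing or unparseable header is an assumption, and the 60 s cap follows the policy above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_retry_after(value, now=None, default_s=1.0, max_s=60.0):
    """Seconds to wait for a Retry-After value (delay-seconds or HTTP-date), capped."""
    if value is None:
        return default_s                      # header absent: assumed default
    try:
        wait = float(int(value))              # delay-seconds form, e.g. "120"
    except ValueError:
        try:
            when = parsedate_to_datetime(value)   # HTTP-date form
        except (TypeError, ValueError):       # older Pythons raise TypeError
            return default_s                  # unparseable: fall back to default
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
        now = now or datetime.now(timezone.utc)
        wait = (when - now).total_seconds()
    return min(max(wait, 0.0), max_s)         # never negative, never above the cap

print(parse_retry_after("2"))       # 2.0
print(parse_retry_after("3600"))    # 60.0 (capped)
now = datetime(2015, 10, 21, 7, 27, 30, tzinfo=timezone.utc)
print(parse_retry_after("Wed, 21 Oct 2015 07:28:00 GMT", now=now))  # 30.0
```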
Rules of thumb
| Rule | Why |
|---|---|
| Exponential backoff + jitter | Desynchronize clients |
| Cap max retries | Stop infinite loops |
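| Respect `Retry-After` | Server tells you the minimum wait |
| `Idempotency-Key` on POST retries | Makes a retried write safe to apply; see the idempotency guide |
| Circuit breaker after repeated 429 | Stop sending while the server is shedding load; see the circuit breaker guide |
| No parallel retry fan-out | Retry once per logical operation, not once per sub-request |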
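The circuit-breaker rule can be sketched as a small class. It opens after a run of consecutive 429s and rejects calls locally until a cooldown passes. After that it lets traffic probe the server again, which is a simplified half-open state. The threshold, cooldown, and injectable clock are illustrative assumptions. This breaker tracks only rate limiting: any non-429 response resets it. Production circuit-breaker libraries add more states, failure types, and metrics.

```python
import time

class RateLimitBreaker:
    """Opens after `threshold` consecutive 429s; stays open for `cooldown_s`."""

    def __init__(self, threshold=5, cooldown_s=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.consecutive_429 = 0
        self.opened_at = None  # None means closed

    def allow_request(self):
        if self.opened_at is None:
            return True                       # closed: send normally
        if self.clock() - self.opened_at >= self.cooldown_s:
            return True                       # cooldown over: probe the server
        return False                          # open: fail fast without sending

    def record(self, status):
        if status == 429:
            self.consecutive_429 += 1
            if self.consecutive_429 >= self.threshold:
                self.opened_at = self.clock()  # (re)open and restart the cooldown
        else:
            self.consecutive_429 = 0
            self.opened_at = None              # any non-429 response closes it

now = [0.0]
breaker = RateLimitBreaker(threshold=3, cooldown_s=10.0, clock=lambda: now[0])
for _ in range(3):
    breaker.record(429)
print(breaker.allow_request())  # False: open, fail fast locally
now[0] = 10.0
print(breaker.allow_request())  # True: cooldown over, probe the server
breaker.record(200)
print(breaker.allow_request())  # True: closed again
```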
SDK / platform guidance
- Document the client retry policy in the API docs and provide a reference implementation.
- GraphQL / mobile: debounce user double-submits at the UI layer.
- Batch workers: spread retries over minutes, not seconds.
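For batch workers, one way to spread retries over minutes is to give each failed row its own time slot across a window, plus random jitter within the slot, instead of requeueing every row at once. This is a sketch. The 5-minute window is illustrative. The function only returns a schedule; handing each `(delay, row_id)` pair to your job scheduler's delayed-enqueue mechanism is left to the caller.

```python
import random

def spread_retry_schedule(failed_row_ids, window_s=300.0, rng=None):
    """Return (delay_s, row_id) pairs spread evenly across window_s.

    Each row gets its own slot of width window_s / n plus random jitter inside
    the slot, so retries arrive at about n / window_s per second, not all at once.
    """
    rng = rng or random.Random()
    order = list(failed_row_ids)
    if not order:
        return []
    rng.shuffle(order)                  # don't always retry the same rows first
    slot = window_s / len(order)
    return [(i * slot + rng.uniform(0, slot), row_id) for i, row_id in enumerate(order)]

schedule = spread_retry_schedule([f"row-{i}" for i in range(600)], window_s=300.0)
print(len(schedule), max(d for d, _ in schedule) <= 300.0)   # 600 True
```

Compared with a purely random delay per row, fixed slots with jitter cap how many retries can land in any one second, here about two per second for 600 rows over 5 minutes.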
Incident response
- Identify the top retrying clients; contact their owners to disable retries or fix the backoff.
- Temporarily apply stricter limits at the edge and shed non-critical routes.
- Scale out only if capacity is the real issue, not if retries are multiplying the load.
Verify
- Load test: run clients with a bad retry policy against clients with a good one; the good policy keeps offered load under the limit.
- The 429 rate drops once the incident clients back off.
- Metrics: `retry_count` per client ID stays bounded.
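Before running a real load test, the bad-versus-good comparison can be tried as a toy discrete-time simulation. Every parameter here is an assumption for illustration:

- a limiter that admits 100 requests per second,
- 300 clients that each need one admitted request,
- a "bad" policy that retries the next second in lockstep, and
- a "good" policy that waits a random 1 to 10 s.

The simulation reports total attempts and the peak offered load after the initial burst.

```python
import random

def simulate(backoff, n_clients=300, capacity_per_s=100, seconds=30, seed=0):
    """Toy load test: every client needs one admitted request.

    backoff(rng) returns whole seconds to wait after a 429 (>= 1).
    Returns (total_attempts, peak_offered_rps_after_initial_burst)."""
    rng = random.Random(seed)
    next_try = [0] * n_clients          # all clients fire at t = 0
    done = [False] * n_clients
    total, peak_after_burst = 0, 0
    for t in range(seconds):
        senders = [c for c in range(n_clients) if not done[c] and next_try[c] == t]
        rng.shuffle(senders)            # the limiter admits an arbitrary subset
        total += len(senders)
        if t > 0:
            peak_after_burst = max(peak_after_burst, len(senders))
        for i, c in enumerate(senders):
            if i < capacity_per_s:
                done[c] = True                     # admitted (within the limit)
            else:
                next_try[c] = t + backoff(rng)     # rejected with 429
    return total, peak_after_burst

def immediate(rng):
    return 1                            # everyone retries next second, in sync

def jittered(rng):
    return rng.randint(1, 10)           # random 1-10 s wait

print("bad :", simulate(immediate))     # (600, 200)
print("good:", simulate(jittered))
```

Under the lockstep policy, the second after the burst still offers twice the limit. The jittered policy keeps every later second well under it.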
Interview one-liner
“On the server I return 429 with Retry-After and per-tenant limits; on the client I use exponential backoff with jitter, cap retries, only retry idempotent calls, and open a circuit after sustained 429—so rate limiting protects the service instead of triggering a retry storm.”