Design a distributed rate limiter
A rate limiter caps how often a client can call your APIs—protecting databases, controlling cost, and ensuring fair use across tenants. At one machine, a counter in memory is enough. At platform scale, thousands of stateless servers must share one consistent budget per API key, user, or IP, with low latency and no race-induced over-admission.
Interviews focus on algorithm choice (token bucket vs sliding window), Redis (or similar) for centralized state, atomic updates, multi-region behavior, and what happens when the limiter itself fails—fail-open vs fail-closed. This guide matches the depth of the other classic prompts here, including failure points and failure modes.
Design prompt
Design a distributed rate limiting service used by all API gateways and internal microservices.
Support per-client rules, burst traffic, low latency (< 5 ms p99), and consistent enforcement across regions.
What you should be able to do after reading:
- Compare fixed window, sliding window, token bucket, and leaky bucket.
- Implement atomic checks with Redis + Lua (or cell-based counters).
- Place the limiter at gateway vs sidecar vs library—and explain tradeoffs.
- Size Redis for peak check RPS and hot keys.
- Map failure points and choose fail-open vs fail-closed per API class.
1. Requirements gathering
1.1 Functional requirements
- Allow / deny decision — given identity + rule, return allowed or
429 Too Many Requests. - Rule types — limit per API key, user ID, IP, endpoint, or global platform cap.
- Configurable limits — e.g. 1000 requests/minute with burst 2000.
- Multiple windows — per second + per day on same client (optional).
- Dynamic config — ops updates limits without redeploying all services.
- Headers — return
X-RateLimit-Limit,Remaining,Reset(RFC-style). - Whitelist / blacklist — bypass or hard-block specific keys.
Usually out of scope unless asked: full WAF/DDoS (Layer 3/4), bot detection ML, billing metering (related but separate product).
1.2 Non-functional requirements
- Low latency — p99 < 5 ms added to request path (often < 2 ms with local cache).
- High throughput — millions of checks per second platform-wide.
- Accuracy — do not admit 10× budget because of races (eventual consistency OK for some tiers).
- Availability — limiter outage must not take down entire product (policy-dependent).
- Consistency — strong per-key consistency in one region; cross-region may be approximate.
- Fairness — no single tenant starves others on shared infrastructure.
- Observability — top violators, 429 rate per rule, Redis latency.
Assumptions for capacity math: 50,000 peak API RPS platform-wide; each request performs 1 rate limit check; average 2 rules evaluated per request (global + per API key); 10M registered API keys; 1% of keys active in any minute; Redis cluster in each region.
2. Capacity estimation
2.1 Check throughput
Peak API RPS = 50,000 Rules per request = 2 Peak rate-limit checks/sec = 50,000 × 2 = 100,000/sec per region 3 regions (US, EU, APAC) → ~100k/sec each if traffic geo-routed Global Redis ops (if single global store) = 300,000 ops/sec (upper bound)
Each check is typically 1–2 Redis commands (GET + INCR or single Lua EVAL). Plan for 100k–300k Redis ops/sec depending on architecture.
2.2 Redis sizing (rule of thumb)
Single Redis shard: ~100k–250k simple ops/sec depending on instance and pipelining. At 100k checks/sec you need cluster mode with multiple shards or aggressive local caching.
Sharding key: hash(api_key) mod N_shards Hot key risk: one viral partner → single shard overload
2.3 Memory per counter
Active keys in 1-minute window ≈ 10M × 1% = 100,000 hot keys Bytes per key (token bucket state): ~64 bytes (key + tokens + last_refill_ts) Memory ≈ 100k × 64 B ≈ 6.4 MB hot set (tiny) Rules/config in DB: 10M keys × 200 B rules ≈ 2 GB (cached in gateway memory)
Memory is rarely the bottleneck—ops/sec and hot keys are.
2.4 Network
100k checks/sec × 200 bytes round-trip ≈ 20 MB/s Redis traffic per region Co-locate limiter Redis with API gateway AZ to minimize RTT
3. High-level design
3.1 Placement options
| Placement | Pros | Cons |
|---|---|---|
| API gateway | Central enforcement; one config surface | Internal service-to-service calls may bypass |
| Sidecar (Envoy) | Per-pod; uniform for mesh traffic | Ops complexity |
| Shared library + Redis | Flexible in any service | Inconsistent if not all services adopt |
| Dedicated Rate Limiter service | gRPC/HTTP /check; language-agnostic | Extra network hop; must be HA |
Common interview answer: gateway for edge traffic + library/service for internal critical paths.
- Rate Limiter service — stateless; calls Redis; exposes
Allow(key, rule). - Redis cluster — holds counters / token state; Lua for atomicity.
- Config service — rules per API key; pushed to gateway cache.
- Local cache (optional) — short TTL allowance cache to cut Redis QPS.
- Analytics — async stream of allow/deny events.
flowchart LR
C[Client] --> GW[API Gateway]
GW --> LC[Local allowance cache]
LC -->|miss| RL[Rate Limiter service]
RL --> R[(Redis cluster)]
GW --> API[Backend APIs]
CFG[Config service] --> GW
CFG --> RL
RL --> AN[Analytics Kafka]
Request path
- Request arrives with
Authorization: Bearer <api_key>. - Gateway resolves rule:
1000 req/min, burst 1500. - Build limiter key:
rl:api_key:abc123:rule:default. - Call Redis (Lua) → allowed + remaining tokens, or denied + retry-after.
- If allowed → forward to backend; if denied →
429without hitting DB.
sequenceDiagram
participant C as Client
participant G as Gateway
participant R as Redis
participant B as Backend
C->>G: GET /v1/resource
G->>R: EVAL token_bucket Lua
alt allowed
R-->>G: OK remaining=742
G->>B: proxy request
B-->>G: 200
G-->>C: 200 + rate limit headers
else denied
R-->>G: DENY retry_after=12
G-->>C: 429 Too Many Requests
end
4. Database and state design
4.1 Rules store (PostgreSQL)
CREATE TABLE rate_limit_rules ( id UUID PRIMARY KEY, subject_type TEXT NOT NULL, -- api_key | user | ip | global subject_id TEXT NOT NULL, window_sec INT NOT NULL, limit_count INT NOT NULL, burst_count INT, -- optional for token bucket algorithm TEXT NOT NULL DEFAULT 'token_bucket', enabled BOOLEAN NOT NULL DEFAULT TRUE, updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), UNIQUE (subject_type, subject_id) );
Gateways cache rules in memory; invalidate on config version bump via pub/sub.
4.2 Redis key design
# Token bucket (hash)
KEY: rl:tb:{subject_type}:{subject_id}
FIELDS: tokens (float), last_refill_ms (int)
# Sliding window counter (string counters)
KEY: rl:sw:{subject}:{window_start_epoch}
VALUE: count (INCR)
TTL: window_sec + small buffer (auto expire)
4.3 Atomic token bucket (Lua sketch)
-- KEYS[1] = bucket key
-- ARGV[1] = refill_rate per sec
-- ARGV[2] = capacity (burst)
-- ARGV[3] = now_ms
-- ARGV[4] = requested tokens (usually 1)
local data = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(data[1]) or ARGV[2]
local last_ts = tonumber(data[2]) or ARGV[3]
local delta = math.max(0, ARGV[3] - last_ts) / 1000
tokens = math.min(ARGV[2], tokens + delta * ARGV[1])
if tokens >= ARGV[4] then
tokens = tokens - ARGV[4]
redis.call('HMSET', KEYS[1], 'tokens', tokens, 'ts', ARGV[3])
redis.call('EXPIRE', KEYS[1], 3600)
return {1, tokens} -- allowed, remaining
else
return {0, tokens} -- denied, remaining
end
Lua runs atomically on the Redis shard that owns the key—critical to prevent over-admission.
5. API design
5.1 Check allowance (internal)
POST /v1/ratelimit/check
{
"subject": { "type": "api_key", "id": "key_abc" },
"rule_id": "default",
"cost": 1
}
200 OK — allowed
{
"allowed": true,
"limit": 1000,
"remaining": 742,
"reset_at": "2026-05-27T20:16:00Z"
}
200 OK — denied (or 429 at gateway)
{
"allowed": false,
"retry_after_sec": 12
}
5.2 Admin: set rule
PUT /v1/ratelimit/rules/{subject_type}/{subject_id}
{
"window_sec": 60,
"limit_count": 1000,
"burst_count": 1500,
"algorithm": "token_bucket"
}
5.3 Response headers (edge)
HTTP/1.1 200 OK X-RateLimit-Limit: 1000 X-RateLimit-Remaining: 742 X-RateLimit-Reset: 1716840960 Retry-After: 12 # only on 429
6. Diving deep into key components
6.1 Algorithm comparison
| Algorithm | Behavior | Burst | Boundary issue | Interview use |
|---|---|---|---|---|
| Fixed window | Counter per clock minute | Hard cap per window | 2× spike at window edges | Simplest; mention flaw |
| Sliding window log | Store each request timestamp | Accurate | Memory heavy | Precision when asked |
| Sliding window counter | Weighted previous + current window | Smooth | Approximate | Common Redis pattern |
| Token bucket | Tokens refill at rate r, cap B | Allows burst up to B | None at boundary | Default recommendation |
| Leaky bucket | Constant outflow rate | Smooth output | Queues bursts | Traffic shaping to downstream |
6.2 Fixed window spike (worked example)
Limit 100 req/min. Client sends 100 requests at 00:00:59 and 100 at 00:01:00 → 200 in 2 seconds while “complying” with two windows. Sliding window or token bucket fixes this.
6.3 Sliding window counter (Redis)
window = 60 sec, limit = 1000 now = current minute bucket t1, previous bucket t0 count = count_t0 * (1 - elapsed/60) + count_t1 if count >= 1000: deny else: INCR count_t1, allow
6.4 Token bucket parameters
- refill_rate — sustained RPS (e.g. 1000/60 ≈ 16.67 tokens/sec for 1000/min).
- capacity — max burst (e.g. 1500 allows short spikes).
- cost — expensive endpoints deduct 10 tokens (weighted limits).
6.5 Local cache (hybrid)
Gateway keeps in-memory “spent” counter for 50–100 ms, batching Redis updates. Risk: slight over-admission across pods—acceptable for abuse prevention if global cap still enforced on Redis every N requests or on deny path always hit Redis.
Safer pattern: local cache only for known-good remaining > threshold; always consult Redis when local estimate low.
6.6 Multi-region strategies
| Strategy | Consistency | Latency | Notes |
|---|---|---|---|
| Regional Redis only | Per-region budget (1/N of global) | Best | User may get N× limit via VPN regions—often OK |
| Global Redis (single primary) | Strong global | Cross-region RTT | Hard at scale |
| Async cross-region sync | Eventual | Low local | Complex; rare in interviews |
Interview default: partition budget by region or route user to home region sticky.
6.7 Fail-open vs fail-closed
| Policy | When Redis down | Risk |
|---|---|---|
| Fail-open | Allow traffic | Outage → abuse, overload backend |
| Fail-closed | Return 503/429 | Outage → product down |
| Degraded local limit | Conservative per-pod cap | Under-utilization; uneven across pods |
Public APIs often fail-closed for paid tiers only after local degrade; internal admin APIs may fail-open with alert.
6.8 Hot key mitigation
- Shard per key — Redis cluster hashes key to slot.
- Local token pool — allocate sub-budget from Redis every 100 ms for hot partners.
- Separate limit tier — dedicated Redis shard for enterprise keys.
- Request coalescing — not for limiter (each request must decide)—but batch INCR in side channel for analytics only.
7. Failure points
Failure points are where faults cause over-admission (abuse), false rejects (revenue loss), or cascading outages. Rate limiting is on the critical path—design explicitly for Redis and config failures.
| # | Failure point | What breaks | Detection | Mitigation design |
|---|---|---|---|---|
| FP1 | Gateway bypass | Traffic hits backend directly | Spike bypasses 429 | mTLS internal only; network policy; library in all entrypoints |
| FP2 | Non-atomic read-modify-write | Two pods INCR without Lua | Limit 1000 but 1500 admitted | Lua / Redis INCR single command / Redlock not needed if single key op atomic |
| FP3 | Redis primary failure | Checks timeout | 429/503 spike; latency SLO breach | Replica promotion; circuit breaker; degraded local limit |
| FP4 | Hot key on one shard | Single shard CPU 100% | p99 latency for one API key | Local sub-pool; dedicated shard; raise partner tier hardware |
| FP5 | Stale rule cache | Config lowered limit not applied | Partner exceeds new cap | Versioned config; TTL + pub/sub invalidation |
| FP6 | Clock skew | Window boundaries wrong | Uneven deny rate | Redis TIME; token bucket uses monotonic ms from Redis |
| FP7 | Cross-region double spend | User hits US + EU | 2× global budget | Regional quota split; global id in key; sticky routing |
| FP8 | Local cache over-admit | Pods each spend local budget | Aggregate RPS > limit | Sync to Redis before local budget exhausted; conservative factor |
| FP9 | Fail-open misconfiguration | Redis blip → unlimited traffic | Backend DB meltdown | Default fail-closed for public; cap fail-open duration |
| FP10 | 429 retry storm | Clients retry immediately | Traffic amplifies 3× | Retry-After header; jittered backoff; don’t count 429 against limit optionally |
flowchart LR
C[Client] -->|FP1| GW[Gateway]
GW -->|FP8| LC[Local cache]
LC -->|FP2 FP3| R[(Redis)]
GW -->|FP5| CFG[Config]
GW -->|FP7| R2[(Region Redis)]
C -->|FP10| C
R -->|FP4| R
R -->|FP6| R
GW -->|FP9| API[Backend]
8. Failure modes
8.1 Window boundary spike (fixed window)
Symptom: Traffic doubles for 1 second at minute rollovers.
Cause: Fixed window algorithm without sliding/token bucket.
Safe response: Migrate to token bucket or sliding window counter.
8.2 Race over-admission
Symptom: Partner exceeds contracted RPS during load test.
Cause: FP2 — GET then SET from two gateways.
Safe response: Lua script; single atomic INCR with EXPIRE for fixed window fallback.
8.3 Redis outage → backend overload
Symptom: DB CPU 100%; limiter “fixed” when Redis returns.
Cause: FP9 fail-open.
Safe response: Fail-closed or strict per-pod emergency cap; queue at gateway.
8.4 Redis outage → mass 503
Symptom: All API calls rejected though backend healthy.
Cause: FP3 fail-closed without degrade path.
Safe response: Degraded local limiter; multi-AZ Redis Sentinel/cluster.
8.5 Hot key latency collapse
Symptom: One API key sees 5 s latency; others fine.
Cause: FP4 — viral integration on single Redis slot.
Safe response: Pre-allocated local token pool; move key to dedicated shard.
8.6 Stale limits after downgrade
Symptom: Abusive key not throttled after ops cut limit.
Cause: FP5 — gateway cache TTL 5 min.
Safe response: Push invalidation; include config_version in check path.
8.7 Multi-region budget multiplication
Symptom: Global partner loads 3 regions; 3× traffic accepted.
Cause: FP7 — independent regional counters.
Safe response: Document as design choice or add lightweight global sync counter for enterprise keys.
8.8 False rejects (flapping)
Symptom: Paying customer sees 429 at 80% of true usage.
Cause: FP8 local cache too aggressive; clock skew FP6.
Safe response: Tune burst; use Redis TIME; expose remaining in headers for client backoff.
8.9 Retry amplification
Symptom: Effective RPS 3× after rate limit events.
Cause: FP10 — clients ignore Retry-After.
Safe response: Exponential backoff docs; penalize rapid retries (optional separate counter).
8.10 Thundering herd on Redis recovery
Symptom: Redis comes back; 100k reconnects stall cluster.
Cause: All gateways hammer Redis simultaneously.
Safe response: Jittered reconnect; ramp local caches; circuit breaker half-open probe.
| Failure mode | Primary failure points | Impact | Core mitigation |
|---|---|---|---|
| Boundary spike | Algorithm | Backend spike | Token bucket / sliding window |
| Race over-admission | FP2 | Abuse | Lua atomicity |
| Fail-open overload | FP9, FP3 | Outage cascade | Degraded cap / fail-closed |
| Fail-closed outage | FP3 | Product down | HA Redis + local degrade |
| Hot key | FP4 | Latency | Shard / local pool |
| Stale config | FP5 | Policy miss | Invalidation |
| Region multiplication | FP7 | Over quota | Split budget / global key |
| False rejects | FP6, FP8 | Customer pain | Tune cache; Redis TIME |
| Retry storm | FP10 | Amplified load | Retry-After + jitter |
| Recovery herd | FP3 | Redis flap | Circuit breaker ramp |
9. Scalability, availability, and security
9.1 Scalability
- Redis Cluster with hash tags for related keys if needed:
{api_key}:tokens. - Horizontal scale Rate Limiter service pods; all state in Redis.
- Pipelining / connection pooling from gateway to Redis.
- Separate read-heavy config from write-heavy counters.
9.2 Availability
- Redis: primary + replica, automatic failover (Sentinel or managed ElastiCache/Memorystore).
- Multi-AZ deployment; limit cross-AZ Redis RTT if co-located with gateway.
- Health check: if Redis unhealthy > N sec → switch degrade policy.
9.3 Security
- Rate limit by API key + IP for stolen key abuse.
- Admin rule APIs behind IAM; audit changes.
- Prevent key enumeration: same 429 body for unknown vs known keys on public edge.
- DDoS at edge still needs CDN/WAF—rate limiter is Layer 7 per identity.
10. Tradeoffs recap
| Decision | Common choice | Why |
|---|---|---|
| Algorithm | Token bucket | Burst + smooth refill |
| Store | Redis + Lua | Fast atomic in-memory ops |
| Placement | API gateway | Single enforcement point |
| Multi-region | Regional budget split | Latency vs global accuracy |
| Redis down | Fail-closed + degrade | Protect backend first |
11. How to present this in 45 minutes
- 5 min — requirements; clarify distributed vs single host; rules per key.
- 7 min — capacity: 100k checks/sec, Redis cluster, hot keys.
- 8 min — algorithms table; pick token bucket; show fixed-window flaw.
- 8 min — architecture: gateway → Redis Lua; config service.
- 10 min — failure points + fail-open/closed + failure modes.
- 7 min — multi-region, local cache tradeoff, headers, extensions (weighted cost).
The one line to remember
A distributed rate limiter is a centralized counter with atomic decrements—usually token bucket in Redis + Lua—sitting as close to the edge as possible, with an explicit story for hot keys, multi-region budgets, and whether you fail-open or fail-closed when Redis is unavailable.