sharpbyte.dev

Design a distributed rate limiter

A rate limiter caps how often a client can call your APIs—protecting databases, controlling cost, and ensuring fair use across tenants. At one machine, a counter in memory is enough. At platform scale, thousands of stateless servers must share one consistent budget per API key, user, or IP, with low latency and no race-induced over-admission.

Interviews focus on algorithm choice (token bucket vs sliding window), Redis (or similar) for centralized state, atomic updates, multi-region behavior, and what happens when the limiter itself fails—fail-open vs fail-closed. This guide matches the depth of the other classic prompts here, including failure points and failure modes.

Design prompt

Design a distributed rate limiting service used by all API gateways and internal microservices.

Support per-client rules, burst traffic, low latency (< 5 ms p99), and consistent enforcement across regions.

What you should be able to do after reading:

1. Requirements gathering

1.1 Functional requirements

Usually out of scope unless asked: full WAF/DDoS (Layer 3/4), bot detection ML, billing metering (related but separate product).

1.2 Non-functional requirements

Assumptions for capacity math: 50,000 peak API RPS platform-wide; each request performs 1 rate limit check; average 2 rules evaluated per request (global + per API key); 10M registered API keys; 1% of keys active in any minute; Redis cluster in each region.

2. Capacity estimation

2.1 Check throughput

Peak API RPS = 50,000
Rules per request = 2
Peak rate-limit checks/sec = 50,000 × 2 = 100,000/sec per region

3 regions (US, EU, APAC) → ~100k/sec each if traffic geo-routed
Global Redis ops (if single global store) = 300,000 ops/sec (upper bound)

Each check is typically 1–2 Redis commands (GET + INCR or single Lua EVAL). Plan for 100k–300k Redis ops/sec depending on architecture.

2.2 Redis sizing (rule of thumb)

Single Redis shard: ~100k–250k simple ops/sec depending on instance and pipelining. At 100k checks/sec you need cluster mode with multiple shards or aggressive local caching.

Sharding key: hash(api_key) mod N_shards
Hot key risk: one viral partner → single shard overload

2.3 Memory per counter

Active keys in 1-minute window ≈ 10M × 1% = 100,000 hot keys
Bytes per key (token bucket state): ~64 bytes (key + tokens + last_refill_ts)
Memory ≈ 100k × 64 B ≈ 6.4 MB hot set (tiny)

Rules/config in DB: 10M keys × 200 B rules ≈ 2 GB (cached in gateway memory)

Memory is rarely the bottleneck—ops/sec and hot keys are.

2.4 Network

100k checks/sec × 200 bytes round-trip ≈ 20 MB/s Redis traffic per region
Co-locate limiter Redis with API gateway AZ to minimize RTT

3. High-level design

3.1 Placement options

PlacementProsCons
API gatewayCentral enforcement; one config surfaceInternal service-to-service calls may bypass
Sidecar (Envoy)Per-pod; uniform for mesh trafficOps complexity
Shared library + RedisFlexible in any serviceInconsistent if not all services adopt
Dedicated Rate Limiter servicegRPC/HTTP /check; language-agnosticExtra network hop; must be HA

Common interview answer: gateway for edge traffic + library/service for internal critical paths.

flowchart LR
  C[Client] --> GW[API Gateway]
  GW --> LC[Local allowance cache]
  LC -->|miss| RL[Rate Limiter service]
  RL --> R[(Redis cluster)]
  GW --> API[Backend APIs]
  CFG[Config service] --> GW
  CFG --> RL
  RL --> AN[Analytics Kafka]
    

Request path

  1. Request arrives with Authorization: Bearer <api_key>.
  2. Gateway resolves rule: 1000 req/min, burst 1500.
  3. Build limiter key: rl:api_key:abc123:rule:default.
  4. Call Redis (Lua) → allowed + remaining tokens, or denied + retry-after.
  5. If allowed → forward to backend; if denied → 429 without hitting DB.
sequenceDiagram
  participant C as Client
  participant G as Gateway
  participant R as Redis
  participant B as Backend
  C->>G: GET /v1/resource
  G->>R: EVAL token_bucket Lua
  alt allowed
    R-->>G: OK remaining=742
    G->>B: proxy request
    B-->>G: 200
    G-->>C: 200 + rate limit headers
  else denied
    R-->>G: DENY retry_after=12
    G-->>C: 429 Too Many Requests
  end
    

4. Database and state design

4.1 Rules store (PostgreSQL)

CREATE TABLE rate_limit_rules (
  id            UUID PRIMARY KEY,
  subject_type  TEXT NOT NULL,  -- api_key | user | ip | global
  subject_id    TEXT NOT NULL,
  window_sec    INT NOT NULL,
  limit_count   INT NOT NULL,
  burst_count   INT,            -- optional for token bucket
  algorithm     TEXT NOT NULL DEFAULT 'token_bucket',
  enabled       BOOLEAN NOT NULL DEFAULT TRUE,
  updated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  UNIQUE (subject_type, subject_id)
);

Gateways cache rules in memory; invalidate on config version bump via pub/sub.

4.2 Redis key design

# Token bucket (hash)
KEY:   rl:tb:{subject_type}:{subject_id}
FIELDS: tokens (float), last_refill_ms (int)

# Sliding window counter (string counters)
KEY:   rl:sw:{subject}:{window_start_epoch}
VALUE: count (INCR)

TTL:   window_sec + small buffer (auto expire)

4.3 Atomic token bucket (Lua sketch)

-- KEYS[1] = bucket key
-- ARGV[1] = refill_rate per sec
-- ARGV[2] = capacity (burst)
-- ARGV[3] = now_ms
-- ARGV[4] = requested tokens (usually 1)

local data = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(data[1]) or ARGV[2]
local last_ts = tonumber(data[2]) or ARGV[3]
local delta = math.max(0, ARGV[3] - last_ts) / 1000
tokens = math.min(ARGV[2], tokens + delta * ARGV[1])
if tokens >= ARGV[4] then
  tokens = tokens - ARGV[4]
  redis.call('HMSET', KEYS[1], 'tokens', tokens, 'ts', ARGV[3])
  redis.call('EXPIRE', KEYS[1], 3600)
  return {1, tokens}  -- allowed, remaining
else
  return {0, tokens}  -- denied, remaining
end

Lua runs atomically on the Redis shard that owns the key—critical to prevent over-admission.

5. API design

5.1 Check allowance (internal)

POST /v1/ratelimit/check

{
  "subject": { "type": "api_key", "id": "key_abc" },
  "rule_id": "default",
  "cost": 1
}

200 OK — allowed

{
  "allowed": true,
  "limit": 1000,
  "remaining": 742,
  "reset_at": "2026-05-27T20:16:00Z"
}

200 OK — denied (or 429 at gateway)

{
  "allowed": false,
  "retry_after_sec": 12
}

5.2 Admin: set rule

PUT /v1/ratelimit/rules/{subject_type}/{subject_id}

{
  "window_sec": 60,
  "limit_count": 1000,
  "burst_count": 1500,
  "algorithm": "token_bucket"
}

5.3 Response headers (edge)

HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 742
X-RateLimit-Reset: 1716840960
Retry-After: 12          # only on 429

6. Diving deep into key components

6.1 Algorithm comparison

AlgorithmBehaviorBurstBoundary issueInterview use
Fixed windowCounter per clock minuteHard cap per window2× spike at window edgesSimplest; mention flaw
Sliding window logStore each request timestampAccurateMemory heavyPrecision when asked
Sliding window counterWeighted previous + current windowSmoothApproximateCommon Redis pattern
Token bucketTokens refill at rate r, cap BAllows burst up to BNone at boundaryDefault recommendation
Leaky bucketConstant outflow rateSmooth outputQueues burstsTraffic shaping to downstream

6.2 Fixed window spike (worked example)

Limit 100 req/min. Client sends 100 requests at 00:00:59 and 100 at 00:01:00 → 200 in 2 seconds while “complying” with two windows. Sliding window or token bucket fixes this.

6.3 Sliding window counter (Redis)

window = 60 sec, limit = 1000
now = current minute bucket t1, previous bucket t0

count = count_t0 * (1 - elapsed/60) + count_t1
if count >= 1000: deny
else: INCR count_t1, allow

6.4 Token bucket parameters

6.5 Local cache (hybrid)

Gateway keeps in-memory “spent” counter for 50–100 ms, batching Redis updates. Risk: slight over-admission across pods—acceptable for abuse prevention if global cap still enforced on Redis every N requests or on deny path always hit Redis.

Safer pattern: local cache only for known-good remaining > threshold; always consult Redis when local estimate low.

6.6 Multi-region strategies

StrategyConsistencyLatencyNotes
Regional Redis onlyPer-region budget (1/N of global)BestUser may get N× limit via VPN regions—often OK
Global Redis (single primary)Strong globalCross-region RTTHard at scale
Async cross-region syncEventualLow localComplex; rare in interviews

Interview default: partition budget by region or route user to home region sticky.

6.7 Fail-open vs fail-closed

PolicyWhen Redis downRisk
Fail-openAllow trafficOutage → abuse, overload backend
Fail-closedReturn 503/429Outage → product down
Degraded local limitConservative per-pod capUnder-utilization; uneven across pods

Public APIs often fail-closed for paid tiers only after local degrade; internal admin APIs may fail-open with alert.

6.8 Hot key mitigation

7. Failure points

Failure points are where faults cause over-admission (abuse), false rejects (revenue loss), or cascading outages. Rate limiting is on the critical path—design explicitly for Redis and config failures.

#Failure pointWhat breaksDetectionMitigation design
FP1 Gateway bypass Traffic hits backend directly Spike bypasses 429 mTLS internal only; network policy; library in all entrypoints
FP2 Non-atomic read-modify-write Two pods INCR without Lua Limit 1000 but 1500 admitted Lua / Redis INCR single command / Redlock not needed if single key op atomic
FP3 Redis primary failure Checks timeout 429/503 spike; latency SLO breach Replica promotion; circuit breaker; degraded local limit
FP4 Hot key on one shard Single shard CPU 100% p99 latency for one API key Local sub-pool; dedicated shard; raise partner tier hardware
FP5 Stale rule cache Config lowered limit not applied Partner exceeds new cap Versioned config; TTL + pub/sub invalidation
FP6 Clock skew Window boundaries wrong Uneven deny rate Redis TIME; token bucket uses monotonic ms from Redis
FP7 Cross-region double spend User hits US + EU 2× global budget Regional quota split; global id in key; sticky routing
FP8 Local cache over-admit Pods each spend local budget Aggregate RPS > limit Sync to Redis before local budget exhausted; conservative factor
FP9 Fail-open misconfiguration Redis blip → unlimited traffic Backend DB meltdown Default fail-closed for public; cap fail-open duration
FP10 429 retry storm Clients retry immediately Traffic amplifies 3× Retry-After header; jittered backoff; don’t count 429 against limit optionally
flowchart LR
  C[Client] -->|FP1| GW[Gateway]
  GW -->|FP8| LC[Local cache]
  LC -->|FP2 FP3| R[(Redis)]
  GW -->|FP5| CFG[Config]
  GW -->|FP7| R2[(Region Redis)]
  C -->|FP10| C
  R -->|FP4| R
  R -->|FP6| R
  GW -->|FP9| API[Backend]
    

8. Failure modes

8.1 Window boundary spike (fixed window)

Symptom: Traffic doubles for 1 second at minute rollovers.

Cause: Fixed window algorithm without sliding/token bucket.

Safe response: Migrate to token bucket or sliding window counter.

8.2 Race over-admission

Symptom: Partner exceeds contracted RPS during load test.

Cause: FP2 — GET then SET from two gateways.

Safe response: Lua script; single atomic INCR with EXPIRE for fixed window fallback.

8.3 Redis outage → backend overload

Symptom: DB CPU 100%; limiter “fixed” when Redis returns.

Cause: FP9 fail-open.

Safe response: Fail-closed or strict per-pod emergency cap; queue at gateway.

8.4 Redis outage → mass 503

Symptom: All API calls rejected though backend healthy.

Cause: FP3 fail-closed without degrade path.

Safe response: Degraded local limiter; multi-AZ Redis Sentinel/cluster.

8.5 Hot key latency collapse

Symptom: One API key sees 5 s latency; others fine.

Cause: FP4 — viral integration on single Redis slot.

Safe response: Pre-allocated local token pool; move key to dedicated shard.

8.6 Stale limits after downgrade

Symptom: Abusive key not throttled after ops cut limit.

Cause: FP5 — gateway cache TTL 5 min.

Safe response: Push invalidation; include config_version in check path.

8.7 Multi-region budget multiplication

Symptom: Global partner loads 3 regions; 3× traffic accepted.

Cause: FP7 — independent regional counters.

Safe response: Document as design choice or add lightweight global sync counter for enterprise keys.

8.8 False rejects (flapping)

Symptom: Paying customer sees 429 at 80% of true usage.

Cause: FP8 local cache too aggressive; clock skew FP6.

Safe response: Tune burst; use Redis TIME; expose remaining in headers for client backoff.

8.9 Retry amplification

Symptom: Effective RPS 3× after rate limit events.

Cause: FP10 — clients ignore Retry-After.

Safe response: Exponential backoff docs; penalize rapid retries (optional separate counter).

8.10 Thundering herd on Redis recovery

Symptom: Redis comes back; 100k reconnects stall cluster.

Cause: All gateways hammer Redis simultaneously.

Safe response: Jittered reconnect; ramp local caches; circuit breaker half-open probe.

Failure modePrimary failure pointsImpactCore mitigation
Boundary spikeAlgorithmBackend spikeToken bucket / sliding window
Race over-admissionFP2AbuseLua atomicity
Fail-open overloadFP9, FP3Outage cascadeDegraded cap / fail-closed
Fail-closed outageFP3Product downHA Redis + local degrade
Hot keyFP4LatencyShard / local pool
Stale configFP5Policy missInvalidation
Region multiplicationFP7Over quotaSplit budget / global key
False rejectsFP6, FP8Customer painTune cache; Redis TIME
Retry stormFP10Amplified loadRetry-After + jitter
Recovery herdFP3Redis flapCircuit breaker ramp

9. Scalability, availability, and security

9.1 Scalability

9.2 Availability

9.3 Security

10. Tradeoffs recap

DecisionCommon choiceWhy
AlgorithmToken bucketBurst + smooth refill
StoreRedis + LuaFast atomic in-memory ops
PlacementAPI gatewaySingle enforcement point
Multi-regionRegional budget splitLatency vs global accuracy
Redis downFail-closed + degradeProtect backend first

11. How to present this in 45 minutes

  1. 5 min — requirements; clarify distributed vs single host; rules per key.
  2. 7 min — capacity: 100k checks/sec, Redis cluster, hot keys.
  3. 8 min — algorithms table; pick token bucket; show fixed-window flaw.
  4. 8 min — architecture: gateway → Redis Lua; config service.
  5. 10 minfailure points + fail-open/closed + failure modes.
  6. 7 min — multi-region, local cache tradeoff, headers, extensions (weighted cost).

The one line to remember

A distributed rate limiter is a centralized counter with atomic decrements—usually token bucket in Redis + Lua—sitting as close to the edge as possible, with an explicit story for hot keys, multi-region budgets, and whether you fail-open or fail-closed when Redis is unavailable.