Design a Distributed Rate Limiter
Throttle API requests to N per minute per API key across a fleet of services, with low overhead and consistent behavior across regions.
Required building blocks
Rate Limiting
Key-Value Store
API Gateway
Nice to have
Consistent Hashing
Circuit Breaker
Canonical answer
Token bucket or sliding window log/counter in Redis. Atomic INCR + EXPIRE. Per-region shard if global isn't required.
Capacity estimation
- Fleet of 100 services × ~5K req/sec each → ~500K req/sec must be checked.
- Each check is one Redis INCR + EXPIRE (~0.2 ms) → a 16-shard Redis cluster handles ~1M ops/sec headroom.
- Memory: 10M active API keys × ~100 B/bucket × 4 windows ≈ 4 GB per region — fits in RAM trivially.
- Latency budget: add <2 ms to p99 → co-locate Redis in same AZ as API gateway.
- Cross-region: replicate counters via CRDTs or shard by key region — eventual consistency acceptable for soft limits.
Architecture
Client ─→ API Gateway / Envoy ─→ Backend Svcs
│
▼
Rate Limit Filter
│
▼
Redis Cluster (sharded by api_key)
│
┌─────────┴─────────┐
▼ ▼
Token Bucket Sliding Window
(per-key INCR) (sorted-set log)
│
▼
Metrics → Prometheus
Overflow → Kafka (audit)API
- Internal: rateLimit.check(api_key, route) → { allowed, remaining, reset_at }
- PUT /admin/limits/:api_key { rps, burst } → 204 (control plane)
- GET /admin/limits/:api_key → { rps, burst, current_usage }
- Response header on throttle: 429 + Retry-After + X-RateLimit-Remaining
Data model
buckets (Redis, key = "rl:{api_key}:{route}:{window}"):
value : int counter
ttl : window_size_seconds
policies (SQL, control plane):
api_key (PK) : string
tier : enum(free|pro|enterprise)
rps : int
burst : int
updated_at : timestamp
audit_log (Kafka topic):
api_key, route, ts, allowed, remainingConcept blurbs
Rate Limiting
Throttle requests per user/IP with token bucket, leaky bucket, or sliding window.
Key-Value Store
O(1) get/put by key; massively scalable (DynamoDB, Redis, Cassandra).
API Gateway
Single entry point: auth, rate limit, routing, transformation.
Consistent Hashing
Distribute keys across nodes with minimal remapping on resize.
Circuit Breaker
Stop calling failing dependencies; fail fast and recover gracefully.