Design a Distributed Rate Limiter

Design a Distributed Rate Limiter

Throttle API requests to N per minute per API key across a fleet of services, with low overhead and consistent behavior across regions.

Required building blocks
Rate Limiting
Key-Value Store
API Gateway
Nice to have
Consistent Hashing
Circuit Breaker
Canonical answer

Token bucket or sliding window log/counter in Redis. Atomic INCR + EXPIRE. Per-region shard if global isn't required.

Capacity estimation
  • Fleet of 100 services × ~5K req/sec each → ~500K req/sec must be checked.
  • Each check is one Redis INCR + EXPIRE (~0.2 ms) → a 16-shard Redis cluster handles ~1M ops/sec headroom.
  • Memory: 10M active API keys × ~100 B/bucket × 4 windows ≈ 4 GB per region — fits in RAM trivially.
  • Latency budget: add <2 ms to p99 → co-locate Redis in same AZ as API gateway.
  • Cross-region: replicate counters via CRDTs or shard by key region — eventual consistency acceptable for soft limits.
Architecture
Client ─→ API Gateway / Envoy ─→ Backend Svcs
                  │
                  ▼
         Rate Limit Filter
                  │
                  ▼
         Redis Cluster (sharded by api_key)
                  │
        ┌─────────┴─────────┐
        ▼                   ▼
   Token Bucket        Sliding Window
   (per-key INCR)      (sorted-set log)
                  │
                  ▼
         Metrics → Prometheus
         Overflow → Kafka (audit)
API
  • Internal: rateLimit.check(api_key, route) → { allowed, remaining, reset_at }
  • PUT /admin/limits/:api_key { rps, burst } → 204 (control plane)
  • GET /admin/limits/:api_key → { rps, burst, current_usage }
  • Response header on throttle: 429 + Retry-After + X-RateLimit-Remaining
Data model
buckets (Redis, key = "rl:{api_key}:{route}:{window}"):
  value          : int counter
  ttl            : window_size_seconds

policies (SQL, control plane):
  api_key (PK)   : string
  tier           : enum(free|pro|enterprise)
  rps            : int
  burst          : int
  updated_at     : timestamp

audit_log (Kafka topic):
  api_key, route, ts, allowed, remaining
Concept blurbs
Rate Limiting
Throttle requests per user/IP with token bucket, leaky bucket, or sliding window.
Key-Value Store
O(1) get/put by key; massively scalable (DynamoDB, Redis, Cassandra).
API Gateway
Single entry point: auth, rate limit, routing, transformation.
Consistent Hashing
Distribute keys across nodes with minimal remapping on resize.
Circuit Breaker
Stop calling failing dependencies; fail fast and recover gracefully.