Design a Web Search Autocomplete
Suggest top-k completions for a partial query in <100ms for global traffic.
Required building blocks
Search Index
Cache-Aside (Lazy Loading)
CDN
Load Balancer
Nice to have
Bloom Filter
Rate Limiting
Canonical answer
Trie of prefixes with top-k cached at each node. Refresh from query logs nightly. Edge CDN cache for hot prefixes.
Capacity estimation
- 1B searches/day → ~12K QPS average; autocomplete fires ~5x per query → ~60K prefix-lookup QPS, ~150K at peak.
- <100ms p99 budget → trie lookup ~5ms + cache hit ~2ms; cold edges go to origin trie service.
- Vocabulary: ~50M unique prefixes × top-5 completions × 50 B ≈ 12 GB trie in memory per node.
- Query log ingest: 1B queries/day × 100 B ≈ 100 GB/day to feed nightly rebuild.
- Edge CDN caches hot prefixes (~10K prefixes serve 50% of traffic) → origin sees ~30K QPS.
- Rebuild pipeline: Hadoop/Spark over 30d log window → ranked top-k merged into shipped trie snapshot.
Architecture
Browser ─→ CDN (edge cache hot prefixes)
│ miss
▼
Load Balancer
│
▼
Autocomplete Svc (stateless)
│
┌──────┴──────┐
▼ ▼
Redis Cache Trie Service (sharded by prefix-hash)
│
▼
Trie Snapshot (S3) ◀── Nightly Rebuild
▲
│
Query Logs (Kafka)
▲
│
Browser (search events)API
- GET /autocomplete?q=prefix&limit=10 → { suggestions: [{ text, score }] }
- POST /events/search { query, user_id?, ts } → 202 (logs to Kafka)
- GET /trending?region=US → { queries[] } (refreshed every minute)
- Admin: POST /index/rebuild → { job_id, eta }
Data model
trie_node (in-memory + S3 snapshot):
prefix (PK) : string
top_k : list<{ completion, score }> (k=10)
child_offsets : map<char, node_offset>
query_log (Kafka → S3 partitioned by hour):
query : string
user_id : uuid?
region : string
ts : timestamp
clicked_result : string?
trending (Redis, key = "trend:{region}"):
sorted_set of query → recency-weighted countConcept blurbs
Search Index
Inverted index for full-text search and faceting (Elasticsearch, OpenSearch).
Cache-Aside (Lazy Loading)
App reads cache first; on miss, loads from DB and populates cache.
CDN
Edge-cached static (and sometimes dynamic) content close to users.
Load Balancer
Distribute requests across healthy backends (L4 or L7).
Bloom Filter
Probabilistic set membership — no false negatives, tunable false positives.
Rate Limiting
Throttle requests per user/IP with token bucket, leaky bucket, or sliding window.