Design a Web Search Autocomplete

Design a Web Search Autocomplete

Suggest top-k completions for a partial query in <100ms for global traffic.

Required building blocks
Search Index
Cache-Aside (Lazy Loading)
CDN
Load Balancer
Nice to have
Bloom Filter
Rate Limiting
Canonical answer

Trie of prefixes with top-k cached at each node. Refresh from query logs nightly. Edge CDN cache for hot prefixes.

Capacity estimation
  • 1B searches/day → ~12K QPS average; autocomplete fires ~5x per query → ~60K prefix-lookup QPS, ~150K at peak.
  • <100ms p99 budget → trie lookup ~5ms + cache hit ~2ms; cold edges go to origin trie service.
  • Vocabulary: ~50M unique prefixes × top-5 completions × 50 B ≈ 12 GB trie in memory per node.
  • Query log ingest: 1B queries/day × 100 B ≈ 100 GB/day to feed nightly rebuild.
  • Edge CDN caches hot prefixes (~10K prefixes serve 50% of traffic) → origin sees ~30K QPS.
  • Rebuild pipeline: Hadoop/Spark over 30d log window → ranked top-k merged into shipped trie snapshot.
Architecture
Browser ─→ CDN (edge cache hot prefixes)
              │ miss
              ▼
        Load Balancer
              │
              ▼
       Autocomplete Svc (stateless)
              │
       ┌──────┴──────┐
       ▼             ▼
   Redis Cache    Trie Service (sharded by prefix-hash)
                       │
                       ▼
                  Trie Snapshot (S3) ◀── Nightly Rebuild
                                          ▲
                                          │
                                     Query Logs (Kafka)
                                          ▲
                                          │
                                       Browser (search events)
API
  • GET /autocomplete?q=prefix&limit=10 → { suggestions: [{ text, score }] }
  • POST /events/search { query, user_id?, ts } → 202 (logs to Kafka)
  • GET /trending?region=US → { queries[] } (refreshed every minute)
  • Admin: POST /index/rebuild → { job_id, eta }
Data model
trie_node (in-memory + S3 snapshot):
  prefix (PK)     : string
  top_k           : list<{ completion, score }> (k=10)
  child_offsets   : map<char, node_offset>

query_log (Kafka → S3 partitioned by hour):
  query           : string
  user_id         : uuid?
  region          : string
  ts              : timestamp
  clicked_result  : string?

trending (Redis, key = "trend:{region}"):
  sorted_set of query → recency-weighted count
Concept blurbs
Search Index
Inverted index for full-text search and faceting (Elasticsearch, OpenSearch).
Cache-Aside (Lazy Loading)
App reads cache first; on miss, loads from DB and populates cache.
CDN
Edge-cached static (and sometimes dynamic) content close to users.
Load Balancer
Distribute requests across healthy backends (L4 or L7).
Bloom Filter
Probabilistic set membership — no false negatives, tunable false positives.
Rate Limiting
Throttle requests per user/IP with token bucket, leaky bucket, or sliding window.