Design a Web Search Autocomplete

Senior

Suggest top-k completions for a partial query in <100ms for global traffic.

Required building blocks

Search Index

Cache-Aside (Lazy Loading)

CDN

Load Balancer

Nice to have

Bloom Filter

Rate Limiting

Canonical answer

Trie of prefixes with top-k cached at each node. Refresh from query logs nightly. Edge content delivery network (CDN) cache for hot prefixes.

Capacity estimation

1B searches/day → ~12K queries per second (QPS) average; autocomplete fires ~5x per query → ~60K prefix-lookup QPS, ~150K at peak.
<100ms p99 (99th percentile latency) budget → trie lookup ~5ms + cache hit ~2ms; cold edges go to origin trie service.
Vocabulary: ~50M unique prefixes × top-5 completions × 50 B ≈ 12 GB trie in memory per node.
Query log ingest: 1B queries/day × 100 B ≈ 100 GB/day to feed nightly rebuild.
Edge content delivery network (CDN) caches hot prefixes (~10K prefixes serve 50% of traffic) → origin sees ~30K QPS.
Rebuild pipeline: Hadoop/Spark over 30d log window → ranked top-k merged into shipped trie snapshot.

Architecture

Browser ─→ CDN (edge cache hot prefixes)
              │ miss
              ▼
        Load Balancer
              │
              ▼
       Autocomplete Svc (stateless)
              │
       ┌──────┴──────┐
       ▼             ▼
   Redis Cache    Trie Service (sharded by prefix-hash)
                       │
                       ▼
                  Trie Snapshot (S3) ◀── Nightly Rebuild
                                          ▲
                                          │
                                     Query Logs (Kafka)
                                          ▲
                                          │
                                       Browser (search events)

API

GET /autocomplete?q=prefix&limit=10 → { suggestions: [{ text, score }] }
POST /events/search { query, user_id?, ts } → 202 (logs to Kafka)
GET /trending?region=US → { queries[] } (refreshed every minute)
Admin: POST /index/rebuild → { job_id, eta }

Data model

trie_node (in-memory + S3 snapshot):
  prefix (PK)     : string
  top_k           : list<{ completion, score }> (k=10)
  child_offsets   : map<char, node_offset>

query_log (Kafka → S3 partitioned by hour):
  query           : string
  user_id         : uuid?
  region          : string
  ts              : timestamp
  clicked_result  : string?

trending (Redis, key = "trend:{region}"):
  sorted_set of query → recency-weighted count

Concept blurbs

Search Index

Inverted index for full-text search and faceting (Elasticsearch, OpenSearch).

Cache-Aside (Lazy Loading)

App reads cache first; on miss, loads from DB and populates cache.

CDN

Edge-cached static (and sometimes dynamic) content close to users via a CDN (Content Delivery Network).

Load Balancer

Distribute requests across healthy backends (L4 or L7).

Bloom Filter

Probabilistic set membership — no false negatives, tunable false positives.

Rate Limiting

Throttle requests per user/IP with token bucket, leaky bucket, or sliding window.