Design Twitter / News Feed
Design a Twitter-like timeline: 500M users, average 200 follows, p95 timeline read < 200ms.
Required building blocks
Pub/Sub
Cache-Aside (Lazy Loading)
Sharding
Wide-Column Store
Load Balancer
CDN
Nice to have
Search Index
API Gateway
Canonical answer
Fan-out-on-write for normal users (push to followers' timeline cache), fan-out-on-read for celebrities (pull at read). Wide-column store for tweets.
Capacity estimation
- 500M users, assume 200M DAU; avg 2 tweets/day → 400M writes/day ≈ 4,600 writes/sec, ~12K/sec at peak.
- Avg 50 timeline reads/DAU → 10B reads/day ≈ 115K reads/sec, ~290K/sec at peak.
- Fan-out: 200 followers avg × 4,600 tps = ~920K timeline inserts/sec (push path).
- Tweet storage: 400M × 365 × ~1 KB ≈ 150 TB/year in wide-column store.
- Timeline cache: 200M users × 800 cached tweet IDs × 16 B ≈ 2.5 TB in Redis cluster.
- Celebrity threshold ~10K followers → ~0.1% of users handled via pull to cap fan-out cost.
Architecture
Mobile / Web ─→ CDN (media) ─→ API Gateway
│
▼
Load Balancer
│
┌───────────────┼───────────────┐
▼ ▼ ▼
Write Svc Read Svc Search Svc
│ │ │
▼ ▼ ▼
Kafka Redis ES/Solr
(fan-out) Timeline Cache (inverted idx)
│ ▲
▼ │ miss
Fan-out Workers ────────┘
│
▼
Wide-Column Store (Cassandra)
tweets by (user_id, tweet_id)
timelines by (user_id, ts)API
- POST /tweets { text, media_ids? } → { tweet_id, created_at }
- GET /timeline?cursor=… → { tweets[], next_cursor }
- POST /follow { target_user_id } → 204
- GET /users/:id/tweets?cursor=… → { tweets[], next_cursor }
- GET /search?q=… → { tweets[] } (search-index path)
Data model
tweets (wide-column, partition = user_id, cluster = tweet_id DESC):
tweet_id (PK) : snowflake
user_id : uuid
text : string (≤280)
media_ids : list<uuid>
created_at : timestamp
timelines (wide-column, partition = user_id, cluster = ts DESC):
user_id (PK) : uuid
ts : timestamp
tweet_id : snowflake
author_id : uuid
follows (KV, partition = follower_id):
follower_id : uuid
followee_id : uuid
created_at : timestampConcept blurbs
Pub/Sub
Fan-out events to many subscribers; topic-based (Kafka, SNS, Redis pub/sub).
Cache-Aside (Lazy Loading)
App reads cache first; on miss, loads from DB and populates cache.
Sharding
Partition data across DB instances by key (hash, range, or geography).
Wide-Column Store
Sparse rows over many columns; time-series friendly (Cassandra, HBase, Bigtable).
Load Balancer
Distribute requests across healthy backends (L4 or L7).
CDN
Edge-cached static (and sometimes dynamic) content close to users.
Search Index
Inverted index for full-text search and faceting (Elasticsearch, OpenSearch).
API Gateway
Single entry point: auth, rate limit, routing, transformation.