Design Twitter / News Feed

Design a Twitter-like timeline: 500M users, an average of 200 follows per user, p95 timeline read latency < 200 ms.

Required building blocks
Pub/Sub
Cache-Aside (Lazy Loading)
Sharding
Wide-Column Store
Load Balancer
CDN
Nice to have
Search Index
API Gateway
Canonical answer

Hybrid fan-out: fan-out-on-write for normal users (push each new tweet ID into followers' timeline caches) and fan-out-on-read for celebrities (pull their recent tweets and merge them in at read time). Tweets are stored in a wide-column store.
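
The push/pull split can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: HybridFeed and its methods are hypothetical names, plain dicts stand in for the follow graph, tweet store, and timeline cache, and the celebrity cutoff is an assumed, tunable value.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # assumed cutoff; tune per workload


class HybridFeed:
    """In-memory stand-in for the follow graph, tweet store, and timeline cache."""

    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)          # author -> follower ids
        self.following = defaultdict(set)          # user -> followed author ids
        self.tweets_by_author = defaultdict(list)  # author -> [(tweet_id, text)]
        self.timeline_cache = defaultdict(list)    # user -> [(tweet_id, author)]

    def follow(self, follower, followee):
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def post(self, author, tweet_id, text):
        # Always persist the tweet under its author.
        self.tweets_by_author[author].append((tweet_id, text))
        if self.is_celebrity(author):
            return  # pull path: followers fetch this at read time
        # Push path: write the tweet id into every follower's cached timeline.
        for follower in self.followers[author]:
            self.timeline_cache[follower].append((tweet_id, author))

    def timeline(self, user, limit=20):
        # Start from pushed entries, keyed by tweet id so duplicates collapse.
        merged = dict(self.timeline_cache[user])
        # Merge in tweets from followed celebrities (pull path).
        for author in self.following[user]:
            if self.is_celebrity(author):
                for tweet_id, _text in self.tweets_by_author[author]:
                    merged[tweet_id] = author
        # Tweet ids are assumed time-ordered, so sorting by id gives newest first.
        return sorted(merged.items(), reverse=True)[:limit]
```

Keying the merge by tweet ID keeps the read path correct even when an author crosses the threshold after some of their tweets were already pushed.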

Capacity estimation
  • 500M users, assume 200M DAU; avg 2 tweets/day → 400M writes/day ≈ 4,600 writes/sec, ~12K/sec at peak.
  • Avg 50 timeline reads/DAU → 10B reads/day ≈ 115K reads/sec, ~290K/sec at peak.
  • Fan-out: 200 followers avg × 4,600 writes/sec ≈ 920K timeline inserts/sec (push path).
  • Tweet storage: 400M × 365 × ~1 KB ≈ 150 TB/year in wide-column store.
  • Timeline cache: 200M users × 800 cached tweet IDs × 16 B ≈ 2.5 TB in Redis cluster.
  • Celebrity threshold ~10K followers → ~0.1% of users handled via pull to cap fan-out cost.
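
The arithmetic behind these estimates can be checked with a short script. The inputs come from the bullets above; PEAK_FACTOR is an assumption inferred from the ratio between the average and peak figures, and the variable names are illustrative only.

```python
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 2.5  # assumed peak-to-average ratio (matches ~12K and ~290K above)

dau = 200_000_000
tweets_per_day = dau * 2     # 400M writes/day
reads_per_day = dau * 50     # 10B reads/day

write_qps = tweets_per_day / SECONDS_PER_DAY
read_qps = reads_per_day / SECONDS_PER_DAY
peak_write_qps = write_qps * PEAK_FACTOR
peak_read_qps = read_qps * PEAK_FACTOR

# Push-path fan-out: each write lands in ~200 follower timelines.
fanout_inserts_per_sec = write_qps * 200

# Storage and cache sizing, using decimal units (1 KB = 1,000 B, 1 TB = 1e12 B).
tweet_storage_tb_per_year = tweets_per_day * 365 * 1_000 / 1e12
timeline_cache_tb = dau * 800 * 16 / 1e12

print(f"writes/sec avg={write_qps:,.0f} peak={peak_write_qps:,.0f}")
print(f"reads/sec  avg={read_qps:,.0f} peak={peak_read_qps:,.0f}")
print(f"fan-out inserts/sec={fanout_inserts_per_sec:,.0f}")
print(f"tweet storage TB/year={tweet_storage_tb_per_year:.0f}")
print(f"timeline cache TB={timeline_cache_tb:.2f}")
```

The unrounded fan-out figure comes out slightly above 920K because the bullet multiplies the rounded 4,600 writes/sec rather than the exact rate.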
Architecture
Mobile / Web ─→ CDN (media) ─→ API Gateway
                              │
                              ▼
                       Load Balancer
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
         Write Svc       Read Svc        Search Svc
              │               │               │
              ▼               ▼               ▼
          Kafka          Redis              ES/Solr
        (fan-out)     Timeline Cache       (inverted idx)
              │               ▲
              ▼               │ miss
      Fan-out Workers ────────┘
              │
              ▼
       Wide-Column Store (Cassandra)
         tweets by (user_id, tweet_id)
         timelines by (user_id, ts)
API
  • POST /tweets { text, media_ids? } → { tweet_id, created_at }
  • GET /timeline?cursor=… → { tweets[], next_cursor }
  • POST /follow { target_user_id } → 204
  • GET /users/:id/tweets?cursor=… → { tweets[], next_cursor }
  • GET /search?q=… → { tweets[] } (search-index path)
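
The cursor parameters on the list endpoints suggest keyset pagination. One possible shape, assuming the cursor is an opaque token wrapping the last tweet ID the client saw; encode_cursor, decode_cursor, and get_timeline are hypothetical helpers, and a sorted list stands in for the timeline store.

```python
import base64
import json


def encode_cursor(last_tweet_id):
    """Wrap the last-seen tweet id in an opaque, URL-safe token."""
    raw = json.dumps({"before": last_tweet_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor):
    """Recover the last-seen tweet id from a token made by encode_cursor."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))["before"]


def get_timeline(tweet_ids_desc, cursor=None, limit=3):
    """Return one page of ids strictly older than the cursor, newest first."""
    before = decode_cursor(cursor) if cursor else None
    page = [t for t in tweet_ids_desc if before is None or t < before][:limit]
    # Only hand out a cursor when older entries remain after this page.
    has_more = bool(page) and any(t < page[-1] for t in tweet_ids_desc)
    next_cursor = encode_cursor(page[-1]) if has_more else None
    return {"tweets": page, "next_cursor": next_cursor}
```

Paging by "older than this ID" rather than by offset keeps pages stable while new tweets keep arriving at the head of the timeline.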
Data model
tweets (wide-column, partition = user_id, cluster = tweet_id DESC):
  user_id (PK)   : uuid
  tweet_id       : snowflake
  text           : string (≤280)
  media_ids      : list<uuid>
  created_at     : timestamp

timelines (wide-column, partition = user_id, cluster = ts DESC):
  user_id (PK)   : uuid
  ts             : timestamp
  tweet_id       : snowflake
  author_id      : uuid

follows (KV, partition = follower_id):
  follower_id    : uuid
  followee_id    : uuid
  created_at     : timestamp
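
Clustering tweets by tweet_id DESC returns newest-first only because snowflake IDs are time-ordered: the high bits carry a millisecond timestamp. A rough sketch of such a generator, assuming the commonly described 41/10/12 bit split (timestamp, worker, sequence); SnowflakeGenerator, timestamp_ms, and the injectable clock are hypothetical names for illustration.

```python
import threading
import time

# Custom epoch in ms (the value commonly cited for Twitter's Snowflake;
# any fixed past instant works as long as all workers agree on it).
EPOCH_MS = 1_288_834_974_657
WORKER_BITS = 10
SEQ_BITS = 12


class SnowflakeGenerator:
    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock  # injectable so tests can pin the time
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                # Same millisecond: bump the sequence, wrapping at 4096.
                self.seq = (self.seq + 1) & ((1 << SEQ_BITS) - 1)
                if self.seq == 0:
                    # Sequence exhausted: wait for the clock to advance.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.seq = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
                | (self.worker_id << SEQ_BITS)
                | self.seq
            )


def timestamp_ms(snowflake_id):
    """Extract the creation time (Unix ms) embedded in an id."""
    return (snowflake_id >> (WORKER_BITS + SEQ_BITS)) + EPOCH_MS
```

Because the timestamp sits in the most significant bits, IDs from different workers still sort by creation time to within clock skew, which is what the timeline merge relies on.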
Concept blurbs
Pub/Sub
Fan-out events to many subscribers; topic-based (Kafka, SNS, Redis pub/sub).
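
A toy, in-process version of the pattern, purely to show the topic-to-subscribers shape. Broker is a hypothetical class and is not the API of Kafka, SNS, or Redis; real brokers add persistence, partitioning, and delivery guarantees.

```python
from collections import defaultdict


class Broker:
    """Minimal synchronous topic-based pub/sub."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver to every handler on the topic; return how many received it.
        handlers = list(self.subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)
```

The publisher never needs to know who is listening, which is what lets fan-out workers, search indexers, and other consumers attach to the same tweet stream independently.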
Cache-Aside (Lazy Loading)
App reads cache first; on miss, loads from DB and populates cache.
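
The read path described here fits in a few lines. A minimal sketch, assuming a dict as the cache and a caller-supplied loader as the database; TimelineReader, load_from_db, and invalidate are hypothetical names.

```python
class TimelineReader:
    """Cache-aside reader: check the cache, fall back to the DB, then populate."""

    def __init__(self, load_from_db):
        self.cache = {}              # stand-in for a Redis cluster
        self.load_from_db = load_from_db
        self.db_reads = 0            # counts cache misses that hit the DB

    def get(self, user_id):
        if user_id in self.cache:
            return self.cache[user_id]         # hit: no DB access
        self.db_reads += 1
        value = self.load_from_db(user_id)     # miss: load from the DB
        self.cache[user_id] = value            # populate for later reads
        return value

    def invalidate(self, user_id):
        # Drop the entry so the next read reloads fresh data.
        self.cache.pop(user_id, None)
```

The application owns both the lookup and the population step; the cache itself knows nothing about the database behind it.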
Sharding
Partition data across DB instances by key (hash, range, or geography).
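
Hash-based placement, the simplest of the three strategies, might look like the following. shard_for is a hypothetical helper; it uses a stable digest rather than Python's built-in hash(), which is randomized per process for strings.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a string key to a shard index in [0, num_shards)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo N.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo placement remaps most keys when num_shards changes, which is why resharding-heavy systems often use consistent hashing instead.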
Wide-Column Store
Sparse rows over many columns; time-series friendly (Cassandra, HBase, Bigtable).
Load Balancer
Distribute requests across healthy backends (L4 or L7).
CDN
Edge-cached static (and sometimes dynamic) content close to users.
Search Index
Inverted index for full-text search and faceting (Elasticsearch, OpenSearch).
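
At its core an inverted index maps each term to the set of documents containing it. A toy sketch of that idea, assuming naive lowercase word tokenization and AND semantics across query terms; InvertedIndex is a hypothetical class, not the Elasticsearch or OpenSearch API.

```python
import re
from collections import defaultdict


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text):
        # Tokenize into lowercase words and record the doc under each term.
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query):
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        # A document matches only if it contains every query term.
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches, reverse=True)  # larger (newer) ids first
```

Production engines layer relevance scoring, stemming, and sharded posting lists on top of this structure.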
API Gateway
Single entry point: auth, rate limit, routing, transformation.