Design Twitter / News Feed

Mid

Design a Twitter-like timeline: 500M users, average 200 follows, p95 timeline read < 200ms.

Required building blocks

Pub/Sub

Cache-Aside (Lazy Loading)

Sharding

Wide-Column Store

Load Balancer

CDN

Nice to have

Search Index

API Gateway

Canonical answer

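Hybrid fan-out: fan-out-on-write for normal users (push each new tweet ID into followers' timeline caches) and fan-out-on-read for celebrities (pull their recent tweets at read time and merge them into the cached timeline). Tweets live in a wide-column store.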
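The hybrid approach can be sketched in memory as follows. This is a minimal illustration, not a production design: the `Feed` class, its method names, and the dict-based stores are hypothetical stand-ins for the timeline cache and tweet store, and the follower threshold is passed in as a parameter.

```python
import heapq
from collections import defaultdict


class Feed:
    """Toy hybrid fan-out: push for normal authors, pull for celebrities."""

    def __init__(self, threshold=10_000):
        self.threshold = threshold           # assumed celebrity cutoff
        self.followers = defaultdict(set)    # author -> followers
        self.following = defaultdict(set)    # user -> authors they follow
        self.tweets = defaultdict(list)      # author -> [(tweet_id, text)], oldest first
        self.timelines = defaultdict(list)   # user -> [(tweet_id, author)], pushed entries
        self.next_id = 0                     # stand-in for a time-ordered ID generator

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def post(self, author, text):
        self.next_id += 1
        tweet_id = self.next_id
        self.tweets[author].append((tweet_id, text))
        # Push path: only non-celebrity tweets are copied to follower timelines.
        if not self.is_celebrity(author):
            for follower in self.followers[author]:
                self.timelines[follower].append((tweet_id, author))
        return tweet_id

    def timeline(self, user, limit=20):
        pushed = set(self.timelines[user])
        # Pull path: fetch recent tweets of followed celebrities at read time.
        pulled = {
            (tweet_id, author)
            for author in self.following[user]
            if self.is_celebrity(author)
            for tweet_id, _ in self.tweets[author][-limit:]
        }
        # Merge both sources, newest (largest ID) first; the set union dedupes.
        return heapq.nlargest(limit, pushed | pulled)
```

Merging through a set avoids duplicates when an author crosses the threshold after some of their tweets were already pushed.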

Capacity estimation

500M users, assume 200M DAU; avg 2 tweets/day → 400M writes/day ≈ 4,600 writes/sec, ~12K/sec at peak.
Avg 50 timeline reads/DAU → 10B reads/day ≈ 115K reads/sec, ~290K/sec at peak.
Fan-out: 200 followers avg × 4,600 writes/sec ≈ 920K timeline inserts/sec on the push path (an upper bound, since celebrity tweets are pulled instead of pushed).
Tweet storage: 400M × 365 × ~1 KB ≈ 150 TB/year in wide-column store.
Timeline cache: 200M users × 800 cached tweet IDs × 16 B ≈ 2.5 TB in Redis cluster.
Celebrity threshold: ~10K followers. The ~0.1% of users above it are served via pull at read time to cap fan-out cost.
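The arithmetic above can be reproduced with a few lines of Python. The inputs come from the estimates in this section; the 2.5x peak factor is an assumption inferred from the stated peak figures, and 1 KB is treated as 1,000 bytes.

```python
SECONDS_PER_DAY = 86_400

dau = 200_000_000
tweets_per_user_per_day = 2
reads_per_user_per_day = 50
avg_followers = 200
peak_factor = 2.5  # assumption: roughly what the stated peak numbers imply

writes_per_sec = dau * tweets_per_user_per_day / SECONDS_PER_DAY
reads_per_sec = dau * reads_per_user_per_day / SECONDS_PER_DAY
peak_writes_per_sec = writes_per_sec * peak_factor
peak_reads_per_sec = reads_per_sec * peak_factor

# Push-path inserts if every tweet were fanned out to the average follower count.
fanout_inserts_per_sec = writes_per_sec * avg_followers

# Storage: ~1 KB per tweet; 800 cached IDs of 16 B per active user.
tweet_storage_tb_per_year = dau * tweets_per_user_per_day * 365 * 1_000 / 1e12
timeline_cache_tb = dau * 800 * 16 / 1e12

print(f"writes/sec  ~{writes_per_sec:,.0f} (peak ~{peak_writes_per_sec:,.0f})")
print(f"reads/sec   ~{reads_per_sec:,.0f} (peak ~{peak_reads_per_sec:,.0f})")
print(f"fan-out/sec ~{fanout_inserts_per_sec:,.0f}")
print(f"tweets      ~{tweet_storage_tb_per_year:.0f} TB/year")
print(f"cache       ~{timeline_cache_tb:.2f} TB")
```

The unrounded fan-out figure is slightly above 920K because the section multiplies by the rounded 4,600 writes/sec.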

Architecture

Mobile / Web ─→ CDN (media) ─→ API Gateway
                              │
                              ▼
                       Load Balancer
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
         Write Svc       Read Svc        Search Svc
              │               │               │
              ▼               ▼               ▼
          Kafka          Redis              ES/Solr
        (fan-out)     Timeline Cache       (inverted idx)
              │               ▲
              ▼               │ miss
      Fan-out Workers ────────┘
              │
              ▼
       Wide-Column Store (Cassandra)
         tweets by (user_id, tweet_id)
         timelines by (user_id, ts)

API

POST /tweets { text, media_ids? } → { tweet_id, created_at }
GET /timeline?cursor=… → { tweets[], next_cursor }
POST /follow { target_user_id } → 204
GET /users/:id/tweets?cursor=… → { tweets[], next_cursor }
GET /search?q=… → { tweets[] } (search-index path)
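The `cursor` / `next_cursor` pair in the timeline endpoints suggests keyset pagination. One possible sketch, assuming the cursor is an opaque token that encodes the last tweet ID the client saw (the encoding and the helper names here are hypothetical, not part of the API above):

```python
import base64
import json


def encode_cursor(last_tweet_id):
    """Pack the last-seen tweet ID into an opaque, URL-safe token."""
    raw = json.dumps({"before": last_tweet_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor):
    """Recover the last-seen tweet ID from a token made by encode_cursor."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))["before"]


def get_timeline_page(tweet_ids_desc, cursor=None, limit=2):
    """Return one page of IDs older than the cursor, newest first."""
    before = decode_cursor(cursor) if cursor else None
    page = [t for t in tweet_ids_desc if before is None or t < before][:limit]
    # Only hand out a next cursor when older entries remain.
    has_more = bool(page) and any(t < page[-1] for t in tweet_ids_desc)
    return {
        "tweets": page,
        "next_cursor": encode_cursor(page[-1]) if has_more else None,
    }
```

Paging by "older than this ID" rather than by offset keeps pages stable while new tweets arrive at the head of the timeline.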

Data model

tweets (wide-column, partition = user_id, cluster = tweet_id DESC; PK = (user_id, tweet_id)):
  user_id        : uuid
  tweet_id       : snowflake
  text           : string (<=280)
  media_ids      : list<uuid>
  created_at     : timestamp

timelines (wide-column, partition = user_id, cluster = ts DESC, tweet_id as tie-breaker; PK = (user_id, ts, tweet_id)):
  user_id        : uuid
  ts             : timestamp
  tweet_id       : snowflake
  author_id      : uuid

follows (KV, partition = follower_id):
  follower_id    : uuid
  followee_id    : uuid
  created_at     : timestamp
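The `snowflake` type matters because it makes `tweet_id DESC` a time ordering. A minimal sketch, assuming the commonly described 64-bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence); the epoch constant and function names are illustrative assumptions:

```python
EPOCH_MS = 1_288_834_974_657  # assumed custom epoch in Unix milliseconds
WORKER_BITS = 10
SEQ_BITS = 12


def make_snowflake(ts_ms, worker_id, seq):
    """Pack timestamp, worker ID, and per-millisecond sequence into one integer."""
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id out of range")
    if not 0 <= seq < (1 << SEQ_BITS):
        raise ValueError("seq out of range")
    return (
        ((ts_ms - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
        | (worker_id << SEQ_BITS)
        | seq
    )


def snowflake_timestamp_ms(snowflake_id):
    """Recover the creation time (Unix ms) from the high bits of the ID."""
    return (snowflake_id >> (WORKER_BITS + SEQ_BITS)) + EPOCH_MS
```

Because the timestamp occupies the high bits, an ID minted in a later millisecond always compares larger, regardless of which worker produced it.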

Concept blurbs

Pub/Sub

Fan-out events to many subscribers; topic-based (Kafka, SNS, Redis pub/sub).
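As a rough in-process illustration of topic-based fan-out (real brokers such as Kafka add persistence, partitions, and consumer groups, none of which is modeled here; the `Broker` class is hypothetical):

```python
from collections import defaultdict


class Broker:
    """Toy topic-based pub/sub: every subscriber of a topic sees each message."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of handler callables

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        handlers = self.subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)  # number of subscribers the message was delivered to
```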

Cache-Aside (Lazy Loading)

App reads cache first; on miss, loads from DB and populates cache.
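A minimal sketch of that read path, with a plain dict standing in for Redis and a caller-supplied loader standing in for the database (both are assumptions for illustration):

```python
class CacheAside:
    """Read-through-on-miss wrapper: the app, not the cache, loads from the DB."""

    def __init__(self, load_from_db):
        self.cache = {}                 # stand-in for a Redis cluster
        self.load_from_db = load_from_db
        self.misses = 0

    def get(self, key):
        if key in self.cache:           # hit: serve from cache
            return self.cache[key]
        self.misses += 1                # miss: load from the DB, then populate
        value = self.load_from_db(key)
        self.cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a key so the next read reloads it from the DB."""
        self.cache.pop(key, None)
```

The sketch omits TTLs and eviction, which a real deployment would normally configure on the cache itself.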

Sharding

Partition data across DB instances by key (hash, range, or geography).
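Hash-based partitioning, for example, can be sketched as below. The function name and the choice of SHA-256 are illustrative assumptions; any stable hash works, but Python's built-in `hash()` is randomized per process for strings and would not.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a string key to a shard index in [0, num_shards) using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo placement remaps most keys when `num_shards` changes, which is why consistent hashing is a common alternative for clusters that resize.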

Wide-Column Store

Sparse rows over many columns; time-series friendly (Cassandra, HBase, Bigtable).

Load Balancer

Distribute requests across healthy backends (L4 or L7).
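One simple policy is round-robin over backends that currently pass health checks. A small sketch, where the `RoundRobinBalancer` class and the manual `mark_down` / `mark_up` calls are hypothetical stand-ins for real health probing:

```python
class RoundRobinBalancer:
    """Cycle through backends in order, skipping any marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self.index = 0

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            backend = self.backends[self.index % len(self.backends)]
            self.index += 1
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")
```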

CDN

CDN (Content Delivery Network): caches static (and sometimes dynamic) content at edge locations close to users.

Search Index

Inverted index for full-text search and faceting (Elasticsearch, OpenSearch).
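The core data structure can be illustrated in a few lines: a map from each term to the set of documents containing it, with multi-term queries answered by set intersection. This toy `InvertedIndex` is an assumption-laden sketch and skips ranking, stemming, and faceting.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: term -> set of document IDs (AND semantics on search)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query):
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        # Copy the first posting set so the intersection does not mutate the index.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```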

API Gateway

Single API (Application Programming Interface) entry point: auth, rate limit, routing, transformation.
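Rate limiting at the gateway is often described in terms of a token bucket. A minimal sketch under that assumption, with the clock passed in explicitly so the behavior is deterministic (the `TokenBucket` class and its parameters are hypothetical):

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec, capacity, now=0.0):
        self.rate_per_sec = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)   # start full
        self.last = now

    def allow(self, now):
        # Refill based on elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In a gateway, one bucket would typically be kept per client or API key, and a rejected request would map to an HTTP 429 response.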