Design Instagram

Photo upload, feed generation, follow graph, search by hashtag; mostly read-heavy.

Required building blocks
Object Storage (Blob)
CDN
Cache-Aside (Lazy Loading)
Wide-Column Store
Search Index
Sharding
Nice to have
Pub/Sub
Load Balancer
Canonical answer

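Store images in object storage and serve them through a CDN. Build feeds with fan-out: push post IDs into follower feed caches on write, or go hybrid and pull at read time for accounts with very large follower counts. Serve hashtag search from an inverted index.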
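A minimal sketch of hybrid fan-out, to make the push/pull split concrete. Everything here is an assumption for illustration: in-memory dicts stand in for the follow graph, the feed cache, and the posts table, and `CELEBRITY_THRESHOLD` is a hypothetical cutoff set tiny so the example is easy to follow.

```python
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 3   # hypothetical cutoff; a real system would use a far larger value
FEED_CAP = 500            # matches the ~500 cached post IDs per user

followers = defaultdict(set)       # followee -> followers
following = defaultdict(set)       # follower -> followees
posts_by_user = defaultdict(list)  # author -> [(ts, post_id)] in publish order
feeds = defaultdict(lambda: deque(maxlen=FEED_CAP))  # user -> pushed (ts, post_id)


def follow(follower, followee):
    followers[followee].add(follower)
    following[follower].add(followee)


def is_celebrity(user):
    return len(followers[user]) >= CELEBRITY_THRESHOLD


def publish(author, post_id, ts):
    """Store the post, then push to follower feeds unless the author is a celebrity."""
    posts_by_user[author].append((ts, post_id))
    if is_celebrity(author):
        return  # pull path: followers fetch these posts at read time
    for follower in followers[author]:
        feeds[follower].appendleft((ts, post_id))  # oldest entries fall off past FEED_CAP


def read_feed(user, limit=20):
    """Merge pushed entries with recent posts pulled from followed celebrities."""
    merged = set(feeds[user])  # a set drops duplicates if an author crossed the cutoff
    for followee in following[user]:
        if is_celebrity(followee):
            merged.update(posts_by_user[followee][-limit:])
    newest_first = sorted(merged, reverse=True)
    return [post_id for _, post_id in newest_first[:limit]]
```

The write path stays cheap for high-follower accounts at the cost of a merge on every read; where to put the cutoff is a tuning decision this sketch does not address.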

Capacity estimation
  • 2B users, 500M DAU; ~100M photo uploads/day → ~1,200 writes/sec, ~3K at peak.
  • Avg photo 500 KB + thumbnails (~750 KB stored per upload) → ~75 TB/day to object storage, ~27 PB/year.
  • Feed reads: 500M DAU × 50 feed loads/day → 25B reads/day ≈ 290K reads/sec, ~700K at peak.
  • Feed cache: 500M users × 500 cached post IDs × 16 B ≈ 4 TB in Redis cluster.
  • Follow graph: 500M × avg 150 follows × 16 B ≈ 1.2 TB; sharded by follower_id.
  • CDN serves ~98% of image bytes; origin egress ~1.5 PB/day (implies ~75 PB/day of total image traffic, roughly 3 MB per feed load).
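The estimates above can be reproduced with a few lines of arithmetic. This sketch assumes decimal units (1 TB = 10^12 bytes) and a 1.5× storage multiplier for thumbnails; the multiplier is not stated in the bullets but is what 75 TB/day over 100M uploads implies. The total image egress is back-derived from the 98% hit ratio and the 1.5 PB/day origin figure.

```python
SECONDS_PER_DAY = 86_400


def estimate():
    dau = 500_000_000
    uploads_per_day = 100_000_000
    avg_photo_bytes = 500_000      # 500 KB
    thumbnail_multiplier = 1.5     # assumption: thumbnails add ~50% on top of the original

    storage_per_day_tb = uploads_per_day * avg_photo_bytes * thumbnail_multiplier / 1e12
    feed_reads_per_day = dau * 50

    origin_egress_pb = 1.5
    cdn_hit_ratio = 0.98
    total_image_egress_pb = origin_egress_pb / (1 - cdn_hit_ratio)

    return {
        "writes_per_sec": uploads_per_day / SECONDS_PER_DAY,
        "storage_per_day_tb": storage_per_day_tb,
        "storage_per_year_pb": storage_per_day_tb * 365 / 1000,
        "feed_reads_per_sec": feed_reads_per_day / SECONDS_PER_DAY,
        "feed_cache_tb": dau * 500 * 16 / 1e12,     # 500 post IDs × 16 B per user
        "follow_graph_tb": dau * 150 * 16 / 1e12,   # 150 follows × 16 B per user
        "total_image_egress_pb": total_image_egress_pb,
        "image_mb_per_feed_load": total_image_egress_pb * 1e15 / feed_reads_per_day / 1e6,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.1f}")
```

Peak figures (~3K writes/sec, ~700K reads/sec) correspond to a peak-to-average factor of roughly 2.4 to 2.6, which is a judgment call rather than something the arithmetic produces.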
Architecture
Mobile ─→ CDN (images) ─→ Object Storage (S3)
                              ▲
                              │ writes
                       Upload Service
                              │
                              ▼
                     Image Pipeline (resize/exif strip)
                              │
                              ▼
                          Kafka
                ┌─────────────┼─────────────┐
                ▼             ▼             ▼
         Feed Fan-out    Indexer        Notification
                │      (Elasticsearch)   Service
                ▼
        Redis Feed Cache ◀── Feed API ◀── API Gateway ◀── Mobile
                │
                ▼
        Cassandra (posts, by user_id+ts)
API
  • POST /media (multipart) → { media_id, cdn_url, thumb_url }
  • POST /posts { media_id, caption, hashtags[] } → { post_id, created_at }
  • GET /feed?cursor=… → { posts[], next_cursor }
  • POST /follow { user_id } → 204
  • GET /search/hashtag/:tag → { posts[], next_cursor }
  • POST /posts/:id/like → { like_count }
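The `cursor` / `next_cursor` pair on the feed and search endpoints suggests keyset pagination. A minimal sketch, assuming the cursor is an opaque base64 encoding of the last item's `(created_at, post_id)`; the encoding and the function names are illustrative, not part of the API above.

```python
import base64
import json


def encode_cursor(created_at, post_id):
    raw = json.dumps([created_at, post_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor):
    created_at, post_id = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return created_at, post_id


def get_feed(posts, cursor=None, limit=20):
    """Return one page from `posts`, which must be sorted newest-first
    by (created_at, post_id). Items at or after the cursor position are skipped."""
    if cursor is not None:
        last_seen = decode_cursor(cursor)
        posts = [p for p in posts if (p["created_at"], p["post_id"]) < last_seen]
    page = posts[:limit]
    has_more = len(posts) > limit
    next_cursor = (
        encode_cursor(page[-1]["created_at"], page[-1]["post_id"]) if has_more else None
    )
    return {"posts": page, "next_cursor": next_cursor}
```

Unlike offset pagination, a keyset cursor keeps pages stable when new posts arrive at the head of the feed between requests.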
Data model
posts (wide-column, partition = user_id, cluster = created_at DESC):
  post_id         : snowflake (unique; tiebreaker after created_at in the clustering key)
  user_id         : uuid
  media_ids       : list<uuid>
  caption         : string
  hashtags        : list<string>
  created_at      : timestamp

feeds (Redis, key = "feed:{user_id}"):
  list of post_id : capped at ~500 entries
  ttl             : 7d (rebuild on miss)
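"Rebuild on miss" is the cache-aside pattern applied to feeds. A minimal sketch, assuming a dict-backed stand-in for Redis with per-key expiry; `FeedCache`, `load_feed`, and the `rebuild_from_store` callback are hypothetical names, and a real deployment would use Redis list commands instead.

```python
import time

FEED_CAP = 500
FEED_TTL_SECONDS = 7 * 24 * 3600  # 7d


class FeedCache:
    """In-memory stand-in for Redis: values expire after a TTL."""

    def __init__(self, clock=time.time):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (self._clock() + ttl, value)


def load_feed(user_id, cache, rebuild_from_store):
    """Cache-aside read: try the cache; on a miss, rebuild, cap, and repopulate."""
    key = f"feed:{user_id}"
    post_ids = cache.get(key)
    if post_ids is None:
        post_ids = rebuild_from_store(user_id)[:FEED_CAP]
        cache.set(key, post_ids, FEED_TTL_SECONDS)
    return post_ids
```

The rebuild callback would typically read recent posts from followed accounts in the posts table; it is passed in here so the caching logic stays independent of the store.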

follows (wide-column, partition = follower_id; fan-out on write also needs the reverse lookup by followee_id, typically a second table):
  follower_id     : uuid
  followee_id     : uuid
  created_at      : timestamp

hashtag_index (Elasticsearch):
  hashtag, post_id, ts, user_id, like_count
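To show what the hashtag index does conceptually, here is a toy inverted index: each normalized tag maps to a postings list of `(ts, post_id)`. This is an illustrative in-memory sketch with hypothetical names, not how Elasticsearch stores or queries data.

```python
from collections import defaultdict


class HashtagIndex:
    def __init__(self):
        self._postings = defaultdict(list)  # tag -> [(ts, post_id)]

    @staticmethod
    def normalize(tag):
        # Assumption: tags match case-insensitively, with or without a leading '#'.
        return tag.lstrip("#").lower()

    def add_post(self, post_id, ts, hashtags):
        # A set avoids indexing the same post twice under one tag.
        for tag in {self.normalize(t) for t in hashtags}:
            self._postings[tag].append((ts, post_id))

    def search(self, tag, limit=20):
        """Return post IDs for a tag, newest first."""
        hits = sorted(self._postings.get(self.normalize(tag), []), reverse=True)
        return [post_id for _, post_id in hits[:limit]]
```

Usage: after `index.add_post("p1", 10, ["#Sunset", "beach"])`, a call to `index.search("sunset")` finds `p1` regardless of case or the `#` prefix.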
Concept blurbs
Object Storage (Blob)
Cheap durable storage for large immutable blobs (S3, GCS).
CDN
Edge-cached static (and sometimes dynamic) content close to users.
Cache-Aside (Lazy Loading)
App reads cache first; on miss, loads from DB and populates cache.
Wide-Column Store
Sparse rows over many columns; time-series friendly (Cassandra, HBase, Bigtable).
Search Index
Inverted index for full-text search and faceting (Elasticsearch, OpenSearch).
Sharding
Partition data across DB instances by key (hash, range, or geography).
Pub/Sub
Fan-out events to many subscribers; topic-based (Kafka, SNS, Redis pub/sub).
Load Balancer
Distribute requests across healthy backends (L4 or L7).
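As an illustration of the hash option in the Sharding entry, the sketch below maps a key such as `follower_id` to a shard number. `NUM_SHARDS` and `shard_for` are hypothetical; a cryptographic digest is used only because Python's built-in `hash()` for strings is randomized per process and therefore unstable across machines.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index in [0, num_shards) using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo hashing remaps most keys when the shard count changes; consistent hashing is a common alternative when shards are added or removed often.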