Design Instagram

Mid

Photo upload, feed generation, follow graph, search by hashtag; mostly read-heavy.

Required building blocks

Object Storage (Blob)

CDN

Cache-Aside (Lazy Loading)

Wide-Column Store

Search Index

Sharding

Nice to have

Pub/Sub

Load Balancer

Canonical answer

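Store images in object storage and serve them through a content delivery network (CDN). Build feeds with fan-out: push on write for typical accounts, or a hybrid that pulls on read for accounts with very large follower counts. Serve hashtag search from an inverted index.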
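
A minimal sketch of the hybrid fan-out choice, assuming in-memory dicts in place of the follow graph and feed cache. The names and the CELEBRITY_THRESHOLD value are hypothetical and chosen only to keep the example small; a real cutoff would be tuned from follower-count distributions.

```python
from collections import defaultdict

# Hypothetical cutoff: authors with more followers than this are not pushed.
CELEBRITY_THRESHOLD = 3

followers = defaultdict(set)          # followee_id -> follower_ids
feeds = defaultdict(list)             # user_id -> pushed post_ids
celebrity_posts = defaultdict(list)   # author_id -> post_ids pulled at read time


def publish(author_id, post_id):
    """Fan out on write, unless the author has too many followers."""
    if len(followers[author_id]) > CELEBRITY_THRESHOLD:
        celebrity_posts[author_id].append(post_id)  # readers merge this later
        return "pull"
    for follower_id in followers[author_id]:
        feeds[follower_id].append(post_id)
    return "push"


def read_feed(user_id, following):
    """Merge the pushed feed with posts pulled from followed celebrities."""
    pulled = [p for a in following for p in celebrity_posts.get(a, [])]
    # Post IDs are assumed to be time-sortable (e.g. snowflake), newest = largest.
    return sorted(set(feeds[user_id]) | set(pulled), reverse=True)
```

The push path keeps reads cheap for most users; the pull path avoids writing one post into millions of feed lists.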

Capacity estimation

2B users, 500M DAU; ~100M photo uploads/day → ~1,200 writes/sec, ~3K at peak.
Avg photo 500 KB + ~250 KB of thumbnails ≈ 750 KB stored per upload → ~75 TB/day to object storage, ~27 PB/year.
Feed reads: 500M DAU × 50 feed loads/day → 25B reads/day ≈ 290K reads/sec, ~700K at peak.
Feed cache: 500M users × 500 cached post IDs × 16 B ≈ 4 TB in a Redis cluster.
Follow graph: 500M × avg 150 follows × 16 B ≈ 1.2 TB for active users (≈ 4.8 TB across all 2B accounts at the same average); sharded by follower_id.
CDN serves ~98% of image bytes; origin egress ~1.5 PB/day (the ~2% of misses, which implies ~75 PB/day of total image egress).
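
The arithmetic behind these figures can be checked with a short script. This is a back-of-envelope sketch: the 750 KB per upload is an assumption inferred from the 75 TB/day figure, and decimal units (1 TB = 10^12 bytes) are used throughout.

```python
SECONDS_PER_DAY = 86_400

uploads_per_day = 100_000_000
dau = 500_000_000

# Upload rate: roughly 1,200 writes/sec on average.
writes_per_sec = uploads_per_day / SECONDS_PER_DAY

# Assumption: ~750 KB stored per upload (500 KB photo + ~250 KB thumbnails).
bytes_per_upload = 750_000
storage_per_day_tb = uploads_per_day * bytes_per_upload / 1e12
storage_per_year_pb = storage_per_day_tb * 365 / 1000

# Feed reads: 50 feed loads per daily active user.
feed_reads_per_sec = dau * 50 / SECONDS_PER_DAY

# Feed cache and follow graph, at 16 bytes per entry.
feed_cache_tb = dau * 500 * 16 / 1e12
follow_graph_tb = dau * 150 * 16 / 1e12

# Origin egress is the ~2% of image bytes the CDN does not serve.
total_image_egress_pb = 1.5 / 0.02

for name, value in [
    ("writes/sec", writes_per_sec),
    ("storage TB/day", storage_per_day_tb),
    ("storage PB/year", storage_per_year_pb),
    ("feed reads/sec", feed_reads_per_sec),
    ("feed cache TB", feed_cache_tb),
    ("follow graph TB", follow_graph_tb),
    ("total image egress PB/day", total_image_egress_pb),
]:
    print(f"{name}: {value:,.1f}")
```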

Architecture

Mobile ─→ CDN (images) ─→ Object Storage (S3)
                              ▲
                              │ writes
                       Upload Service
                              │
                              ▼
                     Image Pipeline (resize/exif strip)
                              │
                              ▼
                          Kafka
                ┌─────────────┼─────────────┐
                ▼             ▼             ▼
         Feed Fan-out    Indexer        Notification
                │      (Elasticsearch)   Service
                ▼
        Redis Feed Cache ◀── Feed API ◀── API Gateway ◀── Mobile
                │
                ▼
        Cassandra (posts, by user_id+ts)

API

POST /media (multipart) → { media_id, cdn_url, thumb_url }
POST /posts { media_id, caption, hashtags[] } → { post_id, created_at }
GET /feed?cursor=… → { posts[], next_cursor }
POST /follow { user_id } → 204
GET /search/hashtag/:tag → { posts[], next_cursor }
POST /posts/:id/like → { like_count }
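
The cursor in GET /feed and GET /search/hashtag/:tag is left unspecified above. One possible encoding, sketched here as an assumption, is an opaque token wrapping the last post ID the client saw. The helper names and the "before" field are hypothetical.

```python
import base64
import json


def encode_cursor(last_post_id):
    """Pack the last-seen post ID into an opaque, URL-safe token."""
    raw = json.dumps({"before": last_post_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor):
    """Recover the last-seen post ID from a token made by encode_cursor."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["before"]


def get_feed(post_ids, cursor=None, limit=2):
    """Return one page of a feed.

    post_ids is the user's feed, newest first, with time-sortable IDs.
    """
    if cursor is not None:
        before = decode_cursor(cursor)
        post_ids = [p for p in post_ids if p < before]
    page = post_ids[:limit]
    has_more = len(post_ids) > limit
    next_cursor = encode_cursor(page[-1]) if has_more else None
    return {"posts": page, "next_cursor": next_cursor}
```

Keying the cursor on a post ID rather than an offset keeps pages stable when new posts are pushed to the head of the feed between requests.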

Data model

posts (wide-column, partition = user_id, cluster = ts DESC; primary key = (user_id, ts, post_id)):
  post_id         : snowflake (unique per post)
  user_id         : uuid (partition key)
  media_ids       : list<uuid>
  caption         : string
  hashtags        : list<string>
  created_at      : timestamp (ts, clustering key)
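
post_id is typed as snowflake, meaning a 64-bit ID that sorts by creation time. A minimal generator sketch follows, assuming the common 41-bit timestamp / 10-bit worker / 12-bit sequence layout. The source does not specify a layout, and EPOCH_MS is a hypothetical custom epoch.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch, in milliseconds


class SnowflakeGenerator:
    """Generates 64-bit IDs: 41 bits of time, 10 bits worker, 12 bits sequence."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.clock = clock
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence


def timestamp_ms(snowflake_id):
    """Extract the creation time (Unix ms) embedded in an ID."""
    return (snowflake_id >> 22) + EPOCH_MS
```

Because the timestamp occupies the high bits, sorting by post_id approximates sorting by creation time, which is what lets a feed list store bare IDs.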

feeds (Redis, key = "feed:{user_id}"):
  list of post_id : capped at ~500 entries
  ttl             : 7d (rebuild on miss)
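
The capped feed list can be illustrated with an in-memory stand-in. This sketch uses a bounded deque where a Redis deployment would typically pair LPUSH with LTRIM. The FeedStore class is hypothetical, and the 7d TTL is omitted for brevity.

```python
from collections import deque

FEED_CAP = 500  # cap taken from the data model


class FeedStore:
    """In-memory stand-in for Redis lists keyed by "feed:{user_id}"."""

    def __init__(self, cap=FEED_CAP):
        self.cap = cap
        self.lists = {}

    def push(self, user_id, post_id):
        # A deque with maxlen drops the oldest entry from the tail when full,
        # which mirrors pushing to the head and trimming to the cap.
        key = f"feed:{user_id}"
        self.lists.setdefault(key, deque(maxlen=self.cap)).appendleft(post_id)

    def page(self, user_id, start=0, count=20):
        """Return up to count post IDs, newest first."""
        items = self.lists.get(f"feed:{user_id}", ())
        return list(items)[start:start + count]
```

An empty page for a user who follows active accounts signals a miss (expired or evicted key), at which point the feed is rebuilt from the posts table.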

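follows (wide-column, partition = follower_id):
  follower_id     : uuid
  followee_id     : uuid
  created_at      : timestamp
  (fan-out on write needs the reverse lookup, followers of a followee, so a second copy partitioned by followee_id is typically kept alongside)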

hashtag_index (Elasticsearch):
  hashtag, post_id, ts, user_id, like_count
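
To show what the hashtag index does conceptually, here is a toy inverted index mapping each normalized hashtag to a posting list. It is a sketch of the idea only, not of how Elasticsearch stores data; the class and method names are hypothetical.

```python
from collections import defaultdict


def normalize(tag):
    """Lowercase a hashtag and drop any leading '#'."""
    return tag.lower().lstrip("#")


class HashtagIndex:
    def __init__(self):
        self.postings = defaultdict(list)  # hashtag -> [(ts, post_id), ...]

    def add(self, post_id, ts, hashtags):
        # Deduplicate after normalizing so "#Beach" and "beach" index once.
        for tag in {normalize(t) for t in hashtags}:
            self.postings[tag].append((ts, post_id))

    def search(self, tag, limit=10):
        """Return post IDs for a hashtag, newest first."""
        hits = self.postings.get(normalize(tag), [])
        return [post_id for _, post_id in sorted(hits, reverse=True)[:limit]]
```

For example, after indexing posts 1 (ts 100) and 2 (ts 200) under "sunset", search("sunset") returns [2, 1].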

Concept blurbs

Object Storage (Blob)

Cheap durable storage for large immutable blobs (S3, GCS).

CDN

Network of edge servers that cache static (and sometimes dynamic) content close to users.

Cache-Aside (Lazy Loading)

App reads cache first; on miss, loads from DB and populates cache.
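
A minimal sketch of the pattern, assuming a plain dict as the cache and a caller-supplied loader function standing in for the database read. The CacheAside class and its miss counter are hypothetical.

```python
class CacheAside:
    """Read-through-on-miss wrapper: the app, not the cache, loads from the DB."""

    def __init__(self, load_from_db):
        self.cache = {}
        self.load_from_db = load_from_db
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]       # hit: no DB access
        self.misses += 1
        value = self.load_from_db(key)   # miss: load from the source of truth
        self.cache[key] = value          # populate for later reads
        return value

    def invalidate(self, key):
        """Drop a key after the underlying data changes."""
        self.cache.pop(key, None)
```

In this design the same shape covers the feed cache: a missing "feed:{user_id}" key triggers a rebuild from the posts table.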

Wide-Column Store

Sparse rows over many columns; time-series friendly (Cassandra, HBase, Bigtable).

Search Index

Inverted index for full-text search and faceting (Elasticsearch, OpenSearch).

Sharding

Partition data across DB instances by key (hash, range, or geography).
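
A minimal sketch of hash-based sharding, assuming a fixed shard count and SHA-256 as a stable hash. The function name shard_for is hypothetical.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key (e.g. follower_id) to a shard index in [0, num_shards)."""
    # Python's built-in hash() is randomized per process for strings,
    # so a cryptographic digest is used to get a stable mapping.
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo hashing remaps most keys when num_shards changes; consistent hashing is a common alternative when shards are added or removed often.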

Pub/Sub

Fan-out events to many subscribers; topic-based (Kafka, SNS, Redis pub/sub).

Load Balancer

Distribute requests across healthy backends (L4 or L7).