Design YouTube / Video Streaming
Design video upload, transcoding, and global streaming for 1B+ daily views.
Required building blocks
Object Storage (Blob)
CDN
Message Queue
Relational DB (SQL)
Load Balancer
Nice to have
Search Index
Wide-Column Store
Canonical answer
Upload → object storage → queue → transcoder workers → multiple bitrates → CDN. Metadata in SQL; view counts in wide-column. Adaptive bitrate (HLS/DASH).
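As a rough illustration of the adaptive-bitrate step, the sketch below picks the highest rendition whose bitrate fits within a safety fraction of the measured throughput. The ladder values, the 0.8 safety factor, and the name `pick_rendition` are illustrative assumptions, not taken from any particular player.

```python
from typing import NamedTuple


class Rendition(NamedTuple):
    name: str
    bitrate_kbps: int


# Hypothetical bitrate ladder; real ladders are tuned per codec and content.
LADDER = [
    Rendition("240p", 400),
    Rendition("360p", 800),
    Rendition("480p", 1400),
    Rendition("720p", 2800),
    Rendition("1080p", 5000),
    Rendition("2160p", 16000),
]


def pick_rendition(throughput_kbps: float, safety: float = 0.8) -> Rendition:
    """Return the highest rendition that fits within safety * throughput.

    Falls back to the lowest rendition when nothing fits.
    """
    budget = throughput_kbps * safety
    fitting = [r for r in LADDER if r.bitrate_kbps <= budget]
    if not fitting:
        return LADDER[0]
    return max(fitting, key=lambda r: r.bitrate_kbps)
```

A client would typically re-run a selection like this per segment, which is what lets HLS/DASH playback step down on a congested link instead of stalling.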
Capacity estimation
- 500K uploads/day, avg 100 MB → 50 TB/day raw ingest, ~18 PB/year before transcoding.
- Transcoding into ~6 renditions (240p–4K) → ~5x storage amplification ≈ 250 TB/day post-encode.
- 1B views/day × avg 10 MB streamed → 10 PB/day egress; CDN absorbs >95%, origin sees <500 TB/day.
- Peak streaming: 1B / 86400 × 3x peak ≈ 35K stream starts/sec; ~5M concurrent viewers on average (implies roughly 7 min average watch time: ~11.6K starts/sec × ~430 s).
- Metadata: ~1B videos lifetime × 2 KB ≈ 2 TB in SQL (sharded by video_id).
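- Transcoder fleet: 500K videos/day × ~5 min average length (assumed) ≈ 42K video-hours/day; at 0.5x realtime on 8 cores, each video-hour costs 16 core-hours → ~670K core-hours/day ≈ 28K cores, so ~30K vCPU sustained, autoscaled on queue depth.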
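The estimates above can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch using decimal units (1 TB = 10^12 bytes); the ~5 minute average video length is an assumption introduced here so the transcoder estimate can be computed, and all variable names are illustrative.

```python
TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 86_400

# Ingest and storage
uploads_per_day = 500_000
avg_upload_bytes = 100 * 10**6                           # 100 MB
raw_ingest_per_day = uploads_per_day * avg_upload_bytes  # 50 TB
raw_ingest_per_year = raw_ingest_per_day * 365           # ~18 PB
post_encode_per_day = raw_ingest_per_day * 5             # 250 TB at ~5x amplification

# Egress
views_per_day = 10**9
avg_streamed_bytes = 10 * 10**6                          # 10 MB
egress_per_day = views_per_day * avg_streamed_bytes      # 10 PB
origin_per_day = egress_per_day * 5 // 100               # 500 TB if the CDN absorbs 95%

# Peak stream starts, using a 3x peak-to-average factor
peak_starts_per_sec = views_per_day / SECONDS_PER_DAY * 3

# Metadata
metadata_bytes = 10**9 * 2_000                           # 1B videos x 2 KB = 2 TB

# Transcoder fleet (ASSUMPTION: average video is ~5 minutes long)
assumed_avg_minutes = 5
video_hours_per_day = uploads_per_day * assumed_avg_minutes / 60
core_hours_per_video_hour = 8 / 0.5                      # 8 cores running at 0.5x realtime
sustained_cores = video_hours_per_day * core_hours_per_video_hour / 24

print(f"raw ingest:  {raw_ingest_per_day / TB:.0f} TB/day, "
      f"{raw_ingest_per_year / PB:.2f} PB/year")
print(f"post-encode: {post_encode_per_day / TB:.0f} TB/day")
print(f"egress:      {egress_per_day / PB:.0f} PB/day, "
      f"origin {origin_per_day / TB:.0f} TB/day")
print(f"peak starts: {peak_starts_per_sec:,.0f} per second")
print(f"metadata:    {metadata_bytes / TB:.0f} TB")
print(f"transcoding: {sustained_cores:,.0f} cores sustained")
```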
Architecture
Uploader ─→ Resumable Upload Svc ─→ S3 (raw)
                                       │
                                       ▼
                                 Kafka (jobs)
                                       │
                                       ▼
                           Transcoder Workers (k8s)
                                       │
                                       ▼
                           S3 (HLS/DASH renditions)
                                       │
                                       ▼
                               CDN (global PoPs)
                                       ▲
Viewer ─→ DNS ─→ API Gateway ──────────┘
                      │
                      ▼
           Metadata SQL (sharded)
           View Counts (Cassandra)
           Search (Elasticsearch)
API
- POST /uploads/init { filename, size, mime } → { upload_id, chunk_urls[] }
- PUT /uploads/:id/chunks/:n (resumable) → 200
- POST /uploads/:id/complete → { video_id, status: processing }
- GET /videos/:id → { manifest_url, title, channel, stats }
- GET /watch/:id/manifest.m3u8 → HLS playlist (signed, CDN-served)
- POST /videos/:id/view → 202 (async to counter pipeline)
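The resumable upload flow (init, chunk PUTs, complete) can be sketched as an in-memory session tracker. This is a minimal illustration under stated assumptions: the 8 MiB chunk size, the `UploadSession` class, the `SESSIONS` dict, and the `incomplete` response are all hypothetical, and a real service would hand out presigned object-storage URLs and persist session state.

```python
import math
import uuid
from dataclasses import dataclass, field

CHUNK_SIZE = 8 * 1024 * 1024  # assumed chunk size; the API above does not specify one


@dataclass
class UploadSession:
    upload_id: str
    size: int
    num_chunks: int
    received: set = field(default_factory=set)

    def missing_chunks(self) -> list:
        return [n for n in range(self.num_chunks) if n not in self.received]


SESSIONS = {}  # hypothetical stand-in for a persistent session store


def init_upload(filename: str, size: int, mime: str) -> dict:
    """POST /uploads/init: create a session and list one URL per chunk."""
    upload_id = uuid.uuid4().hex
    num_chunks = max(1, math.ceil(size / CHUNK_SIZE))
    SESSIONS[upload_id] = UploadSession(upload_id, size, num_chunks)
    chunk_urls = [f"/uploads/{upload_id}/chunks/{n}" for n in range(num_chunks)]
    return {"upload_id": upload_id, "chunk_urls": chunk_urls}


def put_chunk(upload_id: str, n: int) -> int:
    """PUT /uploads/:id/chunks/:n: record a chunk; re-sending one is harmless."""
    session = SESSIONS[upload_id]
    if not 0 <= n < session.num_chunks:
        return 400
    session.received.add(n)  # set membership makes client retries idempotent
    return 200


def complete_upload(upload_id: str) -> dict:
    """POST /uploads/:id/complete: succeed only when every chunk has arrived."""
    missing = SESSIONS[upload_id].missing_chunks()
    if missing:
        # Assumed error shape; tells the client which chunks to resend.
        return {"status": "incomplete", "missing": missing}
    return {"video_id": uuid.uuid4().hex, "status": "processing"}
```

Tracking received chunk numbers per session is what makes the upload resumable: after a dropped connection the client only resends the chunks still listed as missing.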
Data model
videos (SQL, sharded by video_id):
video_id (PK) : uuid
uploader_id : uuid
title, desc : string
duration_s : int
status : enum(uploading|processing|ready|failed)
created_at : timestamp
renditions (SQL):
video_id (FK) : uuid
resolution : enum(240p|360p|...|2160p)
codec : string
s3_key : string
bitrate_kbps : int
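To show how rendition rows might feed playback, the sketch below builds an HLS master playlist from them. It is a simplified illustration: the `RenditionRow` class, the `DIMENSIONS` table (illustrative 16:9 sizes), and the `https://cdn.example.com` base URL are assumptions, and the optional `CODECS` attribute is omitted because the format of the `codec` column is not specified.

```python
from dataclasses import dataclass


@dataclass
class RenditionRow:
    video_id: str
    resolution: str
    codec: str
    s3_key: str
    bitrate_kbps: int


# Illustrative 16:9 frame sizes keyed by resolution label (assumed values).
DIMENSIONS = {
    "240p": (426, 240),
    "360p": (640, 360),
    "480p": (854, 480),
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "2160p": (3840, 2160),
}


def build_master_playlist(rows: list, base_url: str) -> str:
    """Emit one EXT-X-STREAM-INF entry per rendition, lowest bitrate first."""
    lines = ["#EXTM3U"]
    for row in sorted(rows, key=lambda r: r.bitrate_kbps):
        width, height = DIMENSIONS[row.resolution]
        bandwidth_bps = row.bitrate_kbps * 1000  # BANDWIDTH is in bits per second
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth_bps},RESOLUTION={width}x{height}"
        )
        lines.append(f"{base_url}/{row.s3_key}")
    return "\n".join(lines) + "\n"


rows = [
    RenditionRow("v1", "720p", "h264", "v1/720p/index.m3u8", 2800),
    RenditionRow("v1", "240p", "h264", "v1/240p/index.m3u8", 400),
]
print(build_master_playlist(rows, "https://cdn.example.com"))
```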
view_counts (wide-column, partition = video_id, cluster = bucket_hour):
video_id : uuid
bucket_hour : timestamp
views : counter
Concept blurbs
Object Storage (Blob)
Cheap durable storage for large immutable blobs (S3, GCS).
CDN
Edge-cached static (and sometimes dynamic) content close to users.
Message Queue
Decouple producers/consumers; buffer bursts; enable retries (Kafka, SQS, RabbitMQ).
Relational DB (SQL)
ACID, joins, secondary indexes; vertical scale + read replicas.
Load Balancer
Distribute requests across healthy backends (L4 or L7).
Search Index
Inverted index for full-text search and faceting (Elasticsearch, OpenSearch).
Wide-Column Store
Sparse rows over many columns; time-series friendly (Cassandra, HBase, Bigtable).
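The view_counts layout in the data model (partition = video_id, cluster = bucket_hour) can be mimicked in memory to show why it suits time-series counters. This is only a sketch: the `ViewCounts` class and its method names are hypothetical, and a real wide-column store would apply the increments server-side rather than in a Python dict.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_hour(ts: datetime) -> datetime:
    """Truncate a timezone-aware timestamp to the start of its UTC hour."""
    return ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)


class ViewCounts:
    """In-memory stand-in: one partition per video, one counter per hour bucket."""

    def __init__(self) -> None:
        self._rows = defaultdict(lambda: defaultdict(int))

    def record_view(self, video_id: str, ts: datetime) -> None:
        # Every view in the same hour increments the same counter cell.
        self._rows[video_id][bucket_hour(ts)] += 1

    def views_between(self, video_id: str, start: datetime, end: datetime) -> int:
        # A range read touches a single partition and a contiguous bucket range.
        partition = self._rows.get(video_id, {})
        return sum(count for bucket, count in partition.items() if start <= bucket < end)


counts = ViewCounts()
for minute_of_day in (615, 659, 661):  # 10:15, 10:59, 11:01 UTC
    hour, minute = divmod(minute_of_day, 60)
    counts.record_view("v1", datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc))
```

Bucketing by hour bounds the number of cells per video and turns "views in the last 24 hours" into a read of 24 adjacent cells within one partition.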