Design Uber / Ride Sharing
Match riders to nearby drivers in <2 seconds, track real-time locations, and handle surge pricing.
Required building blocks
Geohash
WebSockets
Load Balancer
Key-Value Store
Pub/Sub
Nice to have
Quadtree / R-Tree
Message Queue
Canonical answer
Geohash/quadtree index for nearby-driver lookup; drivers push locations over WebSockets; pub/sub broadcasts match events. A KV cache maps driver_id → last_location.
Capacity estimation
- 10M active drivers; each pings location every 4s → 2.5M writes/sec into geo store.
- 1M concurrent riders; ~100K ride requests/min → ~1,700 match ops/sec, ~5K at peak.
- Geohash precision-6 cells (~1.2 km × 0.6 km at the equator) → ~3M populated cells globally; index fits in 50 GB Redis.
- Driver location KV: 10M × ~200 B ≈ 2 GB — kept hot in memory, snapshotted to disk hourly.
- WebSocket fan-out: 10M driver sockets + 1M rider sockets across ~20K gateway nodes.
- Match latency target <2s → keep matching service co-located with geo index in same AZ.
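The estimates above can be re-derived with a few lines of arithmetic. All inputs come from the bullets except the 3× peak factor, which is an assumption chosen to reproduce the ~5K/sec peak figure.

```python
# Inputs taken from the capacity bullets above.
DRIVERS = 10_000_000
PING_INTERVAL_S = 4
RIDE_REQUESTS_PER_MIN = 100_000
BYTES_PER_DRIVER_RECORD = 200
RIDER_SOCKETS = 1_000_000
GATEWAY_NODES = 20_000
PEAK_FACTOR = 3  # assumption: peak ≈ 3× the average request rate

location_writes_per_s = DRIVERS / PING_INTERVAL_S
match_ops_per_s = RIDE_REQUESTS_PER_MIN / 60
peak_match_ops_per_s = match_ops_per_s * PEAK_FACTOR
driver_kv_gb = DRIVERS * BYTES_PER_DRIVER_RECORD / 1e9
sockets_per_gateway = (DRIVERS + RIDER_SOCKETS) / GATEWAY_NODES

print(f"location writes/s : {location_writes_per_s:,.0f}")  # 2,500,000
print(f"match ops/s (avg) : {match_ops_per_s:,.0f}")        # 1,667
print(f"match ops/s (peak): {peak_match_ops_per_s:,.0f}")   # 5,000
print(f"driver KV size GB : {driver_kv_gb:.1f}")            # 2.0
print(f"sockets / gateway : {sockets_per_gateway:,.0f}")    # 550
```

The last value makes the implied per-gateway load explicit (~550 sockets per node), which is worth sanity-checking against the connection capacity of whatever gateway technology is actually used.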
Architecture
Rider App ─WSS─┐            ┌─WSS─ Driver App
               │            │
               ▼            ▼
             WS Gateway Cluster
                     │
                     ▼
              Dispatch Service
               │            │
               ▼            ▼
           Geo Index   Pricing Svc (surge)
           (Redis +         │
           Quadtree)        ▼
               │   Demand Heatmap (Kafka)
               ▼
          Match Engine ─→ Kafka (ride_events)
               │            │
               ▼            ▼
           Trip Store  Analytics / Billing
           (Cassandra)
API
- WS /driver/connect — driver pushes { lat, lng, ts } every 4s
- POST /rides/request { rider_id, pickup, dropoff } → { ride_id, eta, fare_estimate }
- POST /rides/:id/accept (driver) → { rider_contact, route }
- GET /rides/:id → { status, driver_loc, eta }
- POST /rides/:id/complete → { final_fare, receipt_url }
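A sketch of what `POST /rides/request` might compute before returning `{ ride_id, eta, fare_estimate }`. The tariff constants (`BASE_FARE`, `PER_KM`), the 30 km/h average speed used for the ETA, straight-line trip distance, and the `request_ride` signature are all hypothetical; the surge multiplier is passed in rather than looked up. Money is kept as `Decimal`, matching the `fare : decimal` column.

```python
import math
import uuid
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical tariff and speed constants, for illustration only.
BASE_FARE = Decimal("2.50")
PER_KM = Decimal("1.20")
AVG_SPEED_KMH = 30.0


@dataclass
class RideRequest:
    rider_id: str
    pickup: tuple[float, float]   # (lat, lng)
    dropoff: tuple[float, float]  # (lat, lng)


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lng) points."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def request_ride(req: RideRequest, nearest_driver_km: float,
                 surge_multiplier: float = 1.0) -> dict:
    """Hypothetical handler body for POST /rides/request."""
    trip_km = haversine_km(req.pickup, req.dropoff)
    eta_s = round(nearest_driver_km / AVG_SPEED_KMH * 3600)
    fare = (BASE_FARE + PER_KM * Decimal(str(trip_km))) * Decimal(str(surge_multiplier))
    return {
        "ride_id": str(uuid.uuid4()),
        "eta": eta_s,  # seconds until the driver reaches the pickup
        "fare_estimate": fare.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP),
    }
```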
Data model
driver_locations (Redis GEO / KV):
driver_id (PK) : uuid
geohash : string (precision 6)
lat, lng : float
status : enum(available|on_trip|offline)
last_ping : timestamp
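A sketch of the `driver_locations` record as a Python dataclass. The freshness rule (a driver who misses three consecutive 4-second pings is not offered to riders) is an assumed policy, not part of the design above; it illustrates why `last_ping` is stored alongside `status`.

```python
from dataclasses import dataclass
from enum import Enum


class DriverStatus(Enum):
    AVAILABLE = "available"
    ON_TRIP = "on_trip"
    OFFLINE = "offline"


PING_INTERVAL_S = 4
# Assumed policy: three missed pings (12 s) means the location can't be trusted.
STALE_AFTER_S = 3 * PING_INTERVAL_S


@dataclass
class DriverLocation:
    driver_id: str
    geohash: str          # precision 6
    lat: float
    lng: float
    status: DriverStatus
    last_ping: float      # Unix timestamp, seconds

    def is_matchable(self, now: float) -> bool:
        """True if the driver is available and pinged within STALE_AFTER_S."""
        return (self.status is DriverStatus.AVAILABLE
                and now - self.last_ping <= STALE_AFTER_S)
```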
rides (wide-column, partition = ride_id):
ride_id (PK) : uuid
rider_id : uuid
driver_id : uuid?
pickup, dropoff : geo_point
status : enum(requested|matched|in_progress|done|canceled)
created_at : timestamp
fare : decimal
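The `status` column implies a small state machine. The states come from the schema, but the allowed transitions do not, so the table below is an assumption (for example, that a ride can be canceled before it starts but not after).

```python
from enum import Enum


class RideStatus(Enum):
    REQUESTED = "requested"
    MATCHED = "matched"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    CANCELED = "canceled"


# Assumed transition table; DONE and CANCELED are terminal.
ALLOWED = {
    RideStatus.REQUESTED: {RideStatus.MATCHED, RideStatus.CANCELED},
    RideStatus.MATCHED: {RideStatus.IN_PROGRESS, RideStatus.CANCELED},
    RideStatus.IN_PROGRESS: {RideStatus.DONE},
    RideStatus.DONE: set(),
    RideStatus.CANCELED: set(),
}


def can_transition(current: RideStatus, new: RideStatus) -> bool:
    return new in ALLOWED[current]


def transition(current: RideStatus, new: RideStatus) -> RideStatus:
    """Return the new status, or raise if the move is not allowed."""
    if not can_transition(current, new):
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```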
surge_zones (KV, partition = geohash5):
geohash5 (PK) : string
multiplier : float
expires_at : timestamp
Concept blurbs
Geohash
Encode lat/lng as a base-32 string; points that share a prefix fall in the same cell, so the string works for proximity bucketing and indexing.
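A minimal geohash encoder following the standard algorithm: alternately halve the longitude and latitude ranges, emit one bit per halving (longitude first), and map each group of 5 bits to the geohash base-32 alphabet. It also shows why the `surge_zones` key (`geohash5`) can be derived from a driver's precision-6 `geohash` by truncation.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars: list[str] = []
    bits, bit_count = 0, 0
    use_lng = True  # bits alternate: longitude, latitude, longitude, ...
    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits, lng_lo = (bits << 1) | 1, mid
            else:
                bits, lng_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)
```

Shared prefixes imply proximity, but the converse does not hold: two points on either side of a cell boundary can have entirely different prefixes, which is why nearby-driver lookups usually query the target cell plus its eight neighbours.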
WebSockets
Persistent bidirectional connection for low-latency push to clients.
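A WebSocket connection begins as an HTTP/1.1 `Upgrade` request; the server accepts by replying `101 Switching Protocols` with a `Sec-WebSocket-Accept` header derived from the client's `Sec-WebSocket-Key`, as specified in RFC 6455. After that, the same TCP connection carries frames in both directions, which lets the gateway receive driver pings and push match updates without a new request each time. The function below computes only that header value; it is not a server.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Value the server returns in Sec-WebSocket-Accept for a given client key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Example key/response pair from RFC 6455.
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```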
Load Balancer
Distribute requests across healthy backends (L4 or L7).
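A minimal round-robin picker that skips backends marked unhealthy. How health is detected (active probes, passive error counts) is left out; some external checker is assumed to call `mark_down`/`mark_up`. The class and method names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Cycles through backends in order, skipping ones marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try at most one full lap; if nothing healthy was seen, give up.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")
```

For the WebSocket gateways, the balancer only chooses a node when a socket is opened; every later message on that connection stays on the same node.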
Key-Value Store
O(1) get/put by key; massively scalable (DynamoDB, Redis, Cassandra).
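Many key-value stores support per-key expiry (Redis, for instance, has key TTLs), which fits `surge_zones.expires_at`: an expired multiplier should simply stop applying. This in-memory sketch, a hypothetical `TTLStore` class, uses lazy expiry, removing a key only when it is read after its deadline; the injectable clock exists to make it testable.

```python
import time
from typing import Any, Callable


class TTLStore:
    """Dict-backed KV store with per-key expiry, checked lazily on read."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: dict[str, tuple[Any, float]] = {}
        self._clock = clock

    def put(self, key: str, value: Any, ttl_s: float) -> None:
        self._data[key] = (value, self._clock() + ttl_s)

    def get(self, key: str, default: Any = None) -> Any:
        item = self._data.get(key)
        if item is None:
            return default
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it on this read
            return default
        return value
```

A surge lookup can then call `get(zone, 1.0)` so that a missing or expired zone falls back to no surge.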
Pub/Sub
Fan-out events to many subscribers; topic-based (Kafka, SNS, Redis pub/sub).
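An in-process sketch of topic-based fan-out: every handler subscribed to a topic receives each published event. Real brokers such as Kafka add durability, partitioning, and independent consumer offsets, none of which is modeled here; the `PubSub` class is illustrative, and `ride_events` is borrowed from the architecture diagram.

```python
from collections import defaultdict
from typing import Any, Callable


class PubSub:
    """In-memory, synchronous topic fan-out (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> int:
        """Deliver event to every subscriber of topic; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)
```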
Quadtree / R-Tree
Spatial index for range queries (find all points in a region).
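A minimal point quadtree with rectangular range queries, using x = longitude and y = latitude. Node capacity (4) and maximum depth (16) are arbitrary assumptions; the depth cap stops endless splitting when many drivers report identical coordinates. Bounds are half-open, so a point exactly on the world's east or north edge is rejected.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Half-open rectangle [x0, x1) × [y0, y1); x = longitude, y = latitude."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

    def intersects(self, other: Rect) -> bool:
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


class QuadTree:
    CAPACITY = 4    # points per leaf before splitting (assumed)
    MAX_DEPTH = 16  # never split below this depth (assumed)

    def __init__(self, bounds: Rect, depth: int = 0) -> None:
        self.bounds = bounds
        self.depth = depth
        self.points: list[tuple[float, float, object]] = []
        self.children: list[QuadTree] | None = None

    def insert(self, x: float, y: float, item: object) -> bool:
        if not self.bounds.contains(x, y):
            return False
        if self.children is None:
            if len(self.points) < self.CAPACITY or self.depth >= self.MAX_DEPTH:
                self.points.append((x, y, item))
                return True
            self._split()
        return self._insert_into_children(x, y, item)

    def _insert_into_children(self, x: float, y: float, item: object) -> bool:
        for child in self.children:
            if child.insert(x, y, item):
                return True
        return False

    def _split(self) -> None:
        b, d = self.bounds, self.depth + 1
        mx, my = (b.x0 + b.x1) / 2, (b.y0 + b.y1) / 2
        self.children = [
            QuadTree(Rect(b.x0, b.y0, mx, my), d), QuadTree(Rect(mx, b.y0, b.x1, my), d),
            QuadTree(Rect(b.x0, my, mx, b.y1), d), QuadTree(Rect(mx, my, b.x1, b.y1), d),
        ]
        for x, y, item in self.points:  # push existing points down into quadrants
            self._insert_into_children(x, y, item)
        self.points = []

    def query(self, area: Rect, out: list | None = None) -> list:
        """Collect items whose point lies inside area, pruning disjoint subtrees."""
        if out is None:
            out = []
        if not self.bounds.intersects(area):
            return out
        out.extend(item for x, y, item in self.points if area.contains(x, y))
        for child in self.children or []:
            child.query(area, out)
        return out
```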
Message Queue
Decouple producers/consumers; buffer bursts; enable retries (SQS/RabbitMQ).
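A sketch of the retry behaviour a message queue provides: failed messages are re-enqueued until an attempt limit, then moved to a dead-letter list for inspection. The `RetryQueue` class and the 3-attempt limit are assumptions, and everything runs synchronously in one process, unlike SQS or RabbitMQ.

```python
from collections import deque
from typing import Any, Callable


class RetryQueue:
    """In-memory queue with bounded retries and a dead-letter list (illustrative)."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        self._queue: deque = deque()  # (message, failed_attempts) pairs
        self.dead_letters: list = []

    def enqueue(self, message: Any) -> None:
        self._queue.append((message, 0))

    def drain(self, handler: Callable[[Any], None]) -> int:
        """Process until empty; return the number of messages handled successfully."""
        handled = 0
        while self._queue:
            message, failures = self._queue.popleft()
            try:
                handler(message)
                handled += 1
            except Exception:
                failures += 1
                if failures >= self.max_attempts:
                    self.dead_letters.append(message)  # give up on this message
                else:
                    self._queue.append((message, failures))  # retry after the rest
        return handled
```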