Design Uber / Ride Sharing

Match riders to nearby drivers in <2 seconds, track real-time locations, and handle surge pricing.

Required building blocks
  • Geohash
  • WebSockets
  • Load Balancer
  • Key-Value Store
  • Pub/Sub
Nice to have
  • Quadtree / R-Tree
  • Message Queue
Canonical answer

Geohash or quadtree index for nearby-driver lookup; drivers push location updates over WebSocket (WS); pub/sub broadcasts the match to rider and driver; a KV cache maps driver_id → last_location.
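
The nearby-driver lookup can be sketched with a standard geohash encoder and a cell-bucketed index. This is a minimal in-memory illustration, not a production design: DriverIndex, update, and nearby are hypothetical names, plain Python dicts stand in for the Redis-backed geo index and KV cache, and the 3x3 cell scan is only complete for radii up to about one cell side (roughly 0.6 km at precision 6).

```python
import math
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    """Interleave longitude/latitude bisection bits, 5 bits per character."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars, bits, nbits, use_lng = [], 0, 0, True  # longitude bit comes first
    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            bit = lng >= mid
            lng_lo, lng_hi = (mid, lng_hi) if bit else (lng_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        nbits += 1
        use_lng = not use_lng
        if nbits == 5:
            chars.append(_BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


class DriverIndex:
    """Hypothetical in-memory stand-in for the geo index plus driver KV cache."""

    def __init__(self, precision: int = 6):
        self.precision = precision
        self.cells = defaultdict(set)  # geohash cell -> set of driver_ids
        self.last_location = {}        # driver_id -> (lat, lng, cell)

    def update(self, driver_id: str, lat: float, lng: float) -> None:
        """Record a location ping, moving the driver between cells if needed."""
        cell = geohash_encode(lat, lng, self.precision)
        old = self.last_location.get(driver_id)
        if old and old[2] != cell:
            self.cells[old[2]].discard(driver_id)
        self.cells[cell].add(driver_id)
        self.last_location[driver_id] = (lat, lng, cell)

    def nearby(self, lat: float, lng: float, radius_km: float):
        """Return [(driver_id, distance_km)] within radius_km, nearest first."""
        total_bits = 5 * self.precision
        dlng = 360.0 / (1 << ((total_bits + 1) // 2))  # cell width in degrees
        dlat = 180.0 / (1 << (total_bits // 2))        # cell height in degrees
        candidates = set()
        # Probe the query cell and its 8 neighbours by stepping one cell size.
        for i in (-1, 0, 1):
            for j in (-1, 0, 1):
                la = min(max(lat + i * dlat, -90.0), 90.0)
                ln = (lng + j * dlng + 180.0) % 360.0 - 180.0  # wrap longitude
                cell = geohash_encode(la, ln, self.precision)
                candidates |= self.cells.get(cell, set())
        hits = []
        for driver_id in candidates:
            d_lat, d_lng, _ = self.last_location[driver_id]
            dist = haversine_km(lat, lng, d_lat, d_lng)
            if dist <= radius_km:  # exact filter after the coarse cell scan
                hits.append((driver_id, dist))
        return sorted(hits, key=lambda h: h[1])
```

For larger search radii, one option is to scan a wider ring of cells or drop to a coarser precision; which is preferable depends on driver density.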

Capacity estimation
  • 10M active drivers; each pings location every 4s → 2.5M writes/sec into geo store.
  • 1M concurrent riders; ~100K ride requests/min → ~1,700 match ops/sec, ~5K at peak.
  • Geohash precision-6 cells (~1.2 km × 0.6 km at the equator) → ~3M populated cells globally; index fits in 50 GB Redis.
  • Driver location KV: 10M × ~200 B ≈ 2 GB — kept hot in memory, snapshotted to disk hourly.
  • WebSocket fan-out: 10M driver sockets + 1M rider sockets across ~20K gateway nodes (~550 sockets per node).
  • Match latency target <2s → keep matching service co-located with geo index in same AZ.
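
The arithmetic behind these estimates can be reproduced in a few lines. The inputs are copied from the bullets above; the constant names are illustrative only, and the peak factor is simply derived from the quoted ~5K peak figure.

```python
# Inputs taken from the capacity estimates above.
ACTIVE_DRIVERS = 10_000_000
PING_INTERVAL_S = 4
RIDE_REQUESTS_PER_MIN = 100_000
PEAK_MATCH_OPS_PER_SEC = 5_000
BYTES_PER_DRIVER_RECORD = 200
CONCURRENT_RIDERS = 1_000_000
GATEWAY_NODES = 20_000

# Derived figures.
location_writes_per_sec = ACTIVE_DRIVERS / PING_INTERVAL_S
match_ops_per_sec = RIDE_REQUESTS_PER_MIN / 60
peak_factor = PEAK_MATCH_OPS_PER_SEC / match_ops_per_sec
driver_kv_gb = ACTIVE_DRIVERS * BYTES_PER_DRIVER_RECORD / 1e9
sockets_per_node = (ACTIVE_DRIVERS + CONCURRENT_RIDERS) / GATEWAY_NODES

print(f"location writes/sec : {location_writes_per_sec:,.0f}")
print(f"match ops/sec (avg) : {match_ops_per_sec:,.0f}")
print(f"peak / average      : {peak_factor:.1f}x")
print(f"driver KV size (GB) : {driver_kv_gb:.1f}")
print(f"sockets per gateway : {sockets_per_node:,.0f}")
```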
Architecture
Rider App ─WSS─┐         ┌─WSS─ Driver App
                │         │
                ▼         ▼
          WS Gateway Cluster
                │
                ▼
          Dispatch Service
            │       │
            ▼       ▼
       Geo Index   Pricing Svc (surge)
       (Redis +    │
        Quadtree)  ▼
            │   Demand Heatmap (Kafka)
            ▼
       Match Engine ─→ Kafka (ride_events)
            │              │
            ▼              ▼
       Trip Store     Analytics / Billing
       (Cassandra)
API
  • WS /driver/connect — driver pushes { lat, lng, ts } every 4s
  • POST /rides/request { rider_id, pickup, dropoff } → { ride_id, eta, fare_estimate }
  • POST /rides/:id/accept (driver) → { rider_contact, route }
  • GET /rides/:id → { status, driver_loc, eta }
  • POST /rides/:id/complete → { final_fare, receipt_url }
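
The endpoints above drive a ride through a small lifecycle. A minimal sketch of that lifecycle as a transition table follows, using the status values from the rides data model. The event names, the "start" event (no endpoint for it appears in the API list), and the cancellation rules are assumptions made for illustration.

```python
from enum import Enum


class RideStatus(Enum):
    REQUESTED = "requested"
    MATCHED = "matched"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    CANCELED = "canceled"


class InvalidTransition(Exception):
    """Raised when an event is not allowed in the ride's current state."""


# (current status, event) -> next status
TRANSITIONS = {
    (RideStatus.REQUESTED, "accept"): RideStatus.MATCHED,     # POST /rides/:id/accept
    (RideStatus.MATCHED, "start"): RideStatus.IN_PROGRESS,    # assumed pickup event
    (RideStatus.IN_PROGRESS, "complete"): RideStatus.DONE,    # POST /rides/:id/complete
    (RideStatus.REQUESTED, "cancel"): RideStatus.CANCELED,    # assumed rule
    (RideStatus.MATCHED, "cancel"): RideStatus.CANCELED,      # assumed rule
}


def apply_event(status: RideStatus, event: str) -> RideStatus:
    """Return the next status, or raise if the event is not valid here."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise InvalidTransition(
            f"cannot {event!r} a ride in state {status.value!r}"
        ) from None
```

Rejecting invalid transitions in one place makes duplicate or out-of-order requests (for example, two drivers accepting the same ride) fail cleanly instead of corrupting ride state.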
Data model
driver_locations (Redis GEO / KV):
  driver_id (PK)  : uuid
  geohash         : string (precision 6)
  lat, lng        : float
  status          : enum(available|on_trip|offline)
  last_ping       : timestamp

rides (wide-column, partition = ride_id):
  ride_id (PK)    : uuid
  rider_id        : uuid
  driver_id       : uuid?
  pickup, dropoff : geo_point
  status          : enum(requested|matched|in_progress|done|canceled)
  created_at      : timestamp
  fare            : decimal

surge_zones (KV, partition = geohash5):
  geohash5 (PK)   : string
  multiplier      : float
  expires_at      : timestamp
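
Because a geohash is a prefix code, the precision-6 hash stored for a location falls inside the precision-5 surge zone named by its first five characters. A minimal sketch of a surge lookup under that observation follows. SurgeTable and fare_estimate are hypothetical names, a dict stands in for the KV store, and both the fallback to 1.0 on a missing or expired zone and the fare = base × multiplier rule are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SurgeZone:
    geohash5: str       # precision-5 geohash, the partition key
    multiplier: float
    expires_at: float   # unix seconds


class SurgeTable:
    """Hypothetical in-memory stand-in for the surge_zones KV store."""

    def __init__(self) -> None:
        self._zones: Dict[str, SurgeZone] = {}

    def put(self, zone: SurgeZone) -> None:
        self._zones[zone.geohash5] = zone

    def multiplier_for(self, pickup_geohash: str, now: Optional[float] = None) -> float:
        """Look up the zone by the first 5 characters of the pickup geohash."""
        now = time.time() if now is None else now
        zone = self._zones.get(pickup_geohash[:5])
        if zone is None or zone.expires_at <= now:
            return 1.0  # assumed default: no surge when missing or expired
        return zone.multiplier


def fare_estimate(base_fare: float, pickup_geohash: str, table: SurgeTable,
                  now: Optional[float] = None) -> float:
    """Assumed pricing rule: base fare scaled by the zone's multiplier."""
    return round(base_fare * table.multiplier_for(pickup_geohash, now), 2)
```

Treating expired entries as "no surge" means a stalled pricing pipeline degrades to normal fares rather than leaving a stale multiplier in effect.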
Concept blurbs
Geohash
Encode lat/lng into a string prefix for proximity bucketing/indexing.
WebSockets
Persistent bidirectional connection for low-latency push to clients.
Load Balancer
Distribute requests across healthy backends (L4 or L7).
Key-Value Store
O(1) average-case get/put by key; scales horizontally by partitioning on the key (DynamoDB, Redis, Cassandra).
Pub/Sub
Fan-out events to many subscribers; topic-based (Kafka, SNS, Redis pub/sub).
Quadtree / R-Tree
Spatial index for range queries (find all points in a region).
Message Queue
Decouple producers/consumers; buffer bursts; enable retries (SQS/RabbitMQ).
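
To make the Quadtree blurb concrete, here is a minimal point-quadtree sketch with insert and region query. It is illustrative only: Box, QuadTree, capacity, and max_depth are assumed names and parameters, boxes are half-open (min inclusive, max exclusive), and the depth cap is there so that many drivers at the same coordinates cannot trigger endless splitting.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, str]  # (lat, lng, driver_id)


@dataclass
class Box:
    lat_min: float
    lng_min: float
    lat_max: float
    lng_max: float

    def contains(self, lat: float, lng: float) -> bool:
        return self.lat_min <= lat < self.lat_max and self.lng_min <= lng < self.lng_max

    def intersects(self, other: "Box") -> bool:
        return (self.lat_min < other.lat_max and other.lat_min < self.lat_max
                and self.lng_min < other.lng_max and other.lng_min < self.lng_max)


class QuadTree:
    def __init__(self, box: Box, capacity: int = 4, depth: int = 0, max_depth: int = 20):
        self.box, self.capacity = box, capacity
        self.depth, self.max_depth = depth, max_depth
        self.points: List[Point] = []
        self.children: Optional[List["QuadTree"]] = None

    def insert(self, lat: float, lng: float, driver_id: str) -> bool:
        """Insert a point; returns False if it lies outside this node's box."""
        if not self.box.contains(lat, lng):
            return False
        if self.children is None:
            # Leaves at max depth keep accepting points instead of splitting.
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((lat, lng, driver_id))
                return True
            self._split()
        return any(child.insert(lat, lng, driver_id) for child in self.children)

    def _split(self) -> None:
        """Divide this leaf into four quadrants and push its points down."""
        b = self.box
        mid_lat, mid_lng = (b.lat_min + b.lat_max) / 2, (b.lng_min + b.lng_max) / 2
        quadrants = [
            Box(b.lat_min, b.lng_min, mid_lat, mid_lng),
            Box(b.lat_min, mid_lng, mid_lat, b.lng_max),
            Box(mid_lat, b.lng_min, b.lat_max, mid_lng),
            Box(mid_lat, mid_lng, b.lat_max, b.lng_max),
        ]
        self.children = [
            QuadTree(q, self.capacity, self.depth + 1, self.max_depth) for q in quadrants
        ]
        for lat, lng, driver_id in self.points:
            any(child.insert(lat, lng, driver_id) for child in self.children)
        self.points = []

    def query(self, region: Box) -> List[Point]:
        """Return all points inside region, pruning subtrees that do not overlap it."""
        if not self.box.intersects(region):
            return []
        found = [p for p in self.points if region.contains(p[0], p[1])]
        for child in self.children or []:
            found.extend(child.query(region))
        return found
```

Unlike fixed-precision geohash cells, the tree subdivides only where points are dense, so a busy downtown ends up with small leaves while sparse regions stay coarse.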