How WhatsApp-style chat works at scale
A messenger at billion-user scale is not one database with a chat bubble on top. It is three problems glued together: a live path for tiny updates (text, receipts, typing), a fat path for photos and voice notes, and a durable path for data you cannot afford to lose.
We build the design in order—requirements first, numbers second, architecture third, APIs last. This uses a WhatsApp-class mental model: wire formats, ids, stores, caches, queues, encryption, and calls.
What you should be able to do after reading:
- Separate the three paths—live, fat, durable—and assign the right store to each.
- List functional and non-functional requirements and tie them to WebSockets, writers, and object storage.
- Walk one message online and offline, naming
client_msg_id,server_seq, and receipts. - Explain E2EE building blocks (X3DH, Double Ratchet) and what metadata the server still sees.
- Read the technical section: HTTP, WebSocket frames, SQL/Cassandra, Kafka partition keys.
Step 0 — How we will work through the problem
Ordered thinking beats memorizing a diagram. Use this sequence when you design a global messenger:
- Clarify scope. Text only today, or calls? Desktop/web, or mobile only? E2EE in scope or transport-only TLS?
- Write requirements. Functional = what users see. Non-functional = latency, durability, privacy, scale.
- Do napkin math. Messages per day, connection count, bytes per row.
- Draw three paths (live, fat, durable) before naming databases.
- Tell one story—send with recipient online, then offline, then duplicate retry.
flowchart LR
subgraph live [Live path]
WS[WebSocket wss]
CH[Chat hosts]
end
subgraph fat [Fat path]
UP[Multipart upload]
CDN[CDN + object store]
end
subgraph durable [Durable path]
WR[Message writer]
DB[(History store)]
end
WS --> CH --> WR --> DB
Step 1 — Functional requirements (what users need)
| Area | Requirement | Why scale makes it hard |
|---|---|---|
| Chat | 1:1 and group conversations | Groups fan out one send to N deliveries |
| Content | Text, files, photos, voice notes, optional video | Media must not ride the same socket as text |
| Receipts | Sent, delivered, read on the live path | High rate of tiny control frames |
| Offline | Server holds mail until device wakes | Inbox + sync on reconnect |
| Identity | Phone signup, devices, profile | Strong consistency on accounts |
| Calls (optional) | Voice/video via signaling + RTP | Separate from chat tier capacity |
Functional details easy to skip (but worth stating clearly)
Order is per thread. server_seq is meaningful only inside one thread_id.
Dedupe is mandatory. client_msg_id prevents double-tap and retry duplicates.
Receipts are separate events on the live wire—delivered is not the same as read.
Step 2 — Non-functional requirements (engineering promises)
| Category | Target (typical) | How we meet it | If we miss it |
|---|---|---|---|
| Latency — send | p99 < 500 ms send→ack | Write-before-ack, regional hosts | Feels sluggish |
| Latency — live push | Sub-second when online | WebSocket + connection registry | Late receipts |
| Availability | 99.9%+ monthly (often cited) | Redundant fronts, health checks | Outages make news |
| Durability | Very low message loss | Quorum writes, backups | Permanent trust loss |
| Order | Total order per thread | Sequencer / single writer lane | Scrambled history |
| Privacy | TLS; optional E2EE | Signal-style protocols | Legal and user risk |
| Bandwidth | Tiny control frames; CDN media | Client compression, edge cache | Churn on metered plans |
Key idea: Profiles want correctness, history wants write throughput, Redis can restart empty if disk is the source of truth.
Step 3 — Napkin math (why one server is not enough)
Round numbers. Multiply in the open—you are showing magnitude, not precision accounting.
- ~2 billion monthly active users on large messengers.
- ~100 billion events/day (messages, receipts, pings).
- 80B metadata rows/day × 160 bytes ≈ 13 TB/day before media bytes.
- 1.5B peak sessions ÷ 5M sessions/host ≈ hundreds of connection machines (before regions and headroom).
Honest ranges beat fake precision—the fleet size moves with heartbeat size and framing choices.
Step 4 — Architecture: the big picture
Phones on the left. DNS and load balancer in the middle. Chat hosts hold long-lived WebSockets. Behind them: SQL for profiles, wide-column or sharded store for history, Redis for presence, object storage + CDN for media. Skinny control on the live wire; fat bytes on the side door.
flowchart TB
subgraph clients["Your apps"]
MOB[Phones]
WEB[Web / desktop]
end
subgraph entry["Front door"]
DNS[DNS / geo routing]
LB[Load balancer]
end
subgraph chat["Chat tier"]
C1[Chat host]
C2[Chat host]
C3[Chat host]
end
subgraph data["Data"]
UDB[(Profiles + login)]
MDB[(Message history)]
RED[(Fast cache)]
S3[(File bucket)]
CDN[Edge file cache]
end
MOB --> DNS
WEB --> DNS
DNS --> LB
LB --> C1
LB --> C2
LB --> C3
C1 --> UDB
C1 --> MDB
C1 --> RED
MOB -->|upload / download| CDN
CDN --> S3
C1 -.->|small pointer| S3
Step 5 — What each component does
Clients. The app draws bubbles, keeps a little local copy so scrolling feels smooth, and on many designs encrypts text before it leaves the pocket. The server is often a mail carrier, not a proofreader.
Load balancer. Spreads people across chat hosts. Often wants “sticky” routing so one user keeps landing on the same host until a controlled move happens. Health checks so dead machines stop receiving new friends.
Chat hosts. Hold the live wire, parse small packets, talk to databases and cache, push events to whoever is online. Some stacks use languages famous for many light threads; some use anything else that the team can run well. Focus on responsibilities, not language wars.
User database. Phone number, profile, device list, maybe privacy flags. Things you want strong rules around—classic SQL territory in many answers.
Message store. Huge write load, time order inside a chat matters. People often draw a wide-column store here, but you can say “partitioned log + workers” if that is what you know better.
Redis or similar. “Who is online right now,” “which host holds this socket,” “last fifty lines hot in RAM,” typing dots with a short expiry. If Redis wakes up empty, the world still survives because the real truth is on disk.
Object storage + CDN. Big encrypted blobs, thumbnails, later downloads from an edge city near the user.
Step 5 (continued) — Transport, stickiness, and heartbeats
WebSocket (wss:// over TLS) is the usual production choice: one TCP connection, full-duplex frames, small binary or JSON payloads for MESSAGE, ACK, PING/PONG. Alternatives you should name: HTTP/2 long poll or gRPC streaming for companies that hate holding sockets in the load balancer; custom binary framing on TLS when you want minimal overhead. Tradeoff: anything stateful needs session affinity or a shared connection registry (Redis hash: conn:{user_id} → {chat_host_id, channel_id} with TTL 60–120s refreshed on each client ping).
Typical client ping interval: 15–60 seconds; server closes idle sockets after 2–5× the ping window if no application traffic. Frame size caps (for example 64 KB) stop one evil client from starving others on the same process.
Load balancer algorithms: round-robin for stateless HTTP; consistent hash on user id (or subchannel id) for WebSocket so the same user lands on the same chat host until you drain. Health checks: TCP + optional HTTP /healthz returning 200 with build id and dependency pings (DB, Redis) with short timeouts.
Step 6 — Walk one message end to end
I tell the story in plain steps. I use fake names because “Alice and Bob” are old friends from crypto class.
- Alice’s phone builds a random
client_msg_idso double taps and bad Wi‑Fi do not become twin messages. - If the product uses end-to-end encryption, the phone locks the text before it rides the wire. The server sees a blob, not a love letter.
- The blob rides a long-lived connection—think “open window” instead of “knock on the door every second.”
- The chat host checks auth, picks the right conversation, hands work to a writer that assigns the next slot number and saves to disk before saying “saved” back to Alice.
- A small directory says where Bob’s socket lives today. If Bob is awake, push now. If Bob is asleep, write an inbox row and maybe ask Apple or Google to ping the phone later.
- Bob’s phone unlocks the blob locally, draws the bubble, sends “delivered.” If you support reads, another tiny message when the UI actually shows the line.
The drawing under this paragraph is the comic-book version. Read top to bottom. Notice again: writer talks to disk before it promises Alice anything permanent.
sequenceDiagram
participant A as Alice phone
participant E as Front session
participant W as Message writer
participant S as Storage
participant B as Bob phone
A->>E: Blob + ids
E->>W: Trusted command
W->>S: Save with next slot
W->>A: Saved
alt Bob online
W->>B: Push blob
B->>E: Delivered
E->>W: Mark state
W->>A: Bob got it
else Bob offline
W->>S: Queue for Bob
end
flowchart LR
subgraph phones["Devices"]
PA[Alice]
PB[Bob]
end
subgraph front["Cloud front"]
LB[Balancer]
SESS[Live sessions]
end
subgraph back["Cloud back"]
WR[Writers]
DB[(History)]
OBJ[(Files)]
end
PA <-->|live line| LB
PB <-->|live line| LB
LB --> SESS
SESS --> WR
WR --> DB
PA -->|heavy upload| OBJ
PB -->|heavy download| OBJ
WR -.->|pointer only| DB
Step 7 — Online vs offline delivery
Bob awake. This is the happy path everyone loves. Push goes out now. Receipts fly back. Latency story is short and sweet.
Bob asleep. The message lands in durable storage with a clear owner. A push notification wakes the OS later. When Bob opens the app, the client pulls anything newer than its last slot number. You say “TTL” if you want to sound grown-up—that just means “we do not keep undownloaded server copies forever unless the law says we must.”
Most days are a mix. Your design should not break when the mix changes hour by hour.
Step 8 — Storage: match data to the right store
One tool rarely wins every job. Draw three buckets and I explain each in words a friend outside tech could repeat at dinner.
| What you store | What you want | What people often say on the board |
|---|---|---|
| Accounts, contacts, settings | Strong rules, careful joins, “do not lose this” | SQL family (Postgres, MySQL, …) |
| Huge message history | Insane write rate, time order inside a chat, cheap growth sideways | Wide-column or sharded stores (Cassandra-style thinking, or your favorite) |
| “Who is online,” hot previews | Microsecond reads, okay if it vanishes on restart | Redis-style memory store |
| Photos and video bytes | Cheap disk, big pipes, CDN friendliness | Object store + CDN |
If you only remember one sentence: match the tool to the shape of the pain. Profiles hurt if wrong; chat history hurts if slow; cache hurt if missing is fine.
Step 9 — Schemas, keys, and indexes
Relational core (Postgres-style)
Strong typing, foreign keys, and UNIQUE constraints give you correctness for money-adjacent data. Example sketch (types simplified):
CREATE TABLE users (
user_id BIGSERIAL PRIMARY KEY,
phone_e164 TEXT NOT NULL UNIQUE,
display_name TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE devices (
device_id UUID PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(user_id),
push_token TEXT,
last_seen_at TIMESTAMPTZ
);
CREATE TABLE threads (
thread_id UUID PRIMARY KEY,
kind TEXT NOT NULL CHECK (kind IN ('direct','group')),
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE thread_members (
thread_id UUID NOT NULL REFERENCES threads(thread_id),
user_id BIGINT NOT NULL REFERENCES users(user_id),
role TEXT NOT NULL DEFAULT 'member',
member_version BIGINT NOT NULL DEFAULT 1,
PRIMARY KEY (thread_id, user_id)
);
CREATE TABLE messages (
thread_id UUID NOT NULL,
server_seq BIGINT NOT NULL,
sender_id BIGINT NOT NULL,
client_msg_id UUID NOT NULL,
ciphertext BYTEA,
attachment_id UUID,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
PRIMARY KEY (thread_id, server_seq),
UNIQUE (sender_id, client_msg_id)
);
CREATE INDEX idx_messages_thread_seq ON messages (thread_id, server_seq DESC);
Wide-column / log-style message store (Cassandra-style)
Partition by thread_id so all rows for one conversation sit together; cluster by message_id (time-UUID) descending for fast “latest first” scans. Example CQL shape:
CREATE TABLE messages_by_thread ( thread_id UUID, message_id TIMEUUID, sender_id BIGINT, ciphertext BLOB, attachment_id UUID, PRIMARY KEY ((thread_id), message_id) ) WITH CLUSTERING ORDER BY (message_id DESC);
TTL on message cells (for example 30 days on server-held ciphertext) is a product plus legal choice, not a physics law—state what you assume.
Redis keys (examples)
HSET session:{device_id} user_id 123 chat_host "c-7" conn_id "ch-9912a" EX 86400
SETEX presence:{user_id} 45 "online"
LPUSH recent:{thread_id} "{...}" LTRIM recent:{thread_id} 0 49
Use pipelines or Lua scripts when you must read presence and push in one round trip to cut tail latency.
Step 10 — Redis and the hot scratchpad
Redis (or anything like it) is the sticky note on the fridge. Online status, typing flashes, “which server holds Bob today,” maybe the last screen of messages for speed. The fridge note can fall on the floor; the wedding contract still lives in the filing cabinet (disk). Remember: do not store the only copy of a bank transfer inside Redis with no backup story.
Step 11 — End-to-end encryption (building blocks)
Modern messengers often follow the Signal protocol family. You do not derive every equation in a design review; you name the moving parts and what the server never sees.
X3DH (first message to a new device)
Long-term identity key, medium-term signed pre-key, pool of one-time pre-keys. Sender fetches a bundle from the server (public keys only), runs several Elliptic-Curve Diffie-Hellman exchanges, mixes outputs through a KDF to get a shared secret, then sends the first ciphertext plus the ephemeral public keys the receiver needs to recompute the same secret. Server stores public material only.
Double Ratchet (after the session exists)
Two ratchets: a symmetric chain advances per message for forward secrecy inside the session; a DH ratchet occasionally injects fresh entropy from new ECDH results (“self-healing” after compromise). Keys are deleted after use where the protocol specifies it—mention that explicitly; it is why old messages resist later key theft.
What the server can still see (metadata)
| Usually ciphertext / opaque to server | Still visible for routing & ops |
|---|---|
| Message body bytes, attachment bytes when client-encrypted | Sender id, receiver id or thread id, timestamps, frame sizes, delivery state, IP / ASN for abuse, push tokens |
That split drives legal retention, abuse tooling, and “can we search message text server-side?” (usually no, for strict E2EE products).
Step 12 — Groups: fan-out and tradeoffs
In the strongest privacy story, Alice’s phone may encrypt one copy per member, because each member has different keys. The server then does mail delivery: many small envelopes, same dinner reservation inside each scrambled shell.
That pattern is beautiful for privacy and painful for CPU and battery on huge groups. So real products compress, batch, and sometimes cap group size or degrade gracefully. Name that tradeoff plainly—large groups are where fan-out cost shows up in CPU, battery, and ops.
Complexity sketch: for N members, sender device performs O(N) encryptions and uploads O(N) ciphertext frames (or one batched RPC that expands server-side). Server CPU is mostly routing and fan-out to sockets, not decrypting. For N > 256, move to background fan-out workers fed from a queue keyed by thread_id so the HTTP or WS request returns fast with 202 Accepted + job_id optional.
flowchart TD
POST[Alice posts to group]
POST --> MEM[Load members]
MEM --> EACH{Each member}
EACH -->|online| LIVE[Push now]
EACH -->|offline| SAVE[Save for later]
EACH -->|no app open| PUSH[OS push ping]
Step 13 — Media: the side door
Compress on the phone when you can—people pay for data by the drop. Encrypt if that is your product promise. Ask the server for permission to upload (signed URL is the usual trick). Put bytes in the bucket. Send a skinny chat row that only carries a pointer, a thumbnail preview, and keys wrapped for the receiver.
Download often hits the CDN first so Mumbai and Montreal do not both yank the same file from one lonely Oregon disk. Say “origin vs edge” once; you sound like you shipped something before.
flowchart TB
UI[Chat app]
CHAT[Chat service]
META[(Small metadata)]
CDN[Nearby edge cache]
BUCKET[(Central file store)]
UI -->|1 ask for upload ticket| CHAT
CHAT --> META
UI -->|2 put bytes| CDN
CDN --> BUCKET
CHAT -.->|pointer + key ref| META
Upload mechanics (S3-style)
Large objects use multipart upload: typical part size 5–32 MB, parallel parts capped by CPU and memory on the phone. Server returns uploadId plus presigned PUT URLs per part; client sends Content-MD5 or checksum header if you want integrity without re-reading the whole file server-side. Complete with ordered ETag list; abort stale uploads after 24–48h.
Chat row stores attachment_id, content_type, byte_length, thumbnail_key, and a wrapped media key (key material encrypted with the per-chat symmetric ratchet, not sent in plain to object storage). CDN serves GET with short-lived signed cookies or query strings; set Cache-Control: public, max-age=… only for ciphertext blobs that are not user-private without auth—usually you keep CDN auth on.
Step 14 — Voice and video calls
Text rides small packets. A call is a river of RTP-style media packets with jitter buffers (20–80 ms target depth) and PLC for audio gaps.
Signaling path (over your existing TLS socket or a small HTTPS service): exchange SDP offer/answer (session description: codecs, ICE ufrag/pwd, fingerprint for DTLS-SRTP), trickle ICE candidates (host, srflx from STUN, relay from TURN). TURN allocates relay ports; credentials time out.
Media path: prefer UDP direct P2P; fallback to TURN over UDP then TCP. Opus for audio (16–32 kb/s voice), H.264 or VP8/VP9 for video with simulcast layers if you want adaptive bitrate without full renegotiation every second.
Your chat tier carries signaling + call state; it should not become a media relay except when TURN is in play—capacity plan TURN bandwidth separately (often $$$).
Step 15 — Growing the fleet
Geo DNS + anycast steer clients to a nearby front; health-checked records drop bad POPs.
Consistent hashing: map both nodes and keys onto a ring (0..232−1 or 0..2160−1). Each physical node owns V virtual replicas (often 100–200) to smooth hot spots. Key k routes to first clockwise node ≥ hash(k). Adding a node moves only O(keys / (N·V)) of traffic in expectation—state that formula out loud.
Replication: message store often uses quorum writes (for example W=3, R=2 or LOCAL_QUORUM in one region) with RPO measured in seconds of unreplicated tail if you async-replicate cross region; RTO minutes if you automate failover. Say numbers you can defend, not magic zeros.
Cache layers: L1 SQLite on device (pages of history), L2 Redis hot set (<50 ms reads), L3 CDN for immutable ciphertext blobs with auth at edge.
Step 16 — Security engineering
Edge: TLS 1.3 only, strong cipher suites, OCSP stapling, optional certificate pinning in mobile binaries. mTLS between internal services (chat → writer, writer → DB) so a stolen east-west packet is not enough.
AuthN: short-lived JWT access tokens (5–15 min) + rotating refresh tokens bound to device id; store only hashes server-side. OAuth2 device flow if you have headless clients.
AuthZ: check thread_id membership on every write; use RBAC for admin APIs. Return 403 vs 404 deliberately so existence does not leak when privacy matters.
Rate limits: token bucket per user_id and per ip on login; sliding window on POST /messages (for example 60/min soft, 300/hour hard). Return 429 with Retry-After header.
Payload limits: max JSON body 64 KB for control messages; reject oversize frames at the WebSocket layer before JSON parse to avoid parser bombs.
Step 17 — Failure modes and idempotency
Duplicates. Networks lie. Build idempotent handlers—same id twice should not create two rows.
Clocks. Phones drift. Order by server slot numbers, not by “whatever time the handset felt like.”
Membership races. Version your roster so old packets cannot bring back someone you kicked.
Delete for real. Queues that walk replicas and CDN edges; say “eventual” instead of fairy tales.
Idempotent write pattern (SQL)
-- server_seq chosen in same txn via SELECT … FOR UPDATE on a thread row -- or from a dedicated sequencer service; client_msg_id is the idempotency key INSERT INTO messages (thread_id, server_seq, sender_id, client_msg_id, ciphertext) VALUES ($1, $2, $3, $4, $5) ON CONFLICT (sender_id, client_msg_id) DO NOTHING RETURNING server_seq;
Pair with an outbox table (message_id, publish_state) if downstream search indexes must never miss a row: transactionally insert message + outbox row, worker publishes to Kafka, marks published.
Step 18 — Kafka ordering and partitions
For strict per-thread order, assign Kafka partition key = thread_id (or hash into a fixed set of hot partitions if one celebrity thread would saturate a single broker). One consumer per partition preserves order; scale consumers with partition count.
Producer settings: acks=all, enable.idempotence=true, max.in.flight.requests.per.connection=5 (with idempotence) to avoid reorder on retry. Consumer: process batch, commit offset only after durable write to Cassandra/Postgres.
If you cannot afford one partition per thread, use a sequencer service (per-thread single writer with compare-and-swap on expected_seq) and accept that as a central bottleneck you must shard by geography or thread popularity.
Step 19 — HTTP and WebSocket contracts
Control-plane REST complements the duplex socket. Typical verbs and status codes:
| Operation | HTTP | Success | Common errors |
|---|---|---|---|
| Create thread | POST /v1/threads |
201 + Location |
400 bad body, 409 duplicate direct thread |
| List messages | GET /v1/threads/{id}/messages?after_seq=&limit=50 |
200 + cursor header X-Next-Seq |
403 not a member, 416 if seq out of range (optional) |
| Send message (HTTP fallback) | POST /v1/threads/{id}/messages |
201 |
409 duplicate client_msg_id, 429 rate limit |
| Begin upload | POST /v1/attachments:initiateMultipart |
200 JSON with uploadId, part URLs |
413 quota exceeded |
Example request body for send (E2EE product stores ciphertext only):
POST /v1/threads/7c9e…/messages
Authorization: Bearer eyJ…
Content-Type: application/json
{
"client_msg_id": "2f5c1b8e-…",
"kind": "text",
"ciphertext": "base64-or-binary-in-json",
"reply_to_seq": 1204
}
WebSocket binary opcode example (conceptual): 0x01 envelope = {type:"MSG", thread_id, server_seq, ciphertext}; 0x02 = {type:"ACK", client_msg_id, status:"DELIVERED"}. Version your schema (proto v3) so old apps do not explode.
Step 20 — Data model invariants
Tables you still draw: users, devices, threads, thread_members, messages, deliveries (optional per device), attachments, outbox_for_indexer (optional). Invariants worth saying aloud:
(sender_id, client_msg_id)is globally unique → dedupe retries.(thread_id, server_seq)is unique → total order inside a thread.- Foreign keys from
messages.thread_idandthread_members.thread_idtothreads. member_version(orepoch) on roster rows monotonic increases on admin edits → stale fan-out jobs abort.- Attachment row references object key +
sha256for dedupe across forwards.
Hot thread mitigation: shard by hash(thread_id) to N physical keyspaces; pin ultra-hot threads to dedicated partitions so they cannot evict cold neighbors from page cache.
Step 21 — Observability and SLOs
Golden signals: request rate, error rate (HTTP 5xx + WS abnormal close), latency histogram (p50/p95/p99) for send→ack and send→delivered, saturation (CPU, event loop lag, Kafka consumer lag, DB thread pool queue depth).
Example SLO targets for the control plane (tune to product): p99 send latency < 500 ms, availability 99.9% monthly, 0.001% durable message loss backed by quorum + backups. Alert on SLO burn rate over short windows, not only raw CPU.
Tracing: propagate trace_id from edge through writer to DB commit span; tag with thread_id, shard_id. Logs: structured JSON, never log ciphertext or raw tokens.
Step 22 — Goals → knobs (quick reference)
| Goal in normal words | Knob you can actually turn |
|---|---|
| Feels instant | Regional hosts, small packets on the live wire, CDN for media, warm caches |
| Hard to lose | Write before you ack, backups, drills where you pretend a data center died |
| Stays readable in one chat | Rising slot per thread, single writer lane per hot shard, dedupe on client ids |
| Stays up on a bad day | Spare capacity, health checks, failover with honest RPO/RTO words |
| Does not leak like a sieve | TLS, rate limits, device-bound tokens, E2EE story if in scope |
Step 23 — Close the loop (what to practice)
On a whiteboard: three paths, one message story (online + offline), label SQL vs Cassandra vs Redis vs S3.
Out loud: five functional requirements and which non-functional target applies to live vs durable paths.
With the technical section: trace POST /v1/threads/{id}/messages and the WebSocket MSG frame.
The one line to remember
A messenger is three systems: a live wire for control, a side door for media, and a durable ledger for history. Respect those boundaries and billion-user chat stays understandable.