sharpbyte.dev

How Google Docs works at scale

Collaborative editing looks like a word processor in the browser. At scale it is a distributed log of tiny edits that many people apply at once—plus permissions, revision history, and export pipelines that must never corrupt the document.

We work through the design in order—requirements first, numbers second, architecture third, APIs last—using a Google Docs–class product as the mental model, not any one company’s private implementation.

What you should be able to do after reading:

Step 0 — How we will work through the problem

Ordered thinking beats memorizing boxes. Use this sequence when you design real-time collaborative documents:

  1. Clarify scope. Docs only or Sheets/Slides too? Offline editing? Guest commenters? Enterprise DLP in scope?
  2. Write requirements. Functional = editing, sharing, history. Non-functional = sync latency, durability, never lose edits.
  3. Do napkin math. Active docs, ops per second per doc, revision row growth—so nobody stores the whole internet in one JSON blob.
  4. Draw three loops before naming WebSockets or Spanner.
  5. Tell one story—two users type in the same paragraph—then failure cases (partition, reconnect, conflicting paste).
flowchart TB
  subgraph collab [Collaboration loop]
    C1[Client A ops] --> WS[Realtime gateway]
    C2[Client B ops] --> WS
    WS --> OT[Transform / order]
    OT --> FAN[Broadcast to sessions]
  end
  subgraph doc [Document loop]
    OT --> LOG[Append revision log]
    LOG --> SNAP[Periodic snapshots]
    SNAP --> BLOB[(Object store)]
  end
  subgraph access [Access loop]
    ACL[Permissions service] --> WS
    ACL --> API[REST metadata]
    SHARE[Share links] --> ACL
  end
    

Step 1 — Functional requirements (editors, viewers, admins)

| Actor | Requirement | Why scale makes it hard |
| --- | --- | --- |
| Editor | Type, format, insert images/tables, undo/redo | Every keystroke is an op; fan-out to N collaborators |
| Editor | See others’ cursors and selections in near real time | Presence channel separate from document ops |
| Viewer | Read-only open; copy allowed per policy | Same render path without accepting ops |
| Commenter | Comments and @mentions without editing body | Comment anchoring to volatile positions |
| Suggest mode | Proposed edits require owner accept/reject | Branching suggestion layer on top of canonical doc |
| Owner | Share by user/group/link; transfer ownership | ACL evaluation on every op and API call |
| All | Version history; restore named revision | Compaction vs infinite op log |
| All | Export PDF, DOCX, etc. | Async render farm reads snapshot + tail ops |
| Admin | Org policies, retention, legal hold, audit | eDiscovery exports across millions of files |

Functional details worth stating clearly

Operations are the source of truth, not the HTML DOM. The server stores an ordered sequence of ops (or periodic snapshots + tail).

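Undo is both local and global. Each client's undo stack inverts that user's recent ops, but the inverse is submitted like any other op, so the server may still reject it if the document's revision has moved on.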

Out of scope today (say it aloud). Building a full layout engine from scratch, real-time video co-editing, or on-prem blockchain audit—park them.

Step 2 — Non-functional requirements (engineering promises)

| Category | Target (typical) | How we meet it | If we miss it |
| --- | --- | --- | --- |
| Latency: remote edit visible | p95 < 200 ms same region | WebSocket, regional doc shards, small op payloads | Feels like email, not “live” |
| Latency: open doc | p95 < 2 s cold start | Latest snapshot + tail replay | Users think tab crashed |
| Durability | No acknowledged op lost | Write-ahead log before ACK to client | Trust destroyed forever |
| Consistency | Total order of ops per document | Single sequencer per doc shard | Garbled text, divergent forks |
| Availability | 99.9%+ edit path monthly | Replica gateways, doc shard failover | Classroom / deal room stops |
| Scale: hot doc | 100+ viewers; 10–20 concurrent editors | Op batching, read-only fan-out path | Gateway meltdown on viral doc |
| Scale: corpus | Billions of files | Shard by doc_id, tier cold storage | One DB owns everything |
| Security | Least privilege per op | ACL check at gateway + storage | Link leak edits entire company drive |

Key idea: Collaboration wants a single authoritative order per document. Analytics and search can be eventual; the edit log cannot.

Step 3 — Napkin math (why one JSON file is not enough)
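
A small calculator makes the arithmetic explicit. Every input below (`active_docs`, `ops_per_sec_per_doc`, `bytes_per_op`) is a placeholder chosen for illustration, not a measurement of any real product; swap in your own estimates.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Load:
    active_docs: int            # docs being edited at the same time (assumed)
    ops_per_sec_per_doc: float  # average ops per second on each active doc (assumed)
    bytes_per_op: int           # stored size of one revision row (assumed)


def ops_per_second(load: Load) -> float:
    """Total op rate the sequencers must absorb across all active docs."""
    return load.active_docs * load.ops_per_sec_per_doc


def log_bytes_per_day(load: Load) -> float:
    """Raw revision-log growth per day, before snapshots or compaction."""
    return ops_per_second(load) * load.bytes_per_op * SECONDS_PER_DAY


# Hypothetical inputs: one million active docs, 2 ops/s each, 200 bytes per row.
example = Load(active_docs=1_000_000, ops_per_sec_per_doc=2, bytes_per_op=200)
print(f"{ops_per_second(example):,.0f} ops/s")
print(f"{log_bytes_per_day(example) / 1e12:.2f} TB/day of revision rows")
```

Even with modest per-document rates, the total grows by multiplication, which is the argument for an append-only log with snapshots rather than rewriting one JSON blob per keystroke.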

Step 4 — Architecture: three loops

Browser clients talk to an edge API for metadata (title, permissions) and a realtime gateway for ops. Each document maps to a shard with a sequencer that assigns monotonic revision numbers. The document loop appends to a write-ahead log, compacts into snapshots, and stores large blobs separately.
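
The sequencer's job can be sketched in a few lines. This is a minimal in-memory model, assuming one `DocShard` object per document and a Python list standing in for the durable write-ahead log; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DocShard:
    """One document's sequencer plus an in-memory stand-in for its WAL."""
    doc_id: str
    head_rev: int = 0
    wal: list = field(default_factory=list)  # append-only (rev, op) pairs

    def submit(self, op: dict) -> int:
        """Assign the next revision and log the op before returning."""
        rev = self.head_rev + 1
        self.wal.append((rev, op))  # persist first ...
        self.head_rev = rev         # ... then advance the head (the ACK point)
        return rev

    def since(self, rev: int) -> list:
        """Entries with a revision greater than `rev`, for client catch-up."""
        return [entry for entry in self.wal if entry[0] > rev]


shard = DocShard("doc_7xk", head_rev=118)
print(shard.submit({"type": "insert", "index": 42, "text": "Hello"}))
```

Because only one `DocShard` hands out revision numbers for a given document, the log has a single total order; a real shard would replace the list with a replicated, durable log.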

flowchart TB
  subgraph clients [Clients]
    WEB[Browser editor]
    MOB[Mobile app]
  end
  subgraph edge [Edge]
    LB[Load balancer]
    META[Metadata API]
    RT[Realtime gateway]
  end
  subgraph core [Document shard]
    SEQ[Sequencer]
    OT[OT / CRDT engine]
    WAL[(Revision log)]
    SNAP[Snapshot builder]
  end
  subgraph stores [Stores]
    SQL[(Doc metadata DB)]
    OBJ[(Blob store)]
    SRCH[(Optional index)]
  end
  WEB --> LB
  MOB --> LB
  LB --> META
  LB --> RT
  META --> SQL
  RT --> SEQ --> OT --> WAL
  OT --> SNAP --> OBJ
  WAL --> OBJ
    

Step 5 — Walk one edit end to end

Two users edit the same paragraph. User A inserts “Hello” at index 42.

  1. Client A generates op {type: insert, index: 42, text: "Hello", client_rev: 118} and sends over WebSocket.
  2. Gateway authenticates session, checks ACL role >= editor, routes to document shard doc_7xk.
  3. Sequencer receives op, transforms against any concurrent ops since rev 118 (OT) or merges (CRDT), assigns server_rev: 119.
  4. WAL persists op 119 durably (quorum write) before ACK to clients.
  5. Fan-out pushes transformed op to all subscribed sessions (A, B, …).
  6. Client B applies op 119 locally; UI updates text and shifts B’s pending cursor indices.
  7. Presence (parallel channel) may show A’s cursor near index 42 without blocking the critical path.
sequenceDiagram
  participant A as Client A
  participant G as Gateway
  participant S as Doc shard
  participant B as Client B
  A->>G: op insert @42 (client_rev 118)
  G->>S: authorize + forward
  S->>S: transform + assign rev 119
  S->>S: append WAL
  S-->>A: ACK rev 119
  S-->>B: op 119 transformed
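
Step 6 of the walk-through, where Client B applies the incoming op and shifts its own cursor, can be sketched for plain text as below. The function names are hypothetical, and the rule that a cursor sitting exactly at the insert point stays put is a policy assumption; editors differ on it.

```python
def apply_insert(text: str, index: int, inserted: str) -> str:
    """Return the text with `inserted` spliced in at `index`."""
    return text[:index] + inserted + text[index:]


def shift_cursor(cursor: int, index: int, inserted: str) -> int:
    """Keep a local cursor on the same character after a remote insert.

    Cursors strictly after the insert point move right by the inserted length;
    cursors at or before it do not move (assumed policy).
    """
    if cursor > index:
        return cursor + len(inserted)
    return cursor


doc = "x" * 60
doc = apply_insert(doc, 42, "Hello")   # remote op from Client A
print(len(doc))
print(shift_cursor(50, 42, "Hello"))   # B's cursor was after index 42
print(shift_cursor(10, 42, "Hello"))   # B's cursor was before index 42
```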
    

Step 6 — OT vs CRDT: keeping concurrent edits sane

Operational Transformation (OT) — clients send ops against a known revision; the server transforms concurrent ops so everyone converges. Classic Google Docs–era approach: server is authoritative; complex but battle-tested for rich text with tables.
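
To make "transforms concurrent ops" concrete, here is a minimal sketch of the insert-versus-insert case only. It is a textbook-style illustration, not the algorithm of any particular product; the `site` field and the rule that the lower site id wins ties are assumptions. Deletes and formatting ops need additional cases.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Insert:
    index: int
    text: str
    site: str  # client id, used only to break ties (assumed)


def transform(op: Insert, applied: Insert) -> Insert:
    """Rewrite `op` so it can run after `applied`, which was concurrent with it."""
    lands_before = applied.index < op.index
    wins_tie = applied.index == op.index and applied.site < op.site
    if lands_before or wins_tie:
        return replace(op, index=op.index + len(applied.text))
    return op


def apply(text: str, op: Insert) -> str:
    return text[:op.index] + op.text + text[op.index:]


# Two clients insert at the same index of "abc" at the same time.
a = Insert(1, "X", "A")
b = Insert(1, "Y", "B")
order_ab = apply(apply("abc", a), transform(b, a))
order_ba = apply(apply("abc", b), transform(a, b))
print(order_ab, order_ba)
```

Both application orders end in the same string, which is the convergence property the server relies on.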

CRDTs — data structures designed so merge is commutative; peers can sync without a central transformer in some designs. Popular in newer editors (Notion-class, Figma-class). Rich text CRDTs exist (Yjs, Automerge) but payload size and complexity differ from OT.
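
The commutative-merge idea can be shown with a toy state-based sequence CRDT. This sketch is not how Yjs or Automerge work internally; it assumes each character carries a unique `(position, site)` key chosen by the caller, and that deletes are kept as tombstones forever.

```python
from fractions import Fraction

# State: {(position, site): (char, deleted)}; keys are unique per character.


def insert(state: dict, position: Fraction, site: str, char: str) -> dict:
    new_state = dict(state)
    new_state[(position, site)] = (char, False)
    return new_state


def delete(state: dict, key: tuple) -> dict:
    char, _ = state[key]
    new_state = dict(state)
    new_state[key] = (char, True)  # tombstone: the entry stays, marked deleted
    return new_state


def merge(a: dict, b: dict) -> dict:
    """Union of entries; a character is deleted if either side deleted it."""
    merged = dict(a)
    for key, (char, deleted) in b.items():
        already_deleted = merged[key][1] if key in merged else False
        merged[key] = (char, already_deleted or deleted)
    return merged


def render(state: dict) -> str:
    return "".join(char for _, (char, deleted) in sorted(state.items()) if not deleted)


base = insert(insert({}, Fraction(1), "A", "H"), Fraction(2), "A", "i")
replica_a = insert(base, Fraction(3), "A", "!")
replica_b = insert(delete(base, (Fraction(2), "A")), Fraction(3, 2), "B", "o")
print(render(merge(replica_a, replica_b)), render(merge(replica_b, replica_a)))
```

Merging in either order gives the same state, with no central transformer involved; the tombstone that never goes away is a small-scale view of the state-size cost.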

| Approach | Strength | Cost |
| --- | --- | --- |
| OT + central server | Strong ordering; easier global undo policy | Server CPU per op; harder to do P2P |
| CRDT | Offline-friendly; peer sync | Larger states; tombstones; format migration |

Either way, clients must handle rebase: while offline, they buffer ops; on reconnect, the server sends the missing revision range (for example, 119–140) for replay.

Step 7 — Realtime transport: WebSockets and session stickiness

Use WebSockets (or HTTP/2 streams) for bidirectional op traffic. Initial doc open: GET snapshot + GET revisions?from=… over HTTPS, then upgrade socket for live ops.

Step 8 — Document model: revisions, snapshots, compaction

Store an append-only revision log:

revisions(doc_id, rev, op_payload, author_id, ts)
snapshots(doc_id, rev, snapshot_blob_ref, created_at)

Snapshot policy: every N revisions or M minutes, materialize full document state to object storage; new clients load latest snapshot + replay tail only. Compaction archives revisions older than last snapshot + legal retention window.
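
A sketch of both halves of that policy, for plain text only. The thresholds (`every_n_revs=100`, `every_m_seconds=300`), the op shapes, and the function names are assumptions for illustration.

```python
def apply_op(text: str, op: dict) -> str:
    """Apply one insert or delete op (the only op types in this sketch)."""
    if op["op"] == "insert":
        return text[:op["index"]] + op["text"] + text[op["index"]:]
    if op["op"] == "delete":
        return text[:op["index"]] + text[op["index"] + op["length"]:]
    raise ValueError(f"unknown op type: {op['op']}")


def load_document(snapshot_rev: int, snapshot_text: str, revisions: list) -> tuple:
    """Rebuild state from the latest snapshot plus the revisions newer than it."""
    text, rev = snapshot_text, snapshot_rev
    for op_rev, op in sorted(revisions, key=lambda entry: entry[0]):
        if op_rev <= snapshot_rev:
            continue  # already folded into the snapshot
        if op_rev != rev + 1:
            raise ValueError(f"gap in revision log after rev {rev}")
        text, rev = apply_op(text, op), op_rev
    return rev, text


def should_snapshot(head_rev: int, last_snapshot_rev: int, seconds_since_snapshot: float,
                    every_n_revs: int = 100, every_m_seconds: float = 300) -> bool:
    """True once N revisions or M seconds have passed with unsnapshotted ops."""
    behind = head_rev - last_snapshot_rev
    return behind >= every_n_revs or (behind > 0 and seconds_since_snapshot >= every_m_seconds)


tail = [
    (119, {"op": "insert", "index": 5, "text": "Hello"}),
    (120, {"op": "delete", "index": 0, "length": 1}),
]
print(load_document(118, "abcde", tail))
```

The gap check matters: replaying a tail with a missing revision would silently produce a different document.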

Large docs may split into segments (tabs, huge tables) each with its own sub-log to avoid one infinite hot row.

Step 9 — Permissions, sharing, and org policies

Cache effective ACL in Redis with short TTL; invalidate on permission.changed events—never trust client-side checks alone.
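
The cache-and-invalidate pattern can be sketched with a dictionary standing in for Redis. The role ordering, the 30-second TTL, and the `AclCache` class are assumptions; `load_role` represents whatever authoritative permissions lookup the system has.

```python
import time

# Assumed ordering of roles, lowest privilege first.
ROLE_RANK = {"viewer": 0, "commenter": 1, "editor": 2, "owner": 3}


class AclCache:
    def __init__(self, load_role, ttl_seconds=30.0, clock=time.monotonic):
        self._load_role = load_role  # callable (doc_id, user_id) -> role or None
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # (doc_id, user_id) -> (role, expires_at)

    def role(self, doc_id, user_id):
        """Return the cached role, reloading it when missing or expired."""
        key = (doc_id, user_id)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]
        role = self._load_role(doc_id, user_id)
        self._entries[key] = (role, now + self._ttl)
        return role

    def on_permission_changed(self, doc_id):
        """Handle a permission.changed event by dropping that doc's entries."""
        for key in [k for k in self._entries if k[0] == doc_id]:
            del self._entries[key]

    def can(self, doc_id, user_id, needed):
        role = self.role(doc_id, user_id)
        return role is not None and ROLE_RANK[role] >= ROLE_RANK[needed]
```

The short TTL bounds how long a stale grant can survive if an invalidation event is lost; the event handler covers the common case immediately.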

Step 10 — Comments, suggestions, and presence

Comments anchor to stable identifiers (paragraph id + offset) not raw indices that shift every keystroke. Suggestions store proposed ops in a side branch; accept merges into canonical log; reject discards branch.
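
A sketch of why the stable anchor helps: the comment stores a paragraph id and an offset, and the absolute index is computed only when needed. The `Anchor` type and the one-character paragraph break are assumptions. Edits inside the anchored paragraph would still need the offset adjusted, which this sketch does not cover.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Anchor:
    paragraph_id: str
    offset: int  # characters from the start of that paragraph


def resolve(anchor: Anchor, paragraphs: list) -> Optional[int]:
    """Absolute index of the anchor, or None if its paragraph no longer exists.

    `paragraphs` is the ordered list of (paragraph_id, text) pairs.
    """
    position = 0
    for paragraph_id, text in paragraphs:
        if paragraph_id == anchor.paragraph_id:
            return position + min(anchor.offset, len(text))
        position += len(text) + 1  # +1 for the paragraph break (assumed)
    return None


comment = Anchor("p2", 5)
before = [("p1", "Intro"), ("p2", "Body text")]
after = [("p1", "Introduction"), ("p2", "Body text")]  # someone edited p1
print(resolve(comment, before), resolve(comment, after))
```

The stored anchor never changed even though every raw index in the second paragraph moved.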

Presence — ephemeral data: cursor color, selection range, “User is typing…” — Redis or in-memory on gateway; lossy is OK.

Step 11 — Offline, reconnect, and conflict UX

  1. Client keeps local op queue + last known server_rev.
  2. On reconnect, send catch_up from=server_rev+1; server streams missing ops.
  3. Client rebases pending ops against incoming transforms; if impossible, show “copy your changes” modal.
  4. IndexedDB holds snapshot for read-only offline; sync when online returns.
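
Step 3 above, rebasing the pending queue, can be sketched for insert-only ops. The op shape and the rule that server ops win ties are assumptions. Note that each missed server op is itself transformed as it passes over the local queue, because later local ops were written on top of earlier ones. The "impossible" case that triggers the modal is not modeled here.

```python
def transform(op: dict, against: dict, against_wins_ties: bool) -> dict:
    """Return `op` rewritten so it applies after `against` (inserts only)."""
    before = against["index"] < op["index"]
    tie = against["index"] == op["index"] and against_wins_ties
    if before or tie:
        return {**op, "index": op["index"] + len(against["text"])}
    return op


def rebase(pending: list, missed: list) -> list:
    """Rewrite the local queue to apply on top of ops missed while offline."""
    for server_op in missed:
        rebased = []
        for local_op in pending:
            rebased.append(transform(local_op, server_op, against_wins_ties=True))
            # Carry the server op forward past this local op for the next one.
            server_op = transform(server_op, local_op, against_wins_ties=False)
        pending = rebased
    return pending


def apply_all(text: str, ops: list) -> str:
    for op in ops:
        text = text[:op["index"]] + op["text"] + text[op["index"]:]
    return text


# Offline, the client typed "X" then "Y" into "abc"; the server gained "Z" at 0.
pending = [{"index": 1, "text": "X"}, {"index": 2, "text": "Y"}]
missed = [{"index": 0, "text": "Z"}]
print(apply_all(apply_all("abc", missed), rebase(pending, missed)))
```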

Step 12 — Export, import, and side pipelines

Export is async: a job reads the snapshot plus tail ops and renders the target format (PDF via headless render, DOCX via converter service). Import parses the uploaded file into an initial snapshot and records its provenance. Virus scan and content policy run on uploads before they are merged into a collaborative doc.

flowchart LR
  DOC[Canonical doc state] --> JOB[Export worker]
  JOB --> PDF[PDF]
  JOB --> DOCX[DOCX]
  UP[Upload] --> PARSE[Import parser] --> DOC
    

Step 13 — Scale: hot documents and sharding

Step 14 — Technical layer: APIs and wire formats

| Operation | HTTP / WS | Success | Notes |
| --- | --- | --- | --- |
| Get metadata | GET /v1/documents/{id} | 200 | Title, owners, mime, revision head |
| Get snapshot | GET /v1/documents/{id}/snapshot | 200 | Binary or JSON model at rev |
| List revisions | GET /v1/documents/{id}/revisions?from=100&to=150 | 200 | Paginated op log for catch-up |
| Submit ops (fallback) | POST /v1/documents/{id}/operations | 200 + new head rev | When WebSocket unavailable |
| Live channel | WSS /v1/documents/{id}/channel | Bi-directional | Ops, ACKs, presence events |
| Export | POST /v1/documents/{id}/exports | 202 + job id | Poll GET …/exports/{job} for URL |

WebSocket message (illustrative JSON):

{
  "type": "op_batch",
  "doc_id": "doc_7xk",
  "client_rev": 118,
  "ops": [
    {"op": "insert", "index": 42, "text": "Hello", "client_op_id": "c-op-9"}
  ]
}

Server reply:
{
  "type": "ack",
  "server_rev": 119,
  "transformed_ops": [ … ],
  "head_rev": 119
}
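
A gateway would typically validate an incoming batch before forwarding it. The sketch below checks only the fields shown in the illustrative messages above; the function names and the specific validation rules are assumptions, not a defined protocol.

```python
import json


def parse_op_batch(raw: str) -> dict:
    """Decode and sanity-check an op_batch message; raise ValueError if malformed."""
    msg = json.loads(raw)
    if not isinstance(msg, dict) or msg.get("type") != "op_batch":
        raise ValueError("expected a message of type 'op_batch'")
    for name in ("doc_id", "client_rev", "ops"):
        if name not in msg:
            raise ValueError(f"missing field: {name}")
    if not isinstance(msg["client_rev"], int) or msg["client_rev"] < 0:
        raise ValueError("client_rev must be a non-negative integer")
    if not isinstance(msg["ops"], list) or not msg["ops"]:
        raise ValueError("ops must be a non-empty list")
    return msg


def build_ack(server_rev: int, transformed_ops: list, head_rev: int) -> str:
    """Encode the ACK the server sends once the ops are durably logged."""
    return json.dumps({
        "type": "ack",
        "server_rev": server_rev,
        "transformed_ops": transformed_ops,
        "head_rev": head_rev,
    })


raw = json.dumps({
    "type": "op_batch",
    "doc_id": "doc_7xk",
    "client_rev": 118,
    "ops": [{"op": "insert", "index": 42, "text": "Hello", "client_op_id": "c-op-9"}],
})
batch = parse_op_batch(raw)
print(build_ack(119, batch["ops"], 119))
```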

Logical tables

documents(id, owner_id, title, head_rev, snapshot_ref, …)
revisions(doc_id, rev, op_json, user_id, ts)
sessions(doc_id, session_id, user_id, gateway_id, last_rev)
acl(doc_id, principal, role)
export_jobs(id, doc_id, format, status, output_url)
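
Two of these tables can be tried out with SQLite from the standard library. The column types and the primary keys are assumptions, since the logical schema above does not specify them; SQLite here is only a convenient stand-in for whatever store a real system uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revisions (
    doc_id  TEXT    NOT NULL,
    rev     INTEGER NOT NULL,
    op_json TEXT    NOT NULL,
    user_id TEXT    NOT NULL,
    ts      TEXT    NOT NULL,
    PRIMARY KEY (doc_id, rev)   -- at most one op per revision number per doc
);
CREATE TABLE acl (
    doc_id    TEXT NOT NULL,
    principal TEXT NOT NULL,
    role      TEXT NOT NULL,
    PRIMARY KEY (doc_id, principal)
);
""")


def append_revision(conn, doc_id, rev, op_json, user_id, ts):
    """Insert one revision; a duplicate (doc_id, rev) raises IntegrityError."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO revisions (doc_id, rev, op_json, user_id, ts) VALUES (?, ?, ?, ?, ?)",
            (doc_id, rev, op_json, user_id, ts),
        )


def revisions_between(conn, doc_id, from_rev, to_rev):
    """Ordered slice of the op log, as a catch-up request would read it."""
    cursor = conn.execute(
        "SELECT rev, op_json FROM revisions WHERE doc_id = ? AND rev BETWEEN ? AND ? ORDER BY rev",
        (doc_id, from_rev, to_rev),
    )
    return cursor.fetchall()
```

The composite primary key is what turns "one revision sequence per document" from a convention into a constraint: two writers cannot both claim revision 119 of the same document.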

Step 15 — Reliability, observability, and failure modes

Failure modes

Observability

Step 16 — Goals → knobs (quick reference)

| Goal | Knob |
| --- | --- |
| Edits feel live | WebSockets, regional shards, small ops, batching tuned |
| Never lose work | WAL before ACK, snapshots, client offline queue |
| Opens stay fast | Snapshots every N revs, CDN for static assets, parallel metadata + snapshot fetch |
| Safe sharing | Server-side ACL on every path; link scope; audit log |
| Survive viral doc | Viewer throttling, dedicated hot shard, op batching |

Step 17 — Close the loop (what to practice)

On a whiteboard: three loops, two users editing one sentence, label WAL vs snapshot vs ACL.

Out loud: five functional requirements and which NFR is hardest for collaboration vs export.

With the technical section: trace one insert op from WebSocket to revision 119 ACK and broadcast.

The one line to remember

Google Docs–class systems are an ordered operation log with a realtime fan-out layer on top. Collaboration needs one revision sequence per document; everything else (search, export, analytics) reads that log and never fights it.