How YouTube works at scale

If you only remember the red play button, you will design the wrong system. YouTube is not “Netflix with comments.” It is a platform that must ingest enormous files on unreliable networks, process them into dozens of renditions, serve bytes from edge caches worldwide, and personalize what billions of people see next—all at once.

We work through the design in order: requirements first, numbers second, architecture third, APIs last. By the end you should be able to explain what each layer does, why Vitess and Bigtable coexist, and how upload, watch, and homepage traffic differ on the wire.

What you should be able to do after reading:

Separate the three pipes—write, process, read—and assign the right database to each.
List functional and non-functional requirements with realistic latency and consistency targets.
Walk one upload and one watch, naming resumable upload, upload.complete, InnerTube player, and CDN segment fetches.
Describe two-stage recommendations (candidates → rank by watch time) and why clicks alone are a bad objective.
Read the technical section: Data API, InnerTube, googlevideo.com, Vitess schema, queue events.

Step 0 — How we will work through the problem

Ordered thinking beats memorizing a diagram. Use this sequence whenever you design a video platform at scale:

Clarify scope. Upload only? Watch page? Homepage recommendations? Live streaming is a different system—say if it is out of scope today.
Write requirements. Functional = what creators and viewers see. Non-functional = latency, availability, consistency, cost.
Do napkin math. Hours uploaded per minute, read:write ratio, segment QPS—so nobody assumes one MySQL instance is enough.
Draw three pipes before naming Kafka, Vitess, or CDN tiers.
Tell two stories—one upload, one watch—then cover failures (resume, cold CDN miss, viral comment spike).

flowchart LR
  subgraph write [Write path]
    U[Upload chunks] --> GCS[Object storage]
    GCS --> K[Event queue]
  end
  subgraph process [Processing path]
    K --> T[Transcode FFmpeg]
    T --> Segments[Many renditions]
  end
  subgraph read [Read path]
    Segments --> CDN[Media CDN]
    CDN --> P[Player ABR]
    Meta[(MySQL Vitess + Bigtable)] --> Page[Watch page APIs]
    Rec[2-stage ML] --> Home[Homepage]
  end

Step 1 — Functional requirements (what creators and viewers need)

Functional requirements are behaviors the product must ship. Missing one is a product bug, not a performance tweak.

Actor	Requirement	Why scale makes it hard
Creator	Upload multi-GB video; resume after network drop	Cannot stream whole file through one app server
Creator	Edit title, privacy, thumbnail; see processing status	State machine + many renditions behind one video id
Viewer	Play with adaptive quality (360p–4K/HDR)	Many itags, CDN paths, signed URLs that expire
Viewer	Search and browse billions of items	Inverted index separate from SQL metadata
Viewer	Personalized home feed and “up next”	Two-stage ML under millisecond budgets
Viewer	Engage: like, comment, subscribe, notifications	Viral spikes on comments—not steady CRUD
Platform	Monetization, ads, rights, geo policy	Policy checks on watch path fan-out
Ops	Live streaming (often phase two)	RTMP ingest, segment latency, chat—different pipe

Functional details easy to skip (but worth stating clearly)

“Upload complete” ≠ “every rendition ready.” The UI can show done while transcoding still runs; playback uses whichever itags exist.

Watch page ≠ one database query. Metadata, ads, comment count, related shelf, and player URLs come from parallel services.

Video id is the spine. Same id ties Vitess row, GCS objects, CDN segments, search doc, and recommendation features.

Step 2 — Non-functional requirements (how good “good” must be)

Category	Target (typical)	Design consequence	If we miss it
Availability — APIs	99.9%+ core metadata	Vitess failover, redundant edge	Site-wide errors on watch
Availability — bytes	CDN multi-tier + DNS steering	Edge → regional → origin	Buffering, playback failures
Latency — metadata	Low hundreds of ms for page APIs	Cache fragments, parallel RPC	Sluggish watch page shell
Latency — playback	Time-to-first-frame ~2s (hot content)	ABR starts low, edge hit	Viewers abandon
Throughput — upload	500+ hours uploaded/minute (order of magnitude)	Direct-to-GCS, async transcode	Creator backlog
Read:write — metadata	~100:1 serves vs writes	Sharding + caches, not one master	Replica lag, connection storms
CDN hit ratio	95%+ for popular content	DNS maps video id to warm edges	Origin meltdown
Consistency — money/metadata	Strong on video state, counts	Transactions on Vitess shard	Wrong privacy, broken ACL
Consistency — recommendations	Eventual feature freshness OK	Stream watch history to training	Slightly stale home feed
Durability	Exabyte-class; no silent loss	GCS + replicated DB	Permanent creator data loss
Cost	Egress dominates	CDN hit ratio, codec efficiency (AV1)	Unsustainable unit economics

Key idea: Video bytes favor bandwidth and cache hit rate; video metadata favors shardable SQL and fan-out control. Do not store every segment request in MySQL.

Step 3 — Napkin math (why the shape of the system is inevitable)

Public material cites ballpark figures (they drift year to year). Use them to justify architecture, not as financial statements.

2+ billion signed-in users per month.
500 hours of video uploaded every minute → transcode fleet, not synchronous upload handlers.
1 billion hours watched per day → CDN and recommendation load dominate.
100:1 metadata read:write → caches, Vitess, avoid single-writer myths.
Media CDN in 1,300+ cities, multi-tier cache, DNS-steered googlevideo.com.

Sanity check: one watch page with ~60 thumbnails means ~60 small object reads. Multiply that by the page views of a viral video and per-file filesystems or naive BLOB columns fall over; Bigtable-style wide rows exist for that pattern.

Honest ranges beat fake precision when you extrapolate to your own product.
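A minimal sketch of the arithmetic behind these figures. The upload and watch numbers come from the list above; the 2.5 Mbps average delivered bitrate is an assumption chosen only to show the order of magnitude, not a published figure.

```python
# Napkin math from the ballpark figures above.
UPLOAD_HOURS_PER_MIN = 500            # from the text
WATCH_HOURS_PER_DAY = 1_000_000_000   # from the text
ASSUMED_AVG_BITRATE_MBPS = 2.5        # assumption, for illustration only

# Hours of new video arriving per day.
upload_hours_per_day = UPLOAD_HOURS_PER_MIN * 60 * 24

# Hours watched per day divided by 24 gives average concurrent streams.
avg_concurrent_streams = WATCH_HOURS_PER_DAY / 24

# How many hours are watched for every hour uploaded.
watch_to_upload_ratio = WATCH_HOURS_PER_DAY / upload_hours_per_day

# Average egress in terabits per second under the assumed bitrate.
avg_egress_tbps = avg_concurrent_streams * ASSUMED_AVG_BITRATE_MBPS / 1_000_000

print(f"uploaded per day:       {upload_hours_per_day:,} hours")
print(f"avg concurrent streams: {avg_concurrent_streams:,.0f}")
print(f"watch:upload ratio:     {watch_to_upload_ratio:,.0f}:1")
print(f"avg egress (assumed):   {avg_egress_tbps:,.1f} Tbps")
```

The takeaway is the ratio, not the exact values: watched hours outnumber uploaded hours by roughly three orders of magnitude, which is why the read path and the CDN dominate the design.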

Step 4 — The three pipes (draw this first)

Start by drawing three parallel pipes. Merging them into one “YouTube service” is how designs end up streaming gigabytes through Python metadata servers.

Write path — resumable upload → object storage → event.
Process path — transcode, index, thumbnail generation (async, CPU-heavy).
Read path — metadata APIs, CDN segments, recommendations.

Step 5 — Walk one upload and one watch

Upload story

Creator starts resumable session → POST …/upload/youtube/v3/videos?uploadType=resumable.
Chunks PUT to session URL (or signed GCS URLs internally) → 308 resume on failure.
Stub row in Vitess: upload_status=uploading.
GCS finalize → upload.complete on queue → Borg/FFmpeg workers.
UI shows “complete”; background sets processing → ready as renditions land.

Watch story

Viewer opens /watch?v=VIDEO_ID → SPA calls POST /youtubei/v1/player.
Response carries streamingData.adaptiveFormats[] with signed googlevideo.com URLs.
Player fetches init + media segments (DASH/HLS); ABR switches itag on bandwidth.
Parallel: /next for recommendations; Vitess/Bigtable for title, channel, comments stub.

Replay with failures: expired signature (refresh player), cold video (origin fill), upload resume after disconnect.

Step 6 — Bend MySQL with Vitess instead of abandoning it

Many scaling stories follow one arc: “We outgrew MySQL, so we moved to NoSQL.” YouTube’s documented story is different. They started with a single MySQL instance. Growth caused:

Replication lag under heavy writes (async replication, single-threaded apply on replicas).
Connection exhaustion (too many app connections overwhelming MySQL).
Tables too large to scale vertically forever.

So YouTube built Vitess, a clustering layer for MySQL, rather than throwing away relational semantics for everything. Vitess has been described as a core part of YouTube’s database infrastructure since 2011, growing to tens of thousands of MySQL nodes rather than a few big servers.

In plain terms: Vitess sits between your app and many MySQL shards. Your app still speaks SQL; Vitess figures out which shard owns the row, pools connections, and blocks runaway queries.

Step 6 (continued) — How Vitess works

VTGate (query router)

Routes queries to the correct shard automatically (you shard by user id, video id, etc.).
Connection pooling so thousands of app processes do not each open a raw MySQL connection storm.
Query safety: can kill dangerous or long-running queries before they take down a shard.
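The routing idea can be sketched as follows, under loud assumptions: SHA-256 stands in for the hashing function (it is not the function Vitess uses), the keyspace is split into four equal ranges, and the shard names are labels in a hex-range style. The point is only that a sharding key maps deterministically to one shard.

```python
import hashlib
from bisect import bisect_right

# Upper bounds of the first three ranges in a 64-bit keyspace (assumed layout).
SHARD_UPPER_BOUNDS = [0x40 << 56, 0x80 << 56, 0xC0 << 56]
SHARD_NAMES = ["-40", "40-80", "80-c0", "c0-"]


def keyspace_id(sharding_key: str) -> int:
    """Map a sharding key (video id, user id) to a 64-bit integer.

    SHA-256 is a stand-in chosen for this sketch only.
    """
    digest = hashlib.sha256(sharding_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for_keyspace_id(ksid: int) -> str:
    """Find the range that contains ksid; lower bounds are inclusive."""
    return SHARD_NAMES[bisect_right(SHARD_UPPER_BOUNDS, ksid)]


def route(sharding_key: str) -> str:
    """Return the shard that owns rows with this sharding key."""
    return shard_for_keyspace_id(keyspace_id(sharding_key))


print(route("dQw4w9WgXcQ"))
```

Because the key-to-range mapping is separate from the range-to-shard table, splitting a hot shard means changing the table, not rehashing every key.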

VTTablet (per-shard agent)

Manages individual MySQL instances on that shard.
Row-level caching (e.g. via Memcached) with invalidation driven by the MySQL replication stream.
Automated failover and backups with less manual DBA toil.

Sharding and resharding

Split or merge shards with minimal downtime—critical when one shard (a celebrity channel, a viral topic) becomes hot. Reported operational wins from this architecture include better cache locality, less disk thrashing, improved hardware efficiency, and replica lag driven toward zero compared with one giant MySQL.

Database evolution (three stages you can draw)

Single MySQL + read replicas — worked until write load and lag broke the model.
Vertical split — different databases for different table families (e.g. users vs. video metadata). Bought time, did not remove the ceiling.
Horizontal sharding + Vitess — many MySQL instances; Vitess handles routing, pooling, safety, resharding, failover.

Why sharding won:

Cache locality: each shard has its own working set—less fighting over one buffer pool.
I/O isolation: hot user tables do not starve unrelated video metadata queries on the same disk.
Blast radius: one bad shard does not take down the entire fleet.
Horizontal growth: add shards instead of buying impossible single-box machines.

Step 7 — Four request paths (different traffic, different bottleneck)

YouTube’s traffic is not one API. Name the question each path answers before you name the database.

Path A — Watch page (metadata + shell)

Flow: Client → load balancer → backend services → data layer (MySQL+Vitess, Bigtable, caches) → HTML/JSON for the page.

What breaks: Fan-out RPC latency, not CPU on a single Python process. One page touches many services (video metadata, channel, comment count, related videos stub, ads slot, etc.).

Stack (as described in Google-facing material):

Python for much business logic.
C++ / Java for performance-critical paths (processing, low-latency serving).
Go for newer infrastructure such as Vitess itself.

How they scaled it: pre-generate cached HTML fragments, cache Python objects (not only raw DB rows), push hot computed data into process memory. Counterintuitive lesson: adding web servers helped because Python spent much time waiting on RPCs, not burning CPU.
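A small sketch of why fan-out latency dominates, assuming four hypothetical downstream services with made-up latencies. asyncio.sleep stands in for a network RPC; nothing here reflects real service names or timings.

```python
import asyncio
import time
from typing import Dict, Tuple

# Simulated downstream latencies in seconds (assumed values).
DOWNSTREAM_LATENCY_S = {
    "video_metadata": 0.03,
    "channel": 0.02,
    "comment_count": 0.05,
    "related_stub": 0.04,
}


async def call_service(name: str, latency_s: float) -> Tuple[str, str]:
    """Stand-in for one backend RPC: wait, then return a payload."""
    await asyncio.sleep(latency_s)
    return name, f"{name}-payload"


async def build_watch_page() -> Dict[str, str]:
    """Issue every downstream call at once and merge the results."""
    calls = [call_service(n, s) for n, s in DOWNSTREAM_LATENCY_S.items()]
    return dict(await asyncio.gather(*calls))


start = time.perf_counter()
page = asyncio.run(build_watch_page())
elapsed = time.perf_counter() - start

# Elapsed time tracks the slowest call (about 0.05 s), not the 0.14 s sum.
print(sorted(page), f"{elapsed:.2f}s")
```

The process spends nearly all of that time waiting, which matches the lesson above: more web servers help because each one is mostly idle on RPCs.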

Path B — Video serving (CDN + DNS)

Video bytes are served through Google Media CDN (same family of infrastructure available to cloud customers today). Protocols commonly listed: DASH / HLS adaptive streaming, QUIC / HTTP/3, TLS 1.3, BBR congestion control. The player switches renditions based on bandwidth and buffer health.
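Rendition switching can be sketched as a small decision function. This is an illustrative heuristic, not the player's actual algorithm: the ladder, the 0.8 safety factor, and the 5-second low-buffer threshold are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rendition:
    label: str
    bitrate_bps: int


# Hypothetical ladder; bitrates are assumed for illustration.
LADDER = [
    Rendition("360p", 500_000),
    Rendition("720p", 1_500_000),
    Rendition("1080p", 2_500_000),
    Rendition("2160p", 12_000_000),
]


def pick_rendition(
    throughput_bps: float,
    buffer_s: float,
    ladder: List[Rendition] = LADDER,
    safety: float = 0.8,
    low_buffer_s: float = 5.0,
) -> Rendition:
    """Pick the highest rendition that fits the measured throughput."""
    ordered = sorted(ladder, key=lambda r: r.bitrate_bps)
    # With a nearly empty buffer, drop to the lowest rung to avoid a stall.
    if buffer_s < low_buffer_s:
        return ordered[0]
    # Leave headroom so a small throughput dip does not cause rebuffering.
    budget = throughput_bps * safety
    affordable = [r for r in ordered if r.bitrate_bps <= budget]
    return affordable[-1] if affordable else ordered[0]


print(pick_rendition(4_000_000, buffer_s=20).label)
```

Starting on a low rung and climbing as the buffer fills is what keeps time-to-first-frame short on a cold start.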

Easy to overlook: DNS-based routing. A measurement study of YouTube’s delivery network reported:

A relatively flat video ID space.
Multiple DNS namespaces reflecting a multi-tier logical cache hierarchy.
Video IDs map to logical servers, then to physical cache locations via DNS.

That means YouTube can add capacity or rebalance load by updating DNS mappings—without redeploying application code. Serving path: edge cache → on miss, regional tier → on miss, origin. Most requests never touch origin.
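The edge, regional, origin lookup order can be sketched as a toy read-through cache. The class and its names are hypothetical, plain dicts stand in for cache nodes, and eviction is left out on purpose.

```python
from typing import Callable, Dict, Tuple


class TieredCache:
    """Toy read-through cache: edge, then regional, then origin."""

    def __init__(self, origin_fetch: Callable[[str], bytes]) -> None:
        self.edge: Dict[str, bytes] = {}
        self.regional: Dict[str, bytes] = {}
        self.origin_fetch = origin_fetch
        self.origin_reads = 0  # how often a request reached origin

    def get(self, key: str) -> Tuple[bytes, str]:
        """Return the bytes and the name of the tier that served them."""
        if key in self.edge:
            return self.edge[key], "edge"
        if key in self.regional:
            data = self.regional[key]
            self.edge[key] = data  # fill the edge on the way back
            return data, "regional"
        data = self.origin_fetch(key)
        self.origin_reads += 1
        self.regional[key] = data  # fill both tiers on an origin read
        self.edge[key] = data
        return data, "origin"


cache = TieredCache(lambda key: f"bytes-of-{key}".encode())
print(cache.get("seg-1")[1], cache.get("seg-1")[1])
```

Only the first request for a segment pays the origin cost; every later request in that region is served from a cache tier.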

Path C — Thumbnails (the small-file problem)

A watch page can show on the order of ~60 thumbnails. That is a huge number of requests for tiny objects. Early filesystem approaches suffered inode cache thrashing, directory limits (e.g. ext3-era limits), and brutal warmup times—the “billions of tiny files” nightmare.

Post-acquisition architecture notes describe Bigtable being used to store thumbnails and replicate them across data centers in a wide-column, key-value pattern. Bigtable “clumps” many small objects together into larger files, so you avoid per-file filesystem pain and distributed multi-level caching across sites works. Bigtable-family stores also show up for video metadata at scale, user activity logs, and time-series style data.

Path D — Databases and operational metadata

Structured metadata (titles, channels, subscriptions) in MySQL via Vitess; append-only and wide patterns in Bigtable; high-write social signals often on Cassandra-class stores. Boundaries shift over years—what matters is matching the store to the access pattern.

Step 8 — Upload, transcode, and serve (write + process pipes)

Why chunked, resumable uploads exist

Multi-gigabyte files fail on a single POST over mobile networks. Resumable uploads retry one chunk, not the entire file. Chunks are often 256 KB multiples on the public Data API; internal ingest may use 5–50 MB blocks to GCS.

Reserve video_id; Vitess row upload_status=uploading.
Upload bytes to GCS (not through the metadata app tier).
upload.complete event → transcode workers.
Creator sees “done”; renditions catch up asynchronously.

Transcoding

FFmpeg pipelines produce many itags (144p–4K, H.264/VP9/AV1). Hundreds of hours uploaded per minute implies large batch fleets on Borg-class schedulers.

What	Where	Why
Raw + segments	GCS / Colossus-class	Sequential bytes, CDN origin
Titles, channels, ACL	MySQL + Vitess	Transactions, relational queries
Watch history, logs	Bigtable	Append-heavy, wide rows
Search	Inverted index (Elasticsearch-class)	Full-text over billions of docs

Step 9 — Recommendations: two-stage neural ranking

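Covington et al., Deep Neural Networks for YouTube Recommendations, describes the industrial pattern: candidate generation narrows the corpus to hundreds of videos (the paper frames the corpus as millions; the catalog today is in the billions), then ranking orders those candidates by expected watch time rather than clicks alone.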

Stage 1: embeddings from watch/search history → fast retrieval of hundreds of ids.
Stage 2: score watch time, freshness, context → final homepage / up-next slate.
Freshness features surface new uploads without long watch history.

flowchart LR
  U[User context embeddings] --> C[Candidate generation billions to hundreds]
  C --> R[Ranking by expected watch time]
  R --> F[Final slate on homepage / up next]
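The two stages can be sketched in a few lines. This is a toy under stated assumptions: brute-force dot products stand in for approximate nearest-neighbor retrieval, a lookup table stands in for the neural ranker, and all vectors and watch times are made up.

```python
from typing import Callable, Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(
    user_vec: Sequence[float],
    video_vecs: Dict[str, Sequence[float]],
    k: int,
) -> List[str]:
    """Stage 1: keep the k videos whose embeddings best match the user."""
    by_similarity = sorted(
        video_vecs, key=lambda vid: dot(user_vec, video_vecs[vid]), reverse=True
    )
    return by_similarity[:k]


def rank(candidates: List[str], predict_watch_time: Callable[[str], float]) -> List[str]:
    """Stage 2: order the survivors by predicted watch time."""
    return sorted(candidates, key=predict_watch_time, reverse=True)


# Toy data (assumed): 2-d embeddings and predicted watch seconds.
VIDEO_VECS = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.5, 0.5]}
WATCH_TIME_S = {"a": 30.0, "b": 340.0, "c": 600.0, "d": 120.0}

candidates = generate_candidates([1.0, 0.0], VIDEO_VECS, k=3)
slate = rank(candidates, WATCH_TIME_S.__getitem__)
print(candidates, slate)
```

Video "a" is the closest embedding match but ends up last on the slate because its predicted watch time is short, which is the gap between a similarity or click proxy and the watch-time objective.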

Step 10 — Borg, data centers, and ops at fleet scale

YouTube runs on Google’s Borg cluster manager—latency-sensitive APIs and batch transcode share fleets with Search and Gmail. Kubernetes is the open-source lesson from Borg; inside Google, Borg is the scheduler.

Video files: bandwidth matters more than a few milliseconds of latency to any one region.
Thumbnails: latency-sensitive; replicated via Bigtable; pick nearby replica.
~5–6 large data centers plus CDN colocation for peering and hardware—historical ballpark.

Step 11 — Technical layer: APIs, payloads, and wires

Requirements say what; this section shows how it ships. Three client-facing layers cooperate: Data API (creators), InnerTube (watch/browse), signed CDN URLs (bytes). Internal traffic is mostly gRPC between Python/C++/Java services and Vitess/Bigtable.

Layer	Typical host	Auth	What it does
Data API (v3)	`www.googleapis.com/youtube/v3` `www.googleapis.com/upload/youtube/v3`	OAuth 2.0 (`Authorization: Bearer`)	Upload, update metadata, list channels, comments (creator tools & partners)
InnerTube	`youtubei.googleapis.com/youtubei/v1`	Session cookie + `X-Goog-Api-Key` / visitor id	Watch page player payload, homepage browse, search, “up next”
Media CDN	`*.googlevideo.com` (DNS-steered)	Short-lived signed query params (`expire`, `sig`, `sparams`)	DASH/HLS segment delivery, range GETs, ABR switches
Object storage	GCS (`storage.googleapis.com`) — internal upload path	Signed PUT URL from upload service	Raw upload blobs; origin for cold CDN misses

Upload — resumable session (Data API, documented)

The public resumable upload protocol is the best-documented version of what internal pipelines do: metadata first, bytes second, completion event third. Chunk size on the wire is often 256 KB multiples (API requirement for chunked mode); edge ingest may use larger 5–50 MB blocks before landing in GCS.

Step	HTTP	Success	Notes
1 — start session	`POST /upload/youtube/v3/videos?uploadType=resumable&part=snippet,status,contentDetails`	`200` + `Location:` session URI	Body = JSON video resource; headers `X-Upload-Content-Length`, `X-Upload-Content-Type: video/*`
2 — upload bytes	`PUT {Location}`	`201` + `videos` resource	Full file or chunked `Content-Range: bytes start-end/total`
3 — probe / resume	`PUT {Location}` with `Content-Range: bytes */total`	`308 Resume Incomplete` + `Range:` header	Resume from byte after last confirmed `Range` upper bound

Step 1 — initiate session (metadata only in body):

POST /upload/youtube/v3/videos?uploadType=resumable&part=snippet,status,contentDetails HTTP/1.1
Host: www.googleapis.com
Authorization: Bearer ya29…
Content-Type: application/json; charset=UTF-8
X-Upload-Content-Length: 2147483648
X-Upload-Content-Type: video/mp4

{
  "snippet": {
    "title": "Architecture walkthrough",
    "description": "Chunked upload demo",
    "tags": ["system-design"],
    "categoryId": "28"
  },
  "status": {
    "privacyStatus": "unlisted",
    "selfDeclaredMadeForKids": false
  }
}

→ 200 OK
Location: https://www.googleapis.com/upload/youtube/v3/videos?uploadType=resumable&upload_id=…

Step 2 — first chunk (256 KB-aligned when using chunked mode):

PUT https://www.googleapis.com/upload/youtube/v3/videos?uploadType=resumable&upload_id=… HTTP/1.1
Authorization: Bearer ya29…
Content-Length: 262144
Content-Type: video/mp4
Content-Range: bytes 0-262143/2147483648

<262144 bytes of MP4>

→ 308 Resume Incomplete
Range: bytes=0-262143
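The resume step can be sketched as a pure function that turns the Range header of a 308 response into the next Content-Range. The function names and the 2 MB default chunk size are assumptions; only the header shapes and the 256 KB alignment come from the table and examples above.

```python
import re
from typing import Optional, Tuple

CHUNK_ALIGN = 256 * 1024  # chunked mode uses multiples of 256 KB


def next_offset(range_header: Optional[str]) -> int:
    """Return the first byte the server has not confirmed yet."""
    if not range_header:
        return 0  # no Range header: nothing stored, start from byte 0
    match = re.fullmatch(r"bytes=0-(\d+)", range_header.strip())
    if match is None:
        raise ValueError(f"unexpected Range header: {range_header!r}")
    return int(match.group(1)) + 1


def next_chunk(
    range_header: Optional[str],
    total_size: int,
    chunk_size: int = 8 * CHUNK_ALIGN,
) -> Optional[Tuple[int, int, str]]:
    """Build (start, end, Content-Range) for the next PUT, or None if done."""
    if chunk_size % CHUNK_ALIGN != 0:
        raise ValueError("chunk_size must be a multiple of 256 KB")
    start = next_offset(range_header)
    if start >= total_size:
        return None  # every byte is already confirmed
    # Only the final chunk may be shorter than chunk_size.
    end = min(start + chunk_size, total_size) - 1
    return start, end, f"bytes {start}-{end}/{total_size}"


print(next_chunk("bytes=0-262143", 2147483648, chunk_size=CHUNK_ALIGN))
```

After a dropped connection the client probes with `Content-Range: bytes */total`, feeds the returned Range header into next_chunk, and continues from there instead of re-sending the file.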

Internal parallel path (studio-scale): an upload gateway often returns { videoId, uploadSessionId, gcsSignedPutUrls[] } so the browser uploads directly to object storage (PUT https://storage.googleapis.com/bucket/raw/{videoId}/part-0007) without pinning terabytes through Python app servers. On final part, storage emits upload.complete to a queue; Vitess row moves upload_status: uploading → processing.

Watch page — InnerTube `player` (what clients actually call)

Opening /watch?v=VIDEO_ID does not deliver all metadata in one server-rendered HTML document. The SPA calls InnerTube with a rich context object (client name, version, hl, gl) and receives streaming URLs, captions, and playability in one JSON blob.

Operation	HTTP	Success	Returns
Player payload	`POST /youtubei/v1/player`	`200` JSON	`streamingData` (formats, `adaptiveFormats`), `playabilityStatus`, `videoDetails`
Related / up next	`POST /youtubei/v1/next`	`200`	Shelf of recommended videos with impression tokens
Home feed browse	`POST /youtubei/v1/browse` (`browseId=FEwhat_to_watch`)	`200`	Personalized shelves (fed by two-stage ranker)
Search	`POST /youtubei/v1/search`	`200`	Results + continuation tokens

POST /youtubei/v1/player HTTP/1.1
Host: youtubei.googleapis.com
Content-Type: application/json
X-Goog-Api-Key: …

{
  "videoId": "dQw4w9WgXcQ",
  "context": {
    "client": { "clientName": "WEB", "clientVersion": "2.20250101.00.00" }
  },
  "playbackContext": {
    "contentPlaybackContext": { "signatureTimestamp": 20321 }
  }
}

→ 200 OK (excerpt)
{
  "playabilityStatus": { "status": "OK" },
  "videoDetails": { "videoId": "…", "title": "…", "lengthSeconds": "212" },
  "streamingData": {
    "expiresInSeconds": "21540",
    "formats": [ … progressive MP4 … ],
    "adaptiveFormats": [
      { "itag": 248, "mimeType": "video/webm; codecs=\"vp9\"",
        "bitrate": 2500000, "width": 1920, "height": 1080,
        "url": "https://rr3---sn-…googlevideo.com/videoplayback?expire=…&sig=…" }
    ]
  }
}

Fan-out behind this one POST: separate backend services load channel ACL, age restrictions, view count (often cached), comment thread stub, and ad decisioning. Python orchestrators aggregate via RPC; latency is dominated by parallel downstream calls, not JSON parsing.

Metadata CRUD — Data API `videos`

Operation	HTTP	Success	Common errors
Get video	`GET /youtube/v3/videos?part=snippet,contentDetails,statistics&id={id}`	`200`	`200` with empty `items` if the video is missing or private to the caller
Update metadata	`PUT /youtube/v3/videos?part=snippet,status`	`200`	`403` quota / ACL
List channel uploads	`GET /youtube/v3/search?part=snippet&channelId={id}&order=date`	`200` + `nextPageToken`	`403` `quotaExceeded`
Insert comment	`POST /youtube/v3/commentThreads?part=snippet`	`200`	`403` comments disabled

Playback — CDN segment delivery (not JSON)

URLs from streamingData.adaptiveFormats[].url hit Media CDN edge nodes. Adaptive players (DASH or HLS) fetch an init segment + media segments; the player switches itag when buffer or bandwidth changes.

Request	Typical method	Response	Purpose
Init segment	`GET …/init.mp4?…`	`200` small fMP4 header	Codec config (SPS/PPS, etc.)
Media segment	`GET …/seg-42.m4s?…` or byte range	`200` / `206 Partial Content`	2–10 s of encoded video per segment
DNS steering	resolve `rr{N}---sn-….googlevideo.com`	A/AAAA to edge POP	Video id → logical cache tier → physical POP

# Example segment fetch (signed URL truncated)
GET /videoplayback?expire=1735689600&ei=…&ip=…&id=o-ABC123&itag=248&source=youtube&requiressl=yes&mh=…&mm=31&mn=sn-…&ms=au&mv=m&signature=…&lsparams=…&lsig=… HTTP/1.1
Host: rr3---sn-abcd7.googlevideo.com
Range: bytes=0-1048575

→ 206 Partial Content
Content-Type: video/webm
Content-Range: bytes 0-1048575/52428800

Protocols on the wire: DASH (.mpd manifest) or HLS (.m3u8) over HTTPS; QUIC/HTTP/3 and BBR on supported clients. Signed URLs expire, so the player re-requests the /player payload when its URLs approach expiresInSeconds.
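A sketch of the refresh decision, assuming the `expire` query parameter is a Unix timestamp in seconds (the example value above looks like one, but that reading is an assumption). The function names and the 60-second margin are hypothetical.

```python
import time
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def seconds_until_expiry(signed_url: str, now: Optional[float] = None) -> float:
    """Read `expire` from the query string and compare it with the clock."""
    query = parse_qs(urlsplit(signed_url).query)
    expire = int(query["expire"][0])  # assumed: Unix seconds
    current = time.time() if now is None else now
    return expire - current


def needs_refresh(
    signed_url: str, margin_s: float = 60.0, now: Optional[float] = None
) -> bool:
    """True when the URL expires within margin_s, or already has."""
    return seconds_until_expiry(signed_url, now) <= margin_s


url = "https://rr3---sn-abcd7.googlevideo.com/videoplayback?expire=1735689600&itag=248"
print(needs_refresh(url, now=1735689000))
```

Refreshing a little before expiry avoids a 403 in the middle of playback; the margin trades extra /player calls against that risk.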

Recommendations — serving path (conceptual internal contract)

Training lives offline (Covington et al.); serving is a low-latency inference stack. Homepage and watch-page shelves call a ranker service with precomputed user/video embeddings from a feature store (Bigtable / Redis-class).

POST /rec/v1/rank  (internal gRPC/HTTP — illustrative)
{
  "userId": "UC…",
  "surface": "HOME_FEED",
  "candidateVideoIds": ["vid1", "vid2", …],   // stage-1: hundreds
  "context": { "locale": "en-US", "device": "mobile", "hourLocal": 21 }
}

→ {
  "ranked": [
    { "videoId": "vid9", "score": 0.82, "expectedWatchTimeSec": 340 },
    …
  ],
  "impressionToken": "opaque-for-logging"
}

Stage 1 (candidate generation) often runs as a separate call or ANN index lookup (POST /rec/v1/candidates) that returns hundreds of ids in <10 ms; stage 2 (ranking) scores watch-time objective on GPU/TPU pools. Logs feed continuous training pipelines.

Vitess routing and core relational schema

VTGate parses SQL and routes by shard vindex. Common shard keys: video_id for video metadata rows, user_id for channels/subscriptions. Cross-shard queries (admin reports) are expensive—product APIs are designed to hit one shard per request.

-- videos (shard key: video_id)
CREATE TABLE videos (
  video_id        CHAR(11) PRIMARY KEY,
  channel_id      CHAR(24) NOT NULL,
  title           VARCHAR(100) NOT NULL,
  description     TEXT,
  upload_status   ENUM('uploading','processing','ready','failed') NOT NULL,
  privacy         ENUM('public','unlisted','private') NOT NULL,
  duration_sec    INT UNSIGNED,
  created_at      TIMESTAMP NOT NULL,
  published_at    TIMESTAMP NULL,
  KEY idx_channel_created (channel_id, created_at)
);

-- renditions produced by transcode workers (many rows per video)
CREATE TABLE video_renditions (
  video_id     CHAR(11) NOT NULL,
  itag         SMALLINT NOT NULL,
  codec        VARCHAR(16) NOT NULL,
  width        SMALLINT,
  height       SMALLINT,
  bitrate_bps  INT UNSIGNED,
  gcs_path     VARCHAR(512) NOT NULL,
  PRIMARY KEY (video_id, itag)
);

State machine (upload_status):

stateDiagram-v2
  [*] --> uploading: resumable POST + PUT chunks
  uploading --> processing: upload.complete event
  processing --> ready: renditions in CDN
  processing --> failed: transcode error
  ready --> [*]
  failed --> [*]

uploading — chunks accepted; stub row in Vitess; GCS incomplete OK.
processing — raw object closed; transcode queue consumed; not yet playable on all itags.
ready — minimum rendition set in CDN; player returns playabilityStatus.status = OK.
failed — poison transcode; dead-letter queue; creator sees retry UI.
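The state machine above can be enforced with a small transition table. This is a sketch of the idea, not the production check; the function name is hypothetical and the states are the ones in the upload_status enum.

```python
from typing import Dict, Set

# Allowed transitions, mirroring the diagram above.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "uploading": {"processing"},
    "processing": {"ready", "failed"},
    "ready": set(),    # terminal
    "failed": set(),   # terminal
}


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the move is not in the diagram."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown status: {current!r}")
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new


status = transition("uploading", "processing")
status = transition(status, "ready")
print(status)
```

Rejecting illegal moves in one place keeps a late or duplicated event from flipping a ready video back to processing.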

Async pipeline — queue message shape

After GCS finalize, a compact event triggers Borg-scheduled FFmpeg workers:

{
  "eventType": "upload.complete",
  "videoId": "dQw4w9WgXcQ",
  "rawObjectUri": "gs://yt-uploads/raw/dQw4w9WgXcQ/source.mp4",
  "bytes": 2147483648,
  "checksumSha256": "a1b2…",
  "uploadedAt": "2026-05-19T12:04:11Z"
}
→ transcode worker claims job → writes renditions → updates video_renditions
→ sets upload_status='ready' → warms CDN popular itags
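A sketch of a consumer for this event, assuming the queue may deliver a message more than once (a common property of queues, not something the source states). The class name and the in-memory seen set are hypothetical; a real system would persist that state.

```python
import json
from typing import List, Set, Tuple

REQUIRED_FIELDS = (
    "eventType", "videoId", "rawObjectUri", "bytes", "checksumSha256", "uploadedAt",
)


class TranscodeDispatcher:
    """Turns upload.complete events into transcode jobs, once per video."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.jobs: List[Tuple[str, str]] = []  # (videoId, rawObjectUri)

    def handle(self, raw_message: str) -> bool:
        """Return True if a job was enqueued, False if the message was skipped."""
        event = json.loads(raw_message)
        if event.get("eventType") != "upload.complete":
            return False  # not ours; leave it for another consumer
        missing = [f for f in REQUIRED_FIELDS if f not in event]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        if event["videoId"] in self.seen:
            return False  # duplicate delivery: do not transcode twice
        self.seen.add(event["videoId"])
        self.jobs.append((event["videoId"], event["rawObjectUri"]))
        return True
```

Deduplicating on videoId makes redelivery harmless, which is what lets the queue retry freely after a worker crash.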

Thumbnails — Bigtable access pattern

~60 thumbnails per watch page ⇒ ~60 point reads. Row key design (conceptual): thumb#{video_id}#{variant} → column image/jpeg bytes. VTGate is bypassed; a thumbnail service batches Bigtable reads and sets Cache-Control on HTTP responses at the edge.
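The row-key pattern and the batching step can be sketched as below. The key format comes from the text; the variant name, the batch size of 20, and both function names are assumptions, and the actual Bigtable read is left out.

```python
from typing import List, Sequence


def thumb_row_key(video_id: str, variant: str) -> str:
    """Build the conceptual row key thumb#{video_id}#{variant}."""
    return f"thumb#{video_id}#{variant}"


def batched_row_keys(
    video_ids: Sequence[str], variant: str, batch_size: int = 20
) -> List[List[str]]:
    """Group row keys so ~60 point reads become a few batched reads."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    keys = [thumb_row_key(video_id, variant) for video_id in video_ids]
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]


batches = batched_row_keys([f"vid{i}" for i in range(60)], "default")
print(len(batches), batches[0][0])
```

Sixty thumbnails turn into three batched reads here, which is the shape of the saving even though the real batch size would be tuned.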

Rate limits, quotas, and errors you will see

Data API: daily quota units per project; 403 quotaExceeded; use exponential backoff on 500/503.
Upload: max file 256 GB (documented Data API limit); 308 + Range for resume; session URI expires → restart from step 1.
CDN: 403 on expired signature → refresh /player; a cache miss on a cold, rarely watched video is not an error: the edge pulls from origin, fills its cache, and serves a slower first byte.
InnerTube: playabilityStatus.status = LOGIN_REQUIRED / UNPLAYABLE for region or policy blocks—not transport errors.
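Exponential backoff on 500/503 can be sketched as below. The function name, the attempt limit, and the one-second base delay are assumptions; `send` is any callable returning a (status, body) pair, so no real HTTP client is implied.

```python
import random
import time
from typing import Callable, Tuple

RETRYABLE_STATUSES = {500, 503}


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> Tuple[int, str]:
    """Retry on 500/503 with delays of 1 s, 2 s, 4 s, ... plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body  # success, or an error that retrying won't fix
        if attempt < max_attempts - 1:
            # Jitter spreads out clients that failed at the same moment.
            sleep(base_delay_s * (2 ** attempt) + jitter())
    return status, body  # still failing after the last attempt
```

A 403 quotaExceeded is returned immediately: retrying it only burns more quota, so only transient server errors go through the backoff loop.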

flowchart TD
  C[Client] --> DNS[DNS maps video id to cache tier]
  DNS --> E[Edge cache]
  E -->|miss| R[Regional cache]
  R -->|miss| O[Origin / GCS segments]
  C --> API[Watch page APIs]
  API --> V[(Vitess MySQL)]
  API --> B[(Bigtable metadata / thumbs)]

Step 12 — Reliability, security, and observability

Reliability

Resumable uploads — 308 + Range so mobile networks do not force a full re-upload.
Vitess resharding — hot channels move without taking down the fleet.
CDN DNS shifts — rebalance load without redeploying player code.
Idempotent APIs — duplicate clientRequestId on creator tools must not create two videos.

Security and policy

OAuth2 on Data API; session + API key on InnerTube; short-lived signed CDN URLs.
Quota units per project; rate limits protect shared control planes.
Geo and age policy surfaced as playabilityStatus, not mysterious 500 errors.

Observability

Trace one watch: player → segment GETs → log itag, cache tier, TTFB.
Metrics: upload queue depth, transcode lag, CDN hit ratio, recommendation serve latency.
upload.complete and watch-event logs feed warehouse training and abuse detection.

Step 13 — Debug playback latency (hands-on)

Measure where time goes on a segment fetch—the same breakdown applies to any CDN-backed media API:

curl -w "@-" -o /dev/null -s "https://example.com/segment" <<'EOF'
    time_namelookup:  %{time_namelookup}s (DNS)
       time_connect:  %{time_connect}s (TCP)
    time_appconnect:  %{time_appconnect}s (TLS)
 time_starttransfer:  %{time_starttransfer}s (TTFB)
         time_total:  %{time_total}s
EOF

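Swap in a real segment URL and compare regions. The curl timers are cumulative from the start of the request, so subtract neighbors to isolate each phase: time_connect minus time_namelookup is the TCP handshake, time_appconnect minus time_connect is TLS, and time_starttransfer minus time_appconnect is the wait for the first byte. Is DNS slow? TLS? First byte? That mirrors how you reason about CDN vs. origin issues.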
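Because curl reports each timer as seconds elapsed since the request began, a short helper can turn the five values into per-phase durations. The function name and the sample numbers are made up; the timer names are the curl write-out variables used above, and the sketch assumes an HTTPS request.

```python
from typing import Dict


def phase_breakdown(timings: Dict[str, float]) -> Dict[str, float]:
    """Convert cumulative curl timers into per-phase durations in seconds."""
    dns = timings["time_namelookup"]
    tcp = timings["time_connect"] - timings["time_namelookup"]
    tls = timings["time_appconnect"] - timings["time_connect"]
    wait = timings["time_starttransfer"] - timings["time_appconnect"]
    download = timings["time_total"] - timings["time_starttransfer"]
    phases = {"dns": dns, "tcp": tcp, "tls": tls,
              "server_wait": wait, "download": download}
    return {name: round(value, 3) for name, value in phases.items()}


# Hypothetical measurements, in seconds.
sample = {
    "time_namelookup": 0.020,
    "time_connect": 0.045,
    "time_appconnect": 0.110,
    "time_starttransfer": 0.180,
    "time_total": 0.420,
}
print(phase_breakdown(sample))
```

A large server_wait with small dns, tcp, and tls points at a cache miss or a slow origin; a large download phase points at throughput instead.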

Step 14 — Goals → knobs (quick reference)

Goal	Knob
Playback starts quickly	CDN edge caching, popular content pre-warmed, ABR starting at low bitrate
Upload survives bad networks	Chunked resumable uploads, direct-to-object-storage, parallel chunks
Metadata stays consistent	MySQL transactions on Vitess shards, clear video state machine (uploading → processing → ready)
Homepage feels personal	Two-stage ML, watch-time objective, freshness features, online feature pipelines
Ops at huge scale	Vitess resharding, DNS traffic shifts, Borg scheduling, automated failover

Step 15 — Close the loop (what to practice)

On a whiteboard: three pipes, one upload story, one watch story, label Vitess vs GCS vs CDN on each step.

Out loud: five functional requirements and which non-functional target applies to bytes vs metadata.

With the technical section: trace POST /youtubei/v1/player through to the first 206 segment response.

The one line to remember

YouTube is three systems behind one play button: ingest (resumable write to object storage), process (transcode and index), and serve (Vitess metadata + CDN bytes + two-stage recommendations). Respect the pipe boundaries and the design stays teachable at billion-user scale.
