How YouTube works at scale
If you only remember the red play button, you will design the wrong system. YouTube is not “Netflix with comments.” It is a platform that must ingest enormous files on unreliable networks, process them into dozens of renditions, serve bytes from edge caches worldwide, and personalize what billions of people see next—all at once.
We work through the design in order: requirements first, numbers second, architecture third, APIs last. By the end you should explain what each layer does, why Vitess and Bigtable coexist, and how upload, watch, and homepage traffic differ on the wire.
What you should be able to do after reading:
- Separate the three pipes—write, process, read—and assign the right database to each.
- List functional and non-functional requirements with realistic latency and consistency targets.
- Walk one upload and one watch, naming resumable upload, upload.complete, the InnerTube player call, and CDN segment fetches.
- Describe two-stage recommendations (candidates → rank by watch time) and why clicks alone are a bad objective.
- Read the technical section: Data API, InnerTube, googlevideo.com, Vitess schema, queue events.
Step 0 — How we will work through the problem
Ordered thinking beats memorizing a diagram. Use this sequence whenever you design a video platform at scale:
- Clarify scope. Upload only? Watch page? Homepage recommendations? Live streaming is a different system—say if it is out of scope today.
- Write requirements. Functional = what creators and viewers see. Non-functional = latency, availability, consistency, cost.
- Do napkin math. Hours uploaded per minute, read:write ratio, segment QPS—so nobody assumes one MySQL instance is enough.
- Draw three pipes before naming Kafka, Vitess, or CDN tiers.
- Tell two stories—one upload, one watch—then cover failures (resume, cold CDN miss, viral comment spike).
flowchart LR
subgraph write [Write path]
U[Upload chunks] --> GCS[Object storage]
GCS --> K[Event queue]
end
subgraph process [Processing path]
K --> T[Transcode FFmpeg]
T --> Segments[Many renditions]
end
subgraph read [Read path]
Segments --> CDN[Media CDN]
CDN --> P[Player ABR]
Meta[(MySQL Vitess + Bigtable)] --> Page[Watch page APIs]
Rec[2-stage ML] --> Home[Homepage]
end
Step 1 — Functional requirements (what creators and viewers need)
Functional requirements are behaviors the product must ship. Missing one is a product bug, not a performance tweak.
| Actor | Requirement | Why scale makes it hard |
|---|---|---|
| Creator | Upload multi-GB video; resume after network drop | Cannot stream whole file through one app server |
| Creator | Edit title, privacy, thumbnail; see processing status | State machine + many renditions behind one video id |
| Viewer | Play with adaptive quality (360p–4K/HDR) | Many itags, CDN paths, signed URLs that expire |
| Viewer | Search and browse billions of items | Inverted index separate from SQL metadata |
| Viewer | Personalized home feed and “up next” | Two-stage ML under millisecond budgets |
| Viewer | Engage: like, comment, subscribe, notifications | Viral spikes on comments—not steady CRUD |
| Platform | Monetization, ads, rights, geo policy | Policy checks on watch path fan-out |
| Ops | Live streaming (often phase two) | RTMP ingest, segment latency, chat—different pipe |
Functional details easy to skip (but worth stating clearly)
“Upload complete” ≠ “every rendition ready.” The UI can show done while transcoding still runs; playback uses whichever itags exist.
Watch page ≠ one database query. Metadata, ads, comments count, related shelf, and player URLs come from parallel services.
Video id is the spine. Same id ties Vitess row, GCS objects, CDN segments, search doc, and recommendation features.
Step 2 — Non-functional requirements (how good “good” must be)
| Category | Target (typical) | Design consequence | If we miss it |
|---|---|---|---|
| Availability — APIs | 99.9%+ core metadata | Vitess failover, redundant edge | Site-wide errors on watch |
| Availability — bytes | CDN multi-tier + DNS steering | Edge → regional → origin | Buffering, playback failures |
| Latency — metadata | Low hundreds of ms for page APIs | Cache fragments, parallel RPC | Sluggish watch page shell |
| Latency — playback | Time-to-first-frame ~2s (hot content) | ABR starts low, edge hit | Viewers abandon |
| Throughput — upload | 500+ hours uploaded/minute (order of magnitude) | Direct-to-GCS, async transcode | Creator backlog |
| Read:write — metadata | ~100:1 serves vs writes | Sharding + caches, not one master | Replica lag, connection storms |
| CDN hit ratio | 95%+ for popular content | DNS maps video id to warm edges | Origin meltdown |
| Consistency — money/metadata | Strong on video state, counts | Transactions on Vitess shard | Wrong privacy, broken ACL |
| Consistency — recommendations | Eventual feature freshness OK | Stream watch history to training | Slightly stale home feed |
| Durability | Exabyte-class; no silent loss | GCS + replicated DB | Permanent creator data loss |
| Cost | Egress dominates | CDN hit ratio, codec efficiency (AV1) | Unsustainable unit economics |
Key idea: Video bytes favor bandwidth and cache hit rate; video metadata favors shardable SQL and fan-out control. Do not store every segment request in MySQL.
Step 3 — Napkin math (why the shape of the system is inevitable)
Public material cites ballpark figures (they drift year to year). Use them to justify architecture, not as financial statements.
- 2+ billion signed-in users per month.
- 500 hours of video uploaded every minute → transcode fleet, not synchronous upload handlers.
- 1 billion hours watched per day → CDN and recommendation load dominate.
- 100:1 metadata read:write → caches, Vitess, avoid single-writer myths.
- Media CDN in 1,300+ cities, multi-tier cache, DNS-steered googlevideo.com.
Sanity check: one viral watch page with ~60 thumbnails is ~60 small object reads—filesystems and naive BLOB columns fail; Bigtable-style wide rows exist for that pattern.
Honest ranges beat fake precision when you extrapolate to your own product.
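The ballpark figures above can be turned into rough capacity numbers. The sketch below is only napkin arithmetic: the 2 Mbps average stream bitrate and the 1 GB per uploaded hour are assumptions picked for round numbers, not published figures.

```python
# Napkin math from the ballpark figures above.
# ASSUMPTIONS (not from public material): average stream bitrate and
# average source size per uploaded hour.
UPLOAD_HOURS_PER_MIN = 500
WATCH_HOURS_PER_DAY = 1_000_000_000
ASSUMED_AVG_STREAM_MBPS = 2.0
ASSUMED_SOURCE_GB_PER_HOUR = 1.0


def upload_hours_per_day(hours_per_min):
    """Hours of new video arriving per day."""
    return hours_per_min * 60 * 24


def avg_concurrent_streams(watch_hours_per_day):
    """Watch hours spread evenly over 24 wall-clock hours."""
    return watch_hours_per_day / 24


def egress_tbps(streams, mbps_per_stream):
    """Sustained egress in Tbps (Mbps -> Tbps)."""
    return streams * mbps_per_stream / 1_000_000


uploaded = upload_hours_per_day(UPLOAD_HOURS_PER_MIN)
streams = avg_concurrent_streams(WATCH_HOURS_PER_DAY)
print(f"uploaded hours/day:     {uploaded:,.0f}")
print(f"raw ingest TB/day:      {uploaded * ASSUMED_SOURCE_GB_PER_HOUR / 1000:,.0f}")
print(f"avg concurrent streams: {streams:,.0f}")
print(f"avg egress Tbps:        {egress_tbps(streams, ASSUMED_AVG_STREAM_MBPS):,.1f}")
```

Under these assumptions the averages come out to 720,000 uploaded hours per day and roughly 83 Tbps of sustained egress. Peak traffic would be higher, which is the argument for an edge-heavy CDN and an asynchronous transcode fleet.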
Step 4 — The three pipes (draw this first)
Start by drawing three parallel pipes. Merging them into one “YouTube service” is how designs end up streaming gigabytes through Python metadata servers.
- Write path — resumable upload → object storage → event.
- Process path — transcode, index, thumbnail generation (async, CPU-heavy).
- Read path — metadata APIs, CDN segments, recommendations.
Step 5 — Walk one upload and one watch
Upload story
- Creator starts resumable session → POST …/upload/youtube/v3/videos?uploadType=resumable.
- Chunks PUT to session URL (or signed GCS URLs internally) → 308 resume on failure.
- Stub row in Vitess: upload_status=uploading.
- GCS finalize → upload.complete on queue → Borg/FFmpeg workers.
- UI shows “complete”; background sets processing → ready as renditions land.
Watch story
- Viewer opens /watch?v=VIDEO_ID → SPA calls POST /youtubei/v1/player.
- Response carries streamingData.adaptiveFormats[] with signed googlevideo.com URLs.
- Player fetches init + media segments (DASH/HLS); ABR switches itag on bandwidth.
- Parallel: /next for recommendations; Vitess/Bigtable for title, channel, comments stub.
Replay with failures: expired signature (refresh player), cold video (origin fill), upload resume after disconnect.
Step 6 — Bend MySQL with Vitess instead of abandoning it
Many scaling stories follow one arc: “We outgrew MySQL, so we moved to NoSQL.” YouTube’s documented story is different. They started with a single MySQL instance. Growth caused:
- Replication lag under heavy writes (async replication, single-threaded apply on replicas).
- Connection exhaustion (too many app connections overwhelming MySQL).
- Tables too large to scale vertically forever.
So YouTube built Vitess, a clustering layer for MySQL, rather than throwing away relational semantics for everything. Vitess has been described as a core part of YouTube’s database infrastructure since 2011, growing to tens of thousands of MySQL nodes rather than a few big servers.
In plain terms: Vitess sits between your app and many MySQL shards. Your app still speaks SQL; Vitess figures out which shard owns the row, pools connections, and blocks runaway queries.
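To make "figures out which shard owns the row" concrete, here is a minimal sketch of range-based routing. It is not Vitess code: the four-shard layout is hypothetical, and MD5 is used purely as a stand-in hash (an assumption for illustration, not the function Vitess vindexes use).

```python
import hashlib
from bisect import bisect_right

# Hypothetical layout: each shard owns a contiguous slice of a 64-bit
# keyspace, identified here by the inclusive start of its range.
SHARD_STARTS = [0x0000000000000000, 0x4000000000000000,
                0x8000000000000000, 0xC000000000000000]
SHARD_NAMES = ["-40", "40-80", "80-c0", "c0-"]


def keyspace_id(sharding_key):
    """Map a sharding key (e.g. a video id) to a 64-bit keyspace id.

    ASSUMPTION: MD5 is an illustrative stand-in, not Vitess's hash.
    """
    digest = hashlib.md5(sharding_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(kid):
    """Return the shard whose range contains the keyspace id."""
    return SHARD_NAMES[bisect_right(SHARD_STARTS, kid) - 1]


def route(sharding_key):
    """Route a row to a shard by its sharding key."""
    return shard_for(keyspace_id(sharding_key))


print(route("dQw4w9WgXcQ"))
```

Because the router owns the key-to-range mapping, splitting a hot range means changing the table of ranges, while application SQL stays the same.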
Step 6 (continued) — How Vitess works
VTGate (query router)
- Routes queries to the correct shard automatically (you shard by user id, video id, etc.).
- Connection pooling so thousands of app processes do not each open a raw MySQL connection storm.
- Query safety: can kill dangerous or long-running queries before they take down a shard.
VTTablet (per-shard agent)
- Manages individual MySQL instances on that shard.
- Row-level caching (e.g. via Memcached) with invalidation driven by the MySQL replication stream.
- Automated failover and backups with less manual DBA toil.
Sharding and resharding
Split or merge shards with minimal downtime—critical when one shard (a celebrity channel, a viral topic) becomes hot. Reported operational wins from this architecture include better cache locality, less disk thrashing, improved hardware efficiency, and replica lag driven toward zero compared with one giant MySQL.
Database evolution (three stages you can draw)
- Single MySQL + read replicas — worked until write load and lag broke the model.
- Vertical split — different databases for different table families (e.g. users vs. video metadata). Bought time, did not remove the ceiling.
- Horizontal sharding + Vitess — many MySQL instances; Vitess handles routing, pooling, safety, resharding, failover.
Why sharding won:
- Cache locality: each shard has its own working set—less fighting over one buffer pool.
- I/O isolation: hot user tables do not starve unrelated video metadata queries on the same disk.
- Blast radius: one bad shard does not take down the entire fleet.
- Horizontal growth: add shards instead of buying impossible single-box machines.
Step 7 — Four request paths (different traffic, different bottleneck)
YouTube’s traffic is not one API. Name the question each path answers before you name the database.
Path A — Watch page (metadata + shell)
Flow: Client → load balancer → backend services → data layer (MySQL+Vitess, Bigtable, caches) → HTML/JSON for the page.
What breaks: Fan-out RPC latency, not CPU on a single Python process. One page touches many services (video metadata, channel, comments count, related videos stub, ads slot, etc.).
Stack (as described in Google-facing material):
- Python for much business logic.
- C++ / Java for performance-critical paths (processing, low-latency serving).
- Go for newer infrastructure such as Vitess itself.
How they scaled it: pre-generate cached HTML fragments, cache Python objects (not only raw DB rows), push hot computed data into process memory. Counterintuitive lesson: adding web servers helped because Python spent much time waiting on RPCs, not burning CPU.
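The "waiting on RPCs, not burning CPU" point can be illustrated with a small fan-out sketch. The service names and the 50 ms latencies are hypothetical, and asyncio.sleep stands in for a real RPC.

```python
import asyncio
import time

# Hypothetical downstream services with simulated latencies (seconds).
SIMULATED_LATENCY = {"video": 0.05, "channel": 0.05, "comments": 0.05}


async def call_service(name, video_id):
    """Pretend RPC: the sleep stands in for network plus backend time."""
    await asyncio.sleep(SIMULATED_LATENCY[name])
    return {"service": name, "video_id": video_id}


async def build_watch_page(video_id):
    """Issue all calls concurrently and collect results by service name."""
    names = list(SIMULATED_LATENCY)
    results = await asyncio.gather(*(call_service(n, video_id) for n in names))
    return dict(zip(names, results))


def timed_build(video_id):
    """Return (page, elapsed_seconds) for one watch-page assembly."""
    start = time.perf_counter()
    page = asyncio.run(build_watch_page(video_id))
    return page, time.perf_counter() - start


page, elapsed = timed_build("VIDEO_ID")
print(sorted(page), f"{elapsed:.3f}s")
```

With concurrent calls the page costs about as much as the slowest dependency rather than the sum of all of them, so the process spends most of that time idle on I/O.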
Path B — Video serving (CDN + DNS)
Video bytes are served through Google Media CDN (same family of infrastructure available to cloud customers today). Protocols commonly listed: DASH / HLS adaptive streaming, QUIC / HTTP/3, TLS 1.3, BBR congestion control. The player switches renditions based on bandwidth and buffer health.
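A simple way to picture rendition switching is a throughput rule with a buffer guard. The sketch below is not any real player's algorithm; the 0.8 safety factor, the 5-second buffer threshold, and the ladder values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    itag: int
    height: int
    bitrate_bps: int


def pick_rendition(renditions, measured_bps, buffer_sec,
                   safety=0.8, low_buffer_sec=5.0):
    """Pick the highest-bitrate rendition that fits the bandwidth budget.

    ASSUMPTIONS: `safety` and `low_buffer_sec` are illustrative values.
    """
    ladder = sorted(renditions, key=lambda r: r.bitrate_bps)
    if buffer_sec < low_buffer_sec:
        return ladder[0]  # near a stall: drop to the lowest rung
    budget = measured_bps * safety
    fitting = [r for r in ladder if r.bitrate_bps <= budget]
    return fitting[-1] if fitting else ladder[0]


# Illustrative ladder; itags and bitrates are examples only.
LADDER = [
    Rendition(itag=160, height=144, bitrate_bps=100_000),
    Rendition(itag=134, height=360, bitrate_bps=600_000),
    Rendition(itag=248, height=1080, bitrate_bps=2_500_000),
]

print(pick_rendition(LADDER, measured_bps=4_000_000, buffer_sec=20).height)
```

Starting playback on a low rung and stepping up once the buffer is healthy is what keeps time-to-first-frame short.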
Easy to overlook: DNS-based routing. A measurement study of YouTube’s delivery network reported:
- A relatively flat video ID space.
- Multiple DNS namespaces reflecting a multi-tier logical cache hierarchy.
- Video IDs map to logical servers, then to physical cache locations via DNS.
That means YouTube can add capacity or rebalance load by updating DNS mappings—without redeploying application code. Serving path: edge cache → on miss, regional tier → on miss, origin. Most requests never touch origin.
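The edge → regional → origin path can be sketched as a lookup with fill-on-miss. This is a toy model under stated assumptions: plain dictionaries stand in for cache tiers, and there is no eviction, TTL, or DNS steering.

```python
class TieredCache:
    """Edge -> regional -> origin lookup with fill-on-miss (no eviction)."""

    def __init__(self, origin):
        self.edge = {}
        self.regional = {}
        self.origin = origin  # dict standing in for segment storage

    def get(self, key):
        """Return (data, tier_that_served_it)."""
        if key in self.edge:
            return self.edge[key], "edge"
        if key in self.regional:
            self.edge[key] = self.regional[key]  # fill edge on regional hit
            return self.edge[key], "regional"
        data = self.origin[key]  # raises KeyError if the object is missing
        self.regional[key] = data
        self.edge[key] = data
        return data, "origin"


cache = TieredCache({"seg-42": b"video-bytes"})
print(cache.get("seg-42")[1])  # first request is served by origin
print(cache.get("seg-42")[1])  # repeat request is served by the edge
```

Only the first request for a segment reaches origin; every later request in that region is absorbed by a cache tier, which is why hit ratio drives both cost and latency.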
Path C — Thumbnails (the small-file problem)
A watch page can show on the order of ~60 thumbnails. That is a huge number of requests for tiny objects. Early filesystem approaches suffered inode cache thrashing, directory limits (e.g. ext3-era limits), and brutal warmup times—the “billions of tiny files” nightmare.
Post-acquisition architecture notes describe Bigtable used to replicate thumbnails across data centers in wide-column, key-value patterns. Bigtable “clumps” data so distributed multi-level caching across sites works; you avoid per-file filesystem pain. Bigtable-family stores also show up for video metadata at scale, user activity logs, and time-series style data.
Path D — Databases and operational metadata
Structured metadata (titles, channels, subscriptions) in MySQL via Vitess; append-only and wide patterns in Bigtable; high-write social signals often on Cassandra-class stores. Boundaries shift over years—what matters is matching the store to the access pattern.
Step 8 — Upload, transcode, and serve (write + process pipes)
Why chunked, resumable uploads exist
Multi-gigabyte files fail on a single POST over mobile networks. Resumable uploads retry one chunk, not the entire file. Chunks are often 256 KB multiples on the public Data API; internal ingest may use 5–50 MB blocks to GCS.
- Reserve video_id; Vitess row upload_status=uploading.
- Upload bytes to GCS (not through the metadata app tier).
- upload.complete event → transcode workers.
- Creator sees “done”; renditions catch up asynchronously.
Transcoding
FFmpeg pipelines produce many itags (144p–4K, H.264/VP9/AV1). Hundreds of hours uploaded per minute implies large batch fleets on Borg-class schedulers.
| What | Where | Why |
|---|---|---|
| Raw + segments | GCS / Colossus-class | Sequential bytes, CDN origin |
| Titles, channels, ACL | MySQL + Vitess | Transactions, relational queries |
| Watch history, logs | Bigtable | Append-heavy, wide rows |
| Search | Inverted index (Elasticsearch-class) | Full-text over billions of docs |
Step 9 — Recommendations: two-stage neural ranking
Covington et al., Deep Neural Networks for YouTube Recommendations, describes the industrial pattern: candidate generation (billions → hundreds) then ranking by expected watch time—not clicks alone.
- Stage 1: embeddings from watch/search history → fast retrieval of hundreds of ids.
- Stage 2: score watch time, freshness, context → final homepage / up-next slate.
- Freshness features surface new uploads without long watch history.
flowchart LR
U[User context embeddings] --> C[Candidate generation billions to hundreds]
C --> R[Ranking by expected watch time]
R --> F[Final slate on homepage / up next]
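The two stages can be sketched with toy data. This is not the model from the paper: stage 1 here is a brute-force dot-product search (a production system would use an approximate nearest-neighbor index), and the watch-time predictor is a hypothetical lookup table standing in for a trained ranker.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(user_vec, video_vecs, k):
    """Stage 1: top-k video ids by embedding similarity (brute force)."""
    return heapq.nlargest(
        k, video_vecs, key=lambda vid: dot(user_vec, video_vecs[vid]))


def rank(candidates, predict_watch_sec):
    """Stage 2: order candidates by predicted watch time, highest first."""
    return sorted(candidates, key=predict_watch_sec, reverse=True)


# Toy embeddings and a hypothetical watch-time table.
VIDEO_VECS = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
PREDICTED_WATCH_SEC = {"a": 30, "b": 340, "c": 500, "d": 10}

candidates = generate_candidates([1.0, 0.0], VIDEO_VECS, k=2)
slate = rank(candidates, PREDICTED_WATCH_SEC.get)
print(candidates, slate)
```

Video "c" has the longest predicted watch time but never reaches the ranker because stage 1 filtered it out, which shows why candidate recall matters as much as ranking quality.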
Step 10 — Borg, data centers, and ops at fleet scale
YouTube runs on Google’s Borg cluster manager—latency-sensitive APIs and batch transcode share fleets with Search and Gmail. Kubernetes is the open-source lesson from Borg; inside Google, Borg is the scheduler.
- Video files: bandwidth matters more than single-digit ms to one DB region.
- Thumbnails: latency-sensitive; replicated via Bigtable; pick nearby replica.
- ~5–6 large data centers plus CDN colocation for peering and hardware—historical ballpark.
Step 11 — Technical layer: APIs, payloads, and wires
Requirements say what; this section shows how it ships. Three public layers cooperate: Data API (creators), InnerTube (watch/browse), signed CDN URLs (bytes). Internal traffic is mostly gRPC between Python/C++/Java services and Vitess/Bigtable.
| Layer | Typical host | Auth | What it does |
|---|---|---|---|
| Data API (v3) | www.googleapis.com/youtube/v3; www.googleapis.com/upload/youtube/v3 | OAuth 2.0 (Authorization: Bearer) | Upload, update metadata, list channels, comments (creator tools & partners) |
| InnerTube | youtubei.googleapis.com/youtubei/v1 | Session cookie + X-Goog-Api-Key / visitor id | Watch page player payload, homepage browse, search, “up next” |
| Media CDN | *.googlevideo.com (DNS-steered) | Short-lived signed query params (expire, sig, sparams) | DASH/HLS segment delivery, range GETs, ABR switches |
| Object storage | GCS (storage.googleapis.com), internal upload path | Signed PUT URL from upload service | Raw upload blobs; origin for cold CDN misses |
Upload — resumable session (Data API, documented)
The public resumable upload protocol is the best-documented version of what internal pipelines do: metadata first, bytes second, completion event third. Chunk size on the wire is often 256 KB multiples (API requirement for chunked mode); edge ingest may use larger 5–50 MB blocks before landing in GCS.
| Step | HTTP | Success | Notes |
|---|---|---|---|
| 1: start session | POST /upload/youtube/v3/videos?uploadType=resumable&part=snippet,status,contentDetails | 200 + Location: session URI | Body = JSON video resource; headers X-Upload-Content-Length, X-Upload-Content-Type: video/* |
| 2: upload bytes | PUT {Location} | 201 + videos resource | Full file or chunked Content-Range: bytes start-end/total |
| 3: probe / resume | PUT {Location} with Content-Range: bytes */total | 308 Resume Incomplete + Range: header | Resume from byte after last confirmed Range upper bound |
Step 1 — initiate session (metadata only in body):
POST /upload/youtube/v3/videos?uploadType=resumable&part=snippet,status,contentDetails HTTP/1.1
Host: www.googleapis.com
Authorization: Bearer ya29…
Content-Type: application/json; charset=UTF-8
X-Upload-Content-Length: 2147483648
X-Upload-Content-Type: video/mp4
{
"snippet": {
"title": "Architecture walkthrough",
"description": "Chunked upload demo",
"tags": ["system-design"],
"categoryId": "28"
},
"status": {
"privacyStatus": "unlisted",
"selfDeclaredMadeForKids": false
}
}
→ 200 OK
Location: https://www.googleapis.com/upload/youtube/v3/videos?uploadType=resumable&upload_id=…
Step 2 — first chunk (256 KB-aligned when using chunked mode):
PUT https://www.googleapis.com/upload/youtube/v3/videos?uploadType=resumable&upload_id=… HTTP/1.1
Authorization: Bearer ya29…
Content-Length: 262144
Content-Type: video/mp4
Content-Range: bytes 0-262143/2147483648
<262144 bytes of MP4>
→ 308 Resume Incomplete
Range: bytes=0-262143
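The resume arithmetic is small enough to sketch. The helper names below are hypothetical, not part of any client library; the code only parses the Range value of a 308 response and builds the next Content-Range. Treating a missing Range header as "no bytes stored yet" is an assumption about server behavior that a real client should verify against the API documentation.

```python
import re

CHUNK_ALIGN = 256 * 1024  # chunked mode uses multiples of 256 KB


def next_offset(range_header):
    """Return the first byte the server still needs.

    `range_header` is the Range value from a 308 response, for example
    "bytes=0-262143". None or "" is treated as nothing stored (assumption).
    """
    if not range_header:
        return 0
    match = re.fullmatch(r"bytes=0-(\d+)", range_header.strip())
    if match is None:
        raise ValueError(f"unexpected Range header: {range_header!r}")
    return int(match.group(1)) + 1


def content_range(offset, chunk_size, total):
    """Build the Content-Range value for the chunk starting at `offset`."""
    if chunk_size % CHUNK_ALIGN != 0:
        raise ValueError("chunk size must be a multiple of 256 KB")
    end = min(offset + chunk_size, total) - 1  # final chunk may be shorter
    return f"bytes {offset}-{end}/{total}"


offset = next_offset("bytes=0-262143")
print(content_range(offset, CHUNK_ALIGN, 2147483648))
```

After a dropped connection the client probes the session, reads the confirmed upper bound, and continues from the byte after it, so only the unconfirmed chunk is sent again.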
Internal parallel path (studio-scale): an upload gateway often returns
{ videoId, uploadSessionId, gcsSignedPutUrls[] } so the browser uploads directly to object storage
(PUT https://storage.googleapis.com/bucket/raw/{videoId}/part-0007) without pinning terabytes through Python app servers.
On final part, storage emits upload.complete to a queue; Vitess row moves upload_status: uploading → processing.
Watch page — InnerTube player (what clients actually call)
Opening /watch?v=VIDEO_ID does not load all metadata from one HTML document server-side.
The SPA calls InnerTube with a rich context object (client name, version, hl, gl) and receives streaming URLs, captions, and playability in one JSON blob.
| Operation | HTTP | Success | Returns |
|---|---|---|---|
| Player payload | POST /youtubei/v1/player | 200 JSON | streamingData (formats, adaptiveFormats), playabilityStatus, videoDetails |
| Related / up next | POST /youtubei/v1/next | 200 | Shelf of recommended videos with impression tokens |
| Home feed browse | POST /youtubei/v1/browse (browseId=FEwhat_to_watch) | 200 | Personalized shelves (fed by two-stage ranker) |
| Search | POST /youtubei/v1/search | 200 | Results + continuation tokens |
POST /youtubei/v1/player HTTP/1.1
Host: youtubei.googleapis.com
Content-Type: application/json
X-Goog-Api-Key: …
{
"videoId": "dQw4w9WgXcQ",
"context": {
"client": { "clientName": "WEB", "clientVersion": "2.20250101.00.00" }
},
"playbackContext": {
"contentPlaybackContext": { "signatureTimestamp": 20321 }
}
}
→ 200 OK (excerpt)
{
"playabilityStatus": { "status": "OK" },
"videoDetails": { "videoId": "…", "title": "…", "lengthSeconds": "212" },
"streamingData": {
"expiresInSeconds": "21540",
"formats": [ … progressive MP4 … ],
"adaptiveFormats": [
{ "itag": 248, "mimeType": "video/webm; codecs=\"vp9\"",
"bitrate": 2500000, "width": 1920, "height": 1080,
"url": "https://rr3---sn-…googlevideo.com/videoplayback?expire=…&sig=…" }
]
}
}
Fan-out behind this one POST: separate backend services load channel ACL, age restrictions, view count (often cached), comment thread stub, and ad decisioning. Python orchestrators aggregate via RPC; latency is dominated by parallel downstream calls, not JSON parsing.
Metadata CRUD — Data API videos
| Operation | HTTP | Success | Common errors |
|---|---|---|---|
| Get video | GET /youtube/v3/videos?part=snippet,contentDetails,statistics&id={id} | 200 | 404 if private to caller |
| Update metadata | PUT /youtube/v3/videos?part=snippet,status | 200 | 403 quota / ACL |
| List channel uploads | GET /youtube/v3/search?part=snippet&channelId={id}&order=date | 200 + nextPageToken | 403 quotaExceeded |
| Insert comment | POST /youtube/v3/commentThreads?part=snippet | 200 | 403 comments disabled |
Playback — CDN segment delivery (not JSON)
URLs from streamingData.adaptiveFormats[].url hit Media CDN edge nodes.
Adaptive players (DASH or HLS) fetch an init segment + media segments; the player switches itag when buffer or bandwidth changes.
| Request | Typical method | Response | Purpose |
|---|---|---|---|
| Init segment | GET …/init.mp4?… | 200 small fMP4 header | Codec config (SPS/PPS, etc.) |
| Media segment | GET …/seg-42.m4s?… or byte range | 200 / 206 Partial Content | 2–10 s of encoded video per segment |
| DNS steering | resolve rr{N}---sn-….googlevideo.com | A/AAAA to edge POP | Video id → logical cache tier → physical POP |
# Example segment fetch (signed URL truncated)
GET /videoplayback?expire=1735689600&ei=…&ip=…&id=o-ABC123&itag=248&source=youtube&requiressl=yes&mh=…&mm=31&mn=sn-…&ms=au&mv=m&signature=…&lsparams=…&lsig=… HTTP/1.1
Host: rr3---sn-abcd7.googlevideo.com
Range: bytes=0-1048575
→ 206 Partial Content
Content-Type: video/webm
Content-Range: bytes 0-1048575/52428800
Protocols on the wire: DASH (.mpd manifest) or HLS (.m3u8) over HTTPS;
QUIC/HTTP/3 and BBR on supported clients. Signatures expire—player refreshes player response when URLs near expiresInSeconds.
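A client can decide when to refresh by reading the expiry from the signed URL itself. The sketch below assumes the expire query parameter is a Unix timestamp in seconds, as the example URL suggests; the 60-second margin and the function names are hypothetical.

```python
import time
from urllib.parse import parse_qs, urlparse


def seconds_until_expiry(signed_url, now=None):
    """Seconds left before the URL's `expire` timestamp.

    ASSUMPTION: `expire` is a Unix timestamp in seconds.
    """
    query = parse_qs(urlparse(signed_url).query)
    expire = int(query["expire"][0])
    return expire - (time.time() if now is None else now)


def needs_refresh(signed_url, now=None, margin_sec=60):
    """True when the URL expires within `margin_sec` (hypothetical margin)."""
    return seconds_until_expiry(signed_url, now) <= margin_sec


URL = "https://rr3---sn-abcd7.googlevideo.com/videoplayback?expire=1735689600&itag=248"
print(needs_refresh(URL, now=1735689000))  # 600 s left, no refresh yet
```

Refreshing slightly before expiry avoids a 403 in the middle of playback; the margin trades a few extra player calls for fewer interrupted segment fetches.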
Recommendations — serving path (conceptual internal contract)
Training lives offline (Covington et al.); serving is a low-latency inference stack. Homepage and watch-page shelves call a ranker service with precomputed user/video embeddings from a feature store (Bigtable / Redis-class).
POST /rec/v1/rank (internal gRPC/HTTP — illustrative)
{
"userId": "UC…",
"surface": "HOME_FEED",
"candidateVideoIds": ["vid1", "vid2", …], // stage-1: hundreds
"context": { "locale": "en-US", "device": "mobile", "hourLocal": 21 }
}
→ {
"ranked": [
{ "videoId": "vid9", "score": 0.82, "expectedWatchTimeSec": 340 },
…
],
"impressionToken": "opaque-for-logging"
}
Stage 1 (candidate generation) often runs as a separate call or ANN index lookup (POST /rec/v1/candidates) that returns hundreds of ids in <10 ms;
stage 2 (ranking) scores watch-time objective on GPU/TPU pools. Logs feed continuous training pipelines.
Vitess routing and core relational schema
VTGate parses SQL and routes by shard vindex. Common shard keys: video_id for video metadata rows, user_id for channels/subscriptions.
Cross-shard queries (admin reports) are expensive—product APIs are designed to hit one shard per request.
-- videos (shard key: video_id)
CREATE TABLE videos (
video_id CHAR(11) PRIMARY KEY,
channel_id CHAR(24) NOT NULL,
title VARCHAR(100) NOT NULL,
description TEXT,
upload_status ENUM('uploading','processing','ready','failed') NOT NULL,
privacy ENUM('public','unlisted','private') NOT NULL,
duration_sec INT UNSIGNED,
created_at TIMESTAMP NOT NULL,
published_at TIMESTAMP NULL,
KEY idx_channel_created (channel_id, created_at)
);
-- renditions produced by transcode workers (many rows per video)
CREATE TABLE video_renditions (
video_id CHAR(11) NOT NULL,
itag SMALLINT NOT NULL,
codec VARCHAR(16) NOT NULL,
width SMALLINT,
height SMALLINT,
bitrate_bps INT UNSIGNED,
gcs_path VARCHAR(512) NOT NULL,
PRIMARY KEY (video_id, itag)
);
State machine (upload_status):
stateDiagram-v2
[*] --> uploading: resumable POST + PUT chunks
uploading --> processing: upload.complete event
processing --> ready: renditions in CDN
processing --> failed: transcode error
ready --> [*]
failed --> [*]
- uploading: chunks accepted; stub row in Vitess; GCS incomplete OK.
- processing: raw object closed; transcode queue consumed; not yet playable on all itags.
- ready: minimum rendition set in CDN; player returns playabilityStatus.status = OK.
- failed: poison transcode; dead-letter queue; creator sees retry UI.
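The state machine is easy to enforce in code so that a late or duplicate event cannot move a video backwards. A minimal sketch, assuming exactly the transitions drawn in the diagram (a retry path out of failed would be an extension):

```python
# Allowed transitions, copied from the state diagram above.
ALLOWED = {
    "uploading": {"processing"},
    "processing": {"ready", "failed"},
    "ready": set(),
    "failed": set(),
}


class InvalidTransition(Exception):
    """Raised when a status change is not part of the state machine."""


def transition(current, new):
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{current} -> {new}")
    return new


status = transition("uploading", "processing")
status = transition(status, "ready")
print(status)
```

In a relational store the same guard is typically expressed as a conditional update that names the expected current status, so a stale writer changes zero rows.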
Async pipeline — queue message shape
After GCS finalize, a compact event triggers Borg-scheduled FFmpeg workers:
{
"eventType": "upload.complete",
"videoId": "dQw4w9WgXcQ",
"rawObjectUri": "gs://yt-uploads/raw/dQw4w9WgXcQ/source.mp4",
"bytes": 2147483648,
"checksumSha256": "a1b2…",
"uploadedAt": "2026-05-19T12:04:11Z"
}
→ transcode worker claims job → writes renditions → updates video_renditions
→ sets upload_status='ready' → warms CDN popular itags
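Queues commonly deliver a message more than once, so the consumer side is worth sketching. The code below assumes at-least-once delivery and uses an in-memory set as a stand-in for a durable dedupe store; the class and function names are hypothetical.

```python
import json

REQUIRED_FIELDS = ("eventType", "videoId", "rawObjectUri", "bytes", "checksumSha256")


def parse_upload_complete(raw):
    """Decode and validate an upload.complete event."""
    event = json.loads(raw)
    missing = [f for f in REQUIRED_FIELDS if f not in event]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if event["eventType"] != "upload.complete":
        raise ValueError(f"unexpected eventType: {event['eventType']}")
    return event


class TranscodeDispatcher:
    """Turns events into transcode jobs, ignoring duplicate deliveries."""

    def __init__(self):
        self._seen = set()  # stand-in for a durable dedupe store
        self.jobs = []

    def handle(self, raw):
        """Return True if a job was enqueued, False for a duplicate."""
        event = parse_upload_complete(raw)
        if event["videoId"] in self._seen:
            return False
        self._seen.add(event["videoId"])
        self.jobs.append(event["rawObjectUri"])
        return True


MESSAGE = json.dumps({
    "eventType": "upload.complete",
    "videoId": "dQw4w9WgXcQ",
    "rawObjectUri": "gs://yt-uploads/raw/dQw4w9WgXcQ/source.mp4",
    "bytes": 2147483648,
    "checksumSha256": "a1b2",
})
dispatcher = TranscodeDispatcher()
print(dispatcher.handle(MESSAGE), dispatcher.handle(MESSAGE))
```

Deduplicating on the video id keeps a redelivered event from launching a second, expensive transcode of the same source file.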
Thumbnails — Bigtable access pattern
~60 thumbnails per watch page ⇒ ~60 point reads. Row key design (conceptual):
thumb#{video_id}#{variant} → column image/jpeg bytes.
VTGate is bypassed; a thumbnail service batches Bigtable reads and sets Cache-Control on HTTP responses at the edge.
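The row-key pattern and the batching step can be sketched without a Bigtable client. The variant name "default" and the batch size of 20 are assumptions for illustration; the key format follows the conceptual pattern above.

```python
def thumb_row_key(video_id, variant):
    """Build a row key following the thumb#{video_id}#{variant} pattern."""
    return f"thumb#{video_id}#{variant}"


def batch_keys(video_ids, variant="default", batch_size=20):
    """Group row keys into fixed-size batches for multi-row reads.

    ASSUMPTIONS: the variant name and batch size are illustrative.
    """
    keys = [thumb_row_key(v, variant) for v in video_ids]
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]


page_video_ids = [f"vid{i:02d}" for i in range(60)]
batches = batch_keys(page_video_ids)
print(len(batches), batches[0][0])
```

Sixty thumbnails become three batched reads instead of sixty separate round trips, which is the point of having a dedicated thumbnail service in front of the store.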
Rate limits, quotas, and errors you will see
- Data API: daily quota units per project; 403 quotaExceeded; use exponential backoff on 500/503.
- Upload: max file 256 GB (documented Data API limit); 308 + Range for resume; session URI expires → restart from step 1.
- CDN: 403 on expired signature → refresh /player; 404 on cold rare video → origin pull then fill edge.
- InnerTube: playabilityStatus.status = LOGIN_REQUIRED / UNPLAYABLE for region or policy blocks, not transport errors.
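Exponential backoff on 500/503 can be sketched as a delay generator. The base delay, the cap, and the up-to-one-second jitter are assumptions, not documented values; only the retryable status codes come from the list above.

```python
import random

RETRYABLE_STATUS = {500, 503}


def should_retry(status):
    """Retry only on transient server errors, never on quota or ACL errors."""
    return status in RETRYABLE_STATUS


def backoff_delays(max_retries=5, base=1.0, cap=32.0, rng=random.random):
    """Yield sleep durations: min(cap, base * 2**attempt) plus jitter.

    ASSUMPTIONS: base, cap, and jitter in [0, 1) seconds are illustrative.
    `rng` is injectable so the schedule can be made deterministic in tests.
    """
    for attempt in range(max_retries):
        yield min(cap, base * 2 ** attempt) + rng()


print([round(d, 1) for d in backoff_delays(rng=lambda: 0.0)])
```

Jitter spreads retries from many clients over time, so a brief outage is not followed by a synchronized wave of requests.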
flowchart TD
C[Client] --> DNS[DNS maps video id to cache tier]
DNS --> E[Edge cache]
E -->|miss| R[Regional cache]
R -->|miss| O[Origin / GCS segments]
C --> API[Watch page APIs]
API --> V[(Vitess MySQL)]
API --> B[(Bigtable metadata / thumbs)]
Step 12 — Reliability, security, and observability
Reliability
- Resumable uploads: 308 + Range so mobile networks do not force a full re-upload.
- Vitess resharding: hot channels move without taking down the fleet.
- CDN DNS shifts: rebalance load without redeploying player code.
- Idempotent APIs: duplicate clientRequestId on creator tools must not create two videos.
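The idempotency rule in the last bullet can be sketched as a lookup keyed by the client's request id. The class and the generated ids are hypothetical, and the in-memory dictionary stands in for a durable table with a unique constraint on the request id.

```python
import itertools


class VideoCreator:
    """Creates videos at most once per clientRequestId."""

    def __init__(self):
        self._by_request_id = {}  # stand-in for a durable unique index
        self._counter = itertools.count(1)

    def create(self, client_request_id, title):
        """Return the existing video for a repeated request id."""
        existing = self._by_request_id.get(client_request_id)
        if existing is not None:
            return existing
        video = {"videoId": f"vid{next(self._counter)}", "title": title}
        self._by_request_id[client_request_id] = video
        return video


creator = VideoCreator()
first = creator.create("req-123", "Architecture walkthrough")
retry = creator.create("req-123", "Architecture walkthrough")
print(first["videoId"] == retry["videoId"])
```

A client that times out and retries gets the original video back instead of a duplicate, which makes aggressive retries safe on flaky networks.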
Security and policy
- OAuth2 on Data API; session + API key on InnerTube; short-lived signed CDN URLs.
- Quota units per project; rate limits protect shared control planes.
- Geo and age policy surfaced as playabilityStatus, not mysterious 500 errors.
Observability
- Trace one watch: player → segment GETs → log itag, cache tier, TTFB.
- Metrics: upload queue depth, transcode lag, CDN hit ratio, recommendation serve latency.
- upload.complete and trip.events-class logs for warehouse training and abuse detection.
Step 13 — Debug playback latency (hands-on)
Measure where time goes on a segment fetch—the same breakdown applies to any CDN-backed media API:
curl -w "@-" -o /dev/null -s "https://example.com/segment" <<'EOF'
time_namelookup: %{time_namelookup}s (DNS)
time_connect: %{time_connect}s (TCP)
time_appconnect: %{time_appconnect}s (TLS)
time_starttransfer: %{time_starttransfer}s (TTFB)
time_total: %{time_total}s
EOF
Swap in a real segment URL. Compare regions. DNS slow? TLS slow? First byte slow? That mirrors how you reason about CDN vs. origin issues.
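curl reports these timings as cumulative seconds since the request started, so the per-phase cost is the difference between neighbors. A small sketch of that subtraction, with made-up sample numbers:

```python
def phase_breakdown(t):
    """Convert curl's cumulative timings (seconds) into per-phase durations."""
    return {
        "dns": t["time_namelookup"],
        "tcp": t["time_connect"] - t["time_namelookup"],
        "tls": t["time_appconnect"] - t["time_connect"],
        "server_wait": t["time_starttransfer"] - t["time_appconnect"],
        "download": t["time_total"] - t["time_starttransfer"],
    }


def slowest_phase(t):
    """Name of the phase that consumed the most time."""
    phases = phase_breakdown(t)
    return max(phases, key=phases.get)


# Made-up sample values, in seconds.
SAMPLE = {
    "time_namelookup": 0.02,
    "time_connect": 0.05,
    "time_appconnect": 0.12,
    "time_starttransfer": 0.50,
    "time_total": 0.80,
}
print(slowest_phase(SAMPLE))
```

A large server_wait with small DNS, TCP, and TLS phases points at a cache miss or slow origin rather than the network path to the edge.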
Step 14 — Goals → knobs (quick reference)
| Goal | Knob |
|---|---|
| Playback starts quickly | CDN edge caching, popular content pre-warmed, ABR starting at low bitrate |
| Upload survives bad networks | Chunked resumable uploads, direct-to-object-storage, parallel chunks |
| Metadata stays consistent | MySQL transactions on Vitess shards, clear video state machine (uploading → processing → ready) |
| Homepage feels personal | Two-stage ML, watch-time objective, freshness features, online feature pipelines |
| Ops at huge scale | Vitess resharding, DNS traffic shifts, Borg scheduling, automated failover |
Step 15 — Close the loop (what to practice)
On a whiteboard: three pipes, one upload story, one watch story, label Vitess vs GCS vs CDN on each step.
Out loud: five functional requirements and which non-functional target applies to bytes vs metadata.
With the technical section: trace POST /youtubei/v1/player through to the first 206 segment response.
The one line to remember
YouTube is three systems behind one play button: ingest (resumable write to object storage), process (transcode and index), and serve (Vitess metadata + CDN bytes + two-stage recommendations). Respect the pipe boundaries and the design stays teachable at billion-user scale.