Design Netflix-style video streaming
A global video platform must ingest studio masters, produce an encoding ladder for every device and network, distribute terabytes through a CDN, and play smoothly with adaptive bitrate (ABR) so viewers rarely see buffering. Netflix, Disney+, and YouTube solve the same core problem: cheap reads at massive scale and expensive, asynchronous writes (transcode) hidden from the user.
This guide covers the interview arc—requirements, capacity, architecture, metadata, playback APIs, CDN strategy—and dedicated failure points and failure modes sections at the same depth as the URL shortener, payment, and notification guides on this site.
Design prompt
Design a video-on-demand (VOD) streaming service where users browse a catalog and watch movies and shows on web, mobile, and TV.
Support upload and transcoding, multi-bitrate playback, resume position, and regional availability without melting your origin under prime-time load.
What you should be able to do after reading:
- Separate ingest/transcode, origin storage, CDN edge, and client ABR.
- Explain HLS/DASH manifests, segments, and encoding ladder tradeoffs.
- Size concurrent viewers, egress bandwidth, and catalog storage.
- Design playback authorization and signed URLs.
- Map failure points and failure modes (cache miss storm, stale manifest, partial transcode).
1. Requirements gathering
1.1 Functional requirements
- Catalog browse — titles, seasons, episodes, posters, descriptions, genres, ratings.
- Playback — start, pause, seek, resume from last position across devices.
- Upload / ingest (studio or internal ops) — accept mezzanine file (ProRes, IMF, or high-bitrate MP4).
- Transcoding — generate multiple resolutions/bitrates; package as HLS or DASH.
- Adaptive streaming — client switches quality based on bandwidth and buffer.
- Regional catalog — title available only in licensed countries.
- Subtitles & audio tracks — multiple languages per asset.
- DRM (optional) — Widevine/FairPlay for premium content.
- Analytics — watch time, startup time, rebuffer ratio, quality switches.
Usually out of scope unless asked: live sports low-latency stream (LL-HLS), social features, full recommendation ML, building your own CDN, user-generated content moderation at YouTube scale.
1.2 Non-functional requirements
- Scalability — tens of millions of concurrent viewers at peak; egress is the bottleneck.
- Availability — playback 99.99% for CDN path; catalog API 99.9%.
- Latency — time-to-first-frame < 2–3 s p95; rebuffer rate < 0.5% for good UX.
- Quality — maximize watched resolution without constant quality oscillation.
- Durability — masters and encoded assets replicated across regions.
- Cost — CDN egress dominates; cache hit ratio and encoding efficiency matter.
- Security — signed playback URLs; DRM for premium; geo-blocking for licenses.
Assumptions for capacity math: 200M subscribers; 80M daily active viewers (DAU); 2 hours average watch time per DAU; peak concurrent viewers 15M globally; average delivered bitrate 4 Mbps (ABR blend); catalog 150k titles; new ingest 200 hours of content per day.
2. Capacity estimation
2.1 Watch time and segment requests
DAU = 80,000,000
Watch hours per DAU = 2
Total watch hours per day = 160,000,000 hours
At 4 Mbps average:
Bits per day = 160M h × 3600 s × 4 Mbps
≈ 2.3 × 10^18 bits ≈ 288 PB/day delivered (theoretical upper bound)
In practice CDN caches serve most of these bytes and demand is uneven across the day, so plan egress capacity for peak concurrent load rather than the daily total.
2.2 Peak concurrent viewers and egress
Peak concurrent viewers = 15,000,000
Average bitrate = 4 Mbps
Peak egress from CDN edges ≈ 15M × 4 Mbps = 60 Tbps (order of magnitude)
This is why almost all bytes are served from CDN PoPs, not origin.
Interview tip: state that origin serves cache misses only; design for >95% CDN hit ratio on popular titles.
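The arithmetic in 2.1 and 2.2 can be checked in a few lines. This is a back-of-envelope sketch that uses only the assumptions stated in section 1.2; the variable names are illustrative.

```python
# Back-of-envelope check of sections 2.1 and 2.2 (inputs are the stated assumptions).
DAU = 80_000_000
WATCH_HOURS_PER_DAU = 2
AVG_BITRATE_BPS = 4_000_000      # 4 Mbps ABR blend
PEAK_CONCURRENT = 15_000_000

watch_seconds_per_day = DAU * WATCH_HOURS_PER_DAU * 3600
bits_per_day = watch_seconds_per_day * AVG_BITRATE_BPS
petabytes_per_day = bits_per_day / 8 / 1e15          # decimal petabytes

peak_egress_tbps = PEAK_CONCURRENT * AVG_BITRATE_BPS / 1e12

print(f"{bits_per_day:.2e} bits/day, about {petabytes_per_day:.0f} PB/day")
print(f"peak egress about {peak_egress_tbps:.0f} Tbps")
```

The daily total is useful for cost estimates; the peak figure is what sizes CDN capacity.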
2.3 Segment request rate
HLS with 4-second segments:
Each viewer requests ~1 segment / 4 sec = 0.25 req/s (video) + manifest refreshes
15M viewers × 0.25 ≈ 3.75M segment GETs/sec peak
+ audio + subtitle requests + manifest (lower volume)
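To see how segment length moves the request rate, the same estimate can be parameterized. This is a rough sketch that counts video segments only; the function name is illustrative.

```python
def segment_requests_per_sec(concurrent_viewers: int, segment_seconds: float) -> float:
    """Video segment GETs per second if each viewer fetches one segment per segment duration."""
    return concurrent_viewers / segment_seconds

for seg in (2, 4, 6):
    rps = segment_requests_per_sec(15_000_000, seg)
    print(f"{seg}-second segments: {rps / 1e6:.2f}M video segment GETs/sec")
```

Halving the segment length doubles the request rate at the edge, which is the cost side of faster ABR adaptation.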
2.4 Storage (catalog)
Per title (2-hour movie example):
| Asset | Size (approx.) |
|---|---|
| Mezzanine master (4K) | 50–80 GB |
| Encoded ladder (240p–4K, H.264/HEVC) | 8–15 GB |
| Thumbnails, posters, metadata | < 50 MB |
150k titles × ~12 GB encoded avg ≈ 1.8 PB catalog (order of magnitude)
Cold titles on object storage (S3/GCS); hot titles prefetched to CDN
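A quick check of the catalog size, as a sketch. The encoded figure uses the 12 GB average from above; the master figure of 65 GB per title is an assumption (roughly the middle of the 50–80 GB range), and both treat every title as a single 2-hour movie.

```python
TITLES = 150_000
ENCODED_GB_PER_TITLE = 12     # average of the encoded ladder estimate
MASTER_GB_PER_TITLE = 65      # assumption: middle of the 50-80 GB mezzanine range

GB_PER_PB = 1_000_000         # decimal units

encoded_pb = TITLES * ENCODED_GB_PER_TITLE / GB_PER_PB
masters_pb = TITLES * MASTER_GB_PER_TITLE / GB_PER_PB

print(f"encoded catalog about {encoded_pb:.1f} PB, masters about {masters_pb:.2f} PB")
```

Series with many episodes would push both numbers up, so treat these as a floor for planning.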
2.5 Transcoding throughput
New content per day = 200 hours
Assume 6 renditions, each encoded at 0.5× real time (2 worker-hours per content-hour) on the GPU farm
Worker-hours ≈ 200 × 6 × 2 = 2,400 worker-hours/day (simplified)
Burst: season drop → queue with priority; the SLA is on publish time, not real-time encoding
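The transcode estimate can be turned into a steady-state worker count. This sketch assumes each worker is busy 24 hours a day and ignores burst headroom, so it is a lower bound rather than a fleet size.

```python
import math

CONTENT_HOURS_PER_DAY = 200
RENDITIONS = 6
ENCODE_SPEED = 0.5   # fraction of real time: 0.5 means 2 worker-hours per content-hour

worker_hours_per_day = CONTENT_HOURS_PER_DAY * RENDITIONS / ENCODE_SPEED

# Workers needed just to keep up, assuming perfect utilization and no bursts.
steady_state_workers = math.ceil(worker_hours_per_day / 24)

print(f"{worker_hours_per_day:.0f} worker-hours/day, at least {steady_state_workers} workers")
```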
2.6 Infrastructure sizing (starting point)
| Component | Initial sizing |
|---|---|
| Catalog API | Stateless; 50+ instances; heavy caching |
| Playback API | Issues signed manifest URLs; low CPU |
| Transcode workers | Autoscaling GPU pool; job queue (SQS/Celery/K8s jobs) |
| Object storage | Multi-region buckets; lifecycle to cold tier |
| CDN | Multi-CDN or single with global PoPs + origin shield |
| Metadata DB | PostgreSQL + read replicas; search index (Elasticsearch) |
| Watch progress | Cassandra/DynamoDB keyed by user_id |
3. High-level design
- Ingest service — multipart upload to object storage; creates asset record.
- Transcode pipeline — workers pull jobs; output segments + manifests to origin bucket.
- Catalog service — metadata, availability by region, artwork URLs.
- Playback service — authz check; returns signed manifest URL + DRM license URL.
- Origin — S3/GCS as source of truth for segments.
- CDN — caches segments and manifests at edge; origin shield reduces origin load.
- Client player — ExoPlayer, AVPlayer, Shaka; ABR algorithm.
- Analytics pipeline — client beacons → Kafka → warehouse.
flowchart TB
subgraph ingest [Ingest and transcode]
UP[Upload API]
S3O[(Origin object store)]
Q[Transcode queue]
TC[Transcode workers]
end
subgraph serve [Playback path]
CAT[Catalog API]
PLAY[Playback API]
CDN[CDN edge PoPs]
CL[Client player]
end
subgraph data [Data stores]
META[(Catalog DB)]
PROG[(Watch progress)]
end
STUDIO[Studio upload] --> UP
UP --> S3O
UP --> Q
Q --> TC
TC --> S3O
CL --> CAT
CAT --> META
CL --> PLAY
PLAY --> META
PLAY --> CDN
CL --> CDN
CDN --> S3O
CL --> PROG
Playback flow
- User selects title → client calls GET /v1/titles/{id}/playback with auth token.
- Playback service checks subscription, region license, parental controls.
- Returns signed URL to master manifest (.m3u8 or .mpd) on CDN hostname.
- Player fetches manifest → chooses initial rendition → downloads segments from CDN.
- ABR monitors buffer and throughput → switches up/down bitrate.
- Client sends heartbeat every 30 s with position → watch progress store.
sequenceDiagram
participant C as Client
participant P as Playback API
participant CDN as CDN edge
participant O as Origin
C->>P: GET playback session
P-->>C: signed manifest URL
C->>CDN: GET master.m3u8
CDN-->>C: manifest variants
C->>CDN: GET segment_720p_004.ts
alt cache hit
CDN-->>C: segment bytes
else cache miss
CDN->>O: fetch segment
O-->>CDN: segment bytes
CDN-->>C: segment bytes
end
C->>C: ABR switch to 1080p
4. Database design
4.1 Catalog metadata (relational)
CREATE TABLE titles (
  id            UUID PRIMARY KEY,
  slug          TEXT UNIQUE NOT NULL,
  type          TEXT NOT NULL,                       -- movie | series
  title         TEXT NOT NULL,
  description   TEXT,
  release_year  INT,
  rating        TEXT,
  duration_sec  INT,
  status        TEXT NOT NULL DEFAULT 'processing',  -- processing | published | retired
  created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE seasons (
  id          UUID PRIMARY KEY,
  series_id   UUID NOT NULL REFERENCES titles(id),
  season_num  INT NOT NULL,
  UNIQUE (series_id, season_num)
);

CREATE TABLE episodes (
  id            UUID PRIMARY KEY,
  season_id     UUID NOT NULL REFERENCES seasons(id),
  episode_num   INT NOT NULL,
  asset_id      UUID NOT NULL,
  duration_sec  INT,
  UNIQUE (season_id, episode_num)
);

CREATE TABLE assets (
  id                UUID PRIMARY KEY,
  master_uri        TEXT NOT NULL,
  manifest_uri      TEXT,                            -- CDN path after transcode
  codec_video       TEXT,
  drm_policy        TEXT,
  transcode_status  TEXT NOT NULL DEFAULT 'queued',
  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE regional_availability (
  title_id        UUID NOT NULL REFERENCES titles(id),
  country_code    CHAR(2) NOT NULL,
  available_from  DATE NOT NULL,
  available_to    DATE,
  PRIMARY KEY (title_id, country_code)
);

CREATE TABLE encoding_renditions (
  asset_id      UUID NOT NULL REFERENCES assets(id),
  resolution    TEXT NOT NULL,                       -- 240p | 360p | 720p | 1080p | 4k
  bitrate_kbps  INT NOT NULL,
  playlist_uri  TEXT NOT NULL,
  PRIMARY KEY (asset_id, resolution)
);
4.2 Watch progress (high write volume)
-- Cassandra / DynamoDB
-- PK: user_id, SK: title_id#device_id
{
"user_id": "usr_abc",
"title_id": "ttl_xyz",
"position_sec": 1842,
"duration_sec": 7200,
"updated_at": "2026-05-27T20:15:00Z"
}
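Debounce writes (e.g. one per viewer every 30 s): writing on every position change could approach 15M writes/sec at peak, while a 30 s heartbeat from 15M concurrent viewers is about 500k writes/sec. Batch or sample for analytics separately.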
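One way to debounce is to write through at most once per interval and hold only the latest position in between. The sketch below is a minimal in-memory version; the class name, the `write` callback, and the injectable `clock` are all illustrative assumptions, not part of any particular client SDK.

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (user_id, title_id)

class ProgressDebouncer:
    """Forward at most one progress write per (user, title) per interval."""

    def __init__(self, write: Callable[[str, str, int], None],
                 interval_sec: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._write = write
        self._interval = interval_sec
        self._clock = clock
        self._last_write: Dict[Key, float] = {}
        self._pending: Dict[Key, int] = {}

    def update(self, user_id: str, title_id: str, position_sec: int) -> bool:
        """Record a position; return True if it was written through."""
        key = (user_id, title_id)
        now = self._clock()
        last = self._last_write.get(key)
        if last is None or now - last >= self._interval:
            self._write(user_id, title_id, position_sec)
            self._last_write[key] = now
            self._pending.pop(key, None)
            return True
        self._pending[key] = position_sec  # keep only the latest position
        return False

    def flush(self, user_id: str, title_id: str) -> None:
        """Write any held-back position, e.g. on pause or app background."""
        key = (user_id, title_id)
        if key in self._pending:
            self._write(user_id, title_id, self._pending.pop(key))
            self._last_write[key] = self._clock()
```

Calling `flush` on pause, stop, or app background keeps the stored position close to where the viewer actually left off.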
4.3 Search index
Elasticsearch/OpenSearch for full-text and filters (genre, year, actor). Catalog DB is source of truth; index via CDC.
5. API design
5.1 Get title details
GET /v1/titles/{title_id}?country=US
{
"id": "ttl_xyz",
"title": "Example Movie",
"available": true,
"poster_url": "https://cdn.example/posters/ttl_xyz.jpg",
"duration_sec": 7200
}
5.2 Start playback session
POST /v1/playback/sessions
{
"title_id": "ttl_xyz",
"device_id": "dev_iphone",
"max_resolution": "1080p"
}
200 OK
{
"session_id": "ps_991",
"manifest_url": "https://cdn.example/vod/ttl_xyz/master.m3u8?token=...",
"expires_at": "2026-05-27T21:00:00Z",
"drm": {
"type": "widevine",
"license_url": "https://license.example/wv"
},
"resume_position_sec": 1842
}
5.3 Update watch progress
PUT /v1/playback/sessions/{session_id}/progress
{ "position_sec": 1900 }
5.4 Ingest (internal)
POST /v1/assets/ingest → returns upload URL for multipart upload to object storage; on complete, enqueues transcode job.
6. Diving deep into key components
6.1 Encoding ladder (ABR renditions)
Typical ladder for VOD (H.264 example):
| Rendition | Resolution | Video bitrate | Audio |
|---|---|---|---|
| 1 | 384×216 | 400 kbps | 64 kbps AAC |
| 2 | 640×360 | 800 kbps | 96 kbps |
| 3 | 1280×720 | 2.5 Mbps | 128 kbps |
| 4 | 1920×1080 | 5 Mbps | 128 kbps |
| 5 | 3840×2160 | 15 Mbps | 192 kbps |
- GOP alignment — same keyframe interval (e.g. 2 s) across renditions so ABR switches cleanly.
- Segment duration — 2–6 s; shorter = faster adapt, more requests.
- Codecs — H.264 widest compatibility; HEVC/AV1 for bandwidth savings on supported devices.
- Per-title encode — optimize ladder per complexity (animation vs action).
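GOP alignment is easy to verify mechanically before publishing a ladder. The sketch below assumes each rendition records its keyframe interval and segment duration; the `Rendition` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Rendition:
    name: str
    keyframe_interval_sec: float
    segment_duration_sec: float

def ladder_is_switchable(ladder: List[Rendition]) -> bool:
    """True if all rungs share keyframe and segment timing, and every segment
    holds a whole number of GOPs (so each segment starts on a keyframe)."""
    if not ladder:
        return False
    first = ladder[0]
    for r in ladder:
        if r.keyframe_interval_sec != first.keyframe_interval_sec:
            return False
        if r.segment_duration_sec != first.segment_duration_sec:
            return False
        gops = r.segment_duration_sec / r.keyframe_interval_sec
        if round(gops) < 1 or abs(gops - round(gops)) > 1e-9:
            return False
    return True
```

With a 2 s keyframe interval and 4 s segments, every rung passes; a rung encoded with a 3 s interval would fail the check and should block publish.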
6.2 HLS packaging
# master.m3u8 (simplified)
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8

# media playlist: list of .ts or .m4s segments
#EXTINF:4.0,
segment_00001.ts
DASH (.mpd) is similar; many platforms ship both or pick per platform.
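A packager typically generates the master playlist from the ladder rather than writing it by hand. Here is a minimal sketch that emits the same simplified shape as the HLS example above; the `Variant` tuple and the `<rendition>/playlist.m3u8` layout are assumptions, and a production playlist would carry more attributes (for example CODECS).

```python
from typing import List, Tuple

# (rendition directory, total bandwidth in bits/s, width, height)
Variant = Tuple[str, int, int, int]

def build_master_playlist(variants: List[Variant]) -> str:
    """Emit a simplified HLS master playlist, lowest bandwidth first."""
    lines = ["#EXTM3U"]
    for name, bandwidth_bps, width, height in sorted(variants, key=lambda v: v[1]):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth_bps},RESOLUTION={width}x{height}"
        )
        lines.append(f"{name}/playlist.m3u8")
    return "\n".join(lines) + "\n"

print(build_master_playlist([
    ("1080p", 5_000_000, 1920, 1080),
    ("720p", 2_500_000, 1280, 720),
]))
```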
6.3 CDN strategy
- Cache key — path includes asset version; bump version on re-transcode to avoid stale bytes.
- TTL — long TTL for immutable segments; short TTL for live manifests.
- Origin shield — regional shield PoP collapses misses to one origin fetch per segment.
- Prefetch — push blockbuster to edges before release Friday.
- Multi-CDN — failover and negotiate egress cost (advanced).
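The cache key and TTL rules above can be expressed as two small helpers. This is a sketch under assumptions: the `/vod/<title>/v<version>/...` layout and the specific `max-age` values are illustrative, not a recommendation for any particular CDN.

```python
def segment_path(title_id: str, version: int, rendition: str, index: int) -> str:
    """Immutable path: a re-transcode bumps `version`, so old and new bytes never share a URL."""
    return f"/vod/{title_id}/v{version}/{rendition}/segment_{index:05d}.ts"

def cache_control(path: str) -> str:
    """Long TTL for segments (never rewritten in place), short TTL for manifests."""
    if path.endswith((".ts", ".m4s")):
        return "public, max-age=31536000, immutable"   # assumed: one year
    if path.endswith((".m3u8", ".mpd")):
        return "public, max-age=60"                    # assumed: one minute
    return "no-store"

path = segment_path("ttl_xyz", 3, "720p", 1)
print(path, "->", cache_control(path))
```

Because the version is part of the path, a re-encode never needs a CDN purge: the new manifest simply points at new URLs.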
6.4 Signed URLs and DRM
Manifest and segment URLs carry an HMAC token computed over exp, user_id, title_id, and ip_hash. The CDN validates the token at the edge (token auth module). For DRM, segments are encrypted and the player requests a license from the license server after parsing the manifest.
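The token scheme can be sketched with the standard library. This is a simplified illustration, not any CDN's actual token format: the query parameter names (`exp`, `uid`, `tid`, `sig`), the message layout, and the shared `SECRET` are assumptions, and `ip_hash` is left out for brevity.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-secret-rotate-me"  # assumption: shared by the playback API and the CDN edge

def _signature(path: str, exp: int, user_id: str, title_id: str) -> str:
    message = f"{path}|{exp}|{user_id}|{title_id}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def sign_url(base: str, path: str, user_id: str, title_id: str,
             ttl_sec: int, now: Optional[float] = None) -> str:
    """Playback API side: append expiry, identity, and signature to the URL."""
    exp = int((time.time() if now is None else now) + ttl_sec)
    query = urlencode({"exp": exp, "uid": user_id, "tid": title_id,
                       "sig": _signature(path, exp, user_id, title_id)})
    return f"{base}{path}?{query}"

def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Edge side: reject expired, malformed, or tampered URLs."""
    parts = urlsplit(url)
    q = parse_qs(parts.query)
    try:
        exp = int(q["exp"][0])
        uid, tid, sig = q["uid"][0], q["tid"][0], q["sig"][0]
    except (KeyError, ValueError):
        return False
    if (time.time() if now is None else now) > exp:
        return False
    expected = _signature(parts.path, exp, uid, tid)
    return hmac.compare_digest(sig, expected)  # constant-time comparison
```

Signing the path means a token issued for one title cannot be replayed against another, and the expiry check is what makes the refresh endpoint in section 8.4 necessary for long sessions.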
6.5 Client ABR (conceptual)
Player estimates throughput from last N segments. If buffer > high watermark → switch up; if buffer < low watermark or rebuffer → switch down. Avoid oscillation with hysteresis and minimum dwell time per rendition.
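The watermark and dwell-time rules can be sketched as a single decision function. The thresholds below (10 s and 25 s watermarks, 10 s dwell, 0.8 safety factor) are assumptions for illustration; real players such as ExoPlayer or Shaka use their own, more elaborate logic. The ladder values are the video plus audio bitrates from section 6.1.

```python
from statistics import median
from typing import Sequence

LADDER_KBPS = [464, 896, 2628, 5128, 15192]  # video + audio per rung, from 6.1

def choose_rung(current: int, throughput_kbps: Sequence[float], buffer_sec: float,
                sec_since_switch: float, low_water: float = 10.0,
                high_water: float = 25.0, min_dwell: float = 10.0,
                safety: float = 0.8) -> int:
    """Return the index into LADDER_KBPS to use for the next segment."""
    budget = safety * median(throughput_kbps)  # median damps one-off spikes
    # Emergency: buffer nearly empty, step down at once regardless of dwell time.
    if buffer_sec < low_water and current > 0:
        return current - 1
    # Hysteresis: otherwise hold the rung until the minimum dwell time has passed.
    if sec_since_switch < min_dwell:
        return current
    # Step up only with a healthy buffer and headroom for the next rung.
    if (buffer_sec > high_water and current + 1 < len(LADDER_KBPS)
            and LADDER_KBPS[current + 1] <= budget):
        return current + 1
    # Step down if the current rung no longer fits the throughput budget.
    if current > 0 and LADDER_KBPS[current] > budget:
        return current - 1
    return current
```

Moving one rung at a time and enforcing a dwell period trades a slower climb for fewer visible quality flips.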
6.6 Regional catalog
Resolve the viewer's country from the account profile plus GeoIP at playback time. Filter catalog queries by that country, and reject the playback request if regional_availability has no active window for the title there. Metadata is replicated globally; license rules live in the DB.
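The availability check itself is a date-window lookup. The sketch below mirrors the regional_availability table from section 4.1 with an in-memory dict and takes an already-resolved country code as input; the `Window` type and function name are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class Window:
    available_from: date
    available_to: Optional[date]  # None means open-ended

def is_playable(windows: Dict[Tuple[str, str], Window], title_id: str,
                country_code: str, today: date) -> bool:
    """True if a license window for (title, country) covers `today`."""
    w = windows.get((title_id, country_code))
    if w is None or today < w.available_from:
        return False
    return w.available_to is None or today <= w.available_to
```

Using the same function for the browse filter and the playback check is one way to avoid the "visible but 403" mismatch described in section 8.7.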
6.7 Transcode pipeline reliability
- Upload completes → S3 event → enqueue job with asset_id.
- Worker downloads master to local SSD → ffmpeg/encoder farm → uploads segments.
- Validate output (duration match, no silent audio) → mark asset published.
- Failed job → retry with backoff; permanent fail → alert ops, block publish.
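The validation step can include a completeness gate: an asset only becomes published when every expected rendition exists and every segment its playlist references is present at the origin. A minimal sketch, assuming the playlists are already parsed into lists of object keys and the origin listing is available as a set (both hypothetical inputs):

```python
from typing import Dict, List, Set

def missing_segments(playlists: Dict[str, List[str]],
                     uploaded: Set[str]) -> Dict[str, List[str]]:
    """Map each rendition to the segment keys its playlist references but the origin lacks."""
    gaps: Dict[str, List[str]] = {}
    for rendition, keys in playlists.items():
        absent = [k for k in keys if k not in uploaded]
        if absent:
            gaps[rendition] = absent
    return gaps

def next_status(expected: Set[str], playlists: Dict[str, List[str]],
                uploaded: Set[str]) -> str:
    """Return 'published' only if the whole ladder is present and complete."""
    if set(playlists) != expected:
        return "processing"   # a rendition is missing entirely
    if any(len(keys) == 0 for keys in playlists.values()):
        return "processing"   # an empty playlist is never publishable
    if missing_segments(playlists, uploaded):
        return "processing"
    return "published"
```

A worker that crashes mid-ladder then leaves the asset in processing, so no manifest ever advertises a rendition whose segments do not exist.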
7. Failure points
Failure points are places where faults cause buffering, wrong content, unauthorized playback, or origin overload. Design assuming CDN helps but does not eliminate these risks.
| # | Failure point | Fault | Symptom | Mitigation design |
|---|---|---|---|---|
| FP1 | Multipart upload → object store | Incomplete master file | Transcode produces corrupt output | ETag verification; resume upload; don’t enqueue until complete |
| FP2 | Transcode worker | Crash mid-ladder (only 360p done) | Manifest lists 1080p but segments missing | Atomic publish: all renditions OR status processing; integration tests |
| FP3 | Catalog DB vs CDN | Title published before CDN prefetch | Mass cache miss on launch night | Staged publish; warm CDN; origin shield |
| FP4 | CDN → origin on miss | Thundering herd on new hit show | Origin 503; global buffering | Origin shield, rate limit miss concurrency, prefetch |
| FP5 | Manifest vs segment version | Manifest points to deleted segment path | 404 on segment; player stall | Versioned path per transcode job; CDN cache bust via path v2 |
| FP6 | Playback API → signed URL | Clock skew; expired token mid-movie | Playback stops at 55 min | Long TTL + silent refresh endpoint; segment URLs independent |
| FP7 | DRM license server | License denied | Black screen on 4K only | HA license cluster; fallback to non-4K clear stream if policy allows |
| FP8 | Client ABR | Bad bandwidth estimate | Constant rebuffer or stuck at 240p | Hysteresis; throughput median; cap switch rate |
| FP9 | Single CDN PoP failure | Regional outage | One country rebuffer spike | DNS failover; anycast; multi-CDN |
| FP10 | Watch progress write | DB timeout | Resume from start after crash | Local cache on client; retry queue; merge max(position) |
flowchart LR
UP[Upload] -->|FP1| S3[(Origin)]
S3 -->|FP2| TC[Transcode]
TC --> S3
PUB[Publish] -->|FP3| CDN[CDN]
CDN -->|FP4| S3
MAN[Manifest] -->|FP5| CDN
API[Playback API] -->|FP6| CL[Client]
DRM[License server] -->|FP7| CL
CL -->|FP8| CL
POP[CDN PoP] -->|FP9| CL
CL -->|FP10| PROG[(Progress DB)]
8. Failure modes
Failure modes are recurring patterns interviewers expect you to name—with user impact and safe response.
8.1 Cache miss storm (origin meltdown)
Symptom: New season drops; millions buffer; origin CPU 100%.
Cause: FP3, FP4 — no prefetch; viral title cold on CDN.
Safe response: Prefetch to edges; origin shield; cap concurrent miss fetches; temporarily reduce ladder max bitrate.
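The core idea behind an origin shield, collapsing concurrent misses for the same object into one origin fetch, can be sketched with a small "single flight" helper. This is an illustrative in-process version using threads; a real CDN implements request coalescing inside the cache tier, and the class names here are hypothetical.

```python
import threading
from typing import Callable, Dict, Optional

class _Call:
    """State shared by the leader and the followers waiting on one fetch."""
    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: bytes = b""
        self.error: Optional[BaseException] = None

class SingleFlight:
    """Collapse concurrent fetches of the same key into one origin call."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def get(self, key: str) -> bytes:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if call is None:
                call = _Call()
                self._inflight[key] = call
        if leader:
            try:
                call.result = self._fetch(key)   # the only origin request
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()                  # wake all followers
        else:
            call.done.wait()                     # followers reuse the leader's result
        if call.error is not None:
            raise call.error
        return call.result
```

This sketch only deduplicates requests that overlap in time; pairing it with a cache in front is what keeps later requests away from the origin as well.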
8.2 Partial transcode published
Symptom: 1080p option plays 5 seconds then stalls.
Cause: FP2 — manifest updated before all segments uploaded.
Safe response: Gate published on validation job; integration test playlist completeness.
8.3 Stale CDN segment (wrong bytes)
Symptom: Glitch or wrong scene after re-encode fix.
Cause: FP5 — same URL path reused for new encode.
Safe response: Versioned asset path (/v3/segment.ts); never overwrite in place.
8.4 Token expiry mid-playback
Symptom: Movie stops with auth error near end.
Cause: FP6 — short signed URL TTL.
Safe response: Refresh token API; separate long-lived session vs short segment cookies if needed.
8.5 DRM license failure
Symptom: 4K TV cannot play; mobile works.
Cause: FP7 — device security level; license server region down.
Safe response: Fall back to an unencrypted (clear) lower rung where policy allows; show an explicit error to the user; run the license service in multiple regions.
8.6 ABR thrashing
Symptom: Quality ping-pongs 240p ↔ 1080p; battery drain.
Cause: FP8 — noisy throughput on mobile.
Safe response: Minimum 10–20 s between switches; buffer-based rules.
8.7 Regional blackout mismatch
Symptom: Title visible in browse but playback 403.
Cause: Catalog cache stale vs playback geo check.
Safe response: Single source for availability; include playable flag in browse API per country.
8.8 Upload corruption
Symptom: Silent audio on one episode only.
Cause: FP1 — incomplete multipart.
Safe response: Checksum on complete; automated QC (loudness, black frames).
8.9 Progress loss on device switch
Symptom: Phone offers to start from the beginning although the TV session was 80% complete.
Cause: FP10 — debounced write not flushed; per-device keys.
Safe response: Merge progress by max(position) per user+title across devices.
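The merge rule is a max over per-device rows. A minimal sketch, assuming rows shaped like the watch progress record in section 4.2:

```python
from typing import Dict, Iterable, Tuple

def merge_progress(rows: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    """Collapse per-device rows to the furthest position per (user_id, title_id)."""
    merged: Dict[Tuple[str, str], int] = {}
    for row in rows:
        key = (row["user_id"], row["title_id"])
        merged[key] = max(merged.get(key, 0), row["position_sec"])
    return merged

rows = [
    {"user_id": "usr_abc", "title_id": "ttl_xyz", "device_id": "tv", "position_sec": 5760},
    {"user_id": "usr_abc", "title_id": "ttl_xyz", "device_id": "phone", "position_sec": 0},
]
print(merge_progress(rows))
```

One tradeoff of max(position): a deliberate rewatch from the start will still resume at the furthest point, so some designs prefer the most recent updated_at instead.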
8.10 PoP / ISP congestion
Symptom: Users on one ISP rebuffer; others are fine.
Cause: FP9 — the bottleneck is the ISP's last mile or peering, not only your CDN.
Safe response: Lower initial rendition; multi-CDN; partner caching; QoE analytics by ASN.
| Failure mode | Primary failure points | User impact | Core mitigation |
|---|---|---|---|
| Cache miss storm | FP3, FP4 | Buffering | Prefetch + origin shield |
| Partial transcode | FP2 | Broken quality level | Atomic publish gate |
| Stale CDN bytes | FP5 | Glitches | Versioned paths |
| Token expiry | FP6 | Playback stops | Token refresh |
| DRM failure | FP7 | Cannot play | HA license + fallback |
| ABR thrashing | FP8 | Poor QoE | Hysteresis |
| Geo mismatch | None above (stale catalog cache vs. playback geo check) | 403 surprise | Unified availability |
| Upload corruption | FP1 | Bad asset | Checksum + QC |
| Progress loss | FP10 | Bad resume | Cross-device merge |
| ISP congestion | FP9 | Regional rebuffer | ABR + multi-CDN |
9. Scalability, availability, and security
9.1 Scalability
- Offload bytes to CDN; scale catalog/playback APIs horizontally.
- Transcode is batch/async—scale GPU workers independently of peak viewership.
- Partition analytics by time; don’t block playback path.
- Popular content: replicate metadata in Redis; edge dictionary for “top 10” lists.
9.2 Availability
- Multi-region origin replication; cross-region replication for masters.
- CDN is the availability backbone—monitor per-PoP error rate.
- Degraded mode: serve lower rungs only if 4K origin path unhealthy.
9.3 Security
- Signed URLs; short-lived playback sessions; loosely bind tokens to a device fingerprint.
- DRM for premium; watermarking (forensic) for screeners optional.
- Geo-blocking at playback API and CDN edge.
- Rate limit ingest APIs; virus scan on uploads.
10. Tradeoffs recap
| Decision | Common choice | Why |
|---|---|---|
| HLS vs DASH | Both for max devices | Platform player support |
| Segment length | 4 s | Balance startup vs request overhead |
| More renditions | 5–6 rungs | Smoother ABR; more storage/transcode cost |
| Push vs pull CDN | Pull with prefetch for hits | Cost effective at scale |
| Strong vs eventual catalog | Eventual for browse OK | Playback authz must be correct now |
11. How to present this in 45 minutes
- 5 min — clarify VOD vs live; functional requirements; out of scope.
- 7 min — capacity: concurrent viewers, Tbps egress, segment RPS, storage.
- 8 min — diagram: upload → transcode → origin → CDN → player ABR.
- 8 min — encoding ladder, HLS manifest, signed URLs, regional catalog.
- 10 min — failure points + failure modes (cache miss storm, partial transcode, token expiry).
- 7 min — DRM optional, tradeoffs, extensions (live, recommendations).
The one line to remember
Video streaming at scale is write-heavy once, read-heavy forever: transcode asynchronously into an immutable segment ladder, push bytes through a CDN, and let the client’s ABR adapt—while you guard the origin from cache miss storms and never publish a manifest until every rendition is real.