How Spotify works at scale

Music streaming looks like “press play and hear a song.” Behind that button are three hard problems: a catalog of licensed metadata for tens of millions of tracks, a delivery path that serves audio bytes with almost no buffering on bad networks, and a personalization layer that turns listening history into the right next song.

We work through the design in order—requirements first, numbers second, architecture third, APIs last—using a Spotify-class product as the mental model, not any one company’s private implementation.

What you should be able to do after reading:

Separate the three loops—catalog, delivery, personalization—and assign stores to each.
List functional and non-functional requirements for listeners, rightsholders, and the platform.
Walk one play: resolve track → get stream URL → CDN range fetch → playback event → ranker feedback.
Explain multi-bitrate encoding, offline encryption, and why search and home are different services.
Read the technical section: Web API, playback tokens, and event schemas.

Step 0 — How we will work through the problem

Ordered thinking beats memorizing a logo slide. Use this sequence when you design audio streaming:

Clarify scope. On-demand only, or radio/podcasts? Social features? Hi-fi tier? Offline on mobile only?
Write requirements. Functional = play, search, playlists. Non-functional = startup latency, rebuffer rate, royalty reporting.
Do napkin math. Catalog size, concurrent streams, MB/min per bitrate tier—so CDN egress is not a surprise.
Draw three loops before naming Cassandra or Kafka.
Tell one story—user taps play on home recommendation—then skip, offline, and region-blocked track.

flowchart LR
  subgraph catalog [Catalog loop]
    ING[Ingest masters] --> ENC[Encode ladders]
    ENC --> META[(Metadata DB)]
  end
  subgraph delivery [Delivery loop]
    PLAY[Play request] --> CDN[Audio CDN]
    CDN --> CLIENT[Client buffer]
  end
  subgraph personal [Personalization loop]
    EVT[Listening events] --> FEAT[Feature store]
    FEAT --> RANK[Home / mixes ranker]
    RANK --> PLAY
  end
  META --> PLAY

Step 1 — Functional requirements (listeners, catalog, business)

Actor	Requirement	Why scale makes it hard
Listener	Search artists, albums, tracks, podcasts	Low-latency full-text + popularity signals
Listener	Play, pause, seek, skip, queue, crossfade	Stateful session; gapless handoff
Listener	Home, Discover, Daily Mix, radio stations	Per-user ML ranking at request time
Listener	Create/share playlists; collaborative edits	CRUD + social graph edges
Listener	Offline downloads (premium)	Encrypted local files + license expiry
Listener	Connect to devices (speaker, TV, car)	Multiple active endpoints; volume sync
Catalog	Ingest new releases; takedowns by region	Rights matrix per track × territory
Business	Royalty reporting, ads on free tier	Accurate play counts; fraud detection
Artist	Upload via distributor; view stats	Separate pipeline from consumer play path

Functional details worth stating clearly

Playable ≠ in catalog. A track row may exist but be greyed out in your country—rights are a filter at play time.

Stream URL is short-lived. Clients refresh playback tokens; URLs are not permanent deep links to MP3 files.

Out of scope today (say it aloud). Building a global music licensing body or a lossless studio mastering pipeline from scratch: park both.

Step 2 — Non-functional requirements (engineering promises)

Category	Target (typical)	How we meet it	If we miss it
Latency — time to first byte	< 200–500 ms after tap	CDN edge, warm connections, small manifest	Users think app is broken
Quality — rebuffer rate	Very low % of listening time	ABR ladder, client buffer, CDN capacity	Churn on cellular
Availability — play API	99.9%+ monthly	Multi-region metadata, CDN failover	Global outage memes
Correctness — play counts	Auditable for royalties	Idempotent play events, dedupe rules	Legal disputes
Freshness — home feed	Update daily/hourly mixes	Batch + streaming feature pipelines	Stale recommendations
Cost	CDN egress dominates	Efficient codecs, peer-assisted optional, cache hit ratio	Unsustainable unit economics
Privacy	GDPR delete/export	User data partitioning, event retention TTL	Regulatory fines

Key idea: Bytes are expensive; metadata and rankings are cheap per request. Optimize delivery and encode ladders before buying more recommendation GPUs.

Step 3 — Napkin math (catalog, streams, and egress)

On the order of 100M tracks in the catalog (an order-of-magnitude figure that counts duplicate uploads before they are mapped to one logical track).
~600M+ monthly active listeners; peak concurrent streams in the tens of millions globally.
128 kbps ≈ 1 MB/min; 320 kbps ≈ 2.4 MB/min. 1 hour at 160 kbps ≈ 70 MB egress per user-hour from CDN edge.
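10M simultaneous streams × 1 Mbps ≈ 10 Tbps aggregate delivery. Note that 1 Mbps is well above the audio rungs used elsewhere here, so treat it as headroom for bursts and prefetch; at the 160 kbps above, the same 10M streams need about 1.6 Tbps. Either way this is a job for CDN and peering, not one origin server.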
Metadata row per track is small (KB); cover art and audio files live in object storage + CDN.
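The arithmetic above is easy to re-derive. A minimal sketch, assuming decimal units (1 MB = 10^6 bytes) and two hypothetical helper names:

```python
def mb_per_minute(kbps: float) -> float:
    """Megabytes transferred per minute at a constant bitrate (1 MB = 10**6 bytes)."""
    bytes_per_second = kbps * 1000 / 8
    return bytes_per_second * 60 / 1_000_000


def aggregate_tbps(streams: int, mbps_per_stream: float) -> float:
    """Total delivery bandwidth in terabits per second."""
    return streams * mbps_per_stream / 1_000_000


print(round(mb_per_minute(128), 2))                # 0.96
print(round(mb_per_minute(320), 2))                # 2.4
print(round(mb_per_minute(160) * 60))              # 72 (MB per user-hour)
print(round(aggregate_tbps(10_000_000, 1.0), 2))   # 10.0
print(round(aggregate_tbps(10_000_000, 0.16), 2))  # 1.6
```

Changing the per-stream bitrate is the fastest way to see how sensitive the egress bill is to the default quality tier.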

Step 4 — Architecture: three loops

Catalog services own canonical track ids, the album/artist graph, and rights by territory. The playback service checks entitlements and returns signed CDN URLs or an edge manifest. Personalization consumes listening events (Kafka), updates features, and serves ranked lists to the home and radio APIs. Clients are thick: they cache, decode, run ABR, and hold the offline vault.

flowchart TB
  subgraph clients [Clients]
    APP[Mobile / desktop / web]
  end
  subgraph edge [Edge APIs]
    GW[API gateway]
    SRCH[Search]
    HOME[Home / playlists]
    PLAY[Playback]
  end
  subgraph catalog [Catalog]
    CAT[(Metadata store)]
    RIGHTS[Rights engine]
    OBJ[(Audio object store)]
  end
  subgraph delivery [Delivery]
    CDN[CDN / edge caches]
  end
  subgraph ml [Personalization]
    K[("Event bus")]
    FS[Feature store]
    REC[Rankers]
  end
  APP --> GW
  GW --> SRCH --> CAT
  GW --> HOME --> REC
  REC --> FS
  GW --> PLAY --> RIGHTS
  PLAY --> CDN
  OBJ --> CDN
  APP --> CDN
  APP --> K
  K --> FS --> REC

Step 5 — Walk one play end to end

Home: the app loads a ranked shelf from GET /v1/views/home (personalized track uris + context).
User taps track: the client calls PUT /v1/me/player/play with uris or a context_uri.
Playback service resolves track id → internal audio file ids; the rights check covers user country + subscription tier.
Stream manifest: returns available bitrates (96/160/320 kbps, Ogg Vorbis or AAC) and a signed URL or token for the CDN host.
CDN: the client fetches segments with HTTP range requests; ABR picks a rung based on bandwidth and buffer.
Events: the client sends play.start, play.progress (30 s is the royalty threshold), and skip events to the event pipeline.
Feedback: stream processors update the user’s taste profile; tomorrow’s Discover mix reflects today’s skips.
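The CDN step relies on HTTP range requests. As a rough sketch of how a client might turn a seek position into a Range header, assuming constant-bitrate audio (real players usually consult a seek table inside the file) and a hypothetical helper name and chunk size:

```python
def range_header_for_seek(seek_ms: int, bitrate_kbps: int,
                          chunk_bytes: int = 128 * 1024) -> dict:
    """Approximate the byte range to fetch after a seek, assuming constant bitrate."""
    # kbps * 1000 / 8 bytes per second, divided by 1000 ms per second
    start = seek_ms * bitrate_kbps // 8
    end = start + chunk_bytes - 1  # the end offset of a Range header is inclusive
    return {"Range": f"bytes={start}-{end}"}


# Seek to 1:00 in a 160 kbps file.
print(range_header_for_seek(60_000, 160))  # {'Range': 'bytes=1200000-1331071'}
```

The server answers a valid range with 206 Partial Content, so the client never downloads the first minute it skipped.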

sequenceDiagram
  participant C as Client
  participant P as Playback API
  participant R as Rights
  participant CDN as CDN
  participant E as Events
  C->>P: start play track_uri
  P->>R: territory + tier OK?
  R-->>P: allowed
  P-->>C: stream URLs + formats
  C->>CDN: GET audio segment
  CDN-->>C: bytes
  C->>E: play.progress 30s

Step 6 — Catalog, metadata, and rights

Canonical graph: Artist → Album → Track with ISRC identifiers mapping distributor uploads to one logical track. Rights table: (track_id, territory) → allow | block | window. Takedowns propagate to search index and invalidate cached stream manifests within minutes.

Artwork and credits are metadata; audio masters are separate blobs referenced by file_id.
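The rights lookup can be pictured as a keyed table with an optional time window. The sketch below is illustrative only: the `RightsRow` type, the in-memory dict, and the default-deny policy for missing rows are assumptions, not a description of any real rights engine.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RightsRow:
    track_id: str
    territory: str
    allowed: bool
    valid_from: Optional[datetime] = None  # None means no lower bound
    valid_to: Optional[datetime] = None    # None means no upper bound


def is_playable(rows: dict, track_id: str, territory: str, now: datetime) -> bool:
    """Assumed default-deny: a missing row means the track is not licensed here."""
    row = rows.get((track_id, territory))
    if row is None or not row.allowed:
        return False
    if row.valid_from is not None and now < row.valid_from:
        return False  # release window has not opened yet
    if row.valid_to is not None and now >= row.valid_to:
        return False  # license window has closed
    return True


RIGHTS = {
    ("t_1", "DE"): RightsRow("t_1", "DE", allowed=True),
    ("t_1", "US"): RightsRow("t_1", "US", allowed=False),
    ("t_2", "DE"): RightsRow("t_2", "DE", allowed=True,
                             valid_from=datetime(2030, 1, 1, tzinfo=timezone.utc)),
}
NOW = datetime(2026, 5, 19, tzinfo=timezone.utc)

print(is_playable(RIGHTS, "t_1", "DE", NOW))  # True
print(is_playable(RIGHTS, "t_1", "US", NOW))  # False: blocked
print(is_playable(RIGHTS, "t_2", "DE", NOW))  # False: window not open yet
print(is_playable(RIGHTS, "t_9", "DE", NOW))  # False: no row
```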

Step 7 — Encoding ladders and storage

Ingest receives masters (WAV/FLAC); transcode farm produces a ladder of bitrates and codecs (historically Vorbis in Ogg; AAC for some devices). Loudness normalization (EBU R128) keeps perceived volume consistent.

Store segments or whole files per bitrate in object storage with checksum.
Version files when re-encoded; playback points at active version id.
Podcasts may use separate host/CDN policy (long files, different ads).

Step 8 — CDN delivery and signed URLs

Origin is object storage; CDN caches hot tracks at edge POPs near users. Signed URLs include expiry and HMAC so links cannot be shared forever. HTTP Range requests enable seek without downloading entire file.
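One way to picture an expiring signed URL is an HMAC over the path plus the expiry time. This is a minimal sketch: the host `cdn.example.com`, the `exp` and `sig` parameter names, and the hard-coded secret are all hypothetical, and commercial CDNs each define their own signing scheme.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical; real keys come from a secret manager


def _signature(path: str, expires_at: int, secret: bytes) -> str:
    return hmac.new(secret, f"{path}:{expires_at}".encode(), hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Attach an expiry (Unix seconds) and an HMAC-SHA256 signature to a CDN path."""
    query = urlencode({"exp": expires_at, "sig": _signature(path, expires_at, secret)})
    return f"https://cdn.example.com{path}?{query}"


def verify_url(url: str, now: int, secret: bytes = SECRET) -> bool:
    """Edge-side check: reject expired links and any link whose path or expiry changed."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires_at = int(query["exp"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    if now >= expires_at:
        return False
    return hmac.compare_digest(sig, _signature(parsed.path, expires_at, secret))


url = sign_url("/audio/f_123/160.ogg", expires_at=1_000)
print(verify_url(url, now=999))                             # True
print(verify_url(url, now=1_000))                           # False: expired
print(verify_url(url.replace("f_123", "f_999"), now=999))   # False: path tampered with
```

Because the expiry is part of the signed message, a client cannot extend a link by editing the `exp` value.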

Sanity check: If only origin serves traffic, egress bill and latency explode—CDN hit ratio is a first-class metric.

Step 9 — Client playback: buffer, ABR, and gapless

Buffer target — several seconds ahead; rebuffer when buffer drains.
ABR — switch bitrate up/down based on throughput estimate; avoid oscillation (hysteresis).
Gapless / crossfade — prefetch next track; align encoder delay metadata.
Background audio — OS audio session rules on iOS/Android; handle interruptions (calls).
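The ABR bullet above mentions hysteresis. A small sketch of the idea, assuming the 96/160/320 kbps ladder from the walkthrough; the function name, margins, and buffer threshold are illustrative values, not tuned numbers from any real player:

```python
LADDER_KBPS = [96, 160, 320]  # rungs named in the walkthrough


def next_rung(current: int, throughput_kbps: float, buffer_s: float,
              up_margin: float = 1.5, down_margin: float = 1.1,
              min_buffer_up_s: float = 10.0) -> int:
    """Return the ladder index to use for the next segment.

    Step down when throughput cannot sustain the current rung with a small
    margin. Step up only when throughput clears the next rung by a larger
    margin and the buffer is healthy. The gap between the two margins is the
    hysteresis that prevents flapping between rungs.
    """
    if current > 0 and throughput_kbps < LADDER_KBPS[current] * down_margin:
        return current - 1
    can_go_up = current + 1 < len(LADDER_KBPS)
    if (can_go_up and buffer_s >= min_buffer_up_s
            and throughput_kbps >= LADDER_KBPS[current + 1] * up_margin):
        return current + 1
    return current


print(next_rung(1, throughput_kbps=600, buffer_s=20))  # 2: plenty of headroom
print(next_rung(1, throughput_kbps=400, buffer_s=20))  # 1: not enough margin to step up
print(next_rung(1, throughput_kbps=150, buffer_s=20))  # 0: cannot sustain 160 kbps
```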

Step 10 — Search and browse

Search combines inverted index (artist/title aliases), fuzzy match, and popularity boosts. Browse categories are editorial playlists + rules—not full ML rank on every shelf. Search must respect rights: hide unplayable tracks or label “unavailable in your region.”
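A toy version of that search path, covering only the inverted index, a popularity boost, and the rights filter. Fuzzy matching is left out, the sample tracks are invented, and the 0.1 boost weight is an arbitrary assumption:

```python
import math
from collections import defaultdict

TRACKS = {  # invented sample data
    "t_1": {"title": "Night Drive", "artist": "Aurora Lane", "plays": 1_000_000, "blocked_in": set()},
    "t_2": {"title": "Night Drive", "artist": "Cover Band", "plays": 1_000, "blocked_in": set()},
    "t_3": {"title": "Night Shift", "artist": "Aurora Lane", "plays": 50_000, "blocked_in": {"DE"}},
}


def build_index(tracks):
    """Map each lowercase token in title or artist to the track ids containing it."""
    index = defaultdict(set)
    for track_id, t in tracks.items():
        for token in f"{t['title']} {t['artist']}".lower().split():
            index[token].add(track_id)
    return index


def search(query, tracks, index, territory):
    scores = defaultdict(float)
    for token in query.lower().split():
        for track_id in index.get(token, ()):
            scores[track_id] += 1.0  # one point per matching query token
    results = []
    for track_id, text_score in scores.items():
        t = tracks[track_id]
        if territory in t["blocked_in"]:
            continue  # hide tracks the listener cannot play
        boost = 0.1 * math.log10(1 + t["plays"])  # popularity breaks ties
        results.append((text_score + boost, track_id))
    return [track_id for _, track_id in sorted(results, reverse=True)]


INDEX = build_index(TRACKS)
print(search("night drive", TRACKS, INDEX, "DE"))  # ['t_1', 't_2']
print(search("night drive", TRACKS, INDEX, "US"))  # ['t_1', 't_2', 't_3']
```

The same query returns different results per territory, which is why the rights filter belongs inside search rather than only at play time.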

Step 11 — Recommendations: home, Discover, radio

Candidate generation: collaborative filtering (“users like you”), content features (genre, tempo), and graph walks on follow data. Ranking: an ML model scores candidates with context (time of day, device, recent skips). Filters: diversity (not 20 songs by the same artist), freshness, policy blocks.

Discover Weekly: a weekly batch job per user; expensive, so the result is a precomputed playlist uri.
Radio: seed track/artist; infinite queue via related-artist graph + ranker.
Events: a skip before 30 s is a negative signal; a full listen is a positive one.
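The rank-then-filter stage can be sketched in a few lines. Here a hand-written scoring function stands in for the ML ranker, and the affinity values, skip penalty, and per-artist cap are made-up numbers for illustration:

```python
def rank_with_diversity(candidates, score, max_per_artist=2, limit=10):
    """Sort candidates by score, then cap how many tracks one artist may contribute."""
    picked, per_artist = [], {}
    for c in sorted(candidates, key=score, reverse=True):
        seen = per_artist.get(c["artist"], 0)
        if seen >= max_per_artist:
            continue  # diversity filter
        per_artist[c["artist"]] = seen + 1
        picked.append(c["track_id"])
        if len(picked) == limit:
            break
    return picked


CANDIDATES = [  # invented output of candidate generation
    {"track_id": "a1", "artist": "A", "affinity": 0.90},
    {"track_id": "a2", "artist": "A", "affinity": 0.80},
    {"track_id": "a3", "artist": "A", "affinity": 0.70},
    {"track_id": "b1", "artist": "B", "affinity": 0.60},
    {"track_id": "c1", "artist": "C", "affinity": 0.95},
]
RECENTLY_SKIPPED_ARTISTS = {"C"}


def score(c):
    """Stand-in for the ranker: affinity minus a penalty for recently skipped artists."""
    penalty = 0.5 if c["artist"] in RECENTLY_SKIPPED_ARTISTS else 0.0
    return c["affinity"] - penalty


print(rank_with_diversity(CANDIDATES, score))           # ['a1', 'a2', 'b1', 'c1']
print(rank_with_diversity(CANDIDATES, score, limit=3))  # ['a1', 'a2', 'b1']
```

Track c1 has the highest raw affinity but drops to last place after the skip penalty, and a3 is filtered out by the per-artist cap.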

Step 12 — Playlists, social, and collaboration

Playlists are ordered lists of track uris + metadata (name, cover collage). Collaborative playlists need conflict resolution on reorder (last-write-wins or a lightweight operational-transform scheme). Social features (follow friends, blend playlists, activity feed) run at far lower QPS than the play path.
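A last-write-wins reorder can be sketched as replaying move operations in timestamp order. The tuple layout `(timestamp, editor_id, uri, new_index)` and the rule of dropping moves for removed tracks are assumptions made for this example:

```python
def apply_moves(track_uris, moves):
    """Replay reorder operations sorted by (timestamp, editor_id); later moves win."""
    order = list(track_uris)
    for _, _, uri, new_index in sorted(moves):
        if uri not in order:
            continue  # track was removed by another editor; drop the move
        order.remove(uri)
        order.insert(min(new_index, len(order)), uri)
    return order


PLAYLIST = ["a", "b", "c", "d"]
MOVES = [
    (2, "bob", "a", 3),    # arrives first but carries the later timestamp
    (1, "alice", "a", 1),
]

print(apply_moves(PLAYLIST, MOVES))  # ['b', 'c', 'd', 'a']
```

Sorting before applying makes the result independent of arrival order, so every replica that sees the same set of moves converges on the same playlist.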

Step 13 — Offline downloads and DRM

Premium offline: download encrypted blobs + license file bound to device/user; periodic phone-home to renew or expire. Storage quota per device; evict LRU when full. Offline play still emits events when back online (batch upload).
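The quota and license rules can be modelled with an ordered map. This sketch ignores encryption entirely; the class name, the abstract byte sizes, and the time-to-live parameter are hypothetical:

```python
from collections import OrderedDict


class OfflineVault:
    """Tracks downloaded files in least-recently-used order and gates play on license age."""

    def __init__(self, quota_bytes: int, license_ttl_s: int):
        self.quota_bytes = quota_bytes
        self.license_ttl_s = license_ttl_s
        self.last_renewed_at = 0
        self.files = OrderedDict()  # track_id -> size in bytes, oldest use first

    def renew_license(self, now: int) -> None:
        """Called after a successful phone-home."""
        self.last_renewed_at = now

    def add(self, track_id: str, size_bytes: int) -> None:
        self.files[track_id] = size_bytes
        self.files.move_to_end(track_id)
        # Evict least recently used files until the vault fits the quota again.
        while sum(self.files.values()) > self.quota_bytes and len(self.files) > 1:
            self.files.popitem(last=False)

    def can_play(self, track_id: str, now: int) -> bool:
        if now - self.last_renewed_at > self.license_ttl_s:
            return False  # license lapsed; the device must go online to renew
        if track_id not in self.files:
            return False
        self.files.move_to_end(track_id)  # mark as recently used
        return True


vault = OfflineVault(quota_bytes=10, license_ttl_s=100)
vault.renew_license(now=0)
vault.add("a", 4)
vault.add("b", 4)
print(vault.can_play("a", now=10))   # True, and "a" becomes most recently used
vault.add("c", 4)                    # 12 bytes > quota, so "b" is evicted
print(list(vault.files))             # ['a', 'c']
print(vault.can_play("a", now=200))  # False: license older than 100 s
```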

Step 14 — Events, royalties, and analytics

High-volume play events on Kafka → stream processing → data warehouse. Royalty allocation uses country, rightsholder share, and subscription vs ad-supported rates. Fraud: bot detection on abnormal play patterns (for example, farms that loop the same track).

{
  "event": "play.progress",
  "user_id": "u_…",
  "track_id": "t_…",
  "ms_played": 30000,
  "context_uri": "spotify:playlist:…",
  "timestamp": "2026-05-19T12:00:00Z",
  "country": "DE",
  "product": "premium"
}
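Idempotent counting needs a stable key per playback. The schema above does not show one, so this sketch assumes a hypothetical `play_id` field that the client generates once per playback and repeats on every retry:

```python
from collections import Counter

ROYALTY_THRESHOLD_MS = 30_000  # the 30 s rule from the walkthrough


def royalty_plays(events):
    """Count at most one royalty play per play_id, however often it was retried."""
    longest = {}  # play_id -> (track_id, largest ms_played seen)
    for e in events:
        if e["event"] != "play.progress":
            continue
        _, seen_ms = longest.get(e["play_id"], (e["track_id"], 0))
        longest[e["play_id"]] = (e["track_id"], max(seen_ms, e["ms_played"]))
    return Counter(
        track_id for track_id, ms in longest.values() if ms >= ROYALTY_THRESHOLD_MS
    )


EVENTS = [
    {"event": "play.progress", "play_id": "p1", "track_id": "t_1", "ms_played": 30_000},
    {"event": "play.progress", "play_id": "p1", "track_id": "t_1", "ms_played": 30_000},  # retry
    {"event": "play.progress", "play_id": "p2", "track_id": "t_1", "ms_played": 12_000},
    {"event": "play.progress", "play_id": "p3", "track_id": "t_2", "ms_played": 45_000},
]

print(dict(royalty_plays(EVENTS)))  # {'t_1': 1, 't_2': 1}
```

The retried p1 event counts once, and p2 never reaches the threshold, so t_1 earns a single play.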

Step 15 — Technical layer: APIs and playback

Operation	HTTP	Notes
Search	`GET /v1/search?q=…&type=track`	OAuth bearer token
Get track	`GET /v1/tracks/{id}`	Metadata + `is_playable`
Start playback	`PUT /v1/me/player/play`	Body: `uris` or `context_uri`
Player state	`GET /v1/me/player`	Active device, progress_ms
Transfer device	`PUT /v1/me/player`	Move playback to speaker

Note: The public Web API controls the logical player. The audio bytes come from separate CDN hosts returned by internal playback services, not from api.spotify.com directly.

Logical stores

tracks(id, isrc, title, duration_ms, album_id, …)
rights(track_id, territory, allowed, valid_from, valid_to)
audio_files(track_id, bitrate, codec, storage_key, version)
playlists(id, owner_id, collaborative, …)
playlist_entries(playlist_id, position, track_id)
listening_events(user_id, track_id, ms, ts)  -- warehouse / stream
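To see how these stores combine at play time, here is a small in-memory SQLite sketch that builds a manifest from `rights` and `audio_files`. It trims the columns, ignores `valid_from`/`valid_to`, and assumes the active version is simply the highest version number; the storage keys are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rights (track_id TEXT, territory TEXT, allowed INTEGER,
                     PRIMARY KEY (track_id, territory));
CREATE TABLE audio_files (track_id TEXT, bitrate INTEGER, codec TEXT,
                          storage_key TEXT, version INTEGER);
INSERT INTO rights VALUES ('t_1', 'DE', 1), ('t_1', 'US', 0);
INSERT INTO audio_files VALUES
  ('t_1', 96,  'vorbis', 'audio/t_1/v1/96',  1),
  ('t_1', 96,  'vorbis', 'audio/t_1/v2/96',  2),
  ('t_1', 160, 'vorbis', 'audio/t_1/v2/160', 2);
""")


def manifest(track_id: str, territory: str):
    """Return (bitrate, codec, storage_key) rows for the newest encode, if playable."""
    return conn.execute(
        """
        SELECT a.bitrate, a.codec, a.storage_key
        FROM audio_files AS a
        JOIN rights AS r ON r.track_id = a.track_id
        WHERE a.track_id = ? AND r.territory = ? AND r.allowed = 1
          AND a.version = (SELECT MAX(version) FROM audio_files
                           WHERE track_id = a.track_id)
        ORDER BY a.bitrate
        """,
        (track_id, territory),
    ).fetchall()


print(manifest("t_1", "DE"))
# [(96, 'vorbis', 'audio/t_1/v2/96'), (160, 'vorbis', 'audio/t_1/v2/160')]
print(manifest("t_1", "US"))  # []
```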

Step 16 — Reliability, observability, and failure modes

CDN miss / origin overload: scale POP capacity; pre-warm new releases.
Rights bug: a track shows as playable in search but fails at play time; enforce one strict rights check in the playback path and treat it as the source of truth.
Stale recommendations: monitor feature pipeline lag; fall back to editorial charts.
Device token expiry: refresh OAuth; re-authenticate gracefully without killing audio mid-song when possible.

Metrics: TTFB, rebuffer ratio, skip rate, CDN hit ratio, event lag, search p95, home API p95.
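A few of these metrics reduce to simple ratios. The definitions below are common ones rather than a standard (for example, rebuffer ratio as stall time over total session time, and p95 by the nearest-rank method), and the function names are hypothetical:

```python
def rebuffer_ratio(stall_ms: int, played_ms: int) -> float:
    """Share of session time spent stalled rather than playing."""
    total = stall_ms + played_ms
    return stall_ms / total if total else 0.0


def cache_hit_ratio(hits: int, misses: int) -> float:
    """Share of CDN requests served from edge cache instead of origin."""
    total = hits + misses
    return hits / total if total else 0.0


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer form of ceil(0.95 * n)
    return ordered[rank - 1]


print(rebuffer_ratio(stall_ms=500, played_ms=99_500))  # 0.005
print(cache_hit_ratio(hits=980, misses=20))            # 0.98
print(p95(list(range(1, 101))))                        # 95
```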

Step 17 — Goals → knobs (quick reference)

Goal	Knob
Instant play	CDN, edge auth, prefetch next track, efficient codec
Smooth on 3G	Lower default bitrate, larger buffer, ABR conservatism
Great recommendations	Rich events, feature store freshness, ranker experiments
Correct royalties	30s rule, idempotent events, fraud models
Lower egress cost	Codec efficiency, cache ratio, limit hi-fi default on cellular

Step 18 — Close the loop (what to practice)

On a whiteboard: three loops, one play from home shelf to CDN bytes; mark where rights and events sit.

Out loud: why stream URLs expire; difference between catalog search and personalized home.

With the technical section: trace PUT /v1/me/player/play and the parallel CDN fetch path.

The one line to remember

Spotify-class systems split metadata + rights, CDN audio delivery, and event-driven personalization. The play button is a rights check and a signed URL—not a database row with an MP3 column—and every skip teaches the ranker what to play next.