How the X (Twitter) timeline works at scale

A social feed looks like an infinite scroll of posts. At scale it is a write fan-out problem: one short post must appear in millions of personalized lists within seconds, without melting databases every time a celebrity speaks.

We work through the design in order—requirements first, numbers second, architecture third, APIs last—using an X/Twitter-class timeline as the mental model, not any one company’s private implementation.

What you should be able to do after reading:

Separate the three loops—publish, read, and social graph—and say what each stores.
List functional and non-functional requirements for posters, readers, and the platform.
Walk one tweet from post → fan-out → home timeline read → hydrate cards.
Explain fan-out on write vs fan-out on read and the celebrity hybrid.
Read the technical section: tweet ids, timeline caches, and timeline REST APIs.

Step 0 — How we will work through the problem

Ordered thinking beats memorizing a box diagram. Use this sequence when you design a news feed:

Clarify scope. Home “Following” only, or algorithmic “For You” too? Replies, quotes, communities, DMs out of scope?
Write requirements. Functional = post, follow, scroll. Non-functional = read latency, write propagation, abuse resistance.
Do napkin math. Posts per second, average followers, timeline size—so fan-out cost is visible before you pick Redis.
Draw three loops before naming Kafka or Cassandra.
Tell one story—user posts, follower opens app—then the celebrity case and a deleted tweet.

flowchart TB
  subgraph publish [Publish loop]
    POST[Create post] --> TWEET[(Tweet store)]
    POST --> FAN[Fan-out workers]
    FAN --> CACHE[("Timeline caches")]
  end
  subgraph read [Read loop]
    APP[Open home feed] --> TL[Timeline service]
    TL --> CACHE
    TL --> HYDRATE[Hydrate posts]
    HYDRATE --> TWEET
  end
  subgraph social [Graph loop]
    FOL[Follow / unfollow] --> G[(Social graph)]
    G --> FAN
  end

Step 1 — Functional requirements (posters, readers, platform)

Actor	Requirement	Why scale makes it hard
Poster	Compose text, images, video, polls; post or schedule	Media upload path + async transcode
Poster	Reply, quote, repost; thread chains	Conversation id grouping; extra fan-out edges
Reader	Home timeline (Following + optional ranked For You)	Precomputed lists + ML ranker
Reader	Profile timeline (one user’s posts)	Cheaper: single author index
Reader	List / community timelines	Custom graph slices
Reader	Infinite scroll with cursored pagination	Stable cursors across concurrent writes
Reader	See engagement counts; like/repost/bookmark	Hot counter shards; idempotent actions
Social	Follow, mute, block; see who you follow	Graph writes invalidate caches
Platform	Delete post; moderation; visibility filters	Tombstone in all fan-out copies
Platform	Search, trends, notifications	Separate indexes fed by event bus

Functional details worth stating clearly

Post id is global and sortable. Snowflake-style ids give rough time order and shard keys.
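To make "global and sortable" concrete, here is a minimal sketch of a Snowflake-style generator. The bit layout (41-bit millisecond timestamp, 10-bit worker id, 12-bit sequence), the epoch constant, and the names `SnowflakeGenerator` and `timestamp_ms` are assumptions chosen for illustration, not a description of any production service.

```python
import threading
import time

# Assumed layout: 41 bits of milliseconds since a custom epoch,
# 10 bits of worker id, 12 bits of per-millisecond sequence.
EPOCH_MS = 1288834974657  # assumption: any fixed instant in the past works
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Hypothetical id generator; run one instance per worker process."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock went backwards: reuse the last timestamp so ids stay ordered.
                now = self._last_ms
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: borrow the next one.
                    now = self._last_ms + 1
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )


def timestamp_ms(tweet_id):
    """Recover the creation time (ms since the Unix epoch) embedded in an id."""
    return (tweet_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
```

Because the timestamp occupies the high bits, sorting by id approximates sorting by creation time, and the low bits give a cheap, well-spread shard key.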

Timeline is a list of ids, not full post bodies—hydration is a second step.

Out of scope today (say it aloud). Full DM architecture, ads auction, or training the ranking model from scratch—park them.

Step 2 — Non-functional requirements (engineering promises)

Category	Target (typical)	How we meet it	If we miss it
Latency — home feed read	p95 < 200 ms first page	Precomputed Redis timelines + parallel hydrate	Users churn to competitors
Latency — post visible to followers	Seconds for normal accounts	Fan-out workers, async for huge accounts	“Broken real-time” perception
Availability — read path	99.9%+ monthly	Cache replicas, degrade to profile-only	Global feed outage
Consistency — counts	Eventual OK for likes	Counter service + periodic reconcile	Wrong number briefly—not fatal
Consistency — delete	Post must disappear from feeds	Tombstone + cache purge jobs	Moderation failure
Throughput — writes	Tens of thousands posts/s peak	Sharded tweet store, partitioned fan-out queue	Publish backlog
Storage	Years of posts per user	Cold tier, compaction of old timelines	Runaway storage bill
Abuse	Rate limits, spam detection	Edge throttles + ML signals	Timeline becomes unusable

Key idea: Reads are cache-friendly; writes are expensive because of fan-out. Design differently for users with 50 followers vs 50 million.

Step 3 — Napkin math (why fan-out keeps you up at night)

Hundreds of millions of daily active users (order of magnitude for X-class products).
Suppose 500M posts per day → ~6k new posts/s average; spikes during live events much higher.
Average account might have 200 followers; median far lower, mean skewed by celebrities.
Naive fan-out: 6k posts/s × 200 followers ≈ 1.2M timeline writes/s average—Redis sorted-set inserts, not one SQL INSERT.
One celebrity post with 50M followers: 50M writes if done naively—must use hybrid fan-out (below).
Home timeline might keep 800–1000 recent tweet ids per user in cache; older pages pull from colder storage.
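The arithmetic above can be checked in a few lines. This is only a restatement of the estimates in this step; `bytes_per_id` is an added assumption (a raw 64-bit id, ignoring Redis per-entry overhead).

```python
SECONDS_PER_DAY = 24 * 60 * 60

posts_per_day = 500_000_000       # estimate from this step
avg_followers = 200               # mean, skewed upward by large accounts
celebrity_followers = 50_000_000
ids_per_timeline = 1_000          # upper end of the 800-1000 range
bytes_per_id = 8                  # assumption: raw 64-bit id, no Redis overhead

posts_per_second = posts_per_day / SECONDS_PER_DAY
timeline_writes_per_second = posts_per_second * avg_followers

# How many seconds of the fleet's *average* write budget one naive
# celebrity fan-out would consume on its own.
celebrity_equivalent_seconds = celebrity_followers / timeline_writes_per_second

raw_timeline_bytes_per_user = ids_per_timeline * bytes_per_id

print(f"posts/s (average):          {posts_per_second:,.0f}")
print(f"timeline writes/s (naive):  {timeline_writes_per_second:,.0f}")
print(f"one celebrity post equals   {celebrity_equivalent_seconds:.1f} s of average write load")
print(f"raw id bytes per timeline:  {raw_timeline_bytes_per_user:,}")
```

The third figure is the useful one: a single naive celebrity post costs as much as roughly 43 seconds of the whole platform's average fan-out traffic.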

Step 4 — Architecture: three loops

Clients hit an edge API. Graph service owns follow edges. Tweet service stores canonical post rows. Timeline service maintains per-user caches (Redis sorted sets: score = time or rank). Fan-out workers consume post.created from Kafka and push tweet ids into follower caches. Hydration batch-loads tweet bodies, authors, and media for the ids returned to the app.

flowchart TB
  subgraph clients [Clients]
    IOS[iOS / Android]
    WEB[Web]
  end
  subgraph edge [Edge]
    LB[Load balancer]
    API[API / BFF]
  end
  subgraph write [Write path]
    TW[Tweet service]
    K[("Kafka events")]
    FO[Fan-out workers]
  end
  subgraph read [Read path]
    TL[Timeline service]
    RED[("Redis timelines")]
    HY[Hydration]
  end
  subgraph social [Graph]
    GR[Graph service]
    GDB[(Graph store)]
  end
  IOS --> LB --> API
  WEB --> LB
  API --> TW --> K --> FO
  FO --> RED
  API --> TL --> RED
  TL --> HY --> TW
  API --> GR --> GDB
  GR --> FO

Step 5 — Walk one post from publish to home feed

Post — client POST /2/tweets with text + optional media_ids; API validates, assigns tweet_id (time-ordered).
Persist — tweet row written to sharded store (Cassandra/MySQL shard by author_id or tweet_id).
Event — post.created published to Kafka with author_id, tweet_id, visibility flags.
Fan-out — worker loads follower list (or cache slice); for each “normal” follower, ZADD home:{user_id} score tweet_id; trim timeline to max length.
Celebrity path — if author over threshold, skip full fan-out; insert into followers’ timelines on read merge instead.
Read — client GET /2/timeline/home; timeline service reads top N ids from Redis; hydration fetches tweet + user + media in parallel.
Rank (optional) — For You mixer reorders hydrated candidates with ML scores before response.
Delete — tombstone tweet; fan-out purge job removes id from caches; hydration filters deleted.
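The steps above can be sketched end to end with in-memory stand-ins. Everything here is hypothetical scaffolding: plain dicts replace the tweet store, graph store, and Redis; `publish` calls `fan_out` directly instead of going through an event bus; and the celebrity path and ranking are left out.

```python
from collections import defaultdict

MAX_TIMELINE = 800                 # assumed cap per home timeline

tweets = {}                        # tweet_id -> row (canonical store stand-in)
followers = defaultdict(set)       # author_id -> follower ids (graph stand-in)
home = defaultdict(list)           # user_id -> tweet ids, newest first (cache stand-in)
_next_id = [1000]                  # stand-in for a time-ordered id generator


def publish(author_id, text):
    """Persist the post, then hand it to fan-out (no real event bus here)."""
    _next_id[0] += 1
    tweet_id = _next_id[0]
    tweets[tweet_id] = {"id": tweet_id, "author_id": author_id,
                        "text": text, "deleted": False}
    fan_out(author_id, tweet_id)
    return tweet_id


def fan_out(author_id, tweet_id):
    """Push the id into every follower's cached timeline and trim it."""
    for user_id in followers[author_id]:
        timeline = home[user_id]
        timeline.insert(0, tweet_id)   # ids arrive in increasing order
        del timeline[MAX_TIMELINE:]    # keep only the newest entries


def read_home(user_id, limit=20):
    """Read ids from the cache, then hydrate and drop deleted posts."""
    ids = home[user_id][:limit]
    rows = (tweets.get(tweet_id) for tweet_id in ids)
    return [row for row in rows if row and not row["deleted"]]


def delete(tweet_id):
    """Tombstone first, then purge cached copies."""
    tweet = tweets[tweet_id]
    tweet["deleted"] = True
    for user_id in followers[tweet["author_id"]]:
        if tweet_id in home[user_id]:
            home[user_id].remove(tweet_id)
```

Setting the tombstone before the purge matters: while the purge is still running, `read_home` already filters the post at hydration time.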

sequenceDiagram
  participant U as Poster
  participant API as Edge API
  participant T as Tweet store
  participant K as Event bus
  participant F as Fan-out
  participant R as Redis timeline
  participant V as Viewer
  U->>API: POST tweet
  API->>T: persist
  API->>K: post.created
  K->>F: consume
  F->>R: ZADD for each follower
  V->>API: GET home
  API->>R: ZREVRANGE ids
  API->>T: hydrate batch
  API-->>V: feed cards

Step 6 — Fan-out on write vs fan-out on read

Strategy	Write cost	Read cost	Best when
Fan-out on write	High at post time	Low: one cached range read per page	Most users; few followers each
Fan-out on read	Low at post time	High merge at read	Celebrities; huge follower counts
Hybrid	Mixed	Mixed	Production default at scale

Hybrid rule of thumb: if an author has fewer than 10k followers, fan out on write; above 1M followers, treat the author as a “celebrity” and merge at read time from their recent-posts cache. Between the two thresholds, decide by product tuning and measured worker lag.
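One way to encode this rule of thumb is a small decision function. The two thresholds come from the text; `LAG_BUDGET_SECONDS`, the function name, and the policy for the middle band are assumptions, since that band is explicitly a tuning decision.

```python
WRITE_MAX_FOLLOWERS = 10_000       # below this: always fan out on write
READ_MIN_FOLLOWERS = 1_000_000     # above this: always merge at read
LAG_BUDGET_SECONDS = 5.0           # assumption: tolerated fan-out queue lag


def fan_out_strategy(follower_count, worker_lag_seconds=0.0):
    """Return "write" or "read" for one author's new post."""
    if follower_count < WRITE_MAX_FOLLOWERS:
        return "write"
    if follower_count > READ_MIN_FOLLOWERS:
        return "read"
    # Middle band (assumed policy): stay on write unless the workers are
    # already behind, in which case defer the cost to read time.
    return "read" if worker_lag_seconds > LAG_BUDGET_SECONDS else "write"
```

For example, `fan_out_strategy(200_000, worker_lag_seconds=30.0)` returns `"read"`, while the same author under a healthy queue gets `"write"`.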

Step 7 — Celebrity accounts and hot keys

Celebrity list — curated or computed; fan-out worker skips mass ZADD.
Read merge — when building home timeline, union cached ids with recent posts from followed celebrities (small set per user).
Hot key mitigation — shard Redis timelines by user_id; replicate celebrity tweet cache read-only across regions.
Rate limits — cap posts per minute even for verified accounts during abuse spikes.
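The read merge can be sketched as a k-way merge of newest-first id lists. This assumes ids are time-ordered (a larger id means a newer post); `merge_home_timeline` and its parameters are hypothetical names.

```python
import heapq


def merge_home_timeline(cached_ids, celebrity_recent, limit=20):
    """Merge newest-first id lists into one newest-first page.

    cached_ids: ids pushed by fan-out on write, newest first.
    celebrity_recent: dict of author_id -> that author's recent ids, newest first.
    """
    sources = [cached_ids, *celebrity_recent.values()]
    page, seen = [], set()
    # heapq.merge streams the inputs lazily, so we stop after `limit` ids
    # without sorting everything.
    for tweet_id in heapq.merge(*sources, reverse=True):
        if tweet_id in seen:       # the same post may appear in two sources
            continue
        seen.add(tweet_id)
        page.append(tweet_id)
        if len(page) == limit:
            break
    return page
```

The merge stays cheap because each user follows only a small number of celebrity accounts, so the number of input lists is small even when the celebrity's follower count is huge.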

Sanity check: If fan-out queue lag spikes only when a few accounts post, the celebrity threshold is probably missing those accounts. Fix detection before adding Redis capacity.

Step 8 — Timeline storage: Redis and cursors

Each user’s home timeline is often a sorted set: member = tweet_id, score = timestamp or rank. ZREVRANGE returns newest first. Cap length (e.g. 1000) with ZREMRANGEBYRANK after each add.

Pagination cursor: return max_id / since_id (tweet id boundaries) so clients page without OFFSET scans. Cursors stay stable under concurrent inserts because ids are roughly monotonic: new posts sort ahead of the cursor instead of shifting the entries behind it.

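ZADD home:uid_42 1716123456789 tw_998877
ZREMRANGEBYRANK home:uid_42 0 -1001
ZREVRANGE home:uid_42 0 19 WITHSCORES
-- the trim above keeps only the 1000 highest-scored (newest) members
-- cursor for next page: max_id = tw_998800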
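To show why id-based cursors survive concurrent writes, here is an in-memory stand-in for one sorted-set timeline. It is a sketch under two simplifying assumptions: the tweet id itself serves as the score, and `max_id` is exclusive. `HomeTimeline` is a hypothetical class, not a Redis client.

```python
import bisect


class HomeTimeline:
    """In-memory stand-in for one per-user sorted set of tweet ids."""

    def __init__(self, max_len=1000):
        self.max_len = max_len
        self._ids = []                     # ascending tweet ids

    def add(self, tweet_id):
        """Insert an id, then trim the oldest entries past the cap."""
        pos = bisect.bisect_left(self._ids, tweet_id)
        if pos < len(self._ids) and self._ids[pos] == tweet_id:
            return                         # already present: nothing to do
        self._ids.insert(pos, tweet_id)
        overflow = len(self._ids) - self.max_len
        if overflow > 0:
            del self._ids[:overflow]       # drop the oldest ids

    def page(self, limit=20, max_id=None):
        """Return (ids newest first, next max_id or None when exhausted).

        Only ids strictly older than max_id are returned.
        """
        if max_id is None:
            end = len(self._ids)
        else:
            end = bisect.bisect_left(self._ids, max_id)
        start = max(0, end - limit)
        ids = self._ids[start:end][::-1]
        next_max_id = ids[-1] if start > 0 else None
        return ids, next_max_id
```

A post that arrives between two page requests lands above the cursor, so the second page neither repeats nor skips an entry, which an OFFSET-based page would.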

Step 9 — Social graph service

Store directed edges (follower_id → followee_id) with metadata (created_at, notifications on). Follow triggers: increment counts, warm timeline (optional backfill of recent posts), invalidate graph cache. Unfollow stops future fan-out; does not always remove historical ids (product choice). Block/mute filters applied at fan-out or hydration so harmful content never enters timeline assembly.

Graph may live in a dedicated store (FlockDB-style, or sharded SQL) with fan-out workers reading follower lists in chunks.
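A minimal sketch of the directed-edge model and chunked follower reads follows. `SocialGraph` and its methods are hypothetical, the storage is a pair of in-memory dicts, and block/mute handling is omitted.

```python
from collections import defaultdict
from itertools import islice


class SocialGraph:
    """Directed follow edges, indexed in both directions."""

    def __init__(self):
        self._followers = defaultdict(dict)   # followee -> {follower: created_at}
        self._following = defaultdict(set)    # follower -> {followee}

    def follow(self, follower_id, followee_id, created_at):
        if follower_id == followee_id:
            raise ValueError("cannot follow yourself")
        self._followers[followee_id][follower_id] = created_at
        self._following[follower_id].add(followee_id)

    def unfollow(self, follower_id, followee_id):
        # Stops future fan-out; ids already in the timeline are not touched here.
        self._followers[followee_id].pop(follower_id, None)
        self._following[follower_id].discard(followee_id)

    def follower_count(self, followee_id):
        return len(self._followers[followee_id])

    def followers_in_chunks(self, followee_id, chunk_size=1000):
        """Yield follower ids in fixed-size lists so a worker never loads
        a multi-million-entry follower list in one call."""
        it = iter(sorted(self._followers[followee_id]))
        while chunk := list(islice(it, chunk_size)):
            yield chunk
```

Keeping both indexes costs two writes per follow, but it lets fan-out ask "who follows X" and the read path ask "whom does Y follow" without scanning.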

Step 10 — Ranking: Following vs For You

Following (chronological) — order by tweet id / timestamp from merged fan-out cache; simple and explainable.

For You (algorithmic) — candidate generation (who you might care about) + scoring model (engagement probability, diversity, freshness). Often a separate mixer service runs after hydration or on ids only for speed.

Features: past likes, dwell time, author affinity, social graph distance, toxicity scores.
Guardrails: inject follow graph posts, cap consecutive posts from one author, downrank duplicate media.
Offline training on engagement logs; online A/B infra for model versions.
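As an illustration of one guardrail, the sketch below caps consecutive posts from the same author after scoring. The scores are taken as given; the function name, the tuple shape, and the greedy policy are assumptions, and a real mixer would combine several such rules.

```python
def rank_with_author_cap(candidates, max_run=2):
    """Order candidates by score, allowing at most `max_run` posts in a row
    from one author while other authors' posts remain.

    candidates: list of (tweet_id, author_id, score) tuples.
    Returns tweet ids in display order.
    """
    remaining = sorted(candidates, key=lambda c: c[2], reverse=True)
    ranked = []
    run_author, run_len = None, 0
    while remaining:
        pick = 0
        if run_len >= max_run:
            # Prefer the best-scored candidate from a different author;
            # fall back to the top candidate if none exists.
            pick = next(
                (i for i, c in enumerate(remaining) if c[1] != run_author), 0
            )
        tweet_id, author_id, _score = remaining.pop(pick)
        if author_id == run_author:
            run_len += 1
        else:
            run_author, run_len = author_id, 1
        ranked.append(tweet_id)
    return ranked
```

The rule trades a little predicted engagement for diversity: a lower-scored post from another author is promoted only when the cap would otherwise be exceeded.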

Step 11 — Engagement, replies, and counters

Likes and reposts are separate writes from the tweet body—idempotent (user_id, tweet_id) keys. Counters may use Redis INCR with async flush to SQL or Cassandra for durability. Replies attach conversation_id and in_reply_to_tweet_id; thread view is another timeline type (conversation tree or flat with root).
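The idempotency key and the counter can be sketched together. `LikeService` is a hypothetical in-memory stand-in: the set plays the role of the durable likes table, and the dict plays the role of the fast counter that may drift and need reconciling.

```python
class LikeService:
    """Idempotent likes keyed by (user_id, tweet_id), plus a cached counter."""

    def __init__(self):
        self._likes = set()      # {(user_id, tweet_id)}: source of truth
        self._counts = {}        # tweet_id -> cached count (fast counter stand-in)

    def like(self, user_id, tweet_id):
        """Return True if the like was new, False if it was a repeat."""
        key = (user_id, tweet_id)
        if key in self._likes:
            return False         # retry or double tap: do not count again
        self._likes.add(key)
        self._counts[tweet_id] = self._counts.get(tweet_id, 0) + 1
        return True

    def unlike(self, user_id, tweet_id):
        key = (user_id, tweet_id)
        if key not in self._likes:
            return False
        self._likes.remove(key)
        self._counts[tweet_id] = self._counts.get(tweet_id, 0) - 1
        return True

    def count(self, tweet_id):
        return self._counts.get(tweet_id, 0)

    def reconcile(self, tweet_id):
        """Recompute the counter from the source of truth (periodic job)."""
        true_count = sum(1 for _user, t in self._likes if t == tweet_id)
        self._counts[tweet_id] = true_count
        return true_count
```

Because the key check happens before the increment, a client that retries a timed-out request cannot inflate the count.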

Step 12 — Media, cards, and hydration

Upload media to object storage; processing service generates variants (thumbnail, HLS). Tweet row stores media_ids only. Hydration batch-gets tweets, users, media metadata in parallel (single RPC multi-get pattern). Missing tweet (deleted) → skip slot or show “unavailable” placeholder.
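A hedged sketch of the hydration step: dicts stand in for the three stores, `multi_get` stands in for one batched RPC, and all names are hypothetical. Note one ordering detail in this sketch: tweets are fetched first, because author and media ids are only known once the tweet rows arrive; users and media are then fetched in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def multi_get(store, keys):
    """Stand-in for one batched RPC: returns only the keys that exist."""
    return {key: store[key] for key in keys if key in store}


def hydrate(tweet_ids, tweet_store, user_store, media_store):
    """Turn timeline ids into feed cards, skipping missing or deleted posts."""
    tweets = multi_get(tweet_store, tweet_ids)
    live = [
        tweets[i] for i in tweet_ids
        if i in tweets and not tweets[i].get("deleted")
    ]
    author_ids = {t["author_id"] for t in live}
    media_ids = {m for t in live for m in t.get("media_ids", [])}

    # Users and media do not depend on each other, so fetch them concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        users_future = pool.submit(multi_get, user_store, author_ids)
        media_future = pool.submit(multi_get, media_store, media_ids)
        users, media = users_future.result(), media_future.result()

    return [
        {
            "id": t["id"],
            "text": t["text"],
            "author": users.get(t["author_id"]),
            "media": [media[m] for m in t.get("media_ids", []) if m in media],
        }
        for t in live
    ]
```

This version takes the "skip slot" option for deleted posts; returning an "unavailable" placeholder instead would only change the list comprehension that builds `live`.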

Step 13 — Search, trends, and notifications (adjacent loops)

Search — inverted index (Elasticsearch/Lucene) fed by post.created; different SLO than home timeline. Trends — aggregate hashtag/entity counts in streaming window (Flink/Storm-class). Push — notification service consumes same events; fan-out to device tokens with per-user prefs. Keep these off the critical home timeline read path.

Step 14 — Scale, sharding, and multi-region

Shard tweets by tweet_id or author_id; co-locate author profile + their tweets when possible.
Partition fan-out queues by author_id so one celebrity does not block all workers.
Regional timelines — users primarily read from regional Redis; cross-region replication for disaster recovery.
Cold storage — timelines older than N days served from object store or wide-column scans, not hot Redis.
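Partitioning by author_id needs a hash that is stable across processes and restarts. The sketch below uses SHA-256 purely as a convenient stable hash; it is an illustrative stand-in, not the partitioner any particular message broker uses, and `partition_for` is a hypothetical name.

```python
import hashlib


def partition_for(author_id, num_partitions):
    """Map an author to a queue partition, identically on every machine."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.sha256(str(author_id).encode("utf-8")).digest()
    # The first 8 bytes give a well-mixed 64-bit integer.
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Python's built-in `hash()` is unsuitable here because string hashing is randomized per process by default, so two workers would disagree on the partition. Also note that changing `num_partitions` remaps most authors, which is why partition counts are usually chosen generously up front.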

Step 15 — Technical layer: APIs and payloads

Operation	HTTP	Success	Notes
Create tweet	`POST /2/tweets`	`201`	Body: `text`, optional `media.media_ids`, `reply` settings
Home timeline	`GET /2/timeline/home?max_results=20&pagination_token=…`	`200`	Returns hydrated `data[]` + `meta.next_token`
User timeline	`GET /2/users/{id}/tweets`	`200`	Author index; cheaper than home merge
Delete tweet	`DELETE /2/tweets/{id}`	`200`	Triggers tombstone + cache purge
Like	`POST /2/users/{uid}/likes`	`201`	Body: `tweet_id`; repeating the same like is a no-op
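The `pagination_token` in the home timeline row is typically opaque to clients. One possible encoding, shown here only as a sketch, wraps the id boundary in URL-safe base64 JSON; the payload shape and the function names are assumptions, not the format of any real API.

```python
import base64
import json


def encode_token(max_id):
    """Wrap the cursor boundary in an opaque, URL-safe string."""
    raw = json.dumps({"max_id": max_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_token(token):
    """Return the max_id inside a token, or raise ValueError if it is malformed."""
    padded = token + "=" * (-len(token) % 4)   # restore stripped padding
    try:
        payload = json.loads(base64.urlsafe_b64decode(padded))
        return int(payload["max_id"])
    except (ValueError, KeyError, TypeError) as exc:
        raise ValueError("invalid pagination token") from exc
```

Keeping the token opaque lets the server change the cursor scheme later (for example, to a rank-based cursor for For You) without breaking clients. A production token would usually also be signed or encrypted so clients cannot forge boundaries.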

Create tweet (illustrative JSON):

POST /2/tweets
Authorization: Bearer …
Content-Type: application/json

{
  "text": "Three loops: publish, read, graph.",
  "media": { "media_ids": ["m_abc123"] }
}

→ 201
{
  "data": {
    "id": "1847263920182345728",
    "text": "Three loops: publish, read, graph.",
    "author_id": "u_991"
  }
}

Logical tables

tweets(id, author_id, text, created_at, conversation_id, in_reply_to_id, deleted)
users(id, handle, display_name, …)
follows(follower_id, followee_id, created_at)
timelines(user_id, tweet_id, score)  -- often Redis, not SQL
likes(user_id, tweet_id, created_at)
media(id, owner_id, object_key, type, status)

Step 16 — Reliability, observability, and failure modes

Failure modes

Fan-out lag — followers see delay; monitor consumer lag; scale workers; celebrity bypass.
Stale cache after unfollow — TTL + explicit invalidation on graph change.
Hydration partial failure — return partial feed with retry hints; never 500 entire page for one missing tweet.
Thundering herd on viral post — rate limit reads on single tweet id; CDN for media.

Observability

Trace: post → Kafka → fan-out duration → first follower timeline write.
Metrics: fan-out writes/s, Redis p95, hydration batch size, home timeline p95, queue lag per partition.
SLO: 99.9% home reads < 300 ms; 95% normal-account posts visible to followers within 5 s.

Step 17 — Goals → knobs (quick reference)

Goal	Knob
Feed feels instant	Fan-out on write, Redis timelines, parallel hydration
Survive celebrities	Hybrid fan-out, read merge, dedicated queues
Relevant For You	Candidate + rank services, guardrails, fresh feature pipeline
Deletes stick	Tombstones, cache purge workers, filter at hydrate
Cost under control	Trim timeline length, tier cold data, batch fan-out writes

Step 18 — Close the loop (what to practice)

On a whiteboard: three loops, one post, one home read; mark where Redis vs tweet store vs graph store sit.

Out loud: when you fan-out on write vs read; give a follower count threshold.

With the technical section: trace POST /2/tweets through Kafka fan-out to GET /2/timeline/home.

The one line to remember

The timeline is a cached list of tweet ids per user, filled by publish-time fan-out (except celebrities) and turned into UI by hydration. Optimize writes for the long tail of small accounts; optimize reads for everyone; never fan-out 50 million Redis writes for one post.