How the X (Twitter) timeline works at scale
A social feed looks like an infinite scroll of posts. At scale it is a write fan-out problem: one short post must appear in millions of personalized lists within seconds, without melting databases every time a celebrity speaks.
We work through the design in order—requirements first, numbers second, architecture third, APIs last—using an X/Twitter-class timeline as the mental model, not any one company’s private implementation.
What you should be able to do after reading:
- Separate the three loops—publish, read, and social graph—and say what each stores.
- List functional and non-functional requirements for posters, readers, and the platform.
- Walk one tweet from post → fan-out → home timeline read → hydrate cards.
- Explain fan-out on write vs fan-out on read and the celebrity hybrid.
- Read the technical section: tweet ids, timeline caches, and timeline REST APIs.
Step 0 — How we will work through the problem
Ordered thinking beats memorizing a box diagram. Use this sequence when you design a news feed:
- Clarify scope. Home “Following” only, or algorithmic “For You” too? Replies, quotes, communities, DMs out of scope?
- Write requirements. Functional = post, follow, scroll. Non-functional = read latency, write propagation, abuse resistance.
- Do napkin math. Posts per second, average followers, timeline size—so fan-out cost is visible before you pick Redis.
- Draw three loops before naming Kafka or Cassandra.
- Tell one story—user posts, follower opens app—then the celebrity case and a deleted tweet.
flowchart TB
subgraph publish [Publish loop]
POST[Create post] --> TWEET[(Tweet store)]
POST --> FAN[Fan-out workers]
FAN --> CACHE[("Timeline caches")]
end
subgraph read [Read loop]
APP[Open home feed] --> TL[Timeline service]
TL --> CACHE
TL --> HYDRATE[Hydrate posts]
HYDRATE --> TWEET
end
subgraph social [Graph loop]
FOL[Follow / unfollow] --> G[(Social graph)]
G --> FAN
end
Step 1 — Functional requirements (posters, readers, platform)
| Actor | Requirement | Why scale makes it hard |
|---|---|---|
| Poster | Compose text, images, video, polls; post or schedule | Media upload path + async transcode |
| Poster | Reply, quote, repost; thread chains | Conversation id grouping; extra fan-out edges |
| Reader | Home timeline (Following + optional ranked For You) | Precomputed lists + ML ranker |
| Reader | Profile timeline (one user’s posts) | Cheaper: single author index |
| Reader | List / community timelines | Custom graph slices |
| Reader | Infinite scroll with cursored pagination | Stable cursors across concurrent writes |
| Reader | See engagement counts; like/repost/bookmark | Hot counter shards; idempotent actions |
| Social | Follow, mute, block; see who you follow | Graph writes invalidate caches |
| Platform | Delete post; moderation; visibility filters | Tombstone in all fan-out copies |
| Platform | Search, trends, notifications | Separate indexes fed by event bus |
Functional details worth stating clearly
Post id is global and sortable. Snowflake-style ids give rough time order and shard keys.
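To make "sortable" concrete, here is a minimal sketch of a Snowflake-style id. It assumes the commonly described layout of a millisecond timestamp in the high bits, then a worker id, then a per-millisecond sequence. The bit widths, the epoch constant, and the function names are illustrative assumptions, not a statement of what any production system uses.

```python
EPOCH_MS = 1_288_834_974_657  # assumed custom epoch in ms; any fixed past instant works
WORKER_BITS = 10              # assumed width of the worker id field
SEQ_BITS = 12                 # assumed width of the per-millisecond sequence field


def make_id(timestamp_ms: int, worker_id: int, sequence: int) -> int:
    """Pack (time, worker, sequence) into one integer that sorts by time first."""
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id out of range")
    if not 0 <= sequence < (1 << SEQ_BITS):
        raise ValueError("sequence out of range")
    time_part = (timestamp_ms - EPOCH_MS) << (WORKER_BITS + SEQ_BITS)
    return time_part | (worker_id << SEQ_BITS) | sequence


def timestamp_of(tweet_id: int) -> int:
    """Recover the creation time in ms from the high bits of the id."""
    return (tweet_id >> (WORKER_BITS + SEQ_BITS)) + EPOCH_MS


earlier = make_id(1_716_123_456_789, worker_id=3, sequence=0)
later = make_id(1_716_123_456_790, worker_id=1, sequence=0)
print(earlier < later, timestamp_of(earlier))
```

Ids minted in the same millisecond on different workers are ordered by worker id, not by true arrival order, which is why the time order is only "rough".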
Timeline is a list of ids, not full post bodies—hydration is a second step.
Out of scope today (say it aloud). Full DM architecture, ads auction, or training the ranking model from scratch—park them.
Step 2 — Non-functional requirements (engineering promises)
| Category | Target (typical) | How we meet it | If we miss it |
|---|---|---|---|
| Latency — home feed read | p95 < 200 ms first page | Precomputed Redis timelines + parallel hydrate | Users churn to competitors |
| Latency — post visible to followers | Seconds for normal accounts | Fan-out workers, async for huge accounts | “Broken real-time” perception |
| Availability — read path | 99.9%+ monthly | Cache replicas, degrade to profile-only | Global feed outage |
| Consistency — counts | Eventual OK for likes | Counter service + periodic reconcile | Wrong number briefly—not fatal |
| Consistency — delete | Post must disappear from feeds | Tombstone + cache purge jobs | Moderation failure |
| Throughput — writes | Tens of thousands posts/s peak | Sharded tweet store, partitioned fan-out queue | Publish backlog |
| Storage | Years of posts per user | Cold tier, compaction of old timelines | Runaway storage bill |
| Abuse | Rate limits, spam detection | Edge throttles + ML signals | Timeline becomes unusable |
Key idea: Reads are cache-friendly; writes are fan-out expensive. Design differently for users with 50 followers vs 50 million.
Step 3 — Napkin math (why fan-out keeps you up at night)
- ~500M+ daily active users (order of magnitude for X-class products).
- Suppose 500M posts per day → ~6k new posts/s average; spikes during live events much higher.
- Average account might have 200 followers; median far lower, mean skewed by celebrities.
- Naive fan-out: 6k posts/s × 200 followers ≈ 1.2M timeline writes/s on average, and these are Redis sorted-set inserts, not one SQL INSERT.
- One celebrity post with 50M followers: 50M writes if done naively, so it must use hybrid fan-out (below).
- Home timeline might keep 800–1000 recent tweet ids per user in cache; older pages pull from colder storage.
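The same arithmetic as a short script, so you can change one input and watch the fan-out cost move. All inputs are the round numbers from the list above; the variable names are only for illustration.

```python
SECONDS_PER_DAY = 86_400

posts_per_day = 500_000_000        # assumed daily post volume from the list above
avg_followers = 200                # assumed mean follower count
celebrity_followers = 50_000_000   # one very large account

posts_per_sec = posts_per_day / SECONDS_PER_DAY
fanout_writes_per_sec = posts_per_sec * avg_followers

# How many average posts one celebrity post is worth in timeline writes.
celebrity_equivalent_posts = celebrity_followers / avg_followers

# How long the whole platform's average write budget would be tied up by that one post.
celebrity_drain_seconds = celebrity_followers / fanout_writes_per_sec

print(f"posts/s: {posts_per_sec:,.0f}")
print(f"timeline writes/s: {fanout_writes_per_sec:,.0f}")
print(f"one celebrity post = {celebrity_equivalent_posts:,.0f} average posts")
print(f"drain time at average rate: {celebrity_drain_seconds:.1f} s")
```

With these inputs a single naive celebrity fan-out costs as much as a quarter of a million ordinary posts, or roughly 43 seconds of the platform's entire average write rate.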
Step 4 — Architecture: three loops
Clients hit an edge API. Graph service owns follow edges.
Tweet service stores canonical post rows. Timeline service maintains per-user caches (Redis sorted sets: score = time or rank).
Fan-out workers consume post.created from Kafka and push tweet ids into follower caches.
Hydration batch-loads tweet bodies, authors, and media for the ids returned to the app.
flowchart TB
subgraph clients [Clients]
IOS[iOS / Android]
WEB[Web]
end
subgraph edge [Edge]
LB[Load balancer]
API[API / BFF]
end
subgraph write [Write path]
TW[Tweet service]
K[("Kafka events")]
FO[Fan-out workers]
end
subgraph read [Read path]
TL[Timeline service]
RED[("Redis timelines")]
HY[Hydration]
end
subgraph social [Graph]
GR[Graph service]
GDB[(Graph store)]
end
IOS --> LB --> API
WEB --> LB
API --> TW --> K --> FO
FO --> RED
API --> TL --> RED
TL --> HY --> TW
API --> GR --> GDB
GR --> FO
Step 5 — Walk one post from publish to home feed
- Post: the client sends POST /2/tweets with text and optional media_ids; the API validates the request and assigns a time-ordered tweet_id.
- Persist: the tweet row is written to a sharded store (Cassandra or MySQL, sharded by author_id or tweet_id).
- Event: post.created is published to Kafka with author_id, tweet_id, and visibility flags.
- Fan-out: a worker loads the follower list (or a cached slice); for each “normal” follower it runs ZADD home:{user_id} score tweet_id, then trims the timeline to its max length.
- Celebrity path: if the author is over the follower threshold, skip the full fan-out and merge the author’s recent posts into followers’ timelines at read time instead.
- Read: the client calls GET /2/timeline/home; the timeline service reads the top N ids from Redis, and hydration fetches tweet, user, and media in parallel.
- Rank (optional): the For You mixer reorders hydrated candidates by ML score before the response.
- Delete: tombstone the tweet; a purge job removes the id from caches, and hydration filters out deleted posts.
sequenceDiagram
participant U as Poster
participant API as Edge API
participant T as Tweet store
participant K as Event bus
participant F as Fan-out
participant R as Redis timeline
participant V as Viewer
U->>API: POST tweet
API->>T: persist
API->>K: post.created
K->>F: consume
F->>R: ZADD for each follower
V->>API: GET home
API->>R: ZREVRANGE ids
API->>T: hydrate batch
API-->>V: feed cards
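The fan-out step of this walk can be sketched with plain dictionaries standing in for the graph store and the Redis sorted sets. Everything here is a simplified assumption: the event field names, the threshold value, and the helper names are hypothetical, and a real worker would read followers in chunks and batch its writes.

```python
from collections import defaultdict

MAX_TIMELINE_LEN = 1000          # cap mentioned in the text; tune per product
CELEBRITY_THRESHOLD = 1_000_000  # assumed cut-off above which write fan-out is skipped

# In-memory stand-ins (hypothetical): a graph store and per-user sorted sets.
followers = defaultdict(set)   # author_id -> set of follower ids
home = defaultdict(dict)       # user_id -> {tweet_id: score}


def zadd_and_trim(user_id, tweet_id, score):
    """Mimic ZADD followed by a trim: insert, then drop the lowest score if over the cap."""
    timeline = home[user_id]
    timeline[tweet_id] = score
    if len(timeline) > MAX_TIMELINE_LEN:
        oldest = min(timeline, key=timeline.get)
        del timeline[oldest]


def on_post_created(event, celebrity_threshold=CELEBRITY_THRESHOLD):
    """Handle one post.created event; return the number of timeline writes performed."""
    audience = followers[event["author_id"]]
    if len(audience) >= celebrity_threshold:
        return 0  # celebrity path: followers pick this post up at read time
    for follower_id in audience:
        zadd_and_trim(follower_id, event["tweet_id"], event["created_at_ms"])
    return len(audience)


followers["u_991"] = {"u_1", "u_2", "u_3"}
writes = on_post_created(
    {"author_id": "u_991", "tweet_id": "tw_1", "created_at_ms": 1716123456789}
)
print(writes, home["u_2"])
```

The return value doubles as a cheap metric: summing it per second gives the fan-out write rate you estimated in the napkin math.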
Step 6 — Fan-out on write vs fan-out on read
| Strategy | Write cost | Read cost | Best when |
|---|---|---|---|
| Fan-out on write | High at post time | Low: one cache range read | Most users; few followers each |
| Fan-out on read | Low at post time | High merge at read | Celebrities; huge follower counts |
| Hybrid | Mixed | Mixed | Production default at scale |
Hybrid rule of thumb: if followers < 10k fan-out on write; if followers > 1M treat author as “celebrity” and merge at read from their recent posts cache.
Between the two thresholds, the choice comes down to product tuning and measured worker lag.
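The rule of thumb reads naturally as a small decision function. This is a sketch under stated assumptions: the two thresholds come from the text, while the lag-based tie-break for the middle band and its 5-second budget are illustrative choices, not a prescribed policy.

```python
WRITE_MAX_FOLLOWERS = 10_000     # below this, always fan out on write
READ_MIN_FOLLOWERS = 1_000_000   # above this, treat the author as a celebrity


def fanout_strategy(follower_count, worker_lag_seconds=0.0, lag_budget_seconds=5.0):
    """Return "write" or "read" for one author.

    The middle band is a tuning decision; this sketch falls back to read-time
    merge only when the fan-out workers are already behind their lag budget.
    """
    if follower_count < WRITE_MAX_FOLLOWERS:
        return "write"
    if follower_count > READ_MIN_FOLLOWERS:
        return "read"
    return "read" if worker_lag_seconds > lag_budget_seconds else "write"


print(fanout_strategy(50))
print(fanout_strategy(50_000_000))
print(fanout_strategy(200_000, worker_lag_seconds=12.0))
```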
Step 7 — Celebrity accounts and hot keys
- Celebrity list: curated or computed; fan-out worker skips mass ZADD.
- Read merge: when building home timeline, union cached ids with recent posts from followed celebrities (small set per user).
- Hot key mitigation: shard Redis timelines by user_id; replicate celebrity tweet cache read-only across regions.
- Rate limits: cap posts per minute even for verified accounts during abuse spikes.
Sanity check: If fan-out queue lag spikes only when a few accounts post, you likely need better celebrity detection, not just a bigger Redis.
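The read merge is a k-way merge of lists that are already sorted newest first. A minimal sketch, assuming each source is a list of (score, tweet_id) pairs in descending score order; the function name and input shapes are hypothetical.

```python
import heapq


def merge_home(cached, celebrity_recent, limit=20):
    """Union the precomputed timeline with followed celebrities' recent posts.

    cached: list of (score, tweet_id), newest first.
    celebrity_recent: list of such lists, one per followed celebrity.
    Returns up to `limit` tweet ids, newest first, without duplicates.
    """
    merged = heapq.merge(cached, *celebrity_recent, key=lambda e: e[0], reverse=True)
    seen, out = set(), []
    for _score, tweet_id in merged:
        if tweet_id in seen:
            continue
        seen.add(tweet_id)
        out.append(tweet_id)
        if len(out) == limit:
            break
    return out


cached = [(30, "t30"), (10, "t10")]
celebrity_recent = [[(40, "c40"), (20, "c20")]]
page = merge_home(cached, celebrity_recent, limit=3)
print(page)
```

Because heapq.merge is lazy, the loop stops after `limit` ids and never walks the full celebrity caches, which keeps the read cost proportional to the page size and the number of followed celebrities.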
Step 8 — Timeline storage: Redis and cursors
Each user’s home timeline is often a sorted set: member = tweet_id, score = timestamp or rank.
ZREVRANGE returns newest first. Cap length (e.g. 1000) with ZREMRANGEBYRANK after each add.
Pagination cursor — return max_id / since_id (tweet id boundaries) so clients page without OFFSET scans.
Cursors stay stable under concurrent inserts because ids are roughly time-ordered: new posts land above the cursor instead of shifting the page.
ZADD home:uid_42 1716123456789 tw_998877
ZREVRANGE home:uid_42 0 19 WITHSCORES
-- cursor for next page: max_id = tw_998800
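To see why an id cursor survives concurrent writes, here is an in-memory sketch of a capped timeline with max_id paging. It assumes integer tweet ids that sort by time and treats max_id as an exclusive boundary; the class and method names are hypothetical, and a real service would issue the equivalent range queries against Redis.

```python
import bisect


class HomeTimeline:
    """Capped list of tweet ids, kept in ascending order internally."""

    def __init__(self, cap=1000):
        self.cap = cap
        self._ids = []

    def add(self, tweet_id):
        """Insert an id (ignoring duplicates) and drop the oldest if over the cap."""
        i = bisect.bisect_left(self._ids, tweet_id)
        if i < len(self._ids) and self._ids[i] == tweet_id:
            return
        self._ids.insert(i, tweet_id)
        if len(self._ids) > self.cap:
            del self._ids[0]

    def page(self, count=20, max_id=None):
        """Return (ids newest first, next cursor); ids are strictly older than max_id."""
        hi = len(self._ids) if max_id is None else bisect.bisect_left(self._ids, max_id)
        lo = max(0, hi - count)
        ids = self._ids[lo:hi][::-1]
        next_cursor = ids[-1] if lo > 0 else None
        return ids, next_cursor


timeline = HomeTimeline(cap=5)
for tweet_id in range(100, 107):
    timeline.add(tweet_id)

first_page, cursor = timeline.page(count=2)
timeline.add(107)  # a new post arrives between two page requests
second_page, _ = timeline.page(count=2, max_id=cursor)
print(first_page, second_page)
```

With OFFSET-style paging the new post would have pushed an already-seen id onto the second page; with the id boundary the second page simply continues below the cursor.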
Step 9 — Social graph service
Store directed edges (follower_id → followee_id) with metadata (created_at, notifications on).
Follow triggers: increment counts, warm timeline (optional backfill of recent posts), invalidate graph cache.
Unfollow stops future fan-out; does not always remove historical ids (product choice).
Block/mute filters applied at fan-out or hydration so harmful content never enters timeline assembly.
Graph may live in a dedicated store (FlockDB-style, or sharded SQL) with fan-out workers reading follower lists in chunks.
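A minimal in-memory sketch of the edge store and the chunked read that fan-out workers rely on. The class, its method names, and the dictionary layout are illustrative assumptions; a real graph store would page through followers with a cursor instead of materializing the list.

```python
from itertools import islice


class FollowGraph:
    """In-memory stand-in for the graph store: followee_id -> {follower_id: created_at}."""

    def __init__(self):
        self._followers = {}

    def follow(self, follower_id, followee_id, created_at):
        """Add a directed edge; return False if it already existed."""
        edges = self._followers.setdefault(followee_id, {})
        if follower_id in edges:
            return False
        edges[follower_id] = created_at
        return True

    def unfollow(self, follower_id, followee_id):
        """Remove the edge so future fan-out skips this follower."""
        return self._followers.get(followee_id, {}).pop(follower_id, None) is not None

    def follower_chunks(self, followee_id, chunk_size=1000):
        """Yield follower ids in fixed-size chunks, the way a worker pages a large list."""
        it = iter(list(self._followers.get(followee_id, {})))
        while chunk := list(islice(it, chunk_size)):
            yield chunk


graph = FollowGraph()
for n in range(5):
    graph.follow(f"u_{n}", "u_991", created_at=n)

chunk_sizes = [len(chunk) for chunk in graph.follower_chunks("u_991", chunk_size=2)]
print(chunk_sizes)
```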
Step 10 — Ranking: Following vs For You
Following (chronological) — order by tweet id / timestamp from merged fan-out cache; simple and explainable.
For You (algorithmic) — candidate generation (who you might care about) + scoring model (engagement probability, diversity, freshness). Often a separate mixer service runs after hydration or on ids only for speed.
- Features: past likes, dwell time, author affinity, social graph distance, toxicity scores.
- Guardrails: inject follow graph posts, cap consecutive posts from one author, downrank duplicate media.
- Offline training on engagement logs; online A/B infra for model versions.
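One of the guardrails above, capping consecutive posts from one author, fits in a few lines. This is a sketch of one possible greedy approach, not a description of any production mixer: it walks the ranked list and defers a post whenever it would extend a run past the cap. The function name and the (tweet_id, author_id) input shape are assumptions.

```python
def cap_consecutive(ranked, max_run=2):
    """Reorder (tweet_id, author_id) pairs so no author has more than max_run in a row.

    Keeps the ranked order wherever possible; a deferred post is placed as soon
    as another author has broken the run. Quadratic, which is fine for one page.
    """
    pending = list(ranked)
    out = []
    while pending:
        recent = out[-max_run:]
        for i, (_tweet_id, author) in enumerate(pending):
            if len(recent) < max_run or any(a != author for _, a in recent):
                out.append(pending.pop(i))
                break
        else:
            # Only the run's author is left, so nothing can break the run.
            out.extend(pending)
            break
    return out


ranked = [("t1", "a"), ("t2", "a"), ("t3", "a"), ("t4", "b"), ("t5", "a")]
feed = cap_consecutive(ranked, max_run=2)
print([tweet_id for tweet_id, _ in feed])
```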
Step 11 — Engagement, replies, and counters
Likes and reposts are separate writes from the tweet body—idempotent (user_id, tweet_id) keys.
Counters may use Redis INCR with async flush to SQL or Cassandra for durability.
Replies attach conversation_id and in_reply_to_tweet_id; thread view is another timeline type (conversation tree or flat with root).
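Idempotency for likes comes from keying on the (user_id, tweet_id) pair and only touching the counter when the pair is new. A minimal in-memory sketch, with a set standing in for the likes table and a Counter standing in for the counter service; the class and method names are hypothetical.

```python
from collections import Counter


class LikeService:
    """Likes keyed by (user_id, tweet_id); the counter moves only on a real change."""

    def __init__(self):
        self._likes = set()
        self._counts = Counter()

    def like(self, user_id, tweet_id):
        """Return True if this call created the like, False if it was a duplicate."""
        key = (user_id, tweet_id)
        if key in self._likes:
            return False
        self._likes.add(key)
        self._counts[tweet_id] += 1
        return True

    def unlike(self, user_id, tweet_id):
        """Return True if a like was removed, False if there was nothing to remove."""
        key = (user_id, tweet_id)
        if key not in self._likes:
            return False
        self._likes.remove(key)
        self._counts[tweet_id] -= 1
        return True

    def count(self, tweet_id):
        return self._counts[tweet_id]


likes = LikeService()
likes.like("u_1", "tw_1")
likes.like("u_1", "tw_1")  # client retry: must not double count
likes.like("u_2", "tw_1")
print(likes.count("tw_1"))
```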
Step 12 — Media, cards, and hydration
Upload media to object storage; processing service generates variants (thumbnail, HLS).
Tweet row stores media_ids only. Hydration batch-gets tweets, users, media metadata in parallel (single RPC multi-get pattern).
Missing tweet (deleted) → skip slot or show “unavailable” placeholder.
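Hydration can be sketched as one batched tweet lookup followed by parallel user and media lookups. The stores below are hypothetical in-memory dictionaries and the field names are assumptions. In this sketch users and media are fetched in parallel only after the tweet rows arrive, because their ids come from those rows, and deleted or missing tweets are skipped rather than shown as placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stores standing in for the tweet, user, and media services.
TWEETS = {
    "tw_1": {"id": "tw_1", "author_id": "u_991", "text": "Three loops.",
             "media_ids": ["m_abc123"], "deleted": False},
    "tw_2": {"id": "tw_2", "author_id": "u_7", "text": "gone",
             "media_ids": [], "deleted": True},
}
USERS = {"u_991": {"handle": "loops"}, "u_7": {"handle": "seven"}}
MEDIA = {"m_abc123": {"type": "image", "status": "ready"}}


def multi_get(store, keys):
    """One batched lookup; missing keys are simply absent from the result."""
    return {k: store[k] for k in keys if k in store}


def hydrate(tweet_ids):
    """Turn timeline ids into cards, skipping deleted or missing tweets."""
    tweets = multi_get(TWEETS, tweet_ids)
    live = [tweets[t] for t in tweet_ids if t in tweets and not tweets[t]["deleted"]]
    author_ids = {t["author_id"] for t in live}
    media_ids = {m for t in live for m in t["media_ids"]}
    # Users and media depend only on the tweet rows, so fetch them in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        users_future = pool.submit(multi_get, USERS, author_ids)
        media_future = pool.submit(multi_get, MEDIA, media_ids)
        users, media = users_future.result(), media_future.result()
    return [
        {
            "id": t["id"],
            "text": t["text"],
            "author": users.get(t["author_id"]),
            "media": [media[m] for m in t["media_ids"] if m in media],
        }
        for t in live
    ]


cards = hydrate(["tw_1", "tw_2", "tw_missing"])
print([card["id"] for card in cards])
```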
Step 13 — Search, trends, and notifications (adjacent loops)
Search — inverted index (Elasticsearch/Lucene) fed by post.created; different SLO than home timeline.
Trends — aggregate hashtag/entity counts in streaming window (Flink/Storm-class).
Push — notification service consumes same events; fan-out to device tokens with per-user prefs.
Keep these off the critical home timeline read path.
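The trends aggregation is a sliding-window count. Here is a single-process sketch of the idea, assuming events arrive in timestamp order; a streaming engine would distribute and checkpoint the same logic. The class name and the 300-second default window are illustrative.

```python
from collections import Counter, deque


class TrendWindow:
    """Count hashtags seen in the last `window_seconds`, assuming in-order events."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self._events = deque()   # (timestamp, hashtag), oldest on the left
        self._counts = Counter()

    def observe(self, timestamp, hashtags):
        """Record the hashtags of one post, then expire anything outside the window."""
        for tag in hashtags:
            self._events.append((timestamp, tag))
            self._counts[tag] += 1
        self._expire(timestamp)

    def _expire(self, now):
        while self._events and self._events[0][0] <= now - self.window:
            _ts, tag = self._events.popleft()
            self._counts[tag] -= 1
            if self._counts[tag] == 0:
                del self._counts[tag]

    def top(self, n, now):
        """Return the n most frequent hashtags still inside the window at `now`."""
        self._expire(now)
        return self._counts.most_common(n)


trends = TrendWindow(window_seconds=300)
trends.observe(0, ["loops"])
trends.observe(10, ["fanout"])
trends.observe(20, ["fanout"])
print(trends.top(1, now=20))
```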
Step 14 — Scale, sharding, and multi-region
- Shard tweets by tweet_id or author_id; co-locate author profile + their tweets when possible.
- Partition fan-out queues by author_id so one celebrity does not block all workers.
- Regional timelines: users primarily read from regional Redis; cross-region replication for disaster recovery.
- Cold storage: timelines older than N days served from object store or wide-column scans, not hot Redis.
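Partitioning by author_id needs a hash that is stable across processes and restarts. A minimal sketch using SHA-256 as a stand-in for whatever partitioner the queue actually uses; the function name and the partition count are assumptions. Python's built-in hash() is avoided because string hashes are randomized per process.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key such as author_id to a stable partition index."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


NUM_PARTITIONS = 64  # assumed queue size for illustration

# Every event from one author lands on the same partition, so a backlog
# behind a huge account stays on that partition instead of spreading.
events = [("u_991", "tw_1"), ("u_991", "tw_2"), ("u_7", "tw_3")]
placement = {tweet_id: partition_for(author_id, NUM_PARTITIONS) for author_id, tweet_id in events}
print(placement)
```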
Step 15 — Technical layer: APIs and payloads
| Operation | HTTP | Success | Notes |
|---|---|---|---|
| Create tweet | POST /2/tweets | 201 | Body: text, optional media.media_ids, reply settings |
| Home timeline | GET /2/timeline/home?max_results=20&pagination_token=… | 200 | Returns hydrated data[] + meta.next_token |
| User timeline | GET /2/users/{id}/tweets | 200 | Author index; cheaper than home merge |
| Delete tweet | DELETE /2/tweets/{id} | 200 | Triggers tombstone + cache purge |
| Like | POST /2/users/{uid}/likes | 201 | Idempotent like on duplicate |
Create tweet (illustrative JSON):
POST /2/tweets
Authorization: Bearer …
Content-Type: application/json

{
  "text": "Three loops: publish, read, graph.",
  "media": { "media_ids": ["m_abc123"] }
}
→ 201
{
  "data": {
    "id": "1847263920182345728",
    "text": "Three loops: publish, read, graph.",
    "author_id": "u_991"
  }
}
Logical tables
tweets(id, author_id, text, created_at, conversation_id, in_reply_to_id, deleted)
users(id, handle, display_name, …)
follows(follower_id, followee_id, created_at)
timelines(user_id, tweet_id, score) -- often Redis, not SQL
likes(user_id, tweet_id, created_at)
media(id, owner_id, object_key, type, status)
Step 16 — Reliability, observability, and failure modes
Failure modes
- Fan-out lag — followers see delay; monitor consumer lag; scale workers; celebrity bypass.
- Stale cache after unfollow — TTL + explicit invalidation on graph change.
- Hydration partial failure — return partial feed with retry hints; never 500 entire page for one missing tweet.
- Thundering herd on viral post — rate limit reads on single tweet id; CDN for media.
Observability
- Trace: post → Kafka → fan-out duration → first follower timeline size.
- Metrics: fan-out writes/s, Redis p95, hydration batch size, home timeline p95, queue lag per partition.
- SLO: 99.9% home reads < 300 ms; 95% normal-account posts visible to followers within 5 s.
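Checking an SLO of this shape is a ratio over latency samples. A small sketch, assuming you already have per-request latencies in milliseconds; the function names are hypothetical and the sample data is synthetic, not a measurement.

```python
def fraction_within(latencies_ms, limit_ms):
    """Share of requests that finished strictly under the limit."""
    if not latencies_ms:
        raise ValueError("no samples")
    return sum(1 for value in latencies_ms if value < limit_ms) / len(latencies_ms)


def slo_met(latencies_ms, limit_ms, target):
    """True when at least `target` of the requests finished under `limit_ms`."""
    return fraction_within(latencies_ms, limit_ms) >= target


# Synthetic samples: 9,990 fast reads and 10 slow ones.
home_read_ms = [120] * 9_990 + [450] * 10

print(fraction_within(home_read_ms, 300))
print(slo_met(home_read_ms, limit_ms=300, target=0.999))
```

The same two functions cover the propagation target: feed them post-to-visible delays in milliseconds with limit_ms=5000 and target=0.95.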
Step 17 — Goals → knobs (quick reference)
| Goal | Knob |
|---|---|
| Feed feels instant | Fan-out on write, Redis timelines, parallel hydration |
| Survive celebrities | Hybrid fan-out, read merge, dedicated queues |
| Relevant For You | Candidate + rank services, guardrails, fresh feature pipeline |
| Deletes stick | Tombstones, cache purge workers, filter at hydrate |
| Cost under control | Trim timeline length, tier cold data, batch fan-out writes |
Step 18 — Close the loop (what to practice)
On a whiteboard: three loops, one post, one home read; mark where Redis vs tweet store vs graph store sit.
Out loud: when you fan-out on write vs read; give a follower count threshold.
With the technical section: trace POST /2/tweets through Kafka fan-out to GET /2/timeline/home.
The one line to remember
The timeline is a cached list of tweet ids per user, filled by publish-time fan-out (except celebrities) and turned into UI by hydration. Optimize writes for the long tail of small accounts; optimize reads for everyone; never fan-out 50 million Redis writes for one post.