sharpbyte.dev

How the X (Twitter) timeline works at scale

A social feed looks like an infinite scroll of posts. At scale it is a write fan-out problem: one short post must appear in millions of personalized lists within seconds, without melting databases every time a celebrity speaks.

We work through the design in order—requirements first, numbers second, architecture third, APIs last—using an X/Twitter-class timeline as the mental model, not any one company’s private implementation.

What you should be able to do after reading:

Step 0 — How we will work through the problem

Ordered thinking beats memorizing a box diagram. Use this sequence when you design a news feed:

  1. Clarify scope. Home “Following” only, or algorithmic “For You” too? Replies, quotes, communities, DMs out of scope?
  2. Write requirements. Functional = post, follow, scroll. Non-functional = read latency, write propagation, abuse resistance.
  3. Do napkin math. Posts per second, average followers, timeline size—so fan-out cost is visible before you pick Redis.
  4. Draw three loops (publish, read, graph) before naming Kafka or Cassandra.
  5. Tell one story—user posts, follower opens app—then the celebrity case and a deleted tweet.
flowchart TB
  subgraph publish [Publish loop]
    POST[Create post] --> TWEET[(Tweet store)]
    POST --> FAN[Fan-out workers]
    FAN --> CACHE[("Timeline caches")]
  end
  subgraph read [Read loop]
    APP[Open home feed] --> TL[Timeline service]
    TL --> CACHE
    TL --> HYDRATE[Hydrate posts]
    HYDRATE --> TWEET
  end
  subgraph graphloop [Graph loop]
    FOL[Follow / unfollow] --> G[(Social graph)]
    G --> FAN
  end
    

Step 1 — Functional requirements (posters, readers, platform)

| Actor | Requirement | Why scale makes it hard |
| --- | --- | --- |
| Poster | Compose text, images, video, polls; post or schedule | Media upload path + async transcode |
| Poster | Reply, quote, repost; thread chains | Conversation id grouping; extra fan-out edges |
| Reader | Home timeline (Following + optional ranked For You) | Precomputed lists + ML ranker |
| Reader | Profile timeline (one user’s posts) | Cheaper: single author index |
| Reader | List / community timelines | Custom graph slices |
| Reader | Infinite scroll with cursored pagination | Stable cursors across concurrent writes |
| Reader | See engagement counts; like/repost/bookmark | Hot counter shards; idempotent actions |
| Social | Follow, mute, block; see who you follow | Graph writes invalidate caches |
| Platform | Delete post; moderation; visibility filters | Tombstone in all fan-out copies |
| Platform | Search, trends, notifications | Separate indexes fed by event bus |

Functional details worth stating clearly

Post id is global and sortable. Snowflake-style ids give rough time order and shard keys.
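
To make "global and sortable" concrete, here is a minimal sketch of a Snowflake-style generator. It assumes the commonly described layout of a millisecond timestamp, a 10-bit worker id, and a 12-bit sequence. The epoch constant, class name, and helper are illustrative assumptions, not any company's actual code.

```python
import threading
import time

# Assumed layout: timestamp bits | 10 worker bits | 12 sequence bits.
EPOCH_MS = 1_288_834_974_657  # assumed custom epoch; any fixed past instant works
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Generates increasing, roughly time-ordered ids for one worker."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock went backwards: hold the last timestamp to stay monotonic.
                now = self._last_ms
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: borrow the next millisecond
                    # (a simplification; real generators often wait instead).
                    now = self._last_ms + 1
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self._worker_id << SEQUENCE_BITS)
                | self._sequence
            )


def timestamp_ms_of(snowflake_id):
    """Recover the creation time embedded in an id."""
    return (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
```

Because the timestamp occupies the high bits, sorting by id approximates sorting by creation time, which is what lets the timeline cache and the pagination cursors work on ids alone.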

Timeline is a list of ids, not full post bodies—hydration is a second step.

Out of scope today (say it aloud). Full DM architecture, ads auction, or training the ranking model from scratch—park them.

Step 2 — Non-functional requirements (engineering promises)

| Category | Target (typical) | How we meet it | If we miss it |
| --- | --- | --- | --- |
| Latency — home feed read | p95 < 200 ms first page | Precomputed Redis timelines + parallel hydrate | Users churn to competitors |
| Latency — post visible to followers | Seconds for normal accounts | Fan-out workers, async for huge accounts | “Broken real-time” perception |
| Availability — read path | 99.9%+ monthly | Cache replicas, degrade to profile-only | Global feed outage |
| Consistency — counts | Eventual OK for likes | Counter service + periodic reconcile | Wrong number briefly—not fatal |
| Consistency — delete | Post must disappear from feeds | Tombstone + cache purge jobs | Moderation failure |
| Throughput — writes | Tens of thousands posts/s peak | Sharded tweet store, partitioned fan-out queue | Publish backlog |
| Storage | Years of posts per user | Cold tier, compaction of old timelines | Runaway storage bill |
| Abuse | Rate limits, spam detection | Edge throttles + ML signals | Timeline becomes unusable |

Key idea: Reads are cache-friendly; writes are fan-out expensive. Design differently for users with 50 followers vs 50 million.

Step 3 — Napkin math (why fan-out keeps you up at night)
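
A rough sketch of the arithmetic follows. Every input is an assumption chosen for illustration (the publish rate only echoes the "tens of thousands posts/s" target above). The function and field names are hypothetical. Swap in your own numbers before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedAssumptions:
    posts_per_second: int  # peak publish rate (assumed)
    avg_followers: int     # mean followers of fan-out-on-write authors (assumed)
    active_users: int      # users who have a cached home timeline (assumed)
    timeline_cap: int      # ids kept per cached timeline
    bytes_per_entry: int   # id + score + sorted-set overhead (assumed)


def fanout_writes_per_second(a):
    """Cache writes generated by publish-time fan-out."""
    return a.posts_per_second * a.avg_followers


def timeline_cache_bytes(a):
    """Memory needed if every active user's timeline is full."""
    return a.active_users * a.timeline_cap * a.bytes_per_entry


def celebrity_fanout_seconds(followers, writes_per_second):
    """Time to push one post to every follower at a given write budget."""
    return followers / writes_per_second


assumed = FeedAssumptions(
    posts_per_second=20_000,
    avg_followers=200,
    active_users=200_000_000,
    timeline_cap=1_000,
    bytes_per_entry=64,
)
print(fanout_writes_per_second(assumed))   # 4000000 cache writes per second
print(timeline_cache_bytes(assumed) / 1e12)  # 12.8 (terabytes, decimal)
print(celebrity_fanout_seconds(50_000_000, 1_000_000))  # 50.0 seconds
```

Under these assumed inputs, ordinary posting already costs millions of cache writes per second, and a single post from a 50-million-follower account would occupy a one-million-writes-per-second budget for close to a minute. That gap is the reason the later steps treat huge accounts differently.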

Step 4 — Architecture: three loops

Clients hit an edge API. Graph service owns follow edges. Tweet service stores canonical post rows. Timeline service maintains per-user caches (Redis sorted sets: score = time or rank). Fan-out workers consume post.created from Kafka and push tweet ids into follower caches. Hydration batch-loads tweet bodies, authors, and media for the ids returned to the app.

flowchart TB
  subgraph clients [Clients]
    IOS[iOS / Android]
    WEB[Web]
  end
  subgraph edge [Edge]
    LB[Load balancer]
    API[API / BFF]
  end
  subgraph write [Write path]
    TW[Tweet service]
    K[("Kafka events")]
    FO[Fan-out workers]
  end
  subgraph read [Read path]
    TL[Timeline service]
    RED[("Redis timelines")]
    HY[Hydration]
  end
  subgraph graphsvc [Graph]
    GR[Graph service]
    GDB[(Graph store)]
  end
  IOS --> LB --> API
  WEB --> LB
  API --> TW --> K --> FO
  FO --> RED
  API --> TL --> RED
  TL --> HY --> TW
  API --> GR --> GDB
  GR --> FO
    

Step 5 — Walk one post from publish to home feed

  1. Post — client POST /2/tweets with text + optional media_ids; API validates, assigns tweet_id (time-ordered).
  2. Persist — tweet row written to sharded store (Cassandra/MySQL shard by author_id or tweet_id).
  3. Event — post.created published to Kafka with author_id, tweet_id, visibility flags.
  4. Fan-out — worker loads follower list (or cache slice); for each “normal” follower, ZADD home:{user_id} score tweet_id; trim timeline to max length.
  5. Celebrity path — if the author is over the follower threshold, skip full fan-out; merge the author’s recent posts into followers’ timelines at read time instead.
  6. Read — client GET /2/timeline/home; timeline service reads top N ids from Redis; hydration fetches tweet + user + media in parallel.
  7. Rank (optional) — For You mixer reorders hydrated candidates with ML scores before response.
  8. Delete — tombstone tweet; fan-out purge job removes id from caches; hydration filters deleted.
sequenceDiagram
  participant U as Poster
  participant API as Edge API
  participant T as Tweet store
  participant K as Event bus
  participant F as Fan-out
  participant R as Redis timeline
  participant V as Viewer
  U->>API: POST tweet
  API->>T: persist
  API->>K: post.created
  K->>F: consume
  F->>R: ZADD for each follower
  V->>API: GET home
  API->>R: ZREVRANGE ids
  API->>T: hydrate batch
  API-->>V: feed cards
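
The walk-through can be compressed into a small in-memory simulation. This is a sketch only: plain dictionaries stand in for the tweet store, graph store, and Redis timelines, the event is handled inline rather than through a queue, and all function names are hypothetical.

```python
from collections import defaultdict

TIMELINE_CAP = 1000  # example cap for a cached timeline

tweets = {}                   # tweet_id -> row; stand-in for the tweet store
followers = defaultdict(set)  # author_id -> follower ids; stand-in for the graph
home = defaultdict(list)      # user_id -> [(created_ms, tweet_id)], newest first


def publish(tweet_id, author_id, text, created_ms):
    """Persist the row, then emit a post.created event."""
    tweets[tweet_id] = {"id": tweet_id, "author_id": author_id,
                        "text": text, "deleted": False}
    event = {"type": "post.created", "tweet_id": tweet_id,
             "author_id": author_id, "created_ms": created_ms}
    fan_out(event)  # a real system would enqueue this and return immediately


def fan_out(event):
    """Push the id into each follower's timeline and trim it."""
    for follower_id in followers[event["author_id"]]:
        timeline = home[follower_id]
        timeline.append((event["created_ms"], event["tweet_id"]))
        timeline.sort(reverse=True)  # newest first
        del timeline[TIMELINE_CAP:]  # drop everything past the cap


def read_home(user_id, limit=20):
    """Read ids from the cache, then hydrate and filter deleted rows."""
    ids = [tweet_id for _, tweet_id in home[user_id][:limit]]
    return [tweets[i] for i in ids if i in tweets and not tweets[i]["deleted"]]


def delete(tweet_id):
    """Tombstone the row and purge the id from cached timelines."""
    tweets[tweet_id]["deleted"] = True
    for timeline in home.values():
        timeline[:] = [entry for entry in timeline if entry[1] != tweet_id]
```

Note that deletion is handled twice on purpose: the purge removes the id from caches, and the read path still filters tombstoned rows in case a purge has not reached a given timeline yet.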
    

Step 6 — Fan-out on write vs fan-out on read

| Strategy | Write cost | Read cost | Best when |
| --- | --- | --- | --- |
| Fan-out on write | High at post time | Low O(1) cache read | Most users; few followers each |
| Fan-out on read | Low at post time | High merge at read | Celebrities; huge follower counts |
| Hybrid | Mixed | Mixed | Production default at scale |

Hybrid rule of thumb: below roughly 10k followers, fan out on write; above roughly 1M followers, treat the author as a “celebrity” and merge at read time from a cache of their recent posts. Between the two thresholds, the choice comes down to product tuning and measured worker lag.
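
The rule of thumb can be written as a small decision function. The two follower thresholds come from the text. The lag budget and the choice to shed mid-sized authors to read-time merge when workers fall behind are assumptions added for illustration.

```python
from enum import Enum

WRITE_FANOUT_MAX = 10_000  # below this: fan out on write (from the rule of thumb)
CELEBRITY_MIN = 1_000_000  # above this: merge at read (from the rule of thumb)
LAG_BUDGET_SECONDS = 5.0   # hypothetical tuning knob for the middle band


class Strategy(Enum):
    FANOUT_ON_WRITE = "fanout_on_write"
    MERGE_ON_READ = "merge_on_read"


def choose_strategy(follower_count, worker_lag_seconds):
    """Pick a delivery strategy for one author's new post."""
    if follower_count < WRITE_FANOUT_MAX:
        return Strategy.FANOUT_ON_WRITE
    if follower_count > CELEBRITY_MIN:
        return Strategy.MERGE_ON_READ
    # Middle band: assumed policy is to fan out unless workers are lagging.
    if worker_lag_seconds > LAG_BUDGET_SECONDS:
        return Strategy.MERGE_ON_READ
    return Strategy.FANOUT_ON_WRITE
```

Keeping the thresholds as named constants makes them easy to tune against measured fan-out lag rather than guesswork.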

Step 7 — Celebrity accounts and hot keys

Sanity check: If fan-out queue lag spikes only when a few accounts post, you likely need more celebrity detection—not bigger Redis alone.
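
One way to picture the read-time merge for celebrity authors is a k-way merge of the viewer's cached timeline with each followed celebrity's recent-posts list. The sketch below assumes every input is already sorted newest first as (score, tweet_id) pairs; the function name is hypothetical.

```python
import heapq


def merge_home(cached, celebrity_feeds, limit=20):
    """Merge newest-first (score, tweet_id) lists into one page of ids.

    cached: the viewer's precomputed timeline (fan-out on write).
    celebrity_feeds: one recent-posts list per followed celebrity.
    """
    merged = heapq.merge(cached, *celebrity_feeds, reverse=True)
    seen = set()
    page = []
    for _score, tweet_id in merged:
        if tweet_id in seen:
            continue  # the same id can arrive from more than one source
        seen.add(tweet_id)
        page.append(tweet_id)
        if len(page) == limit:
            break  # heapq.merge is lazy, so the rest is never consumed
    return page


cached = [(50, "a"), (30, "b"), (10, "c")]
celebrities = [[(40, "x"), (20, "y")], [(45, "z")]]
print(merge_home(cached, celebrities, limit=4))  # ['a', 'z', 'x', 'b']
```

The cost of this merge grows with the number of celebrities a viewer follows, not with any celebrity's follower count, which is why it moves the hot-key pressure off the write path.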

Step 8 — Timeline storage: Redis and cursors

Each user’s home timeline is often a sorted set: member = tweet_id, score = timestamp or rank. ZREVRANGE returns newest first. Cap length (e.g. 1000) with ZREMRANGEBYRANK after each add.

Pagination cursor — return max_id / since_id (tweet id boundaries) so clients page without OFFSET scans. Stable under concurrent inserts because ids are monotonic-ish.

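ZADD home:uid_42 1716123456789 tw_998877
-- trim to the newest 1000 entries (ranks 0..-1001 are the oldest overflow)
ZREMRANGEBYRANK home:uid_42 0 -1001
ZREVRANGE home:uid_42 0 19 WITHSCORES
-- cursor for next page: max_id = tw_998800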
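
To see why id-based cursors stay stable while new posts arrive, here is a minimal in-memory sketch. It assumes numeric, time-ordered ids held in ascending order and treats max_id as an exclusive upper bound; both are simplifying assumptions, and the function name is hypothetical.

```python
import bisect


def page_home(timeline_ids, limit=20, max_id=None):
    """Return (page, next_max_id) from ids sorted in ascending order.

    page is newest first and contains only ids strictly below max_id.
    next_max_id is None when there is nothing older left.
    """
    if max_id is None:
        end = len(timeline_ids)
    else:
        end = bisect.bisect_left(timeline_ids, max_id)
    start = max(0, end - limit)
    page = timeline_ids[start:end][::-1]
    next_max_id = page[-1] if start > 0 else None
    return page, next_max_id


ids = list(range(100, 200))
first, cursor = page_home(ids, limit=3)
print(first, cursor)   # [199, 198, 197] 197

ids.append(200)        # a new post lands between page requests
second, cursor = page_home(ids, limit=3, max_id=cursor)
print(second, cursor)  # [196, 195, 194] 194
```

The second page is unaffected by the insert because the cursor names a position in id space, not an offset from the top of the list.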

Step 9 — Social graph service

Store directed edges (follower_id → followee_id) with metadata (created_at, notifications on). Follow triggers: increment counts, warm timeline (optional backfill of recent posts), invalidate graph cache. Unfollow stops future fan-out; does not always remove historical ids (product choice). Block/mute filters applied at fan-out or hydration so harmful content never enters timeline assembly.

Graph may live in a dedicated store (FlockDB-style, or sharded SQL) with fan-out workers reading follower lists in chunks.
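
A minimal in-memory sketch of the graph service's core operations follows, including the chunked follower reads that fan-out workers rely on. The class and method names are hypothetical, and a production store would shard these maps rather than hold them in one process.

```python
from collections import defaultdict


class SocialGraph:
    """Directed follow edges with a creation timestamp per edge."""

    def __init__(self):
        self._followers = defaultdict(dict)  # followee -> {follower: created_at}
        self._following = defaultdict(dict)  # follower -> {followee: created_at}

    def follow(self, follower_id, followee_id, created_at):
        """Add an edge; returns False for self-follows and duplicates."""
        if follower_id == followee_id:
            return False
        if follower_id in self._followers[followee_id]:
            return False
        self._followers[followee_id][follower_id] = created_at
        self._following[follower_id][followee_id] = created_at
        return True

    def unfollow(self, follower_id, followee_id):
        """Remove an edge; returns False if it did not exist."""
        if follower_id not in self._followers[followee_id]:
            return False
        del self._followers[followee_id][follower_id]
        del self._following[follower_id][followee_id]
        return True

    def follower_count(self, followee_id):
        return len(self._followers.get(followee_id, {}))

    def followers_in_chunks(self, followee_id, chunk_size=1000):
        """Yield follower ids in fixed-size chunks for fan-out workers."""
        ordered = sorted(self._followers.get(followee_id, {}))
        for start in range(0, len(ordered), chunk_size):
            yield ordered[start:start + chunk_size]
```

Reading in chunks lets a worker checkpoint its progress and lets several workers split one large follower list between them.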

Step 10 — Ranking: Following vs For You

Following (chronological) — order by tweet id / timestamp from merged fan-out cache; simple and explainable.

For You (algorithmic) — candidate generation (who you might care about) + scoring model (engagement probability, diversity, freshness). Often a separate mixer service runs after hydration or on ids only for speed.
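
As a toy illustration of the scoring step, the sketch below combines an engagement probability with a freshness decay and a per-author diversity penalty. The formula, the default half-life, and the penalty factor are all assumptions made up for this example; a real ranker would use a trained model and many more signals.

```python
def rank_for_you(candidates, now_ms, half_life_ms=3_600_000, author_penalty=0.5):
    """Order candidate tweet ids by a hypothetical relevance score.

    candidates: dicts with tweet_id, author_id, created_ms, p_engage (0..1).
    """
    def base_score(c):
        age_ms = max(0, now_ms - c["created_ms"])
        freshness = 0.5 ** (age_ms / half_life_ms)  # halves every half-life
        return c["p_engage"] * freshness

    by_base = sorted(candidates, key=base_score, reverse=True)
    seen_authors = {}
    rescored = []
    for c in by_base:
        repeats = seen_authors.get(c["author_id"], 0)
        # Each extra post from an already-seen author is discounted again.
        rescored.append((base_score(c) * author_penalty ** repeats, c["tweet_id"]))
        seen_authors[c["author_id"]] = repeats + 1
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [tweet_id for _, tweet_id in rescored]


now = 10_000_000
candidates = [
    {"tweet_id": "A", "author_id": "a", "created_ms": now, "p_engage": 0.8},
    {"tweet_id": "B", "author_id": "a", "created_ms": now, "p_engage": 0.7},
    {"tweet_id": "C", "author_id": "b", "created_ms": now - 3_600_000, "p_engage": 0.9},
]
print(rank_for_you(candidates, now))  # ['A', 'C', 'B']
```

Here the second post from author "a" drops below an older post from author "b", which is the diversity effect the paragraph above describes.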

Step 11 — Engagement, replies, and counters

Likes and reposts are separate writes from the tweet body—idempotent (user_id, tweet_id) keys. Counters may use Redis INCR with async flush to SQL or Cassandra for durability. Replies attach conversation_id and in_reply_to_tweet_id; thread view is another timeline type (conversation tree or flat with root).
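
The idempotent-like pattern can be sketched in a few lines. A set of (user_id, tweet_id) keys stands in for the likes table, one dictionary for the hot counters, and another for the durable store. All names are hypothetical.

```python
class LikeService:
    """Idempotent likes with a hot counter and a periodic durable flush."""

    def __init__(self):
        self._likes = set()  # (user_id, tweet_id) keys; stand-in for the likes table
        self._counts = {}    # tweet_id -> hot count; stand-in for a cache counter
        self._durable = {}   # tweet_id -> last flushed count; stand-in for the database

    def like(self, user_id, tweet_id):
        """Returns True only when the like is new; duplicates change nothing."""
        key = (user_id, tweet_id)
        if key in self._likes:
            return False
        self._likes.add(key)
        self._counts[tweet_id] = self._counts.get(tweet_id, 0) + 1
        return True

    def unlike(self, user_id, tweet_id):
        """Returns True only when an existing like was removed."""
        key = (user_id, tweet_id)
        if key not in self._likes:
            return False
        self._likes.remove(key)
        self._counts[tweet_id] -= 1
        return True

    def count(self, tweet_id):
        return self._counts.get(tweet_id, 0)

    def flush(self):
        """Copy changed counters to the durable store; returns rows written."""
        dirty = {t: c for t, c in self._counts.items()
                 if self._durable.get(t) != c}
        self._durable.update(dirty)
        return len(dirty)
```

Checking the key before touching the counter is what makes client retries safe: a repeated request cannot inflate the count.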

Step 12 — Media, cards, and hydration

Upload media to object storage; processing service generates variants (thumbnail, HLS). Tweet row stores media_ids only. Hydration batch-gets tweets, users, media metadata in parallel (single RPC multi-get pattern). Missing tweet (deleted) → skip slot or show “unavailable” placeholder.
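
A sketch of the hydration step, assuming three hypothetical multi-get functions that each take a list of ids and return a dictionary of found rows. Tweets are fetched first because they determine which authors and media are needed; those two lookups then run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def hydrate(tweet_ids, get_tweets, get_users, get_media):
    """Turn timeline ids into feed cards, keeping timeline order.

    Each getter is a hypothetical multi-get: list of ids -> {id: row}.
    Missing or deleted tweets become an "unavailable" placeholder.
    """
    tweets = get_tweets(tweet_ids)
    live = [tweets[t] for t in tweet_ids
            if t in tweets and not tweets[t].get("deleted")]
    author_ids = sorted({row["author_id"] for row in live})
    media_ids = sorted({m for row in live for m in row.get("media_ids", [])})

    with ThreadPoolExecutor(max_workers=2) as pool:
        users_future = pool.submit(get_users, author_ids)
        media_future = pool.submit(get_media, media_ids)
        users = users_future.result()
        media = media_future.result()

    cards = []
    for tweet_id in tweet_ids:
        row = tweets.get(tweet_id)
        if row is None or row.get("deleted"):
            cards.append({"id": tweet_id, "unavailable": True})
            continue
        cards.append({
            "id": tweet_id,
            "text": row["text"],
            "author": users.get(row["author_id"]),
            "media": [media[m] for m in row.get("media_ids", []) if m in media],
        })
    return cards
```

Returning a placeholder rather than silently dropping the slot is one of the two options the text mentions; switching to "skip" is a one-line change in the loop.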

Step 13 — Search, trends, and notifications (adjacent loops)

Search — inverted index (Elasticsearch/Lucene) fed by post.created; different SLO than home timeline. Trends — aggregate hashtag/entity counts in streaming window (Flink/Storm-class). Push — notification service consumes same events; fan-out to device tokens with per-user prefs. Keep these off the critical home timeline read path.
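
The trends aggregation can be illustrated with a single-process sliding window. This sketch assumes events arrive in timestamp order and fit in memory; a streaming engine would partition the same logic by key. The class name is hypothetical.

```python
from collections import Counter, deque


class TrendWindow:
    """Counts hashtags seen within the last window_ms milliseconds."""

    def __init__(self, window_ms):
        self._window_ms = window_ms
        self._events = deque()  # (timestamp_ms, hashtag), oldest first
        self._counts = Counter()

    def add(self, timestamp_ms, hashtags):
        """Record the hashtags of one post (assumes in-order timestamps)."""
        for tag in hashtags:
            self._events.append((timestamp_ms, tag))
            self._counts[tag] += 1
        self._expire(timestamp_ms)

    def _expire(self, now_ms):
        """Drop events that have aged out of the window."""
        cutoff = now_ms - self._window_ms
        while self._events and self._events[0][0] <= cutoff:
            _, tag = self._events.popleft()
            self._counts[tag] -= 1
            if self._counts[tag] == 0:
                del self._counts[tag]

    def top(self, now_ms, k=3):
        """Return up to k (hashtag, count) pairs, most frequent first."""
        self._expire(now_ms)
        return self._counts.most_common(k)


window = TrendWindow(window_ms=1000)
window.add(0, ["a", "b"])
window.add(500, ["a"])
print(window.top(600, k=2))  # [('a', 2), ('b', 1)]
print(window.top(1200))      # [('a', 1)]
```

Because this consumes the same post.created events asynchronously, a slow trends job delays trends, not the home timeline.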

Step 14 — Scale, sharding, and multi-region

Step 15 — Technical layer: APIs and payloads

| Operation | HTTP | Success | Notes |
| --- | --- | --- | --- |
| Create tweet | POST /2/tweets | 201 | Body: text, optional media.media_ids, reply settings |
| Home timeline | GET /2/timeline/home?max_results=20&pagination_token=… | 200 | Returns hydrated data[] + meta.next_token |
| User timeline | GET /2/users/{id}/tweets | 200 | Author index; cheaper than home merge |
| Delete tweet | DELETE /2/tweets/{id} | 200 | Triggers tombstone + cache purge |
| Like | POST /2/users/{uid}/likes | 201 | Idempotent like on duplicate |

Create tweet (illustrative JSON):

POST /2/tweets
Authorization: Bearer …
Content-Type: application/json

{
  "text": "Three loops: publish, read, graph.",
  "media": { "media_ids": ["m_abc123"] }
}

→ 201
{
  "data": {
    "id": "1847263920182345728",
    "text": "Three loops: publish, read, graph.",
    "author_id": "u_991"
  }
}

Logical tables

tweets(id, author_id, text, created_at, conversation_id, in_reply_to_id, deleted)
users(id, handle, display_name, …)
follows(follower_id, followee_id, created_at)
timelines(user_id, tweet_id, score)  -- often Redis, not SQL
likes(user_id, tweet_id, created_at)
media(id, owner_id, object_key, type, status)

Step 16 — Reliability, observability, and failure modes

Failure modes

Observability

Step 17 — Goals → knobs (quick reference)

| Goal | Knob |
| --- | --- |
| Feed feels instant | Fan-out on write, Redis timelines, parallel hydration |
| Survive celebrities | Hybrid fan-out, read merge, dedicated queues |
| Relevant For You | Candidate + rank services, guardrails, fresh feature pipeline |
| Deletes stick | Tombstones, cache purge workers, filter at hydrate |
| Cost under control | Trim timeline length, tier cold data, batch fan-out writes |

Step 18 — Close the loop (what to practice)

On a whiteboard: three loops, one post, one home read; mark where Redis vs tweet store vs graph store sit.

Out loud: when you fan-out on write vs read; give a follower count threshold.

With the technical section: trace POST /2/tweets through Kafka fan-out to GET /2/timeline/home.

The one line to remember

The timeline is a cached list of tweet ids per user, filled by publish-time fan-out (except celebrities) and turned into UI by hydration. Optimize writes for the long tail of small accounts; optimize reads for everyone; never fan-out 50 million Redis writes for one post.