sharpbyte.dev

How Reddit works at scale

Reddit looks like a front page and comment threads. At scale it is millions of communities (subreddits), each with its own posts, votes, and moderators—merged into personalized feeds and ranked by algorithms that balance score, time, and trust.

We work through the design in order—requirements first, numbers second, architecture third, APIs last—using a Reddit-class product as the mental model, not any one company’s private implementation.

What you should be able to do after reading:

Step 0 — How we will work through the problem

Ordered thinking beats memorizing an orange alien. Use this sequence when you design a forum at scale:

  1. Clarify scope. Posts + comments only, or chat/DMs? Old.reddit vs card feed? Ads and awards in scope?
  2. Write requirements. Functional = submit, vote, moderate. Non-functional = feed latency, vote accuracy, anti-abuse.
  3. Do napkin math. DAU, votes per second, comments per post on viral threads—size caches and queues.
  4. Draw three loops before naming Cassandra or Redis.
  5. Tell one story—new link in r/technology hits front page—then brigading and mod removal.
flowchart LR
  subgraph content [Content loop]
    SUB[Submit post] --> STORE[(Post + comment store)]
    VOTE[Votes] --> STORE
  end
  subgraph feed [Feed loop]
    STORE --> RANK[Score + hot rank]
    RANK --> LIST[Listing caches]
  end
  subgraph community [Community loop]
    MOD[Mods + rules] --> STORE
    SUBR[Subreddit metadata] --> LIST
  end
    

Step 1 — Functional requirements (users, mods, platform)

| Actor | Requirement | Why scale makes it hard |
| --- | --- | --- |
| Reader | Home feed from subscriptions + r/all | Merge hundreds of subreddit lists |
| Reader | Subreddit sorts: hot, new, top, rising, controversial | Different indexes per sort |
| Reader | Nested comments, collapse, continue thread | Deep trees; “load more” pagination |
| Poster | Text, link, image, video, crosspost | Media pipeline + link preview |
| Poster | Upvote/downvote posts and comments | Extreme write rate on viral posts |
| Member | Subscribe, multireddits, save, hide | Per-user feed composition |
| Mod | Remove, lock, sticky, ban, AutoModerator | Rules engine + audit log |
| Platform | Search, trending, notifications | Separate indexes and push fan-out |
| Platform | Karma, awards, ads | Derived counters; billing isolation |

Functional details worth stating clearly

Thing ids — base36 ids like t3_abc (post), t1_xyz (comment); prefix encodes type.
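
A minimal sketch of parsing such ids, assuming only the two prefixes named above (t1 and t3). The names `parse_fullname`, `make_fullname`, and `Thing` are hypothetical, chosen for illustration.

```python
from typing import NamedTuple

# Prefix -> type. Only the two prefixes mentioned in the text are assumed here.
KINDS = {"t1": "comment", "t3": "post"}
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


class Thing(NamedTuple):
    kind: str     # "comment" or "post"
    number: int   # integer value of the base36 part


def parse_fullname(fullname: str) -> Thing:
    """Split 't3_abc' into its type and the integer encoded in base36."""
    prefix, sep, b36 = fullname.partition("_")
    if not sep or prefix not in KINDS or not b36.isalnum():
        raise ValueError(f"not a recognised thing id: {fullname!r}")
    return Thing(KINDS[prefix], int(b36, 36))


def make_fullname(prefix: str, number: int) -> str:
    """Inverse of parse_fullname: encode a non-negative integer as base36."""
    if prefix not in KINDS or number < 0:
        raise ValueError("unknown prefix or negative id")
    out = []
    while True:
        number, rem = divmod(number, 36)
        out.append(DIGITS[rem])
        if number == 0:
            break
    return f"{prefix}_{''.join(reversed(out))}"
```

Because the prefix carries the type, a single vote or report endpoint can accept either a post or a comment id and route on the prefix alone.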

Sorts are different products. “Hot” decays with time; “Top” needs a time window (day/week/all).

Out of scope today (say it aloud). A full ad auction, or realtime chat for every subreddit: if they are excluded, park them explicitly.

Step 2 — Non-functional requirements (engineering promises)

| Category | Target (typical) | How we meet it | If we miss it |
| --- | --- | --- | --- |
| Latency: feed read | p95 < 300 ms first page | Precomputed listings, CDN for static | Users bounce to competitors |
| Latency: vote | Feels instant; async aggregate OK | Write vote row; batch counter updates | Perceived lag on buttons |
| Correctness: score | No lost votes at scale | Idempotent vote keys; event log | Wrong ranking, user outrage |
| Availability | 99.9%+ read path monthly | Cache fallbacks, read replicas | “Reddit is down” posts on Twitter |
| Abuse resistance | Brigades, bots, spam | Rate limits, ML, shadow delays | Front page unusable |
| Consistency: comments | Tree readable under load | Per-post comment shards; lazy load | Missing parent comments |
| Retention | Years of posts searchable | Tiered storage; archive cold subreddits | Storage cost explosion |

Key idea: Votes are write-heavy; feeds are read-heavy with precomputed sorts. Never compute “hot” for every post on every home page load.

Step 3 — Napkin math (posts, votes, and viral threads)

Step 4 — Architecture: three loops

Content services store posts, comments, votes. Listing service maintains sorted sets per subreddit per sort (Redis or custom). Ranking workers recompute hot scores on schedule or on vote thresholds. Community service holds subreddit config, mod permissions, rules. Event bus (Kafka) connects votes to counters, search, notifications, anti-abuse.

flowchart TB
  subgraph clients [Clients]
    WEB[Web / mobile]
  end
  subgraph edge [Edge]
    GW[API gateway]
  end
  subgraph content [Content]
    POST[Post service]
    CMT[Comment service]
    VOT[Vote service]
  end
  subgraph feed [Feed]
    LIST[Listing / rank]
    HOME[Home merger]
  end
  subgraph data [Data]
    DB[(Sharded stores)]
    CACHE[("Redis sorted sets")]
    BUS[("Event bus")]
    OBJ[(Media CDN)]
  end
  WEB --> GW
  GW --> POST --> DB
  GW --> CMT --> DB
  GW --> VOT --> DB
  VOT --> BUS
  BUS --> LIST --> CACHE
  LIST --> HOME
  POST --> OBJ
  GW --> HOME
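
To make the listing service concrete, here is a small in-memory stand-in for "sorted sets per subreddit per sort". It is a sketch only: `ListingStore` and its methods are hypothetical names, and a production system would keep the order inside a real sorted-set structure rather than sorting on each read.

```python
from collections import defaultdict


class ListingStore:
    """In-memory stand-in for one sorted set per (subreddit, sort)."""

    def __init__(self):
        # (subreddit, sort) -> {post_id: rank value}
        self._sets = defaultdict(dict)

    def upsert(self, subreddit, sort, post_id, rank):
        """Insert a post or overwrite its rank (e.g. after a hot recompute)."""
        self._sets[(subreddit, sort)][post_id] = rank

    def remove(self, subreddit, sort, post_id):
        """Drop a post from a listing (e.g. after a mod removal)."""
        self._sets[(subreddit, sort)].pop(post_id, None)

    def page(self, subreddit, sort, limit=25, after=None):
        """Return up to `limit` ids, highest rank first, starting after `after`."""
        members = self._sets.get((subreddit, sort), {})
        # Sorting per read keeps the sketch short; ties break on id for stability.
        ordered = sorted(members, key=lambda pid: (-members[pid], pid))
        start = ordered.index(after) + 1 if after in members else 0
        return ordered[start:start + limit]
```

Ranking workers would call `upsert` as scores change, while the read path only ever calls `page`, which is the split between write-heavy votes and read-heavy feeds described earlier.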
    

Step 5 — Walk one post from submit to front page

  1. Submit: POST /api/submit with sr (subreddit), title, and url or selftext.
  2. Validate: karma thresholds, rate limits, subreddit rules (flair required, etc.).
  3. Persist: post row with post_id, created_utc, author; media goes to the CDN asynchronously.
  4. Listing: insert the id into the subreddit's new listing; initialize score=1 (author upvote).
  5. Votes arrive: the vote service appends (user_id, post_id, direction) and emits an event.
  6. Aggregate: the aggregator updates score and upvote_ratio, and recomputes hot_rank periodically.
  7. Home: if the subreddit has many subscribers and the hot rank crosses a threshold, the post appears in merged feeds.
  8. Read: client GET /r/technology/hot returns a listing of ids; the API hydrates the post cards in one batch.
sequenceDiagram
  participant U as User
  participant A as API
  participant P as Post store
  participant L as Listings
  participant V as Votes
  U->>A: submit post
  A->>P: insert
  A->>L: add to new/hot candidate
  U->>A: upvote
  A->>V: record vote
  V->>L: update score/rank
  U->>A: GET hot listing
  A->>L: top ids
  A->>P: hydrate
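
Steps 5 and 6 of the walk (append vote events, then aggregate) can be sketched as a fold over the vote log. This is an illustration under assumptions: `Tally` and `fold_votes` are hypothetical names, and the rule that a user's latest event replaces their earlier ones is how this sketch interprets "one vote per user".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Tally:
    ups: int = 0
    downs: int = 0

    @property
    def score(self):
        return self.ups - self.downs

    @property
    def upvote_ratio(self):
        total = self.ups + self.downs
        return self.ups / total if total else 0.0


def fold_votes(events):
    """Fold (user_id, post_id, direction) events, in arrival order, into tallies.

    A later event from the same user on the same post replaces the earlier one,
    so retried or repeated events do not inflate the counts.
    """
    latest = {}
    for user_id, post_id, direction in events:
        latest[(user_id, post_id)] = direction

    tallies = defaultdict(Tally)
    for (_user, post_id), direction in latest.items():
        if direction > 0:
            tallies[post_id].ups += 1
        elif direction < 0:
            tallies[post_id].downs += 1
        # direction == 0 is an unvote and contributes nothing
    return dict(tallies)
```

An aggregator along these lines can run in batches off the event bus, which is why the vote button can feel instant while the displayed score catches up a moment later.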
    

Step 6 — Subreddits, multireddits, and namespaces

Subreddit: config such as title, rules, allowed post types, NSFW flag, and default sort.

Subscription: a user ↔ subreddit edge, stored so the home feed merger can look it up.

Multireddit: a custom bundle of subreddits served as a pseudo-feed.

Sharding is often by subreddit_id for listings and by post_id for comments.
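
One way to picture the home feed merger is a k-way merge over the already-sorted listings of the subscribed subreddits. The sketch below assumes each listing is a list of (rank, post_id) pairs sorted by rank descending and that ranks are comparable across subreddits, which a real system may need to normalize. `merge_home_feed` is a hypothetical name.

```python
import heapq


def merge_home_feed(listings, limit=25):
    """Merge per-subreddit listings into one page of post ids.

    listings: dict of subreddit -> list of (rank, post_id), rank descending.
    """
    # heapq.merge is lazy: it only pulls as many items as the page needs.
    merged = heapq.merge(*listings.values(), key=lambda item: -item[0])
    seen = set()
    page = []
    for _rank, post_id in merged:
        if post_id in seen:
            continue  # the same post can arrive via overlapping bundles
        seen.add(post_id)
        page.append(post_id)
        if len(page) == limit:
            break
    return page
```

The same function serves a multireddit: pass only the listings of the bundled subreddits instead of the user's full subscription set.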

Step 7 — Voting, scores, and the hot algorithm

Each user gets at most one vote per thing: +1, -1, or 0 (unvote). Store votes in a table keyed (user_id, thing_id) for idempotency.
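
A minimal sketch of that idempotent table, using a dict in place of a database. `VoteTable` and `cast` are hypothetical names; the point is that the key (user_id, thing_id) makes a repeated request a no-op and lets the caller learn the exact score delta to emit.

```python
class VoteTable:
    """One row per (user_id, thing_id); the stored direction is -1 or +1."""

    def __init__(self):
        self._votes = {}

    def cast(self, user_id, thing_id, direction):
        """Record a vote and return the change in score it causes."""
        if direction not in (-1, 0, 1):
            raise ValueError("direction must be -1, 0, or 1")
        key = (user_id, thing_id)
        previous = self._votes.get(key, 0)
        if direction == 0:
            self._votes.pop(key, None)   # unvote: delete the row
        else:
            self._votes[key] = direction
        return direction - previous      # 0 when the request is a repeat
```

Switching from an upvote to a downvote returns -2, so a downstream counter can apply the delta without re-reading every vote on the thing.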

Two classic rankers are the Wilson score and the Reddit hot formula. Hot combines log(score) with time decay so good new posts rise and old posts fall. “Top” uses score within a time window; “Controversial” favors things with many votes split close to evenly between up and down.

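# Simplified hot intuition (not exact legacy source)
from math import log10

def hot(upvotes, downvotes, created_utc, now_utc):
    score = upvotes - downvotes
    order = log10(max(abs(score), 1))        # every 10x in |score| adds one point
    sign = 1 if score > 0 else -1 if score < 0 else 0
    seconds = now_utc - created_utc          # age of the post in seconds
    return sign * order - seconds / 45000    # 45000 s (12.5 h) of age costs one point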

Vote fuzzing — UI shows approximate counts on hot posts to obscure exact brigade timing (product choice).
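
For the Wilson score mentioned above, a common formulation is the lower bound of the Wilson score interval on the upvote fraction. The sketch below is that textbook formula; the default z = 1.96 (roughly 95% confidence) and the name `wilson_lower_bound` are assumptions for illustration, not a claim about any site's exact parameters.

```python
from math import sqrt


def wilson_lower_bound(ups, downs, z=1.96):
    """Lower bound of the Wilson score interval for the upvote fraction."""
    n = ups + downs
    if n == 0:
        return 0.0
    p = ups / n                      # observed upvote fraction
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * sqrt((p * (1 - p) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)
```

The bound rewards evidence as well as ratio: 1 upvote out of 1 scores about 0.21, while 90 out of 100 scores about 0.83, so a well-supported item outranks a lucky newcomer with a perfect ratio.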

Step 8 — Listings, caches, and feed merge

Step 9 — Comment trees and pagination

Comments store parent_id (post or comment) and link_id (root post). Sorts: best (confidence), top, new, controversial, old. Load top-level first; “load more replies” fetches subtrees by parent_id with limits. Very deep threads may collapse or cap depth for UX and SQL recursion cost.

comments(
  comment_id,
  link_id,      -- root post t3_
  parent_id,    -- t3_ or t1_
  author_id,
  body,
  score,
  created_utc,
  collapsed,
  deleted
)
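
Given rows shaped like the schema above, "load top-level first, then load more" can be sketched as an index by parent_id plus a depth-limited and width-limited walk. The names `build_children_index` and `render`, the score-based sibling order, and the shape of the "more" stub are all assumptions for illustration.

```python
from collections import defaultdict


def build_children_index(rows):
    """Group comment rows by parent_id, best-scored siblings first."""
    children = defaultdict(list)
    for row in rows:
        children[row["parent_id"]].append(row)
    for siblings in children.values():
        siblings.sort(key=lambda r: r["score"], reverse=True)
    return children


def render(children, parent_id, limit=2, max_depth=3, depth=0):
    """Return nested nodes under parent_id; anything cut off becomes a stub.

    A stub {"more": n, "parent_id": p} tells the client it can fetch n more
    replies under p with a follow-up request.
    """
    siblings = children.get(parent_id, [])
    if depth >= max_depth:
        # Depth cap reached: do not descend, just advertise what is hidden.
        return [{"more": len(siblings), "parent_id": parent_id}] if siblings else []
    nodes = [
        {
            "id": row["comment_id"],
            "replies": render(children, row["comment_id"], limit, max_depth, depth + 1),
        }
        for row in siblings[:limit]
    ]
    hidden = len(siblings) - limit
    if hidden > 0:
        nodes.append({"more": hidden, "parent_id": parent_id})
    return nodes
```

Calling `render(index, "t3_…")` with the root post id yields the first screen; each stub's parent_id is what a "load more replies" request would send back.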

Step 10 — Moderation, AutoMod, and safety

Step 11 — Media, previews, and CDN

Image and video uploads go to object storage; workers generate thumbnails and HLS renditions for video. Link posts fetch the Open Graph (OG) preview asynchronously, with SSRF protection. Card view versus classic view is a client rendering choice; the API returns structured media metadata.
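
As an illustration of the SSRF protection mentioned for link previews, the sketch below refuses any URL whose host resolves to a non-public address. It is a partial check under assumptions: `is_safe_preview_url` is a hypothetical name, and `resolve` is a hypothetical injected callable (hostname to list of IP strings) so the logic can be exercised without real DNS.

```python
import ipaddress
from urllib.parse import urlsplit


def is_safe_preview_url(url, resolve):
    """Return True only if url is http(s) and every resolved address is public."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:              # malformed URL, e.g. a broken IPv6 literal
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    try:
        addresses = [ipaddress.ip_address(a) for a in resolve(host)]
    except ValueError:              # resolver returned something that is not an IP
        return False
    # is_global is False for loopback, private, link-local and other reserved ranges.
    return bool(addresses) and all(a.is_global for a in addresses)
```

A real fetcher would also need to connect to the address it just checked and repeat the check on every redirect; otherwise a hostname that resolves differently the second time could slip past.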

Step 12 — Search, notifications, and karma

Search: Elasticsearch index of title/body/subreddit; separate from listing Redis.

Notifications: reply to comment, username mention, mod mail; inbox table + push.

Karma: derived from vote events with caps per day to limit farming; not spent like currency (awards separate).

Step 13 — Anti-abuse and rate limits

Step 14 — Scale, sharding, and read replicas

Step 15 — Technical layer: API patterns

| Operation | Pattern | Notes |
| --- | --- | --- |
| List hot | GET /r/{subreddit}/hot.json?limit=25&after=t3_… | OAuth or cookie session |
| Submit | POST /api/submit | Form: kind, sr, title, text/url |
| Vote | POST /api/vote | id=t3_… and dir=1, -1, or 0 |
| Comments | GET /comments/{article_id}.json | Post + comment tree |
| Post comment | POST /api/comment | thing_id parent, text |

POST /api/vote
Content-Type: application/x-www-form-urlencoded

id=t3_1abc2de&dir=1

→ { "json": { "errors": [] } }

Listing excerpt:
{
  "data": {
    "children": [
      { "kind": "t3", "data": { "id": "1abc2de", "title": "…", "score": 4201, "num_comments": 512 } }
    ],
    "after": "t3_nextid"
  }
}
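
The `after` cursor in the excerpt is what drives pagination. A minimal client-side sketch, assuming a hypothetical `fetch_page(after, limit)` callable that returns the parsed JSON of one listing page (the HTTP call itself is left out):

```python
def iter_listing(fetch_page, limit=25, max_pages=10):
    """Yield post data dicts page by page, following the `after` cursor.

    fetch_page is a hypothetical callable: (after, limit) -> parsed listing dict
    shaped like the excerpt above. max_pages bounds the walk.
    """
    after = None
    for _ in range(max_pages):
        listing = fetch_page(after=after, limit=limit)["data"]
        for child in listing["children"]:
            yield child["data"]
        after = listing.get("after")
        if not after:
            break   # no cursor means this was the last page
```

Cursor-based paging tolerates a listing that reorders between requests better than numeric offsets would, since the next page is defined relative to the last id the client actually saw.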

Step 16 — Reliability, observability, and failure modes

Metrics: vote write rate, listing p95, hydrate batch size, mod queue depth, search lag, error 5xx on submit.

Step 17 — Goals → knobs (quick reference)

| Goal | Knob |
| --- | --- |
| Front page feels fresh | Hot decay constants, rising listing, vote velocity signals |
| Subreddit fast | Redis listings per sort; shard by subreddit |
| Fair voting | Idempotent votes, brigade detection, fuzz/delay |
| Mods stay sane | AutoMod, reports, audit logs, mod tools API |
| Survive viral thread | Comment pagination, per-post shards, read caches |

Step 18 — Close the loop (what to practice)

On a whiteboard: three loops, one post path through hot listing; separate vote write path.

Out loud: hot vs top vs new; how home feed differs from r/technology/hot.

With the technical section: trace submit → listing insert → vote → hot read.

The one line to remember

Reddit-class systems are sharded content plus precomputed ranked listings per community and sort. Votes stream in fast; feeds read from caches—not from sorting millions of posts on every page load.