How Reddit works at scale

Reddit looks like a front page and comment threads. At scale it is millions of communities (subreddits), each with its own posts, votes, and moderators—merged into personalized feeds and ranked by algorithms that balance score, time, and trust.

We work through the design in order—requirements first, numbers second, architecture third, APIs last—using a Reddit-class product as the mental model, not any one company’s private implementation.

What you should be able to do after reading:

Separate the three loops—content, feed/ranking, and community/moderation.
List functional and non-functional requirements for posters, readers, and mods.
Walk one post: submit → store → vote events → hot score → home/subreddit feed read.
Explain comment trees, vote fuzzing, and why “hot” is not the same as “top.”
Read the technical section: listings API, thing ids, and vote idempotency.

Step 0 — How we will work through the problem

Ordered thinking beats memorizing an orange alien. Use this sequence when you design a forum at scale:

Clarify scope. Posts + comments only, or chat/DMs? Old.reddit vs card feed? Ads and awards in scope?
Write requirements. Functional = submit, vote, moderate. Non-functional = feed latency, vote accuracy, anti-abuse.
Do napkin math. DAU, votes per second, comments per post on viral threads—size caches and queues.
Draw three loops before naming Cassandra or Redis.
Tell one story—new link in r/technology hits front page—then brigading and mod removal.

flowchart LR
  subgraph content [Content loop]
    SUB[Submit post] --> STORE[(Post + comment store)]
    VOTE[Votes] --> STORE
  end
  subgraph feed [Feed loop]
    STORE --> RANK[Score + hot rank]
    RANK --> LIST[Listing caches]
  end
  subgraph community [Community loop]
    MOD[Mods + rules] --> STORE
    SUBR[Subreddit metadata] --> LIST
  end

Step 1 — Functional requirements (users, mods, platform)

Actor	Requirement	Why scale makes it hard
Reader	Home feed from subscriptions + r/all	Merge hundreds of subreddit lists
Reader	Subreddit sorts: hot, new, top, rising, controversial	Different indexes per sort
Reader	Nested comments, collapse, continue thread	Deep trees; “load more” pagination
Poster	Text, link, image, video, crosspost	Media pipeline + link preview
Poster	Upvote/downvote posts and comments	Extreme write rate on viral posts
Member	Subscribe, multireddits, save, hide	Per-user feed composition
Mod	Remove, lock, sticky, ban, AutoModerator	Rules engine + audit log
Platform	Search, trending, notifications	Separate indexes and push fan-out
Platform	Karma, awards, ads	Derived counters; billing isolation

Functional details worth stating clearly

Thing ids — base36 ids like t3_abc (post), t1_xyz (comment); prefix encodes type.
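
A minimal sketch of how such ids could be parsed and built, assuming only the two prefixes named above. The helper names (`parse_fullname`, `make_fullname`, `to_base36`) are illustrative, not part of any official client.

```python
from typing import NamedTuple

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
# Only the two prefixes named in the text; other kinds are left out on purpose.
KINDS = {"t1": "comment", "t3": "post"}


class Thing(NamedTuple):
    kind: str    # "post" or "comment"
    number: int  # numeric id decoded from base36


def to_base36(n: int) -> str:
    """Encode a non-negative integer as lowercase base36."""
    if n < 0:
        raise ValueError("ids are non-negative")
    digits = ""
    while True:
        n, rem = divmod(n, 36)
        digits = ALPHABET[rem] + digits
        if n == 0:
            return digits


def parse_fullname(fullname: str) -> Thing:
    """Split 't3_abc' into its type and its numeric id."""
    prefix, _, b36 = fullname.partition("_")
    if prefix not in KINDS or not b36:
        raise ValueError(f"unrecognized thing id: {fullname!r}")
    return Thing(KINDS[prefix], int(b36, 36))


def make_fullname(prefix: str, number: int) -> str:
    """Build a prefixed id such as 't3_abc' from a prefix and an integer."""
    return f"{prefix}_{to_base36(number)}"


print(parse_fullname("t3_abc"))
print(make_fullname("t1", 44027))
```

Because the prefix carries the type, one vote or report endpoint can accept posts and comments alike and route on the prefix.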

Sorts are different products. “Hot” decays with time; “Top” needs a time window (day/week/all).

Out of scope today (say it aloud). A full ad auction and realtime chat for every subreddit are the usual candidates: name them, then park them if they are excluded.

Step 2 — Non-functional requirements (engineering promises)

Category	Target (typical)	How we meet it	If we miss it
Latency — feed read	p95 < 300 ms first page	Precomputed listings, CDN for static	Users bounce to competitors
Latency — vote	Feels instant; async aggregate OK	Write vote row; batch counter updates	Perceived lag on buttons
Correctness — score	No lost votes at scale	Idempotent vote keys; event log	Wrong ranking, user outrage
Availability	99.9%+ read path monthly	Cache fallbacks, read replicas	“Reddit is down” posts on Twitter
Abuse resistance	Contain brigades, bots, spam	Rate limits, ML, shadow delays	Front page unusable
Consistency — comments	Tree readable under load	Per-post comment shards; lazy load	Missing parent comments
Retention	Years of posts searchable	Tiered storage; archive cold subreddits	Storage cost explosion

Key idea: Votes are write-heavy; feeds are read-heavy with precomputed sorts. Never compute “hot” for every post on every home page load.

Step 3 — Napkin math (posts, votes, and viral threads)

~50M+ DAU order of magnitude; billions of votes per month.
Viral post: 100k comments → comment tree pagination mandatory; single thread cannot load in one query.
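10k votes/min on one post (about 170 writes/s on a single key) → counter service sharded by post_id, with batched increments so one hot post does not hammer a single row; fuzz displayed score in UI.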
Home feed merges 100+ subscribed subreddits → pre-merge listing service or parallel fetch + heap merge.
Media posts dominate egress; text posts dominate metadata store.
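
The arithmetic behind these bullets can be written out. The monthly vote volume, the peak factor, and the page size below are assumptions picked only to make the numbers concrete; the 10k votes/min and 100-subreddit figures come from the bullets above.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600    # 2,592,000

# Assumptions, not measured values.
votes_per_month = 2_000_000_000       # low end of "billions of votes per month"
peak_to_average = 5                   # guess at daily peak vs. mean load
page_size = 25                        # candidate ids fetched per subreddit

avg_votes_per_sec = votes_per_month / SECONDS_PER_MONTH
peak_votes_per_sec = avg_votes_per_sec * peak_to_average

# From the bullets: one viral post and one home feed.
viral_votes_per_sec = 10_000 / 60
home_feed_candidates = 100 * page_size

print(f"average sitewide: {avg_votes_per_sec:,.0f} votes/s")
print(f"assumed peak:     {peak_votes_per_sec:,.0f} votes/s")
print(f"one viral post:   {viral_votes_per_sec:,.0f} votes/s")
print(f"home feed merge:  {home_feed_candidates:,} candidate ids per request")
```

Under these assumptions a single viral post takes about 167 votes/s against a sitewide average near 770, which is why the hot post, not the total volume, drives the counter design.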

Step 4 — Architecture: three loops

Content services store posts, comments, votes. Listing service maintains sorted sets per subreddit per sort (Redis or custom). Ranking workers recompute hot scores on schedule or on vote thresholds. Community service holds subreddit config, mod permissions, rules. Event bus (Kafka) connects votes to counters, search, notifications, anti-abuse.

flowchart TB
  subgraph clients [Clients]
    WEB[Web / mobile]
  end
  subgraph edge [Edge]
    GW[API gateway]
  end
  subgraph content [Content]
    POST[Post service]
    CMT[Comment service]
    VOT[Vote service]
  end
  subgraph feed [Feed]
    LIST[Listing / rank]
    HOME[Home merger]
  end
  subgraph data [Data]
    DB[(Sharded stores)]
    CACHE[("Redis sorted sets")]
    BUS[("Event bus")]
    OBJ[(Media CDN)]
  end
  WEB --> GW
  GW --> POST --> DB
  GW --> CMT --> DB
  GW --> VOT --> DB
  VOT --> BUS
  BUS --> LIST --> CACHE
  LIST --> HOME
  POST --> OBJ
  GW --> HOME

Step 5 — Walk one post from submit to front page

Submit: POST /api/submit with kind, sr (subreddit), title, and url or text.
Validate: karma thresholds, rate limits, subreddit rules (flair required, etc.).
Persist: post row with post_id, created_utc, author; media goes to the CDN asynchronously.
Listing: insert the id into the subreddit's new listing; initialize score=1 (author upvote).
Votes arrive: vote service appends (user_id, post_id, direction) and emits an event.
Aggregate: the aggregator updates score and upvote_ratio, and recomputes hot_rank periodically.
Home: once the post's hot rank clears the cut for its subreddit listing, it appears in the merged home feeds of that subreddit's subscribers; the more subscribers, the more feeds it reaches.
Read: client GET /r/technology/hot returns a listing of ids; post cards are hydrated in batch.

sequenceDiagram
  participant U as User
  participant A as API
  participant P as Post store
  participant L as Listings
  participant V as Votes
  U->>A: submit post
  A->>P: insert
  A->>L: add to new/hot candidate
  U->>A: upvote
  A->>V: record vote
  V->>L: update score/rank
  U->>A: GET hot listing
  A->>L: top ids
  A->>P: hydrate

Step 6 — Subreddits, multireddits, and namespaces

Subreddit: config such as title, rules, allowed post types, NSFW flag, default sort.
Subscription: user ↔ subreddit edge, stored for the home feed merger.
Multireddit: custom bundle of subreddits served as a pseudo-feed.
Sharding: often by subreddit_id for listings and by post_id for comments.

Step 7 — Voting, scores, and the hot algorithm

Each user gets at most one vote per thing: +1, -1, or 0 (unvote). Store votes in a table keyed (user_id, thing_id) for idempotency.
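
A minimal in-memory sketch of that keyed table. `VoteStore` and `cast` are hypothetical names; a real service would back this with a database row and emit the returned delta as an event.

```python
from typing import Dict, Tuple


class VoteStore:
    """In-memory stand-in for a vote table keyed by (user_id, thing_id)."""

    def __init__(self) -> None:
        self._votes: Dict[Tuple[str, str], int] = {}

    def cast(self, user_id: str, thing_id: str, direction: int) -> int:
        """Record a vote and return the change to apply to the score.

        Repeating the same request returns 0, so client retries are harmless.
        """
        if direction not in (-1, 0, 1):
            raise ValueError("direction must be -1, 0, or 1")
        key = (user_id, thing_id)
        previous = self._votes.get(key, 0)
        if direction == 0:
            self._votes.pop(key, None)   # unvote removes the row
        else:
            self._votes[key] = direction
        return direction - previous


store = VoteStore()
print(store.cast("u1", "t3_abc", 1))    # first upvote
print(store.cast("u1", "t3_abc", 1))    # retry of the same upvote
print(store.cast("u1", "t3_abc", -1))   # flip to a downvote
```

Returning a delta rather than a new total lets the counter update be batched and applied later without re-reading the vote row.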

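Hot and best are different formulas. Hot combines log10 of the net score with a time-decay term, so new, well-received posts rise and old posts fall. “Top” sorts by score within a time window. “Controversial” favors items with many votes split close to evenly between up and down. The Wilson score lower bound is the usual basis for the “best” (confidence) comment sort, not for hot.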

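# Simplified hot intuition (not exact legacy source)
from math import log10

def hot(upvotes, downvotes, created_utc, now_utc):
    score = upvotes - downvotes
    order = log10(max(abs(score), 1))   # each 10x in net votes adds 1
    sign = (score > 0) - (score < 0)    # +1, 0, or -1
    age_seconds = now_utc - created_utc
    # 45000 s = 12.5 h: that much extra age cancels one 10x of net votes
    return sign * order - age_seconds / 45000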

Vote fuzzing: the UI shows approximate counts on busy posts so that bots and brigades cannot read back exactly whether, or when, their votes counted (a product choice).
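
To make the decay constant in the simplified hot sketch concrete, here is the trade-off it implies, with s for the net score and a for the post's age in seconds. This follows only from the simplified formula above, not from any production ranking code.

```latex
\[
\mathrm{hot}(s, a) = \operatorname{sgn}(s)\,\log_{10}\max(|s|, 1) - \frac{a}{45000}
\]
\[
\mathrm{hot}(1000,\ a + 45000)
  = 3 - \frac{a + 45000}{45000}
  = 2 - \frac{a}{45000}
  = \mathrm{hot}(100,\ a)
\]
```

In words: a post needs ten times the net score to tie with a post that is 45000 s (12.5 hours) newer, which is why “hot” and “top” diverge quickly.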

Step 8 — Listings, caches, and feed merge

Per subreddit, per sort: sorted set of post_id → rank_score in Redis.
Home feed: fetch the top K from each subscribed subreddit in parallel and merge by rank score, or maintain a pre-merged listing per user (expensive).
r/all — global firehose with NSFW filters and spam thresholds.
Pagination: after=t3_abc cursor, not OFFSET.
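
The parallel-fetch-then-merge option can be sketched with a k-way heap merge. This assumes each subreddit listing already arrives sorted best-first as (rank_score, post_id) pairs; `merge_home_feed` is an illustrative name, and plain lists stand in for the Redis sorted sets.

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

# One listing = (rank_score, post_id) pairs, best first.
Listing = List[Tuple[float, str]]


def merge_home_feed(listings: Dict[str, Listing], limit: int) -> List[str]:
    """Merge per-subreddit listings into one feed without a global sort."""
    merged = heapq.merge(
        *listings.values(),
        key=lambda item: item[0],
        reverse=True,   # inputs are sorted high-to-low
    )
    # islice stops the lazy merge as soon as the page is full.
    return [post_id for _, post_id in islice(merged, limit)]


feed = merge_home_feed(
    {
        "technology": [(9.1, "t3_a"), (7.0, "t3_b")],
        "python": [(8.5, "t3_c"), (2.0, "t3_d")],
    },
    limit=3,
)
print(feed)
```

The merge is lazy, so building a 25-item page touches only a few entries per subreddit even when a user subscribes to hundreds.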

Step 9 — Comment trees and pagination

Comments store parent_id (post or comment) and link_id (root post). Sorts: best (confidence), top, new, controversial, old. Load top-level first; “load more replies” fetches subtrees by parent_id with limits. Very deep threads may collapse or cap depth for UX and SQL recursion cost.

comments(
  comment_id,
  link_id,      -- root post t3_
  parent_id,    -- t3_ or t1_
  author_id,
  body,
  score,
  created_utc,
  collapsed,
  deleted
)
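
A sketch of how flat rows from such a table could be turned into a depth-limited tree with “load more” stubs. It assumes the rows for one post are already in memory as dicts using the column names above; `build_children_index` and `render_tree` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List


def build_children_index(rows: List[dict]) -> Dict[str, List[dict]]:
    """Group comment rows by parent_id, highest score first."""
    children: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        children[row["parent_id"]].append(row)
    for siblings in children.values():
        siblings.sort(key=lambda r: r["score"], reverse=True)
    return children


def render_tree(children: Dict[str, List[dict]], parent_id: str,
                max_depth: int, per_level: int, depth: int = 0) -> List[dict]:
    """Return nested nodes; anything cut off becomes a 'more' stub."""
    siblings = children.get(parent_id, [])
    nodes: List[dict] = []
    for row in siblings[:per_level]:
        cid = row["comment_id"]
        node = {"id": cid, "replies": []}
        if depth + 1 < max_depth:
            node["replies"] = render_tree(
                children, cid, max_depth, per_level, depth + 1)
        elif children.get(cid):
            # Depth cap reached: report how many direct replies are hidden.
            node["replies"] = [{"more": len(children[cid]), "parent_id": cid}]
        nodes.append(node)
    hidden = len(siblings) - per_level
    if hidden > 0:
        # Sibling cap reached: the client can fetch the rest by parent_id.
        nodes.append({"more": hidden, "parent_id": parent_id})
    return nodes
```

Each stub carries the parent_id it belongs to, so a “load more replies” request is one bounded query by parent rather than a recursive walk of the whole thread.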

Step 10 — Moderation, AutoMod, and safety

Mod queue — reported items; mod actions logged.
AutoModerator — rule DSL (keyword, karma, account age) → remove/flair/notify.
Sitewide — admin overrides, quarantine subreddit, ban evasion detection.
Brigading — sudden vote velocity from correlated accounts; delay ranking or require captcha.
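
A rule engine of the keyword/karma/account-age kind can be sketched in a few lines. The `Rule` and `Submission` shapes below are invented for illustration and are not the real AutoModerator syntax; the first matching rule wins.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rule:
    action: str                                  # "remove", "flair", or "notify"
    keywords: List[str] = field(default_factory=list)
    max_karma: Optional[int] = None              # match authors at or below this
    max_account_age_days: Optional[int] = None   # match accounts this young


@dataclass
class Submission:
    title: str
    body: str
    author_karma: int
    author_age_days: int


def matches(rule: Rule, post: Submission) -> bool:
    """A rule matches only if every condition it sets is satisfied."""
    text = f"{post.title} {post.body}".lower()
    if rule.keywords and not any(k.lower() in text for k in rule.keywords):
        return False
    if rule.max_karma is not None and post.author_karma > rule.max_karma:
        return False
    if (rule.max_account_age_days is not None
            and post.author_age_days > rule.max_account_age_days):
        return False
    return True


def first_action(rules: List[Rule], post: Submission) -> Optional[str]:
    """Return the action of the first matching rule, or None."""
    for rule in rules:
        if matches(rule, post):
            return rule.action
    return None
```

Keeping rules as data rather than code lets each subreddit's mods edit them without a deploy, and makes every automated action easy to write to the audit log.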

Step 11 — Media, previews, and CDN

Image/video uploads to object storage; generate thumbnails and HLS for video. Link posts fetch OG preview async (with SSRF protection). Card view vs classic view is client rendering; API returns structured media metadata.
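
For the SSRF protection mentioned above, one common check is to refuse preview fetches whose host resolves to a non-public address. The sketch below assumes the caller has already resolved the hostname and passes the addresses in; `preview_fetch_allowed` is a hypothetical name, and this is one layer of a defense, not a complete one.

```python
import ipaddress
from typing import List
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_public_ip(address: str) -> bool:
    """True only for addresses that are not internal or special-purpose."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False   # not an IP literal at all
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )


def preview_fetch_allowed(url: str, resolved_ips: List[str]) -> bool:
    """Allow a link-preview fetch only for http(s) URLs on public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    # Every resolved address must be public; one internal address is enough to refuse.
    return bool(resolved_ips) and all(is_public_ip(ip) for ip in resolved_ips)


print(preview_fetch_allowed("https://example.com/article", ["8.8.8.8"]))
print(preview_fetch_allowed("http://metadata.internal/", ["169.254.169.254"]))
```

The fetcher should then connect to the exact address that was vetted, and re-run the check on every redirect, so a second DNS lookup cannot swap in an internal target.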

Step 12 — Search, notifications, and karma

Search: Elasticsearch index of title/body/subreddit, separate from the listing Redis.
Notifications: reply to comment, username mention, mod mail; inbox table + push.
Karma: derived from vote events, with caps per day to limit farming; not spent like currency (awards are separate).

Step 13 — Anti-abuse and rate limits

Per-IP and per-account rate limits on submit/vote/comment.
Shadowban — user thinks they post; world does not see (controversial; use carefully).
Captcha and phone verify on suspicious patterns.
Vote manipulation detection compares vote sources vs normal baselines.
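
Per-account and per-IP limits are often built on a token bucket. The sketch below is one such bucket with the clock passed in explicitly so it is easy to test; the capacity and refill numbers are placeholders, not recommended limits.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float         # maximum burst size
    refill_per_sec: float   # sustained rate
    tokens: float           # tokens currently available
    updated_at: float       # time of the last refill, in seconds

    @classmethod
    def full(cls, capacity: float, refill_per_sec: float, now: float) -> "TokenBucket":
        """Start with a full bucket at time `now`."""
        return cls(capacity, refill_per_sec, capacity, now)

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then spend `cost` tokens if possible."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = max(self.updated_at, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per (scope, key), for example ("vote", account_id) or ("submit", ip).
bucket = TokenBucket.full(capacity=3, refill_per_sec=1.0, now=0.0)
print([bucket.allow(now=0.0) for _ in range(4)])
```

A bucket allows short bursts, such as a user voting quickly down a page, while still capping the sustained rate that a script can reach.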

Step 14 — Scale, sharding, and read replicas

Shard posts/comments by post_id; listings by subreddit_id.
Read replicas serve hydrate bursts; vote writes, including those on a hot row, stay pinned to the shard's primary.
Cache aside for post cards; invalidate on edit/delete.
Event sourcing for votes enables replay and audit; snapshots for counters.
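
A sketch of the replay-plus-snapshot idea for votes. It assumes an append-only log ordered by a sequence number; `VoteEvent`, `Snapshot`, `replay`, and `scores` are illustrative names rather than any particular framework's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class VoteEvent(NamedTuple):
    seq: int         # position in the append-only log
    user_id: str
    thing_id: str
    direction: int   # +1, -1, or 0 (unvote)


class Snapshot(NamedTuple):
    seq: int                             # last event folded in
    votes: Dict[Tuple[str, str], int]    # latest direction per (user, thing)


def replay(snapshot: Snapshot, log: Iterable[VoteEvent]) -> Snapshot:
    """Fold events newer than the snapshot into a new snapshot."""
    votes = dict(snapshot.votes)
    seq = snapshot.seq
    for event in log:
        if event.seq <= snapshot.seq:
            continue   # already included in the snapshot
        key = (event.user_id, event.thing_id)
        if event.direction == 0:
            votes.pop(key, None)
        else:
            votes[key] = event.direction
        seq = max(seq, event.seq)
    return Snapshot(seq, votes)


def scores(snapshot: Snapshot) -> Dict[str, int]:
    """Derive per-thing scores from the latest vote of each user."""
    totals: Dict[str, int] = defaultdict(int)
    for (_, thing_id), direction in snapshot.votes.items():
        totals[thing_id] += direction
    return dict(totals)
```

Replaying from a snapshot gives the same result as replaying the full log, so counters can be rebuilt or audited without reading years of history.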

Step 15 — Technical layer: API patterns

Operation	Pattern	Notes
List hot	`GET /r/{subreddit}/hot.json?limit=25&after=t3_…`	OAuth or cookie session
Submit	`POST /api/submit`	Form: kind, sr, title, text/url
Vote	`POST /api/vote`	Form: id=t3_…, dir=1, -1, or 0 (unvote)
Comments	`GET /comments/{article_id}.json`	Post + comment tree
Post comment	`POST /api/comment`	thing_id parent, text

POST /api/vote
Content-Type: application/x-www-form-urlencoded

id=t3_1abc2de&dir=1

→ { "json": { "errors": [] } }

Listing excerpt:
{
  "data": {
    "children": [
      { "kind": "t3", "data": { "id": "1abc2de", "title": "…", "score": 4201, "num_comments": 512 } }
    ],
    "after": "t3_nextid"
  }
}
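
How the `after` cursor might be produced and consumed on the server side, as a sketch. A plain Python list stands in for the sorted listing (a sorted-set rank lookup would replace `list.index` in practice), and `page_after` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def page_after(ordered_ids: List[str], after: Optional[str],
               limit: int) -> Tuple[List[str], Optional[str]]:
    """Return one page of ids plus the cursor for the next page."""
    start = 0
    if after is not None:
        try:
            start = ordered_ids.index(after) + 1
        except ValueError:
            start = 0   # cursor fell out of the listing; restart from the top
    page = ordered_ids[start:start + limit]
    has_more = start + limit < len(ordered_ids)
    next_after = page[-1] if page and has_more else None
    return page, next_after


listing = ["t3_e", "t3_d", "t3_c", "t3_b", "t3_a"]
page1, cursor = page_after(listing, None, limit=2)

# A new post lands at the top between the two requests.
listing.insert(0, "t3_f")
page2, cursor = page_after(listing, cursor, limit=2)
print(page1, page2)
```

With an offset of 2, the second request would have returned `t3_d` again because the insert shifted every position; anchoring on the last id seen avoids that duplicate.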

Step 16 — Reliability, observability, and failure modes

Stale listing entry for a deleted post: the hydrate step drops removed posts; a periodic job cleans the listing itself.
Counter drift: reconcile the vote log against displayed scores nightly.
Hot post overload: circuit-break comment expansion and serve only the cached first page.
Feed merge timeout: return a partial home feed rather than an error; degrade gracefully.

Metrics: vote write rate, listing p95, hydrate batch size, mod queue depth, search lag, error 5xx on submit.

Step 17 — Goals → knobs (quick reference)

Goal	Knob
Front page feels fresh	Hot decay constants, rising listing, vote velocity signals
Subreddit fast	Redis listings per sort; shard by subreddit
Fair voting	Idempotent votes, brigade detection, fuzz/delay
Mods stay sane	AutoMod, reports, audit logs, mod tools API
Survive viral thread	Comment pagination, per-post shards, read caches

Step 18 — Close the loop (what to practice)

On a whiteboard: three loops, one post path through hot listing; separate vote write path.

Out loud: hot vs top vs new; how home feed differs from r/technology/hot.

With the technical section: trace submit → listing insert → vote → hot read.

The one line to remember

Reddit-class systems are sharded content plus precomputed ranked listings per community and sort. Votes stream in fast; feeds read from caches—not from sorting millions of posts on every page load.