Kafka Core
Internals-first guides for developers, data engineers, and platform teams—what happens on disk, on the wire, and in the JVM when you ship event-driven systems at scale.
In 2010, LinkedIn’s data pipeline was a tangle of point-to-point integrations: every new consumer meant wiring another bespoke feed from every producer. Jay Kreps, Neha Narkhede, and Jun Rao built Kafka to solve one problem: decouple producers from consumers with a durable, replayable log that many applications could read independently. LinkedIn open-sourced it in early 2011; it entered the Apache Incubator later that year and graduated to a top-level Apache project in 2012.
The name comes from Franz Kafka. Kreps has said he chose it because Kafka is a system optimized for writing, so naming it after a writer made sense, and because he liked the author’s work. Today Kafka is the default event backbone for LinkedIn, Uber, Netflix, and thousands of banks and retailers.
Traditional message brokers delete messages after delivery. Kafka retains them in an append-only log. Think of a newspaper printing press: editions roll off the press in order; subscribers pick up today’s paper (or yesterday’s, if they want a replay). The press doesn’t track who read page 3—it just keeps printing.
Each partition is a directory of segment files (.log, .index, .timeindex) on broker disk. An offset is a sequential record number that the partition leader assigns on append, not a byte position; the .index file maps offsets to byte positions inside the .log file.
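A minimal in-memory sketch of how a segment's log and its offset index relate. This is not Kafka's on-disk format: `ToySegment`, the length-prefixed framing, and `index_interval` are hypothetical names chosen for illustration. It shows offsets handed out as consecutive record numbers, with a sparse index that maps some of those offsets to byte positions.

```python
import bisect
import struct


class ToySegment:
    """Illustrative model of one segment: a byte log plus a sparse offset index."""

    def __init__(self, base_offset=0, index_interval=3):
        self.base_offset = base_offset
        self.next_offset = base_offset
        self.index_interval = index_interval  # index every Nth record (assumption)
        self.log = bytearray()                # stands in for the .log file
        self.index = []                       # (offset, byte position), like .index

    def append(self, payload: bytes) -> int:
        """Append a record and return the offset assigned to it."""
        offset = self.next_offset
        if (offset - self.base_offset) % self.index_interval == 0:
            self.index.append((offset, len(self.log)))
        # Toy framing: 8-byte offset, 4-byte length, then the payload.
        self.log += struct.pack(">qi", offset, len(payload)) + payload
        self.next_offset += 1
        return offset

    def read(self, target: int) -> bytes:
        """Binary-search the sparse index, then scan forward in the log."""
        if not self.base_offset <= target < self.next_offset:
            raise KeyError(target)
        indexed_offsets = [o for o, _ in self.index]
        slot = bisect.bisect_right(indexed_offsets, target) - 1
        pos = self.index[slot][1]
        while True:
            offset, length = struct.unpack_from(">qi", self.log, pos)
            pos += 12  # skip the 12-byte header
            if offset == target:
                return bytes(self.log[pos:pos + length])
            pos += length


segment = ToySegment()
assigned = [segment.append(p) for p in (b"a", b"bbbb", b"cc", b"ddddddd", b"e")]
print(assigned)        # consecutive record numbers, regardless of payload size
print(segment.index)   # sparse (offset, byte position) pairs
print(segment.read(4))
```

The lookup cost is one binary search plus a short forward scan, which is why the index can stay sparse and small.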
When asked “Kafka vs RabbitMQ,” lead with retention and replay, not throughput alone. Queues optimize for task distribution; logs optimize for event history and fan-out.
Kafka’s design trades broker-side complexity for sequential I/O, zero-copy networking, and consumer-driven flow control. These aren’t marketing bullets—they explain why a single broker can sustain hundreds of MB/s per disk.
Append-only writes and reads hit sequential bandwidth on HDD and SSD. Random-access queues thrash disks; Kafka batches into large sequential segments.
sendfile() moves data from page cache to the NIC without copying through userspace—critical for multi-GB/s egress on commodity hardware.
Consumers fetch at their own pace with poll(). Slow consumers don’t back-pressure the broker into dropping or spilling messages—they just lag.
Messages live for days, or indefinitely if retention is unlimited; compacted topics keep at least the latest record per key. New consumer groups start from earliest or latest without rewiring producers.
Parallelism scales with partition count. Each lane (partition) preserves order; more lanes = more throughput and more consumer instances in a group.
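The lane model can be sketched in a few lines of Python. One loud assumption: `choose_partition` uses CRC32 only because it ships in the standard library; Kafka's Java client hashes key bytes with murmur2, so the partition numbers here will not match a real cluster. `choose_partition` and `route` are hypothetical helper names. The point is the shape: hash the key, take it modulo the partition count, and every event for one key lands in one lane in arrival order.

```python
import zlib
from collections import defaultdict


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Stand-in for key-hash partitioning: hash(key) mod partition count.

    Assumption: CRC32 replaces the murmur2 hash used by Kafka's Java client.
    """
    return zlib.crc32(key) % num_partitions


def route(events, num_partitions):
    """Group (key, value) events into per-partition lists, keeping arrival order."""
    partitions = defaultdict(list)
    for key, value in events:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return partitions


events = [
    (b"order-17", "created"),
    (b"order-42", "created"),
    (b"order-17", "paid"),
    (b"order-42", "cancelled"),
    (b"order-17", "shipped"),
]

lanes = route(events, num_partitions=6)
for partition, records in sorted(lanes.items()):
    print(partition, records)
```

Because the partition is derived from the partition count, changing that count later changes where existing keys land, which is one reason to size partitions early.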
| Dimension | Traditional queue (RabbitMQ, SQS) | Kafka commit log |
|---|---|---|
| Message lifecycle | Deleted after ack | Retained by time/size/policy |
| Consumer model | Broker pushes; consumers compete for messages | Consumer pulls via offset |
| Replay | Not supported (dead letter only) | Reset offsets, re-read history |
| Fan-out | Exchange + multiple queues | Multiple consumer groups, one topic |
| Ordering | Per queue | Per partition (key-based routing) |
Kafka optimizes for throughput and durability of history, not sub-millisecond task queues or complex routing DSLs. Use a queue for work distribution; use Kafka for event streams you may need to re-read.
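A toy model of the replay and fan-out rows, written in Python. `ToyTopicPartition` and its methods are hypothetical and commit on every poll for brevity; a real consumer commits separately. What it illustrates is that reading never removes records, and each group's position is just a number it can move.

```python
class ToyTopicPartition:
    """A retained, append-only record list plus one read position per group."""

    def __init__(self):
        self.records = []
        self.committed = {}  # group id -> next offset to read

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1

    def poll(self, group, max_records=10):
        """Return the next batch for a group and advance only that group's offset."""
        start = self.committed.get(group, 0)
        batch = self.records[start:start + max_records]
        self.committed[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Move a group's position; the records themselves are untouched."""
        self.committed[group] = offset

    def lag(self, group):
        """Records appended but not yet read by this group."""
        return len(self.records) - self.committed.get(group, 0)


log = ToyTopicPartition()
for event in ["signup", "login", "purchase", "logout"]:
    log.append(event)

analytics_first = log.poll("analytics")          # reads everything
audit_first = log.poll("audit", max_records=2)   # slower group, independent position
log.seek("analytics", 0)                         # reset offsets
analytics_replay = log.poll("analytics")         # same history, read again

print(analytics_first, audit_first, log.lag("audit"), analytics_replay)
```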
One log, many consumers—each use case exploits retention, ordering per key, or replay differently.
Stream 1
Real-time pipelines: clicks, orders, sensor readings flowing between microservices.
Stream 2
Debezium reads DB transaction logs → Kafka topics mirror row-level changes.
Stream 3
App and server logs centralized; Elasticsearch or S3 sinks consume the same topic.
Stream 4
LinkedIn’s original use case—profile views, feed impressions, ad events at billions/day.
Stream 5
High-cardinality telemetry buffered and routed to time-series stores.
Stream 6
Domain events as source of truth; projections rebuild read models from the log.
Stream 7
Kafka Streams / Flink / ksqlDB derive aggregates, joins, and alerts in flight.
Uber routes trillions of messages/day through Kafka for dispatch, pricing, and fraud. Netflix uses Kafka for studio workflow events and real-time recommendations. Confluent (founded by Kafka’s creators) runs managed Kafka for thousands of enterprises.
Apache Kafka is the core broker and client APIs. Confluent and the community built the surrounding platform, so know what ships in open source versus what requires Confluent Platform or Cloud.
You can run production Kafka with only Apache artifacts. Add Confluent Schema Registry when multiple teams share topics and you need enforced schema evolution—not on day one of a greenfield prototype.
From LinkedIn internal project to the default event backbone of the cloud-native era—and the multi-year migration away from ZooKeeper.
First public release. Solves the “every consumer needs a custom pipeline” problem with a shared commit log.
Kafka joins Apache; ZooKeeper manages cluster metadata, controller election, and ACLs.
Stream processing as a library—no separate cluster, changelog-backed state stores.
Idempotent producer + transactions + read_committed consumers ship in Kafka 0.11.
Fully managed Kafka; KIP-500 KRaft mode development accelerates—Raft replaces ZooKeeper for metadata.
Kafka 2.8+ supports KRaft in preview. __cluster_metadata topic stores broker/topic state.
Production-ready KRaft; faster failover, simpler ops, no external ZK ensemble to babysit.
ZK support dropped. New clusters run KRaft only. Migration tooling for existing estates.
The mental model every chapter builds on. Records land in partition logs on brokers; consumer groups track their own offsets independently.
Producers
Serialize → partition (key hash) → batch → send to leader broker
Topic / Partitions
Append-only logs on brokers; RF=3 replicas; ISR tracks in-sync followers
Consumer groups
poll() → fetch → process → commit offset to __consumer_offsets
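The order of "process" and "commit" in that loop decides the delivery guarantee. The Python simulation below is a sketch, not client code: `consume` and `crash_after` are hypothetical, and the log is a plain list. It commits after each processed batch and simulates a crash before one commit, so the restarted consumer reprocesses a record. That is at-least-once delivery.

```python
def consume(log, committed_offset, handler, batch_size=3, crash_after=None):
    """Poll batches from `log`, process each record, then commit.

    Returns the last committed offset. `crash_after` simulates a failure
    once that many records have been processed, before the pending commit.
    """
    processed = 0
    position = committed_offset
    while position < len(log):
        batch = log[position:position + batch_size]   # poll / fetch
        for record in batch:                          # process
            handler(record)
            processed += 1
            if crash_after is not None and processed == crash_after:
                return committed_offset               # crash: commit never happens
        position += len(batch)
        committed_offset = position                   # commit
    return committed_offset


log = ["e0", "e1", "e2", "e3", "e4"]
seen = []

# First run: the first batch commits, then the consumer dies mid-way through the second.
after_crash = consume(log, 0, seen.append, crash_after=4)

# Restart: resume from the committed offset, so "e3" is handled a second time.
after_restart = consume(log, after_crash, seen.append)

print(after_crash, after_restart, seen)
```

Committing before processing would flip the failure mode to at-most-once: no duplicates, but a crash could skip records. Handlers in an at-least-once loop should therefore be idempotent.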
flowchart LR
  P1[Producer A]
  P2[Producer B]
  B1[Broker 1\nLeader P0]
  B2[Broker 2\nLeader P1]
  B3[Broker 3\nLeader P2]
  CG1[Consumer Group\nAnalytics]
  CG2[Consumer Group\nAudit]
  P1 -->|orders-0| B1
  P2 -->|orders-1| B2
  P2 -->|orders-2| B3
  B1 --> CG1
  B2 --> CG1
  B3 --> CG1
  B1 --> CG2
  B2 --> CG2
  B3 --> CG2
More consumers than partitions = idle consumers. Partitions are the unit of parallelism and ordering—plan partition count up front; you cannot reduce it later.
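The idle-consumer effect falls straight out of the arithmetic. The sketch below, in Python, splits one topic's partitions into contiguous ranges across sorted group members, in the spirit of a range-style assignor; `assign_range` is a hypothetical helper, not the client's implementation.

```python
def assign_range(partitions, consumers):
    """Split one topic's partitions into contiguous ranges across sorted consumers."""
    members = sorted(consumers)
    per_member, extra = divmod(len(partitions), len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        # The first `extra` members each take one additional partition.
        count = per_member + (1 if i < extra else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment


partitions = [0, 1, 2]
balanced = assign_range(partitions, ["c1", "c2"])
oversubscribed = assign_range(partitions, ["c1", "c2", "c3", "c4"])

print(balanced)
print(oversubscribed)  # the fourth consumer receives an empty list
```

With three partitions, a fourth consumer in the group has nothing to read until partitions are added or another member leaves.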
Twelve deep-dive chapters plus cheat sheets. Recommended path: Architecture → Producers → Consumers → Reliability, then Streams, Connect, and operations as your role requires.
Learning path: Architecture · Producers · Consumers · Reliability · Patterns
Producers, consumers, consumer groups, offset management, serialization, Spring Kafka, error handling, and testing.
Kafka Streams, windowing, schema evolution, exactly-once, CDC pipelines, and multi-cluster patterns.
Broker tuning, topic admin, monitoring, capacity planning, security, DR, and KRaft operations.
Commit log, segments, KRaft, ISR, storage, zero-copy, network layer.
RecordAccumulator, partitioning, acks, compression, exactly-once producer.
poll() loop, offsets, rebalancing, lag, multi-threading patterns.
At-most/least/exactly-once, transactions, replication durability.
Topologies, KStream/KTable, windowing, state stores, scaling.
Connectors, Debezium, SMTs, source/sink tuning, error handling.
Avro, compatibility modes, evolution patterns, Spring integration.
Topic admin, rebalancing, metrics, security, capacity planning.
Event sourcing, outbox, sagas, DLQ, fan-out, MirrorMaker.
KafkaTemplate, @KafkaListener, transactions, DLT, testing.
Producer/consumer/broker tuning, latency vs throughput trade-offs.
Developer, data engineer, and platform quick references.