Delivery Guarantees & Reliability
“Exactly-once” is the most overloaded phrase in streaming systems. Kafka gives you precise primitives— idempotent producers, transactions, ISR replication—but end-to-end guarantees still require correct consumer offset commits and idempotent side effects. This chapter maps semantics to configuration and names what each layer actually promises.
The three delivery semantics — in depth
Delivery semantics describe what happens when networks fail, processes crash, and retries fire. Kafka spans producer, broker, and consumer—each layer must align or your “guarantee” collapses at the weakest link.
flowchart TB
subgraph producer["Producer layer"]
P1[acks / retries / idempotence]
end
subgraph broker["Broker layer"]
B1[replication / ISR / min.insync.replicas]
end
subgraph consumer["Consumer layer"]
C1[offset commit timing / idempotent handler]
end
P1 --> B1 --> C1
At-most-once
Definition: a record may be lost; it will not be delivered twice. Typically: acks=0 (fire-and-forget), or commit offsets before processing with no retry on failure.
Where it is acceptable:
- Metrics and telemetry where statistical sampling tolerates gaps (dashboard approximations)
- High-volume click streams where 0.1% loss is cheaper than 2× infrastructure for durability
- Ephemeral notifications (live presence pings) that self-heal on next event
- Debug/logging pipelines—not audit, billing, or inventory
acks=0
retries=0
enable.idempotence=false
At-least-once
Definition: a record will not be lost under retry scenarios; it may be delivered more than once. This is the default for most production Kafka systems: acks=all or acks=1 + retries > 0 on producer, process-then-commit (or auto-commit risks) on consumer.
Requirement: consumers and downstream sinks must be idempotent—same message applied twice produces the same state. Patterns: natural keys in DB upserts, dedup table in Redis/PostgreSQL, idempotency keys in event payload.
for (ConsumerRecord<String, OrderEvent> record : records) {
String idempotencyKey = record.topic() + "-" + record.partition() + "-" + record.offset();
if (dedupStore.exists(idempotencyKey)) {
continue; // already processed this exact broker record
}
processOrder(record.value());
dedupStore.put(idempotencyKey, Duration.ofDays(7));
consumer.commitSync(Map.of(
new TopicPartition(record.topic(), record.partition()),
new OffsetAndMetadata(record.offset() + 1)
));
}
Prefer business-level idempotency keys (orderId) over offset-based keys when replaying from an earlier offset must not skip legitimate reprocessed events.
Exactly-once
Definition: each record’s effect appears exactly once in downstream state—no loss, no duplicate side effects. Kafka does not magically guarantee this for your database writes; it provides building blocks. Two approaches:
Approach 1: Idempotent consumer (application-level)
Deduplicate using message ID in external store (Redis, DB unique constraint, DynamoDB conditional write). Works with any broker config; you own TTL, storage cost, and replay semantics. Still at-least-once on the wire—exactly-once is an application invariant.
Approach 2: Kafka transactions (platform-level)
Atomic consume → process → produce with offset commit tied to output records in one transaction. Kafka Streams processing.guarantee=exactly_once_v2 uses this internally. Requires transactional producer, read_committed consumers, and compatible brokers.
| Semantic | Loss | Duplicates | Typical config |
|---|---|---|---|
| At-most-once | Possible | No | acks=0 or commit-before-process |
| At-least-once | No (with retries) | Possible | acks=all + retries + idempotent handler |
| Exactly-once | No | No (within guarantee scope) | Transactions + read_committed + idempotent stores for external writes |
“Does Kafka support exactly-once?” — answer in layers: idempotent producer = no duplicate batches per session; transactions = atomic multi-partition write + offset commit; end-to-end EOS still needs external side effects idempotent or inside transaction boundary. Kafka EOS ≠ distributed ACID across arbitrary databases.
Kafka transactions deep dive
Transactions add a coordination layer on top of idempotent producers—grouping writes and consumer offset commits into atomic units visible only after commit.
transactional.id
Stable string per producer instance role (e.g. inventory-processor-pod-7). Broker maps it to a Producer ID (PID) and epoch. When a new producer registers with the same transactional.id, epoch increments and fences the old producer— zombie instances get ProducerFencedException on next operation.
Transaction coordinator
One broker acts as transaction coordinator per transactional.id (hash of id → broker). Manages transaction state machine: Ongoing → PrepareCommit → CompleteCommit (or Abort). Not the same as partition leader—dedicated coordination role.
__transaction_state internal topic
Compact internal topic (default 50 partitions, RF=3) storing transaction metadata: transactional id, producer id, epoch, status. Survives coordinator failover. Never manually delete in production.
Transaction phases
- initTransactions() — register PID, recover pending transactions, fence zombies
- beginTransaction() — start new txn; records tagged with transactional markers
- send() — batches include PID, epoch, sequence; written to partition logs as uncommitted
- sendOffsetsToTransaction(Map, groupMetadata) — atomically commit consumer offsets with output records
- commitTransaction() — write COMMIT marker; records become visible to read_committed
- abortTransaction() — write ABORT marker; records filtered out for consumers
sequenceDiagram participant App as Application participant TC as Transaction Coordinator participant L as Partition Leaders participant C as read_committed Consumer App->>TC: initTransactions() App->>TC: beginTransaction() App->>L: produce (uncommitted data) App->>TC: sendOffsetsToTransaction App->>TC: commitTransaction() TC->>L: COMMIT control record C->>L: fetch — sees committed data only
Partition log timeline:
[txn-start][data][data][ABORT] ← aborted txn — invisible to read_committed
[txn-start][data][data][COMMIT] ← committed txn — visible
↑ uncommitted until COMMIT marker appended
read_committed isolation
Consumers with isolation.level=read_committed filter aborted transactions and delay visibility of open transactions until commit marker arrives—prevents consumers from seeing partial writes. Default read_uncommitted sees all records immediately (including aborted txn data before abort marker—ordering nuance).
producer.initTransactions();
while (running) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));
if (records.isEmpty()) continue;
try {
producer.beginTransaction();
for (ConsumerRecord<String, String> record : records) {
String output = transform(record.value());
producer.send(new ProducerRecord<>("output-topic", record.key(), output));
}
Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
for (TopicPartition tp : records.partitions()) {
List<ConsumerRecord<String, String>> part = records.records(tp);
long last = part.get(part.size() - 1).offset();
offsets.put(tp, new OffsetAndMetadata(last + 1));
}
producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
producer.commitTransaction();
} catch (Exception e) {
producer.abortTransaction();
throw e;
}
}
spring.kafka:
producer:
transaction-id-prefix: inventory-txn-
properties:
enable.idempotence: true
acks: all
consumer:
enable-auto-commit: false
properties:
isolation.level: read_committed
Performance cost
Transactions add coordination round-trips per commit, control records on partitions, and coordinator load. Typical overhead: ~3–5% throughput reduction and higher p99 latency per commit boundary compared to idempotent at-least-once. Batch more records per transaction where business logic allows.
Transactions serialize commit points—high-frequency micro-transactions (one record per txn) destroy throughput. Aim for poll-batch-sized transactions unless latency SLA demands otherwise.
transaction.timeout.ms (default 60s) — transaction open longer than broker allows is aborted automatically. Long processing inside one transaction without tuning causes TransactionAbortedException.
| Property | Default | Production | Impact |
|---|---|---|---|
transactional.id |
null | Unique per logical producer | Enables transactions + fencing |
transaction.timeout.ms |
60000 | 60000–300000 | Max open transaction duration |
isolation.level |
read_uncommitted | read_committed | Consumer visibility of txn data |
Replication & durability
Producer acks mean nothing if replicas never caught the write. Broker replication is the durability floor— ISR membership and min.insync.replicas decide whether you prefer availability or data loss on failure.
Replication factor recommendations
- Production: replication.factor=3 — survives single broker loss with quorum
- Never < 2 in any environment that cares about data—single replica is a single disk failure from loss
- Development: RF=1 acceptable on single-broker Docker; not representative of production behavior
The safe production triad
replication.factor=3 + min.insync.replicas=2 + producer acks=all — write acknowledged only when at least 2 replicas (including leader) have the record. One broker down: writes continue. Two simultaneous failures: risk of unavailability (correct) not silent loss.
# Broker defaults
default.replication.factor=3
min.insync.replicas=2
unclean.leader.election.enable=false
# Per-topic override (kafka-configs.sh)
min.insync.replicas=2
cleanup.policy=delete
What happens when a broker fails
- Leader broker dies — controller detects via ZK/KRaft session; partitions with leader on dead broker become unavailable for produce/fetch briefly
- Leader election — controller elects new leader from ISR only (if unclean disabled)
- Followers catch up — new leader serves; out-of-sync replicas truncate to HW and replicate
- Clients refresh metadata — producers/consumers redirect to new leaders (automatic metadata refresh)
- Under-replicated partitions — metric rises until followers rejoin ISR
sequenceDiagram participant P as Producer participant B1 as Broker 1 Leader participant B2 as Broker 2 Follower participant C as Controller participant B3 as Broker 3 Follower P->>B1: produce acks=all B1->>B2: replicate B1->>B3: replicate Note over B1: Broker 1 crashes C->>B2: elect leader for partition P->>B2: produce resumes after metadata refresh
Preferred leader election
Each partition has a preferred leader (replica 0 in assignment). After broker restart, kafka-leader-election.sh --election-type preferred or auto.leader.rebalance.enable (legacy) moves leadership back to preferred replica— rebalancing load to intended topology and avoiding stranded leaders on replacement brokers.
Unclean leader election
unclean.leader.election.enable=false (production default): never elect a leader from outside ISR. Partition may go offline rather than serve stale data. true: availability over consistency—may promote lagging replica and lose unreplicated records.
| Setting | Broker down (1 of 3) | ISR = 1 replica |
|---|---|---|
| unclean=false, min.insync=2 | Writes continue | Produces with acks=all fail (NOT_ENOUGH_REPLICAS) |
| unclean=true | May elect out-of-sync leader | Writes may succeed with data loss risk |
ISR shrink under load — slow followers drop from ISR; with min.insync.replicas=2 and only leader in ISR, acks=all producers halt. Alert on UnderReplicatedPartitions before producers fail.
Confluent and major cloud providers enforce RF≥3 and min.insync.replicas=2 on production tiers. LinkedIn runs RF=3 with unclean election disabled—outages prefer partition unavailability over silent truncation.
Idempotent producer details
Idempotence solves producer retry duplication—not consumer duplicates, not cross-session duplicates. Understand PID, epoch, and sequence numbers to debug fencing errors and ordering guarantees.
Producer ID (PID)
On first connection with enable.idempotence=true, broker assigns a unique Producer ID (64-bit). All batches from this producer session carry this PID. Broker uses PID + partition + sequence to deduplicate.
Epoch
When a producer with transactional.id calls initTransactions(), broker increments epoch. Any older producer instance with same transactional id but lower epoch receives ProducerFencedException—prevents zombie writers after failover or rolling deploy.
Sequence number
Per-partition monotonic counter starting at 0 for each PID epoch. Broker rejects:
- Duplicate sequence — retry of already-committed batch (silent success, no duplicate in log)
- Out-of-order sequence — gap detected (e.g. max.in.flight > 1 without idempotence caused reorder); with idempotence, broker enforces strict order per partition
PID=7021, Partition orders-0: seq=0 batch [A,B] → accepted seq=1 batch [C] → network timeout, producer retries seq=1 batch [C] → duplicate → broker acks without rewrite seq=2 batch [D] → accepted
Automatic config coupling
When enable.idempotence=true, client enforces:
- acks=all
- retries > 0 (Integer.MAX_VALUE)
- max.in.flight.requests.per.connection ≤ 5 (and safe with 5)
Limitations
- Deduplication only within a producer session (same PID)
- Producer restart without transactional.id → new PID → retries across sessions can duplicate
- Does not deduplicate logically identical messages sent with different sequence contexts (two explicit sends)
- No cross-partition atomicity—that requires transactions
enable.idempotence=true
# Implicit: acks=all, retries=MAX, max.in.flight=5
# No transactional.id — no consume-transform-produce atomicity
Broker maintains in-memory sequence state per (PID, partition) with periodic snapshot to disk. High PID churn (frequent producer restarts) adds coordinator overhead—reuse long-lived producer instances in apps and connection pools.
“Idempotent producer vs transactions?” — idempotent = per-partition dedup of retry batches; transactions = multi-partition atomic commit + offset commit + fencing via transactional.id. Idempotent is lighter; transactions required for read-process-write EOS within Kafka.
End-to-end exactly-once
Kafka-native EOS covers consume → produce → offset commit inside the broker boundary. The moment you write to PostgreSQL or call Stripe, you need an explicit strategy—or you only have at-least-once with extra steps.
The full Kafka-native pattern
- Transactional producer with stable transactional.id
- Consumer with enable.auto.commit=false and isolation.level=read_committed
- Process records from poll batch
- Produce output records in same transaction
- sendOffsetsToTransaction — offset commit atomic with produces
- commitTransaction — all visible together or none
flowchart LR IN[input topic] CON[Consumer poll] TXN[Transactional producer] OUT[output topic] OFF[__consumer_offsets] IN --> CON --> TXN TXN --> OUT TXN --> OFF
Kafka Streams exactly_once_v2
processing.guarantee=exactly_once_v2 (EOS v2, Kafka 2.5+) wraps the same primitives: transactional producer per stream thread, changelog topics for state stores, offset commits tied to output and state updates in one transaction boundary.
application.id=inventory-aggregator
processing.guarantee=exactly_once_v2
replication.factor=3
# Creates internal changelog + repartition topics with RF=3
External side effects — the honesty section
Kafka EOS does not roll back your database. Patterns for external writes:
| Pattern | Mechanism | EOS scope |
|---|---|---|
| Idempotent upsert | Natural key in DB; retry safe | Effective EOS at business level |
| Outbox table | DB txn writes row + event; CDC publishes | At-least-once to Kafka; DB ACID |
| Transactional messaging | ChainedKafkaTransactionManager (Spring) — Kafka + DB | Best-effort 2PC; DB must support XA or custom |
| Saga compensation | Emit compensating event on failure | Eventual consistency |
Cost — when to enable EOS
- Enable when duplicate output records cause financial errors, double inventory deduction, or incorrect aggregates
- Skip when idempotent sinks are cheap (upsert by key) and 1–2% duplicate rate is harmless
- Measure throughput and p99 before/after—expect 3–10% hit depending on commit frequency and partition count
# Producer
enable.idempotence: true
transactional.id: ${APP_NAME}-${INSTANCE_ID}
acks: all
# Consumer
enable.auto.commit: false
isolation.level: read_committed
# Broker / topic
replication.factor: 3
min.insync.replicas: 2
unclean.leader.election.enable: false
# Streams (if applicable)
processing.guarantee: exactly_once_v2
Durability vs throughput vs exactly-once — you can maximize two, not all three. High-throughput firehose: at-least-once + idempotent sink. Ledger pipeline: acks=all + transactions + read_committed + accept latency tax.
Start with idempotent producer + at-least-once consumer + DB upsert idempotency. Add transactions only when you prove duplicates occur in consume-transform-produce loops and idempotency at sink is insufficient.
Netflix uses Kafka EOS in critical stream processors where duplicate billing events would corrupt revenue metrics. Many payment pipelines still prefer at-least-once + idempotent ledger entries—simpler ops, easier replay, audit-friendly dedup keys.