Multiple threads update shared data incorrectly

Scenario

Production shows wrong balances, duplicate events, impossible counters, or rare crashes that disappear on retry. Tests pass on a laptop. Under load, two threads mutate the same field or collection without a happens-before relationship—a data race. You must find the shared mutable state and fix it without freezing the whole service.

After reading, you should be able to:

Tell a race from deadlock or lock contention (wrong data vs stuck threads).
Spot check-then-act, non-atomic read-modify-write, and unsafe HashMap patterns.
Confirm with stress tests and targeted logging—not only thread dumps.
Fix with confinement, immutability, atomics, or concurrent collections.

Why — interleaving breaks assumptions

A data race occurs when two or more threads access the same memory location, at least one of the accesses is a write, and no synchronization establishes a happens-before ordering between them. Without that ordering, the Java Memory Model does not guarantee which value a reader observes, or when; logic that is correct on one thread fails when another thread interleaves.
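A minimal sketch of that interleaving, assuming a hypothetical LostUpdateDemo class that is not part of any codebase discussed here. Several threads increment a plain long field and an AtomicLong side by side. On most multi-core machines the plain counter ends below the expected total, although any single run may happen to look correct.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

final class LostUpdateDemo {
    static long unsafeCount = 0;                          // plain field: ++ is read, add, write
    static final AtomicLong safeCount = new AtomicLong(); // atomic read-modify-write

    /** Starts `threads` workers that each increment both counters `perThread` times. */
    static void run(int threads, int perThread) {
        unsafeCount = 0;
        safeCount.set(0);
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    start.await();                        // release all workers together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < perThread; i++) {
                    unsafeCount++;                        // racy: updates can be lost
                    safeCount.incrementAndGet();          // never loses an update
                }
            });
            workers[t].start();
        }
        start.countDown();
        for (Thread w : workers) {
            try {
                w.join();                                 // join makes the final values visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        run(8, 100_000);
        System.out.println("expected=800000 unsafe=" + unsafeCount
                + " safe=" + safeCount.get());
    }
}
```

The unsafe total varies from run to run, which is why a single passing test says little about whether a race exists.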

Race vs other concurrency bugs

Bug	Symptom	Thread state
Data race	Wrong values, duplicates, torn reads	Usually still RUNNABLE; service “works” but lies
Deadlock	Hang, zero progress	Stuck in lock cycle — see deadlock guide
Contention	Slow, timeouts	Many BLOCKED on one lock
Pool exhausted	503, all workers busy	Waiting on I/O — see pool guide

Classic patterns that race in production

Check-then-act: if (!map.containsKey(k)) map.put(k, v); two threads both pass the check before either one puts.
Non-atomic counter: count++ on a shared int field is a read, an add, and a write, so concurrent increments are lost.
Unsafe collections: HashMap or ArrayList mutated from many request threads.
Mutable singleton cache: a static map updated without synchronization; the symptom often appears in a single pod under burst traffic.
Broken double-checked locking: a partially constructed object is published because the instance field is not volatile.
Compound actions: read balance, subtract, write; another thread interleaves between the read and the write.
Visibility only: one thread writes a flag and another reads it without volatile or synchronization; the reader may never see the update.
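Of the patterns in this list, broken double-checked locking is the one whose repair is easiest to get subtly wrong. The sketch below shows the commonly used corrected form, assuming a hypothetical LazyResource class; the constructions counter exists only so the demo can show that the constructor ran once.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LazyResource {
    static final AtomicInteger constructions = new AtomicInteger(); // demo-only bookkeeping

    // volatile is what makes the publication safe; without it a reader
    // may observe a non-null reference to a partially constructed object.
    private static volatile LazyResource instance;

    private final String name;

    private LazyResource() {
        constructions.incrementAndGet();
        this.name = "resource";
    }

    static LazyResource get() {
        LazyResource local = instance;            // single volatile read on the fast path
        if (local == null) {
            synchronized (LazyResource.class) {
                local = instance;                 // re-check under the lock
                if (local == null) {
                    local = new LazyResource();
                    instance = local;             // volatile write publishes the built object
                }
            }
        }
        return local;
    }

    String name() {
        return name;
    }

    public static void main(String[] args) {
        System.out.println(get() == get());
    }
}
```

When laziness per se is not required, a static final field initialized at class load is simpler and needs no locking at all.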

Heisenbugs. Races are timing-dependent. A bug may appear only on certain hardware, after a deploy that changes GC pauses, or at peak QPS. “Cannot reproduce locally” often means insufficient concurrency in the test, not absence of the bug.

What — find shared mutable state and prove the race

Define the invariant that broke — e.g. “account balance never negative,” “event id unique,” “inventory never below zero.” That tells you which variables must be atomic as a unit.
Correlate with load — errors spike with traffic? Single region/pod? After feature flag? Points to per-instance mutable cache vs DB issue.
Audit code paths for shared mutation. Search for:
- static non-final fields, especially collections and counters
- Singleton beans holding mutable maps/lists
- HashMap / ArrayList on objects shared across requests
- an if (...) check followed by put / add, without a lock or computeIfAbsent
Thread dumps are secondary for races — dumps show where threads are, not that a counter was lost. Use dumps to rule out deadlock and massive blocking.

Reproduce under stress

// JUnit: many threads, CountDownLatch start gate, CyclicBarrier
ExecutorService ex = Executors.newFixedThreadPool(32);
// 10_000 iterations: same code path as production
// Assert invariant (sum, size, no duplicates)

Tools: jcstress, multithreaded stress IT in CI, Gatling/k6 at 2× peak RPS in staging.
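The outline above can be fleshed out roughly as follows. This is a sketch under assumptions: StressHarness and hammer are hypothetical names, and the LongAdder in main stands in for whatever production code path is under suspicion. A ready latch and a start latch release all workers at the same moment, which raises the chance of overlapping executions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

final class StressHarness {
    /** Runs `action` from `threads` workers, `iterations` times each, released together. */
    static void hammer(int threads, int iterations, Runnable action) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch ready = new CountDownLatch(threads);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<?>> results = new ArrayList<>();
        try {
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> {
                    ready.countDown();            // signal that this worker is parked
                    start.await();                // wait for the shared start gate
                    for (int i = 0; i < iterations; i++) {
                        action.run();
                    }
                    return null;
                }));
            }
            ready.await();                        // every worker is waiting at the gate
            start.countDown();                    // open the gate
            for (Future<?> f : results) {
                f.get(60, TimeUnit.SECONDS);      // rethrows any failure inside a worker
            }
        } catch (Exception e) {
            throw new IllegalStateException("stress run failed", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        LongAdder counter = new LongAdder();      // replace with the code under test
        hammer(32, 10_000, counter::increment);
        long expected = 32L * 10_000;
        if (counter.sum() != expected) {
            throw new AssertionError(counter.sum() + " != " + expected);
        }
        System.out.println("invariant held: " + counter.sum());
    }
}
```

A passing run does not prove the absence of a race; it only fails to find one at that thread count and duration.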

Targeted logging (temporary) — log thread id + before/after values on suspect updates; compare sum of parts vs global counter in metrics.
Business reconciliation — DB totals vs in-memory cache; payment ledger vs API counter. Mismatch localizes which subsystem races.
Rule out DB race — lost update at DB layer needs transaction isolation or optimistic locking (UPDATE … WHERE version = ?), not only Java fixes.

Smoking-gun code smells

// Race: two threads can both create
if (!cache.containsKey(id)) {
  cache.put(id, loadExpensive(id));
}

// Race: lost increments
metrics.successCount++;

// Race: lost or null elements, ArrayIndexOutOfBoundsException during a resize,
// or ConcurrentModificationException if another thread is iterating
sharedList.add(item);  // ArrayList from many threads

How — fix safely without over-locking

Fix hierarchy (prefer top to bottom)

Approach	Use when
Immutability	Replace map with new immutable copy on change; readers see stable snapshot
Thread confinement	Object never leaves creating thread (per-request locals only)
Concurrent collections / atomics	Shared cache, counters — `ConcurrentHashMap`, `LongAdder`, `AtomicReference`
Single-writer queue	All mutations on one actor thread; API posts events
Synchronized / Lock	Compound invariant spanning multiple fields
Database as source of truth	Unique constraint + transactional update; cache is read-through only

Before / after examples

// Safe lazy init per key, provided cache is a ConcurrentHashMap;
// HashMap.computeIfAbsent is not thread-safe
cache.computeIfAbsent(id, this::loadExpensive);

// Safe counter
private final LongAdder successCount = new LongAdder();
successCount.increment();

// Safe publish of immutable snapshot
private volatile Map<String, Config> configRef = Map.of();
void reload(Map<String, Config> next) {
  configRef = Map.copyOf(next);  // readers see old or new, never torn
}

Financial / inventory style updates

// Bad: read-modify-write on shared object
balance = balance - amount;

// Good: DB with row lock or optimistic version
UPDATE account SET balance = balance - ?, version = version + 1
WHERE id = ? AND version = ? AND balance >= ?
// If 0 rows are updated, another writer won or funds are short: re-read, then retry or reject
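For state that genuinely lives in process memory, such as a quota or rate-limit budget rather than authoritative money, the same conditional-update idea can be approximated with a compare-and-set loop. The sketch below assumes a hypothetical InMemoryBalance class and is not a substitute for the database update above.

```java
import java.util.concurrent.atomic.AtomicLong;

final class InMemoryBalance {
    private final AtomicLong cents;

    InMemoryBalance(long openingCents) {
        this.cents = new AtomicLong(openingCents);
    }

    /** Withdraws if funds suffice; otherwise returns false and leaves the balance unchanged. */
    boolean tryWithdraw(long amountCents) {
        if (amountCents <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        while (true) {
            long current = cents.get();
            if (current < amountCents) {
                return false;                     // plays the role of "AND balance >= ?"
            }
            if (cents.compareAndSet(current, current - amountCents)) {
                return true;                      // no other thread changed it in between
            }
            // Another thread updated the balance between the read and the write: retry.
        }
    }

    long cents() {
        return cents.get();
    }
}
```

The check and the write succeed or fail as one unit, so the balance cannot go negative no matter how calls interleave.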

What not to do

Wrap entire service methods in synchronized — fixes races by serializing everything; creates contention and pool exhaustion.
Rely on volatile for count++ — volatile does not make compound ops atomic.
Use Collections.synchronizedMap for compound check-then-act without holding the map's monitor for the whole operation; each call is atomic on its own, but the sequence of calls is not.
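If a synchronized map has to stay, for example in legacy code, the compound action can be made atomic by locking on the map itself, which is the monitor the wrapper uses for its individual calls. A minimal sketch, assuming a hypothetical SyncMapCache class:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class SyncMapCache<K, V> {
    private final Map<K, V> map = Collections.synchronizedMap(new HashMap<>());

    /** Checks and inserts under the map's own monitor so no other thread can interleave. */
    V getOrLoad(K key, Function<K, V> loader) {
        synchronized (map) {                      // same lock the wrapper takes per call
            V value = map.get(key);
            if (value == null) {
                value = loader.apply(key);        // runs while holding the lock
                map.put(key, value);
            }
            return value;
        }
    }
}
```

The loader runs while the lock is held, so a slow load blocks every other user of the map. That cost is one reason ConcurrentHashMap.computeIfAbsent is usually the better choice for new code.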

Verify the fix

Stress test that failed before: same thread count, duration ≥ 10 min.
Reconciliation job: invariants hold over 24h in staging/production canary.
Code review: no new shared mutable statics without documented thread-safety.
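Optional: run the stress test in a dedicated test JVM (not prod) with HotSpot C2 stress flags such as -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM, which randomize instruction scheduling and can surface reorderings a normal run hides.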

Prevention

Default new services to immutable DTOs and stateless handlers.
Checklist: any static mutable field needs explicit concurrency story.
CI: multithreaded tests for cache, idempotency, and counter modules.
Prefer ConcurrentHashMap over synchronized HashMap when map is truly shared.

Interview one-liner

“I define the invariant that broke, find shared mutable state updated without synchronization, reproduce with a concurrent stress test, then fix with confinement, atomics, or computeIfAbsent—and use the database for authoritative state when money or inventory is involved.”
