Multiple threads update shared data incorrectly

Scenario

Production shows wrong balances, duplicate events, impossible counters, or rare crashes that disappear on retry. Tests pass on a laptop. Under load, two threads mutate the same field or collection without a happens-before relationship—a data race. You must find the shared mutable state and fix it without freezing the whole service.

After reading, you should be able to:

Why — interleaving breaks assumptions

A data race occurs when two or more threads access the same memory location, at least one of the accesses is a write, and no synchronization establishes a happens-before ordering between them. Without that ordering, the Java Memory Model does not guarantee which value a reader observes or when a write becomes visible to other threads, so logic that is correct on one thread fails when another thread interleaves.
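
To make the definition concrete, here is a minimal sketch of a lost update. The class name LostUpdateDemo and its fields are hypothetical, chosen only for illustration. Several threads increment a plain int and an AtomicLong the same number of times; the plain counter has no happens-before edge between writers, so it may end up below the expected total, while the atomic one should not.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

class LostUpdateDemo {
    // Deliberately unsynchronized: ++ is a read, an add, and a write as separate steps.
    static int plainCounter = 0;
    static final AtomicLong atomicCounter = new AtomicLong();

    /** Returns {plain, atomic} after `threads` workers each increment both counters `perThread` times. */
    static long[] run(int threads, int perThread) {
        plainCounter = 0;
        atomicCounter.set(0);
        CountDownLatch start = new CountDownLatch(1); // start gate so workers overlap
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < perThread; i++) {
                    plainCounter++;                  // racy read-modify-write
                    atomicCounter.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        start.countDown();
        for (Thread w : workers) {
            try {
                w.join(); // join gives the caller visibility of the workers' final writes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new long[] { plainCounter, atomicCounter.get() };
    }

    public static void main(String[] args) {
        long[] r = run(8, 100_000);
        System.out.println("expected=800000 plain=" + r[0] + " atomic=" + r[1]);
    }
}
```

How far the plain counter falls short, if at all, varies by run, hardware, and JIT decisions, which is why a single passing run proves nothing.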

Race vs other concurrency bugs

| Bug | Symptom | Threads |
|---|---|---|
| Data race | Wrong values, duplicates, torn reads | Usually still RUNNABLE; service “works” but lies |
| Deadlock | Hang, zero progress | Stuck in lock cycle; see deadlock guide |
| Contention | Slow, timeouts | Many BLOCKED on one lock |
| Pool exhausted | 503, all workers busy | Waiting on I/O; see pool guide |

Classic patterns that race in production

Heisenbugs. Races are timing-dependent. A bug may appear only on certain hardware, after a deploy that changes GC pauses, or at peak QPS. “Cannot reproduce locally” often means insufficient concurrency in the test, not absence of the bug.

What — find shared mutable state and prove the race

  1. Define the invariant that broke — e.g. “account balance never negative,” “event id unique,” “inventory never below zero.” That tells you which variables must be atomic as a unit.
  2. Correlate with load — errors spike with traffic? Single region/pod? After feature flag? Points to per-instance mutable cache vs DB issue.
  3. Audit code paths for shared mutation. Search for:
    • static non-final fields, especially collections and counters
    • Singleton beans holding mutable maps/lists
    • HashMap / ArrayList on objects shared across requests
    • check-then-act sequences: if (!map.containsKey(key)) then put / add, without a lock or computeIfAbsent
  4. Thread dumps are secondary for races — dumps show where threads are, not that a counter was lost. Use dumps to rule out deadlock and massive blocking.
  5. Reproduce under stress
    // JUnit: many threads, CountDownLatch start gate, CyclicBarrier
    ExecutorService ex = Executors.newFixedThreadPool(32);
    // 10_000 iterations: same code path as production
    // Assert invariant (sum, size, no duplicates)
    Tools: jcstress, multithreaded stress IT in CI, Gatling/k6 at 2× peak RPS in staging.
  6. Targeted logging (temporary) — log thread id + before/after values on suspect updates; compare sum of parts vs global counter in metrics.
  7. Business reconciliation — DB totals vs in-memory cache; payment ledger vs API counter. Mismatch localizes which subsystem races.
  8. Rule out DB race — lost update at DB layer needs transaction isolation or optimistic locking (UPDATE … WHERE version = ?), not only Java fixes.
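
The "reproduce under stress" step can be sketched as a small harness. This is an illustration under assumptions, not a prescribed test: StressHarness and distinctIds are hypothetical names, and the invariant checked is the "event id unique" example from step 1. A latch lines all workers up before releasing them together, and the result is the number of distinct ids observed, which should equal threads × perThread for a thread-safe id source.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

class StressHarness {
    /** Calls idSource from many threads at once and returns how many distinct ids came back. */
    static int distinctIds(LongSupplier idSource, int threads, int perThread) {
        Set<Long> seen = ConcurrentHashMap.newKeySet(); // thread-safe collector for results
        CountDownLatch ready = new CountDownLatch(threads);
        CountDownLatch go = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int t = 0; t < threads; t++) {
                pool.execute(() -> {
                    ready.countDown();
                    try {
                        go.await(); // start gate: maximize overlap between workers
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int i = 0; i < perThread; i++) {
                        seen.add(idSource.getAsLong());
                    }
                });
            }
            ready.await();  // every worker is parked at the gate
            go.countDown(); // release them together
            pool.shutdown();
            if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
                throw new IllegalStateException("stress run timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return seen.size();
    }

    public static void main(String[] args) {
        AtomicLong safe = new AtomicLong();
        long[] next = {0}; // racy id source: unsynchronized read-modify-write
        System.out.println("safe distinct  = " + distinctIds(safe::incrementAndGet, 32, 10_000));
        System.out.println("racy distinct  = " + distinctIds(() -> next[0]++, 32, 10_000));
        System.out.println("expected       = " + (32 * 10_000));
    }
}
```

In a real test the supplier would call the same code path as production, and the final count would be asserted rather than printed. A racy source may still pass on a given run, so repeat the run or lengthen it before concluding anything.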

Smoking-gun code smells

// Race: two threads can both create
if (!cache.containsKey(id)) {
  cache.put(id, loadExpensive(id));
}

// Race: lost increments
metrics.successCount++;

// Race: lost elements, null slots, or ArrayIndexOutOfBoundsException (ConcurrentModificationException if another thread is iterating)
sharedList.add(item);  // ArrayList from many threads

How — fix safely without over-locking

Fix hierarchy (prefer top to bottom)

| Approach | Use when |
|---|---|
| Immutability | Replace map with new immutable copy on change; readers see stable snapshot |
| Thread confinement | Object never leaves creating thread (per-request locals only) |
| Concurrent collections / atomics | Shared cache, counters: ConcurrentHashMap, LongAdder, AtomicReference |
| Single-writer queue | All mutations on one actor thread; API posts events |
| Synchronized / Lock | Compound invariant spanning multiple fields |
| Database as source of truth | Unique constraint + transactional update; cache is read-through only |
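
The single-writer approach is the least familiar of these, so here is a rough sketch of one way to do it. SingleWriterCounts and its methods are hypothetical names, and a single-thread executor stands in for the "actor thread"; a hand-rolled queue and consumer loop would work as well. The map is a plain HashMap, which is acceptable only because exactly one thread ever touches it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SingleWriterCounts implements AutoCloseable {
    // Confined to the writer thread; never read or written from any other thread.
    private final Map<String, Long> counts = new HashMap<>();
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    /** Any thread may call this; the mutation runs later on the writer thread. */
    void record(String key) {
        writer.execute(() -> counts.merge(key, 1L, Long::sum));
    }

    /** Reads are also queued on the writer thread, behind mutations submitted earlier. */
    long get(String key) {
        Callable<Long> read = () -> counts.getOrDefault(key, 0L);
        try {
            return writer.submit(read).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    @Override
    public void close() {
        writer.shutdown();
    }

    /** Test helper: `threads` callers each post `perThread` events for the same key. */
    static void hammer(SingleWriterCounts target, int threads, int perThread, String key) {
        Thread[] callers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            callers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    target.record(key);
                }
            });
            callers[t].start();
        }
        for (Thread c : callers) {
            try {
                c.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }
}
```

The trade-off is throughput: every mutation is serialized through one thread, which suits low-volume state with complex invariants better than a hot counter.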

Before / after examples

// Safe lazy init per key
cache.computeIfAbsent(id, this::loadExpensive);

// Safe counter
private final LongAdder successCount = new LongAdder();
successCount.increment();

// Safe publish of immutable snapshot
private volatile Map<String, Config> configRef = Map.of();
void reload(Map<String, Config> next) {
  configRef = Map.copyOf(next);  // readers see old or new, never torn
}
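
None of the examples above covers the "compound invariant spanning multiple fields" case, where no single atomic class is enough. A minimal sketch, assuming a hypothetical Inventory class whose invariant is available + reserved == total: both fields change inside one synchronized method, so no thread can observe one updated without the other.

```java
import java.util.concurrent.atomic.AtomicInteger;

class Inventory {
    private final int total;
    private int available; // guarded by this
    private int reserved;  // guarded by this

    Inventory(int total) {
        this.total = total;
        this.available = total;
    }

    /** Check and both updates happen under one lock, so the check cannot go stale. */
    synchronized boolean reserve(int qty) {
        if (qty <= 0 || qty > available) {
            return false;
        }
        available -= qty;
        reserved += qty;
        return true;
    }

    synchronized boolean invariantHolds() {
        return available >= 0 && available + reserved == total;
    }

    synchronized int available() { return available; }

    synchronized int reserved() { return reserved; }

    /** Test helper: returns how many single-unit reservations succeeded across all threads. */
    static int hammer(Inventory inv, int threads, int attemptsPerThread) {
        AtomicInteger granted = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < attemptsPerThread; i++) {
                    if (inv.reserve(1)) {
                        granted.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return granted.get();
    }
}
```

Making each field an AtomicInteger would not help here: the check on available and the two updates must be one indivisible step.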

Financial / inventory style updates

// Bad: read-modify-write on shared object
balance = balance - amount;

// Good: DB with row lock or optimistic version
UPDATE account SET balance = balance - ?, version = version + 1
WHERE id = ? AND version = ? AND balance >= ?
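
The versioned UPDATE succeeds only if nobody else changed the row since it was read; when it affects zero rows, the caller re-reads and retries or rejects. The sketch below is an in-memory analogue of that loop, not database code: OptimisticAccount and State are hypothetical names, and AtomicReference.compareAndSet stands in for the WHERE version = ? predicate. For money or inventory the database remains the source of truth.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class OptimisticAccount {
    /** Immutable snapshot: a balance plus the version at which it was written. */
    static final class State {
        final long balance;
        final long version;

        State(long balance, long version) {
            this.balance = balance;
            this.version = version;
        }
    }

    private final AtomicReference<State> state;

    OptimisticAccount(long openingBalance) {
        state = new AtomicReference<>(new State(openingBalance, 0));
    }

    /** Returns false on insufficient funds; retries when another writer committed first. */
    boolean withdraw(long amount) {
        if (amount <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        while (true) {
            State current = state.get();
            if (current.balance < amount) {
                return false; // plays the role of "AND balance >= ?"
            }
            State next = new State(current.balance - amount, current.version + 1);
            if (state.compareAndSet(current, next)) {
                return true;  // plays the role of "WHERE version = ?" matching one row
            }
            // CAS failed: someone else committed; loop re-reads the fresh snapshot.
        }
    }

    long balance() { return state.get().balance; }

    long version() { return state.get().version; }

    /** Test helper: returns the number of successful withdrawals across all threads. */
    static int hammer(OptimisticAccount acct, int threads, int perThread, long amount) {
        AtomicInteger ok = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    if (acct.withdraw(amount)) {
                        ok.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return ok.get();
    }
}
```

With an opening balance of 1000 and 8 threads each attempting 500 withdrawals of 1, exactly 1000 attempts can succeed and the balance cannot go negative, whichever threads win.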

What not to do

Verify the fix

  1. Stress test that failed before: same thread count, duration ≥ 10 min.
  2. Reconciliation job: invariants hold over 24h in staging/production canary.
  3. Code review: no new shared mutable statics without documented thread-safety.
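  4. Optional: in a dedicated test JVM (never production), rerun the stress test with C2 scheduling-randomization diagnostics such as -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM, the kind of options jcstress uses to widen the set of interleavings. Availability depends on the JDK version and build.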

Prevention

Interview one-liner

“I define the invariant that broke, find shared mutable state updated without synchronization, reproduce with a concurrent stress test, then fix with confinement, atomics, or computeIfAbsent—and use the database for authoritative state when money or inventory is involved.”

Related scenarios