Cache returns stale data after an update

Scenario

A user updates a record; the database is correct, but the UI still shows the old value from Redis or an in-process Caffeine cache. Sometimes the value corrects itself once the TTL expires. You must choose a consistency model deliberately: invalidate on write, shorten the TTL, version the entries, or accept staleness under clear product rules. What you must not end up with is keys that stay stale forever by accident.

After reading, you should be able to:

Compare cache-aside, write-through, and read-through failure modes.
Prove staleness (DB vs cache) and find missing invalidation paths.
Implement delete-on-write, versioned keys, and cross-pod invalidation.
Distinguish cache staleness from replica lag.

Why — caches trade freshness for speed

A cache is a copy of data stored closer to the app (in process memory or in Redis). Unless every write is synchronized with every copy, readers can see stale values. In the common cache-aside pattern the app reads the cache and, on a miss, loads from the DB and fills the cache. The pattern fails when an update writes to the DB but does not evict the cached entry: the old value lives on until the TTL expires.
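
The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not a production cache: the class name CacheAsideDemo and the two HashMaps standing in for the database and for Redis/Caffeine are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-ins: "db" plays the database, "cache" plays Redis/Caffeine.
class CacheAsideDemo {
    final Map<String, String> db = new HashMap<>();
    final Map<String, String> cache = new HashMap<>();

    // Cache-aside read: try the cache; on a miss, load from the DB and fill the cache.
    String read(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        String fromDb = db.get(key);
        if (fromDb != null) {
            cache.put(key, fromDb);
        }
        return fromDb;
    }

    // Buggy writer: updates the DB but leaves the cached copy in place.
    void updateWithoutEvict(String key, String value) {
        db.put(key, value);
    }

    // Fixed writer: updates the DB, then evicts so the next read reloads.
    void updateAndEvict(String key, String value) {
        db.put(key, value);
        cache.remove(key);
    }
}
```

After updateWithoutEvict, read keeps returning the previous value even though the DB map already holds the new one; after updateAndEvict, the next read misses and reloads.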

Typical causes of stale reads

Cause	What happens
No invalidation on update	DB new, cache old until TTL
Wrong key evicted	Updated `user:1` but forgot `user:1:profile`
Per-pod local cache	Pod A invalidated; Pod B still serves stale — inconsistent across pods
Write goes to primary, cache refilled from a lagging replica	Overlaps replica lag
Race: read repopulates stale value after delete	Order: writer deletes cache entry → slow reader loads old DB row → reader writes it back to cache
Long TTL	“Stale by design” longer than product allows
CDN / HTTP cache	Separate layer from Redis—not invalidated

What — confirm cache is the problem

Compare authoritative source vs cache: query the DB by primary key and GET the same key from Redis; if the values differ, the cache is stale.
Log the cache decision per request: cache_hit=true, cache_key, cache_version in structured logs.
Reproduce the path: PUT update, then an immediate GET; note which pod served it and whether the hit came from the local cache or from Redis.
Map all writers: does an admin tool, a batch job, or another service also update the DB without evicting the cache?
Check TTL: if staleness always lasts about 60s, suspect TTL-only consistency.
Rule out the cache: if there was no cache hit but the data is still old, suspect a lagging replica or the wrong DB; see the replica lag guide.
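
The checks above boil down to a small decision rule plus one log line per request. The following is a sketch under assumptions: the class StaleReadDiagnosis, its Verdict enum, and both method names are hypothetical helpers, not part of any library; only the log field names come from the steps above.

```java
import java.util.Objects;

// Hypothetical helper that encodes the triage steps as code.
class StaleReadDiagnosis {
    enum Verdict { CONSISTENT, STALE_CACHE, NOT_CACHE }

    // servedValue: what the request returned; cacheHit: whether it came from a cache;
    // dbValue: the row read from the primary by primary key.
    static Verdict diagnose(String servedValue, boolean cacheHit, String dbValue) {
        if (Objects.equals(servedValue, dbValue)) {
            return Verdict.CONSISTENT;
        }
        // Old data served from a cache hit points at the cache; old data on a miss
        // points elsewhere (lagging replica, wrong DB).
        return cacheHit ? Verdict.STALE_CACHE : Verdict.NOT_CACHE;
    }

    // One structured log line per request, using the field names from the checklist.
    static String logLine(String cacheKey, boolean cacheHit, long cacheVersion) {
        return String.format("cache_hit=%b cache_key=%s cache_version=%d",
                cacheHit, cacheKey, cacheVersion);
    }
}
```

With a line like this on every request, a stale response can be traced back to the exact key and version that served it.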

Patterns at a glance

Pattern	Write path	Stale risk
Cache-aside	Update DB; app must evict cache	High if invalidation missed
Write-through	Update cache + DB together	Lower; cache write can fail
Write-behind	Update cache; async DB	Loss/durability risk; rare for user data
Read-through	Cache loads on miss	Stale until invalidation/TTL
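
To make the write-through row concrete, here is a minimal sketch of a write path that updates both stores and handles a failed cache write. The SimpleCache interface, the MapCache class, and the failPuts switch are hypothetical stand-ins, and falling back to eviction is one possible policy rather than the only one.

```java
import java.util.HashMap;
import java.util.Map;

class WriteThroughDemo {
    // Hypothetical minimal cache interface; a real client would be Redis or similar.
    interface SimpleCache {
        void put(String key, String value);
        void remove(String key);
        String get(String key);
    }

    static class MapCache implements SimpleCache {
        final Map<String, String> entries = new HashMap<>();
        boolean failPuts = false; // test hook simulating a cache outage on writes

        public void put(String key, String value) {
            if (failPuts) {
                throw new IllegalStateException("cache write failed");
            }
            entries.put(key, value);
        }

        public void remove(String key) { entries.remove(key); }

        public String get(String key) { return entries.get(key); }
    }

    final Map<String, String> db = new HashMap<>(); // stand-in for the database
    final SimpleCache cache;

    WriteThroughDemo(SimpleCache cache) {
        this.cache = cache;
    }

    // Write-through: update the DB, then the cache. If the cache write fails,
    // fall back to eviction so readers miss and reload instead of seeing the old value.
    void write(String key, String value) {
        db.put(key, value);
        try {
            cache.put(key, value);
        } catch (RuntimeException e) {
            cache.remove(key);
        }
    }
}
```

If the eviction could fail as well (for example during a full cache outage), the TTL remains the only bound on staleness, which is why the table lists the risk as lower rather than zero.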

How — consistency tactics

1. Cache-aside with explicit eviction (default fix)

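@Transactional
void updateUser(User u) {
  userRepo.save(u);
  // Evict only after commit; assumes Spring 5.3+ (org.springframework.transaction.support)
  TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
    @Override
    public void afterCommit() {
      cache.evict("user:" + u.getId());
      cache.evict("user:" + u.getId() + ":profile");  // related keys
    }
  });
}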

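Evict only after a successful commit (for example from a transaction-commit listener). That way a rolled-back transaction does not clear the cache unnecessarily, and a reader that misses between the eviction and the commit cannot refill the cache with the pre-commit value.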

2. Versioned cache entries

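// DB row has a version column; the version becomes part of the cache key
cache.put("user:42:v" + row.getVersion(), dto);
// Read: build the key from the latest version in DB metadata, or always evict all user:42:* keys on write

After an update the version changes, so readers build a new key, miss, and reload; the entry under the old version key is never read again and ages out via TTL. Alternatively, store a single key with the version inside the value and compare it on read.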
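
The single-key variant (version stored inside the value, compared on read) might look like the sketch below. It assumes a cheap way to fetch the current row version; the class VersionedCacheDemo, the Versioned holder, and the currentVersion method are hypothetical, and the maps stand in for the DB and for Redis.

```java
import java.util.HashMap;
import java.util.Map;

class VersionedCacheDemo {
    // Cached value carries the row version it was built from.
    static final class Versioned {
        final long version;
        final String payload;

        Versioned(long version, String payload) {
            this.version = version;
            this.payload = payload;
        }
    }

    final Map<String, Versioned> db = new HashMap<>();    // stand-in for rows with a version column
    final Map<String, Versioned> cache = new HashMap<>(); // stand-in for Redis
    int fullLoads = 0; // counts reloads of the full payload, for illustration only

    // Writer bumps the version on every update, as optimistic locking would.
    void update(String key, String payload) {
        Versioned current = db.get(key);
        long next = (current == null) ? 1 : current.version + 1;
        db.put(key, new Versioned(next, payload));
    }

    // Hypothetical cheap lookup, e.g. SELECT version FROM users WHERE id = ?
    long currentVersion(String key) {
        return db.get(key).version;
    }

    String read(String key) {
        long latest = currentVersion(key);
        Versioned cached = cache.get(key);
        if (cached != null && cached.version == latest) {
            return cached.payload; // cached copy matches the current row version
        }
        fullLoads++;
        Versioned fresh = db.get(key);
        cache.put(key, fresh);
        return fresh.payload;
    }
}
```

The trade-off is one small version lookup per read in exchange for never serving a payload older than the row; whether that lookup is cheap enough depends on the workload.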

3. TTL as safety net, not primary strategy

Caffeine.newBuilder()
  .expireAfterWrite(Duration.ofMinutes(5))  // bound staleness
  .build();

For user-facing data, combine a short TTL with invalidation on write: the eviction provides freshness, and the TTL bounds the damage when an eviction is missed.
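
How the two mechanisms interact can be shown without the Caffeine dependency. The sketch below is a plain-Java stand-in, not Caffeine's implementation: the class TtlCacheDemo and its injectable clock are assumptions chosen so the expiry can be exercised without sleeping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

class TtlCacheDemo {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so the TTL can be tested without sleeping

    TtlCacheDemo(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(String key, String value) {
        entries.put(key, new Entry(value, clock.getAsLong() + ttlMillis));
    }

    // Returns null on a miss or when the entry has outlived its TTL (the safety net).
    String get(String key) {
        Entry e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (clock.getAsLong() >= e.expiresAtMillis) {
            entries.remove(key);
            return null;
        }
        return e.value;
    }

    // Primary mechanism: explicit invalidation on write.
    void evict(String key) {
        entries.remove(key);
    }
}
```

A writer that calls evict gets immediate freshness; a writer that forgets still leaves the entry stale for at most ttlMillis.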

4. Multi-pod invalidation

Redis as shared cache — one truth for all pods; evict once in Redis.
Local Caffeine + pub/sub — on write, publish “evict user:42”; all pods listen and clear local entry.
Avoid local-only cache for mutable user data unless you accept pod skew.
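
The local-cache-plus-pub/sub option can be simulated in one process. This is a sketch under assumptions: InvalidationBus is a hypothetical synchronous stand-in for Redis pub/sub or a message broker, and Pod models one application instance with its own local cache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class PodInvalidationDemo {
    // Hypothetical in-process bus standing in for Redis pub/sub or a message broker.
    static class InvalidationBus {
        private final List<Consumer<String>> subscribers = new ArrayList<>();

        void subscribe(Consumer<String> listener) {
            subscribers.add(listener);
        }

        void publish(String key) {
            subscribers.forEach(s -> s.accept(key));
        }
    }

    // Each pod keeps its own local cache and clears entries when the bus says so.
    static class Pod {
        final Map<String, String> localCache = new HashMap<>();
        private final Map<String, String> db;
        private final InvalidationBus bus;

        Pod(Map<String, String> db, InvalidationBus bus) {
            this.db = db;
            this.bus = bus;
            bus.subscribe(key -> localCache.remove(key));
        }

        // Local cache-aside read against the shared DB.
        String read(String key) {
            return localCache.computeIfAbsent(key, db::get);
        }

        void update(String key, String value) {
            db.put(key, value);
            bus.publish(key); // every pod, including this one, drops its local copy
        }
    }
}
```

A real bus delivers asynchronously, and Redis pub/sub in particular does not redeliver messages to a subscriber that was disconnected, so a short TTL on the local cache is still needed as a backstop.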

5. Avoid cache stampede while fixing

Mass deletion without computeIfAbsent or another single-flight mechanism can send a thundering herd of reloads to the DB; see the cache stampede guide.
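
A minimal single-flight sketch using ConcurrentHashMap.computeIfAbsent from the JDK follows. The class SingleFlightCacheDemo, the dbLoads counter, and the injected loader function are hypothetical names introduced for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class SingleFlightCacheDemo {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger dbLoads = new AtomicInteger(); // how often the loader actually ran
    private final Function<String, String> loader;     // stand-in for the DB query

    SingleFlightCacheDemo(Function<String, String> loader) {
        this.loader = loader;
    }

    // computeIfAbsent applies the loader at most once per absent key, even when
    // several threads miss at the same moment; the other callers wait for the result.
    String get(String key) {
        return cache.computeIfAbsent(key, k -> {
            dbLoads.incrementAndGet();
            return loader.apply(k);
        });
    }

    void evict(String key) {
        cache.remove(key);
    }
}
```

One caveat: while the loader runs, ConcurrentHashMap may block other updates that land in the same bin, so a slow DB call inside computeIfAbsent is a trade-off. Caches built for this purpose (Caffeine's loading caches, for instance) are usually a better fit in production.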

6. Transactional ordering (race)

After an update, do not let a concurrent reader repopulate the cache from stale DB state. Options: evict after commit; take a brief per-key lock; or, within the request that performed the write, read from the primary DB rather than from the cache.
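
One way to close the remaining window (a reader that loaded the old row before the commit and fills the cache after the eviction) is a per-key generation token. This is a swapped-in variant of the per-key guard idea, not something the text above prescribes: the class FillGuardDemo and its methods are hypothetical, and the sketch assumes the writer calls evict only after the DB commit.

```java
import java.util.HashMap;
import java.util.Map;

class FillGuardDemo {
    private final Map<String, String> cache = new HashMap<>();
    // Bumped on every eviction; never pruned in this sketch.
    private final Map<String, Long> generation = new HashMap<>();

    // Reader calls this on a miss, before loading from the DB, and keeps the token.
    synchronized long beginLoad(String key) {
        return generation.getOrDefault(key, 0L);
    }

    // The fill succeeds only if no eviction happened since beginLoad.
    synchronized boolean fillIfUnchanged(String key, long token, String value) {
        if (generation.getOrDefault(key, 0L) != token) {
            return false; // the loaded value may predate the write, so drop it
        }
        cache.put(key, value);
        return true;
    }

    // Writer calls this after the DB commit.
    synchronized void evict(String key) {
        generation.merge(key, 1L, Long::sum);
        cache.remove(key);
    }

    synchronized String get(String key) {
        return cache.get(key);
    }
}
```

A reader whose fill is rejected simply returns the value it loaded, or reloads; the next request misses and fills the cache with post-commit data.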

Choose consistency tier

Data	Approach
User profile, permissions	Invalidate on write + short TTL
Product catalog	Minutes TTL + event-driven refresh OK
Financial balance	Often no cache on hot path; read DB primary

Verify

Test: update, then read within 1s; the result matches the DB on all pods.
Metrics: cache hit rate stays stable; support tickets about “stale UI” stop arriving.
Integration test: asserts that the evict method was called, or that the Redis key is absent, after an update.

Interview one-liner

“Stale cache is usually cache-aside without eviction—I confirm DB vs Redis, evict all related keys after commit, use a shared Redis or pub/sub for local caches, and keep TTL as a backstop—not the only consistency mechanism.”

Related scenarios

Read replica lag
Inconsistent logs
Cache stampede
HashMap degradation (local cache pitfalls)