Cache returns stale data after an update
Scenario
A user updates a record; the database is correct but the UI still shows the old value from Redis or an in-process Caffeine cache. Sometimes it corrects itself once the TTL expires. You must choose a consistency model (invalidate on write, shorter TTL, versioned entries, or accepted staleness with clear product rules) rather than living with accidental forever-stale keys.
After reading, you should be able to:
- Compare cache-aside, write-through, and read-through failure modes.
- Prove staleness (DB vs cache) and find missing invalidation paths.
- Implement delete-on-write, versioned keys, and cross-pod invalidation.
- Distinguish cache staleness from replica lag.
Why — caches trade freshness for speed
A cache is a copy of data stored closer to the app (in-process memory, Redis). Unless every write is synchronized with every copy, readers can see stale values. The common cache-aside pattern (the app reads the cache; on a miss it loads from the DB and fills the cache) fails when an update writes the DB but forgets to evict the cache: the old value survives until its TTL expires.
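A minimal, framework-free sketch of that failure, assuming a `HashMap` stands in for the database and a `ConcurrentHashMap` for Redis or Caffeine (the class and method names are hypothetical). `get` is a plain cache-aside read; `updateWithoutEvict` writes only the "DB", so the next read still returns the cached copy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins: a HashMap for the database, a ConcurrentHashMap for the cache.
class CacheAsideDemo {
    final Map<String, String> db = new HashMap<>();
    final Map<String, String> cache = new ConcurrentHashMap<>();
    int dbReads = 0;

    // Cache-aside read: try the cache; on a miss, load from the DB and fill the cache.
    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        dbReads++;
        String fromDb = db.get(key);
        if (fromDb != null) {
            cache.put(key, fromDb);
        }
        return fromDb;
    }

    // Buggy write path: updates the DB but never evicts, so the cache keeps the old value.
    void updateWithoutEvict(String key, String value) {
        db.put(key, value);
    }

    // Fixed write path: update the DB, then evict so the next read reloads.
    void updateAndEvict(String key, String value) {
        db.put(key, value);
        cache.remove(key);
    }
}
```

After `updateWithoutEvict`, `get` keeps returning the old value until something removes the entry; in production that something is usually the TTL.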
Typical causes of stale reads
| Cause | What happens |
|---|---|
| No invalidation on update | DB new, cache old until TTL |
| Wrong key evicted | Updated user:1 but forgot user:1:profile |
| Per-pod local cache | Pod A invalidated; Pod B still serves stale — inconsistent across pods |
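| Cache filled from a lagging replica | Write commits on the primary, but a cache miss loads the old row from a replica and caches it; overlaps replica lag |
| Race: read repopulates stale value after delete | Cache key deleted before the DB commit → slow reader loads the old row → reader writes it back into the cache |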
| Long TTL | “Stale by design” longer than product allows |
| CDN / HTTP cache | Separate layer from Redis; evicting Redis does not purge it |
What — confirm cache is the problem
- Compare the authoritative source with the cache: query the DB by primary key, then `GET` the same key from Redis; if the values differ, the cache is stale.
- Log the cache decision per request: `cache_hit=true`, `cache_key`, `cache_version` in structured logs.
- Reproduce the path: PUT the update, then immediately GET; note which pod served the read and whether it hit the local cache or Redis.
- Map all writers: does an admin tool, batch job, or another service also update the DB without evicting the cache?
- Check TTL: if staleness always lasts about one TTL (for example ~60 s), suspect TTL-only consistency.
- Rule out non-cache causes: if there is no cache hit but the data is still old, suspect replica lag or the wrong DB; see the read replica lag guide.
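The first two checks can be combined into a small probe. This sketch assumes hypothetical `Function<String, Optional<String>>` readers for the DB (by primary key) and the cache (for example a wrapper around a Redis `GET`); it compares the two values and builds a structured `key=value` log line with some of the fields above (`cache_version` is omitted because the sketch has no version).

```java
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical probe: compares the authoritative DB value with the cached value for one key.
class StalenessProbe {
    record Result(String key, boolean cacheHit, boolean stale, String logLine) {}

    static Result check(String key,
                        Function<String, Optional<String>> dbReader,
                        Function<String, Optional<String>> cacheReader) {
        Optional<String> dbValue = dbReader.apply(key);
        Optional<String> cacheValue = cacheReader.apply(key);
        boolean hit = cacheValue.isPresent();
        // Stale only if the cache holds something that differs from the DB
        // (including a cached row that no longer exists in the DB).
        boolean stale = hit && !Objects.equals(cacheValue, dbValue);
        String log = "cache_hit=" + hit + " cache_key=" + key + " stale=" + stale;
        return new Result(key, hit, stale, log);
    }
}
```

A cache miss is never reported as stale here; if data is old on a miss, the problem lies upstream of the cache (replica or wrong DB).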
Patterns at a glance
| Pattern | Write path | Stale risk |
|---|---|---|
| Cache-aside | Update DB; app must evict cache | High if invalidation missed |
| Write-through | Update cache + DB together | Lower; cache write can fail |
| Write-behind | Update cache; async DB | Loss/durability risk; rare for user data |
| Read-through | Writes go to the DB; the cache itself loads from the DB on a miss | Stale until invalidation/TTL |
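For contrast with cache-aside, a minimal write-through sketch over hypothetical in-memory stand-ins: the write path updates the DB and then puts the new value into the cache, so the next read sees it without a reload. The comment marks where the "cache write can fail" risk sits: if the second step throws, the DB is new while the cache still holds the old value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical write-through store: every write updates the DB and then the cache.
class WriteThroughStore {
    final Map<String, String> db = new HashMap<>();
    final Map<String, String> cache = new ConcurrentHashMap<>();

    void write(String key, String value) {
        db.put(key, value);      // 1. durable write first
        cache.put(key, value);   // 2. refresh the cache; if this step fails, the cache keeps the old value
    }

    String read(String key) {
        // The cache is normally warm, but fall back to the DB on a miss.
        return cache.computeIfAbsent(key, db::get);
    }
}
```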
How — consistency tactics
1. Cache-aside with explicit eviction (default fix)
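```java
@Transactional
void updateUser(User u) {
    userRepo.save(u);
    String key = "user:" + u.getId();
    // Evict only after commit so a rolled-back transaction leaves the cache alone
    TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
        @Override public void afterCommit() {
            cache.evict(key);
            cache.evict(key + ":profile"); // related keys
        }
    });
}
```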
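Evict only after a successful commit, either with a commit callback or with a Spring `@TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)` listener. Evicting inside the transaction is premature: a rolled-back transaction clears the cache unnecessarily, and a concurrent reader can reload the old, still-committed row between the eviction and the commit.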
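Without Spring, the same idea can be sketched with a hypothetical `Tx` object that queues after-commit callbacks: eviction of the row key and its related keys is registered during the transaction but only runs on `commit()`, so a rollback leaves the cache untouched. `Tx`, `UserCacheEvictor`, and the key names are illustrative assumptions, not a real transaction API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical transaction that runs registered callbacks only after a successful commit.
class Tx {
    private final List<Runnable> afterCommit = new ArrayList<>();

    void onCommit(Runnable callback) {
        afterCommit.add(callback);
    }

    void commit() {
        // A real transaction manager would commit the DB work here, then run the callbacks.
        afterCommit.forEach(Runnable::run);
        afterCommit.clear();
    }

    void rollback() {
        afterCommit.clear(); // drop deferred evictions: the DB still holds the old row
    }
}

class UserCacheEvictor {
    final Map<String, String> cache = new ConcurrentHashMap<>();

    // Registers eviction of the row key and every related key; nothing is removed yet.
    void evictAfterCommit(Tx tx, long userId) {
        String base = "user:" + userId;
        tx.onCommit(() -> {
            cache.remove(base);
            cache.remove(base + ":profile");
        });
    }
}
```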
2. Versioned cache entries
```java
// DB row has a version column (e.g. a JPA @Version field)
cache.put("user:42:v" + row.getVersion(), dto);
// Read: look up the current version from DB metadata to build the key,
// or evict all user:42:* keys on write
```
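A reader that builds the key from the current version misses on the new key and reloads; entries stored under older versions are never read again and age out via TTL. Alternatively, store a single key whose value carries the version, and compare it with the DB version on read.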
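The single-key variant can be sketched as follows, assuming a cheap way to read a row's current version (a hypothetical `versionOf` function, for example a query that selects only the version column) and a hypothetical `Versioned` wrapper stored as the cache value. A cached entry is served only if its version matches the DB; otherwise the reader reloads and overwrites it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.ToLongFunction;

// Hypothetical cache value that carries the DB row version it was built from.
record Versioned(long version, String value) {}

class VersionCheckedCache {
    final Map<String, Versioned> cache = new ConcurrentHashMap<>();
    private final ToLongFunction<String> versionOf;     // cheap lookup of the current DB version
    private final Function<String, Versioned> loadRow;  // full load: value plus its version

    VersionCheckedCache(ToLongFunction<String> versionOf, Function<String, Versioned> loadRow) {
        this.versionOf = versionOf;
        this.loadRow = loadRow;
    }

    String get(String key) {
        long current = versionOf.applyAsLong(key);
        Versioned cached = cache.get(key);
        if (cached != null && cached.version() == current) {
            return cached.value(); // cache agrees with the DB version
        }
        Versioned fresh = loadRow.apply(key); // miss or outdated entry: reload and overwrite
        cache.put(key, fresh);
        return fresh.value();
    }
}
```

The trade-off: every read still pays for the version lookup, which is cheaper than a full row load but not free.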
3. TTL as safety net, not primary strategy
```java
Cache<String, UserDto> cache = Caffeine.newBuilder()
        .expireAfterWrite(Duration.ofMinutes(5)) // bound staleness
        .build();
```
Combine short TTL + invalidation on write for user-facing data.
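The combination can be sketched without Caffeine using a hypothetical `TtlCache` that stamps each entry with its write time from an injectable `LongSupplier` clock (an assumption that keeps the behaviour testable without sleeping). `invalidate` is the primary mechanism; expiry only bounds how long a missed invalidation can linger.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical expire-after-write cache: entries at least ttlMillis old are treated as misses.
class TtlCache<K, V> {
    private record Entry<T>(T value, long writtenAt) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong()));
    }

    // Returns null on a miss or once the entry has outlived its TTL (the safety net).
    V getIfPresent(K key) {
        Entry<V> e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (clock.getAsLong() - e.writtenAt() >= ttlMillis) {
            entries.remove(key, e); // drop only this expired entry, not a newer one
            return null;
        }
        return e.value();
    }

    // Primary mechanism: call on every write to the underlying row.
    void invalidate(K key) {
        entries.remove(key);
    }
}
```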
4. Multi-pod invalidation
- Redis as shared cache — one truth for all pods; evict once in Redis.
- Local Caffeine + pub/sub — on write, publish “evict user:42”; all pods listen and clear local entry.
- Avoid local-only cache for mutable user data unless you accept pod skew.
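The local-cache-plus-pub/sub option can be simulated in one JVM. In this sketch a hypothetical `InvalidationBus` stands in for a real channel such as Redis pub/sub, and each hypothetical `Pod` holds its own local map; a write on one pod publishes the key, and every subscribed pod (including the writer) drops its local copy. Redis pub/sub is fire-and-forget, so a pod that is disconnected when the message is sent stays stale until its TTL, which is one more reason to keep the TTL backstop.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-process stand-in for a pub/sub channel (e.g. Redis PUBLISH/SUBSCRIBE).
class InvalidationBus {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> onEvict) {
        subscribers.add(onEvict);
    }

    void publish(String key) {
        subscribers.forEach(s -> s.accept(key));
    }
}

// Each pod keeps a local cache and clears an entry when an eviction message arrives.
class Pod {
    final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final Map<String, String> db;
    private final InvalidationBus bus;

    Pod(Map<String, String> sharedDb, InvalidationBus bus) {
        this.db = sharedDb;
        this.bus = bus;
        bus.subscribe(localCache::remove);
    }

    String get(String key) {
        return localCache.computeIfAbsent(key, db::get);
    }

    void update(String key, String value) {
        db.put(key, value);  // write the shared DB
        bus.publish(key);    // tell every pod, including this one, to drop its copy
    }
}
```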
5. Avoid cache stampede while fixing
Mass-deleting keys without request coalescing (`computeIfAbsent` or single-flight loading) can send a thundering herd of misses to the DB; see the cache stampede guide.
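A minimal single-flight sketch using only the JDK: concurrent misses for the same key share one `CompletableFuture`, so the loader (a hypothetical `Function` that queries the DB) runs once per key while a load is in flight. It illustrates request coalescing in general; a Caffeine `LoadingCache` already makes concurrent callers for the same key wait on a single load.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical single-flight loader: one in-flight DB load per key, shared by all callers.
class SingleFlight<K, V> {
    private final Map<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SingleFlight(Function<K, V> loader) {
        this.loader = loader;
    }

    V load(K key) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            return existing.join(); // another caller is loading: wait for its result
        }
        try {
            V value = loader.apply(key); // only the first caller hits the DB
            mine.complete(value);
            return value;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e); // waiters see the same failure
            throw e;
        } finally {
            inFlight.remove(key, mine); // allow a fresh load next time
        }
    }
}
```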
6. Transactional ordering (race)
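After an update, do not let a concurrent reader write a stale value back into the cache. Options: evict only after commit; optionally hold a brief per-key lock so a reader's load-and-fill cannot interleave with a writer's update-and-evict; or, for the rest of the request that performed the write, read from the primary DB instead of the cache.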
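The per-key lock option can be sketched with one `ReentrantLock` per key in a hypothetical `KeyLockedCache` over in-memory stand-ins. The writer holds the lock across the DB update and the eviction; a reader that misses holds the same lock across the DB load and the cache fill, so an old value loaded before the write cannot be put back after the eviction. The lock map is never pruned here for brevity, and across pods the lock would have to be distributed, which is a much heavier choice.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical cache-aside wrapper that serialises "load + fill" with "write + evict" per key.
class KeyLockedCache {
    final Map<String, String> db = new ConcurrentHashMap<>();
    final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    String get(String key) {
        String hit = cache.get(key);
        if (hit != null) {
            return hit; // fast path: no lock on a hit
        }
        ReentrantLock lock = lockFor(key);
        lock.lock();
        try {
            String again = cache.get(key); // another reader may have filled it meanwhile
            if (again != null) {
                return again;
            }
            String loaded = db.get(key);
            if (loaded != null) {
                cache.put(key, loaded);
            }
            return loaded;
        } finally {
            lock.unlock();
        }
    }

    void update(String key, String value) {
        ReentrantLock lock = lockFor(key);
        lock.lock();
        try {
            db.put(key, value);  // stands in for the committed DB write
            cache.remove(key);   // evict while still holding the lock
        } finally {
            lock.unlock();
        }
    }
}
```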
Choose consistency tier
| Data | Approach |
|---|---|
| User profile, permissions | Invalidate on write + short TTL |
| Product catalog | Minutes TTL + event-driven refresh OK |
| Financial balance | Often no cache on hot path; read DB primary |
Verify
- Test: update → read within 1s → matches DB on all pods.
- Metrics: cache hit rate stays stable after the fix, and support tickets about a “stale UI” stop.
- Integration test asserts evict method called / Redis key absent after update.
Interview one-liner
“Stale cache is usually cache-aside without eviction—I confirm DB vs Redis, evict all related keys after commit, use a shared Redis or pub/sub for local caches, and keep TTL as a backstop—not the only consistency mechanism.”
Related scenarios
- Read replica lag
- Inconsistent logs
- Cache stampede
- HashMap degradation (local cache pitfalls)