Same input, different outcome in the logs

Scenario

Support replays a user request: same JSON body, sometimes success, sometimes failure. Log lines for “the same call” disagree—different balances, different error messages, or missing steps. The system looks flaky. You must decide if logs are misleading, inputs are not actually identical, or the service is non-deterministic under concurrency.

After reading, you should be able to:

Separate logging gaps from real behavioral non-determinism.
Group logs by correlation id and compare winning vs losing traces.
Identify races, replica lag, flags, retries, and multi-version traffic.
Fix with idempotency, structured context, and trace-backed debugging.

Why — “same input” rarely means same system state

Logs feel inconsistent when you compare lines without shared context (request id, pod, code version, flag values) or when the application truly branches on timing and shared mutable state. Production adds replicas, caches, read replicas, retries, and feature flags—local single-threaded runs hide that.

Two different problems

Type	Meaning	Example
Observability gap	Behavior may be consistent; logs incomplete or not correlatable	No trace id; async log after response
Real non-determinism	Same visible input, different results	Race, stale cache, retry double-charge

Common causes of different outcomes

Hidden input differences — tenant header, auth principal, Idempotency-Key, clock, geo, canary routing.
Feature flags / A-B — different code paths per request.
Multiple app versions — rolling deploy; old and new pods behind one LB.
In-memory state per pod — local cache not shared; hit pod A vs pod B.
Read replica lag — write on primary, read on stale replica → “sometimes missing row.”
Cache inconsistency — TTL, invalidation race, thundering herd reload.
Concurrent updates — lost update without versioning — overlaps with data races.
Retries — client or gateway retry hits partial success; non-idempotent side effects.
External dependency variance — partner returns 429 then 200.
Async ordering — logs print out of order; success log before failure in another thread.

Do not trust grep alone. Searching by user id across all services without trace_id mixes unrelated requests and creates false “inconsistency.”

What — prove whether behavior or logs differ

Pick one failing and one succeeding example — same business id, close timestamps. Export full log JSON for both (not screenshots).
Align on correlation fields — trace_id, request_id, span_id, pod, deployment_revision, feature_flags. If missing → fix logging first; investigation is blocked.

Diff the effective request

# Fields that must match for "same input"
method, path, normalized body hash, tenantId,
Authorization subject, Idempotency-Key, X-Request-Id

Mismatch here explains “inconsistent” without a code bug.

Compare decision points in logs — branch taken: cache HIT/MISS, DB source (primary vs replica), downstream status codes, retry count.
Check deploy and traffic split — two build ids during incident window? Canary 5% on new logic?
Database view — row version / updated_at; two concurrent updates? Use transaction history or audit table if available.
Reproduce under controlled concurrency — parallel requests with identical payload; if failure rate > 0 → race or idempotency bug.
Distributed trace — one trace per attempt; see duplicate spans from retries — distributed trace guide.

Log patterns that hint root cause

Pattern in logs	Likely cause
Different `pod` / cache HIT mix	Per-pod cache; need distributed cache or sticky invalidation
Read from `replica` after write	Replication lag; read-your-writes routing
`retry=2` on failures only	Non-idempotent retry; duplicate side effect
Flag `newCheckout=true` only sometimes	Feature flag targeting
Interleaved thread names on one request id	MDC not propagated to async threads
Success then error same id	Two different HTTP attempts sharing business id in grep

MDC propagation (async gap)

// Bad: child thread loses trace context
executor.submit(() -> log.info("charged"));  // no trace_id in log

// Better: wrap tasks (Spring @Async with TaskDecorator, Micrometer context)

How — make behavior and logs deterministic enough to debug

Fix behavioral inconsistency

Cause	Fix
Race on shared state	Atomics, DB transaction, optimistic lock — races guide
Non-idempotent POST	Idempotency key + unique constraint
Stale replica read	Route critical reads to primary or session stickiness after write
Per-pod cache	Redis/Caffeine cluster-wide; TTL + event invalidation
Deploy skew	Faster rollouts, readiness until version uniform, feature flags

Fix log inconsistency (observability)

Structured JSON logs: trace_id, span_id, user_id, tenant_id, pod, build_version.
Propagate MDC to @Async, CompletableFuture, reactive chains.
Log decisions: cache hit/miss, which DB datasource, flag snapshot per request.
One log line per outbound call with latency and status (avoid scattered debug).
Never reuse business ids as sole correlation key across retries—use per-attempt request id.

Structured log example

{
  "msg": "order_created",
  "trace_id": "7f3a…",
  "request_id": "req_9b2…",
  "pod": "checkout-7d4f9",
  "build": "1.42.0",
  "tenant_id": "t_12",
  "cache": "MISS",
  "db": "primary",
  "order_id": "ord_88",
  "outcome": "success"
}

Verify

Replay 100 parallel identical requests in staging: 0 unexpected failures.
Support query: one trace_id shows full story across services.
Post-deploy: no mixed build ids during steady state.

Interview one-liner

“I first verify the requests are truly identical—headers, flags, pod, version—then group logs by trace id. If behavior still differs, I look for races, replica lag, caches, and non-idempotent retries; I fix the root cause and add structured decision logging so the next incident is obvious.”

Why — “same input” rarely means same system state

Two different problems

Common causes of different outcomes

What — prove whether behavior or logs differ

Log patterns that hint root cause

MDC propagation (async gap)

How — make behavior and logs deterministic enough to debug

Fix behavioral inconsistency

Fix log inconsistency (observability)

Structured log example

Verify

Interview one-liner

Related scenarios