API works locally but fails in production

Scenario

Developers verify the endpoint on localhost and get 200 OK. In production, the same endpoint returns 401, 403, or 500, times out, or silently returns wrong data, and attempts to reproduce the failure locally do not succeed. The gap is almost never “Java is different”; it is the environment, configuration, network, data, and dependencies that differ between laptop and prod.

After reading, you should be able to:

Why — local is not a small copy of prod

Local development optimizes for speed: embedded DB, mocked payments, admin credentials, HTTP not HTTPS, no load balancer. Production adds real policies—secrets, firewalls, WAF, IAM roles, production data shape, and multi-hop networking. The same code path can succeed locally and fail in prod because inputs to the path changed, not because the compiler behaved differently.
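
To make “inputs to the path changed” concrete, here is a minimal sketch in which identical code computes a different “today” depending only on the zone supplied by the environment. The class name TodayDemo, the fixed instant, and the two zones are assumptions chosen for illustration; they do not come from any particular service.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

// Hypothetical demo class: the name, instant, and zones are illustrative only.
final class TodayDemo {

    // "Today" depends on the clock's zone, which is an environment input, not code.
    static LocalDate today(Clock clock) {
        return LocalDate.now(clock);
    }

    public static void main(String[] args) {
        // One fixed instant, late evening in UTC.
        Instant instant = Instant.parse("2024-03-01T23:30:00Z");

        // Assumed setup: a container running in UTC and a laptop in Asia/Kolkata.
        Clock containerClock = Clock.fixed(instant, ZoneId.of("UTC"));
        Clock laptopClock = Clock.fixed(instant, ZoneId.of("Asia/Kolkata"));

        System.out.println("container: " + today(containerClock)); // 2024-03-01
        System.out.println("laptop:    " + today(laptopClock));    // 2024-03-02
    }
}
```

Passing a Clock in, rather than calling LocalDate.now() directly, is one way to make this kind of difference reproducible in a local test.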

Top mismatch categories

Category | Local typical | Production typical
Configuration | application-dev.yml, defaults | Env vars, ConfigMaps, secrets, feature flags
Networking | localhost, no TLS | DNS, mTLS, security groups, egress proxy
Auth | Disabled or test JWT | OAuth, IAM role, API keys, IP allowlist
Data | Empty or seed data | Volume, edge cases, time zones, charset
Dependencies | Mocks / Docker only you run | Real SQS, Kafka ACLs, partner sandbox vs prod
Runtime | IDE classpath, more memory | Container image, JDK flags, read-only filesystem
Scale | One user, one pod | Load balancer, multiple replicas, races

Symptom ≠ root cause. A 500 in prod may be a downstream 403 masked by a generic catch block. Always read prod logs and the actual exception before changing code.
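
The masking effect can be sketched as follows. Every name here (MaskedErrorDemo, DownstreamException, callPartner, and the two handlers) is hypothetical, and a StringBuilder stands in for a real logger so the example stays self-contained.

```java
// Hypothetical names throughout; not taken from any real framework.
final class MaskedErrorDemo {

    // Illustrative exception type that carries the downstream HTTP status.
    static final class DownstreamException extends RuntimeException {
        final int status;

        DownstreamException(int status, String message) {
            super(message);
            this.status = status;
        }
    }

    // Stand-in for a partner call that is rejected in prod.
    static String callPartner() {
        throw new DownstreamException(403, "partner rejected credentials");
    }

    // Masks the cause: neither the client nor the logs ever see the 403.
    static String maskingHandler() {
        try {
            return callPartner();
        } catch (Exception e) {
            return "500 Internal Server Error";
        }
    }

    // Same client-facing response, but the downstream status is recorded.
    static String diagnosableHandler(StringBuilder log) {
        try {
            return callPartner();
        } catch (DownstreamException e) {
            log.append("downstream status=").append(e.status)
               .append(" message=").append(e.getMessage());
            return "500 Internal Server Error";
        }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        System.out.println(maskingHandler());
        System.out.println(diagnosableHandler(log));
        System.out.println(log);
    }
}
```

Both handlers return the same 500 to the caller; only the second leaves a log line that points at the downstream 403.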

What — investigate in this order

  1. Capture the failing prod fact — HTTP status, response body, request id, timestamp, pod name, trace id. Without this you are guessing.
  2. Reproduce with the same client call
    curl -sv -X POST 'https://api.prod.example/orders' \
      -H 'Authorization: Bearer …' \
      -H 'Content-Type: application/json' \
      -d @payload.json
    Compare to local URL, headers, and body byte-for-byte (including trailing slashes, query order).
  3. Find the log line in prod for that request id — stack trace, “connection refused,” “Access Denied,” “SSL handshake failure,” validation error.
  4. Config diff (most common fix)
    • Active Spring profile: SPRING_PROFILES_ACTIVE
    • Missing env var (empty string vs unset)
    • Wrong JDBC URL (host, SSL mode, schema)
    • Feature flag off in prod
    • Clock skew: token rejected as “not yet valid”
  5. Network path from pod
    kubectl exec -it <pod> -- curl -sv https://partner-api/health
    kubectl exec -it <pod> -- nslookup db.prod.internal
    If the call fails from the pod but works on your laptop, suspect security groups (SG), network ACLs (NACL), egress rules, private link, or DNS.
  6. TLS / trust — corporate CA not in container truststore; cert hostname mismatch; TLS 1.2 required by partner.
  7. Auth & IAM — pod service account lacks S3/SQS permission; OAuth client secret rotated; prod API key scope.
  8. Data-dependent logic — null FK in prod, duplicate unique key, legacy row breaks new validation; timezone “today” differs.
  9. Container vs IDE differences — case-sensitive paths on Linux image; file not in JAR; profile-specific bean missing in prod build.
  10. Only fails under load? Then it is no longer a “works locally” problem; switch to the connection-pool, GC, or “502 with healthy pods” guides.
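
For the “empty string vs unset” check in step 4, a small startup probe along these lines may help. It is a sketch only: EnvCheck, its State enum, and the variable name ORDERS_DB_URL are hypothetical, while SPRING_PROFILES_ACTIVE is the variable named in the checklist above.

```java
import java.util.Map;

// Hypothetical helper: class, enum, and ORDERS_DB_URL are illustrative names.
final class EnvCheck {

    enum State { UNSET, EMPTY, SET }

    // Distinguishes a variable that is missing from one that is present but blank.
    static State classify(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null) {
            return State.UNSET;
        }
        if (value.trim().isEmpty()) {
            return State.EMPTY;
        }
        return State.SET;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        String[] names = {"SPRING_PROFILES_ACTIVE", "ORDERS_DB_URL"};
        for (String name : names) {
            // Print only the state, never the value, so secrets stay out of logs.
            System.out.println(name + " -> " + classify(env, name));
        }
    }
}
```

Running the same probe on the laptop and inside the pod gives a quick, secret-free diff of which variables are actually present.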

Quick decision table

Prod symptom | Likely first check
401 / 403 | Token, IAM, WAF, IP allowlist
Connection timeout | DNS, SG, wrong host/port, dependency down
SSL errors | Truststore, cert chain, SNI hostname
500 + NPE in logs | Prod-only null data; fix validation or migration
Works for you, not users | CDN, geo, A/B flag, canary pod version skew
Intermittent | One bad replica, sticky session, short stalls

What to paste in the incident ticket

How — fix and prevent the local/prod gap

Immediate fix

  1. Correct config/secret in prod (with change control).
  2. Open the network path, or rotate credentials if the failure is auth- or TLS-related.
  3. Patch the data, or turn the feature flag off, until the code handles the prod edge case.
  4. Roll back deploy if regression started at deploy time.

Long-term parity

CI smoke test (sketch)

# Run after each deploy to staging
./scripts/smoke.sh --base-url "$STAGING_URL" --token "$CI_TOKEN"
# A non-zero exit fails the pipeline before prod promotion
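
What such a smoke script might do can be sketched in Java with the JDK's built-in HTTP client (Java 11+). This is an assumption-laden illustration, not the contents of smoke.sh: the SmokeCheck class, the /health path, and the rule that any 2xx passes are all hypothetical, and STAGING_URL and CI_TOKEN are read from the environment as in the shell line above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical smoke check; the /health path and pass rule are assumptions.
final class SmokeCheck {

    // Pure decision: any 2xx status counts as a pass.
    static boolean passes(int status) {
        return status >= 200 && status < 300;
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = System.getenv("STAGING_URL");
        String token = System.getenv("CI_TOKEN");
        if (baseUrl == null || baseUrl.isEmpty() || token == null || token.isEmpty()) {
            System.err.println("STAGING_URL and CI_TOKEN must be set");
            System.exit(2);
        }

        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/health"))
                .timeout(Duration.ofSeconds(10))
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("status=" + response.statusCode());

        // A non-zero exit code fails the pipeline stage.
        System.exit(passes(response.statusCode()) ? 0 : 1);
    }
}
```

Because the check goes through the deployed URL with a real token, it exercises DNS, TLS, and auth together, which are the layers a local run typically skips.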

Interview one-liner

“I capture the exact prod status and logs, replay the same request with curl, diff config and network from inside the pod, then check auth, TLS, and prod-only data—before I assume it’s a code bug. Staging parity and smoke tests prevent repeat incidents.”
