Visual catalog of the microservices patterns you reach for most often—each card states the problem, the solution,
trade-offs, and where to implement it in Spring/Java. Use the decision helper when you know the scenario but not the name;
browse by category when you need a refresher before a design review or interview.
leadarchitect
Pattern decision helper
Pick the domain closest to your problem, then the scenario—recommended patterns link to the chapter section where we explain trade-offs and Spring implementations.
Which pattern fits?
Recommendations are starting points—most production systems combine several patterns (e.g. saga + outbox + idempotent consumer).
💡 Pro Tip
Each catalog card below follows the same shape: Problem (why you need it), Solution (what the pattern does), Trade-offs (when not to use it), and Spring / Java (link to the chapter section with implementation detail).
Decomposition
Patterns for splitting monoliths, protecting domain models at boundaries, and shaping APIs per client—without big-bang rewrites.
Strangler Fig
Problem
A legacy monolith cannot be replaced in one release: the business requires continuous delivery, and a rewrite risks years of stagnation while competitors ship.
Solution
Place a facade (API gateway or reverse proxy) in front of the monolith. Route individual endpoints or user journeys to new microservices incrementally. The monolith shrinks over time like a strangler fig consuming its host tree.
Trade-offs
Dual maintenance during migration; routing complexity; data may temporarily live in both systems, requiring sync or read-from-new/write-to-both phases. Requires clear cutover criteria per slice.
Spring / Java
Service Design → Strangler Fig — Spring Cloud Gateway routes with predicates; feature-flagged path rules; coexist with legacy WAR via shared session or token bridge.
Anti-Corruption Layer
Problem
Integrating with ERP, mainframe, or third-party APIs forces ugly external models into your domain: CustNo strings, status codes that mean nothing in your ubiquitous language, and leaky abstractions that corrupt bounded contexts.
Solution
Insert a dedicated translation layer (adapter module or microservice) that maps external DTOs to your domain types and back. Your core domain never imports vendor SDK types or legacy entity classes.
Trade-offs
Extra code and latency hop; ACL can become a god-module if every integration dumps logic there—prefer one ACL per upstream system. Testing requires contract fixtures for external shapes.
Spring / Java
Service Design → Anti-Corruption Layer — dedicated @Service adapter + MapStruct mappers; isolate vendor clients in infrastructure package; domain stays pure POJOs.
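A minimal sketch of that translation layer in plain Java, assuming a hypothetical ERP payload (`ErpCustomerDto` with a zero-padded `custNo` string and one-letter `statCd` codes). In a Spring app the translator would be the `@Service` adapter and MapStruct could generate the plain field copies; the status decoding is the part worth keeping explicit and tested.

```java
import java.util.Objects;

// Hypothetical vendor shape: only the ACL/infrastructure package may reference it.
record ErpCustomerDto(String custNo, String name1, String statCd) {}

// Domain types in the ubiquitous language; they never import the vendor DTO.
record CustomerId(long value) {}

enum CustomerStatus { ACTIVE, SUSPENDED, CLOSED }

record Customer(CustomerId id, String displayName, CustomerStatus status) {}

// Anti-corruption layer: the one class that speaks both vocabularies.
final class ErpCustomerTranslator {

    Customer toDomain(ErpCustomerDto dto) {
        Objects.requireNonNull(dto, "dto");
        // Assumed ERP convention: zero-padded numeric strings such as "000042".
        long id = Long.parseLong(dto.custNo().trim());
        return new Customer(new CustomerId(id), dto.name1().trim(), toStatus(dto.statCd()));
    }

    ErpCustomerDto toExternal(Customer customer) {
        String custNo = String.format("%06d", customer.id().value());
        return new ErpCustomerDto(custNo, customer.displayName(), toStatCd(customer.status()));
    }

    // Opaque vendor codes (hypothetical values) are decoded here and nowhere else.
    private static CustomerStatus toStatus(String statCd) {
        return switch (statCd) {
            case "A" -> CustomerStatus.ACTIVE;
            case "S" -> CustomerStatus.SUSPENDED;
            case "X" -> CustomerStatus.CLOSED;
            default -> throw new IllegalArgumentException("Unknown ERP status code: " + statCd);
        };
    }

    private static String toStatCd(CustomerStatus status) {
        return switch (status) {
            case ACTIVE -> "A";
            case SUSPENDED -> "S";
            case CLOSED -> "X";
        };
    }
}
```

Unknown vendor codes fail loudly at the boundary instead of leaking an undefined state into the domain.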
Backend for Frontend (BFF)
Problem
Mobile and web clients need different payload shapes, aggregation depth, and error handling. Forcing one generic API on all clients creates bloated responses and chatty clients, and couples every UI change to core domain services.
Solution
One backend per client channel (Mobile BFF, Web BFF) that orchestrates calls to domain services and returns a screen-tailored DTO. BFF owns presentation logic; domain services stay channel-agnostic.
Trade-offs
More deployables to operate; duplicate orchestration if BFFs diverge unchecked—share libraries for common fan-out patterns. BFF is not a place for business rules that belong in domain services.
Spring / Java
Communication → BFF — separate Spring Boot app with WebClient parallel calls; optional GraphQL layer; team aligned to frontend squad per BFF.
Sidecar
Problem
Implementing cross-cutting concerns (mTLS, retries, metrics, traffic splitting) separately in every language and framework leads to inconsistent behavior, duplicated bugs, and redeploy-to-change-policy friction across dozens of teams.
Solution
Deploy a helper container (sidecar proxy) alongside each app container in the pod. All network traffic flows through the sidecar, which applies platform policies uniformly. Application code focuses on business logic.
Trade-offs
Extra CPU/memory per pod; operational complexity of control plane; debugging requires understanding proxy config. Not every team needs a mesh—libraries + gateway may suffice for small estates.
Spring / Java
Service Mesh → Sidecar pattern — Istio/Linkerd inject Envoy sidecar; app unchanged; disable duplicate retries in Resilience4j when mesh handles them; test through Service DNS not port-forward.
Communication
Edge routing, uniform east-west policy, reliable event emission, and multi-service transactions without two-phase commit.
API Gateway
Problem
External clients cannot call dozens of internal services directly: TLS termination, auth, rate limits, and routing would be duplicated at every service, and internal topology must stay hidden.
Solution
Single north-south entry point that routes to backend services, validates JWT/OAuth tokens, applies rate limits, and optionally aggregates responses. Also serves as strangler facade during monolith migration.
Trade-offs
Hot path and SPOF unless scaled and health-checked; gateway bloat if business logic creeps in—keep it thin. Wrong place for heavy aggregation (use BFF instead for client-specific shaping).
Spring / Java
Communication → API Gateway — Spring Cloud Gateway with route predicates, TokenRelay filter, Redis rate limiter; Kong/NGINX as alternative at platform layer.
Service Mesh
Problem
East-west traffic between services lacks uniform mTLS, retries, circuit breaking, and observability: each team implements them differently in Java, Go, and Python, with no central traffic policy or zero-trust networking.
Solution
A mesh of sidecar proxies managed by a centralized control plane (for example, Istiod pushing configuration to Envoy proxies via xDS, or the Linkerd control plane). In Istio, VirtualService and DestinationRule resources define routing, timeouts, and load balancing without app redeploys.
Trade-offs
Operational overhead and latency overhead (~1–3 ms per hop); team must learn CRDs; overkill for <10 homogeneous Java services. Justify with polyglot scale or strict compliance needs.
Spring / Java
Service Mesh → Overview — Istio PeerAuthentication for mTLS; coordinate with Resilience4j to avoid double retries; Kiali for topology alongside app OTel spans.
Transactional Outbox
Problem
Dual-write problem: updating the database and publishing to Kafka touch two separate systems, so a crash between them loses events or creates inconsistency. “Write DB then publish” is not atomic.
Solution
Insert integration event into an outbox table in the same DB transaction as the business write. Separate relay process (polling or Debezium CDC) reads outbox and publishes to message broker, marking rows processed.
Trade-offs
Eventual delivery latency; outbox table growth without cleanup; idempotent consumers required for at-least-once relay. CDC adds Kafka Connect operational surface.
Spring / Java
Data Patterns → Outbox — @Transactional save + outbox row; Debezium connector or @Scheduled poller; Spring Modulith events or custom outbox entity.
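The mechanics can be sketched without Spring or Kafka. The hypothetical `InMemoryDb.inTransaction` below stands in for a `@Transactional` method on one datasource, and the `Consumer` handed to the relay stands in for the broker client; table names and the JSON payload are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.function.Consumer;

record OutboxRow(UUID id, String aggregateId, String type, String payload, boolean published) {}

// Hypothetical toy stand-in for one relational database. inTransaction() gives
// all-or-nothing semantics over both tables, like a single local transaction.
final class InMemoryDb {
    final List<String> orders = new ArrayList<>();
    final List<OutboxRow> outbox = new ArrayList<>();

    synchronized void inTransaction(Runnable work) {
        List<String> ordersBefore = new ArrayList<>(orders);
        List<OutboxRow> outboxBefore = new ArrayList<>(outbox);
        try {
            work.run();
        } catch (RuntimeException e) {
            // Roll back both tables together.
            orders.clear();
            orders.addAll(ordersBefore);
            outbox.clear();
            outbox.addAll(outboxBefore);
            throw e;
        }
    }
}

final class OrderService {
    private final InMemoryDb db;

    OrderService(InMemoryDb db) { this.db = db; }

    // The business row and the event row commit (or roll back) together: no dual write.
    void placeOrder(String orderId) {
        db.inTransaction(() -> {
            db.orders.add(orderId);
            db.outbox.add(new OutboxRow(UUID.randomUUID(), orderId, "OrderPlaced",
                    "{\"orderId\":\"" + orderId + "\"}", false));
        });
    }
}

// Polling relay: publish unpublished rows, then mark them as published.
final class OutboxRelay {
    private final InMemoryDb db;
    private final Consumer<OutboxRow> broker;   // stands in for a real broker client

    OutboxRelay(InMemoryDb db, Consumer<OutboxRow> broker) {
        this.db = db;
        this.broker = broker;
    }

    int pollOnce() {
        int sent = 0;
        synchronized (db) {
            for (int i = 0; i < db.outbox.size(); i++) {
                OutboxRow row = db.outbox.get(i);
                if (row.published()) continue;
                broker.accept(row);
                db.outbox.set(i, new OutboxRow(row.id(), row.aggregateId(), row.type(), row.payload(), true));
                sent++;
            }
        }
        return sent;
    }
}
```

Because the relay marks a row only after publishing it, a crash between those two steps re-sends the row on the next poll; that is the at-least-once delivery the trade-offs refer to.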
Saga (Orchestration)
Problem
A business process spans Order, Payment, and Inventory databases, and there is no single ACID transaction across services. Failure mid-flow leaves partial state (order created, payment failed, inventory not reserved).
Solution
Central orchestrator (saga manager) commands each step in sequence and triggers compensating transactions on failure—cancelPayment, releaseInventory. State machine tracks saga progress explicitly.
Trade-offs
Orchestrator is coupling point and SPOF unless made durable and HA; easier to trace and debug than choreography; risk of “smart orchestrator” accumulating business logic.
Spring / Java
Data Patterns → Saga — Temporal workflow, Axon Saga, or custom state table + command dispatch; combine with outbox for reliable step commands.
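A stripped-down orchestrator, assuming each step is a pair of hypothetical callbacks (forward action, compensation). It shows only the control flow: run steps in order and, on the first failure, compensate the completed steps in reverse. It deliberately omits what makes real orchestrators safe, namely persisting saga state so a restarted orchestrator can resume.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical step shape: a forward action plus the action that semantically undoes it.
record SagaStep(String name, Runnable action, Runnable compensation) {}

enum SagaStatus { COMPLETED, COMPENSATED }

// In-process orchestrator; no durability, no retries, no HA.
final class SagaOrchestrator {
    private final List<String> log = new ArrayList<>();

    SagaStatus run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action().run();
                completed.push(step);
                log.add("done:" + step.name());
            } catch (RuntimeException failure) {
                log.add("failed:" + step.name());
                // Undo completed steps, most recent first.
                while (!completed.isEmpty()) {
                    SagaStep done = completed.pop();
                    done.compensation().run();
                    log.add("compensated:" + done.name());
                }
                return SagaStatus.COMPENSATED;
            }
        }
        return SagaStatus.COMPLETED;
    }

    List<String> log() { return List.copyOf(log); }
}
```

If `chargePayment` fails after `createOrder` and `reserveInventory` succeeded, the log ends with `compensated:reserveInventory` followed by `compensated:createOrder`.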
Saga (Choreography)
Problem
Same multi-service consistency problem as orchestration, but teams resist a central coordinator owning cross-domain workflow and want services to stay fully decoupled.
Solution
Each service listens for domain events and reacts: Order publishes OrderPlaced → Payment listens and publishes PaymentCompleted or PaymentFailed → compensating events flow backward without central brain.
Trade-offs
Harder to visualize flow; cyclic dependencies if poorly designed; debugging requires distributed tracing and event log correlation. Simple processes only—complex sagas favor orchestration.
Spring / Java
Data Patterns → Saga — Spring Kafka @KafkaListener handlers; idempotent consumers; publish compensating events from failure handlers; trace context in headers.
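Choreographed sagas rely on consumers tolerating redelivery. A sketch of an idempotent handler, assuming each event carries a unique `eventId` (the `PaymentCompleted` record is hypothetical). In production the processed-ID set would be a table with a unique constraint, written in the same transaction as the business change.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical event published by Payment.
record PaymentCompleted(String eventId, String orderId) {}

// Idempotent consumer: a redelivered event is acknowledged but not applied twice.
final class OrderPaymentListener {
    private final Set<String> processedEventIds = new HashSet<>();   // DB table with unique key in production
    private final List<String> paidOrders = new ArrayList<>();

    // In a Spring Kafka app this method would carry @KafkaListener.
    boolean onPaymentCompleted(PaymentCompleted event) {
        if (!processedEventIds.add(event.eventId())) {
            return false;   // duplicate delivery: skip side effects
        }
        paidOrders.add(event.orderId());   // business reaction, e.g. mark the order PAID
        return true;
    }

    List<String> paidOrders() { return List.copyOf(paidOrders); }
}
```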
Data
Autonomous data ownership, specialized read/write paths, and cross-service queries without shared tables or two-phase commit.
Database per Service
Problem
A shared database couples services: schema migrations require coordinated releases, one team’s slow query locks tables others depend on, and there is no true service autonomy or polyglot persistence.
Solution
Each microservice owns its private data store (schema, database instance, or at minimum private schema with enforced access). Other services access data only via API or events—never direct SQL joins across boundaries.
Trade-offs
Cross-service queries become composition or CQRS projections; reporting needs data warehouse ETL; temporary duplication of reference data. Shared read replicas are still coupling—avoid.
Spring / Java
Data Patterns → Database per Service — one datasource per service in Spring Boot; Flyway/Liquibase per repo; no foreign keys to other services’ tables.
CQRS
Problem
Read and write workloads fight over the same model: complex reporting queries slow transactional writes, and read scaling needs denormalized views the write model cannot efficiently serve.
Solution
Separate command model (writes, business rules, aggregates) from query model (read-optimized DTOs, Elasticsearch, materialized views). Updates propagate to read side via events or synchronous projection.
Trade-offs
Eventual consistency on read side; more moving parts; overkill for simple CRUD. Start with logical separation in one DB before splitting stores.
Spring / Java
Data Patterns → CQRS — command handlers + separate JPA entities or Mongo read collections; Kafka consumers build projections; Axon Framework for full CQRS/ES stack.
Event Sourcing
Problem
Current-state-only storage loses history. Audit requirements, debugging “how did we get here?”, and temporal queries (“account balance on March 1”) need an immutable record of changes.
Solution
Persist domain events as the source of truth; current state is derived by replaying events (with snapshots for performance). Event store append-only log replaces update-in-place rows for aggregate roots.
Trade-offs
High complexity; event schema evolution (upcasting); storage growth; not every aggregate needs ES—use selectively. Pairs naturally with CQRS.
Spring / Java
Data Patterns → Event Sourcing — Axon Server/EventStore, Kafka log as store, or custom event table; snapshot every N events; never delete events.
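A small illustration of replay, snapshots, and the temporal query from the problem statement, assuming a hypothetical two-event account model and an in-memory list as the event store. `BigDecimal` avoids floating-point rounding in balances.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable domain events; the append-only list is the source of truth.
interface AccountEvent { Instant at(); }
record Deposited(Instant at, BigDecimal amount) implements AccountEvent {}
record Withdrawn(Instant at, BigDecimal amount) implements AccountEvent {}

// Snapshot = state after the first `version` events, so replay can start from there.
record Snapshot(int version, BigDecimal balance) {}

final class AccountEventStore {
    private final List<AccountEvent> events = new ArrayList<>();   // append-only: never updated or deleted

    void append(AccountEvent event) { events.add(event); }

    Snapshot snapshot() {
        return new Snapshot(events.size(), replay(new Snapshot(0, BigDecimal.ZERO), Instant.MAX));
    }

    // Current state = snapshot + events appended after it.
    BigDecimal currentBalance(Snapshot from) { return replay(from, Instant.MAX); }

    // Temporal query: fold only the events that happened at or before `asOf`.
    BigDecimal balanceAsOf(Instant asOf) { return replay(new Snapshot(0, BigDecimal.ZERO), asOf); }

    private BigDecimal replay(Snapshot from, Instant asOf) {
        BigDecimal balance = from.balance();
        for (AccountEvent e : events.subList(from.version(), events.size())) {
            if (e.at().isAfter(asOf)) break;   // assumes events are appended in time order
            if (e instanceof Deposited d) balance = balance.add(d.amount());
            else if (e instanceof Withdrawn w) balance = balance.subtract(w.amount());
        }
        return balance;
    }
}
```

With a 100.00 deposit on Feb 10, a 30.00 withdrawal on Feb 20, and a 50.00 deposit on Mar 5, `balanceAsOf` for March 1 folds only the first two events and returns 70.00.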
API Composition
Problem
A UI screen needs data from Order, Customer, and Product services; clients cannot join across databases, and exposing internal IDs forces chatty multi-call sequences from the browser.
Solution
Composer service (often the BFF or API gateway) calls multiple backends in parallel, merges results into one response DTO. Keeps databases decoupled while simplifying client.
Trade-offs
Latency = slowest dependency; failure in one backend fails the whole response unless partial results are allowed; deep composition graphs become distributed monoliths in disguise, so limit depth to 2–3 calls.
Spring / Java
Data Patterns → API Composition — WebClient Mono.zip parallel fetch; combine with timeout/bulkhead per dependency; consider CQRS read model instead for hot paths.
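The Spring version above uses `WebClient` with `Mono.zip`; to keep the sketch dependency-free it swaps in the JDK's `CompletableFuture`. Both backend calls start in parallel, each with its own timeout, and a failed or slow dependency yields a response flagged as partial instead of an error. The `Supplier` arguments stand in for real HTTP calls, and `OrderView` is a hypothetical DTO.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical screen-tailored response DTO.
record OrderView(String orderId, String customerName, List<String> productNames, boolean partial) {}

final class OrderPageComposer {
    // Daemon threads so abandoned (timed-out) calls never keep the JVM alive.
    private final ExecutorService pool = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });
    private final long perCallTimeoutMs;

    OrderPageComposer(long perCallTimeoutMs) { this.perCallTimeoutMs = perCallTimeoutMs; }

    // One backend call with its own timeout; failure or timeout becomes Optional.empty().
    private <T> CompletableFuture<Optional<T>> fetch(Supplier<T> call) {
        return CompletableFuture.supplyAsync(() -> Optional.ofNullable(call.get()), pool)
                .completeOnTimeout(Optional.<T>empty(), perCallTimeoutMs, TimeUnit.MILLISECONDS)
                .exceptionally(ex -> Optional.<T>empty());
    }

    OrderView compose(String orderId, Supplier<String> customerCall, Supplier<List<String>> productsCall) {
        // Both calls start before either is awaited, so total latency is roughly the slowest one.
        CompletableFuture<Optional<String>> customer = fetch(customerCall);
        CompletableFuture<Optional<List<String>>> products = fetch(productsCall);
        return customer.thenCombine(products, (c, p) -> new OrderView(
                        orderId,
                        c.orElse("unknown customer"),
                        p.orElse(List.of()),
                        c.isEmpty() || p.isEmpty()))   // tell the client the response is degraded
                .join();
    }
}
```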
Resilience
Circuit Breaker
Problem
A failing downstream service causes callers to hang, retry endlessly, and exhaust thread pools, so the caller becomes the outage (cascade failure).
Solution
Track failure rate in sliding window. When threshold exceeded, breaker opens—calls fail immediately without hitting sick dependency. After cooldown, half-open probes test recovery before closing.
Trade-offs
False opens during blips; callers must handle open state (fallback or error); tuning thresholds per dependency takes iteration. Do not nest breakers without understanding interaction.
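To make the state machine concrete, here is a simplified count-based breaker. It is single-threaded and illustrative only; in a Spring service you would use Resilience4j's `CircuitBreaker` instead. Window size, failure threshold, and cooldown are constructor parameters, and the clock is injectable so the cooldown can be tested without sleeping. `SimpleCircuitBreaker` and `BreakerOpenException` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

enum BreakerState { CLOSED, OPEN, HALF_OPEN }

// Hypothetical exception for calls rejected while the breaker is open.
final class BreakerOpenException extends RuntimeException {
    BreakerOpenException() { super("circuit open: failing fast"); }
}

// Count-based sliding-window breaker (not thread-safe; a sketch).
final class SimpleCircuitBreaker {
    private final int windowSize;
    private final double failureRateThreshold;   // 0.5 means open at >= 50 % failures
    private final long openDurationMs;           // cooldown before a half-open probe
    private final LongSupplier clockMs;          // injectable clock for tests
    private final Deque<Boolean> window = new ArrayDeque<>();   // true = failed call
    private BreakerState state = BreakerState.CLOSED;
    private long openedAtMs;

    SimpleCircuitBreaker(int windowSize, double failureRateThreshold, long openDurationMs, LongSupplier clockMs) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.openDurationMs = openDurationMs;
        this.clockMs = clockMs;
    }

    BreakerState state() { return state; }

    <T> T call(Supplier<T> downstream) {
        if (state == BreakerState.OPEN) {
            if (clockMs.getAsLong() - openedAtMs < openDurationMs) {
                throw new BreakerOpenException();   // no call reaches the sick dependency
            }
            state = BreakerState.HALF_OPEN;         // cooldown over: let one probe through
        }
        try {
            T result = downstream.get();
            onResult(false);
            return result;
        } catch (RuntimeException e) {
            onResult(true);
            throw e;
        }
    }

    private void onResult(boolean failed) {
        if (state == BreakerState.HALF_OPEN) {
            if (failed) {
                trip();                             // probe failed: back to OPEN
            } else {
                state = BreakerState.CLOSED;        // probe succeeded: recover
                window.clear();
            }
            return;
        }
        window.addLast(failed);
        if (window.size() > windowSize) window.removeFirst();
        long failures = window.stream().filter(f -> f).count();
        if (window.size() == windowSize && failures >= failureRateThreshold * windowSize) {
            trip();
        }
    }

    private void trip() {
        state = BreakerState.OPEN;
        openedAtMs = clockMs.getAsLong();
        window.clear();
    }
}
```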
Retry with Backoff
Problem
Transient network blips and occasional 503s cause user-visible failures that would succeed on a second attempt, but blind retries on non-idempotent POSTs double-charge customers.
Solution
Retry failed calls with exponential backoff and random jitter, up to a limited number of attempts, and only for idempotent operations or requests carrying idempotency keys. Retry only transient status codes (502, 503, 504), never client errors such as 400.
Trade-offs
Retry storms amplify load on recovering service—combine with breaker and jitter; increases tail latency; must propagate deadline so total retries fit SLO budget.
Spring / Java
Resilience → Retry — Resilience4j @Retry or Spring Retry; Idempotency-Key header on POST; maxAttempts=3 with exponentialBackoff.
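A sketch of those rules (exponential backoff with "full jitter", a hard attempt limit, and retries only on 502/503/504), assuming the call is idempotent and reports its HTTP status as an `int`. `RetryPolicy` is a hypothetical class; with Resilience4j or Spring Retry you would configure the equivalent instead. With `maxAttempts=3` and a 100 ms base, the waits are drawn from [0, 100] ms and then [0, 200] ms.

```java
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntSupplier;
import java.util.function.LongConsumer;

// Hypothetical retry helper.
final class RetryPolicy {
    // Transient gateway/upstream errors only; 4xx such as 400 are never retried.
    private static final Set<Integer> RETRYABLE_STATUS = Set.of(502, 503, 504);

    private final int maxAttempts;
    private final long baseDelayMs;
    private final long maxDelayMs;

    RetryPolicy(int maxAttempts, long baseDelayMs, long maxDelayMs) {
        this.maxAttempts = maxAttempts;
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    // Full jitter: uniform random delay in [0, min(cap, base * 2^(attempt - 1))].
    long backoffMs(int attempt) {
        long ceiling = Math.min(maxDelayMs, baseDelayMs << Math.min(attempt - 1, 20));
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    // `call` must be idempotent (a GET, or a POST carrying an Idempotency-Key header).
    // `sleeper` performs the wait; it is injectable so tests do not actually sleep.
    int execute(IntSupplier call, LongConsumer sleeper) {
        int status = call.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && RETRYABLE_STATUS.contains(status); attempt++) {
            sleeper.accept(backoffMs(attempt));
            status = call.getAsInt();
        }
        return status;
    }
}
```

In production the sleeper would be something like `ms -> Thread.sleep(ms)` wrapped for `InterruptedException`, and the total of all waits should fit inside the caller's deadline.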
Bulkhead
Problem
One slow dependency (Payment) consumes all HTTP client threads, so Catalog and Inventory calls starve even though they are healthy. Shared pool = shared fate.
Solution
Partition resources into isolated pools (thread bulkhead) or semaphores per dependency. Payment gets max 20 concurrent calls; Catalog gets its own 50—failure contained like ship bulkheads.
Trade-offs
Under-utilization if pools sized wrong; queue rejections need handling; more config than single pool. Size from load test and SLO, not guesswork.
Spring / Java
Resilience → Bulkhead — Resilience4j @Bulkhead THREADPOOL or SEMAPHORE; separate WebClient instances per downstream; monitor rejected executions.
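A semaphore-bulkhead sketch in plain Java, mirroring the SEMAPHORE variant mentioned above: each dependency name maps to its own permit pool, and calls beyond the limit are rejected immediately rather than queued. `Bulkheads`, the dependency names, and the limits are illustrative.

```java
import java.util.Map;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical rejection signal; callers map it to a fallback or a 503.
final class BulkheadFullException extends RuntimeException {
    BulkheadFullException(String dependency) { super("bulkhead full: " + dependency); }
}

// Each downstream has its own permit pool, so a slow Payment service can exhaust
// only its own permits while Catalog callers keep theirs.
final class Bulkheads {
    private final Map<String, Semaphore> permits;

    Bulkheads(Map<String, Integer> maxConcurrentCalls) {
        this.permits = maxConcurrentCalls.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey, e -> new Semaphore(e.getValue())));
    }

    <T> T call(String dependency, Supplier<T> work) {
        Semaphore pool = permits.get(dependency);
        if (pool == null) throw new IllegalArgumentException("no bulkhead configured for " + dependency);
        if (!pool.tryAcquire()) {
            throw new BulkheadFullException(dependency);   // reject immediately instead of queueing
        }
        try {
            return work.get();
        } finally {
            pool.release();
        }
    }

    int availablePermits(String dependency) { return permits.get(dependency).availablePermits(); }
}
```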
Timeout
Problem
Without an upper bound on wait time, threads block forever on hung sockets: the Tomcat thread pool is exhausted, health checks fail, and Kubernetes kills pods, amplifying the outage.
Solution
Set connect + read timeouts on every outbound call sized from end-to-end SLO budget divided by hop count. Propagate deadlines (gRPC context, HTTP timeout headers) so downstream knows remaining budget.
Trade-offs
Too aggressive timeouts cause false failures under load; too lenient wastes threads. Timeouts are not a substitute for fixing slow dependencies—pair with tracing.
Spring / Java
Resilience → Timeout — WebClient responseTimeout; Resilience4j TimeLimiter; Feign connectTimeout/readTimeout; never use default infinite wait.
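A sketch of deadline propagation with the JDK's `java.net.http.HttpClient`: the edge fixes one end-to-end `Deadline` (a hypothetical helper record), and each outbound request gets the smaller of a per-hop cap and the remaining budget. `connectTimeout` and `HttpRequest.Builder.timeout` are real JDK APIs; the 300 ms and 800 ms values, `InventoryClient`, and the inventory URI are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;
import java.time.Instant;

// Hypothetical end-to-end deadline fixed at the edge and passed down the call chain.
record Deadline(Instant expiresAt) {
    static Deadline within(Duration budget, Instant now) { return new Deadline(now.plus(budget)); }

    Duration remaining(Instant now) {
        Duration left = Duration.between(now, expiresAt);
        return left.isNegative() ? Duration.ZERO : left;
    }

    // A hop may wait for its own cap or for whatever budget is left, whichever is smaller.
    Duration timeoutForHop(Duration perHopCap, Instant now) {
        Duration left = remaining(now);
        return left.compareTo(perHopCap) < 0 ? left : perHopCap;
    }
}

// Hypothetical client wrapper for an inventory service.
final class InventoryClient {
    // Bounds establishment of new connections.
    static final HttpClient HTTP = HttpClient.newBuilder()
            .connectTimeout(Duration.ofMillis(300))
            .build();

    // Per-request timeout: if no response arrives in time, send() throws HttpTimeoutException.
    // Leaving it unset means the request may block forever.
    static HttpRequest stockRequest(URI uri, Deadline deadline, Instant now) {
        Duration timeout = deadline.timeoutForHop(Duration.ofMillis(800), now);
        if (timeout.isZero()) {
            throw new IllegalStateException("deadline already exceeded; fail fast instead of calling");
        }
        return HttpRequest.newBuilder(uri).timeout(timeout).GET().build();
    }
}
```

With a 1,000 ms budget and 900 ms already spent upstream, the inventory request gets a 100 ms timeout instead of its usual 800 ms.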
Rate Limiting
Problem
Traffic spikes, abusive clients, or runaway batch jobs overwhelm fragile backends: every request is accepted until the whole system collapses.
Solution
Cap requests per time window per client, API key, or IP using token bucket or sliding window. Return 429 with Retry-After; protect core paths (checkout) over analytics.
Trade-offs
Legitimate bursts may hit limits—tiered quotas for partners; distributed rate limiting needs Redis or gateway cluster sync; wrong limits frustrate users.
Spring / Java
Resilience → Rate Limiting — Spring Cloud Gateway Redis rate limiter; Resilience4j RateLimiter; Bucket4j; enforce at gateway first, app second for defense in depth.
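A single-node token bucket to show the algorithm; in a cluster the counters would live in Redis or the gateway, as noted above. Capacity sets the tolerated burst, the refill rate sets the sustained limit, and `retryAfterSeconds` gives a value for the `Retry-After` header on a 429. `TokenBucket` is a hypothetical class, and the nanosecond clock is injected so behavior is deterministic in tests.

```java
import java.util.function.LongSupplier;

// Hypothetical token bucket: capacity = permitted burst, refillPerSecond = sustained rate.
final class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock;   // System::nanoTime in production, fake clock in tests
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity;   // start full so an initial burst is allowed
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    // true = serve the request; false = respond 429 Too Many Requests.
    synchronized boolean tryConsume() {
        long now = nanoClock.getAsLong();
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * refillPerSecond / 1_000_000_000.0);
        lastRefillNanos = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }

    // Whole seconds until the next token (based on the last refill), for Retry-After.
    synchronized long retryAfterSeconds() {
        double missing = 1 - tokens;
        return missing <= 0 ? 0 : (long) Math.ceil(missing / refillPerSecond);
    }
}
```

A bucket with capacity 3 and 2 tokens per second admits a burst of three requests, rejects the fourth, and admits one more after 500 ms.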
Fallback
Problem
When the Recommendations service is down, the entire product page errors, even though catalog and cart data could still enable a purchase with a degraded experience.
Solution
Define alternate code path when breaker open or timeout: return cached catalog, empty recommendations list, static defaults, or queue async retry. User sees degraded UX, not 500.
Trade-offs
Stale or empty data may confuse users; fallbacks must be tested in CI, not left as dead code; never silently fall back on write operations (payment!). Document degraded mode in the UI.
Spring / Java
Resilience → Fallbacks — Resilience4j fallbackMethod; Caffeine cache with async refresh; Optional.empty() for non-critical reads; feature flag to disable optional sections.
Observability
Make distributed behavior debuggable—tie requests together, know when instances are ready, centralize logs, and follow latency across service hops.
Correlation IDs
Problem
Support reports “order 8f2a failed”, and grepping by timestamp across twelve services fails due to clock skew, missing context, and no shared identifier linking logs, traces, and Kafka messages for one user journey.
Solution
Generate or accept X-Request-Id at the gateway; propagate W3C traceparent on every hop and in message headers; populate the SLF4J MDC with trace_id and request_id so they appear in structured JSON logs.
Trade-offs
Client-supplied IDs need validation; header stripping at proxies breaks chains; high-cardinality request IDs in metric labels crash Prometheus—keep IDs in logs/traces only.
Spring / Java
Observability → Correlation IDs — WebFilter sets MDC; Micrometer Tracing bridge; Kafka record headers; return request ID in API response for support.
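Before putting IDs into the MDC, validate what arrives. A sketch of both checks, assuming version `00` of the W3C Trace Context `traceparent` header (other versions are simply rejected here) and a hypothetical allow-list pattern for client-supplied `X-Request-Id` values. A Spring `WebFilter` would call these and then `MDC.put("trace_id", ...)`; that wiring is omitted.

```java
import java.util.Optional;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parsed W3C traceparent: version-traceid-parentid-flags (version 00 only in this sketch).
record TraceParent(String traceId, String parentId, boolean sampled) {
    private static final Pattern FORMAT =
            Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    static Optional<TraceParent> parse(String header) {
        if (header == null) return Optional.empty();
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) return Optional.empty();
        String traceId = m.group(1);
        String parentId = m.group(2);
        // All-zero trace or parent IDs are invalid.
        if (traceId.chars().allMatch(c -> c == '0') || parentId.chars().allMatch(c -> c == '0')) {
            return Optional.empty();
        }
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) != 0;   // low bit = sampled flag
        return Optional.of(new TraceParent(traceId, parentId, sampled));
    }
}

// Hypothetical helper for X-Request-Id handling at the gateway.
final class RequestIds {
    // Accept a client-supplied ID only if it is short and plain; otherwise mint a new one.
    private static final Pattern SAFE_ID = Pattern.compile("^[A-Za-z0-9-]{8,64}$");

    static String acceptOrGenerate(String clientSupplied) {
        if (clientSupplied != null && SAFE_ID.matcher(clientSupplied).matches()) {
            return clientSupplied;
        }
        return UUID.randomUUID().toString();
    }
}
```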
Health Probes
Problem
Orchestrators and load balancers need to know whether an instance can receive traffic or should be restarted, but a single “/health” that checks PostgreSQL causes liveness kill loops when the DB blips briefly.
Solution
Expose separate endpoints: liveness (process alive—JVM up), readiness (can serve traffic—DB/Kafka connected), optional startup (slow-init apps). Kubernetes probes map to each; failed readiness removes from Service endpoints without restart.
Trade-offs
Over-eager readiness checks slow scale-up; dependency checks in liveness are dangerous; health endpoints must not require auth for kubelet but must not leak secrets.
Spring / Java
Deployment → Health probes — Spring Boot Actuator /actuator/health/liveness and /readiness; custom HealthIndicator for Kafka; do not log probes at INFO.
Log Aggregation
Problem
Microservices on dozens of pods write logs to ephemeral container stdout; SSH and kubectl logs do not scale for incident search across the fleet or for retention beyond pod lifetime.
Solution
Ship logs to centralized store: DaemonSet agent (Promtail, Fluent Bit) tails container logs, enriches with K8s metadata, forwards to Loki or Elasticsearch. Apps emit structured JSON to stdout; platform handles transport.
Trade-offs
Volume and cost at scale—index selectively; PII in logs is compliance risk; agent failure on node loses visibility until restored. ELK powerful but expensive; Loki cheaper for label-first queries.
Distributed Tracing
Problem
Checkout p99 is 2 seconds; metrics show a slow Order Service but not whether Payment, Inventory, or the DB caused it. Logs in each service lack causal ordering across async boundaries.
Solution
Trace each request as tree of spans with shared trace_id; propagate context on HTTP and Kafka; visualize waterfall in Jaeger/Tempo. Sample in prod (head + tail) to control cost; 100% in staging.
Trade-offs
Instrumentation effort; storage cost without sampling; broken traces when one service omits propagation; duplicate spans if mesh and app both instrument HTTP.
Spring / Java
Observability → Trace debugging — OpenTelemetry Java agent or Micrometer Tracing; @WithSpan on business ops; OTLP export to Collector → Tempo/Jaeger.
Deployment
Ship frequently with controlled blast radius—separate deploying code from releasing features, and keep cluster state auditable in Git.
Blue-Green Deployment
Problem
Rolling updates expose users to mixed old/new versions during a deploy: schema incompatibility or subtle bugs affect part of the traffic, and rollback means another slow roll-forward.
Solution
Run two full environments (blue=live, green=idle). Deploy new version to green, smoke test, switch load balancer/Service selector to green instantly. Rollback = switch back to blue in seconds.
Trade-offs
Double infrastructure cost during cutover; database migrations must be backward-compatible across both versions; stateful services harder than stateless. Not every release needs blue-green.
Spring / Java
Deployment → Blue-green — two K8s Deployments + Service label flip; Spring profiles per environment; feature flags hide new code paths until switch.
Canary Release
Problem
Even with tests, production traffic reveals bugs that blue-green smoke tests miss; you need real user exposure limited to a small slice, with automatic rollback on SLO regression.
Solution
Route 1–5% traffic to new version; monitor error rate, latency, business metrics; progressively increase weight (5→25→50→100%) or auto-rollback. Istio VirtualService weights or Argo Rollouts analysis.
Trade-offs
Requires metric gates and tooling; users on the canary get different behavior, so ensure sticky sessions are not required or handle version skew; longer deploy cycle than a big-bang flip.
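The promote-or-rollback decision can be sketched as a pure function, assuming hypothetical thresholds (canary error rate no more than one percentage point above the stable baseline, p99 no more than 20 % slower) and the 5→25→50→100 % steps above. Argo Rollouts expresses the same idea declaratively with analysis templates; this shows only the logic, and `CanaryController` and `CanaryMetrics` are illustrative names.

```java
import java.util.List;

// Hypothetical metrics for one analysis interval (a rollout tool would query Prometheus).
record CanaryMetrics(double errorRate, double p99LatencyMs) {}

// Promotes through fixed weights while the canary stays within tolerance of the
// stable baseline, and drops to 0 % (rollback) on the first regression.
final class CanaryController {
    private static final List<Integer> STEPS = List.of(5, 25, 50, 100);
    private final double maxErrorRateDelta;   // e.g. 0.01 = at most 1 point worse than baseline
    private final double maxLatencyRatio;     // e.g. 1.2 = p99 at most 20 % slower

    CanaryController(double maxErrorRateDelta, double maxLatencyRatio) {
        this.maxErrorRateDelta = maxErrorRateDelta;
        this.maxLatencyRatio = maxLatencyRatio;
    }

    int nextWeight(int currentWeight, CanaryMetrics baseline, CanaryMetrics canary) {
        boolean healthy = canary.errorRate() - baseline.errorRate() <= maxErrorRateDelta
                && canary.p99LatencyMs() <= baseline.p99LatencyMs() * maxLatencyRatio;
        if (!healthy) return 0;   // automatic rollback
        int idx = STEPS.indexOf(currentWeight);
        if (idx < 0) throw new IllegalArgumentException("unknown canary step: " + currentWeight);
        return STEPS.get(Math.min(idx + 1, STEPS.size() - 1));
    }
}
```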
Feature Flags
Problem
Deploying code and releasing features are coupled: a half-built checkout redesign must ship dark or block all deploys, and an emergency kill switch requires a redeploy.
Solution
Wrap new behavior behind runtime toggles (LaunchDarkly, Unleash, or config service). Deploy code with flag off; enable per tenant/region/percentage; instant off during incident without rollback.
Trade-offs
Flag debt accumulates—sunset flags after full rollout; test both paths; avoid flags for security boundaries; too many flags confuse ownership.
Spring / Java
Deployment → Feature flags — Unleash Java SDK or Spring Cloud Config boolean; if (flags.isEnabled("new-checkout")); separate ops flags from product experiments.
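A minimal in-process evaluator showing the three controls from the Solution paragraph (kill switch, per-tenant enablement, percentage rollout), assuming a hypothetical `FlagRule` shape. Hashing the flag name plus user ID keeps each user in the same bucket across requests; a real deployment would use the Unleash or LaunchDarkly SDK rather than this class.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical flag definition; a flag service would hold and push this.
record FlagRule(boolean killSwitch, int rolloutPercent, Set<String> enabledTenants) {}

final class FeatureFlags {
    private final Map<String, FlagRule> rules = new ConcurrentHashMap<>();

    // Called when flag config changes (e.g. polled from a flag service): no redeploy needed.
    void update(String flag, FlagRule rule) { rules.put(flag, rule); }

    boolean isEnabled(String flag, String tenantId, String userId) {
        FlagRule rule = rules.get(flag);
        if (rule == null || rule.killSwitch()) return false;   // unknown flags default to off
        if (rule.enabledTenants().contains(tenantId)) return true;
        // Stable bucketing: the same user always lands in the same 0..99 bucket for this flag.
        int bucket = Math.floorMod((flag + ":" + userId).hashCode(), 100);
        return bucket < rule.rolloutPercent();
    }
}
```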
GitOps
Problem
Changes made with kubectl set image drift from the documented state: no PR review, no audit trail, and disaster recovery means guessing what prod actually runs.
Solution
Git repo holds desired cluster state (Helm/Kustomize manifests). Controller (Argo CD, Flux) continuously reconciles cluster to Git. All prod changes via merged PR; cluster self-heals manual drift.
Trade-offs
Initial setup and secret management (External Secrets); reconcile delay of seconds to minutes; teams must resist breaking glass without a follow-up commit. Monorepo vs repo-per-service is an org-level decision.
Spring / Java
Deployment → GitOps — image tag bump in values.yaml triggers Argo sync; CI pipeline updates Git not cluster directly; Helm chart per Spring Boot service.
Anti-patterns — what to avoid
Distributed systems fail in predictable ways when microservices are adopted without changing data and ops assumptions.
Distributed Monolith
Many services that must deploy together—sync chains, shared DB, no boundaries.
Name one anti-pattern you have seen in the wild and how you fixed it—interviewers prefer war stories over defining “microservices.”
How patterns stack in a typical checkout flow
Production systems layer patterns—this map shows a common combination, not a mandatory architecture.
```mermaid
flowchart TB
    subgraph edge [Edge]
        GW[API Gateway + JWT]
        BFF[BFF optional]
    end
    subgraph order [Order Service]
        AGG[Aggregate]
        OUT[Transactional Outbox]
    end
    subgraph async [Async]
        K[Kafka]
        SAG[Saga consumers]
    end
    subgraph platform [Platform]
        M[Mesh mTLS]
        OBS[Traces + SLO]
        DEP[Canary deploy]
    end
    GW --> BFF
    BFF --> order
    OUT --> K
    K --> SAG
    order --> M
    SAG --> M
```