Service Communication

Once services are bounded, they must talk—synchronously for queries and commands that need an immediate answer, asynchronously when decoupling and resilience matter more than milliseconds. This chapter covers protocols, discovery, gateways, and messaging with the Spring/Java implementations teams ship in production.


How services talk — the decision space

Communication style is an architecture choice with long-lived consequences for availability, consistency, and team autonomy—not a framework checkbox.

In a monolith, a method call is cheap and failure is binary: the process is up or down. In microservices, every interaction crosses the network, passes through load balancers and possibly sidecars, and can fail partially while other paths succeed. Your communication patterns determine whether a slow Inventory service stalls Checkout entirely or only delays a background notification.

Two families dominate: synchronous request/response (HTTP REST, gRPC, GraphQL) and asynchronous messaging (Kafka, RabbitMQ, SNS/SQS). Most production platforms use both deliberately: sync on the critical user path where the UI needs an answer now, async for fan-out, analytics, and workflows that tolerate seconds of delay.

flowchart LR
  subgraph sync [Synchronous path]
    C1[Client] --> GW[Gateway]
    GW --> S1[Order Service]
    S1 --> S2[Payment Service]
  end
  subgraph async [Asynchronous path]
    S1 --> K[(Kafka)]
    K --> S3[Analytics]
    K --> S4[Email]
  end
⚖️ Trade-off

More sync calls mean simpler mental models and strong read-your-writes UX; more async means higher availability under partial failure but harder debugging and eventual consistency everywhere downstream.

REST over HTTP/HTTPS — still the default

JSON over HTTP remains the lingua franca for public APIs, BFFs, and many internal calls because it is debuggable with curl, visible in proxies, and understood by every language.

REST fits resource-oriented domains: you model nouns, use standard verbs, and leverage HTTP semantics (caching, status codes, content negotiation). For internal microservices, teams often standardize on JSON + OpenAPI (see contract-first design) and enforce timeouts and retries at the client (covered in Resilience Patterns).

Spring client options

Client | Characteristics | When to use
RestTemplate | Blocking, mature, maintenance mode. | Legacy codebases; avoid for new services.
WebClient | Reactive/non-blocking, Spring 5+ default. | New code; integrates with Project Reactor; configure connection pools.
OpenFeign | Declarative interfaces + Spring Cloud integration. | Many similar CRUD calls; pair with LoadBalancer + resilience4j.
WebClient + timeout
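@Bean
WebClient inventoryWebClient(WebClient.Builder builder) {
    // Reactor Netty client: fail the call if no response arrives within 2 s.
    HttpClient http = HttpClient.create()
        .responseTimeout(Duration.ofSeconds(2));
    return builder
        .clientConnector(new ReactorClientHttpConnector(http))
        // Logical host name: resolved by Kubernetes DNS, or by Spring Cloud
        // LoadBalancer when the injected builder is @LoadBalanced.
        .baseUrl("http://inventory-service")
        .build();
}

// In a client class that has inventoryWebClient injected:
public Mono<StockLevel> fetchStock(String sku) {
    return inventoryWebClient.get()
        .uri("/api/v1/stock/{sku}", sku)   // template variable is URI-encoded
        .retrieve()                          // 4xx/5xx surface as WebClientResponseException
        .bodyToMono(StockLevel.class);
}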
🔬 Under the Hood

Without connection reuse, each REST call opens a socket and negotiates TLS before it serializes JSON on the CPU and waits on network round trips. At hundreds of RPS per instance, connection pool sizing and keep-alive matter as much as algorithm choice.
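The same two ideas, a shared pooled client and an explicit deadline on every call, can be sketched without Spring using the JDK's built-in HTTP client. This is a minimal illustration, not the chapter's recommended client: the class name StockRequests and the 500 ms connect timeout are assumptions, and the host and path simply mirror the WebClient example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical plain-JDK counterpart to the WebClient bean above. */
final class StockRequests {
    // One shared client: it pools and reuses connections across calls.
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofMillis(500)) // bound connection establishment
            .build();

    /** Builds a GET with a 2 s response deadline; sending it is left to the caller. */
    static HttpRequest stockRequest(String sku) {
        return HttpRequest.newBuilder()
                // Assumes sku is already URL-safe; encode it first if it may not be.
                .uri(URI.create("http://inventory-service/api/v1/stock/" + sku))
                .timeout(Duration.ofSeconds(2)) // fail the call instead of waiting forever
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

A caller would pass the request to StockRequests.CLIENT.send or sendAsync; reusing the single client instance is what lets connections stay alive between calls.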

gRPC — typed, fast, HTTP/2-native

gRPC uses Protocol Buffers for schema-first contracts and HTTP/2 for multiplexing, header compression, and optional bidirectional streaming.

Where REST sends self-describing JSON text, gRPC sends compact binary frames with strongly typed messages generated from .proto files. That yields smaller payloads and faster parsing—often 5–10× throughput improvements on high-volume internal paths, though benchmarks depend on message size and hardware. Unary calls feel like RPC; streaming suits live feeds, log tailing, and large result sets without pagination hacks.

REST vs gRPC (internal services)

Aspect | REST + JSON | gRPC + Protobuf
Contract | OpenAPI, optional codegen | Protobuf required; strict evolution rules
Browser / public API | Native | Needs grpc-web proxy
Streaming | SSE, chunked HTTP (awkward) | First-class server/client/bidi streams
Debugging | curl, browser devtools | grpcurl, Wireshark with proto defs
Load balancers | L7 HTTP familiar | Need L7 aware of HTTP/2 streams or client-side LB
order.proto excerpt
syntax = "proto3";
package orders.v1;

service OrderService {
  rpc GetOrder(GetOrderRequest) returns (OrderResponse);
  rpc StreamOrderEvents(StreamRequest) returns (stream OrderEvent);
}

message GetOrderRequest {
  string order_id = 1;
}

Spring Boot 3 services commonly add gRPC through the community grpc-spring-boot-starter, which wires generated stubs, server interceptors for auth, and metadata propagation for tracing. Pair it with service mesh mTLS for zero-trust internal calls.

💡 Pro Tip

Use gRPC for hot internal east-west traffic (payments ↔ ledger); keep REST at the north-south edge for mobile/web clients unless you standardize on grpc-web everywhere.

GraphQL — flexible queries, federation complexity

GraphQL lets clients request exactly the fields they need in one round trip—powerful for varied UIs, challenging when ownership is split across many microservices.

In a monolith or single BFF, GraphQL resolvers map cleanly to database queries. In microservices, each field might call another service, so the classic N+1 query problem becomes N+1 HTTP/gRPC calls unless you batch with DataLoader or pre-join via a dedicated read model (CQRS). Apollo Federation (and similar) lets each service expose a subgraph; a gateway composes the supergraph, but operational ownership of schema changes becomes a platform concern.
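The batching idea behind DataLoader can be sketched in a few lines: resolvers only record the keys they need, and one dispatch per resolution round turns them into a single remote call. BatchLoader here is a hypothetical, single-threaded illustration, not the API of the java-dataloader library.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Collects keys requested by individual resolvers and fetches them in one batch. */
final class BatchLoader<K, V> {
    private final Function<Set<K>, Map<K, V>> batchFetch; // one remote call per batch
    private final Set<K> pending = new LinkedHashSet<>();
    private int batchCalls = 0;

    BatchLoader(Function<Set<K>, Map<K, V>> batchFetch) {
        this.batchFetch = batchFetch;
    }

    /** Called once per field resolver; only records the key (duplicates collapse). */
    void load(K key) {
        pending.add(key);
    }

    /** Called once per resolution round; issues a single batched fetch. */
    Map<K, V> dispatch() {
        if (pending.isEmpty()) return Map.of();
        batchCalls++;
        Map<K, V> result = batchFetch.apply(new LinkedHashSet<>(pending));
        pending.clear();
        return result;
    }

    int batchCalls() {
        return batchCalls;
    }
}
```

With three load calls for two distinct SKUs, the downstream service sees one request carrying two keys instead of three separate calls.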

GraphQL fits when:

  • Multiple clients (web, iOS, Android) need different field shapes from the same backend.
  • You expose a BFF or federated graph with a team maintaining schema governance.
  • Read-heavy workloads tolerate resolver-level caching.

Avoid it as the default internal protocol when calls are simple CRUD between services, when deep graphs must meet strict latency SLOs, or when observability for resolver fan-out is immature.

⚠️ Pitfall

Exposing GraphQL directly on every microservice without a gateway—clients must know service topology and you lose centralized rate limiting and auth policy enforcement.

Choosing synchronous vs asynchronous

Sync optimizes for immediate feedback and simpler failure semantics; async optimizes for decoupling, buffering, and surviving downstream outages.

Factor | Prefer sync | Prefer async
User waiting | Yes: checkout total, auth gate | No: send receipt email later
Consistency | Need answer now (with caveats) | Eventual OK; saga compensations
Fan-out | One dependency | Many subscribers (OrderPlaced → 5 services)
Peak load | Spike hits callee immediately | Queue absorbs spike; consumer scales
Failure | Callee down → caller fails fast | Messages retained; retry when healthy
🎯 Interview Tip

For “design food delivery,” place order sync through Payment authorization; driver assignment and ETA updates async via events; explain what the user sees if the async pipeline lags (optimistic UI vs polling).

Service discovery — finding instances in a dynamic world

Hardcoded IPs and static host lists break the moment Kubernetes reschedules a pod or autoscaling adds a replica. Discovery tells clients where healthy instances live right now.

The problem: Order Service might run as 10.0.14.7:8080 at noon and 10.0.22.19:8080 after a deploy. DNS names help, but something must register and health-check instances and expose metadata (zone, version, weight).

Client-side discovery

The client (or an in-process library) queries a registry, caches instance lists, and load-balances locally. Netflix Eureka and HashiCorp Consul popularized this in JVM ecosystems. With Spring Cloud Netflix Eureka, the registry runs with @EnableEurekaServer, services register through spring-cloud-starter-netflix-eureka-client, and callers use Spring Cloud LoadBalancer (Ribbon was removed in Spring Cloud 2020.0) with a @LoadBalanced RestTemplate or Feign.

application.yml
spring:
  application:
    name: inventory-service
eureka:
  client:
    service-url:
      defaultZone: http://eureka:8761/eureka/
  instance:
    prefer-ip-address: true
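What a registry does with those registrations can be sketched as a lease table: instances renew with heartbeats and drop out of lookups once the lease expires. InstanceRegistry and its TTL parameter are hypothetical names for illustration; Eureka's real lease handling has more states and self-preservation logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy registry: instances register, renew with heartbeats, and expire after a TTL. */
final class InstanceRegistry {
    private final long ttlMillis;
    // service name -> (instance address -> last heartbeat time in millis)
    private final Map<String, Map<String, Long>> leases = new ConcurrentHashMap<>();

    InstanceRegistry(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Register or renew; time is passed in so the behaviour is deterministic. */
    void heartbeat(String service, String address, long nowMillis) {
        leases.computeIfAbsent(service, s -> new ConcurrentHashMap<>())
              .put(address, nowMillis);
    }

    /** Instances whose last heartbeat is within the TTL, sorted for stable output. */
    List<String> healthy(String service, long nowMillis) {
        List<String> result = new ArrayList<>();
        leases.getOrDefault(service, Map.of()).forEach((address, last) -> {
            if (nowMillis - last <= ttlMillis) result.add(address);
        });
        result.sort(String::compareTo);
        return result;
    }
}
```

A client-side library would poll healthy(...) periodically and cache the list, which is why a crashed instance can keep receiving traffic until its lease runs out.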

Server-side discovery

The client talks to a stable virtual IP or DNS name; a load balancer or platform router consults the registry and forwards. AWS ALB/NLB, Kubernetes Service objects, and cloud mesh ingress work this way—the client stays dumb; infrastructure picks the pod.

DNS-based discovery (Kubernetes)

CoreDNS resolves inventory-service.default.svc.cluster.local to the Service's ClusterIP, and kube-proxy or an eBPF dataplane load-balances connections to that IP across the Service's endpoints. No Eureka is required on K8s-native estates if you accept platform coupling.

sequenceDiagram
  participant Client
  participant LB as Load Balancer / K8s Service
  participant Reg as Registry (optional)
  participant Svc as Inventory Pod
  Client->>LB: GET /stock (inventory-service)
  LB->>Reg: lookup healthy instances
  Reg-->>LB: 10.0.1.4, 10.0.1.9
  LB->>Svc: forward request
  Svc-->>LB: 200 OK
  LB-->>Client: 200 OK
📦 Real World

Netflix built Eureka for AWS EC2 volatility; many teams on Kubernetes dropped Eureka in favor of K8s Services + mesh, keeping Eureka only for VM/bare-metal mixed estates.

Load balancing — spreading work across instances

Without balancing, the first instance in a list absorbs traffic until it melts while siblings idle—a failure mode even junior on-call runbooks mention.

Client-side balancing

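Spring Cloud LoadBalancer chooses an instance from the registry per request. It ships round-robin (the default) and random strategies; a weighted-response-time policy that favors faster nodes, familiar from Ribbon, needs a custom implementation. Override the strategy per service with @LoadBalancerClient and a custom ReactorLoadBalancer bean (built from LoadBalancerClientFactory) when the defaults are naive for your latency profile.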
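Round-robin selection itself is tiny. The sketch below is a hypothetical stand-alone chooser, not Spring Cloud LoadBalancer's implementation; it shows the one detail that is easy to get wrong, keeping the index valid when the counter overflows.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Thread-safe round-robin over whatever instance list the registry currently returns. */
final class RoundRobinChooser {
    private final AtomicInteger position = new AtomicInteger();

    /** Picks the next instance; floorMod keeps the index valid after int overflow. */
    String choose(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalStateException("no instances available");
        }
        int index = Math.floorMod(position.getAndIncrement(), instances.size());
        return instances.get(index);
    }
}
```

Because the list is passed in on every call, instances that appear or disappear between calls are picked up without resetting the counter.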

Server-side balancing

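Nginx, HAProxy, AWS ALB, and Kubernetes Services terminate or forward client connections and pick backends. TLS often ends at the LB. HTTP/2 and gRPC multiplex many requests over one long-lived connection, so a connection-level (L4) balancer pins all of a client's traffic to a single backend; balancing individual requests requires an L7 proxy that understands HTTP/2 streams, or client-side balancing.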

Sticky sessions — anti-pattern in microservices

Session affinity (cookie → same pod) fights autoscaling and rolling deploys: draining a node breaks users mid-session. Prefer stateless services with JWT or server-side session stores in Redis if you must share session state. Sticky sessions are a legacy monolith habit, not a microservices default.

🚫 Anti-Pattern

Enabling sticky sessions to “fix” in-memory shopping carts instead of storing cart state in Redis or the client—deployments become painful and failover fails.

API gateway — single north-south entry point

A gateway terminates TLS, authenticates callers, routes to internal services, and applies cross-cutting policies so domain services stay focused on business logic.

Typical responsibilities:

  • Routing and path rewriting (/api/orders/** → Order Service)
  • Authentication and JWT validation; optional token relay downstream
  • Rate limiting and WAF integration
  • Request/response transformation (header injection, legacy XML → JSON)
  • Response caching for idempotent GETs
  • Circuit breaking at the edge (often duplicated with service-level resilience—coordinate policies)

Spring Cloud Gateway

Reactive (WebFlux) gateway using RouteLocator, predicates (Path, Header, Cookie), and filters (RewritePath, AddRequestHeader, RequestRateLimiter). GlobalFilter beans run on every route—ideal for correlation IDs and auth. Custom filters implement org-specific audit logging.

Java config
@Bean
RouteLocator routes(RouteLocatorBuilder builder) {
    return builder.routes()
        .route("orders", r -> r.path("/api/v1/orders/**")
            .filters(f -> f
                .stripPrefix(0)
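                .requestRateLimiter(c -> c
                    .setRateLimiter(redisRateLimiter())
                    // KeyResolver must return Mono<String>; here the key is the caller's IP.
                    .setKeyResolver(exchange -> Mono.just(
                        exchange.getRequest().getRemoteAddress().getAddress().getHostAddress())))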
                .circuitBreaker(c -> c.setName("orderCb")))
            .uri("lb://order-service"))
        .build();
}
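The Redis-backed rate limiter used in the route above follows a token-bucket model with a replenish rate and a burst capacity. A minimal in-memory sketch of that algorithm, with the hypothetical class name TokenBucket and time passed in explicitly so the behaviour is easy to reason about:

```java
/** Token bucket; time is passed in (seconds) so the behaviour is deterministic. */
final class TokenBucket {
    private final double replenishRate;  // tokens added per second
    private final double burstCapacity;  // maximum tokens the bucket can hold
    private double tokens;
    private double lastRefillSeconds;

    TokenBucket(double replenishRate, double burstCapacity, double nowSeconds) {
        this.replenishRate = replenishRate;
        this.burstCapacity = burstCapacity;
        this.tokens = burstCapacity;     // start full so an initial burst is allowed
        this.lastRefillSeconds = nowSeconds;
    }

    /** Returns true and consumes one token if the request is allowed at nowSeconds. */
    synchronized boolean tryAcquire(double nowSeconds) {
        double elapsed = Math.max(0, nowSeconds - lastRefillSeconds);
        tokens = Math.min(burstCapacity, tokens + elapsed * replenishRate);
        lastRefillSeconds = Math.max(lastRefillSeconds, nowSeconds);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

With a rate of 1 token per second and a capacity of 2, a caller can burst two requests, is then rejected until a token is replenished, and never accumulates more than two tokens however long it stays idle. A gateway keeps one bucket per resolved key, which is why the key resolver choice (IP, user, API key) decides who shares a limit.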

Alternatives

  • Kong / APISIX — plugin ecosystem, declarative config, often paired with K8s Ingress.
  • AWS API Gateway — managed, Lambda/VPC links; good for serverless-heavy estates.
  • Nginx / Envoy — high performance; more manual config unless wrapped by mesh/Gateway API.
🔬 Under the Hood

Gateway adds one network hop and becomes a shared failure point—run it HA (multiple replicas), health-check aggressively, and keep business rules out of 500-line filter chains that nobody tests.

Backend for Frontend (BFF)

Instead of one generic API for every client, a BFF shapes responses per experience—mobile gets minimal payloads; admin web gets rich aggregates.

A single “god gateway” accumulating every client’s special case becomes unmaintainable. The BFF pattern assigns a small backend (often a Spring Boot app) per client type: mobile-bff, partner-bff. Each BFF orchestrates calls to domain microservices, handles pagination and field selection, and owns release cadence with its UI team.

BFFs are allowed to be chatty internally; they should not leak orchestration into domain services via “special query params for the app team.” Domain services stay generic; BFFs absorb presentation-driven change.
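As a rough sketch of that orchestration, the hypothetical MobileOrderBff below calls two domain services in parallel and trims the combined result to what a mobile screen needs. The record shapes and the function-typed service clients are assumptions made for illustration; a real BFF would wrap WebClient or Feign calls.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical mobile BFF: fans out to two domain services and shapes one small view. */
final class MobileOrderBff {
    record Order(String id, String status, List<String> lineSkus, String internalNotes) {}
    record Shipment(String orderId, String carrier, String eta) {}
    record MobileOrderView(String id, String status, int itemCount, String eta) {}

    private final Function<String, CompletableFuture<Order>> orders;
    private final Function<String, CompletableFuture<Shipment>> shipments;

    MobileOrderBff(Function<String, CompletableFuture<Order>> orders,
                   Function<String, CompletableFuture<Shipment>> shipments) {
        this.orders = orders;
        this.shipments = shipments;
    }

    /** Both calls start before either is awaited; the view drops fields mobile never shows. */
    CompletableFuture<MobileOrderView> orderView(String orderId) {
        CompletableFuture<Order> order = orders.apply(orderId);
        CompletableFuture<Shipment> shipment = shipments.apply(orderId);
        return order.thenCombine(shipment, (o, s) ->
                new MobileOrderView(o.id(), o.status(), o.lineSkus().size(), s.eta()));
    }
}
```

The domain services return their full, generic representations; the presentation decision (item count instead of line items, no internal notes) lives only in the BFF.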

📦 Real World

SoundCloud and Spotify documented BFF-style layers to tame mobile bandwidth limits while keeping core catalog services client-agnostic.

Asynchronous communication fundamentals

Messaging decouples producers from consumers in time and space—the producer does not wait for subscribers to finish, and new subscribers can appear without redeploying the sender.

Message vs event

A command message tells a specific consumer to do something: “ReserveInventory” with a reply address—point-to-point, often one queue. An event announces a fact: “OrderPlaced”—zero or many subscribers on a topic. Commands couple names and intent; events couple less if schemas evolve carefully.

Point-to-point vs publish-subscribe

  • Queue — one consumer group processes each message once (competing consumers scale throughput).
  • Topic — each subscriber group gets a copy (fan-out for analytics, email, search indexing).
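Both behaviours can be seen in one toy model: every subscriber group receives a copy of each message, while members inside a group compete for it. MiniBroker is a hypothetical in-memory illustration, not a model of any specific broker's delivery algorithm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy broker: fan-out across groups, competing consumers within a group. */
final class MiniBroker {
    // group name -> members of that group (competing consumers)
    private final Map<String, List<String>> groups = new LinkedHashMap<>();
    // group name -> index of the member that receives the next message
    private final Map<String, Integer> nextMember = new LinkedHashMap<>();
    private final List<String> deliveries = new ArrayList<>();

    void subscribe(String group, String member) {
        groups.computeIfAbsent(group, g -> new ArrayList<>()).add(member);
        nextMember.putIfAbsent(group, 0);
    }

    /** Every group gets a copy; within a group exactly one member receives it. */
    void publish(String message) {
        groups.forEach((group, members) -> {
            int index = nextMember.get(group);
            deliveries.add(members.get(index % members.size()) + " <- " + message);
            nextMember.put(group, index + 1);
        });
    }

    List<String> deliveries() {
        return List.copyOf(deliveries);
    }
}
```

With one email consumer and two analytics consumers, each published event is delivered twice in total: once to email-1 and once to whichever analytics member is next in turn.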

Why async helps resilience: if Email Service is down, Kafka retains OrderPlaced events until consumers recover (within retention limits). Producers stay up; users see “order confirmed” while email arrives minutes later—design UX accordingly.

⚖️ Trade-off

You trade immediate consistency for complexity: duplicate messages require idempotent consumers; ordering guarantees vary by broker partition design.
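An idempotent consumer can be as simple as remembering which event IDs it has already applied. The sketch below keeps that set in memory under the hypothetical name IdempotentInventoryConsumer; in a real service the processed-ID record would be stored in the same database transaction as the side effect.

```java
import java.util.HashSet;
import java.util.Set;

/** Applies each event at most once, keyed by a producer-assigned event ID. */
final class IdempotentInventoryConsumer {
    private final Set<String> processedEventIds = new HashSet<>();
    private int reserved = 0;

    /** Returns true if the reservation was applied, false if it was a redelivery. */
    synchronized boolean onOrderPlaced(String eventId, int quantity) {
        if (!processedEventIds.add(eventId)) {
            return false; // duplicate delivery: acknowledge and skip
        }
        reserved += quantity;
        return true;
    }

    synchronized int reserved() {
        return reserved;
    }
}
```

Redelivering the same event leaves the reserved quantity unchanged, which is what makes at-least-once delivery safe for this consumer.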

Apache Kafka — the distributed commit log

Kafka stores records in append-only partitions, replicates for durability, and lets consumer groups track offsets—making it the default event backbone at scale.

Core concepts

  • Topic — named stream split into partitions for parallelism.
  • Partition — ordered log; key hash routes related events to same partition for ordering per key.
  • Producer — sends records with acks config (0, 1, all) trading speed vs durability.
  • Consumer group — each partition assigned to one consumer in group; scale consumers ≤ partitions.
  • Offset — position in log; committed to __consumer_offsets or external store.
  • Log compaction — retains latest record per key—useful for changelog topics (KTables).
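Key-based routing from the list above reduces to "hash the key, take it modulo the partition count". The sketch uses String.hashCode as a stand-in, which is an assumption for readability: Kafka's default partitioner hashes the serialized key bytes with murmur2, so real partition numbers will differ.

```java
/** Illustrative key-to-partition mapping; not Kafka's actual hash function. */
final class KeyPartitioner {
    /** Same key and same partition count always yield the same partition. */
    static int partitionFor(String key, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        return Math.floorMod(key.hashCode(), partitionCount);
    }
}
```

Every record keyed by the same order ID lands on one partition, so one consumer sees that order's events in sequence. The mapping depends on the partition count: with this stand-in hash, key "k" maps to partition 5 of 6 but partition 3 of 8, which is why adding partitions later can split a key's history across two partitions.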

Delivery semantics

  • At-most-once: commit the offset before processing; a crash can lose messages.
  • At-least-once: process, then commit; duplicates are possible without idempotency.
  • Exactly-once (EOS): transactional producer + idempotent producer + read-process-write with Kafka Streams or careful Spring Kafka config; still requires idempotent side effects in external databases.
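The difference between the first two semantics is only the order of "commit" and "process" around a crash. The simulation below is a hypothetical single-threaded model (no Kafka client involved) that crashes once at a chosen offset and then restarts from the last committed offset.

```java
import java.util.ArrayList;
import java.util.List;

/** Simulates one crash during consumption under two commit orderings. */
final class DeliverySimulation {
    /**
     * commitBeforeProcessing = true models at-most-once; false models at-least-once.
     * The consumer crashes once when it reaches crashAtOffset (use -1 for no crash).
     * Returns the records whose side effect actually ran, in order.
     */
    static List<String> run(List<String> log, boolean commitBeforeProcessing, int crashAtOffset) {
        List<String> processed = new ArrayList<>();
        int committed = 0;
        boolean crashed = false;
        while (committed < log.size()) {
            int offset = committed; // (re)start from the last committed offset
            if (commitBeforeProcessing) {
                committed = offset + 1; // commit first
                if (!crashed && offset == crashAtOffset) {
                    crashed = true; // crash before the side effect: record is lost
                    continue;
                }
                processed.add(log.get(offset));
            } else {
                processed.add(log.get(offset)); // side effect first
                if (!crashed && offset == crashAtOffset) {
                    crashed = true; // crash before the commit: record is redelivered
                    continue;
                }
                committed = offset + 1;
            }
        }
        return processed;
    }
}
```

For a log of a, b, c with a crash at offset 1, committing first processes a and c (b is lost), while processing first handles a, b, b, c (b is duplicated).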

Spring Kafka producer
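@Service
public class OrderEventPublisher {
    private static final Logger log = LoggerFactory.getLogger(OrderEventPublisher.class);
    private final KafkaTemplate<String, OrderPlacedEvent> kafka;

    public OrderEventPublisher(KafkaTemplate<String, OrderPlacedEvent> kafka) {
        this.kafka = kafka;
    }

    public void publish(OrderPlacedEvent event) {
        // Key by orderId so all events for one order land on the same partition.
        kafka.send("order.events.v1", event.orderId(), event)
            .whenComplete((result, ex) -> {
                if (ex != null) log.error("publish failed", ex);
            });
    }
}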
⚠️ Pitfall

Creating more consumers than partitions—they sit idle. Plan partition count for peak parallel throughput upfront; increasing partitions later can break key ordering assumptions.
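The idle-consumer effect falls straight out of the assignment rule that each partition has exactly one owner within a group. The sketch uses a simplified round-robin assignment under the hypothetical name GroupAssignment; Kafka's real assignors (range, round-robin, cooperative sticky) distribute differently but obey the same constraint.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified group assignment: each partition goes to exactly one consumer. */
final class GroupAssignment {
    static Map<String, List<Integer>> assign(int partitionCount, List<String> consumers) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("at least one consumer is required");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            // Partition p is owned by one consumer; extra consumers get nothing.
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }
}
```

With 3 partitions and 5 consumers, two consumers receive an empty assignment and do no work; with 6 partitions and 2 consumers, each owns three partitions.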

RabbitMQ — flexible routing, classic message broker

RabbitMQ excels at work queues, routing flexibility, and per-message TTL—often chosen when Kafka’s log retention model is heavier than needed.

Producers publish to exchanges; bindings route to queues consumers read from. Exchange types: direct (routing key match), topic (pattern keys), fanout (broadcast), headers (metadata match). Dead-letter exchanges (DLX) capture poison messages after N failures or TTL expiry—essential for ops visibility.
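Topic-exchange matching is the least obvious of the four types: routing keys are dot-separated words, `*` in a binding matches exactly one word, and `#` matches zero or more. A small matcher, written here as a hypothetical TopicPattern helper rather than anything from the RabbitMQ client, makes the rule concrete.

```java
/** Matches a RabbitMQ-style topic binding pattern against a routing key. */
final class TopicPattern {
    static boolean matches(String bindingPattern, String routingKey) {
        String[] pattern = bindingPattern.split("\\.");
        String[] key = routingKey.isEmpty() ? new String[0] : routingKey.split("\\.");
        return match(pattern, 0, key, 0);
    }

    private static boolean match(String[] p, int pi, String[] k, int ki) {
        if (pi == p.length) {
            return ki == k.length; // pattern exhausted: key must be exhausted too
        }
        if (p[pi].equals("#")) {
            // '#' may absorb zero or more words: try every possible split point.
            for (int next = ki; next <= k.length; next++) {
                if (match(p, pi + 1, k, next)) return true;
            }
            return false;
        }
        if (ki == k.length) {
            return false; // pattern still needs a word but the key has none left
        }
        if (p[pi].equals("*") || p[pi].equals(k[ki])) {
            return match(p, pi + 1, k, ki + 1);
        }
        return false;
    }
}
```

A binding of order.*.created receives order.eu.created but not order.created, while order.# receives both, as well as the bare key order.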

Use RabbitMQ when you need classic competing consumers on a queue, per-message acks, and complex routing without retaining history indefinitely. Use Kafka when multiple teams need to replay history, when you run stream processing, or when the workload is a high-throughput event-sourcing backbone.

Spring AMQP listener
@RabbitListener(queues = "inventory.reserve")
public void onReserve(ReserveInventoryCommand cmd) {
    inventoryService.reserve(cmd.orderId(), cmd.lines());
}

Spring Cloud Stream — broker abstraction

Spring Cloud Stream wraps Kafka and RabbitMQ behind binders so application code focuses on functions and channels, not broker-specific APIs—at the cost of lowest-common-denominator features.

Modern apps use the functional model: define Consumer<T>, Function<T,R>, or Supplier<T> beans; binders map them to destinations via spring.cloud.stream.bindings. Legacy @EnableBinding is deprecated—migrate to functions for Spring Cloud 2020+.

Functional consumer + config
@Bean
Consumer<OrderPlacedEvent> orderPlacedHandler(InventoryService inventory) {
    return event -> inventory.reserveForOrder(event);
}
application.yml
spring:
  cloud:
    stream:
      bindings:
        orderPlacedHandler-in-0:
          destination: order.events.v1
          group: inventory-service
      kafka:
        binder:
          brokers: kafka:9092
      default:
        consumer:
          max-attempts: 3

Configure DLQ destinations for failed messages; consumer groups isolate scaling per service name. Test with Testcontainers Kafka/Rabbit in CI to catch binding typos before deploy.
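What max-attempts followed by a dead-letter destination means for a single message can be sketched without any binder. RetryingDispatcher is a hypothetical stand-in: it retries the handler up to the configured number of attempts and parks the message in an in-memory dead-letter list once they are exhausted. Back-off between attempts is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Retries a handler up to maxAttempts, then moves the message to a dead-letter list. */
final class RetryingDispatcher<T> {
    private final int maxAttempts;
    private final Consumer<T> handler;
    private final List<T> deadLetters = new ArrayList<>();

    RetryingDispatcher(int maxAttempts, Consumer<T> handler) {
        this.maxAttempts = maxAttempts;
        this.handler = handler;
    }

    /** Returns the number of attempts made for this message. */
    int dispatch(T message) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(message);
                return attempt; // handled successfully
            } catch (RuntimeException e) {
                // swallow and retry; a real consumer would log and back off here
            }
        }
        deadLetters.add(message); // attempts exhausted: park it for inspection
        return maxAttempts;
    }

    List<T> deadLetters() {
        return List.copyOf(deadLetters);
    }
}
```

A healthy message costs one handler call; a poison message costs three and then stops blocking the messages behind it.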

Outbox pattern — reliable publishing with DB transactions

The dual-write problem: saving business data and publishing to Kafka must appear atomic. The outbox writes the event in the same database transaction as the aggregate, then a relay publishes asynchronously.

Without outbox, this sequence fails often: (1) commit order to PostgreSQL, (2) publish OrderPlaced to Kafka. If step 2 fails, other services never see the order; if you publish first and DB rolls back, consumers act on ghost orders. The fix: insert into an outbox table in the same transaction as the order row; a separate process reads outbox rows and publishes, marking them sent.

Relay options

  • Polling publisher — simple scheduled job; slight latency; watch for hot rows and use SKIP LOCKED.
  • Debezium CDC — reads database WAL, streams outbox inserts to Kafka with minimal app code—popular in Spring + PostgreSQL stacks.
sequenceDiagram
  autonumber
  participant OS as Order Service
  participant PG as PostgreSQL
  participant RL as Outbox Relay
  participant KA as Kafka

  rect rgb(30, 40, 70)
    Note over OS,PG: Single ACID transaction
    OS->>+PG: BEGIN
    OS->>PG: INSERT INTO orders
    OS->>PG: INSERT INTO outbox
    PG-->>-OS: COMMIT
  end

  alt Polling relay
    RL->>PG: SELECT unpublished rows FOR UPDATE SKIP LOCKED
    RL->>KA: produce OrderPlaced
    RL->>PG: UPDATE outbox SET published = true
  else Debezium CDC
    RL->>PG: read WAL via replication slot
    RL->>KA: stream outbox insert as change event
  end
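
The polling branch of the diagram can be modelled in memory to show the two invariants that matter: the order and its outbox row are written together, and a row is marked published only after the broker accepted it. OutboxDemo is a hypothetical simulation in which a synchronized method stands in for the database transaction and a Predicate stands in for the Kafka producer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** In-memory model of an outbox table with a polling relay. */
final class OutboxDemo {
    static final class OutboxRow {
        final String payload;
        boolean published = false;
        OutboxRow(String payload) { this.payload = payload; }
    }

    private final List<String> orders = new ArrayList<>();
    private final List<OutboxRow> outbox = new ArrayList<>();

    /** Stands in for one database transaction: both inserts happen together. */
    synchronized void placeOrder(String orderId) {
        orders.add(orderId);
        outbox.add(new OutboxRow("OrderPlaced:" + orderId));
    }

    /** One relay pass: publish pending rows in order, mark each only after success. */
    synchronized int relayOnce(Predicate<String> publish) {
        int sent = 0;
        for (OutboxRow row : outbox) {
            if (row.published) continue;
            if (!publish.test(row.payload)) break; // broker unavailable: retry next pass
            row.published = true;
            sent++;
        }
        return sent;
    }

    synchronized long pending() {
        return outbox.stream().filter(r -> !r.published).count();
    }
}
```

If the relay crashed after publishing but before marking the row, the next pass would publish it again, so this pattern gives at-least-once delivery and still relies on idempotent consumers.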

Full implementation details, saga pairing, and idempotency keys live in Data Management Patterns. Communication layer rule: never call kafkaTemplate.send directly from domain code without outbox or transactional messaging unless you accept gaps.

💡 Pro Tip

Use the same event envelope (schema ID, correlationId, occurredAt) in outbox rows as on the wire—relay becomes dumb plumbing, schema registry stays authoritative.