Service Design & Decomposition

Domain-Driven Design as an architecture tool

Domain-Driven Design (DDD) is not a synonym for “use entities in JPA.” It is a collaboration method between domain experts and engineers to discover a model that matches how the business actually works—and then protect that model with explicit boundaries.

In a monolith, inconsistent models hide inside packages: the same English word means different things in billing versus fulfillment, and the database forces a single table shape anyway. In microservices, those contradictions become network contracts. Shipping two incompatible definitions of Customer across services produces subtle production bugs: duplicate charges, wrong shipping labels, and “fixes” that patch symptoms in one service while breaking another.

DDD gives you vocabulary and diagrams to make those splits deliberate. Strategic design (bounded contexts, context maps, subdomains) answers where to draw service boundaries. Tactical design (aggregates, entities, value objects, domain events) answers how to implement consistency inside a service so you do not recreate distributed transactions by accident.

⚖️ Trade-off

Full DDD modeling workshops are expensive. The pragmatic approach for most teams: run event storming or domain storytelling for the core revenue path, define bounded contexts for that path first, and apply tactical patterns only where invariants are genuinely hard—not on every CRUD admin screen.

Ubiquitous language — why naming is architecture

The words your product managers, support agents, and engineers use in meetings should appear in code, APIs, metrics, and runbooks—unchanged.

Ubiquitous language means the team shares one precise vocabulary for the domain. When someone says “order is placed,” everyone knows that is a business moment with rules (payment authorized, inventory reserved)—not merely a row inserted with status PLACED because the UI button fired. If developers rename that concept to TransactionRecord “for generality,” you have already lost traceability from incident to code.

Good names reduce cross-service coupling. A field called shipmentId in the Order service and fulfillmentReference in Logistics looks harmless until you need distributed tracing and support tools; operators cannot search one ID across systems. Align names at context boundaries via published language or translation tables in an anti-corruption layer—not by hoping people read wiki pages.

Practices that stick

Glossary in repo — docs/domain-glossary.md linked from README; PRs that introduce new nouns must update it.
API review against glossary — OpenAPI path and schema names reviewed by someone who talks to customers weekly.
Reject ambiguous verbs — process, handle, manage hide behavior; prefer authorizePayment, cancelSubscription.
Events name facts — past tense: PaymentAuthorized, not DoPayment.
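A minimal sketch of how these practices can show up in code. The class uses an explicit business verb (authorize) instead of process or handle, and it returns a past-tense fact (PaymentAuthorized) instead of a DoPayment-style command. Payment, PaymentId and their fields are hypothetical glossary names chosen for illustration; they do not come from any library.

```java
import java.time.Instant;
import java.util.Objects;
import java.util.UUID;

// Hypothetical names taken from a domain glossary, not from database columns.
record PaymentId(UUID value) {}

// Events name facts in the past tense.
record PaymentAuthorized(PaymentId paymentId, String authorizationCode, Instant occurredAt) {}

final class Payment {
    enum Status { PENDING, AUTHORIZED }

    private final PaymentId id;
    private Status status = Status.PENDING;

    Payment(PaymentId id) {
        this.id = Objects.requireNonNull(id, "id");
    }

    // An explicit business verb instead of process() or handle().
    PaymentAuthorized authorize(String authorizationCode, Instant now) {
        if (status != Status.PENDING) {
            throw new IllegalStateException("Payment " + id.value() + " is already " + status);
        }
        status = Status.AUTHORIZED;
        return new PaymentAuthorized(id, authorizationCode, now);
    }

    Status status() {
        return status;
    }
}
```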

⚠️ Pitfall

Letting database column names from a 1998 schema dictate service APIs. Legacy names (cust_no) belong behind an ACL; public contracts should speak the business language of today.

🎯 Interview Tip

When asked to design Uber Eats or Amazon checkout, spend the first five minutes defining nouns (Order, Cart, Merchant, Delivery) and who owns each. Interviewers reward precise language more than a fast sketch crowded with boxes and arrows.

Bounded context — the most important DDD idea for microservices

A bounded context is the boundary within which a particular domain model is defined and applicable. Outside that boundary, the same word may mean something else entirely.

Think of Amazon: Recommendation context treats Product as features and embeddings; Warehouse context treats Product as SKU, dimensions, and hazmat flags; Checkout treats Product as a price snapshot and tax category at purchase time. None of these models is “wrong.” Problems appear when one service tries to be all three—usually via a shared “Product” database every team mutates.

A microservice should align to one bounded context (or a cohesive subset). That gives the team autonomy to rename internal types, choose storage, and deploy without negotiating schema migrations across the company. Integration happens through published APIs or events, not through shared tables.
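A minimal sketch of the Amazon example above, placed in one Java file for brevity. Each context is a hypothetical holder class with its own Product record. The only thing the contexts share is the Sku identifier they exchange through APIs or events. In a real system these types would live in separate services, or at least in separate modules.

```java
import java.math.BigDecimal;
import java.util.List;

// Shared only as an identifier; neither context imports the other's Product.
record Sku(String value) {}

// Checkout context: a Product is a price snapshot and tax category at purchase time.
final class Checkout {
    record Product(Sku sku, BigDecimal unitPrice, String taxCategory) {}

    static BigDecimal subtotal(List<Product> items) {
        return items.stream().map(Product::unitPrice).reduce(BigDecimal.ZERO, BigDecimal::add);
    }
}

// Warehouse context: the same word means physical attributes and handling rules.
final class Warehouse {
    record Product(Sku sku, int weightGrams, boolean hazmat) {}

    static boolean needsSpecialHandling(Product p) {
        return p.hazmat() || p.weightGrams() > 30_000; // weight threshold is an illustrative assumption
    }
}
```

Each team can rename or extend its own Product without a cross-company schema negotiation. Only the Sku contract has to stay stable.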

flowchart TB
  subgraph checkout [Checkout Context]
    Cart[Cart Aggregate]
    Pricing[Pricing Policy]
  end
  subgraph inventory [Inventory Context]
    Stock[Stock Aggregate]
    Reservation[Reservation]
  end
  subgraph shipping [Shipping Context]
    Label[Shipment Label]
  end
  Cart -->|OrderPlaced event| Stock
  Stock -->|InventoryReserved| Cart
  Cart -->|ReadyToShip| Label

How to discover contexts

Event storming — walk the business timeline on sticky notes; cluster events that share rules and data.
Organizational cues — different departments arguing about definitions often signals different contexts (not an excuse to copy bad politics into code, but a hint).
Change cadence — rules that change together belong together; tax logic rarely changes with loyalty point campaigns.
Consistency needs — invariants that must be instant and atomic define aggregate boundaries inside a context, not necessarily one giant service.

📦 Real World

Netflix’s early microservice split followed failure domains and team ownership (catalog, billing, streaming playback). Language diverged intentionally across contexts: “title” in discovery is not the same object as “asset” in encoding pipelines.

Context map — relationships between bounded contexts

Drawing services without drawing relationships is how platforms accumulate hidden master databases and surprise downtime chains.

A context map is a strategic diagram: nodes are bounded contexts; edges are integration relationships with explicit power dynamics and translation needs. It is the architecture document executives understand and engineers can implement.

Pattern	Meaning	When to use
Partnership	Two teams succeed or fail together; coordinate releases closely.	Early startup domains, two contexts with no stable upstream yet.
Shared Kernel	Small shared model/library both teams mutate—high coordination cost.	Tiny common types (money value object); avoid large shared entities.
Customer–Supplier	Upstream (supplier) defines API; downstream (customer) adapts.	Platform team exposing billing API to product teams.
Conformist	Downstream accepts upstream model as-is—no translation layer.	Integrating SaaS you cannot influence (Stripe objects in your code).
Anti-Corruption Layer (ACL)	Downstream translates foreign model into its own ubiquitous language.	Legacy monolith APIs, third-party XML, different DDD contexts.
Open Host Service	Upstream publishes a protocol-friendly API for many consumers.	Public REST/gRPC with versioning policy and SLA.
Published Language	Well-known exchange format (JSON schema, protobuf) as contract.	Company-wide event envelope, industry standards (ISO codes).
Separate Ways	No integration—duplicate data intentionally cheaper than coupling.	Reporting replica, offline analytics extract.
Big Ball of Mud	Legacy system with no clear model—contain with ACL, do not extend.	Mainframe, 15-year ERP—strangle rather than “finish the model.”

🚫 Anti-Pattern

Labeling every integration “partnership” to avoid deciding who owns the API. Undefined upstream teams become bottlenecks; every downstream team waits for “joint release trains” that never stabilize.

Aggregates — consistency boundaries inside a service

An aggregate is a cluster of domain objects treated as one unit for data changes. The aggregate root is the only entry point for mutations.

Distributed systems cannot cheaply enforce invariants across arbitrary object graphs. Inside one service, you still need a rule: which objects must stay consistent in a single transaction. That cluster is the aggregate. The root entity (e.g., Order) exposes methods that enforce rules; internal entities (e.g., OrderLine) are not updated directly by repositories from outside.

Reference other aggregates by ID only—not by loading foreign object graphs. If Order needs customer credit status, call the Customer service or read a denormalized snapshot updated by events—not a JPA @ManyToOne to another service’s tables.
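A minimal sketch of reference-by-ID. It assumes Order keeps only a CustomerId and checks a local credit snapshot that is fed by Customer events. CustomerId, CustomerCreditChanged and CreditSnapshots are hypothetical names. In production the snapshot would more likely be a table updated by an event consumer than an in-memory map.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

record CustomerId(UUID value) {}

// Event published by the Customer context (hypothetical).
record CustomerCreditChanged(CustomerId customerId, boolean onCreditHold) {}

// Denormalized read model owned by the Order context, updated from Customer events.
final class CreditSnapshots {
    private final Map<CustomerId, Boolean> onHold = new ConcurrentHashMap<>();

    void on(CustomerCreditChanged event) {
        onHold.put(event.customerId(), event.onCreditHold());
    }

    // Unknown customers count as "not on hold" here; a stricter policy could reject them.
    boolean isOnHold(CustomerId id) {
        return onHold.getOrDefault(id, false);
    }
}

final class Order {
    private final CustomerId customerId; // reference by ID, never a Customer object graph
    private boolean placed;

    Order(CustomerId customerId) {
        this.customerId = customerId;
    }

    void place(CreditSnapshots credit) {
        if (credit.isOnHold(customerId)) {
            throw new IllegalStateException("Customer " + customerId.value() + " is on credit hold");
        }
        placed = true; // a full aggregate would also emit OrderPlaced here
    }

    boolean isPlaced() {
        return placed;
    }
}
```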

Design rules

Small aggregates — large roots serialize contention; prefer multiple roots linked by events.
One transaction per aggregate — multi-root ACID in one service is a smell; consider saga across services instead.
Eventually consistent elsewhere — outside the boundary, accept delay; design compensations.

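import java.time.Clock;
import java.util.ArrayList;
import java.util.List;

// OrderId, OrderLine, ProductRef, Money, OrderStatus, and OrderPlacedEvent are domain types defined elsewhere.
public class Order { // aggregate root
    private final OrderId id;
    private final List<OrderLine> lines = new ArrayList<>(); // internal entities, changed only through the root
    private OrderStatus status = OrderStatus.DRAFT;

    public Order(OrderId id) {
        this.id = id;
    }

    public void addLine(ProductRef product, int qty, Money unitPrice) {
        if (status != OrderStatus.DRAFT) {
            throw new IllegalStateException("Cannot modify placed order");
        }
        if (qty <= 0) throw new IllegalArgumentException("Quantity must be positive");
        lines.add(new OrderLine(product, qty, unitPrice));
    }

    public OrderPlacedEvent place(Clock clock) {
        if (status != OrderStatus.DRAFT) throw new IllegalStateException("Order already placed");
        if (lines.isEmpty()) throw new IllegalStateException("Empty order");
        status = OrderStatus.PLACED;
        return new OrderPlacedEvent(id, clock.instant());
    }
}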

🔬 Under the Hood

Spring Data JPA makes it easy to expose every entity as a repository. Architectural discipline means persisting only through the root (orderRepository.save(order)) and never injecting an OrderLineRepository into application services.

Entity vs value object

Entities have identity that persists through attribute changes; value objects are defined entirely by their attributes and are often immutable.

Entity: Customer#123 can change address, email, and tier; you still mean the same customer. Tracking identity matters for lifecycles and legal obligations. Value object: Money(USD, 19.99), EmailAddress, GeoCoordinate—replace the whole value when it changes; equality is attribute-based.

Value objects push validation inward: an EmailAddress constructor rejects invalid strings so controllers stay thin. Records (Java 16+) are excellent for values; entities stay classes with controlled mutation through the aggregate root.

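import java.math.BigDecimal;
import java.util.Objects;
import java.util.Set;

public record Money(String currency, BigDecimal amount) {
    private static final Set<String> SUPPORTED = Set.of("USD", "EUR", "GBP");

    public Money {
        Objects.requireNonNull(currency, "currency");
        Objects.requireNonNull(amount, "amount");
        if (!SUPPORTED.contains(currency)) throw new IllegalArgumentException("Unsupported currency: " + currency);
        if (amount.scale() > 2) throw new IllegalArgumentException("Too many decimal places");
        amount = amount.setScale(2); // normalize so 19.9 and 19.90 are equal values
    }
}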
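The EmailAddress example from the paragraph above can use the same compact-constructor technique. This is a minimal sketch. The regular expression is a deliberately simple assumption (one "@" and a dot in the domain), not a full RFC 5322 validator. Lowercasing the whole address is also a simplifying policy choice, because the local part is technically case-sensitive.

```java
import java.util.Locale;
import java.util.Objects;
import java.util.regex.Pattern;

record EmailAddress(String value) {
    // Simplified shape check: non-empty local part, "@", and a domain containing a dot.
    private static final Pattern SHAPE = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    EmailAddress {
        Objects.requireNonNull(value, "value");
        value = value.trim().toLowerCase(Locale.ROOT); // normalize so equality is attribute-based
        if (!SHAPE.matcher(value).matches()) {
            throw new IllegalArgumentException("Invalid email address");
        }
    }
}
```

Controllers can accept a raw string and construct an EmailAddress once. Every later layer can then assume the value is already valid.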

💡 Pro Tip

Do not pass business identifiers around as bare Long values. Wrapper types (OrderId, CustomerId) let the compiler reject swapped arguments in method calls, a common source of subtle but costly bugs.
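A minimal sketch of wrapper ID types, assuming UUID-backed identifiers. OrderId and CustomerId are distinct types. Passing them in the wrong order to the hypothetical cancelOrder method is therefore a compile error rather than a production incident.

```java
import java.util.Objects;
import java.util.UUID;

record OrderId(UUID value) {
    OrderId {
        Objects.requireNonNull(value, "value");
    }

    static OrderId newId() {
        return new OrderId(UUID.randomUUID());
    }
}

record CustomerId(UUID value) {
    CustomerId {
        Objects.requireNonNull(value, "value");
    }
}

record OrderCancelled(OrderId orderId, CustomerId cancelledBy) {}

final class OrderCommands {
    // With (UUID, UUID) parameters, swapped arguments would compile silently.
    // Here cancelOrder(customerId, orderId) fails to compile with incompatible types.
    static OrderCancelled cancelOrder(OrderId orderId, CustomerId requestedBy) {
        return new OrderCancelled(orderId, requestedBy);
    }
}
```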

Domain events — the API between contexts

A domain event records something that already happened in the business, in past tense, with just enough data for downstream reactions.

Synchronous REST coupling between every service creates availability chains: if Recommendations is down, Checkout cannot finish. Events let contexts react asynchronously while keeping models independent. The Order context publishes OrderPlaced; Inventory reserves stock; Analytics updates dashboards—without Order importing their types.

Events are not commands. ReserveInventory tells another team what to do and forces them to accept your language. OrderPlaced states a fact; Inventory decides how to react. Publishing facts rather than commands across the integration boundary reduces coupling and makes versioning easier: new fields are optional, and old consumers ignore fields and event types they do not recognize.
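A minimal sketch of a tolerant reader on the consumer side. It assumes events arrive as a simple key-value envelope; a Map stands in for a parsed JSON payload so the example has no dependencies. The hypothetical Inventory consumer acts on the one fact it understands. It skips event types it has never seen and never reads fields it does not need, so the producer can add both without breaking it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class InventoryEventConsumer {
    private final List<String> reservedOrders = new ArrayList<>();

    /** Returns true if the event was acted on, false if it was tolerated and skipped. */
    boolean onEvent(Map<String, String> event) {
        String type = event.get("type");
        if (!"OrderPlaced".equals(type)) {
            return false; // unknown or irrelevant event types are ignored, not rejected
        }
        // Read only the fields this context needs; extra fields are never looked at.
        String orderId = event.get("orderId");
        if (orderId == null) {
            throw new IllegalArgumentException("OrderPlaced without orderId");
        }
        reservedOrders.add(orderId); // stand-in for "reserve stock for this order"
        return true;
    }

    List<String> reservedOrders() {
        return List.copyOf(reservedOrders);
    }
}
```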

Implementation checklist:

Publish after commit (see transactional outbox in Data Patterns).
Version event schemas; upcast on read for event sourcing, or use tolerant readers in Kafka consumers.
Include correlation ID and causation ID for tracing.
Keep payloads lean—IDs and snapshots, not full foreign aggregates.
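A minimal sketch of an event envelope that carries the tracing fields from the checklist. The field names (eventId, correlationId, causationId, schemaVersion) and the first/followUp helpers are assumptions for illustration, not a standard. The rule they encode: a follow-up event keeps the original correlation ID and records the triggering event's ID as its causation ID.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

record EventEnvelope(
        UUID eventId,
        String type,            // past tense, e.g. "OrderPlaced"
        int schemaVersion,      // bumped on breaking payload changes
        UUID correlationId,     // shared by every event in one business flow
        UUID causationId,       // eventId of the triggering event; null for the first event
        Instant occurredAt,
        Map<String, String> payload) { // lean: IDs and snapshots only

    EventEnvelope {
        Objects.requireNonNull(eventId, "eventId");
        Objects.requireNonNull(type, "type");
        Objects.requireNonNull(correlationId, "correlationId");
        Objects.requireNonNull(occurredAt, "occurredAt");
        payload = Map.copyOf(payload);
    }

    /** First event of a flow: starts a new correlation and has no cause. */
    static EventEnvelope first(String type, int version, Map<String, String> payload, Instant now) {
        UUID id = UUID.randomUUID();
        return new EventEnvelope(id, type, version, id, null, now, payload);
    }

    /** Event emitted in reaction to this one: same correlation, this event as its cause. */
    EventEnvelope followUp(String type, int version, Map<String, String> payload, Instant now) {
        return new EventEnvelope(UUID.randomUUID(), type, version, correlationId, eventId, now, payload);
    }
}
```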

⚖️ Trade-off

Events add operational complexity (Kafka lag, poison messages, replay). Use them when temporal decoupling or multiple subscribers justify the cost—not to avoid designing a clear synchronous API where strong consistency is required.

Service decomposition strategies

How you slice services determines your deployment graph, data ownership, and incident blast radius for the next five years.

Decompose by business capability (preferred)

A business capability is what the organization does to generate value: “manage shopping cart,” “calculate tax,” “onboard seller.” Capabilities are relatively stable even when technology churns. Services aligned to capabilities change for business reasons, not because a framework upgraded. Amazon’s teams around fulfillment, payments, and catalog mirror this—technology choices differ per capability, but the boundary language stays consistent.

Decompose by subdomain (DDD)

DDD splits the problem into core (competitive advantage), supporting (necessary but not unique), and generic (buy or outsource: auth, email). Invest modeling effort in core subdomains; keep supporting contexts thin; use SaaS for generic ones. Microservices in generic areas (building your own CRM) are often waste.

Decompose by verb / use case (usually an anti-pattern)

Splitting CreateUserService, UpdateUserService, and DeleteUserService creates chatty orchestration, shared data, and no cohesive ownership. CRUD verbs are implementation details inside a capability-owned service, not boundary lines.

Strategy	Strength	Risk
Business capability	Stable boundaries; maps to team ownership and OKRs.	Requires domain research; wrong capability map is hard to undo.
Subdomain	Focuses engineering time on core complexity.	Abstract without event storming—teams may disagree on “core.”
Verb / layer	Fast to sketch on a whiteboard.	Distributed monolith; data coupling; no clear owner.

Strangler fig pattern — migrating without big-bang rewrite

Gradually replace functionality of a legacy system by intercepting traffic, routing slices to new services, and shrinking the old surface until it can be retired.

Netflix did not rewrite their DVD monolith in a weekend. New features and high-churn domains moved behind an edge proxy; stable legacy paths stayed put until risk justified migration. The strangler pattern pairs naturally with an API gateway or reverse proxy that routes by path, header, or percentage canary.

Typical steps

Introduce facade in front of monolith—no behavior change, observe traffic.
Implement new service for one capability; route new clients or feature-flagged users first.
Dual-write or sync data until new store is trusted; compare outputs (shadow traffic).
Shift read traffic, then write traffic; monitor error budgets per route.
Delete dead code paths in monolith when usage metrics hit zero for a sustained window.

sequenceDiagram
  participant Client
  participant Gateway
  participant NewSvc as New Order Service
  participant Legacy as Legacy Monolith
  Client->>Gateway: POST /orders
  alt feature flag new flow
    Gateway->>NewSvc: create order
    NewSvc-->>Gateway: 201 Created
  else legacy
    Gateway->>Legacy: create order
    Legacy-->>Gateway: 201 Created
  end
  Gateway-->>Client: response
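The routing decision in the diagram can be sketched as a small, deterministic function. This is a minimal sketch under stated assumptions:

- A per-user feature flag always selects the new flow.
- Otherwise, a stable hash of the user ID places a configurable percentage of users on the new service.
- Because the hash is stable, the same user keeps landing on the same side while the percentage is raised.

StranglerRouter and its parameters are hypothetical. Real gateways usually express this as routing configuration rather than application code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.zip.CRC32;

final class StranglerRouter {
    enum Target { NEW_ORDER_SERVICE, LEGACY_MONOLITH }

    private final int newFlowPercent;       // 0..100, raised as per-route error budgets allow
    private final Set<String> flaggedUsers; // users explicitly opted into the new flow

    StranglerRouter(int newFlowPercent, Set<String> flaggedUsers) {
        if (newFlowPercent < 0 || newFlowPercent > 100) {
            throw new IllegalArgumentException("percent must be 0..100");
        }
        this.newFlowPercent = newFlowPercent;
        this.flaggedUsers = Set.copyOf(flaggedUsers);
    }

    Target route(String userId) {
        if (flaggedUsers.contains(userId)) {
            return Target.NEW_ORDER_SERVICE;
        }
        // CRC32 gives the same bucket (0..99) for a given user ID on every gateway instance.
        CRC32 crc = new CRC32();
        crc.update(userId.getBytes(StandardCharsets.UTF_8));
        int bucket = (int) (crc.getValue() % 100);
        return bucket < newFlowPercent ? Target.NEW_ORDER_SERVICE : Target.LEGACY_MONOLITH;
    }
}
```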

📦 Real World

Uber’s macro architecture evolution used routing layers to move trip matching, pricing, and payments off a single API over years. Metrics on per-route error rate decided when to increase traffic percentage—not calendar deadlines alone.

Anti-corruption layer (ACL)

A translation layer that converts an external or legacy model into your bounded context’s model so foreign concepts do not leak inward.

Legacy monoliths expose XML SOAP payloads, inconsistent date formats, and status codes that encode fifteen years of exceptions. If you import those types into domain logic, every new feature inherits the legacy’s accidental complexity. An ACL module sits at the edge: adapters call legacy APIs, mappers produce clean domain objects, and the rest of the service speaks ubiquitous language only.

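import org.springframework.stereotype.Component;

// BillingPort, InvoiceSummary, InvoiceId, InvoiceStatus, Money, and the Legacy* types are illustrative.
@Component
public class LegacyBillingAdapter implements BillingPort {
    private final LegacySoapClient legacy;
    public LegacyBillingAdapter(LegacySoapClient legacy) { // constructor injection
        this.legacy = legacy;
    }

    @Override
    public InvoiceSummary fetchInvoice(InvoiceId id) {
        LegacyInvoiceDto raw = legacy.getInvoice(id.value());
        return new InvoiceSummary(
            new InvoiceId(raw.getInvNo()),
            new Money(raw.getCurr(), raw.getAmt()),
            mapStatus(raw.getStsCode())
        );
    }

    // Legacy status codes never leave the ACL; only 7 → PAID is shown, other codes are elided.
    private InvoiceStatus mapStatus(int legacyCode) {
        return switch (legacyCode) {
            case 7 -> InvoiceStatus.PAID;
            default -> throw new IllegalArgumentException("Unmapped legacy status: " + legacyCode);
        };
    }
}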

ACL is not “one more DTO folder.” It owns retry policy, circuit breaking, and schema drift tests against legacy sandboxes. When the legacy system is retired, you delete the ACL—not refactor domain code that never knew it existed.

⚠️ Pitfall

Skipping ACL because “we’ll only use their API for six months.” Six years later, domain services still branch on legacyStatus == 4.

Service granularity — too fine vs too coarse

The right size is not “micro” but “aligned to business change and team cognition”—usually smaller than a monolith and larger than a single database table.

Too fine-grained

Nanoservices with one table each force orchestration across ten REST calls to complete checkout. Latency tails multiply; partial failures need compensating sagas for operations that used to be one local transaction. Operational overhead explodes: ten repos, ten CI pipelines, ten on-call rotations for what one team could own.

Too coarse-grained

A “CustomerPlatformService” owning profile, billing, support tickets, and marketing preferences is a monolith with extra network hops. You lose independent deployability—the reason you split in the first place.

Two-pizza team rule (heuristic)

Amazon’s guideline: a team should be feedable with two pizzas (~6–10 engineers) and own a service end-to-end (code, deploy, on-call). If twenty teams touch one service for every feature, the boundary is wrong. If one engineer runs fifteen services, you likely over-split.

Signs boundaries are wrong

Lockstep deploys — services always released together; version matrix untested.
Shared database — multiple services read/write same schema; migrations need global change advisory board.
Distributed transactions everywhere — 2PC or sagas for what should be one aggregate.
Cyclic dependencies — A calls B calls C calls A; no clear upstream.
No team owns failures — incidents bounce between squads with overlapping code paths.
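The cyclic-dependency signal can be checked mechanically. This is a minimal sketch. It assumes you can export the service call graph, for example from tracing data, as a map from caller to callees. It then runs a depth-first search that returns the first cycle it finds. DependencyCycles is a hypothetical helper, not a library API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DependencyCycles {
    /** Returns one cycle as a path such as [A, B, C, A], or an empty list if the graph is acyclic. */
    static List<String> findCycle(Map<String, List<String>> calls) {
        Set<String> done = new HashSet<>(); // nodes fully explored with no cycle reachable
        for (String start : calls.keySet()) {
            List<String> cycle = dfs(start, calls, done, new LinkedHashSet<>());
            if (!cycle.isEmpty()) return cycle;
        }
        return List.of();
    }

    private static List<String> dfs(String node, Map<String, List<String>> calls,
                                    Set<String> done, LinkedHashSet<String> onPath) {
        if (done.contains(node)) return List.of();
        if (onPath.contains(node)) {
            // Back edge: the cycle is the part of the current path starting at this node.
            List<String> cycle = new ArrayList<>();
            boolean inCycle = false;
            for (String n : onPath) {
                if (n.equals(node)) inCycle = true;
                if (inCycle) cycle.add(n);
            }
            cycle.add(node);
            return cycle;
        }
        onPath.add(node);
        for (String next : calls.getOrDefault(node, List.of())) {
            List<String> cycle = dfs(next, calls, done, onPath);
            if (!cycle.isEmpty()) return cycle;
        }
        onPath.remove(node); // insertion order keeps onPath equal to the current DFS path
        done.add(node);
        return List.of();
    }
}
```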

🎯 Interview Tip

When asked “how many microservices for X,” answer with boundaries and ownership first, then count services. Mention starting with a modular monolith if the domain is still discovering itself.

REST API design for microservices

REST is still the default inter-service and public API style. Good REST models resources and business workflows—not RPC with verbs in URLs.

Resource naming

Nouns plural, hierarchical where ownership is clear: /customers/{id}/orders. Avoid RPC paths like /createOrder when POST /orders expresses the intent. Keep URLs stable; put volatile behavior in request bodies and headers.

HTTP verbs and idempotency

GET — safe, cacheable reads; never change state.
POST — create or non-idempotent commands; returns 201 with Location when creating.
PUT — replace entire resource; idempotent.
PATCH — partial update; document JSON Merge Patch or JSON Patch format.
DELETE — idempotent removal; 204 or 404 on repeat.
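POST is not idempotent by itself, so a retried create can produce a duplicate order or charge. A common remedy is a client-supplied idempotency key (typically sent in an Idempotency-Key header, a convention rather than something HTTP mandates):

- The first response is stored and replayed for every retry with the same key.
- Reusing the key with a different body is rejected with 409.

This is a minimal in-memory sketch. IdempotentCreateHandler and StoredResponse are hypothetical names. Production code would persist keys with a TTL and compare a hash of the request, not the raw body.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class IdempotentCreateHandler {
    record StoredResponse(int status, String body, String requestBody) {}

    private final ConcurrentHashMap<String, StoredResponse> byKey = new ConcurrentHashMap<>();

    StoredResponse handle(String idempotencyKey, String requestBody, Supplier<String> create) {
        Objects.requireNonNull(idempotencyKey, "idempotency key required");
        // ConcurrentHashMap.computeIfAbsent runs create at most once per key, even for concurrent retries.
        StoredResponse stored = byKey.computeIfAbsent(idempotencyKey,
                key -> new StoredResponse(201, create.get(), requestBody));
        if (!Objects.equals(stored.requestBody(), requestBody)) {
            return new StoredResponse(409, "Idempotency key reused with a different request", requestBody);
        }
        return stored; // first call or a faithful retry: replay the original response
    }
}
```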

Status codes that earn trust

Code	Use
200 / 201 / 204	Success variants with/without body.
400	Client sent invalid syntax or failed validation—do not retry blindly.
401 / 403	Auth missing vs not allowed—distinct for security audits.
404	Resource unknown in this context (not “server hid error”).
409	Conflict with current state (duplicate idempotency key, version mismatch).
422	Semantically invalid (business rule failed) — popular in APIs with rich domains.
429	Rate limited — include Retry-After.
503	Temporary overload — clients may retry with backoff.
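The retry guidance in the table can be encoded as a small client-side policy. This is a minimal sketch under two assumptions:

- Retry-After arrives in its delta-seconds form; the HTTP-date form is not handled.
- When the header is absent, the client uses capped exponential backoff with full jitter.

RetryPolicy and its parameters are hypothetical.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;

final class RetryPolicy {
    private final int maxAttempts;
    private final Duration base;
    private final Duration cap;

    RetryPolicy(int maxAttempts, Duration base, Duration cap) {
        this.maxAttempts = maxAttempts;
        this.base = base;
        this.cap = cap;
    }

    /** Delay before the next attempt, or empty if the client should stop. attempt starts at 1. */
    Optional<Duration> nextDelay(int status, String retryAfterHeader, int attempt) {
        boolean retryable = status == 429 || status == 503;
        if (!retryable || attempt >= maxAttempts) {
            return Optional.empty(); // e.g. 400, 401, 403, 404, 409, 422: retrying will not help
        }
        if (retryAfterHeader != null) {
            try {
                long seconds = Long.parseLong(retryAfterHeader.trim());
                if (seconds >= 0) return Optional.of(Duration.ofSeconds(seconds));
            } catch (NumberFormatException ignored) {
                // HTTP-date form is not handled in this sketch; fall back to backoff.
            }
        }
        // Full jitter: uniform in [0, min(cap, base * 2^(attempt - 1))].
        int shift = Math.max(0, Math.min(attempt - 1, 20));
        long ceiling = Math.min(cap.toMillis(), base.toMillis() * (1L << shift));
        return Optional.of(Duration.ofMillis(ThreadLocalRandom.current().nextLong(ceiling + 1)));
    }
}
```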

HATEOAS — when it helps

Hypermedia links (_links.cancel) shine for long-running workflows and public APIs where clients should not hardcode every state transition. For internal high-throughput service meshes, teams often skip full HATEOAS and rely on versioned OpenAPI plus shared client libraries—acceptable trade if discovery is solved elsewhere.
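A minimal sketch of state-dependent links without any framework. It uses a hypothetical OrderView representation with a _links map. The cancel link appears only while the order can still be cancelled, so clients follow what the server offers instead of hardcoding the state machine. The status values and the /cancellation and /placement link targets are illustrative assumptions. This sketch deliberately does not use the Spring HATEOAS API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class OrderLinks {
    enum Status { DRAFT, PLACED, SHIPPED, CANCELLED }

    record OrderView(String id, Status status, Map<String, String> _links) {}

    static OrderView toView(String id, Status status) {
        String self = "/api/v1/orders/" + id;
        Map<String, String> links = new LinkedHashMap<>();
        links.put("self", self);
        if (status == Status.DRAFT || status == Status.PLACED) {
            links.put("cancel", self + "/cancellation"); // offered only while cancellation is allowed
        }
        if (status == Status.DRAFT) {
            links.put("place", self + "/placement");
        }
        return new OrderView(id, status, Collections.unmodifiableMap(links));
    }
}
```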

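import java.net.URI;
import java.util.UUID;
import jakarta.validation.Valid; // javax.validation.Valid on Spring Boot 2
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/v1/orders")
public class OrderController {
    private final OrderService orderService; // application service (illustrative)

    public OrderController(OrderService orderService) {
        this.orderService = orderService;
    }

    @PostMapping
    public ResponseEntity<OrderResponse> create(@Valid @RequestBody CreateOrderRequest req) {
        Order order = orderService.create(req);
        URI location = URI.create("/api/v1/orders/" + order.id().value()); // raw UUID, not OrderId.toString()
        return ResponseEntity.created(location).body(OrderResponse.from(order));
    }

    @GetMapping("/{id}")
    public OrderResponse get(@PathVariable UUID id) {
        return OrderResponse.from(orderService.get(new OrderId(id)));
    }
}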

API versioning strategies

Breaking changes are inevitable. Versioning policy decides whether consumers break silently or your platform team drowns in support tickets.

Approach	How it works	Trade-offs
URI versioning	/api/v2/orders	Obvious in logs and gateways; proliferates routes; easy for caches to split.
Header versioning	Accept-Version: 2 or custom header	Clean URLs; harder to test in browser; proxies must forward headers.
Content negotiation	Accept: application/vnd.myapp.orders+json;version=2	Standards-based; verbose; tooling support varies.

Semantic versioning for APIs: treat additive changes (new optional fields) as minor, and treat removing fields or changing types as major. Publish deprecation timelines. Where possible, return a Deprecation header along with the Sunset header defined in RFC 8594. Run contract tests in CI so major bumps are conscious decisions, not accidental JSON renames.
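The major/minor rule can be checked mechanically in CI. This is a minimal sketch. It assumes a response schema is flattened to a map from field name to type name; real breaking-change tools diff full OpenAPI documents. The rules are:

- Removing a field or changing its type is MAJOR.
- Adding a field is MINOR, on the assumption that new response fields are optional.
- Anything else is PATCH.

SchemaDiff and Bump are hypothetical names.

```java
import java.util.Map;

final class SchemaDiff {
    enum Bump { PATCH, MINOR, MAJOR }

    /** Compares flattened response schemas (field name -> type name). */
    static Bump classify(Map<String, String> oldFields, Map<String, String> newFields) {
        for (Map.Entry<String, String> field : oldFields.entrySet()) {
            String newType = newFields.get(field.getKey());
            if (newType == null || !newType.equals(field.getValue())) {
                return Bump.MAJOR; // removed or retyped: existing consumers can break
            }
        }
        // Every old field survived unchanged, so any extra field is an additive change.
        return newFields.size() > oldFields.size() ? Bump.MINOR : Bump.PATCH;
    }
}
```

Request schemas need the opposite check: a newly required request field breaks existing callers even though it is an addition.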

💡 Pro Tip

Prefer expand–contract: add v2 fields while v1 clients keep working; migrate consumers; remove v1 only when metrics show zero traffic for weeks.

Contract-first API design (OpenAPI)

Define the API contract before implementation so consumers and providers negotiate once, in YAML, instead of in production at 2 a.m.

Contract-first flow: product and engineering agree on OpenAPI spec → generate server interfaces or client stubs → implement controllers that satisfy the spec → verify with contract tests in CI. Spring projects often use springdoc-openapi for runtime docs, but teams serious about compatibility check in the spec file as source of truth and use OpenAPI Generator for interfaces and DTOs.

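openapi: 3.0.3
info:
  title: Order API
  version: 1.0.0
servers:
  - url: /api/v1   # matches the controller's base path
paths:
  /orders:
    post:
      operationId: createOrder
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateOrderRequest'
      responses:
        '201':
          description: Created
          headers:
            Location:
              schema: { type: string }
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/OrderResponse'
components:
  schemas:   # minimal illustrative schemas; every $ref target must exist for the spec to validate
    CreateOrderRequest:
      type: object
      required: [sku]
      properties:
        sku: { type: string }
    OrderResponse:
      type: object
      required: [id]
      properties:
        id: { type: string, format: uuid }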

Benefits multiply in microservices: mock servers from spec unblock parallel development; API gateways import the same file for validation; breaking-change detectors diff specs between Git commits. The cost is discipline—specs must be updated in the same PR as code, not “later.”

Consumer-driven contract testing (Pact)

Integration tests with Testcontainers are valuable but slow. Pact captures each consumer’s expectations of a provider API and verifies the provider independently.

Traditional provider-led tests (“here is our giant suite”) do not know what fields consumers actually use. Consumer-driven contracts invert that: the Checkout service writes a pact file saying “when I POST /orders with body X, I expect 201 and fields Y.” The Order service CI runs Pact’s provider verification against accumulated pacts from all consumers before deploy.

Workflow

Consumer test uses Pact mock → generates pact JSON published to a broker (Pactflow or self-hosted).
Provider build downloads relevant pacts → starts the application context → verifies interactions.
Canary or staging deploy blocked if verification fails—breaking changes caught pre-prod.

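import au.com.dius.pact.consumer.MockServer;
import au.com.dius.pact.consumer.dsl.PactDslJsonBody;
import au.com.dius.pact.consumer.dsl.PactDslWithProvider;
import au.com.dius.pact.consumer.junit5.PactConsumerTestExt;
import au.com.dius.pact.consumer.junit5.PactTestFor;
import au.com.dius.pact.core.model.RequestResponsePact;
import au.com.dius.pact.core.model.annotations.Pact;
import java.util.UUID;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;

import static org.junit.jupiter.api.Assertions.assertNotNull;

@ExtendWith(PactConsumerTestExt.class)
@PactTestFor(providerName = "order-service", port = "8080")
class OrderClientPactTest {
    @Pact(consumer = "checkout-service")
    public RequestResponsePact createOrderPact(PactDslWithProvider builder) {
        return builder
            .given("catalog is available")
            .uponReceiving("a request to create an order")
            .path("/api/v1/orders")
            .method("POST")
            .body(new PactDslJsonBody().stringValue("sku", "ABC-1"))
            .willRespondWith()
            .status(201)
            .body(new PactDslJsonBody().uuid("id"))
            .toPact();
    }

    @Test
    void createOrder(MockServer mockServer) {
        // OrderClient is the consumer's own HTTP client; here it talks to Pact's mock provider.
        OrderClient client = new OrderClient(mockServer.getUrl());
        UUID id = client.createOrder("ABC-1");
        assertNotNull(id);
    }
}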

⚖️ Trade-off

Pact excels at synchronous HTTP contracts between known services. It does not replace schema registry governance for Kafka events—use Avro/Protobuf compatibility checks there.