Spring Data & JPA

The layer where most production Spring apps spend their complexity budget: mapping object graphs to relational schemas, generating queries from method names, and keeping transactions and lazy loading under control. This chapter goes from JpaRepository to N+1 diagnosis, propagation edge cases, and audit columns.

Levels: junior, mid, senior · Spring Boot 3.x

Spring Data abstractions

Spring Data eliminates boilerplate DAO implementations. You declare an interface; the framework generates the implementation at runtime.

Repository hierarchy

| Interface | Adds |
|---|---|
| Repository<T, ID> | Marker with no methods; enables Spring Data repository detection |
| CrudRepository<T, ID> | save, findById, findAll, deleteById, count |
| PagingAndSortingRepository<T, ID> | findAll(Pageable), findAll(Sort) |
| JpaRepository<T, ID> | JPA-specific: flush, saveAndFlush, deleteAllInBatch (replaces the deprecated deleteInBatch), getReferenceById |

Typical repository
public interface OrderRepository extends JpaRepository<Order, Long> {
  List<Order> findByCustomerIdAndStatus(String customerId, OrderStatus status);
  Optional<Order> findByExternalRef(String externalRef);
}
🔬 Under the Hood

At startup, JpaRepositoryFactoryBean creates a JDK dynamic proxy implementing your interface. Each method routes to SimpleJpaRepository (built-in CRUD) or a QueryMethod parsed from the method name / @Query. No bytecode generation of implementation classes—you get a proxy delegating to shared infrastructure.
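
The proxy mechanism described above can be sketched with nothing but the JDK. This is a toy model, not Spring Data's implementation: `Account`, `AccountRepo`, and `ToyRepositoryFactory` are hypothetical names, and an in-memory list stands in for the database. It only shows how one invocation handler can route CRUD calls and name-based finders without any hand-written implementation class.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical domain type and repository interface for this sketch.
record Account(long id, String email) {}

interface AccountRepo {
  Account save(Account account);
  Optional<Account> findById(long id);
  List<Account> findByEmail(String email);
}

final class ToyRepositoryFactory {
  // Returns a JDK dynamic proxy; no class implementing AccountRepo is ever written.
  static AccountRepo create() {
    List<Account> table = new ArrayList<>();          // stands in for the database
    InvocationHandler handler = (proxy, method, args) -> {
      String name = method.getName();
      if (name.equals("save")) {                      // "built-in CRUD" route
        table.add((Account) args[0]);
        return args[0];
      }
      if (name.equals("findById")) {
        long id = (Long) args[0];
        return table.stream().filter(a -> a.id() == id).findFirst();
      }
      if (name.equals("findByEmail")) {               // "derived query" route
        Object email = args[0];
        return table.stream().filter(a -> a.email().equals(email)).toList();
      }
      throw new UnsupportedOperationException(name);
    };
    return (AccountRepo) Proxy.newProxyInstance(
        AccountRepo.class.getClassLoader(), new Class<?>[] {AccountRepo.class}, handler);
  }
}
```

Calling `ToyRepositoryFactory.create()` yields an object for which `Proxy.isProxyClass(repo.getClass())` is true, which is the same kind of object you see in a debugger when inspecting an injected Spring Data repository.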

💡 Pro Tip

Prefer Optional<T> return types for single-result queries—Spring Data translates empty results to Optional.empty() instead of returning null.

Entity mapping

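JPA maps Java classes to relational tables. Hibernate is the default JPA provider in Spring Boot. Entities must have an identifier and a no-arg constructor. The JPA spec requires that constructor to be public or protected; Hibernate tolerates lower visibility, but protected is the portable choice.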

Basic entity
@Entity
@Table(name = "orders", indexes = @Index(name = "idx_orders_customer", columnList = "customer_id"))
public class Order {
  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;

  @Column(name = "customer_id", nullable = false, length = 64)
  private String customerId;

  @Enumerated(EnumType.STRING)
  @Column(nullable = false, length = 32)
  private OrderStatus status;

  @Column(name = "created_at", nullable = false)
  private Instant createdAt;

  protected Order() {}  // JPA requirement

  public Order(String customerId, OrderStatus status) {
    this.customerId = customerId;
    this.status = status;
    this.createdAt = Instant.now();
  }
}

@GeneratedValue strategies

Choosing the wrong strategy causes performance issues or ID collisions across databases.

| Strategy | How it works | When to use |
|---|---|---|
| IDENTITY | DB auto-increment column (INSERT returns the ID) | PostgreSQL, MySQL, SQL Server; simplest. The INSERT runs at persist() time, so the ID is known immediately |
| SEQUENCE | Separate sequence object; Hibernate can allocate IDs in blocks | Oracle, PostgreSQL; better for bulk inserts; use @SequenceGenerator |
| TABLE | Emulates a sequence via a lock table | Legacy portability; slow; avoid in new code |
| AUTO | Provider picks based on dialect | Dev convenience; be explicit in production |

PostgreSQL sequence
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "order_seq")
@SequenceGenerator(name = "order_seq", sequenceName = "order_id_seq", allocationSize = 50)
private Long id;
⚠️ Pitfall

IDENTITY prevents Hibernate JDBC batching on inserts, because each row's ID is only available after its own INSERT has run. For high-volume ingest, use SEQUENCE with an allocationSize that matches the database sequence's INCREMENT BY, or assign UUIDs in application code.
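
To see why block allocation helps, here is a simplified model of a sequence-backed allocator. It is not Hibernate's optimizer code: `PooledIdAllocator` is a hypothetical class, and the "sequence" is just a counter that advances by `allocationSize` per simulated database call.

```java
// Toy model of block allocation: one simulated sequence call reserves allocationSize IDs.
final class PooledIdAllocator {
  private final int allocationSize;
  private long sequenceValue = 0;   // simulated database sequence, INCREMENT BY allocationSize
  private long next = 1;            // next ID to hand out
  private long blockEnd = 0;        // last ID of the block currently reserved
  private int sequenceCalls = 0;    // simulated round trips to the database

  PooledIdAllocator(int allocationSize) {
    this.allocationSize = allocationSize;
  }

  long nextId() {
    if (next > blockEnd) {          // block exhausted: one simulated "SELECT nextval(...)"
      sequenceValue += allocationSize;
      sequenceCalls++;
      blockEnd = sequenceValue;
      next = blockEnd - allocationSize + 1;
    }
    return next++;
  }

  int sequenceCalls() {
    return sequenceCalls;
  }
}
```

With `allocationSize = 50`, handing out 120 IDs costs three simulated sequence calls instead of 120, and the INSERTs themselves remain free to be batched.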

Field mapping

Control column names, nullability, length, and how Java types persist.

| Annotation | Purpose |
|---|---|
| @Column | Name, nullable, length, unique, columnDefinition |
| @Transient | Not persisted: computed fields, caches on entity (use sparingly) |
| @Enumerated(STRING) | Store enum name: readable, survives enum reorder (preferred) |
| @Enumerated(ORDINAL) | Store 0,1,2: fragile if enum order changes |
| @Lob | Large object (CLOB/BLOB); consider external object storage for big files |
| @Convert | Custom AttributeConverter, e.g. JSON column, encrypted strings |

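
The ORDINAL fragility is easy to reproduce in plain Java. The two enums below are hypothetical versions of the same status type, where the second release inserts a constant in the middle; no JPA is involved, only the values that each strategy would have written.

```java
// Two releases of a hypothetical enum: V2 inserts CANCELLED in the middle.
enum StatusV1 { NEW, PAID, SHIPPED }
enum StatusV2 { NEW, CANCELLED, PAID, SHIPPED }

final class EnumStorageDemo {
  // What ORDINAL stored for PAID under V1, read back with V2.
  static StatusV2 readOrdinalWithV2() {
    int stored = StatusV1.PAID.ordinal();      // 1
    return StatusV2.values()[stored];          // CANCELLED: silently wrong
  }

  // What STRING stored for PAID under V1, read back with V2.
  static StatusV2 readNameWithV2() {
    String stored = StatusV1.PAID.name();      // "PAID"
    return StatusV2.valueOf(stored);           // PAID: still correct
  }
}
```

The ordinal path does not fail; it returns a different, valid constant, which is why this bug tends to surface as corrupted business data rather than as an exception.
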
AttributeConverter for JSON column
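import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import jakarta.persistence.AttributeConverter;
import jakarta.persistence.Converter;
import java.util.Map;

@Converter
class JsonMapConverter implements AttributeConverter<Map<String, String>, String> {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  @Override
  public String convertToDatabaseColumn(Map<String, String> attribute) {
    if (attribute == null) return null;  // store SQL NULL, not the JSON text "null"
    try { return MAPPER.writeValueAsString(attribute); }
    catch (JsonProcessingException e) { throw new IllegalArgumentException(e); }
  }

  @Override
  public Map<String, String> convertToEntityAttribute(String dbData) {
    if (dbData == null) return null;     // readValue rejects a null String
    try { return MAPPER.readValue(dbData, new TypeReference<Map<String, String>>() {}); }
    catch (JsonProcessingException e) { throw new IllegalArgumentException(e); }
  }
}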

Relationship mapping

Object graphs map to foreign keys and join tables. Every association has an owning side (with the FK) and optionally an inverse side (mappedBy).

| Annotation | Default fetch | Typical mapping |
|---|---|---|
| @ManyToOne | EAGER | Child → parent FK column |
| @OneToMany | LAZY | Parent → collection; inverse of ManyToOne |
| @OneToOne | EAGER | Profile ↔ User; either side can own FK |
| @ManyToMany | LAZY | Join table; prefer explicit link entity in production |

Bidirectional OneToMany / ManyToOne
@Entity
public class Order {
  @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;

  @OneToMany(mappedBy = "order", cascade = CascadeType.PERSIST, orphanRemoval = true)
  private List<OrderLine> lines = new ArrayList<>();

  public void addLine(OrderLine line) {
    lines.add(line);
    line.setOrder(this);
  }
}

@Entity
public class OrderLine {
  @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;

  @ManyToOne(fetch = FetchType.LAZY, optional = false)
  @JoinColumn(name = "order_id", nullable = false)
  private Order order;
}
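
The `addLine` helper matters because only the owning side (the child's reference to its parent) becomes the foreign key value. A minimal sketch with plain objects and no JPA annotations, using hypothetical `Basket` and `BasketItem` classes so they do not clash with the entities above:

```java
import java.util.ArrayList;
import java.util.List;

// Parent side: holds the collection (the inverse side in JPA terms).
final class Basket {
  private final List<BasketItem> items = new ArrayList<>();

  void addItem(BasketItem item) {
    items.add(item);
    item.setBasket(this);     // owning side: this reference is what would become the FK
  }

  void removeItem(BasketItem item) {
    items.remove(item);
    item.setBasket(null);     // keep both sides consistent when detaching
  }

  List<BasketItem> items() {
    return List.copyOf(items);
  }
}

// Child side: holds the reference to the parent (the owning side in JPA terms).
final class BasketItem {
  private Basket basket;

  void setBasket(Basket basket) {
    this.basket = basket;
  }

  Basket basket() {
    return basket;
  }
}
```

If code adds to the collection directly and skips the setter, the in-memory graph looks right but the child's parent reference stays null, which in a real mapping would mean a null or missing `order_id`.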

CascadeType — use with caution

| Cascade | Propagates |
|---|---|
| PERSIST | persist() to associated entities |
| MERGE | merge() on detached graphs |
| REMOVE | remove() cascades deletes |
| ALL | All of the above + refresh, detach |

⚠️ Pitfall

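CascadeType.ALL (or REMOVE) can delete far more than intended. On @ManyToOne, removing one child cascades up to the shared parent and from there to the parent's other children; on large collections, one removed parent loads and deletes every child individually. Prefer orphanRemoval = true only on true parent-child composition (Order → OrderLine), never on shared reference entities.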

FetchType — defaults and overrides

Always set FetchType.LAZY on @ManyToOne and @OneToOne in production. The JPA default for both is EAGER, which adds a join or an extra select on every load.

ManyToMany with join table
@Entity
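public class Student {
  @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;  // every entity needs an identifier

  @ManyToMany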
  @JoinTable(
      name = "student_course",
      joinColumns = @JoinColumn(name = "student_id"),
      inverseJoinColumns = @JoinColumn(name = "course_id")
  )
  private Set<Course> courses = new HashSet<>();
}
📦 Real World

Replace @ManyToMany with an explicit Enrollment entity when you need extra columns (enrolledAt, grade, status). Join tables without entities can't carry metadata and complicate queries.

Embeddables

Value objects embedded in the same table—address, money, date ranges—without a separate entity lifecycle.

@Embeddable + @Embedded
// Note: records as @Embeddable need Hibernate 6.2+ (Spring Boot 3.1+)
@Embeddable
public record Address(
    @Column(name = "street") String street,
    @Column(name = "city") String city,
    @Column(name = "postal_code") String postalCode
) {}

@Entity
public class Customer {
  @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;

  @Embedded
  @AttributeOverrides({
      @AttributeOverride(name = "street", column = @Column(name = "billing_street")),
      @AttributeOverride(name = "city", column = @Column(name = "billing_city"))
  })
  private Address billingAddress;
}

Inheritance strategies

Map class hierarchies to relational schema. Each strategy trades storage normalization against query performance.

| Strategy | Schema | Trade-offs |
|---|---|---|
| SINGLE_TABLE | One table, discriminator column | Fast reads; sparse nullable columns; default strategy |
| JOINED | Base table + subclass tables | Normalized; joins on every polymorphic query |
| TABLE_PER_CLASS | Table per concrete class | Polymorphic queries use UNION; poor performance; avoid |

SINGLE_TABLE inheritance
@Entity
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
@DiscriminatorColumn(name = "payment_type")
public abstract class Payment { @Id @GeneratedValue Long id; }

@Entity
@DiscriminatorValue("CARD")
public class CardPayment extends Payment { private String lastFour; }

@Entity
@DiscriminatorValue("BANK")
public class BankPayment extends Payment { private String iban; }

N+1 query problem

Load N parent rows, then touch a lazy association on each one, and Hibernate fires one extra query per parent: 1 query for the list plus N for the association. The most common JPA performance bug in production.

sequenceDiagram
  participant App as Service
  participant EM as EntityManager
  participant DB as Database
  App->>EM: findAll Orders
  EM->>DB: SELECT star FROM orders
  DB-->>EM: 100 rows
  loop For each order access lines
    App->>EM: get lines lazy
    EM->>DB: SELECT star FROM order_line WHERE order_id equals id
  end
  Note over DB: 1 plus 100 equals 101 queries
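
The arithmetic in the diagram can be reproduced with a toy "database" that only counts statements. Nothing here is Hibernate: `CountingDb` and `NPlusOneDemo` are hypothetical names, and the returned rows are placeholders.

```java
import java.util.List;
import java.util.stream.IntStream;

// Toy database that only counts the statements it is asked to run.
final class CountingDb {
  int statements = 0;

  List<Integer> selectOrderIds(int n) {            // SELECT * FROM orders
    statements++;
    return IntStream.rangeClosed(1, n).boxed().toList();
  }

  List<String> selectLinesFor(int orderId) {       // SELECT * FROM order_line WHERE order_id = ?
    statements++;
    return List.of("line-of-" + orderId);
  }

  List<String> selectOrdersWithLines(int n) {      // one JOIN returning orders and lines together
    statements++;
    return IntStream.rangeClosed(1, n).mapToObj(i -> "line-of-" + i).toList();
  }
}

final class NPlusOneDemo {
  // Lazy access in a loop: one statement for the list, one more per order.
  static int lazyLoop(int n) {
    CountingDb db = new CountingDb();
    for (int id : db.selectOrderIds(n)) {
      db.selectLinesFor(id);
    }
    return db.statements;
  }

  // Fetching the association up front: a single statement.
  static int joinFetch(int n) {
    CountingDb db = new CountingDb();
    db.selectOrdersWithLines(n);
    return db.statements;
  }
}
```

For 100 orders, `lazyLoop(100)` counts 101 statements and `joinFetch(100)` counts 1, which is the gap the fixes below aim to close.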

Detection

  • Enable spring.jpa.show-sql=true (dev only) or logging: logging.level.org.hibernate.SQL=DEBUG
  • Hibernate statistics: spring.jpa.properties.hibernate.generate_statistics=true
  • p6spy or datasource proxy — count statements per request
  • APM tools (Datadog, New Relic) — spike in query count per endpoint

Fix 1: JOIN FETCH in JPQL

@Query with JOIN FETCH
@Query("SELECT DISTINCT o FROM Order o JOIN FETCH o.lines WHERE o.status = :status")
List<Order> findWithLinesByStatus(@Param("status") OrderStatus status);

Fix 2: @EntityGraph

Named entity graph
@Entity
@NamedEntityGraph(name = "Order.withLines", attributeNodes = @NamedAttributeNode("lines"))
public class Order { /* ... */ }

@EntityGraph("Order.withLines")
List<Order> findByStatus(OrderStatus status);

Fix 3: @BatchSize

Batch lazy loading
@Entity
public class Order {
  @OneToMany(mappedBy = "order")
  @BatchSize(size = 25)
  private List<OrderLine> lines;
}

// Hibernate: SELECT ... WHERE order_id IN (?,?,... 25 ids); cuts N collection queries to about ceil(N/25)
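
A rough way to estimate the effect of a batch size is to chunk the parent IDs into IN-lists. This is arithmetic only, under the assumption that every parent's collection ends up being loaded; `BatchFetchDemo` is a hypothetical helper, and Hibernate's real batching depends on which uninitialized collections are in the persistence context at the time.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchFetchDemo {
  // Splits parent IDs into IN-list chunks of at most batchSize, one simulated query per chunk.
  static List<List<Long>> inListChunks(List<Long> parentIds, int batchSize) {
    List<List<Long>> chunks = new ArrayList<>();
    for (int from = 0; from < parentIds.size(); from += batchSize) {
      int to = Math.min(from + batchSize, parentIds.size());
      chunks.add(parentIds.subList(from, to));
    }
    return chunks;
  }

  // Total statements: 1 for the parents plus ceil(parents / batchSize) for the collections.
  static int totalStatements(int parents, int batchSize) {
    return 1 + (parents + batchSize - 1) / batchSize;
  }
}
```

With 100 orders and a batch size of 25 the estimate is 5 statements rather than 101. Batching does not eliminate the extra queries the way JOIN FETCH does, but it needs no change to the queries themselves.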

Fix 4: DTO projections

Don't load entities at all—query only needed columns into a DTO or interface projection.

Interface projection
public interface OrderSummary {
  Long getId();
  String getCustomerId();
  int getLineCount();
}

@Query("""
    SELECT o.id AS id, o.customerId AS customerId, COUNT(l) AS lineCount
    FROM Order o LEFT JOIN o.lines l
    GROUP BY o.id, o.customerId
    """)
List<OrderSummary> findSummaries();
🎯 Interview Tip

Explain N+1 with concrete numbers: 1 query for list + N for each lazy collection access. Best fix depends on use case: JOIN FETCH for always-needed associations, EntityGraph for optional graphs, DTO for read-only API responses.

Query methods

Spring Data parses method names into queries, or you supply JPQL/SQL explicitly. Know when derived queries stop scaling.

Derived query method naming

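| Prefix | Example | Generated intent |
|---|---|---|
| find…By / get…By | findByEmail | SELECT … WHERE email = ? |
| count…By | countByStatus | COUNT … WHERE status = ? |
| exists…By | existsBySku | Existence check that stops at the first match instead of counting all rows |
| delete…By | deleteByCreatedAtBefore | Loads the matching entities and removes them one by one; must run inside a transaction (e.g. a @Transactional service method) |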

Keywords: And, Or, Between, LessThan, GreaterThan, Like, In, OrderBy, IgnoreCase, Containing.
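
To get a feel for how a method name becomes a predicate, here is a deliberately tiny parser. It is not Spring Data's parser: `ToyQueryNameParser` is hypothetical, it understands only the `find…By` prefix with `And`/`Or` between simple property names, and it treats every property as an equality test.

```java
final class ToyQueryNameParser {
  // Turns "findByCustomerIdAndStatus" into "customerId = ?1 AND status = ?2".
  static String toWhereClause(String methodName) {
    int by = methodName.indexOf("By");
    if (!methodName.startsWith("find") || by < 0 || by + 2 >= methodName.length()) {
      throw new IllegalArgumentException("unsupported method name: " + methodName);
    }
    // Split before an And/Or keyword that follows a lower-case letter or digit
    // and precedes an upper-case letter, so "OrderId" is not split on its leading "Or".
    String[] parts = methodName.substring(by + 2).split("(?<=[a-z0-9])(?=(And|Or)[A-Z])");
    StringBuilder where = new StringBuilder();
    for (int i = 0; i < parts.length; i++) {
      String part = parts[i];
      if (i > 0) {                                   // every later part starts with its keyword
        boolean and = part.startsWith("And");
        where.append(and ? " AND " : " OR ");
        part = part.substring(and ? 3 : 2);
      }
      where.append(Character.toLowerCase(part.charAt(0)))
          .append(part.substring(1))
          .append(" = ?")
          .append(i + 1);
    }
    return where.toString();
  }
}
```

Even this toy shows why long derived names get unwieldy: every extra condition lengthens the method name, and anything beyond simple predicates is usually clearer as an explicit @Query.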

Derived + pagination
Page<Order> findByCustomerIdAndStatusOrderByCreatedAtDesc(
    String customerId, OrderStatus status, Pageable pageable);

List<Order> findTop10ByStatusOrderByCreatedAtDesc(OrderStatus status);
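
When reading a `Page`, it helps to keep the underlying arithmetic in mind. The helper below is a hypothetical sketch of that arithmetic, assuming zero-based page indexes as with `PageRequest.of(page, size)`; it is not Spring Data code.

```java
final class PageMath {
  // Row offset of the first element on a zero-based page.
  static long offset(int pageNumber, int pageSize) {
    return (long) pageNumber * pageSize;
  }

  // Number of pages needed for totalElements rows; a page size of 0 is treated as one page.
  static int totalPages(long totalElements, int pageSize) {
    return pageSize == 0 ? 1 : (int) Math.ceil((double) totalElements / pageSize);
  }
}
```

For example, page 2 with size 20 starts at offset 40, and 101 matching rows span 6 pages. Computing the total requires a separate count query, which is the main extra cost of returning `Page` rather than a plain list.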

@Query — JPQL and native SQL

JPQL and native queries
@Query("SELECT o FROM Order o WHERE o.createdAt >= :since AND o.status IN :statuses")
List<Order> findRecent(@Param("since") Instant since, @Param("statuses") Collection<OrderStatus> statuses);

@Query(value = """
    SELECT o.* FROM orders o
    WHERE o.customer_id = :customerId
    ORDER BY o.created_at DESC
    LIMIT :limit
    """, nativeQuery = true)
List<Order> findRecentNative(@Param("customerId") String customerId, @Param("limit") int limit);

@Modifying — UPDATE/DELETE

Bulk update
@Modifying(clearAutomatically = true, flushAutomatically = true)
@Query("UPDATE Order o SET o.status = :newStatus WHERE o.id = :id")
int updateStatus(@Param("id") Long id, @Param("newStatus") OrderStatus newStatus);
⚠️ Pitfall

@Modifying queries bypass the persistence context—managed entities in memory become stale. Use clearAutomatically = true or evict affected entities. Must run inside a transaction.
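
The staleness can be modelled with two maps: one for table rows and one for managed entities. `ToyPersistenceContext` and `OrderRow` are hypothetical names for illustration and do not reflect how Hibernate stores state internally.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal mutable "entity" with a single column.
final class OrderRow {
  String status;

  OrderRow(String status) {
    this.status = status;
  }
}

final class ToyPersistenceContext {
  private final Map<Long, String> database = new HashMap<>();    // row id -> status column
  private final Map<Long, OrderRow> managed = new HashMap<>();   // entities already loaded

  ToyPersistenceContext() {
    database.put(1L, "NEW");
  }

  // Like find(): reads the table only if the entity is not already managed.
  OrderRow find(long id) {
    return managed.computeIfAbsent(id, key -> new OrderRow(database.get(key)));
  }

  // Like a @Modifying update: writes straight to the table, never touches managed entities.
  int bulkUpdateStatus(long id, String newStatus) {
    return database.replace(id, newStatus) != null ? 1 : 0;
  }

  // Like EntityManager.clear(), which clearAutomatically = true triggers after the update.
  void clear() {
    managed.clear();
  }
}
```

After `bulkUpdateStatus(1L, "PAID")`, a second `find(1L)` still reports `NEW` until `clear()` is called; only then is the row reloaded with the updated value.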

Projections

| Type | Mechanism |
|---|---|
| Interface closed projection | Getter names match entity properties; Spring Data generates a proxy |
| Class-based DTO | Constructor expression in JPQL: SELECT new com.acme.OrderDto(o.id, o.status) |
| Dynamic projection | Caller passes the projection type as a Class<T> argument, e.g. <T> List<T> findByStatus(OrderStatus status, Class<T> type) |

Transactions

Spring's declarative transactions wrap service methods in AOP proxies. JPA requires a transaction for writes and for keeping the persistence context open during the unit of work.

Service-layer transactions — correct placement
@Service
public class OrderService {
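  private final OrderRepository orderRepository;
  private final InventoryClient inventoryClient;

  public OrderService(OrderRepository orderRepository, InventoryClient inventoryClient) {
    this.orderRepository = orderRepository;
    this.inventoryClient = inventoryClient;
  }

  @Transactional
  public Order placeOrder(PlaceOrderCommand cmd) {
    // OrderStatus.NEW is an assumed constant; the entity constructor takes (customerId, status)
    Order order = orderRepository.save(new Order(cmd.customerId(), OrderStatus.NEW));
    // Joins this TX only if InventoryClient is a local bean on the same transaction manager;
    // a remote HTTP/RPC call is not undone when the TX rolls back.
    inventoryClient.reserve(cmd.sku(), cmd.qty());
    return order;
  }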

  @Transactional(readOnly = true)
  public OrderDto getOrder(long id) {
    return orderRepository.findById(id)
        .map(OrderDto::from)
        .orElseThrow(() -> new OrderNotFoundException(id));
  }
}

Propagation levels — concrete scenarios

| Propagation | Behavior | Scenario |
|---|---|---|
| REQUIRED (default) | Join existing TX or create new | Normal service method; the right choice in the vast majority of cases |
| REQUIRES_NEW | Suspend current TX; always new TX | Audit log that must commit even if outer TX rolls back |
| NESTED | Savepoint within existing TX | Partial rollback of sub-operation (JDBC savepoints; rare with JPA) |
| SUPPORTS | Join if exists; non-transactional otherwise | Read helpers called from both TX and non-TX code |
| NOT_SUPPORTED | Suspend TX; run without | Long-running report that shouldn't hold DB connection |
| MANDATORY | Must have existing TX; else exception | Internal DAO called only from transactional services |
| NEVER | Must not have TX; else exception | Enforce non-transactional side effects |
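
The difference between REQUIRED and REQUIRES_NEW is easiest to see with a toy transaction manager. This is a simplified model, not Spring's `PlatformTransactionManager`: `ToyTxManager` and `TxDemo` are hypothetical, "suspending" is modelled as a stack, and a rollback simply drops pending writes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class ToyTxManager {
  final List<String> committed = new ArrayList<>();               // what reached the "database"
  private final Deque<List<String>> active = new ArrayDeque<>();  // top = current TX's pending writes

  void write(String row) {
    if (active.isEmpty()) throw new IllegalStateException("no transaction");
    active.peek().add(row);
  }

  // REQUIRED: join the current transaction if there is one, otherwise start a new one.
  void required(Runnable work) {
    if (!active.isEmpty()) {
      work.run();                                   // joins: shares the outer transaction's fate
      return;
    }
    runInNewTx(work);
  }

  // REQUIRES_NEW: always an independent transaction; the outer one waits underneath on the stack.
  void requiresNew(Runnable work) {
    runInNewTx(work);
  }

  private void runInNewTx(Runnable work) {
    active.push(new ArrayList<>());
    try {
      work.run();
      committed.addAll(active.peek());              // commit
    } finally {
      active.pop();                                 // after an exception, pending writes are dropped
    }
  }
}

final class TxDemo {
  static List<String> run() {
    ToyTxManager tx = new ToyTxManager();
    try {
      tx.required(() -> {
        tx.write("order");
        tx.requiresNew(() -> tx.write("audit"));    // commits on its own
        throw new IllegalStateException("payment failed");
      });
    } catch (IllegalStateException expected) {
      // the outer transaction rolled back
    }
    return tx.committed;
  }
}
```

`TxDemo.run()` leaves only the audit row committed: the order write was part of the outer transaction and disappeared with it. In a real application the inner transaction also holds a second database connection while the outer one is suspended, which is worth remembering when sizing connection pools.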

Isolation levels

| Level | Prevents | Notes |
|---|---|---|
| READ_UNCOMMITTED | Nothing; dirty reads are allowed | Lowest isolation; rarely used |
| READ_COMMITTED | Dirty reads | PostgreSQL/Oracle default; good for most apps |
| REPEATABLE_READ | Dirty and non-repeatable reads | MySQL InnoDB default; the SQL standard still allows phantom reads at this level |
| SERIALIZABLE | Dirty reads, non-repeatable reads, and phantoms | Highest consistency; more contention and deadlocks |

Self-invocation trap

@Transactional bypassed on internal call
@Service
public class BrokenOrderService {
  public void process(long id) {
    doTransactionalWork(id);  // NO proxy — @Transactional ignored!
  }

  @Transactional
  void doTransactionalWork(long id) { /* ... */ }
}

// Fix: inject self (careful with cycles), move to another bean, or use AspectJ weaving
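
The trap is a property of proxies in general, so it can be shown with a plain JDK proxy. `Worker`, `WorkerImpl`, and `SelfInvocationDemo` are hypothetical names; the log entry "TX begin" stands in for what a transaction interceptor would do.

```java
import java.lang.reflect.Proxy;
import java.util.List;

interface Worker {
  void process();   // not transactional; calls doWork() internally
  void doWork();    // imagine @Transactional on this method
}

final class WorkerImpl implements Worker {
  @Override
  public void process() {
    doWork();       // this.doWork(): a plain Java call that never passes through the proxy
  }

  @Override
  public void doWork() {
    // business logic would go here
  }
}

final class SelfInvocationDemo {
  // Wraps the target so that doWork() calls made THROUGH the proxy are recorded.
  static Worker proxied(Worker target, List<String> log) {
    return (Worker) Proxy.newProxyInstance(
        Worker.class.getClassLoader(), new Class<?>[] {Worker.class},
        (proxy, method, args) -> {
          if (method.getName().equals("doWork")) {
            log.add("TX begin");
          }
          return method.invoke(target, args);
        });
  }
}
```

Calling `process()` on the proxy leaves the log empty, because the nested `doWork()` call happens on the target itself; calling `doWork()` on the proxy records one entry. Moving the transactional method to another bean works because the call then crosses a proxy boundary again.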

Rollback behavior

Default: rollback on unchecked exceptions (RuntimeException, Error). Checked exceptions do not trigger rollback unless configured: @Transactional(rollbackFor = IOException.class).
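
The default rule plus `rollbackFor` can be written as a small predicate. This is a simplification for illustration: `RollbackRules` is a hypothetical class, and Spring's real rule evaluation also handles `noRollbackFor` and picks the most specific matching rule.

```java
final class RollbackRules {
  // Rolls back on RuntimeException and Error by default; checked exceptions commit
  // unless one of the rollbackFor types matches the thrown exception.
  static boolean rollsBack(Throwable thrown, Class<?>... rollbackFor) {
    for (Class<?> type : rollbackFor) {
      if (type.isInstance(thrown)) {
        return true;
      }
    }
    return thrown instanceof RuntimeException || thrown instanceof Error;
  }
}
```

Under this model an `IOException` commits by default, rolls back once `IOException.class` is listed, and so does any subclass such as `FileNotFoundException`.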

🔬 Under the Hood

@Transactional is AOP around advice via TransactionInterceptor. readOnly=true hints Hibernate: FlushMode.MANUAL, no dirty checking flush—optimization for read paths. Put transactions on @Service, not @Repository (Spring Data repos are transactional for single operations already).

⚠️ Pitfall

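@Transactional on private methods is ignored (proxy can't intercept). Catching an exception inside the method without rethrowing prevents rollback, because the interceptor never sees it: log and rethrow, or mark the transaction explicitly with TransactionAspectSupport.currentTransactionStatus().setRollbackOnly(). rollbackFor only affects exceptions that actually leave the method.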

Hibernate & persistence context

The persistence context (JPA's first-level cache) is a session-scoped map of managed entities. Understanding entity states explains dirty checking, lazy loading, and LazyInitializationException.

Entity states

| State | Meaning | How you get there |
|---|---|---|
| Transient | Not associated with persistence context | new Order() |
| Managed | Tracked; changes flushed at commit | persist(), find(), within @Transactional |
| Detached | Was managed; context closed | TX ended, clear(), serialized to JSON and back |
| Removed | Scheduled for DELETE on flush | remove() on managed entity |

stateDiagram-v2
  [*] --> Transient: new entity
  Transient --> Managed: persist or merge
  Managed --> Detached: transaction ends
  Detached --> Managed: merge
  Managed --> Removed: remove
  Removed --> Detached: flush delete
  Detached --> [*]

First-level cache

Within a transaction, repeated findById(1L) returns the same instance—no second SELECT. Identity map guarantees referential consistency inside the unit of work.
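
An identity map is a small idea, sketched here without any JPA. `IdentityMap` is a hypothetical class, and the `loader` function stands in for the SELECT that a real persistence context would issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

final class IdentityMap<T> {
  private final Map<Long, T> managed = new HashMap<>();
  private final LongFunction<T> loader;   // stands in for the SELECT
  private int selects = 0;

  IdentityMap(LongFunction<T> loader) {
    this.loader = loader;
  }

  // Returns the already-managed instance if present; otherwise loads and registers it.
  T find(long id) {
    T hit = managed.get(id);
    if (hit == null) {
      selects++;
      hit = loader.apply(id);
      managed.put(id, hit);
    }
    return hit;
  }

  int selects() {
    return selects;
  }
}
```

Two `find(1L)` calls return the same object reference and cost one simulated SELECT. The map lives only as long as the unit of work, which is why the same lookup in a later transaction hits the database again unless a second-level cache is configured.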

Second-level cache

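SessionFactory-scoped cache shared across transactions. Hibernate needs a cache region factory configured (hibernate.cache.region.factory_class); entities then opt in with @Cacheable. Typical providers are JCache-backed caches such as Caffeine or Ehcache (in-process) and Infinispan (clustered).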

Entity cache
@Entity
@Cacheable
@org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Product {
  @Id private Long id;
  private String sku;
  private String name;
}

LazyInitializationException

Accessing a lazy association after the persistence context closed throws LazyInitializationException — classic stack trace mentions "no Session" or "could not initialize proxy."

Failure pattern
@Transactional(readOnly = true)
public Order getOrder(long id) {
  return orderRepository.findById(id).orElseThrow();  // TX ends here
}

// Controller — outside TX
order.getLines().size();  // LazyInitializationException

Proper fixes:

  • Fetch needed associations inside TX (JOIN FETCH, EntityGraph)
  • Return DTOs from service—not entities with lazy graphs
  • @Transactional on the method that traverses the graph (if truly needed)
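
The first two fixes share one idea: touch the association while the session is still open, and hand only plain data to the caller. A toy model, with hypothetical `ToySession`, `LazyLines`, `OrderView`, and `LazyDemo` types and an `IllegalStateException` standing in for LazyInitializationException:

```java
import java.util.List;
import java.util.function.Supplier;

final class ToySession {
  private boolean open = true;

  void close() {
    open = false;
  }

  boolean isOpen() {
    return open;
  }
}

// A lazily loaded collection that can only initialize while its session is open.
final class LazyLines {
  private final ToySession session;
  private final Supplier<List<String>> loader;
  private List<String> loaded;

  LazyLines(ToySession session, Supplier<List<String>> loader) {
    this.session = session;
    this.loader = loader;
  }

  List<String> get() {
    if (loaded == null) {
      if (!session.isOpen()) {
        throw new IllegalStateException("could not initialize proxy - no Session");
      }
      loaded = loader.get();
    }
    return loaded;
  }
}

// DTO carrying only what the caller needs.
record OrderView(long id, int lineCount) {}

final class LazyDemo {
  // Fix: read the association inside the "transaction" and return a DTO.
  static OrderView loadAsDto(long id) {
    ToySession session = new ToySession();
    LazyLines lines = new LazyLines(session, () -> List.of("a", "b"));
    OrderView view = new OrderView(id, lines.get().size());
    session.close();
    return view;
  }

  // Failure pattern: return the lazy handle after the session has closed.
  static LazyLines loadEntityThenClose() {
    ToySession session = new ToySession();
    LazyLines lines = new LazyLines(session, () -> List.of("a", "b"));
    session.close();
    return lines;
  }
}
```

`loadAsDto` succeeds because the collection is read before `close()`; calling `get()` on the result of `loadEntityThenClose()` throws, mirroring the controller example above.
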
⚠️ Pitfall

Open Session In View (OSIV): spring.jpa.open-in-view=true (the Boot default) keeps the session open through view rendering. It masks LazyInitializationException but lets lazy loads fire during JSON serialization, a hidden N+1 in controllers. Disable it in production APIs with spring.jpa.open-in-view=false and fetch explicitly in services.

🔖 Version Note

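Since Spring Boot 2.0, startup logs a warning when OSIV is left enabled by default, that is, when spring.jpa.open-in-view is not set explicitly. Boot 3 still defaults to true, so set it to false explicitly for REST microservices.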

Session management in Spring

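JpaTransactionManager binds an EntityManager to the current thread for each transaction. Spring Data repositories participate automatically. In singleton beans, inject EntityManager through @PersistenceContext so you get the shared, transaction-scoped proxy; never hold on to an instance created directly with EntityManagerFactory.createEntityManager().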

Auditing

Automatic population of created/modified timestamps and user IDs—standard in enterprise schemas without manual setter calls in every service method.

Enable auditing
@Configuration
@EnableJpaAuditing(auditorAwareRef = "auditorProvider")
class JpaAuditingConfig {

  @Bean
  AuditorAware<String> auditorProvider() {
    return () -> Optional.ofNullable(SecurityContextHolder.getContext())
        .map(SecurityContext::getAuthentication)
        .filter(Authentication::isAuthenticated)
        .map(Authentication::getName);
  }
}
Audited entity base class
@MappedSuperclass
@EntityListeners(AuditingEntityListener.class)
public abstract class AuditableEntity {
  @CreatedDate
  @Column(nullable = false, updatable = false)
  private Instant createdAt;

  @LastModifiedDate
  @Column(nullable = false)
  private Instant updatedAt;

  @CreatedBy
  @Column(updatable = false, length = 64)
  private String createdBy;

  @LastModifiedBy
  @Column(length = 64)
  private String updatedBy;
}

@Entity
public class Order extends AuditableEntity {
  @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;
}
💡 Pro Tip

Use Instant (UTC) for audit timestamps—not LocalDateTime without zone. For system jobs without security context, AuditorAware should return Optional.of("system"), not empty (which skips @CreatedBy).
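
The "system" fallback is a one-liner with `Optional.or`. The helper below is a hypothetical sketch that takes the already-resolved user name as input, so it can be read independently of Spring Security; wiring it into an `AuditorAware` bean is left to the application.

```java
import java.util.Optional;

final class AuditorFallback {
  // currentUser is whatever the security layer resolved; empty for schedulers and batch jobs.
  static Optional<String> resolve(Optional<String> currentUser) {
    return currentUser
        .filter(name -> !name.isBlank())
        .or(() -> Optional.of("system"));   // assumed fallback name for non-interactive work
  }
}
```

`resolve(Optional.empty())` yields `system` and `resolve(Optional.of("alice"))` yields `alice`, so created-by and modified-by columns are populated for both interactive and background writes.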

📦 Real World

Combine JPA auditing with DB-level triggers for compliance-heavy domains (immutable audit trail). Application auditing is convenient; database triggers survive direct SQL and admin tools.