Spring Data & JPA

Spring Data abstractions

Spring Data eliminates boilerplate DAO implementations. You declare an interface; the framework generates the implementation at runtime.

Repository hierarchy

Interface	Adds
Repository<T, ID>	Marker — no methods; enables Spring Data repository detection
CrudRepository<T, ID>	save, findById, findAll, deleteById, count
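PagingAndSortingRepository<T, ID>	findAll(Pageable), findAll(Sort); since Spring Data 3.0 it no longer extends CrudRepository
JpaRepository<T, ID>	JPA-specific: flush, saveAndFlush, deleteAllInBatch (replaces the deprecated deleteInBatch), getReferenceById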

public interface OrderRepository extends JpaRepository<Order, Long> {
  List<Order> findByCustomerIdAndStatus(String customerId, OrderStatus status);
  Optional<Order> findByExternalRef(String externalRef);
}

🔬 Under the Hood

At startup, JpaRepositoryFactoryBean creates a JDK dynamic proxy implementing your interface. Each method routes to SimpleJpaRepository (built-in CRUD) or a QueryMethod parsed from the method name / @Query. No bytecode generation of implementation classes—you get a proxy delegating to shared infrastructure.
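The proxy idea can be sketched with nothing but the JDK. This is a toy model, not Spring Data's code: ToyOrderRepository, ToyCrudSupport, and ToyRepositoryFactory are hypothetical names, and the "derived query" is a hard-coded filter rather than a parsed method name.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical repository interface; no implementation class is ever written.
interface ToyOrderRepository {
  long count();
  List<String> findByStatus(String status);
}

// Stand-in for the shared CRUD implementation (the role SimpleJpaRepository plays).
class ToyCrudSupport {
  private final List<String> rows = List.of("NEW:1", "PAID:2", "NEW:3");
  long count() { return rows.size(); }
  List<String> rows() { return rows; }
}

class ToyRepositoryFactory {
  static ToyOrderRepository create() {
    ToyCrudSupport target = new ToyCrudSupport();
    InvocationHandler handler = (proxy, method, args) -> {
      // Built-in CRUD method: delegate to the shared target.
      if (method.getName().equals("count")) {
        return target.count();
      }
      // Query method: behaviour is chosen from the method name.
      if (method.getName().equals("findByStatus")) {
        String status = (String) args[0];
        return target.rows().stream()
            .filter(row -> row.startsWith(status + ":"))
            .collect(Collectors.toList());
      }
      throw new UnsupportedOperationException(method.getName());
    };
    return (ToyOrderRepository) Proxy.newProxyInstance(
        ToyOrderRepository.class.getClassLoader(),
        new Class<?>[] {ToyOrderRepository.class},
        handler);
  }
}
```

Calling ToyRepositoryFactory.create().findByStatus("NEW") goes through the handler, which is the only place any logic lives.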

💡 Pro Tip

Prefer Optional<T> return types for single-result queries—Spring Data translates empty results to Optional.empty() instead of returning null.

Entity mapping

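JPA maps Java classes to relational tables. Hibernate is the default JPA provider in Spring Boot. Entities need an identifier and a no-arg constructor: the JPA specification requires it to be public or protected; Hibernate accepts narrower visibility but needs at least package visibility to generate lazy-loading proxies.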

@Entity
@Table(name = "orders", indexes = @Index(name = "idx_orders_customer", columnList = "customer_id"))
public class Order {
  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;

  @Column(name = "customer_id", nullable = false, length = 64)
  private String customerId;

  @Enumerated(EnumType.STRING)
  @Column(nullable = false, length = 32)
  private OrderStatus status;

  @Column(name = "created_at", nullable = false)
  private Instant createdAt;

  protected Order() {}  // JPA requirement

  public Order(String customerId, OrderStatus status) {
    this.customerId = customerId;
    this.status = status;
    this.createdAt = Instant.now();
  }
}

@GeneratedValue strategies

Choosing the wrong strategy causes performance issues or ID collisions across databases.

Strategy	How it works	When to use
IDENTITY	DB auto-increment (INSERT returns ID)	PostgreSQL, MySQL, SQL Server — simplest; ID known after flush
SEQUENCE	Separate sequence object; Hibernate can batch allocations	Oracle, PostgreSQL — better for bulk inserts; use @SequenceGenerator
TABLE	Emulates sequence via lock table	Legacy portability — slow; avoid in new code
AUTO	Provider picks based on dialect	Dev convenience — be explicit in production

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "order_seq")
@SequenceGenerator(name = "order_seq", sequenceName = "order_id_seq", allocationSize = 50)
private Long id;

⚠️ Pitfall

IDENTITY disables Hibernate's JDBC insert batching, because each INSERT must run immediately to obtain the generated ID. For high-volume ingest, use SEQUENCE with an allocationSize that matches the sequence's INCREMENT BY, or assign UUIDs in application code.
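Why allocationSize helps can be seen in a toy model of block allocation. This is a sketch of the idea only, not Hibernate's optimizer: BlockIdAllocator is a hypothetical class, and the AtomicLong stands in for a database sequence that increments by allocationSize.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model: one simulated sequence call reserves allocationSize identifiers,
// which are then handed out from memory without further round trips.
class BlockIdAllocator {
  private final int allocationSize;
  private final AtomicLong dbSequence = new AtomicLong(1); // simulated sequence
  private long next;      // next id to hand out
  private long blockEnd;  // exclusive upper bound of the current block
  private int roundTrips; // number of simulated sequence calls

  BlockIdAllocator(int allocationSize) {
    this.allocationSize = allocationSize;
  }

  synchronized long nextId() {
    if (next == blockEnd) { // block exhausted, or very first call
      next = dbSequence.getAndAdd(allocationSize);
      blockEnd = next + allocationSize;
      roundTrips++;
    }
    return next++;
  }

  synchronized int roundTrips() {
    return roundTrips;
  }
}
```

With allocationSize = 50, generating 120 ids costs three simulated sequence calls instead of 120, and the inserts themselves stay batchable because every id is known before the INSERT is sent.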

Field mapping

Control column names, nullability, length, and how Java types persist.

Annotation	Purpose
@Column	Name, nullable, length, unique, columnDefinition
@Transient	Not persisted — computed fields, caches on entity (use sparingly)
@Enumerated(STRING)	Store enum name — readable, survives enum reorder (preferred)
@Enumerated(ORDINAL)	Store 0,1,2 — fragile if enum order changes
@Lob	Large object — CLOB/BLOB; consider external object storage for big files
@Convert	Custom AttributeConverter — e.g. JSON column, encrypted strings

@Converter
class JsonMapConverter implements AttributeConverter<Map<String, String>, String> {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  @Override
  public String convertToDatabaseColumn(Map<String, String> attribute) {
    try { return MAPPER.writeValueAsString(attribute); }
    catch (JsonProcessingException e) { throw new IllegalArgumentException(e); }
  }

  @Override
  public Map<String, String> convertToEntityAttribute(String dbData) {
    try { return MAPPER.readValue(dbData, new TypeReference<>() {}); }
    catch (JsonProcessingException e) { throw new IllegalArgumentException(e); }
  }
}
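The fragility of @Enumerated(ORDINAL) noted in the table can be reproduced in plain Java. StatusV1 and StatusV2 are hypothetical versions of the same enum before and after a constant is inserted; no JPA is involved, only the values that would have been written to the column.

```java
// Version 1 of a hypothetical enum, as it existed when rows were written.
enum StatusV1 { NEW, PAID, SHIPPED }

// Version 2: a constant was inserted in the middle.
enum StatusV2 { NEW, CANCELLED, PAID, SHIPPED }

class EnumStorageDemo {
  // Value an ORDINAL mapping would have stored for PAID under V1.
  static int storedOrdinal() { return StatusV1.PAID.ordinal(); }

  // Value a STRING mapping would have stored for PAID under V1.
  static String storedName() { return StatusV1.PAID.name(); }

  // Reading old rows with the V2 enum.
  static StatusV2 readOrdinal(int ordinal) { return StatusV2.values()[ordinal]; }
  static StatusV2 readName(String name) { return StatusV2.valueOf(name); }
}
```

Reading the stored ordinal back with the new enum silently yields CANCELLED, while the stored name still resolves to PAID.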

Relationship mapping

Object graphs map to foreign keys and join tables. Every association has an owning side (with the FK) and optionally an inverse side (mappedBy).

Annotation	Default fetch	Typical mapping
@ManyToOne	EAGER	Child → parent FK column
@OneToMany	LAZY	Parent → collection; inverse of ManyToOne
@OneToOne	EAGER	Profile ↔ User; either side can own FK
@ManyToMany	LAZY	Join table; prefer explicit link entity in production

@Entity
public class Order {
  @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;

  @OneToMany(mappedBy = "order", cascade = CascadeType.PERSIST, orphanRemoval = true)
  private List<OrderLine> lines = new ArrayList<>();

  public void addLine(OrderLine line) {
    lines.add(line);
    line.setOrder(this);
  }
}

@Entity
public class OrderLine {
  @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;

  @ManyToOne(fetch = FetchType.LAZY, optional = false)
  @JoinColumn(name = "order_id", nullable = false)
  private Order order;
}

CascadeType — use with caution

Cascade	Propagates
PERSIST	persist() to associated entities
MERGE	merge() on detached graphs
REMOVE	remove() cascades deletes
ALL	All of the above + refresh, detach

⚠️ Pitfall

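CascadeType.ALL (or REMOVE) on @ManyToOne means that deleting one child also deletes the shared parent, and through the parent's own cascades possibly its other children. On large collections, a cascaded remove loads and deletes every element one by one. Reserve cascading removal and orphanRemoval = true for true parent-child composition (Order → OrderLine), never for shared reference entities.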

FetchType — defaults and overrides

Always set FetchType.LAZY on @ManyToOne and @OneToOne in production — the JPA default for @ManyToOne is EAGER, which causes accidental joins on every load.

@Entity
public class Student {
  @ManyToMany
  @JoinTable(
      name = "student_course",
      joinColumns = @JoinColumn(name = "student_id"),
      inverseJoinColumns = @JoinColumn(name = "course_id")
  )
  private Set<Course> courses = new HashSet<>();
}

📦 Real World

Replace @ManyToMany with an explicit Enrollment entity when you need extra columns (enrolledAt, grade, status). Join tables without entities can't carry metadata and complicate queries.

Embeddables

Value objects (address, money, date ranges) stored in the owning entity's table, without a separate entity lifecycle. Mapping a Java record as an @Embeddable, as below, requires Hibernate 6.2 or later.

@Embeddable
public record Address(
    @Column(name = "street") String street,
    @Column(name = "city") String city,
    @Column(name = "postal_code") String postalCode
) {}

@Entity
public class Customer {
  @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;

  @Embedded
  @AttributeOverrides({
      @AttributeOverride(name = "street", column = @Column(name = "billing_street")),
      @AttributeOverride(name = "city", column = @Column(name = "billing_city"))
  })
  private Address billingAddress;
}

Inheritance strategies

Map class hierarchies to relational schema. Each strategy trades storage normalization against query performance.

Strategy	Schema	Trade-offs
SINGLE_TABLE	One table, discriminator column	Fast reads; sparse nullable columns; default strategy
JOINED	Base table + subclass tables	Normalized; joins on every polymorphic query
TABLE_PER_CLASS	Table per concrete class	Polymorphic queries use UNION — poor performance; avoid

@Entity
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
@DiscriminatorColumn(name = "payment_type")
public abstract class Payment { @Id @GeneratedValue Long id; }

@Entity
@DiscriminatorValue("CARD")
public class CardPayment extends Payment { private String lastFour; }

@Entity
@DiscriminatorValue("BANK")
public class BankPayment extends Payment { private String iban; }

N+1 query problem

Loading N parent rows and then touching a lazy association on each one makes Hibernate fire one extra query per parent: 1 + N statements in total. The most common JPA performance bug in production.

sequenceDiagram
  participant App as Service
  participant EM as EntityManager
  participant DB as Database
  App->>EM: findAll Orders
  EM->>DB: SELECT star FROM orders
  DB-->>EM: 100 rows
  loop For each order access lines
    App->>EM: get lines lazy
    EM->>DB: SELECT star FROM order_line WHERE order_id equals id
  end
  Note over DB: 1 plus 100 equals 101 queries
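The statement count in the diagram can be checked with a small simulation. This is a plain-Java sketch with hypothetical names (QueryCountingDb, NPlusOneDemo); each method call on the fake database stands for one SQL statement.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Fake database that counts how many "statements" it receives.
class QueryCountingDb {
  int queries;
  private final Map<Long, List<String>> linesByOrder = new HashMap<>();

  QueryCountingDb(int orders) {
    for (long id = 1; id <= orders; id++) {
      linesByOrder.put(id, List.of("line-" + id));
    }
  }

  // SELECT ... FROM orders
  List<Long> selectOrderIds() { queries++; return new ArrayList<>(linesByOrder.keySet()); }

  // SELECT ... FROM order_line WHERE order_id = ?
  List<String> selectLines(long orderId) { queries++; return linesByOrder.get(orderId); }

  // SELECT ... FROM orders JOIN order_line ...
  Map<Long, List<String>> selectOrdersWithLines() { queries++; return linesByOrder; }
}

class NPlusOneDemo {
  // Lazy traversal: one query for the parents, then one per parent.
  static int lazyTraversal(int orders) {
    QueryCountingDb db = new QueryCountingDb(orders);
    for (long id : db.selectOrderIds()) {
      db.selectLines(id);
    }
    return db.queries;
  }

  // Fetching parents and children together in a single statement.
  static int joinFetch(int orders) {
    QueryCountingDb db = new QueryCountingDb(orders);
    db.selectOrdersWithLines();
    return db.queries;
  }
}
```

For 100 orders the lazy traversal issues 101 statements and the joined fetch issues 1.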

Detection

Enable spring.jpa.show-sql=true (dev only) or logging: logging.level.org.hibernate.SQL=DEBUG
Hibernate statistics: spring.jpa.properties.hibernate.generate_statistics=true
p6spy or datasource proxy — count statements per request
APM tools (Datadog, New Relic) — spike in query count per endpoint

Fix 1: JOIN FETCH in JPQL

@Query("SELECT DISTINCT o FROM Order o JOIN FETCH o.lines WHERE o.status = :status")
List<Order> findWithLinesByStatus(@Param("status") OrderStatus status);

Fix 2: @EntityGraph

@Entity
@NamedEntityGraph(name = "Order.withLines", attributeNodes = @NamedAttributeNode("lines"))
public class Order { /* ... */ }

@EntityGraph("Order.withLines")
List<Order> findByStatus(OrderStatus status);

Fix 3: @BatchSize

@Entity
public class Order {
  @OneToMany(mappedBy = "order")
  @BatchSize(size = 25)
  private List<OrderLine> lines;
}

// Hibernate: SELECT ... WHERE order_id IN (?,?,... 25 ids), so N collection queries become roughly N/25
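As a rough worked example of the batching arithmetic (an approximation; the exact statement count can vary with Hibernate version and batch fetch settings), BatchFetchMath below is a hypothetical helper, not a Hibernate API.

```java
class BatchFetchMath {
  // One query for the parents plus one IN-query per batch of parent ids.
  static int totalQueries(int parents, int batchSize) {
    int collectionQueries = (parents + batchSize - 1) / batchSize; // ceiling division
    return 1 + collectionQueries;
  }
}
```

With 100 orders and a batch size of 25 this gives 1 + 4 = 5 statements, compared with 101 for unbatched lazy loading.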

Fix 4: DTO projections

Don't load entities at all—query only needed columns into a DTO or interface projection.

public interface OrderSummary {
  Long getId();
  String getCustomerId();
  long getLineCount();  // COUNT(...) yields a Long in JPQL
}

@Query("""
    SELECT o.id AS id, o.customerId AS customerId, COUNT(l) AS lineCount
    FROM Order o LEFT JOIN o.lines l
    GROUP BY o.id, o.customerId
    """)
List<OrderSummary> findSummaries();

🎯 Interview Tip

Explain N+1 with concrete numbers: 1 query for list + N for each lazy collection access. Best fix depends on use case: JOIN FETCH for always-needed associations, EntityGraph for optional graphs, DTO for read-only API responses.

Query methods

Spring Data parses method names into queries, or you supply JPQL/SQL explicitly. Know when derived queries stop scaling.

Derived query method naming

Prefix	Example	Generated intent
find…By / get…By	findByEmail	SELECT … WHERE email = ?
count…By	countByStatus	COUNT … WHERE status = ?
exists…By	existsBySku	Existence check; the query is limited to a single row
delete…By	deleteByCreatedAtBefore	Loads matching entities and removes each one; must run inside a transaction (e.g. @Transactional on the service)

Keywords: And, Or, Between, LessThan, GreaterThan, Like, In, OrderBy, IgnoreCase, Containing.

Page<Order> findByCustomerIdAndStatusOrderByCreatedAtDesc(
    String customerId, OrderStatus status, Pageable pageable);

List<Order> findTop10ByStatusOrderByCreatedAtDesc(OrderStatus status);
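To make the naming convention concrete, here is a deliberately tiny translator from a method name to a JPQL-like string. It is an illustration under heavy assumptions, not Spring Data's parser: DerivedQuerySketch is a hypothetical class that handles only the find…By prefix with And-joined equality conditions.

```java
class DerivedQuerySketch {
  // Turns e.g. findByCustomerIdAndStatus into a JPQL-like string.
  static String toJpql(String entity, String methodName) {
    int by = methodName.indexOf("By");
    if (!methodName.startsWith("find") || by < 0) {
      throw new IllegalArgumentException("Unsupported method name: " + methodName);
    }
    // Split on "And" only when it starts a new capitalised property name.
    String[] parts = methodName.substring(by + 2).split("And(?=[A-Z])");
    StringBuilder jpql = new StringBuilder("SELECT e FROM " + entity + " e WHERE ");
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) {
        jpql.append(" AND ");
      }
      // CustomerId -> customerId
      String property = Character.toLowerCase(parts[i].charAt(0)) + parts[i].substring(1);
      jpql.append("e.").append(property).append(" = ?").append(i + 1);
    }
    return jpql.toString();
  }
}
```

Every extra keyword (Between, IgnoreCase, OrderBy, ...) adds another rule to a parser like this, which is one way to see why long derived names become hard to read and are better expressed with @Query.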

@Query — JPQL and native SQL

@Query("SELECT o FROM Order o WHERE o.createdAt >= :since AND o.status IN :statuses")
List<Order> findRecent(@Param("since") Instant since, @Param("statuses") Collection<OrderStatus> statuses);

@Query(value = """
    SELECT o.* FROM orders o
    WHERE o.customer_id = :customerId
    ORDER BY o.created_at DESC
    LIMIT :limit
    """, nativeQuery = true)
List<Order> findRecentNative(@Param("customerId") String customerId, @Param("limit") int limit);

@Modifying — UPDATE/DELETE

@Modifying(clearAutomatically = true, flushAutomatically = true)
@Query("UPDATE Order o SET o.status = :newStatus WHERE o.id = :id")
int updateStatus(@Param("id") Long id, @Param("newStatus") OrderStatus newStatus);

⚠️ Pitfall

@Modifying queries bypass the persistence context—managed entities in memory become stale. Use clearAutomatically = true or evict affected entities. Must run inside a transaction.
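The staleness can be modelled with two maps: one for the table and one for the persistence context. This is a plain-Java sketch with a hypothetical class name (StaleContextDemo); it mimics the effect of clearAutomatically rather than calling any JPA API.

```java
import java.util.HashMap;
import java.util.Map;

class StaleContextDemo {
  // Returns the status the application sees for order 1 after a bulk UPDATE.
  static String statusSeenAfterBulkUpdate(boolean clearAutomatically) {
    Map<Long, String> table = new HashMap<>(Map.of(1L, "NEW")); // database rows
    Map<Long, String> context = new HashMap<>();                // persistence context

    context.put(1L, table.get(1L)); // entity loaded and now managed
    table.put(1L, "PAID");          // bulk UPDATE goes straight to the database

    if (clearAutomatically) {
      context.clear();              // managed state is discarded after the update
    }
    // A later lookup hits the context first and only reloads on a miss.
    return context.computeIfAbsent(1L, table::get);
  }
}
```

Without the clear, the lookup returns the cached "NEW"; with it, the row is reloaded and "PAID" is seen.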

Projections

Type	Mechanism
Interface closed projection	Getter names match entity properties — Spring Data generates proxy
Class-based DTO	Constructor expression in JPQL: SELECT new com.acme.OrderDto(o.id, o.status)
Dynamic projection	A Class<T> parameter on the repository method selects the projection type per call

Transactions

Spring's declarative transactions wrap service methods in AOP proxies. JPA requires a transaction for writes and for keeping the persistence context open during the unit of work.

@Service
public class OrderService {
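  private final OrderRepository orderRepository;
  private final InventoryClient inventoryClient;

  // Constructor injection; the final fields must be assigned somewhere.
  public OrderService(OrderRepository orderRepository, InventoryClient inventoryClient) {
    this.orderRepository = orderRepository;
    this.inventoryClient = inventoryClient;
  }

  @Transactional
  public Order placeOrder(PlaceOrderCommand cmd) {
    // OrderStatus.NEW is an assumed constant; Order's constructor takes (customerId, status).
    Order order = orderRepository.save(new Order(cmd.customerId(), OrderStatus.NEW));
    // Joins this TX only if reserve() is a local @Transactional bean on the same
    // transaction manager; a remote call is not rolled back with the order.
    inventoryClient.reserve(cmd.sku(), cmd.qty());
    return order;
  }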

  @Transactional(readOnly = true)
  public OrderDto getOrder(long id) {
    return orderRepository.findById(id)
        .map(OrderDto::from)
        .orElseThrow(() -> new OrderNotFoundException(id));
  }
}

Propagation levels — concrete scenarios

Propagation	Behavior	Scenario
REQUIRED (default)	Join existing TX or create new	Normal service method; by far the most common case
REQUIRES_NEW	Suspend current TX; always new TX	Audit log that must commit even if outer TX rolls back
NESTED	Savepoint within existing TX	Partial rollback of sub-operation (JDBC savepoints; rare with JPA)
SUPPORTS	Join if exists; non-transactional otherwise	Read helpers called from both TX and non-TX code
NOT_SUPPORTED	Suspend TX; run without	Long-running report that shouldn't hold DB connection
MANDATORY	Must have existing TX; else exception	Internal DAO called only from transactional services
NEVER	Must not have TX; else exception	Enforce non-transactional side effects

Isolation levels

Level	Prevents	Cost
READ_UNCOMMITTED	Nothing; dirty reads are allowed	Lowest isolation; rarely used
READ_COMMITTED	Dirty reads	PostgreSQL/Oracle default; good for most apps
REPEATABLE_READ	Dirty and non-repeatable reads	MySQL InnoDB default; the SQL standard still permits phantom reads
SERIALIZABLE	Dirty reads, non-repeatable reads, and phantoms	Highest consistency; contention and deadlocks

Self-invocation trap

@Service
public class BrokenOrderService {
  public void process(long id) {
    doTransactionalWork(id);  // NO proxy — @Transactional ignored!
  }

  @Transactional
  void doTransactionalWork(long id) { /* ... */ }
}

// Fix: inject self (careful with cycles), move to another bean, or use AspectJ weaving
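The trap is not specific to Spring; any proxy-based interception behaves this way. The sketch below uses a JDK dynamic proxy with hypothetical names (Worker, PlainWorker, SelfInvocationDemo), and the recording handler stands in for the transaction interceptor.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface Worker {
  void process();
  void doTransactionalWork();
}

class PlainWorker implements Worker {
  @Override
  public void process() {
    doTransactionalWork(); // this.doTransactionalWork(): the proxy is not involved
  }

  @Override
  public void doTransactionalWork() { /* ... */ }
}

class SelfInvocationDemo {
  // Returns the method names the proxy's "advice" actually observed.
  static List<String> interceptedCalls(boolean callWorkThroughProxy) {
    List<String> seen = new ArrayList<>();
    Worker target = new PlainWorker();
    Worker proxy = (Worker) Proxy.newProxyInstance(
        Worker.class.getClassLoader(),
        new Class<?>[] {Worker.class},
        (p, method, args) -> {
          seen.add(method.getName());         // where a TX would be opened
          return method.invoke(target, args); // delegate to the real object
        });
    if (callWorkThroughProxy) {
      proxy.doTransactionalWork();
    } else {
      proxy.process();
    }
    return seen;
  }
}
```

When the caller enters through process(), the handler records only "process"; the nested call never reaches it, which is exactly why the inner @Transactional has no effect.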

Rollback behavior

Default: rollback on unchecked exceptions (RuntimeException, Error). Checked exceptions do not trigger rollback unless configured: @Transactional(rollbackFor = IOException.class).
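The default rule is small enough to write down. This is a simplified sketch, not Spring's rule engine (which also supports noRollbackFor and picks the most specific matching rule); RollbackRuleSketch is a hypothetical class.

```java
import java.util.List;

class RollbackRuleSketch {
  // Decides whether a thrown exception should roll the transaction back.
  static boolean rollsBack(Throwable ex, List<Class<? extends Throwable>> rollbackFor) {
    // Explicitly configured types win first.
    for (Class<? extends Throwable> type : rollbackFor) {
      if (type.isInstance(ex)) {
        return true;
      }
    }
    // Default: unchecked exceptions and errors only.
    return ex instanceof RuntimeException || ex instanceof Error;
  }
}
```

An IOException therefore commits by default and rolls back only once IOException.class is listed.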

🔬 Under the Hood

@Transactional is AOP around advice via TransactionInterceptor. readOnly=true hints Hibernate: FlushMode.MANUAL, no dirty checking flush—optimization for read paths. Put transactions on @Service, not @Repository (Spring Data repos are transactional for single operations already).

⚠️ Pitfall

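@Transactional on private methods is ignored because the proxy cannot intercept them. Catching an exception inside the method without rethrowing it also prevents rollback, since the interceptor never sees it: rethrow after logging, or mark the transaction explicitly with TransactionAspectSupport.currentTransactionStatus().setRollbackOnly().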

Hibernate & persistence context

The persistence context (JPA's first-level cache) is a session-scoped map of managed entities. Understanding entity states explains dirty checking, lazy loading, and LazyInitializationException.

Entity states

State	Meaning	How you get there
Transient	Not associated with persistence context	new Order()
Managed	Tracked; changes flushed at commit	persist(), find(), within @Transactional
Detached	Was managed; context closed	TX ended, clear(), serialized to JSON and back
Removed	Scheduled for DELETE on flush	remove() on managed entity

stateDiagram-v2
  [*] --> Transient: new entity
  Transient --> Managed: persist or merge
  Managed --> Detached: transaction ends
  Detached --> Managed: merge
  Managed --> Removed: remove
  Removed --> Detached: flush delete
  Detached --> [*]

First-level cache

Within a transaction, repeated findById(1L) returns the same instance—no second SELECT. Identity map guarantees referential consistency inside the unit of work.
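An identity map is easy to sketch in plain Java. IdentityMapSketch is a hypothetical class, and the loader function stands in for the SELECT that a real persistence context would issue on a cache miss.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

class IdentityMapSketch<T> {
  private final Map<Long, T> managed = new HashMap<>();
  private final LongFunction<T> loader; // stands in for the SELECT
  int selects;                          // number of simulated SELECTs

  IdentityMapSketch(LongFunction<T> loader) {
    this.loader = loader;
  }

  // Returns the already-managed instance, loading it only on first access.
  T find(long id) {
    return managed.computeIfAbsent(id, key -> {
      selects++;
      return loader.apply(key);
    });
  }

  // Like ending the unit of work: previously returned instances are no longer tracked.
  void clear() {
    managed.clear();
  }
}
```

Two find(1L) calls in the same unit of work return the same object after a single load; after clear(), the next find loads a fresh instance.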

Second-level cache

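SessionFactory-scoped cache shared across transactions. It stays inactive until a cache region factory is configured (hibernate.cache.region.factory_class). Annotate cacheable entities with @Cacheable and configure a provider, typically through JCache (for example Ehcache or Caffeine in-process) or Infinispan for clustered setups.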

@Entity
@Cacheable
@org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Product {
  @Id private Long id;
  private String sku;
  private String name;
}

LazyInitializationException

Accessing a lazy association after the persistence context closed throws LazyInitializationException — classic stack trace mentions "no Session" or "could not initialize proxy."

@Transactional(readOnly = true)
public Order getOrder(long id) {
  return orderRepository.findById(id).orElseThrow();  // TX ends here
}

// Controller — outside TX
order.getLines().size();  // LazyInitializationException

Proper fixes:

Fetch needed associations inside TX (JOIN FETCH, EntityGraph)
Return DTOs from service—not entities with lazy graphs
@Transactional on the method that traverses the graph (if truly needed)

⚠️ Pitfall

Open Session In View (OSIV) — spring.jpa.open-in-view=true (Boot default) keeps session open through view rendering. Masks LazyInitializationException but causes lazy loads during JSON serialization—hidden N+1 in controllers. Disable in prod APIs: spring.jpa.open-in-view=false and fetch explicitly in services.

🔖 Version Note

Since Spring Boot 2.0, a startup warning is logged when spring.jpa.open-in-view is left at its default instead of being set explicitly. Boot 3 still defaults to true, so set it to false explicitly for REST microservices.

Session management in Spring

JpaTransactionManager binds an EntityManager to the current thread for the duration of each transaction, and Spring Data repositories participate automatically. In singleton beans, inject EntityManager via @PersistenceContext so you receive the shared, transaction-scoped proxy rather than one long-lived instance.

Auditing

Automatic population of created/modified timestamps and user IDs—standard in enterprise schemas without manual setter calls in every service method.

@Configuration
@EnableJpaAuditing(auditorAwareRef = "auditorProvider")
class JpaAuditingConfig {

  @Bean
  AuditorAware<String> auditorProvider() {
    return () -> Optional.ofNullable(SecurityContextHolder.getContext())
        .map(SecurityContext::getAuthentication)
        .filter(Authentication::isAuthenticated)
        .map(Authentication::getName);
  }
}

@MappedSuperclass
@EntityListeners(AuditingEntityListener.class)
public abstract class AuditableEntity {
  @CreatedDate
  @Column(nullable = false, updatable = false)
  private Instant createdAt;

  @LastModifiedDate
  @Column(nullable = false)
  private Instant updatedAt;

  @CreatedBy
  @Column(updatable = false, length = 64)
  private String createdBy;

  @LastModifiedBy
  @Column(length = 64)
  private String updatedBy;
}

@Entity
public class Order extends AuditableEntity {
  @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;
}

💡 Pro Tip

Use Instant (UTC) for audit timestamps—not LocalDateTime without zone. For system jobs without security context, AuditorAware should return Optional.of("system"), not empty (which skips @CreatedBy).
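The ambiguity of a zone-less LocalDateTime shows up as soon as two servers interpret the same stored value. A minimal JDK-only sketch; AuditTimestampDemo is a hypothetical helper.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

class AuditTimestampDemo {
  // The same wall-clock value maps to different instants depending on the zone
  // the reader assumes; an Instant carries no such ambiguity.
  static Instant interpret(LocalDateTime wallClock, String zoneId) {
    return wallClock.atZone(ZoneId.of(zoneId)).toInstant();
  }
}
```

Interpreting 2024-01-15T12:00 as UTC and as Asia/Tokyo gives two instants nine hours apart, which is the kind of drift an audit column should never allow.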

📦 Real World

Combine JPA auditing with DB-level triggers for compliance-heavy domains (immutable audit trail). Application auditing is convenient; database triggers survive direct SQL and admin tools.