One API call, hundreds of SQL queries (N+1)
Scenario
GET /orders returns 50 orders. Logs show 1 SELECT for orders and 50 SELECTs for line items—51 queries where one or two would suffice. Latency scales with list size; under load you hit connection pool and thread limits. You need to detect N+1 in production and fix it without breaking lazy-loading assumptions elsewhere.
After reading, you should be able to:
- Recognize N+1: query count grows linearly with collection size.
- Detect via queries-per-request, SQL logs, and Hibernate statistics.
- Fix with fetch joins, entity graphs, batch fetching, or a single DTO query.
- Guard with tests and metrics so N+1 does not return after refactors.
Why — one round trip becomes N
N+1 means: one query loads a parent list (the “1”), then for each row the ORM or code issues another query (the “N”)—often when accessing a @OneToMany or @ManyToOne association configured as LAZY.
Each query is cheap alone; hundreds per HTTP request dominate latency and pool usage.
Typical triggers (JPA/Hibernate)
- Loop over entities calling
order.getItems()outside a fetch plan. - JSON serialization touching lazy collections after the persistence context closed (or inside it—triggering loads).
spring.jpa.open-in-view=truehides the problem in dev by keeping session open through the whole MVC view—still N+1 in prod under load.- Repository
findAll()+ graph navigation in service layer. - Logging/debug that calls
toString()on entities with lazy refs.
-- Pattern in logs (same shape, different bind ids) SELECT * FROM orders WHERE user_id = ? SELECT * FROM line_item WHERE order_id = ? SELECT * FROM line_item WHERE order_id = ? SELECT * FROM line_item WHERE order_id = ? … repeated N times
What — find N+1 before users do
- Metric: SQL statements per HTTP request — Micrometer counter from datasource-proxy or custom filter; alert if p99 > threshold (e.g. 20).
- Trace JDBC child spans — one API trace with 100+ short JDBC spans of similar SQL — distributed trace.
-
Enable SQL logging in staging
logging.level.org.hibernate.SQL=DEBUG logging.level.org.hibernate.orm.jdbc.bind=TRACE # Or: datasource-proxy, p6spy
Count statements for one integration test call. -
Hibernate statistics (staging)
spring.jpa.properties.hibernate.generate_statistics=true # QueryExecutionCount vs entity load count
- Reproduce with realistic N — seed 100 parents; call endpoint; expect 2 queries, not 101.
- Locate code path — stack from p6spy to repository method; which association fired lazy load?
N+1 vs other “many queries”
| Pattern | Query count |
|---|---|
| N+1 lazy load | 1 + N (same SQL repeated) |
| Batch IN load | 1 + ceil(N/batch_size) — acceptable |
| Single JOIN FETCH | 1 (or 2 with pagination caution) |
| Missing index | Few queries, each very slow — DB slow |
How — fix patterns (JPA-focused)
1. JOIN FETCH (one query for graph)
@Query("""
SELECT DISTINCT o FROM Order o
JOIN FETCH o.items
WHERE o.userId = :userId
""")
List<Order> findOrdersWithItems(@Param("userId") Long userId);
Caution: cartesian product if multiple bags (List collections)—use Set, separate queries, or @EntityGraph.
2. @EntityGraph on repository method
@EntityGraph(attributePaths = {"items", "items.product"})
List<Order> findByUserId(Long userId);
3. @BatchSize (batch lazy loads)
@OneToMany(mappedBy = "order") @BatchSize(size = 25) private List<LineItem> items;
Issues fewer queries: WHERE order_id IN (?,?,…) instead of N singles—not as good as one join, but minimal code change.
4. DTO / projection query (best for APIs)
@Query("""
SELECT new com.app.OrderDto(o.id, o.status, i.sku, i.qty)
FROM Order o JOIN o.items i
WHERE o.userId = :uid
""")
List<OrderDto> listOrderRows(@Param("uid") Long uid);
Service layer groups rows—no entity graph, no lazy surprises.
5. Process hygiene
- Set
spring.jpa.open-in-view=false; fetch explicitly in service layer. - Do not return entities with lazy collections directly from REST—use DTOs.
- Integration test: assert query count (Hibernate
Statisticsor datasource-proxy).
Pagination + fetch
Do not JOIN FETCH a bag on a paged query—Hibernate may load all rows in memory. Pattern: page ids first, then WHERE id IN (:ids) with fetch join.
Verify
- Queries-per-request for endpoint: from 101 → 2 (or 3 with batch).
- p99 latency down under load test with 50–100 parents.
- Pool active connections drop — pool guide.
- CI test fails if query count regresses.
Interview one-liner
“N+1 is one query for the list plus one per row for a lazy association. I count SQL per request, fix with JOIN FETCH or a DTO query, turn off open-in-view, and add an integration test that asserts a constant query count for a fixed dataset.”