Request thread pool exhausted under load
Scenario
Under traffic, latency spikes and errors such as 503, TaskRejectedException, or “all threads are busy” appear. Tomcat (or a Netty worker with a blocking adapter) shows current threads = max threads. New requests queue or fail. You need to know whether the pool is too small, threads are stuck waiting, or work is never finishing.
After reading, you should be able to:
- Tell pool exhaustion from deadlock and lock BLOCKED contention.
- Read Tomcat / ExecutorService / ForkJoinPool metrics and thread dumps.
- Find why threads do not return (slow DB, HTTP, missing timeout, blocking on FJP).
- Mitigate with timeouts, bulkheads, backpressure, and correct sizing—not only “raise maxThreads.”
Why — finite workers, unbounded wait
A thread pool caps how many requests run at once. Each active thread is tied up until the handler returns. Exhaustion means every worker is busy (or blocked) and the queue is full or requests are rejected—so new work cannot start. The system looks “down” even if CPU is low: threads are waiting, not computing.
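The effect can be reproduced in a few lines. This is a minimal sketch, not production code: the class name, pool sizes, and the latch that stands in for a slow dependency are all assumptions made for illustration. Workers wait without using CPU, the queue fills, and later submissions are rejected.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolExhaustionDemo {

    /** Submits `tasks` waiting tasks to a bounded pool and returns how many were rejected. */
    static int countRejections(int workers, int queueSize, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize),
                new ThreadPoolExecutor.AbortPolicy());
        // Hypothetical stand-in for a slow DB or HTTP call that has not answered yet.
        CountDownLatch dependencyResponds = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            dependencyResponds.await(); // the worker waits; it does not compute
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // all workers busy and the queue is full
                }
            }
        } finally {
            dependencyResponds.countDown(); // the "dependency" finally answers
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 2 workers + 1 queue slot accept 3 tasks; the remaining 2 are rejected.
        System.out.println("rejected = " + countRejections(2, 1, 5)); // prints "rejected = 2"
    }
}
```

Nothing here is slow in CPU terms. The pool is "down" only because every worker is parked on the same unanswered call.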
Common pools in Java services
| Pool | Typical config | Symptom when exhausted |
|---|---|---|
| Tomcat / Jetty HTTP | maxThreads, acceptCount | All http-nio-*-exec-* busy; connections queue at OS |
| ExecutorService | core/max + bounded queue | RejectedExecutionException; custom worker names in dump |
| ForkJoinPool.commonPool() | parallelism = CPUs - 1 | CompletableFuture / parallel streams stall; workers blocked in I/O |
| Scheduled pool | fixed size | Jobs pile up; timers fire late |
Root causes (not “we need more threads”)
- Slow dependency — DB, cache, payment API; each request holds a thread for seconds.
- No timeouts — hung HTTP client or JDBC connection never releases the worker.
- Traffic spike — legitimate burst exceeds pool × acceptable latency.
- Blocking inside async — .get() on CompletableFuture on the request thread, or parallelStream() + blocking I/O on common pool.
- Pool too small vs connection pool — 200 Tomcat threads but 20 DB connections: 180 threads block on Hikari getConnection() (see DB pool guide).
- Hidden sync — many threads BLOCKED on one lock; pool looks full while work is serialized.
- Deadlock — all workers in a cycle; zero throughput.
- Thread leak — tasks submitted to a static pool never complete (rare but catastrophic).
Bigger maxThreads is not a free fix. More threads → more stack memory, more concurrent DB connections, more lock contention. Often the fix is faster or bounded work per request, not more parallelism.
What — confirm exhaustion and find what threads wait on
- Confirm pool saturation (metrics)
  - Tomcat: tomcat.threads.busy ≈ tomcat.threads.config.max; queue / accept queue growing.
  - Micrometer: executor.active pinned at the pool maximum, executor.queued growing, executor.completed flatlining while load continues.
  - Errors: HTTP 503, org.apache.tomcat.util.threads.ThreadPoolExecutor rejections, Spring TaskRejectedException.
- Thread dump while unhealthy

  jcmd <pid> Thread.print > /tmp/pool-$(date +%s).txt

  Count threads named http-nio-8080-exec-* (or your executor prefix). If the count is at or near the configured maximum and few are idle → exhaustion confirmed.
- Classify what busy threads are doing

  | Stack top | Likely cause |
  |---|---|
  | socketRead0, JDBC driver | Slow or stuck DB / network |
  | HttpClient / OkHttp read | Downstream API no timeout |
  | HikariPool.getConnection | Connection pool smaller than thread demand |
  | BLOCKED / waiting to lock | Lock contention (see BLOCKED guide) |
  | JVM “Found Java-level deadlock” | Deadlock |
  | TIMED_WAITING on pool queue | Submitters waiting for a worker; secondary symptom |
  | ForkJoinPool.awaitJoin | Blocking common pool; move blocking off FJP |
- Correlate with dependency latency — p99 DB/API up at same time as busy threads? Trace one slow request (distributed trace span: where did 8s go?).
- Check connection pool vs thread pool

  # Hikari (example JMX / metrics)
  hikaricp.connections.active ≈ maximum
  hikaricp.connections.pending > 0

  Many threads in getConnection → size or speed mismatch, not Tomcat alone.
- Review reject policy — AbortPolicy fails fast; CallerRunsPolicy runs the rejected task on the submitting thread, which blocks that caller (can deadlock servlet accept path if misused).
- Load vs capacity math (rough)

  Little’s law: concurrency ≈ throughput × latency. If RPS × p99 latency (in seconds) > maxThreads, you need lower latency, fewer concurrent slow calls, or more workers (last resort).
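The capacity math in the last step can be written out as a small calculation. This is a rough sketch: the class and method names are made up for illustration, and the traffic numbers (500 RPS, 200 workers, 200 ms vs 2 s latency) are hypothetical.

```java
class LittlesLaw {

    /** Estimated requests in flight: throughput (req/s) × latency, with latency given in milliseconds. */
    static double requiredConcurrency(int requestsPerSecond, int latencyMillis) {
        return requestsPerSecond * latencyMillis / 1000.0;
    }

    /** True when the estimated in-flight requests exceed the available workers. */
    static boolean exceedsPool(int requestsPerSecond, int latencyMillis, int maxThreads) {
        return requiredConcurrency(requestsPerSecond, latencyMillis) > maxThreads;
    }

    public static void main(String[] args) {
        int rps = 500;        // hypothetical steady traffic
        int maxThreads = 200; // hypothetical worker pool size

        // Healthy dependency, 200 ms: about 100 requests in flight, pool has headroom.
        System.out.println(requiredConcurrency(rps, 200) + " in flight, over capacity: "
                + exceedsPool(rps, 200, maxThreads));

        // Dependency slows to 2 s: about 1000 in flight, five times the pool.
        System.out.println(requiredConcurrency(rps, 2000) + " in flight, over capacity: "
                + exceedsPool(rps, 2000, maxThreads));
    }
}
```

Traffic did not change between the two cases. Only latency did, which is why raising maxThreads from 200 to 400 would still leave the second case over capacity.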
ForkJoinPool.commonPool() specifically
parallelStream() and default CompletableFuture.supplyAsync use the common pool.
Blocking I/O on those threads reduces effective parallelism for the whole JVM.
Thread dump: many ForkJoinPool.commonPool-worker-* in socket read or synchronized.
// Anti-pattern: blocking HTTP on common pool
list.parallelStream().forEach(id -> httpClient.get("/item/" + id));
// Better: dedicated executor with bounded queue + timeouts
executor.submit(() -> httpClient.get(...));
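A fuller version of the "better" variant might look like the sketch below. It assumes a hypothetical fetchItem method that sleeps in place of a real blocking HTTP call, and the pool size, queue size, thread name prefix, and 2 s timeout are illustrative values, not recommendations.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class DedicatedIoExecutor {

    private static final AtomicInteger THREAD_IDS = new AtomicInteger();

    // Bounded on both sides: at most 4 concurrent calls and 100 waiting tasks.
    // Beyond that, submission fails with RejectedExecutionException (default AbortPolicy).
    static final ExecutorService IO_POOL = new ThreadPoolExecutor(
            4, 4, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(100),
            runnable -> {
                Thread t = new Thread(runnable, "item-io-" + THREAD_IDS.incrementAndGet());
                t.setDaemon(true); // lets this demo JVM exit without an explicit shutdown
                return t;
            });

    // Hypothetical stand-in for a blocking HTTP call; returns the id and the thread that ran it.
    static String fetchItem(int id) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return id + "@" + Thread.currentThread().getName();
    }

    static List<String> fetchAll(List<Integer> ids) {
        List<CompletableFuture<String>> futures = ids.stream()
                .map(id -> CompletableFuture
                        .supplyAsync(() -> fetchItem(id), IO_POOL) // explicit executor, not the common pool
                        .orTimeout(2, TimeUnit.SECONDS))           // Java 9+: fail the future if it takes too long
                .collect(Collectors.toList());
        // join() blocks the caller, but each future completes or times out within about 2 s.
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        fetchAll(List.of(1, 2, 3)).forEach(System.out::println);
    }
}
```

The thread names in the output show the work running on item-io-* threads rather than ForkJoinPool.commonPool-worker-*, which also makes these workers easy to find in a thread dump. Note that orTimeout only fails the future; it does not interrupt the worker, so the underlying client call still needs its own timeout.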
Capture before restart
- Thread dump + 1 min of metrics (threads busy, queue, DB pool, downstream p99).
- Recent deploy / feature flag / traffic pattern change.
- Sample slow trace IDs for log correlation.
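When attaching jcmd is not possible, a coarse in-process summary can be captured with the standard ThreadMXBean API. This is a sketch under assumptions: the class name is made up, it only sees threads of the JVM it runs in (so it would have to be wired into the service, for example behind an admin endpoint), and the default prefix is the Tomcat one used earlier in this guide.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;

class PoolThreadSummary {

    /** Counts live threads whose name starts with `prefix`, grouped by thread state. */
    static Map<Thread.State, Integer> statesByPrefix(String prefix) {
        Map<Thread.State, Integer> counts = new TreeMap<>();
        // No monitor or synchronizer details requested: cheaper, and states are enough here.
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info != null && info.getThreadName().startsWith(prefix)) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String prefix = args.length > 0 ? args[0] : "http-nio-8080-exec-";
        System.out.println(prefix + " -> " + statesByPrefix(prefix));
    }
}
```

A result dominated by BLOCKED points toward lock contention, while mostly RUNNABLE or WAITING threads usually need the full stack from the thread dump to tell I/O waits from idle workers. This summary complements the dump; it does not replace it.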
How — restore service and fix the bottleneck
Immediate mitigation
- Scale replicas — spreads load if dependency is healthy; does not fix per-pod stuck threads.
- Throttle at edge — API gateway rate limit, load shed non-critical routes.
- Circuit break / disable feature — stop calling slow dependency until recovered.
- Temporary maxThreads bump — only with headroom on DB connections and memory; watch for worse contention.
- Restart stuck pods — after dumps saved; if threads were deadlocked or leaked.
Durable fixes
| Lever | Action |
|---|---|
| Timeouts everywhere | JDBC, HTTP client, Redis—fail fast; return 504/503 with retry-safe semantics |
| Align pool sizes | maxThreads ≤ DB pool capacity you can afford; or async DB access with smaller blocking footprint |
| Bulkhead | Separate executor for heavy/reporting vs API; cap concurrent calls to fragile dependency |
| Backpressure | Bounded queue + reject; don’t accept infinite work |
| Virtual threads (Java 21+) | Run many blocking I/O requests on cheap virtual threads; still bound DB connections and CPU-heavy work |
| Async I/O model | WebFlux / reactive only if team commits; don’t block event loop |
| Fix slow path | Index, cache, batch API calls, remove N+1 queries |
| Never block common FJP | Custom Executor for blocking tasks |
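For the "timeouts everywhere" row, the JDK's built-in java.net.http client (Java 11+) accepts both a connect timeout and a per-request timeout. The sketch below only builds the client and request; the class name, the 1 s and 3 s budgets, and the localhost URL are assumptions for illustration. Other clients (JDBC drivers, OkHttp, Redis clients) have their own settings that are not shown here.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

class TimeoutDefaults {

    // Assumed budgets; pick values from the dependency's real latency profile.
    static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(1);
    static final Duration REQUEST_TIMEOUT = Duration.ofSeconds(3);

    /** Client that gives up on connection establishment after CONNECT_TIMEOUT. */
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .connectTimeout(CONNECT_TIMEOUT)
                .build();
    }

    /** GET request that fails with HttpTimeoutException if no response arrives within REQUEST_TIMEOUT. */
    static HttpRequest get(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(REQUEST_TIMEOUT)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = newClient();
        HttpRequest request = get("http://localhost:8080/item/42"); // hypothetical endpoint
        System.out.println("connect timeout: " + client.connectTimeout().orElse(null));
        System.out.println("request timeout: " + request.timeout().orElse(null));
    }
}
```

Without these two calls the client has no connect or request timeout, so a hung downstream holds the request thread indefinitely.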
Sizing guidance (starting point)
- CPU-bound work: threads ≈ CPU cores (or cores + small constant).
- I/O-bound work: higher thread count can help if dependencies respond quickly; otherwise fix latency first.
- Hikari: keep maximumPoolSize modest (often 10–50 per instance); total across replicas must fit DB max_connections.
- Load test: ramp RPS until p99 SLO breaks; note thread busy % at that point.
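The two pool-alignment checks above reduce to simple arithmetic. The sketch below uses made-up names and hypothetical fleet numbers; the 200-thread / 20-connection case is the one from the root-cause list.

```java
class PoolSizingCheck {

    /** Connections the whole fleet can open if every replica fills its Hikari pool. */
    static int fleetConnections(int replicas, int maximumPoolSize) {
        return replicas * maximumPoolSize;
    }

    /** True when the fleet fits under the DB limit, leaving `reserved` connections for admin and other clients. */
    static boolean fitsDatabase(int replicas, int maximumPoolSize, int dbMaxConnections, int reserved) {
        return fleetConnections(replicas, maximumPoolSize) <= dbMaxConnections - reserved;
    }

    /** Worst case: request threads left waiting in getConnection() when every thread needs a connection. */
    static int threadsWithoutConnection(int maxThreads, int maximumPoolSize) {
        return Math.max(0, maxThreads - maximumPoolSize);
    }

    public static void main(String[] args) {
        // 200 Tomcat threads against a 20-connection pool: up to 180 threads can block.
        System.out.println("waiting threads: " + threadsWithoutConnection(200, 20));

        // Hypothetical fleet: 10 replicas × 20 connections against max_connections = 300, 10 reserved.
        System.out.println("fits DB: " + fitsDatabase(10, 20, 300, 10));

        // Doubling replicas without shrinking the pool overshoots the same DB.
        System.out.println("fits DB after scale-out: " + fitsDatabase(20, 20, 300, 10));
    }
}
```

The last case is worth checking before using replica scale-out as a mitigation: more pods can push the database past max_connections.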
Verify
- Peak-load test: busy threads < 80% max under target RPS.
- No growth in accept queue / executor queue at steady state.
- Thread dumps: mix of idle workers; blocked threads only briefly.
- Alerts: busy threads > 85% for 5 min; pending Hikari connections > 0.
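For an application-owned ThreadPoolExecutor, the busy-thread threshold can be sampled directly from the pool. This is a single-sample sketch with made-up names; getActiveCount() is documented as approximate, and the "for 5 min" part of the alert is assumed to live in the monitoring system, not in this code.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SaturationCheck {

    /** Fraction of the pool's maximum size that is currently executing tasks (approximate). */
    static double busyRatio(ThreadPoolExecutor pool) {
        return (double) pool.getActiveCount() / pool.getMaximumPoolSize();
    }

    /** True when one sample of the busy ratio is above `threshold`, e.g. 0.85. */
    static boolean aboveThreshold(ThreadPoolExecutor pool, double threshold) {
        return busyRatio(pool) > threshold;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        // Idle pool: no active workers, so the check stays quiet.
        System.out.println("busy ratio: " + busyRatio(pool)
                + ", alert: " + aboveThreshold(pool, 0.85));
        pool.shutdown();
    }
}
```

In a Spring Boot service the same numbers are usually already exported as the executor.* and tomcat.threads.* metrics mentioned earlier, so a check like this is mainly useful for pools that are not instrumented.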
Interview one-liner
“I check whether all request threads are busy, take a dump to see if they’re waiting on DB/HTTP or locks, align connection pool with thread demand, add timeouts and bulkheads, and only then tune maxThreads—after fixing slow dependencies.”
Related scenarios
- Deadlock
- Thread BLOCKED
- Shared data races
- Randomly unresponsive
- Slow after a few hours — leak vs pool saturation