Request thread pool exhausted under load

Scenario

Under traffic, latency spikes and clients see errors such as HTTP 503, TaskRejectedException, or “all threads are busy.” Tomcat (or Netty workers behind a blocking adapter) shows current threads = max threads, and new requests queue or fail. You need to know whether the pool is too small, threads are stuck waiting, or work never finishes.

After reading, you should be able to:

Tell pool exhaustion apart from deadlock and from lock contention (threads in BLOCKED state).
Read Tomcat / ExecutorService / ForkJoinPool metrics and thread dumps.
Find why threads do not return (slow DB, HTTP, missing timeout, blocking on FJP).
Mitigate with timeouts, bulkheads, backpressure, and correct sizing—not only “raise maxThreads.”

Why — finite workers, unbounded wait

A thread pool caps how many requests run at once. Each active thread is tied up until the handler returns. Exhaustion means every worker is busy (or blocked) and the queue is full or requests are rejected—so new work cannot start. The system looks “down” even if CPU is low: threads are waiting, not computing.
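To make the failure mode concrete, here is a minimal sketch of exhaustion with idle CPU. The class and method names are illustrative, and the stalled dependency is simulated with a latch rather than a real DB or HTTP call.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolExhaustionDemo {

    /** Submits requests that all wait on a stalled dependency; returns how many were rejected. */
    static int countRejected(int maxThreads, int queueCapacity, int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),   // bounded queue
                new ThreadPoolExecutor.AbortPolicy());     // reject when full
        CountDownLatch dependencyResponds = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.execute(() -> {
                    try {
                        dependencyResponds.await();        // worker is held, CPU stays idle
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++;                                // no free worker, no queue slot
            }
        }
        dependencyResponds.countDown();                    // dependency recovers
        pool.shutdown();
        return rejected;
    }

    public static void main(String[] args) {
        // 2 workers + 2 queue slots absorb 4 requests; the other 6 are rejected.
        System.out.println("rejected = " + countRejected(2, 2, 10));
    }
}
```

Nothing here burns CPU: every worker is parked waiting, which is why a saturated service can look idle on a CPU graph.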

Common pools in Java services

Pool	Typical config	Symptom when exhausted
Tomcat / Jetty HTTP	`maxThreads`, `acceptCount`	All `http-nio-<port>-exec-*` threads busy; connections queue in the OS accept backlog
`ExecutorService`	core/max + bounded queue	`RejectedExecutionException`; custom worker names in dump
`ForkJoinPool.commonPool()`	`parallelism = CPUs - 1`	`CompletableFuture` / parallel streams stall; workers blocked in I/O
Scheduled pool	fixed size	Jobs pile up; timers fire late

Root causes (not “we need more threads”)

Slow dependency — DB, cache, payment API; each request holds a thread for seconds.
No timeouts — hung HTTP client or JDBC connection never releases the worker.
Traffic spike — a legitimate burst exceeds what the pool can serve at acceptable latency (roughly maxThreads ÷ time per request).
Blocking inside async — .get() on CompletableFuture on the request thread, or parallelStream() + blocking I/O on common pool.
Thread pool larger than connection pool — 200 Tomcat threads but 20 DB connections: up to 180 threads block in Hikari getConnection(). See DB pool guide.
Hidden sync — many threads BLOCKED on one lock; pool looks full while work is serialized.
Deadlock — all workers in a cycle; zero throughput.
Thread leak — tasks submitted to a static pool never complete (rare but catastrophic).

Bigger maxThreads is not a free fix. More threads → more stack memory, more concurrent DB connections, more lock contention. Often the fix is faster or bounded work per request, not more parallelism.
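Bounding work per request usually starts with explicit timeouts on every outbound call. A minimal sketch using the JDK's built-in `java.net.http.HttpClient` (Java 11+) follows; the class name, the URL, and the timeout values you pass in are illustrative assumptions, not recommendations.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

final class BoundedHttpCall {

    /** Client that gives up on connection setup after connectTimeout. */
    static HttpClient newClient(Duration connectTimeout) {
        return HttpClient.newBuilder()
                .connectTimeout(connectTimeout)
                .build();
    }

    /** Request that fails with HttpTimeoutException if no response arrives within responseTimeout. */
    static HttpRequest newRequest(String url, Duration responseTimeout) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(responseTimeout)
                .GET()
                .build();
    }
}
```

A caller would then run `client.send(request, HttpResponse.BodyHandlers.ofString())` and map `HttpTimeoutException` to a fast 503/504, so the worker thread is released after a bounded wait instead of hanging.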

What — confirm exhaustion and find what threads wait on

Confirm pool saturation (metrics)
- Tomcat: tomcat.threads.busy ≈ tomcat.threads.config.max; queue / accept queue growing.
- Micrometer: executor.active pinned at the maximum, executor.queued growing, executor.completed flat while load continues.
- Errors: HTTP 503, org.apache.tomcat.util.threads.ThreadPoolExecutor rejections, Spring TaskRejectedException.
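If no metrics library is wired up, the same signals can be read directly from a `ThreadPoolExecutor`. This is a rough sketch: `PoolSaturation` and its "saturated" definition are assumptions for illustration, and `getActiveCount()` and `getCompletedTaskCount()` are documented as approximate values.

```java
import java.util.concurrent.ThreadPoolExecutor;

final class PoolSaturation {

    /** True when every worker is busy and the queue cannot take more work. */
    static boolean isSaturated(int active, int maxThreads, int queueRemaining) {
        return active >= maxThreads && queueRemaining == 0;
    }

    /** Same check against a live pool; an unbounded queue never reports zero remaining capacity. */
    static boolean isSaturated(ThreadPoolExecutor pool) {
        return isSaturated(pool.getActiveCount(),
                pool.getMaximumPoolSize(),
                pool.getQueue().remainingCapacity());
    }

    /** One log line per sample; "completed" not rising while "queued" grows suggests stuck workers. */
    static String describe(ThreadPoolExecutor pool) {
        return String.format("active=%d max=%d queued=%d completed=%d",
                pool.getActiveCount(), pool.getMaximumPoolSize(),
                pool.getQueue().size(), pool.getCompletedTaskCount());
    }
}
```

Logging `describe(pool)` every few seconds during an incident gives a crude timeline to compare against dependency latency.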
Thread dump while unhealthy
```
jcmd <pid> Thread.print > /tmp/pool-$(date +%s).txt
```
Count threads named http-nio-8080-exec-* (or your executor prefix). If the count is at the configured maximum and few of them are idle → exhaustion confirmed.
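Counting by hand gets tedious on a 200-thread dump. The sketch below tallies thread states for one name prefix; it assumes the usual HotSpot dump layout (a quoted thread name on the header line, followed by a `java.lang.Thread.State:` line), and `DumpSummary` is an illustrative name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DumpSummary {

    /** Counts thread states for threads whose name starts with namePrefix. */
    static Map<String, Integer> countStates(List<String> dumpLines, String namePrefix) {
        Map<String, Integer> counts = new TreeMap<>();
        boolean inMatchingThread = false;
        for (String line : dumpLines) {
            if (line.startsWith("\"")) {
                // Header line: "thread-name" #id daemon prio=...
                int end = line.indexOf('"', 1);
                String name = end > 0 ? line.substring(1, end) : "";
                inMatchingThread = name.startsWith(namePrefix);
            } else if (inMatchingThread && line.contains("java.lang.Thread.State:")) {
                // State line: java.lang.Thread.State: WAITING (parking)
                String state = line.substring(line.indexOf(':') + 1).trim().split("\\s+")[0];
                counts.merge(state, 1, Integer::sum);
                inMatchingThread = false;
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DumpSummary.java /tmp/pool-<timestamp>.txt http-nio-8080-exec-
        System.out.println(countStates(Files.readAllLines(Path.of(args[0])), args[1]));
    }
}
```

A thread blocked in a socket read reports RUNNABLE, so the state tally alone does not separate CPU work from I/O wait; the top stack frames do.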

Classify what busy threads are doing

Stack top	Likely cause
`socketRead0`, JDBC driver	Slow or stuck DB / network
`HttpClient` / OkHttp read	Downstream API no timeout
`HikariPool.getConnection`	Connection pool smaller than thread demand
`BLOCKED` / `waiting to lock`	Lock contention — see BLOCKED guide
JVM “Found Java-level deadlock”	Deadlock
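`TIMED_WAITING` on pool queue	A worker parked in the queue's `poll`/`take` is idle, not stuck; a caller parked in `put`/`offer` or `Future.get` is a submitter waiting for a worker (secondary symptom)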
`ForkJoinPool.awaitJoin`	Blocking common pool; move blocking off FJP

Correlate with dependency latency — p99 DB/API up at same time as busy threads? Trace one slow request (distributed trace span: where did 8s go?).
Check connection pool vs thread pool
```
# Hikari (example JMX / metrics)
hikaricp.connections.active ≈ maximum
hikaricp.connections.pending > 0
```
Many threads in getConnection → size or speed mismatch, not Tomcat alone.
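Review reject policy — AbortPolicy fails fast with RejectedExecutionException; CallerRunsPolicy runs the rejected task on the submitting thread, blocking it for the task's duration (can stall or deadlock the servlet accept path if misused).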
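The difference between the two policies is easy to see on a deliberately tiny pool. This is a sketch with illustrative names; the single worker is held by a latch so the second task always overflows.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class RejectPolicyDemo {

    /** Saturates a one-thread pool, submits one more task, and reports what happened to it. */
    static String overflowOutcome(RejectedExecutionHandler policy) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>(), policy);
        CountDownLatch release = new CountDownLatch(1);
        AtomicReference<String> outcome = new AtomicReference<>("not run");
        try {
            pool.execute(() -> {                 // occupies the only worker
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Overflow task: no free worker and no queue capacity.
            pool.execute(() -> outcome.set("ran on " + Thread.currentThread().getName()));
        } catch (RejectedExecutionException e) {
            outcome.set("rejected");             // AbortPolicy: caller is told immediately
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(overflowOutcome(new ThreadPoolExecutor.AbortPolicy()));
        System.out.println(overflowOutcome(new ThreadPoolExecutor.CallerRunsPolicy()));
    }
}
```

With CallerRunsPolicy the overflow task runs on the submitting thread (here `main`), so whatever that thread was supposed to be doing stops until the task finishes.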
Load vs capacity math (rough)
Little’s law: average concurrency ≈ throughput × average latency. If RPS × latency approaches or exceeds maxThreads (use p99 latency for a conservative bound), you need lower latency, fewer concurrent slow calls, or more workers (last resort).
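As a worked example of that arithmetic (the traffic numbers are made up for illustration):

```java
final class LittlesLaw {

    /** Threads needed on average to sustain the given rate at the given per-request latency. */
    static double requiredConcurrency(double requestsPerSecond, double latencySeconds) {
        return requestsPerSecond * latencySeconds;
    }

    /** True when the required concurrency is more than the pool can provide. */
    static boolean exceedsPool(double requestsPerSecond, double latencySeconds, int maxThreads) {
        return requiredConcurrency(requestsPerSecond, latencySeconds) > maxThreads;
    }

    public static void main(String[] args) {
        int maxThreads = 200;
        // Healthy: 500 RPS at 0.2 s per request needs about 100 threads.
        System.out.println(exceedsPool(500, 0.2, maxThreads));
        // Dependency slows to 0.8 s: the same 500 RPS now needs about 400 threads.
        System.out.println(exceedsPool(500, 0.8, maxThreads));
    }
}
```

The traffic did not change between the two cases; only latency did. That is why a slow dependency exhausts a pool that was sized comfortably for normal conditions.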

ForkJoinPool.commonPool() specifically

parallelStream() and CompletableFuture.supplyAsync without an explicit executor both use the common pool. Blocking I/O on those threads reduces effective parallelism for the whole JVM. In a thread dump, look for many ForkJoinPool.commonPool-worker-* threads sitting in a socket read or waiting on a synchronized monitor.

```
// Anti-pattern: blocking HTTP on common pool
list.parallelStream().forEach(id -> httpClient.get("/item/" + id));

// Better: dedicated executor with bounded queue + timeouts
executor.submit(() -> httpClient.get(...));
```
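A fuller sketch of the "dedicated executor" idea, under stated assumptions: the pool sizes, the `blocking-io-` thread prefix, and the `BlockingCalls` helper are illustrative, and `orTimeout` requires Java 9+.

```java
import java.time.Duration;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class BlockingCalls {

    /** Fixed-size pool with a bounded queue; rejects instead of queueing without limit. */
    static ThreadPoolExecutor newBlockingIoPool(int threads, int queueCapacity) {
        AtomicInteger counter = new AtomicInteger();
        return new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                runnable -> {
                    // Named threads make this pool easy to spot in a thread dump.
                    Thread t = new Thread(runnable, "blocking-io-" + counter.incrementAndGet());
                    t.setDaemon(true);
                    return t;
                },
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Runs a blocking call on the given executor, never on the common pool. */
    static <T> CompletableFuture<T> callWithTimeout(
            Supplier<T> blockingCall, Executor executor, Duration timeout) {
        return CompletableFuture.supplyAsync(blockingCall, executor)
                .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS);
    }
}
```

One caveat: `orTimeout` fails the future for the caller but does not interrupt the worker that is still blocked, so the underlying client call needs its own timeout as well.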

Capture before restart

Thread dump + 1 min of metrics (threads busy, queue, DB pool, downstream p99).
Recent deploy / feature flag / traffic pattern change.
Sample slow trace IDs for log correlation.

How — restore service and fix the bottleneck

Immediate mitigation

Scale replicas — spreads load if dependency is healthy; does not fix per-pod stuck threads.
Throttle at edge — API gateway rate limit, load shed non-critical routes.
Circuit break / disable feature — stop calling slow dependency until recovered.
Temporary maxThreads bump — only with headroom on DB connections and memory; watch for worse contention.
Restart stuck pods — after dumps saved; if threads were deadlocked or leaked.

Durable fixes

Lever	Action
Timeouts everywhere	JDBC, HTTP client, Redis—fail fast; return 504/503 with retry-safe semantics
Align pool sizes	`maxThreads` ≤ DB pool capacity you can afford; or async DB access with smaller blocking footprint
Bulkhead	Separate executor for heavy/reporting vs API; cap concurrent calls to fragile dependency
Backpressure	Bounded queue + reject; don’t accept infinite work
Virtual threads (Java 21+)	Blocking I/O requests run on cheap virtual threads multiplexed over a few carrier threads; still bound DB connections and CPU-heavy work
Async I/O model	WebFlux / reactive only if team commits; don’t block event loop
Fix slow path	Index, cache, batch API calls, remove N+1 queries
Never block common FJP	Custom `Executor` for blocking tasks
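The bulkhead row can be sketched with a plain `Semaphore` that caps concurrent calls to one fragile dependency. This is a minimal illustration, not a replacement for a resilience library; the `Bulkhead` class, its fallback parameter, and the wait budget are assumptions.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class Bulkhead {
    private final Semaphore permits;
    private final long maxWaitMillis;

    Bulkhead(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Runs the call if a permit frees up within the wait budget; otherwise sheds to the fallback. */
    <T> T call(Supplier<T> dependencyCall, Supplier<T> fallback) {
        boolean acquired = false;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
            if (!acquired) {
                return fallback.get();           // dependency is saturated: fail fast
            }
            return dependencyCall.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        } finally {
            if (acquired) {
                permits.release();
            }
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

With, say, 20 permits in front of a slow reporting API, at most 20 request threads can be stuck there at once; the rest get the fallback immediately and the pool keeps serving other routes.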

Sizing guidance (starting point)

CPU-bound work: threads ≈ CPU cores (or cores + small constant).
I/O-bound work: higher thread count can help if dependencies respond quickly; otherwise fix latency first.
Hikari: keep maximumPoolSize modest (often 10–50 per instance); the total across all replicas must fit within the DB's max_connections.
Load test: ramp RPS until p99 SLO breaks; note thread busy % at that point.
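The connection budget across replicas is simple arithmetic and worth checking before any scale-out. In the sketch below, the `reserved` parameter (connections kept back for admin sessions, migrations, and other clients) is an assumption added for illustration.

```java
final class ConnectionBudget {

    /** Worst-case connections opened by all replicas together. */
    static int totalConnections(int replicas, int maximumPoolSize) {
        return replicas * maximumPoolSize;
    }

    /** True if every replica can fill its pool without exceeding the database limit. */
    static boolean fits(int replicas, int maximumPoolSize, int dbMaxConnections, int reserved) {
        return totalConnections(replicas, maximumPoolSize) <= dbMaxConnections - reserved;
    }

    /** Largest per-replica pool size that still fits the database limit. */
    static int maxPoolSizePerReplica(int replicas, int dbMaxConnections, int reserved) {
        return (dbMaxConnections - reserved) / replicas;
    }

    public static void main(String[] args) {
        // 10 replicas x 20 connections against max_connections = 300 with 20 reserved.
        System.out.println(fits(10, 20, 300, 20));
        // Scaling to 20 replicas without shrinking the pool no longer fits.
        System.out.println(fits(20, 20, 300, 20));
    }
}
```

This is also why scaling replicas as a mitigation can backfire: doubling the pods doubles the potential connections against the same database.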

Verify

Peak-load test: busy threads < 80% max under target RPS.
No growth in accept queue / executor queue at steady state.
Thread dumps: mix of idle workers; blocked threads only briefly.
Alerts: busy threads > 85% for 5 min; pending Hikari connections > 0.

Interview one-liner

“I check whether all request threads are busy, take a dump to see if they’re waiting on DB/HTTP or locks, align connection pool with thread demand, add timeouts and bulkheads, and only then tune maxThreads—after fixing slow dependencies.”

Related scenarios

Deadlock
Thread BLOCKED
Shared data races
Randomly unresponsive
Slow after a few hours — leak vs pool saturation