Metaspace keeps growing
Scenario
Metaspace (or “class metadata”) usage climbs over days. The Java heap looks fine, but you see OutOfMemoryError: Metaspace, long full GC pauses, or RSS growth in the off-heap gap. It often starts after many hot deploys or enabling a dynamic-code feature. How do you confirm a classloader leak vs normal class loading?
After reading, you should be able to:
- Explain what metaspace stores and how it differs from the Java heap.
- Spot classloader leaks vs legitimate class growth.
- Use JMX, NMT, and class histograms to find retainers.
- Apply MaxMetaspaceSize and deployment practices that prevent runaway growth.
JDK 8+ stores class metadata in metaspace, which lives in native memory and replaced PermGen. JDK 7 and earlier used PermGen, a fixed-size region sized with -XX:MaxPermSize; treat it as legacy if you still support it.
Why — classes stay loaded until their classloader is collected
Every loaded class needs metadata: bytecodes, constant pools, method tables, annotations. Metaspace holds that data in native memory. A class becomes eligible for unloading only when its defining classloader is garbage-collected. If something still references the loader, every class it loaded stays in metaspace.
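The retention rule can be illustrated with a small sketch. It assumes a hypothetical long-lived registry (LONG_LIVED_REGISTRY) standing in for a container singleton, and it tracks only the loader object itself; any classes the loader had defined would share its fate. System.gc() is only a hint, so the sketch guarantees just the retained case: a strongly referenced loader is never collected.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

final class LoaderRetentionDemo {
    // Hypothetical stand-in for a singleton that outlives deployments.
    static final List<Object> LONG_LIVED_REGISTRY = new ArrayList<>();

    /** Returns true if the loader is still reachable after a GC request. */
    static boolean loaderSurvivesGc(boolean leakIntoRegistry) {
        ClassLoader appLoader = new URLClassLoader(new URL[0], null);
        WeakReference<ClassLoader> probe = new WeakReference<>(appLoader);
        if (leakIntoRegistry) {
            // One stray strong reference is enough to pin the loader.
            LONG_LIVED_REGISTRY.add(appLoader);
        }
        appLoader = null; // drop the local strong reference
        System.gc();      // a request, not a guarantee
        return probe.get() != null;
    }

    public static void main(String[] args) {
        System.out.println("leaked loader survives GC:   " + loaderSurvivesGc(true));
        System.out.println("released loader survives GC: " + loaderSurvivesGc(false));
    }
}
```

With the registry reference in place the loader always survives. Without it, the loader is merely eligible for collection, and whether it is gone after one System.gc() call depends on the collector.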
Normal growth vs leak
| Pattern | Metaspace | Typical cause |
|---|---|---|
| Plateau after warm-up | Rises then flat | App loaded all frameworks; CGLIB proxies generated once |
| Stair-step with redeploys | Jump per deploy, never drops | Classloader leak — old WAR loaders retained |
| Linear over days | Steady climb | Dynamic scripts, per-tenant classloaders, plugin architecture bug |
| Spike at feature flag | One-time jump | New library path loading many classes (may be OK) |
Common causes in production
- Hot redeploy without process restart: Tomcat/JBoss “reload” creates new classloaders; old ones leaked via static singletons, ThreadLocal, or JDBC driver registration.
- Spring / CGLIB / ByteBuddy: many synthetic subclasses; usually bounded unless generated per request with new loader.
- Groovy, JSP, dynamic rules engines: compile on the fly → new classes each eval if not cached carefully.
- OSGi / plugin jars: load/unload cycles; one plugin version retained in a global map.
- Reflection & serialization libraries: cache Class objects or loaders incorrectly.
- JDBC drivers: DriverManager holds driver classloader reference (classic leak on redeploy).
- No metaspace cap: default can grow until native OOM; looks like “RSS leak” before Java reports Metaspace OOM.
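For the dynamic-code case, "cached carefully" might look like the sketch below: a size-bounded LRU cache keyed by source text, so repeated evaluations reuse one compiled artifact instead of generating a new class each time. CompiledScriptCache and the compiler function are hypothetical names, not part of any scripting engine's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: at most one compiled artifact per distinct source text.
final class CompiledScriptCache<V> {
    private final int maxEntries;
    private final Function<String, V> compiler;
    private final Map<String, V> cache;
    int compileCount = 0; // exposed only to make the demo observable

    CompiledScriptCache(int maxEntries, Function<String, V> compiler) {
        this.maxEntries = maxEntries;
        this.compiler = compiler;
        // accessOrder=true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > CompiledScriptCache.this.maxEntries;
            }
        };
    }

    synchronized V get(String source) {
        V compiled = cache.get(source);
        if (compiled == null) {
            compiled = compiler.apply(source); // the expensive, class-generating step
            compileCount++;
            cache.put(source, compiled);       // may evict the LRU entry
        }
        return compiled;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

Eviction only drops the cache's reference. The generated class is unloaded only once its defining classloader becomes unreachable, so the bound limits growth only if evicted artifacts are not pinned elsewhere.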
A heap dump opened in MAT does not show metaspace contents, because that memory is native. Use metaspace metrics, the NMT Class category, and classloader histograms, not heap dumps alone.
Symptoms you see
- java.lang.OutOfMemoryError: Metaspace (clear signal).
- Full GC trying to unload classes; long pauses (GC pauses).
- RSS up, heap flat (heap vs RSS).
- Pod OOMKilled if total native memory exceeds cgroup limit before Metaspace OOM is thrown.
What — confirm and find the leaking loader (in order)
- Graph metaspace used vs time
  JMX: java.lang:type=MemoryPool,name=Metaspace, attribute Usage (keys used and max; max is -1 when no cap is set). Overlay deploy times and request rate.
- Correlate with redeploys
  A stair-step that never falls after each deploy makes a classloader leak the prime suspect.
- NMT: Class category
  Requires the JVM to be started with -XX:NativeMemoryTracking=summary.
      jcmd <pid> VM.native_memory summary | grep -A2 Class
      jcmd <pid> VM.native_memory baseline
      jcmd <pid> VM.native_memory summary.diff
  Take the baseline first; summary.diff reports the change relative to it. Growing Class committed with flat heap → metaspace focus.
- Classloader count (if available)
  Some APMs expose loaded classloader count; or analyze heap dump for ClassLoader instances (heap dump *does* show loader objects even when metaspace is native).
- Heap dump: duplicate classloaders for same app
  In MAT: histogram ClassLoader → group by class name of loader (WebappClassLoader, LaunchedURLClassLoader). Hundreds of dead app loaders = leak.
- Path to GC roots from old classloader
  Find why loader is retained: static field on singleton, Thread, ThreadLocal, JMX bean, DriverManager.
- Loaded class count
  JMX: java.lang:type=ClassLoading, attribute LoadedClassCount. Should stabilize; monotonic increase = leak or unbounded codegen.
- Check for dynamic compilation
  Search config for Groovy, Janino, MVEL, runtime Java compile, per-request proxies.
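The two JMX readings from the steps above can also be taken in-process through the platform MXBeans, which is handy for a health endpoint or a quick log line. This is a minimal sketch assuming a HotSpot JVM, where the pool is named "Metaspace"; MetaspaceSnapshot is a hypothetical holder class.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical holder for one point-in-time reading.
final class MetaspaceSnapshot {
    final long usedBytes;
    final long maxBytes;        // -1 when no MaxMetaspaceSize is set
    final int loadedClasses;    // currently loaded
    final long unloadedClasses; // unloaded since JVM start

    private MetaspaceSnapshot(long used, long max, int loaded, long unloaded) {
        this.usedBytes = used;
        this.maxBytes = max;
        this.loadedClasses = loaded;
        this.unloadedClasses = unloaded;
    }

    static MetaspaceSnapshot capture() {
        MemoryUsage usage = null;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Pool name is HotSpot-specific; other JVMs name their pools differently.
            if ("Metaspace".equals(pool.getName())) {
                usage = pool.getUsage();
            }
        }
        if (usage == null) {
            throw new IllegalStateException("No Metaspace pool found on this JVM");
        }
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        return new MetaspaceSnapshot(usage.getUsed(), usage.getMax(),
                classes.getLoadedClassCount(), classes.getUnloadedClassCount());
    }

    @Override
    public String toString() {
        return "metaspaceUsed=" + usedBytes + " max=" + maxBytes
                + " loaded=" + loadedClasses + " unloaded=" + unloadedClasses;
    }

    public static void main(String[] args) {
        System.out.println(capture());
    }
}
```

An unloaded count that stays at zero across many redeploys is itself a hint that old loaders are never being collected.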
Tomcat / Spring Boot redeploy checklist
- Unregister JDBC drivers in ServletContextListener.contextDestroyed.
- Clear ThreadLocal in filters; shut down thread pools started by old context.
- Avoid storing application ClassLoader in static fields on “container” singletons.
- Prefer rolling restart (new pod) over in-place WAR reload in production.
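The driver item in the checklist could be implemented roughly as follows. The sketch uses only java.sql so it runs without a servlet container; in a web app the method would be called from contextDestroyed with the webapp's classloader. JdbcDriverCleanup is a hypothetical helper name.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Collections;

// Hypothetical helper: call on shutdown with the deployment's classloader.
final class JdbcDriverCleanup {
    /** Deregisters every driver whose class was loaded by appLoader; returns how many. */
    static int deregisterDriversLoadedBy(ClassLoader appLoader) throws SQLException {
        int removed = 0;
        // Copy the enumeration first because we modify the registry while iterating.
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            // Leave drivers from the container or JDK alone; only ours pin our loader.
            if (driver.getClass().getClassLoader() == appLoader) {
                DriverManager.deregisterDriver(driver);
                removed++;
            }
        }
        return removed;
    }
}
```

Filtering by classloader matters on shared containers: deregistering drivers that belong to the container or to another application would break them.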
Distinguish from heap leak
| Signal | Metaspace issue | Heap issue |
|---|---|---|
| OOM message | Metaspace | Java heap space |
| MAT dominators | Many Class, loaders | Large byte[], collections |
| After full GC | Class count may drop slightly | Heap may reclaim |
| Trigger | Redeploy, dynamic code | Caches, sessions, leaks of objects |
How — cap, fix loaders, operate safely
Set a metaspace limit
    -XX:MaxMetaspaceSize=256m
    -XX:MetaspaceSize=128m   # optional: threshold that triggers the first class-unloading GC, not an initial commit
Fail fast with clear OOM instead of silent RSS growth. Include metaspace in container limit math.
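As a worked example of the container limit math, the sketch below adds up the JVM's bounded memory consumers and checks them against a pod limit with some headroom. Every number and name here is an illustrative assumption (1 MiB thread stacks, 240 MiB code cache, 10% headroom), not a recommendation for any particular workload.

```java
// Hypothetical budget calculator; all inputs are in MiB.
final class ContainerMemoryBudget {
    /** Upper bound on JVM memory from its individually capped regions. */
    static long jvmCeilingMiB(long xmxMiB, long maxMetaspaceMiB, long codeCacheMiB,
                              long directMiB, int threads, long threadStackMiB) {
        return xmxMiB + maxMetaspaceMiB + codeCacheMiB + directMiB
                + threads * threadStackMiB;
    }

    /** True when the ceiling fits under the limit after reserving headroom. */
    static boolean fits(long containerLimitMiB, long ceilingMiB, double headroomFraction) {
        return ceilingMiB <= containerLimitMiB * (1.0 - headroomFraction);
    }

    public static void main(String[] args) {
        // 1024 heap + 256 metaspace + 240 code cache + 128 direct + 200 threads x 1 MiB
        long ceiling = jvmCeilingMiB(1024, 256, 240, 128, 200, 1);
        System.out.println("ceiling MiB = " + ceiling);                 // 1848
        System.out.println("fits 2048?  " + fits(2048, ceiling, 0.10)); // false
        System.out.println("fits 2304?  " + fits(2304, ceiling, 0.10)); // true
    }
}
```

In this example a 2 GiB limit looks sufficient if only the heap is counted, but the 256 MiB metaspace cap and the other native regions push the total past it.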
Fix classloader leaks
- Break static reference from long-lived object to app classloader (or anything loaded by it).
- Remove/shutdown threads created by old deployment.
- ThreadLocal.remove() on pooled threads after request.
- Deregister JDBC drivers: DriverManager.deregisterDriver(driver) on shutdown.
- Stop creating a new classloader per request; reuse one generator with bounded cache.
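The ThreadLocal item is the easiest to get wrong, because pooled threads outlive requests and deployments. A minimal sketch of the try/finally pattern follows; REQUEST_CONTEXT and withCleanup are hypothetical names, and the single-thread pool stands in for a container's worker pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalCleanupDemo {
    // Stands in for per-request state whose class came from the app classloader.
    static final ThreadLocal<Object> REQUEST_CONTEXT = new ThreadLocal<>();

    /** Wraps a handler so the thread-local is always cleared afterwards. */
    static Runnable withCleanup(Runnable handler) {
        return () -> {
            try {
                handler.run();
            } finally {
                REQUEST_CONTEXT.remove(); // runs even if the handler throws
            }
        };
    }

    /** Runs two tasks on the same pooled thread; returns what the second one sees. */
    static Object valueSeenByNextTask(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable first = () -> REQUEST_CONTEXT.set("state from request 1");
            Runnable task = cleanUp ? withCleanup(first) : first;
            Callable<Object> second = REQUEST_CONTEXT::get;
            pool.submit(task).get();
            return pool.submit(second).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("with cleanup:    " + valueSeenByNextTask(true));
        System.out.println("without cleanup: " + valueSeenByNextTask(false));
    }
}
```

Without the wrapper, the second task sees the first task's value: the worker thread still references it, and through it the class and classloader that created it.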
Operational practices
- Kubernetes: new ReplicaSet = new JVM; avoid in-process hot reload in prod.
- Canary deploy: fresh pods shed old metaspace automatically.
- Scheduled rolling restart only as temporary mitigation—fix leak before accepting debt.
Framework-specific notes
- Spring Boot devtools: restart classloader in dev only; never in prod image.
- Hibernate / JPA: ensure EntityManagerFactory closed per context; one EMF per deployment unit.
- ByteBuddy / Mockito inline: test scope only; verify not bundled in prod fat jar incorrectly.
Monitoring and alerts
- Alert: metaspace used > 85% of MaxMetaspaceSize for 10 min.
- Alert: LoadedClassCount increasing 24h at flat traffic.
- Dashboard: metaspace + heap + RSS on one row per pod.
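The two alert conditions reduce to simple predicates over a window of samples. The sketch below assumes the samples have already been collected at a fixed interval (for example one per minute, so ten samples cover 10 min); MetaspaceAlerts and its method names are hypothetical, and a real setup would normally express these rules in the monitoring system's own query language.

```java
// Hypothetical alert predicates over pre-collected samples.
final class MetaspaceAlerts {
    /** True when every sample in the window exceeds thresholdFraction of the cap. */
    static boolean sustainedAbove(long[] usedBytes, long maxBytes, double thresholdFraction) {
        if (maxBytes <= 0 || usedBytes.length == 0) {
            return false; // uncapped (max reported as -1) or no data: rule cannot fire
        }
        for (long used : usedBytes) {
            if (used <= maxBytes * thresholdFraction) {
                return false;
            }
        }
        return true;
    }

    /** True when the count never drops across the window and ends higher than it began. */
    static boolean monotonicIncrease(int[] loadedClassCounts) {
        if (loadedClassCounts.length < 2) {
            return false;
        }
        for (int i = 1; i < loadedClassCounts.length; i++) {
            if (loadedClassCounts[i] < loadedClassCounts[i - 1]) {
                return false; // classes were unloaded at some point
            }
        }
        return loadedClassCounts[loadedClassCounts.length - 1] > loadedClassCounts[0];
    }
}
```

The first rule cannot fire without a cap, which is one more reason to set MaxMetaspaceSize explicitly.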
Interview one-liner
“Metaspace holds class metadata in native memory; classes unload only when their classloader is collected. I graph metaspace vs deploys, use NMT Class diff and ClassLoader histograms in MAT, find static or ThreadLocal roots to old loaders, set MaxMetaspaceSize, and prefer pod restarts over hot redeploy in production.”
Related scenarios
- Heap vs RSS
- Memory leak (heap objects)
- Crash without clear errors
- Slow after a few hours