Metaspace keeps growing

Scenario

Metaspace (class metadata) usage climbs over days. The Java heap looks fine, but you see OutOfMemoryError: Metaspace, long full GC pauses, or process RSS growing while heap usage stays flat. It often starts after many hot deploys or after enabling a dynamic-code feature. How do you tell a classloader leak from normal class loading?

After reading, you should be able to:

Explain what metaspace stores and how it differs from the Java heap.
Spot classloader leaks vs legitimate class growth.
Use JMX, NMT, and class histograms to find retainers.
Apply MaxMetaspaceSize and deployment practices that prevent runaway growth.

JDK 8+ uses metaspace in native memory (not PermGen). JDK 7 and earlier used PermGen in the heap—treat as legacy if you still support it.

Why — classes stay loaded until their classloader is collected

Every loaded class needs metadata: bytecodes, constant pools, method tables, annotations. Metaspace holds that data in native memory. A class becomes eligible for unloading only when its defining classloader is garbage-collected. If something still references the loader, every class it loaded stays in metaspace.
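The loader-to-class relationship can be observed with the JDK alone. The following is a minimal sketch, not a reproduction of any container's behavior: it uses java.lang.reflect.Proxy, which generates a class inside whichever loader it is given, and two empty URLClassLoader instances as stand-ins for two deployments. The class name LoaderRetentionDemo and its helper methods are illustrative assumptions.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.net.URL;
import java.net.URLClassLoader;

final class LoaderRetentionDemo {

    // Stand-in for the classloader a container creates per deployment (hypothetical helper).
    static ClassLoader newDeploymentLoader() {
        return new URLClassLoader(new URL[0], LoaderRetentionDemo.class.getClassLoader());
    }

    // Asks the JDK to generate a Runnable proxy class that is defined by the given loader.
    static Class<?> proxyClassIn(ClassLoader loader) {
        InvocationHandler noop = (proxy, method, args) -> null;
        Object instance = Proxy.newProxyInstance(loader, new Class<?>[] {Runnable.class}, noop);
        return instance.getClass();
    }

    public static void main(String[] args) {
        ClassLoader first = newDeploymentLoader();
        ClassLoader second = newDeploymentLoader();
        Class<?> a = proxyClassIn(first);
        Class<?> b = proxyClassIn(second);

        // Same interface, two loaders: two separate generated classes, each with its own metadata.
        System.out.println("same class object?      " + (a == b));
        // Every class points back at its defining loader.
        System.out.println("a defined by first?     " + (a.getClassLoader() == first));
        // Within one loader the generated class is reused rather than regenerated.
        System.out.println("reused within a loader? " + (proxyClassIn(first) == a));
    }
}
```

The retention chain follows from those back-pointers: any live instance references its Class, the Class references its loader, and the loader references every class it defined. One stale object from an old deployment is therefore enough to pin all of that deployment's metadata.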

Normal growth vs leak

Pattern	Metaspace	Typical cause
Plateau after warm-up	Rises then flat	App loaded all frameworks; CGLIB proxies generated once
Stair-step with redeploys	Jump per deploy, never drops	Classloader leak — old WAR loaders retained
Linear over days	Steady climb	Dynamic scripts, per-tenant classloaders, plugin architecture bug
Spike at feature flag	One-time jump	New library path loading many classes (may be OK)

Common causes in production

Hot redeploy without process restart — Tomcat/JBoss “reload” creates new classloaders; old ones leaked via static singletons, ThreadLocal, or JDBC driver registration.
Spring / CGLIB / ByteBuddy — many synthetic subclasses; usually bounded unless generated per request with new loader.
Groovy, JSP, dynamic rules engines — compile on the fly → new classes each eval if not cached carefully.
OSGi / plugin jars — load/unload cycles; one plugin version retained in a global map.
Reflection & serialization libraries — cache Class objects or loaders incorrectly.
JDBC drivers — DriverManager holds driver classloader reference (classic leak on redeploy).
No metaspace cap — default can grow until native OOM; looks like “RSS leak” before Java reports Metaspace OOM.
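For the compile-on-the-fly causes above, the usual remedy is to compile each distinct source once and reuse the result. A minimal sketch of that idea, assuming a hypothetical CompiledScriptCache wrapper around whatever compile function the engine provides (the engine is represented here by a plain Function, not a real Groovy or JSP API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class CompiledScriptCache<T> {
    private final Map<String, T> bySource = new ConcurrentHashMap<>();
    private final Function<String, T> compiler;          // stand-in for the real engine's compile step
    private final AtomicInteger compilations = new AtomicInteger();

    CompiledScriptCache(Function<String, T> compiler) {
        this.compiler = compiler;
    }

    // Compiles a given source text at most once; later calls reuse the cached result.
    T get(String source) {
        return bySource.computeIfAbsent(source, src -> {
            compilations.incrementAndGet();
            return compiler.apply(src);
        });
    }

    int compilations() {
        return compilations.get();
    }

    public static void main(String[] args) {
        // String::length is only a placeholder for an expensive, class-generating compile call.
        CompiledScriptCache<Integer> cache = new CompiledScriptCache<>(String::length);
        cache.get("price * qty");
        cache.get("price * qty");   // cache hit, no second compilation
        cache.get("price + tax");
        System.out.println("compilations = " + cache.compilations());
    }
}
```

This only bounds class growth when the set of distinct sources is itself bounded. If callers embed request data in the script text, every evaluation is a new key and the cache grows along with metaspace.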

Eclipse MAT works on heap dumps, and a heap dump does not contain metaspace itself. Use metaspace metrics, the NMT Class category, and classloader histograms, not heap dumps alone.

Symptoms you see

java.lang.OutOfMemoryError: Metaspace (clear signal).
Full GCs that try to unload classes, with long pauses (see the GC pauses scenario).
RSS up, heap flat (see the heap vs RSS scenario).
Pod OOMKilled if total native memory exceeds cgroup limit before Metaspace OOM is thrown.

What — confirm and find the leaking loader (in order)

Graph metaspace used vs time. JMX: java.lang:type=MemoryPool,name=Metaspace, attributes Usage.used and Usage.max (if capped). Overlay deploy times and request rate.
Correlate with redeploys. A stair-step that never falls after each deploy → classloader leak suspicion high.
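The same numbers are available in-process through the standard java.lang.management API, which is handy for a quick check or a custom metrics exporter. A minimal sketch, assuming a HotSpot JVM where the pool is named "Metaspace" (other JVMs may not expose a pool with that name); MetaspaceProbe is an illustrative class name:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class MetaspaceProbe {

    // Returns the usage of the pool named "Metaspace", or null if this JVM exposes no such pool.
    static MemoryUsage metaspaceUsage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MemoryUsage usage = metaspaceUsage();
        if (usage == null) {
            System.out.println("No Metaspace pool on this JVM");
            return;
        }
        // getMax() reports -1 when no maximum is defined, i.e. MaxMetaspaceSize is not set.
        System.out.printf("metaspace used=%d committed=%d max=%d%n",
                usage.getUsed(), usage.getCommitted(), usage.getMax());
    }
}
```

Sampling this on a schedule and tagging each sample with the current deployment version gives the overlay described in the first step.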

NMT: Class category

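# NMT must be enabled at JVM startup: -XX:NativeMemoryTracking=summary
jcmd <pid> VM.native_memory baseline
jcmd <pid> VM.native_memory summary | grep -A2 Class
jcmd <pid> VM.native_memory summary.diff    # later: compares against the baseline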

Growing Class committed with flat heap → metaspace focus.

Classloader count (if available). Some APMs expose the loaded classloader count; otherwise analyze a heap dump for ClassLoader instances (a heap dump *does* show loader objects even though metaspace is native).
Heap dump: duplicate classloaders for the same app. In MAT, open the histogram, filter on ClassLoader, and group by the loader's class name (WebappClassLoader, LaunchedURLClassLoader). Hundreds of dead app loaders = leak.
Path to GC roots from an old classloader. Find why the loader is retained: a static field on a singleton, Thread, ThreadLocal, JMX bean, or DriverManager.
Loaded class count. JMX: java.lang:type=ClassLoading, attribute LoadedClassCount. It should stabilize; a monotonic increase = leak or unbounded codegen.
Check for dynamic compilation. Search config for Groovy, Janino, MVEL, runtime Java compile, per-request proxies.
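The class-count step can also be read in-process. A minimal sketch using the standard ClassLoadingMXBean (ClassCountProbe is an illustrative name):

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

final class ClassCountProbe {

    // One-line snapshot of the JVM's class loading counters.
    static String snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return String.format("loaded=%d totalLoaded=%d unloaded=%d",
                bean.getLoadedClassCount(),        // classes currently loaded
                bean.getTotalLoadedClassCount(),   // classes loaded since JVM start (never decreases)
                bean.getUnloadedClassCount());     // classes unloaded since JVM start
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

The unloaded counter is the useful cross-check: if it stays near zero across several redeploys while the loaded count keeps stepping up, the old loaders are not being collected.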

Tomcat / Spring Boot redeploy checklist

Deregister JDBC drivers in ServletContextListener.contextDestroyed.
Clear ThreadLocal in filters; shut down thread pools started by old context.
Avoid storing application ClassLoader in static fields on “container” singletons.
Prefer rolling restart (new pod) over in-place WAR reload in production.
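The driver item in the checklist can be sketched in plain JDK code. This is an assumption-laden example rather than container-specific code: JdbcDriverCleanup and its method are hypothetical names, and the helper is meant to be called from the application's own shutdown hook (for a webapp, contextDestroyed) with the application's classloader.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

final class JdbcDriverCleanup {

    // Deregisters every driver whose class was defined by appLoader; returns how many were removed.
    static int deregisterDriversLoadedBy(ClassLoader appLoader) {
        int removed = 0;
        // Work on a snapshot so deregistration does not interfere with the enumeration.
        List<Driver> drivers = Collections.list(DriverManager.getDrivers());
        for (Driver driver : drivers) {
            // Leave drivers from shared or parent loaders alone; other applications may use them.
            if (driver.getClass().getClassLoader() == appLoader) {
                try {
                    DriverManager.deregisterDriver(driver);
                    removed++;
                } catch (SQLException e) {
                    System.err.println("Could not deregister " + driver + ": " + e.getMessage());
                }
            }
        }
        return removed;
    }
}
```

In a webapp the argument would typically be the loader of one of the application's own classes, for example `MyListener.class.getClassLoader()` (MyListener being whatever listener class the application defines).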

Distinguish from heap leak

Signal	Metaspace issue	Heap issue
OOM message	`Metaspace`	`Java heap space`
MAT dominators	Many `Class`, loaders	Large `byte[]`, collections
After full GC	Class count may drop slightly	Heap may reclaim
Trigger	Redeploy, dynamic code	Caches, sessions, leaks of objects

How — cap, fix loaders, operate safely

Set a metaspace limit

-XX:MaxMetaspaceSize=256m
-XX:MetaspaceSize=128m          # initial GC threshold (high-water mark), not an initial commit (optional)

Fail fast with clear OOM instead of silent RSS growth. Include metaspace in container limit math.
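Whether a cap is actually in effect can be verified from inside the running JVM, which catches cases where the flag was dropped by a wrapper script or image change. A minimal sketch, assuming a HotSpot JVM (it relies on the HotSpot-specific com.sun.management API); MetaspaceCapCheck is an illustrative name:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

final class MetaspaceCapCheck {

    // Reports whether MaxMetaspaceSize was set explicitly or left at its default.
    static String describeCap() {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption option = hotspot.getVMOption("MaxMetaspaceSize");
        if (option.getOrigin() == VMOption.Origin.DEFAULT) {
            // Default means no cap: metaspace is limited only by available native memory.
            return "MaxMetaspaceSize not set (uncapped)";
        }
        return "MaxMetaspaceSize=" + option.getValue() + " bytes, origin=" + option.getOrigin();
    }

    public static void main(String[] args) {
        System.out.println(describeCap());
    }
}
```

Logging this once at startup makes a missing cap visible before the first incident rather than during it.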

Fix classloader leaks

Break static reference from long-lived object to app classloader (or anything loaded by it).
Remove/shutdown threads created by old deployment.
ThreadLocal.remove() on pooled threads after request.
Deregister JDBC drivers: DriverManager.deregisterDriver(driver) on shutdown.
Stop creating a new classloader per request—reuse one generator with bounded cache.
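The ThreadLocal item is the one most often missed, because pooled threads outlive both the request and the deployment. A minimal sketch of the set/try/finally/remove pattern, with hypothetical names (RequestContext, with, valueSeenByNextTask) and a single-thread executor standing in for a container's worker pool:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class RequestContext {
    private static final ThreadLocal<Object> CURRENT = new ThreadLocal<>();

    static Object current() {
        return CURRENT.get();
    }

    // Binds value for the duration of body and always clears it afterwards.
    static <T> T with(Object value, Supplier<T> body) {
        CURRENT.set(value);
        try {
            return body.get();
        } finally {
            // Without remove(), the pooled thread keeps the value (and its classloader) reachable.
            CURRENT.remove();
        }
    }

    // Runs two tasks on the same pooled thread and returns what the second task still sees.
    static Object valueSeenByNextTask() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Object> firstRequest = () -> with("request-1", RequestContext::current);
            Callable<Object> nextRequest = RequestContext::current;
            pool.submit(firstRequest).get();
            return pool.submit(nextRequest).get();
        } catch (ExecutionException | InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("value seen by next task: " + valueSeenByNextTask());
    }
}
```

With the remove() call in place the second task sees no leftover value. Deleting that line makes the first request's object visible to the next task, which is the same mechanism that keeps an old deployment's classes reachable from a container thread.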

Operational practices

Kubernetes: new ReplicaSet = new JVM; avoid in-process hot reload in prod.
Canary deploy: fresh pods shed old metaspace automatically.
Scheduled rolling restarts are a temporary mitigation only; fix the leak rather than accepting the restarts as permanent debt.

Framework-specific notes

Spring Boot devtools — restart classloader in dev only; never in prod image.
Hibernate / JPA: ensure the EntityManagerFactory is closed when its context shuts down; keep one EMF per deployment unit.
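Mockito inline (built on ByteBuddy): test scope only; verify it is not bundled in the prod fat jar by mistake. ByteBuddy itself can be a legitimate runtime dependency, for example of Hibernate.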

Monitoring and alerts

Alert: metaspace used > 85% of MaxMetaspaceSize for 10 min.
Alert: LoadedClassCount still increasing over 24h of flat traffic.
Dashboard: metaspace + heap + RSS on one row per pod.
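The first alert combines a threshold with a hold time so that a brief spike does not page anyone. Most monitoring systems express this declaratively; the logic is sketched here in Java only to make the rule precise. MetaspaceAlert and its parameters are illustrative assumptions, not part of any monitoring product.

```java
import java.time.Duration;
import java.time.Instant;

final class MetaspaceAlert {
    private final double thresholdRatio;   // e.g. 0.85 for 85% of MaxMetaspaceSize
    private final Duration holdFor;        // e.g. 10 minutes
    private Instant breachStart;           // null while usage is at or below the threshold

    MetaspaceAlert(double thresholdRatio, Duration holdFor) {
        this.thresholdRatio = thresholdRatio;
        this.holdFor = holdFor;
    }

    // Feeds one sample; returns true once usage has stayed above the threshold for holdFor.
    boolean onSample(Instant at, long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            // Uncapped metaspace reports max = -1, so a ratio cannot be computed.
            breachStart = null;
            return false;
        }
        boolean above = (double) usedBytes / maxBytes > thresholdRatio;
        if (!above) {
            breachStart = null;            // any dip below the threshold resets the timer
            return false;
        }
        if (breachStart == null) {
            breachStart = at;
        }
        return Duration.between(breachStart, at).compareTo(holdFor) >= 0;
    }
}
```

Because the ratio is undefined without a cap, this rule only works once MaxMetaspaceSize is set; for uncapped JVMs an absolute byte threshold or a growth-rate rule would be needed instead.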

Interview one-liner

“Metaspace holds class metadata in native memory; classes unload only when their classloader is collected. I graph metaspace vs deploys, use NMT Class diff and ClassLoader histograms in MAT, find static or ThreadLocal roots to old loaders, set MaxMetaspaceSize, and prefer pod restarts over hot redeploy in production.”
