RSS grows but the Java heap looks stable

Scenario

Dashboards show heap used flat at 60% of -Xmx, but container RSS (resident set size) climbs until the pod is OOMKilled, with no OutOfMemoryError in the logs. Or top shows the Java process far larger than the heap charts suggest. What is consuming memory outside the heap?

After reading, you should be able to:

Related: Crash without clear errors, OutOfMemoryError, Memory leak (heap-only leaks).

Why — the JVM uses more than the heap

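Heap (-Xmx) holds your Java objects. RSS (resident set size) is the physical RAM the OS has assigned to the process; it includes the heap plus everything else. Kubernetes memory limits are enforced by the container's cgroup on total memory usage (RSS plus page cache the kernel cannot reclaim), not on “heap used” as reported by JMX.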
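To see the gap from inside the process, the JVM's own view can be printed next to the kernel's. The sketch below is a minimal illustration, assuming Linux (it reads VmRSS from /proc/self/status); the class HeapVsRss and its method are hypothetical names. MemoryMXBean's non-heap figure covers only JVM memory pools such as Metaspace and the code cache, not direct buffers, thread stacks or native-library malloc, so the JMX figures alone cannot explain every byte of RSS.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.nio.file.Files;
import java.nio.file.Paths;

class HeapVsRss {
    // Extracts VmRSS (in KiB) from the text of /proc/<pid>/status; returns -1 if the line is absent.
    static long parseVmRssKiB(String statusText) {
        for (String line : statusText.split("\n")) {
            if (line.startsWith("VmRSS:")) {
                // Line format: "VmRSS:\t  123456 kB"
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        long heapUsed = mem.getHeapMemoryUsage().getUsed();
        long heapCommitted = mem.getHeapMemoryUsage().getCommitted();
        // JMX "non-heap" = Metaspace, compressed class space, code cache; NOT direct buffers or stacks.
        long nonHeapCommitted = mem.getNonHeapMemoryUsage().getCommitted();
        long rssKiB = parseVmRssKiB(new String(Files.readAllBytes(Paths.get("/proc/self/status"))));
        System.out.printf("heap used=%d MiB, heap committed=%d MiB, non-heap committed=%d MiB, RSS=%d MiB%n",
                heapUsed >> 20, heapCommitted >> 20, nonHeapCommitted >> 20, rssKiB >> 10);
    }
}
```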

What lives outside the Java heap

Region | Controlled by | Grows when
Metaspace | -XX:MaxMetaspaceSize | Classes loaded, classloader leaks, dynamic proxies
Direct / off-heap buffers | -XX:MaxDirectMemorySize (default ≈ -Xmx) | Netty, gRPC, NIO, Kafka clients, large ByteBuffer.allocateDirect
Thread stacks | -Xss × thread count | Thousands of threads (pool misconfiguration, thread leak)
Code cache | -XX:ReservedCodeCacheSize | JIT compilation, many hot methods
GC structures | Collector-specific | Large heap, G1 regions
JNI / native libraries | Library malloc | Embedded DB, crypto, compression, buggy .so
JVM internal / malloc arenas | glibc allocator | Native allocations; freed memory may not be returned to the OS
Mapped files | No JVM flag; size of the mapped files | MappedByteBuffer, memory-mapped indexes
Rough mental model (4 GiB container):

  Heap (-Xmx2g)  → up to 2 GiB committed; JMX "heap used" is only the occupied part
  Metaspace      → 150–300 MiB typical
  Direct memory  → 200 MiB – 1 GiB+ (Netty-heavy services)
  Thread stacks  → 500 threads × 1 MiB = 500 MiB reserved (RSS counts only touched pages)
  Code cache + GC + JVM overhead → 200–400 MiB
  ─────────────────────────────────────────
  RSS can exceed 3.5 GiB while "heap used" shows 1.2 GiB

Why RSS keeps climbing while heap is flat

Do not size K8s resources.limits.memory from heap charts alone. Size for total process RSS with headroom, and use NMT to measure the individual components.

What — measure the gap (in order)

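  1. Compare three numbers on the same graph: JMX heap used, process RSS (container_memory_working_set_bytes) and the cgroup memory limit (memory.max on cgroup v2, memory.limit_in_bytes on v1). A gap between heap used and working set that widens over time usually means off-heap growth.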
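  2. Enable Native Memory Tracking (staging first). The flag must be set at JVM startup; NMT cannot be switched on in a running JVM.
    -XX:NativeMemoryTracking=summary
    # or =detail for deep dives (more overhead)

    jcmd <pid> VM.native_memory baseline       # record a baseline before the load test
    jcmd <pid> VM.native_memory summary        # current totals per category
    jcmd <pid> VM.native_memory summary.diff   # change since the baseline
    Take the baseline, run the load test, then run summary.diff to see which categories grew.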
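  3. Read NMT categories: look for Internal, Thread, Code, GC, Class (metaspace) and Other. The category with the largest delta is the investigation target. Direct ByteBuffer memory usually shows up under Other on JDK 11+ (Internal on JDK 8).
  4. Check direct buffer pools: the JMX bean java.nio:type=BufferPool,name=direct exposes MemoryUsed, Count and TotalCapacity. For Netty, enable the resource leak detector in staging (-Dio.netty.leakDetection.level=paranoid); note that Netty's pooled allocator may allocate direct memory that this bean does not count.
  5. Track thread count over time: jcmd <pid> Thread.print | grep -c '^"' counts thread entries (each starts with the quoted thread name), ls /proc/<pid>/task | wc -l counts OS threads, and JMX exposes ThreadCount. A climbing thread count means climbing stack memory.
  6. Watch metaspace: the JMX memory pool Metaspace (Usage.used) and the NMT Class row. If metaspace keeps growing across redeploys without a JVM restart, suspect a classloader leak.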
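  7. pmap / smaps (Linux)
    pmap -x <pid> | tail -1                   # totals line: mapped KiB, RSS KiB, dirty KiB
    pmap -x <pid> | sort -n -k3 | tail -20    # mappings with the largest RSS
    cat /proc/<pid>/smaps_rollup              # summed Rss, Pss, Anonymous for the whole process
    Large anonymous regions outside the heap mapping point to native or direct allocations. Many anonymous regions of about 64 MiB often indicate glibc malloc arenas.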
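  8. Rule out “allocator hoarding”: RSS stays high but stable after load drops because glibc malloc may not return freed memory to the OS. Test with the environment variable MALLOC_ARENA_MAX=2, or with an alternative allocator such as jemalloc. -XX:+UseContainerSupport (default on JDK 10+) does not help here: it only makes the JVM read cgroup limits for heap sizing and does not change malloc behavior.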
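Steps 1 and 4 to 6 read values that are all available through the platform MXBeans, so a service can log them itself. A minimal sketch, assuming a HotSpot JVM (the memory pool name "Metaspace" is HotSpot-specific); OffHeapSampler and Sample are hypothetical names, and values the sketch cannot find are reported as -1.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

class OffHeapSampler {
    // One snapshot of the signals from steps 1, 4, 5 and 6; -1 means "not found".
    static final class Sample {
        long heapUsed;            // JMX heap used, bytes
        long directUsed = -1;     // BufferPool "direct" MemoryUsed, bytes
        long directCount = -1;    // BufferPool "direct" Count
        long metaspaceUsed = -1;  // HotSpot "Metaspace" memory pool, used bytes
        int threadCount;          // JMX ThreadCount (live threads)
    }

    static Sample sample() {
        Sample s = new Sample();
        s.heapUsed = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                s.directUsed = pool.getMemoryUsed();
                s.directCount = pool.getCount();
            }
        }
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                s.metaspaceUsed = pool.getUsage().getUsed();
            }
        }
        s.threadCount = ManagementFactory.getThreadMXBean().getThreadCount();
        return s;
    }

    public static void main(String[] args) {
        Sample s = sample();
        System.out.printf("heapUsed=%d MiB, direct=%d MiB in %d buffers, metaspace=%d MiB, threads=%d%n",
                s.heapUsed >> 20, s.directUsed >> 20, s.directCount, s.metaspaceUsed >> 20, s.threadCount);
    }
}
```

Calling sample() on a schedule and exporting the fields next to container_memory_working_set_bytes gives the three-number comparison from step 1. It only sees direct memory that the JDK's direct BufferPool counts.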

Example NMT summary (interpretation)

Total: reserved=4194304KB, committed=3145728KB
-                 Java Heap (reserved=2097152KB, committed=1572864KB)
-                     Class (reserved=262144KB, committed=180224KB)  ← metaspace
-                    Thread (reserved=524288KB, committed=524288KB)  ← many threads
-                      Code (reserved=245760KB, committed=120000KB)
-                        GC (reserved=...)

If Java Heap committed is steady but Total committed rises, focus on non-heap rows in summary.diff.
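Comparing the Total and Java Heap rows by hand gets tedious across many summary.diff runs. A minimal parsing sketch, assuming the line format shown above (category rows starting with "-", sizes in KB); NmtSummary is a hypothetical helper, and because NMT output is not a stable interface it may need adjusting between JDK versions. For summary.diff output it keeps the current committed value and ignores the +/- delta.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NmtSummary {
    // "Total: reserved=4194304KB, committed=3145728KB"
    private static final Pattern TOTAL =
            Pattern.compile("^Total: reserved=(\\d+)KB[^,]*, committed=(\\d+)KB");
    // "-   Java Heap (reserved=2097152KB, committed=1572864KB)"; [^,]* skips a "+NKB" diff delta
    private static final Pattern CATEGORY =
            Pattern.compile("^-\\s+(.+?)\\s+\\(reserved=(\\d+)KB[^,]*, committed=(\\d+)KB");

    // Maps each category name (plus "Total") to its committed size in KiB; unparseable rows are skipped.
    static Map<String, Long> committedKiB(String nmtOutput) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String raw : nmtOutput.split("\n")) {
            String line = raw.trim();
            Matcher t = TOTAL.matcher(line);
            if (t.find()) {
                result.put("Total", Long.parseLong(t.group(2)));
                continue;
            }
            Matcher c = CATEGORY.matcher(line);
            if (c.find()) {
                result.put(c.group(1), Long.parseLong(c.group(3)));
            }
        }
        return result;
    }

    // Committed memory outside the Java heap, in KiB: Total minus Java Heap.
    static long nonHeapCommittedKiB(String nmtOutput) {
        Map<String, Long> m = committedKiB(nmtOutput);
        return m.getOrDefault("Total", 0L) - m.getOrDefault("Java Heap", 0L);
    }
}
```

Feeding it two summaries taken some minutes apart and subtracting the per-category values points at the row to investigate.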

Container vs JVM view

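Metric source | What it shows
JMX heap used / max | Java heap only: occupied objects (used) and the configured ceiling (max)
container_memory_working_set_bytes | Closest to what the K8s limit enforces (≈ RSS plus active page cache)
NMT Total committed | Memory the JVM itself has committed (heap + JVM native); excludes malloc by third-party native libraries
kubectl top pod | Current working set; a quick check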
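To put the JVM's heap ceiling next to the container limit from inside the pod, a sketch can read the cgroup limit file directly. It assumes the usual in-container mount points (/sys/fs/cgroup/memory.max on cgroup v2, /sys/fs/cgroup/memory/memory.limit_in_bytes on v1); CgroupLimit is a hypothetical name. On cgroup v1 an unlimited container reports a very large number rather than "max".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class CgroupLimit {
    // cgroup v2 exposes the limit in memory.max, cgroup v1 in memory.limit_in_bytes.
    private static final String[] LIMIT_FILES = {
            "/sys/fs/cgroup/memory.max",
            "/sys/fs/cgroup/memory/memory.limit_in_bytes"
    };

    // Parses a cgroup limit file's content; returns -1 for "max" (no limit on cgroup v2).
    static long parseLimitBytes(String content) {
        String v = content.trim();
        return "max".equals(v) ? -1 : Long.parseLong(v);
    }

    // Returns the container memory limit in bytes, or -1 if no limit file is readable or no limit is set.
    static long readLimitBytes() throws IOException {
        for (String file : LIMIT_FILES) {
            Path p = Paths.get(file);
            if (Files.isReadable(p)) {
                return parseLimitBytes(new String(Files.readAllBytes(p)));
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        long limit = readLimitBytes();
        long maxHeap = Runtime.getRuntime().maxMemory(); // roughly -Xmx
        if (limit > 0) {
            System.out.printf("limit=%d MiB, max heap=%d MiB (%.0f%% of limit), left for off-heap=%d MiB%n",
                    limit >> 20, maxHeap >> 20, 100.0 * maxHeap / limit, (limit - maxHeap) >> 20);
        } else {
            System.out.println("No cgroup memory limit found");
        }
    }
}
```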

How — fix, size containers, monitor

Size Kubernetes memory limit

memory_limit ≥ Xmx
                 + MaxMetaspaceSize (or observed metaspace peak)
                 + expected direct memory peak
                 + (thread_count × stack_size)
                 + 300–500 MiB JVM/GC/misc headroom

Example: -Xmx2g, heavy Netty → use 3–4 GiB limit, not 2 GiB.
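The sizing formula is plain addition, which makes it easy to keep next to a service's JVM flags. A minimal sketch in MiB; LimitSizing and the input values are hypothetical, and counting every thread at its full stack size is deliberately conservative, since stack RSS is usually lower.

```java
class LimitSizing {
    // All inputs in MiB: Xmx + metaspace peak + direct peak + threads × stack + JVM/GC/misc headroom.
    static long limitMiB(long xmx, long metaspacePeak, long directPeak,
                         long threadCount, long stackMiB, long headroom) {
        return xmx + metaspacePeak + directPeak + threadCount * stackMiB + headroom;
    }

    public static void main(String[] args) {
        // Hypothetical Netty-heavy service: -Xmx2g, 256 MiB metaspace, 512 MiB direct,
        // 300 threads at a 1 MiB stack, 400 MiB JVM/GC/misc headroom.
        long limit = limitMiB(2048, 256, 512, 300, 1, 400);
        System.out.println(limit + " MiB"); // prints "3516 MiB", about 3.4 GiB
    }
}
```

The result lands inside the 3–4 GiB range, well above the 2 GiB a heap-only estimate would suggest.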

Fix by category

NMT category / symptom | Fix
Direct memory | Release buffers; cap Netty pools; -XX:MaxDirectMemorySize=512m; fix the leak
Thread | Shrink thread pools; fix thread leaks; lower -Xss if stacks are large
Class / metaspace | Fix the classloader leak; set MaxMetaspaceSize; restart the JVM on redeploy
Internal / Other | Upgrade the JDK; identify the JNI library; profile native allocations (e.g. async-profiler)
Heap flat but RSS at limit | Raise the limit or reduce off-heap consumers; raising -Xmx alone only shrinks the room left for off-heap memory

Flags for production visibility

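-XX:+UseContainerSupport            # default on JDK 10+; JVM reads cgroup limits
-XX:MaxRAMPercentage=75.0           # optional: derive max heap from the cgroup limit (instead of -Xmx)
-XX:NativeMemoryTracking=summary    # staging/debug pods only; small overhead
-XX:MaxDirectMemorySize=512m        # cap direct buffer allocations
-XX:MaxMetaspaceSize=256m           # cap metaspace growth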

Do not leave NativeMemoryTracking=detail on all prod pods. Use it on canary pods, or restart an affected pod with the flag during an incident; NMT can only be enabled at JVM startup.
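Flags set in a Dockerfile, a JAVA_TOOL_OPTIONS variable or a Helm chart can be overridden or dropped along the way, so it helps to confirm the values the running JVM actually uses (jcmd <pid> VM.flags shows the same from outside). A sketch, assuming a HotSpot JVM that provides com.sun.management.HotSpotDiagnosticMXBean; EffectiveFlags is a hypothetical name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

class EffectiveFlags {
    // Returns "value (origin)" for a HotSpot flag, for example "536870912 (VM_CREATION)".
    static String describe(String flag) {
        HotSpotDiagnosticMXBean hs = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption opt = hs.getVMOption(flag); // IllegalArgumentException if the flag does not exist
        return opt.getValue() + " (" + opt.getOrigin() + ")";
    }

    public static void main(String[] args) {
        // On recent JDKs, MaxDirectMemorySize=0 means "not set"; the cap then defaults to about the max heap size.
        String[] flags = {"MaxHeapSize", "MaxDirectMemorySize", "MaxMetaspaceSize", "NativeMemoryTracking"};
        for (String flag : flags) {
            System.out.println(flag + " = " + describe(flag));
        }
    }
}
```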

Monitoring

When heap leak tools mislead

Heap dumps, and tools such as MAT that analyze them, only show Java heap objects. If RSS grows while heap dumps look fine, switch to NMT diffs and direct buffer metrics rather than taking another heap dump.

Interview one-liner

“Heap used is only the object heap; RSS includes metaspace, thread stacks, direct buffers, and native code. I compare container working set to JMX heap, run jcmd VM.native_memory summary.diff, check direct BufferPool and thread count, then size the cgroup limit for total footprint—not just -Xmx.”

Related scenarios