RSS grows but the Java heap looks stable

Scenario

Dashboards show heap used flat at 60% of -Xmx, but container RSS (resident memory) climbs until the pod is OOMKilled with no OutOfMemoryError in logs. Or top shows the Java process much larger than heap charts suggest. What is eating memory outside the heap?

After reading, you should be able to:

Map JVM memory regions beyond the Java heap.
Use Native Memory Tracking (NMT) and container metrics to find the gap.
Explain why -Xmx alone does not size a Kubernetes memory limit.
Fix direct buffers, metaspace, threads, and native leaks.

Related: Crash without clear errors, OutOfMemoryError, Memory leak (heap-only leaks).

Why — the JVM uses more than the heap

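Heap (-Xmx) holds your Java objects. RSS (resident set size) is the physical RAM currently backing the process. It includes the touched part of the heap plus everything else. A Kubernetes memory limit is enforced by the cgroup, which charges the process's resident memory plus some page cache. It does not look at “heap used” as reported over JMX.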

What lives outside the Java heap

Region	Controlled by	Grows when
Metaspace	`-XX:MaxMetaspaceSize`	Classes loaded, classloader leaks, dynamic proxies
Direct / off-heap buffers	`-XX:MaxDirectMemorySize` (default ≈ `-Xmx`)	Netty, gRPC, NIO, Kafka clients, large `ByteBuffer.allocateDirect`
Thread stacks	`-Xss` × thread count	Thousands of threads (pool misconfig, leak)
Code cache	`-XX:ReservedCodeCacheSize`	JIT compilation, many hot methods
GC structures	Collector-specific	Large heap, G1 regions
JNI / native libraries	Library malloc	Embedded DB, crypto, compression, buggy .so
JVM internal / malloc arena	glibc allocator	Native allocations; may not return RAM to OS
Mapped files	—	`MappedByteBuffer`, memory-mapped indexes
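The sketch below enumerates the non-heap memory pools, which shows which of these regions the JVM itself reports over JMX. It assumes a HotSpot JVM, where pool names such as Metaspace, Compressed Class Space, and CodeHeap 'non-nmethods' are typical. The class name NonHeapPools is hypothetical. Look at what is missing from the output: thread stacks, direct buffers, GC structures, and JNI allocations are not memory pools. That is why heap and non-heap JMX charts together still undercount RSS.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: lists the NON_HEAP memory pools this JVM exposes over JMX.
// Thread stacks, direct buffers, GC data structures and JNI malloc are not pools,
// so they never show up here even though they count toward RSS.
final class NonHeapPools {

    /** Pool name -> committed bytes for every non-heap pool. */
    static Map<String, Long> committedByPool() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (pool.getType() == MemoryType.NON_HEAP && usage != null) {
                result.put(pool.getName(), usage.getCommitted());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // On HotSpot this typically prints Metaspace, Compressed Class Space and CodeHeap pools.
        committedByPool().forEach((name, bytes) ->
                System.out.printf("%-32s %10d KiB%n", name, bytes / 1024));
    }
}
```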

Rough mental model (4 GiB container):

  -Xmx2g          → Java objects (what JMX "heap used" tracks)
  Metaspace      → 150–300 MiB typical
  Direct memory  → 200 MiB – 1 GiB+ (Netty-heavy services)
  Thread stacks  → 500 threads × 1 MiB = up to 500 MiB reserved (resident is usually less)
  Code cache + GC + JVM overhead → 200–400 MiB
  ─────────────────────────────────────────
  RSS can exceed 3.5 GiB while "heap" shows 1.2 GiB used

Why RSS keeps climbing while heap is flat

Native leak: a C/C++ library or JNI code does not free memory; RSS rises while heap stays flat.
Direct buffer leak: buffers are never released; they count as native memory, not heap.
Metaspace leak: classloaders are retained; metaspace grows until MaxMetaspaceSize, which is unlimited by default.
Thread leak: new threads are never stopped; their stack memory adds up.
Allocator does not unmap: glibc keeps freed native pages, so RSS stays high after a spike (not necessarily a leak).
Wrong limit math: the container was sized for -Xmx only, so it gets OOMKilled.
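The direct buffer case is easy to reproduce. The minimal sketch below uses a hypothetical class, DirectLeakDemo. It keeps direct buffers reachable, which is what a leak looks like from the JVM's point of view. It reads the "direct" pool through the standard BufferPoolMXBean. The heap holds only the small ByteBuffer wrapper objects, so heap used barely moves while the direct pool grows by the full buffer capacity.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class DirectLeakDemo {
    // Hypothetical "leak": buffers stay reachable, so their native memory is never freed.
    static final List<ByteBuffer> RETAINED = new ArrayList<>();

    /** Returns the platform MXBean for the "direct" buffer pool. */
    static BufferPoolMXBean directPool() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals("direct")) {
                return pool;
            }
        }
        throw new IllegalStateException("no 'direct' buffer pool");
    }

    /** Allocates count direct buffers of sizeBytes each; returns how much the direct pool grew. */
    static long leak(int count, int sizeBytes) {
        long before = directPool().getMemoryUsed();
        for (int i = 0; i < count; i++) {
            RETAINED.add(ByteBuffer.allocateDirect(sizeBytes)); // memory lives outside the heap
        }
        return directPool().getMemoryUsed() - before;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long heapBefore = rt.totalMemory() - rt.freeMemory();
        long directGrowth = leak(32, 1 << 20); // 32 x 1 MiB off-heap
        long heapAfter = rt.totalMemory() - rt.freeMemory();
        System.out.printf("direct pool grew by %d MiB, heap used changed by %d KiB%n",
                directGrowth >> 20, (heapAfter - heapBefore) >> 10);
    }
}
```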

Do not use heap charts alone for K8s resources.limits.memory. Size for total process RSS with headroom, or use NMT to measure components.

What — measure the gap (in order)

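Compare three numbers on the same graph: JMX heap used, the container working set (container_memory_working_set_bytes, roughly RSS), and the cgroup memory limit (memory.max on cgroup v2, memory.limit_in_bytes on v1). A gap between heap and working set that widens over time indicates off-heap growth.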
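A rough in-process version of this three-way comparison is sketched below. It assumes Linux. It reads VmRSS from /proc/self/status and reads the limit from the cgroup v2 file memory.max, falling back to the cgroup v1 path. VmRSS is a per-process approximation. container_memory_working_set_bytes is measured at the cgroup level and subtracts inactive page cache, so expect the two numbers to differ somewhat. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class MemoryGap {

    /** Parses the "VmRSS:" line of /proc/<pid>/status text into bytes; -1 if absent. */
    static long parseVmRssBytes(String statusText) {
        for (String line : statusText.split("\n")) {
            if (line.startsWith("VmRSS:")) {
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]) * 1024L; // the kernel reports kB
            }
        }
        return -1;
    }

    /** Parses a cgroup limit file: "max" (cgroup v2, no limit) -> -1, otherwise bytes. */
    static long parseCgroupLimit(String text) {
        String value = text.trim();
        return value.equals("max") ? -1 : Long.parseLong(value);
    }

    private static String read(Path p) throws IOException {
        return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        long heapUsed = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
        long rss = parseVmRssBytes(read(Paths.get("/proc/self/status")));
        Path v2 = Paths.get("/sys/fs/cgroup/memory.max");                   // cgroup v2
        Path v1 = Paths.get("/sys/fs/cgroup/memory/memory.limit_in_bytes"); // cgroup v1
        Path limitFile = Files.exists(v2) ? v2 : v1;
        long limit = Files.exists(limitFile) ? parseCgroupLimit(read(limitFile)) : -1;
        System.out.printf("heap used=%d MiB  RSS=%d MiB  limit=%s%n",
                heapUsed >> 20, rss >> 20, limit < 0 ? "none/unknown" : (limit >> 20) + " MiB");
    }
}
```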

Enable Native Memory Tracking (staging first)

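-XX:NativeMemoryTracking=summary
# or -XX:NativeMemoryTracking=detail for deep dives (more overhead); NMT must be set at JVM startup

jcmd <pid> VM.native_memory summary
jcmd <pid> VM.native_memory baseline        # record a reference point
jcmd <pid> VM.native_memory summary.diff    # growth since the baseline

Take the baseline before a load test, then run summary.diff afterward to see which categories grew.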

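Read NMT categories: look at Internal, Thread, Code, GC, Class (metaspace), and Other (where JDK 11+ reports direct ByteBuffer memory). The category with the largest delta is the investigation target.
Check direct buffer pools: JMX MBean java.nio:type=BufferPool,name=direct, attributes MemoryUsed, Count, TotalCapacity. Depending on version and settings, Netty's pooled allocator may allocate direct memory that this MBean does not count, so check Netty's allocator metrics too. In staging, enable Netty's resource leak detector (-Dio.netty.leakDetection.level=paranoid).
Thread count over time: jcmd <pid> Thread.print | grep -c '^"' (each thread block starts with its quoted name), or JMX ThreadCount. A climbing thread count means climbing stack RSS.
Metaspace: JMX memory pool Metaspace (Usage.used) and the NMT Class row. Growth after each redeploy without a JVM restart points to a classloader leak.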
pmap / smaps (Linux)
```
pmap -x <pid> | tail -1    # total RSS
cat /proc/<pid>/smaps_rollup
```
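Large anonymous mappings outside the heap range point to native or direct allocations. Many regions of roughly 64 MiB are typical of glibc malloc arenas.
Rule out “allocator hoarding”: RSS that stays high but stable after load drops may be glibc keeping freed memory rather than returning it to the OS. To test this, restart with the environment variable MALLOC_ARENA_MAX=2, or run jcmd <pid> System.trim_native_heap on JDKs that provide it. If RSS falls, the allocator was holding the memory and it is not a leak. -XX:+UseContainerSupport (default on JDK 10+) does not help here. It only makes the JVM size heap and CPU from cgroup limits.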
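The thread-count check can be automated with the standard ThreadMXBean. The sketch below uses a hypothetical ThreadTrend class. It samples getThreadCount(), which is the same value as the JMX ThreadCount attribute. It flags a series that never decreases and rises by at least a chosen minimum. The "never decreases" rule is a simplifying assumption. A real alert would tolerate jitter, for example by comparing moving averages.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class ThreadTrend {

    /** True if the samples never decrease and rise by at least minIncrease overall. */
    static boolean isClimbing(int[] samples, int minIncrease) {
        if (samples.length < 2) {
            return false;
        }
        for (int i = 1; i < samples.length; i++) {
            if (samples[i] < samples[i - 1]) {
                return false; // any drop breaks the "monotonic growth" pattern
            }
        }
        return samples[samples.length - 1] - samples[0] >= minIncrease;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int[] samples = new int[5];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = threads.getThreadCount(); // live threads, same as JMX ThreadCount
            Thread.sleep(1000);
        }
        System.out.println("climbing: " + isClimbing(samples, 1)
                + ", peak threads: " + threads.getPeakThreadCount());
    }
}
```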

Example NMT summary (interpretation)

Total: reserved=4194304KB, committed=3145728KB
-                 Java Heap (reserved=2097152KB, committed=1572864KB)
-                     Class (reserved=262144KB, committed=180224KB)  ← metaspace
-                    Thread (reserved=524288KB, committed=524288KB)  ← many threads
-                      Code (reserved=245760KB, committed=120000KB)
-                        GC (reserved=...)

If Java Heap committed is steady but Total committed rises, focus on non-heap rows in summary.diff.
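summary.diff already does this comparison inside the JVM. When snapshots are captured separately, for example by saving the output of jcmd <pid> VM.native_memory summary before and after a load test, a small parser can rank categories by committed growth. The sketch assumes the "-  Category (reserved=NKB, committed=NKB" row format shown above, with the default KB scale. Other JDK versions may append fields after committed, and the pattern tolerates that. NmtDiff is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NmtDiff {
    // Matches category rows such as "-  Java Heap (reserved=2097152KB, committed=1572864KB)".
    private static final Pattern ROW =
            Pattern.compile("^-\\s+(.+?)\\s+\\(reserved=(\\d+)KB, committed=(\\d+)KB");

    /** Category name -> committed KB, parsed from NMT summary text. The "Total:" line is skipped. */
    static Map<String, Long> committedKb(String summary) {
        Map<String, Long> rows = new LinkedHashMap<>();
        for (String line : summary.split("\n")) {
            Matcher m = ROW.matcher(line.trim());
            if (m.find()) {
                rows.put(m.group(1), Long.parseLong(m.group(3)));
            }
        }
        return rows;
    }

    /** Returns the category whose committed size grew the most between two snapshots. */
    static String largestGrowth(String before, String after) {
        Map<String, Long> base = committedKb(before);
        String best = null;
        long bestDelta = Long.MIN_VALUE;
        for (Map.Entry<String, Long> e : committedKb(after).entrySet()) {
            long delta = e.getValue() - base.getOrDefault(e.getKey(), 0L);
            if (delta > bestDelta) {
                bestDelta = delta;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass two saved summary files, the "before" snapshot first.
        String before = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String after = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        System.out.println("largest committed growth: " + largestGrowth(before, after));
    }
}
```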

Container vs JVM view

Metric source	What it shows
JMX heap used / max	Java objects only
`container_memory_working_set_bytes`	What counts toward K8s limit (≈ RSS)
NMT Total committed	Memory the JVM has committed for the heap plus its own native allocations; excludes malloc by third-party JNI libraries
`kubectl top pod`	Current working set—quick check

How — fix, size containers, monitor

Size Kubernetes memory limit

memory_limit ≥ Xmx
                 + MaxMetaspaceSize (or observed metaspace peak)
                 + expected direct memory peak
                 + (thread_count × stack_size)
                 + 300–500 MiB JVM/GC/misc headroom

Example: -Xmx2g, heavy Netty → use 3–4 GiB limit, not 2 GiB.
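The limit formula is plain addition. The sketch below uses hypothetical inputs for a Netty-heavy service running with -Xmx2g. The thread term uses the reserved stack size (threads × -Xss). That overestimates resident stack memory, so it errs on the safe side. Rounding up to a whole GiB lands inside the 3–4 GiB range given in the example.

```java
final class MemoryLimitCalc {

    /** Minimum container limit in MiB; all inputs are MiB except the thread count. */
    static long minLimitMiB(long xmx, long metaspace, long directPeak,
                            long threads, long stackMiB, long headroom) {
        return xmx + metaspace + directPeak + threads * stackMiB + headroom;
    }

    /** Rounds a MiB figure up to the next whole GiB (expressed in MiB). */
    static long roundUpToGiB(long mib) {
        return ((mib + 1023) / 1024) * 1024;
    }

    public static void main(String[] args) {
        // Hypothetical service: -Xmx2g, 256 MiB metaspace cap, 512 MiB direct peak,
        // 300 threads with 1 MiB stacks, 400 MiB JVM/GC/misc headroom.
        long min = minLimitMiB(2048, 256, 512, 300, 1, 400);
        System.out.println(min + " MiB -> limit " + roundUpToGiB(min) + " MiB");
        // prints "3516 MiB -> limit 4096 MiB"
    }
}
```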

Fix by category

NMT / symptom	Fix
Direct memory	Release buffers; cap Netty pools; `-XX:MaxDirectMemorySize=512m`; fix leak
Thread	Reduce pools; fix thread leak; lower `-Xss` if stacks huge
Class / metaspace	Fix classloader leak; set `MaxMetaspaceSize`; restart on redeploy
Internal / Other	JDK upgrade; identify JNI library; async-profiler native alloc
Heap flat but RSS at limit	Raise limit or lower off-heap consumers—not raise `-Xmx` alone

Flags for production visibility

-XX:+UseContainerSupport           # default since JDK 10 (and 8u191); shown for clarity
-XX:MaxRAMPercentage=75.0          # optional: derive max heap from cgroup
-XX:NativeMemoryTracking=summary  # staging/debug pods only; small overhead
-XX:MaxDirectMemorySize=512m
-XX:MaxMetaspaceSize=256m
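JAVA_TOOL_OPTIONS, base images, or launcher scripts can silently override flags, so it is worth reading the effective values from the running JVM. The sketch below uses com.sun.management.HotSpotDiagnosticMXBean, which is HotSpot-specific. EffectiveFlags is a hypothetical name. A MaxDirectMemorySize of 0 means the default (roughly the max heap size) is in effect.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class EffectiveFlags {

    /** Returns the effective value of a HotSpot flag, or null if this JVM does not know it. */
    static String flag(String name) {
        HotSpotDiagnosticMXBean hs =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            return hs.getVMOption(name).getValue();
        } catch (IllegalArgumentException unknownFlag) {
            return null; // e.g. a Linux-only flag queried on another OS
        }
    }

    public static void main(String[] args) {
        String[] names = {"UseContainerSupport", "MaxRAMPercentage",
                "NativeMemoryTracking", "MaxDirectMemorySize", "MaxMetaspaceSize"};
        for (String name : names) {
            System.out.println(name + " = " + flag(name));
        }
    }
}
```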

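NMT can only be turned on at JVM startup, and detail mode adds overhead. Do not leave NativeMemoryTracking=detail on all prod pods. Enable it on a canary instead, or restart one pod with it during an incident.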

Monitoring

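Dashboard panel: working set minus heap committed (the “off-heap gap”); alert when it trends up. Subtracting heap used instead overstates the gap, because committed but empty heap pages can also occupy RSS.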
JMX scrape: direct buffer pool MemoryUsed, Metaspace Used, ThreadCount.
Alert: working set > 90% of limit while heap < 70% of max → sizing/off-heap issue.
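The gap panel and the alert rule reduce to a couple of comparisons. In the sketch below, which uses a hypothetical OffHeapAlert class, the gap subtracts committed heap rather than heap used, because committed heap pages count toward RSS even when they hold no live objects. The 90% and 70% thresholds come from the alert rule above. The inputs are assumed to come from your metrics pipeline: container_memory_working_set_bytes, the cgroup limit, and JMX heap usage.

```java
final class OffHeapAlert {

    /** Working set minus committed heap: memory that heap charts do not explain. */
    static long offHeapGap(long workingSetBytes, long heapCommittedBytes) {
        return workingSetBytes - heapCommittedBytes;
    }

    /** Alert rule: working set above 90% of the limit while heap used is below 70% of heap max. */
    static boolean sizingAlert(long workingSet, long limit, long heapUsed, long heapMax) {
        return workingSet > 0.90 * limit && heapUsed < 0.70 * heapMax;
    }

    public static void main(String[] args) {
        long mib = 1L << 20;
        // Hypothetical pod: 3.7 GiB working set, 4 GiB limit, 1.2 GiB of a 2 GiB heap used.
        System.out.println("gap MiB: " + offHeapGap(3800 * mib, 2048 * mib) / mib);
        System.out.println("alert: " + sizingAlert(3800 * mib, 4096 * mib, 1229 * mib, 2048 * mib));
    }
}
```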

When heap leak tools mislead

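Heap dumps (for example in Eclipse MAT) show only Java heap objects. If RSS grows and heap dumps look fine, switch to NMT diffs and direct buffer metrics rather than taking another heap dump. One partial exception: a dump does include the small java.nio.DirectByteBuffer objects, which can reveal what retains leaked direct buffers.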

Interview one-liner

“Heap used is only the object heap; RSS includes metaspace, thread stacks, direct buffers, and native code. I compare container working set to JMX heap, run jcmd VM.native_memory summary.diff, check direct BufferPool and thread count, then size the cgroup limit for total footprint—not just -Xmx.”

Related scenarios

Crash without clear errors — OOMKilled when RSS hits limit.
Memory leak — when heap objects are the retainers.
Metaspace growth — classloader leaks.
Thread pool exhausted (coming soon)