Storage & Volumes

Storage options

Docker offers four ways to expose storage inside a container. They differ in who manages the data, where it lives on the host, and whether it survives container deletion. As a loose analogy, volumes are a managed parking lot, bind mounts are your own driveway, tmpfs is a valet that forgets everything at shutdown, and named pipes are a direct phone line between processes.

The four mount types

Type	Managed by	Host location	Survives docker rm	Best for
Volume	Docker daemon	/var/lib/docker/volumes/	Yes (until docker volume rm)	Production databases, shared app data, backups
Bind mount	You (explicit host path)	Any host directory or file	Yes (data is on host filesystem)	Dev hot-reload, config injection, host tooling
tmpfs	Kernel (RAM)	Host memory only (pages may go to swap)	No (gone when container stops)	Secrets, temp caches, sensitive scratch data
Named pipe / FIFO	OS IPC	Host socket or pipe path	N/A (IPC channel, not storage)	Docker API socket, syslog, sidecar IPC

┌─── CONTAINER ─────────────────────────────────────────────┐
│  /app/data  ←── volume (mydata)     [Docker-managed]      │
│  /src       ←── bind mount ./src      [host path]          │
│  /run/secrets ← tmpfs               [RAM, no disk]        │
│  /var/run/docker.sock ← bind mount  [named socket/pipe]   │
├───────────────────────────────────────────────────────────┤
│  Writable container layer (upperdir) — ephemeral CoW      │
├───────────────────────────────────────────────────────────┤
│  Read-only image layers (lowerdir)                        │
└───────────────────────────────────────────────────────────┘

Volumes — Docker-managed persistence

Volumes are the recommended default for persistent application data. Docker creates and tracks them; you reference them by name. They bypass the container's OverlayFS upperdir, so writes do not trigger copy-on-write penalties from image layers. Multiple containers can mount the same volume read-write or read-only.

Bind mounts — host path coupling

Bind mounts map a specific host path into the container. Docker does not manage the data—you do. The container sees the host inode directly. Powerful for development (edit locally, run in container) but couples deployment to host filesystem layout and UID/GID mapping.

tmpfs — memory-backed, ephemeral

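tmpfs mounts keep data in host memory. Nothing is written to the container layer or to a volume, although the kernel may page tmpfs data out to swap if the host has swap enabled. That makes tmpfs a good fit for credentials that must not persist and for high-churn temp files you do not want polluting the container layer. Cap the size with the size= option on --tmpfs (tmpfs-size on --mount); without a cap the mount can grow until it hits the kernel default limit or available memory.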
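To make the size cap concrete, here is a minimal sketch of the arithmetic behind a byte-valued tmpfs-size. The helper name to_bytes is hypothetical and not part of Docker; it assumes binary units (k, m, g as powers of 1024) and whole numbers without leading zeros.

```shell
# Convert a size such as 64m, 512k, 1g, or a plain byte count into bytes.
# Hypothetical helper for illustration; not a Docker command.
to_bytes() (
  size=$1
  number=${size%[kKmMgG]}     # strip one trailing unit letter, if present
  unit=${size#"$number"}      # the letter that was stripped (may be empty)
  case $number in
    ''|*[!0-9]*) echo "invalid size: $size" >&2; exit 1 ;;
  esac
  case $unit in
    '')  factor=1 ;;
    k|K) factor=1024 ;;
    m|M) factor=$((1024 * 1024)) ;;
    g|G) factor=$((1024 * 1024 * 1024)) ;;
  esac
  echo $((number * factor))
)

printf 'tmpfs-size=%s\n' "$(to_bytes 64m)"   # prints tmpfs-size=67108864
```

The printed value could then be pasted into a --mount type=tmpfs option when a script wants an explicit byte count.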

Named pipes and sockets

Not traditional storage, but worth knowing: mounting a Unix socket or FIFO lets containers talk to host services. On Linux this is an ordinary bind mount whose source happens to be a socket file; Docker's separate npipe mount type applies to Windows named pipes. The classic example is mounting /var/run/docker.sock so a CI agent can spawn sibling containers, a powerful and dangerous pattern covered in security guides.

# Volume (named, managed)
docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres:16

# Bind mount (host path)
docker run --rm -v "$(pwd)":/app:ro -w /app node:20 npm test

# tmpfs (RAM only)
docker run --rm --tmpfs /run/secrets:rw,noexec,nosuid,size=64m myapp

# Named socket (IPC)
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock docker:cli docker ps

⚖️ Trade-off

Volume vs bind mount: Volumes are referenced by name, so the same Compose or Swarm definition works on any host; the data itself stays on one node unless a network-backed driver provides it. Bind mounts tie a container to a specific host path, which breaks portability but enables instant dev feedback loops. Architects standardize on volumes for stateful services; developers use bind mounts locally.

🎯 Interview Tip

"Where does container data go when the container is removed?" Only volumes and bind-mounted host paths survive; the writable container layer is deleted. Anonymous volumes also survive a plain docker rm until they are pruned, but they have no stable name for reattachment, and docker run --rm or docker rm -v removes them together with the container.

Volumes deep dive

Volumes are first-class Docker objects with their own lifecycle, drivers, and CLI. Production teams treat them like mini databases—named, backed up, monitored for disk growth, and never left anonymous.

Volume lifecycle commands

Command	Purpose	Example
docker volume create	Provision a named volume (optionally with driver/labels)	docker volume create --label env=prod app_logs
docker volume ls	List volumes; filter by dangling, driver, label	docker volume ls -f dangling=true
docker volume inspect	JSON metadata: mountpoint, driver, labels, scope	docker volume inspect pgdata
docker volume rm	Delete one or more volumes (must not be in use)	docker volume rm pgdata_old
docker volume prune	Remove unused volumes; since Docker 23.0 only anonymous ones unless you add -a (destructive, confirm in prod)	docker volume prune -f

$ docker volume create --driver local --opt type=nfs \
  --opt o=addr=10.0.1.50,rw --opt device=:/exports/pgdata pgdata_nfs
pgdata_nfs

$ docker volume inspect pgdata_nfs --format '{{.Mountpoint}}'
/var/lib/docker/volumes/pgdata_nfs/_data

$ docker run -d --name postgres -v pgdata_nfs:/var/lib/postgresql/data postgres:16
a3f8c2d1e9b0...

$ docker volume ls --filter name=pgdata
DRIVER    VOLUME NAME
local     pgdata_nfs

Volume drivers

The default local driver stores data under Docker's data root. Plugin drivers extend storage to network filesystems and cloud block stores—critical for multi-host Swarm and legacy Docker EE deployments. Kubernetes uses its own CSI drivers instead, but the concepts transfer.

Driver	Backing	Use case	Notes
local	Host disk (/var/lib/docker/volumes/)	Single-node Docker, Compose dev stacks	Default; fast on Linux; bind subdirs via --opt type=none --opt device=… --opt o=bind
NFS	Remote NFS export	Shared storage across Swarm nodes	--opt type=nfs --opt o=addr=… --opt device=:/path
Cloud plugins	EBS, EFS, Azure Disk, GCE PD	Cloud-native persistent disks	Install vendor plugin; prefer CSI in Kubernetes
Third-party	NetApp, Portworx, RexRay, etc.	Enterprise HA storage	Plugin lifecycle separate from Docker upgrades

-v vs --mount syntax

Both attach storage, but --mount is explicit and recommended for production scripts. The shorthand -v is easy to misread: the host-versus-container path order trips people up, and for bind mounts it silently creates a missing host directory, whereas --mount type=bind fails with an error instead.

Scenario	-v shorthand	--mount (preferred)
Named volume	-v mydata:/app/data	--mount source=mydata,target=/app/data
Read-only volume	-v mydata:/app/data:ro	--mount source=mydata,target=/app/data,readonly
Bind mount	-v /host/path:/container/path	--mount type=bind,source=/host/path,target=/container/path
tmpfs	--tmpfs /run:rw,noexec,size=64m	--mount type=tmpfs,target=/run,tmpfs-size=67108864,tmpfs-mode=1777
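The mapping in the table can be sketched as a small translator. This is a deliberately simplified illustration: v_to_mount is a hypothetical helper, it assumes that an absolute host path means a bind mount and anything else is a volume name, and it only understands the ro option (no SELinux labels, no Windows paths).

```shell
# Translate a simplified -v spec into the equivalent --mount value.
# Hypothetical helper; covers only "src:dst[:opts]" and a bare container path.
v_to_mount() (
  set -f                 # no glob expansion while splitting
  IFS=:
  set -- $1              # split the spec on ':' into positional parameters
  case $# in
    1) echo "type=volume,target=$1"; exit 0 ;;   # container path only: anonymous volume
    2|3) src=$1 dst=$2 opts=${3:-} ;;
    *) echo "unsupported spec" >&2; exit 1 ;;
  esac
  case $src in
    /*) type=bind ;;     # assumption: absolute path means host directory or file
    *)  type=volume ;;   # otherwise treat the source as a volume name
  esac
  case ",$opts," in
    *,ro,*) suffix=,readonly ;;
    *)      suffix= ;;
  esac
  echo "type=$type,source=$src,target=$dst$suffix"
)

v_to_mount mydata:/app/data:ro          # type=volume,source=mydata,target=/app/data,readonly
v_to_mount /host/path:/container/path   # type=bind,source=/host/path,target=/container/path
```

Spelling out type= in every case is a design choice: it removes the ambiguity that makes the shorthand easy to misread.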

Anonymous volumes

When you specify only a container path (-v /var/lib/mysql with no name), Docker creates an anonymous volume with a random ID. The data survives container removal unless you remove the container with docker rm -v or started it with docker run --rm. Left alone, these volumes accumulate as orphaned disk usage. Official database images declare their data directory with a Dockerfile VOLUME instruction, which produces an anonymous volume whenever you do not mount something there yourself, so always mount a named volume over that path in production.
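One way to spot anonymous volumes in a listing is by the shape of their name. The sketch below is a heuristic, not a Docker feature: it assumes anonymous volumes show up as 64-character lowercase hex IDs, and both function names are hypothetical. A named volume that someone deliberately called a 64-character hex string would be misclassified.

```shell
# Heuristic: does this volume name look like a generated 64-hex ID?
is_anonymous_name() {
  case $1 in
    ''|*[!0-9a-f]*) return 1 ;;   # empty, or contains a non-hex character
  esac
  [ "${#1}" -eq 64 ]
}

# Read volume names on stdin and print only the anonymous-looking ones.
# Real input would come from: docker volume ls -q
filter_anonymous() {
  while IFS= read -r name; do
    if is_anonymous_name "$name"; then echo "$name"; fi
  done
}

sample_id=$(printf '%064d' 0)     # stand-in for a real generated ID
printf '%s\n' pgdata "$sample_id" redis_data | filter_anonymous
```

The demo prints only the 64-character stand-in ID; review such a list before deleting anything.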

Permissions and ownership

Volume data is written by the container process UID/GID. If the image runs as UID 999 (postgres) but the host directory is owned by root, you get permission denied. Fixes:

docker run --user to match host ownership
Entrypoint script or one-off helper container that chowns the mount (common in dev)
COPY --chown=app:app in Dockerfile so the runtime user owns expected paths
Named volume (an empty volume is populated from the image's content at the mount path, ownership included; database entrypoints also fix permissions on first start)
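Before picking one of these fixes it helps to confirm that ownership really is the problem. The following is a small sketch using only POSIX tools; owner_uid and owned_by are hypothetical helper names, and the temporary directory stands in for whatever host path you intend to mount.

```shell
# Print the numeric owner UID of a path (third field of `ls -n` output).
owner_uid() {
  ls -nd -- "$1" | awk '{print $3}'
}

# Succeed only if the path is owned by the given numeric UID.
owned_by() {
  [ "$(owner_uid "$1")" = "$2" ]
}

data_dir=$(mktemp -d)             # stand-in for a host directory to be mounted
if owned_by "$data_dir" "$(id -u)"; then
  echo "ok: $data_dir is owned by UID $(id -u)"
else
  echo "mismatch: chown the directory or pass --user to docker run"
fi
rmdir "$data_dir"
```

To check against the container side instead, pass the UID the image runs as (999 in the postgres example above) as the second argument.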

Backup with tar

The canonical volume backup pattern: mount the volume read-only into a throwaway container and write a tar archive to a bind-mounted backup directory (or stream it to stdout). Restore is the reverse: extract the archive into a fresh volume. This works for any volume driver whose data is accessible locally.

# Backup: archive the volume contents into a tar file in the current host directory
docker run --rm \
  -v pgdata:/source:ro \
  -v "$(pwd)":/backup \
  alpine tar czf "/backup/pgdata-$(date +%Y%m%d).tar.gz" -C /source .

# Restore: extract tar into a new volume
docker volume create pgdata_restored
docker run --rm \
  -v pgdata_restored:/target \
  -v "$(pwd)":/backup:ro \
  alpine tar xzf /backup/pgdata-20260605.tar.gz -C /target

# Verify restored volume
docker run --rm -v pgdata_restored:/data alpine ls -la /data
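A backup is only useful if it restores cleanly. The sketch below rehearses the same tar invocations against ordinary directories, without Docker, and compares the result with the source. roundtrip_ok is a hypothetical helper; it assumes a tar that supports -z and -C (GNU tar, bsdtar, and BusyBox all do) and does not compare ownership or permissions.

```shell
# Archive a directory the way the backup container does, unpack it into an
# empty directory, and report whether the restored tree matches the source.
roundtrip_ok() (
  src=$1
  tmp=$(mktemp -d) || exit 1
  mkdir "$tmp/restored"
  tar czf "$tmp/backup.tar.gz" -C "$src" . &&
    tar xzf "$tmp/backup.tar.gz" -C "$tmp/restored" &&
    diff -r "$src" "$tmp/restored" >/dev/null
  status=$?
  rm -rf "$tmp"
  exit "$status"
)

demo=$(mktemp -d)                 # stand-in for the volume contents
mkdir "$demo/uploads"
echo "hello" > "$demo/uploads/a.txt"
roundtrip_ok "$demo" && echo "backup verified"
rm -rf "$demo"
```

The same comparison could run inside a throwaway container with both the original and the restored volume mounted read-only.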

⚠️ Pitfall

Backing up a live database with plain tar can produce inconsistent snapshots. For PostgreSQL use pg_dump; for MySQL use mysqldump or Percona XtraBackup. Tar-on-volume is fine for file uploads, static assets, and stopped containers.

📦 Real World

Platform teams run a nightly docker run --rm -v $VOL:/source:ro alpine tar czf - -C /source . and pipe the stream to S3, with retention policies on the bucket. Pair this with docker volume ls -f dangling=true in CI cleanup jobs; anonymous volumes from integration tests are a common disk-usage leak on shared runners.
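Retention is easy to get wrong, so here is a minimal sketch of a keep-the-newest-N rule for a local staging directory. It is an illustration under stated assumptions: prune_backups is a hypothetical helper, archives are named prefix-YYYYMMDD.tar.gz so that alphabetical order equals chronological order, and an object store would normally enforce the same rule with its own lifecycle settings.

```shell
# Delete all but the newest $keep archives named <prefix>-YYYYMMDD.tar.gz in <dir>.
prune_backups() (
  dir=$1 prefix=$2 keep=$3
  cd "$dir" || exit 1
  set -- "$prefix"-*.tar.gz       # glob expands in sorted order, oldest date first
  [ -e "$1" ] || exit 0           # pattern matched nothing
  while [ "$#" -gt "$keep" ]; do
    rm -- "$1"
    echo "removed $1"
    shift
  done
)

stage=$(mktemp -d)                # stand-in for a backup staging directory
for day in 20260601 20260602 20260603; do : > "$stage/pgdata-$day.tar.gz"; done
prune_backups "$stage" pgdata 2   # prints: removed pgdata-20260601.tar.gz
rm -rf "$stage"
```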

⚙️ Compose

Declare named volumes at the top level so they survive docker compose down (without -v): volumes: pgdata: then services.db.volumes: [pgdata:/var/lib/postgresql/data]. Use external: true when a volume is provisioned outside Compose.

Bind mounts

Bind mounts are a direct window into the host filesystem. Edit a file in your IDE, save, and the running container sees the change instantly—no image rebuild. That speed comes with coupling, permission friction, and security exposure if you mount too much.

Dev hot reload pattern

The standard local development loop: bind-mount source code, keep dependencies in the image layer or a named volume. Spring Boot DevTools, Node nodemon, and Vite HMR all depend on the host writing files the container process watches.

# Spring Boot dev: mount pom.xml and source, cache Maven deps in named volume
docker run --rm -p 8080:8080 \
  -v "$(pwd)/pom.xml":/app/pom.xml:ro \
  -v "$(pwd)/src":/app/src \
  -v maven_cache:/root/.m2 \
  -w /app \
  maven:3.9-eclipse-temurin-21 mvn spring-boot:run

# Node dev: mount everything except node_modules, install deps inside the container
docker run --rm -p 3000:3000 \
  -v "$(pwd)":/app \
  -v /app/node_modules \
  -w /app node:20 sh -c 'npm install && npm run dev'

The anonymous volume trick -v /app/node_modules masks the bind mount for that path, so the node_modules installed inside the container wins over the host directory, which may be empty or contain native modules built for macOS.

Read-only bind mounts (:ro)

Mount configuration and secrets read-only so a compromised container cannot modify them. Compose: ./config.yml:/app/config.yml:ro. Kubernetes equivalent is readOnly: true on volume mounts. Read-only does not prevent reading: sensitive files still need proper permissions, and runtime secrets belong on tmpfs.

Permissions and UID mapping

Symptom	Cause	Fix
Permission denied on write	Container UID ≠ host file owner	Run with --user $(id -u):$(id -g) in dev
Root-owned files on host after container exit	Container ran as root, wrote to bind mount	Use non-root USER in Dockerfile; fix with chown
SELinux Permission denied (RHEL/Fedora)	Container process lacks label for host path	Add :z (shared) or :Z (private) suffix
Docker Desktop Mac/Win slow file watch	FUSE/gRPC filesystem sync overhead	Named volume for node_modules; Mutagen/sync tools

Risks of bind mounts

Host path exposure — mounting / or /etc gives the container full host read (or write) access
Non-portable deploys — production paths differ; bind mounts break across machines
Accidental deletion — rm -rf inside the container can destroy host data
Docker creates missing dirs — -v /nonexistent:/data creates /nonexistent as root on the host
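Several of these risks can be caught before docker run is ever invoked. The guard below is a minimal sketch: check_bind_source is a hypothetical helper, and its deny-list (/, /etc, and the Docker socket) is only an illustrative assumption that a real policy would extend. Rejecting a missing path avoids the root-owned directory that -v would otherwise create.

```shell
# Refuse bind-mount sources that are relative, sensitive, or missing.
check_bind_source() (
  path=$1
  case $path in
    /*) ;;
    *) echo "refusing relative path: $path" >&2; exit 1 ;;
  esac
  case $path in
    /|/etc|/var/run/docker.sock|/run/docker.sock)
      echo "refusing sensitive path: $path" >&2; exit 1 ;;
  esac
  [ -e "$path" ] || { echo "missing host path: $path" >&2; exit 1; }
)

src=$(pwd)
if check_bind_source "$src"; then
  # Dry run: the command is printed, not executed.
  echo docker run --rm -v "$src:/app/src:ro" node:20 ls /app/src
fi
```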

When to use bind mounts

Use case	Mount	Mode
Live code reload (dev)	./src:/app/src	rw
Inject nginx/site config	./nginx.conf:/etc/nginx/nginx.conf	ro
Share TLS certs from host	/etc/letsencrypt:/certs	ro
Log shipping to host agent	./logs:/var/log/app	rw
Drive the host Docker daemon from a CI container (sibling containers)	/var/run/docker.sock	rw (high risk)

🔒 Security

Never bind-mount /var/run/docker.sock into application containers in production: it grants effective root on the host. Restrict it to dedicated CI builders, or replace it with alternatives such as a gVisor-sandboxed runtime or rootless Podman. Treat every bind mount as expanding the container's blast radius.

💡 Pro Tip

In Compose, use develop.watch (Compose 2.22+) to sync files without full directory bind mounts—reduces macOS FUSE overhead while keeping hot reload. For pure bind-mount workflows, exclude heavy dirs via anonymous volumes.

Container layer (writable layer)

Every running container gets a thin writable layer on top of read-only image layers. It is convenient scratch space, and a trap for anyone who treats it like persistent storage. With OverlayFS copy-on-write, the first write to a file from a lower layer copies the whole file up, so the cost grows with file size.

How the writable layer works

Image layers form the lowerdir stack. Docker creates a unique upperdir per container—all runtime writes land here. The merged view is the container root filesystem. When the container is removed, upperdir is deleted. No backup, no recovery.

  READ path:  upperdir → lowerN → … → lower0  (first hit wins)
  WRITE path: file in lower? → COPY to upperdir (CoW) → write
              new file?      → create directly in upperdir

  ┌──────── upperdir (container writable layer) ────────┐
  │  /app/logs/app.log  ← created here at runtime        │
  │  /app/package.json  ← copied from lower, then edited │
  ├──────── lowerdir (image layers, read-only) ─────────┤
  │  Layer 3: RUN npm run build                          │
  │  Layer 2: COPY package.json                          │
  │  Layer 1: FROM node:20-alpine                        │
  └──────────────────────────────────────────────────────┘
           merged → container sees unified /
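The read and write paths in the diagram can be imitated with plain directories. This is a simulation for intuition only, not a real OverlayFS mount: the directory names, the two-layer stack, and the helper functions overlay_read and overlay_append are all assumptions made for the example.

```shell
# Plain-directory imitation of OverlayFS lookup and copy-up (no real mount).
root=$(mktemp -d)
mkdir -p "$root/lower0" "$root/lower1" "$root/upper"
echo "from base image"  > "$root/lower0/app.conf"
echo "from later layer" > "$root/lower1/app.conf"
echo "only in base"     > "$root/lower0/base.txt"

# READ: search upper first, then the lower layers from top to bottom.
overlay_read() {
  for layer in upper lower1 lower0; do
    if [ -f "$root/$layer/$1" ]; then cat "$root/$layer/$1"; return 0; fi
  done
  return 1
}

# WRITE: copy the whole file up on first write, then modify the upper copy.
overlay_append() {
  if [ ! -f "$root/upper/$1" ]; then
    for layer in lower1 lower0; do
      if [ -f "$root/$layer/$1" ]; then cp "$root/$layer/$1" "$root/upper/$1"; break; fi
    done
  fi
  echo "$2" >> "$root/upper/$1"
}

overlay_read app.conf                          # prints: from later layer
overlay_append base.txt "edited in container"
overlay_read base.txt                          # two lines, served from upper
cat "$root/lower0/base.txt"                    # still prints: only in base
```

Deleting the upper directory in this model discards every runtime change while the lower layers stay intact, which mirrors what happens to the writable layer on docker rm.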

Copy-on-write cost

CoW triggers when a container modifies a file that exists in a lower layer. OverlayFS copies the entire file to upperdir before applying the change, even for a one-byte edit. The copy happens once per file; later writes go straight to the copy in upperdir. Workloads that touch many or large image-provided files (package managers writing to /usr, log rotation, database files shipped in the image) amplify this cost. Reads are fast; first writes are not.

Operation	Layer impact	Performance
Read file from image	Lower layer only	Fast (page cache)
Create new file	Written to upperdir	Normal disk I/O
Modify existing image file	Full file CoW to upperdir	Slow for large files
Delete file from image	Whiteout marker in upperdir	Cheap metadata op
Heavy random writes (DB)	All in upperdir via the OverlayFS driver	Poor; use volumes instead

Never run databases in the container layer

SQLite, PostgreSQL, MySQL, Redis persistence: all belong on volumes. Database engines issue small random writes and frequent fsyncs. On the container layer every one of those goes through the storage driver, any data file shipped in the image is copied up on first write, and the data is lost on docker rm. This is not theoretical; it is a classic cause of "my database disappeared" incidents.

$ docker run -d --name web nginx:alpine
f4a2b8c1d3e7...

$ docker exec web sh -c 'echo custom > /usr/share/nginx/html/test.html'

$ docker diff web
C /usr
C /usr/share
C /usr/share/nginx
C /usr/share/nginx/html
A /usr/share/nginx/html/test.html

$ docker inspect --format '{{.GraphDriver.Data.UpperDir}}' web
/var/lib/docker/overlay2/a3f8…/diff

Understanding docker diff

docker diff <container> lists changes in the writable layer relative to the image:

A — Added file or directory
C — Changed (modified metadata or CoW'd content)
D — Deleted (whiteout over lower-layer file)

Use it to debug "why is my container using 2 GB of disk?"—often log files, package caches, or database files written to the wrong path. Pair with docker system df -v for per-container writable layer size.
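When the diff is long, a per-type count is a quick first look. The sketch below tallies the A, C, and D markers from text on stdin; summarize_diff is a hypothetical helper, and the sample input is the nginx output shown earlier rather than a live docker diff call.

```shell
# Count entries per change type from `docker diff` output on stdin.
summarize_diff() {
  awk '{ count[$1]++ }
       END { printf "added=%d changed=%d deleted=%d\n", count["A"], count["C"], count["D"] }'
}

# In practice: docker diff web | summarize_diff
summarize_diff <<'EOF'
C /usr
C /usr/share
C /usr/share/nginx
C /usr/share/nginx/html
A /usr/share/nginx/html/test.html
EOF
# prints: added=1 changed=4 deleted=0
```

A large added count under a log or cache directory is usually the hint that a path should move to a volume or to stdout.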

⚠️ Pitfall

Writing logs to /var/log inside the container fills the writable layer and the host's Docker storage pool. Configure apps to log to stdout (twelve-factor) or mount a volume/bind for log files. Monitor with docker diff and docker system df.

🔬 Under the Hood

On overlay2, each container's upperdir lives under /var/lib/docker/overlay2/<id>/diff. The work/ sibling directory is OverlayFS internal state. Volumes mount into the merged view, bypassing upperdir for that path—writes go directly to the volume mount point.

Storage performance

Storage choice directly affects latency, throughput, and developer experience. Linux-native Docker is fast; Docker Desktop on macOS and Windows pays a heavy tax for cross-VM filesystem synchronization. Architects who ignore this ship stacks that crawl locally but fly in production—or vice versa.

OverlayFS read vs write characteristics

Reads from image layers are served from the page cache like normal files—excellent for read-heavy workloads (serving static assets from the image). Writes to the container layer incur CoW overhead. Volume writes skip OverlayFS entirely—they hit the volume driver backing store (local ext4/xfs, NFS, cloud block) with native filesystem performance.

flowchart TB
  subgraph reads["Read performance"]
    IMG[Image layer read] --> FAST[Fast — page cache]
    VOL_R[Volume read] --> FAST2[Fast — native FS]
    BIND_R[Bind mount read] --> VAR[Variable — host FS + sync layer]
  end
  subgraph writes["Write performance"]
    UPPER[Container layer write] --> COW[CoW penalty on first write]
    VOL_W[Volume write] --> NATIVE[Native FS — best for DB]
    BIND_W[Bind mount write] --> DESKTOP[Docker Desktop: FUSE/gRPC tax]
  end

Volume vs bind mount on Linux

Factor	Named volume	Bind mount
Linux native Docker I/O	Near-native (local driver)	Near-native (direct host path)
CoW interaction	Bypasses container upperdir	Bypasses container upperdir
Portability	Named, orchestrator-friendly	Host-path dependent
Permission setup	Docker-managed mountpoint	Host UID/GID must align
Backup ergonomics	Predictable path via inspect	You already know the path

On bare-metal or VM Linux hosts, volume and bind performance is comparable—both avoid CoW. Choose based on operability, not raw IOPS.

Docker Desktop: macOS and Windows FUSE slowness

Docker Desktop runs containers inside a Linux VM. Bind mounts cross a filesystem synchronization boundary (VirtioFS, gRPC FUSE, or osxfs depending on version). File-heavy operations such as npm install, mvn compile, and webpack watchers can be 5–10× slower than on native Linux.

Store node_modules, .m2, .gradle in named volumes, not bind mounts
Enable VirtioFS (Docker Desktop settings) for improved macOS bind performance
Use :delegated or :cached mount consistency flags on older Desktop versions
Consider devcontainers with source inside the VM volume, synced selectively

💡 Pro Tip

Spring Boot dev on Docker Desktop: bind-mount only src/; keep target/ and ~/.m2 in named volumes. Run mvn compile inside the container so bytecode writes stay in the VM. This can cut rebuild time from minutes to seconds on Mac.

Databases always use volumes

Regardless of platform, stateful data stores belong on volumes—not bind mounts, not the container layer:

Persistence — volume data survives container restart and recreation
I/O path — no CoW, no cross-VM sync on Desktop when volume lives in the Linux VM
Backup — stable name for automation (pgdata, redis_data)
Upgrade path — swap container image, reattach same volume

# Compose: production-shaped storage layout
services:
  app:
    image: myapp:1.0
    volumes:
      - app_logs:/var/log/myapp      # named volume — logs
    # NO bind mount of source in prod

  postgres:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data  # named volume — always
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_pass
    secrets:
      - db_pass

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes

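volumes:
  pgdata:
  redis_data:
  app_logs:

secrets:
  db_pass:
    file: ./db_pass.txt  # hypothetical path; postgres references db_pass, so it must be declared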

Workload	Recommended storage	Why
PostgreSQL / MySQL / MongoDB	Named volume	IOPS, persistence, backup
Redis AOF/RDB	Named volume	Survive restarts; avoid CoW
Local dev source code	Bind mount (+ dep cache volume)	Hot reload; isolate slow dirs
Build caches (Maven, npm, cargo)	Named volume	Fast on Desktop; shared across runs
Runtime secrets	tmpfs or secrets mount	Never touch disk
Static assets in production	Bake into image	Read-only layers = fastest reads

⚖️ Trade-off

Dev speed vs prod fidelity: Bind mounts optimize developer iteration; volumes mirror production I/O paths. CI should test with volume-backed databases, not bind mounts or container-layer SQLite, to catch permission and persistence bugs before deploy.

🎯 Interview Tip

"Why is my Docker setup slow on Mac but fast in production?" — Explain the VM boundary, FUSE/gRPC sync for bind mounts, and the fix: named volumes for I/O-heavy dirs, VirtioFS, minimize cross-boundary file churn. Mention CoW only applies to the container layer, not volumes.