Docker Images & Layers

An image is not a single file—it is an ordered stack of read-only layers, each a filesystem diff, content-addressed by SHA256 digest. Every Dockerfile instruction is a potential layer and a potential cache boundary. Master layer ordering and you master fast CI builds, small production images, and reproducible deploys.

Tags: developer, devops, BuildKit, Docker 24+

Image anatomy

Think of an image as a git repository frozen in time: each layer is a commit (a filesystem diff), the manifest is the record that ties those layers together, the tag is the branch pointer, and the config is metadata about how to run the result.

What an image contains

  • Ordered read-only layers — each layer is a tar archive of files added, modified, or deleted (whiteout markers for deletes)
  • Image manifest — JSON listing layer digests + config digest; may be a manifest list for multi-arch
  • Image config — env vars, entrypoint, cmd, exposed ports, labels, build history metadata
  • Content-addressed storage — layers identified by SHA256 digest; same bytes = same digest everywhere
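
The content-addressing idea in the last bullet can be tried without Docker. The sketch below is only an illustration: it assumes GNU `sha256sum` is available (on macOS, `shasum -a 256` is the usual substitute), the files are stand-ins for layer blobs, and `digest_of` is a hypothetical helper, not a Docker command.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)

# Hypothetical helper: print a file's hash in the sha256:<hex> form used for digests.
digest_of() {
  printf 'sha256:%s\n' "$(sha256sum "$1" | cut -d ' ' -f 1)"
}

# Stand-ins for layer blobs: two with identical bytes, one with an extra byte.
printf 'layer contents\n'  > "$workdir/blob-a"
printf 'layer contents\n'  > "$workdir/blob-b"
printf 'layer contents!\n' > "$workdir/blob-c"

digest_a=$(digest_of "$workdir/blob-a")
digest_b=$(digest_of "$workdir/blob-b")
digest_c=$(digest_of "$workdir/blob-c")
rm -rf "$workdir"

echo "blob-a: $digest_a"
echo "blob-b: $digest_b"
echo "blob-c: $digest_c"

if [ "$digest_a" = "$digest_b" ]; then
  echo "blob-a and blob-b share a digest, so a store keeps one copy"
fi
```

The file name plays no part in the digest, which is why two images built on different machines can share a layer as long as the bytes match.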

Manifest and config

When you docker pull nginx:1.25, the client fetches the manifest for that tag, then pulls each layer blob by digest. The tag is a mutable pointer; the digest (nginx@sha256:abc…) is immutable. Production deploys should pin digests.

Dangling images

After a rebuild, the previous image loses its tag and becomes a dangling image (<none> in docker image ls). Dangling images still consume disk until pruned. CI runners accumulate them rapidly unless a job runs docker image prune; registries need their own lifecycle policies for untagged images.

bash
# Inspect image layers and sizes
docker history --no-trunc myapp:1.0

# Show manifest digest (immutable reference)
docker inspect --format '{{index .RepoDigests 0}}' nginx:1.25-alpine

# Find dangling images
docker images -f dangling=true

🔬 Under the Hood

Locally, layers live under /var/lib/docker/overlay2/ (or containerd's content store). Registries likewise store layers as blobs keyed by digest, so a pull can skip any layer whose digest is already present locally. This is why pulling an image someone else built reuses layers you already have.

🎯 Interview Tip

"What's the difference between an image and a container?" — An image is read-only layers + config. A container is an image + a writable container layer + runtime config (network, mounts, cgroup limits). Many containers can share one image's lower layers.

Layer caching

Docker's build cache is the difference between a 30-second CI build and a 10-minute one. Each instruction is a cache key; change one line and every layer after it rebuilds.

How the cache works

For each Dockerfile instruction, BuildKit/Docker computes a cache key from:

  • The instruction text itself
  • The parent layer digest
  • A hash of files referenced by COPY/ADD (build context)

Cache hit → skip execution, reuse existing layer. Cache miss → execute instruction, create new layer, invalidate all subsequent instructions.
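
To make the chaining concrete, here is a deliberately simplified model of the three inputs listed above. It is not BuildKit's actual algorithm: `cache_key`, `build_keys`, and the `lock-v1`/`src-v1` strings are hypothetical stand-ins, and the sketch assumes GNU `sha256sum`.

```shell
#!/bin/sh
set -eu

# Simplified model: key = sha256(parent key + instruction text + input hash).
cache_key() {
  printf '%s\n%s\n%s\n' "$1" "$2" "$3" | sha256sum | cut -d ' ' -f 1
}

# Keys for a three-step build. $1 stands in for the lockfile hash,
# $2 for the hash of the rest of the build context.
build_keys() {
  k1=$(cache_key "base" "COPY package.json package-lock.json ./" "$1")
  k2=$(cache_key "$k1" "RUN npm ci --omit=dev" "")
  k3=$(cache_key "$k2" "COPY . ." "$2")
  printf '%s %s %s\n' "$k1" "$k2" "$k3"
}

# Pick the Nth space-separated key out of a build_keys result.
field() {
  printf '%s\n' "$1" | cut -d ' ' -f "$2"
}

# Compare two builds step by step and report hits and misses.
compare() {
  for step in 1 2 3; do
    if [ "$(field "$1" "$step")" = "$(field "$2" "$step")" ]; then
      echo "  step $step: cache hit"
    else
      echo "  step $step: cache miss"
    fi
  done
}

first=$(build_keys "lock-v1" "src-v1")
source_changed=$(build_keys "lock-v1" "src-v2")
lock_changed=$(build_keys "lock-v2" "src-v1")

echo "source edited:"
compare "$first" "$source_changed"
echo "lockfile edited:"
compare "$first" "$lock_changed"
```

In this model a source edit misses only at step 3, while a lockfile edit misses at every step, because each key folds in its parent's key.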

The golden pattern

Order from least-changing to most-changing:

  1. OS packages and system dependencies
  2. Language dependency manifests (package-lock.json, pom.xml, requirements.txt)
  3. Install dependencies (npm ci, mvn dependency:go-offline)
  4. Application source code last

dockerfile
# syntax=docker/dockerfile:1.6
FROM node:20-alpine
WORKDIR /app
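# 1. Copy only the manifests so the install layer survives source changes
COPY package.json package-lock.json ./
# Assumes the build needs no devDependencies; otherwise install them in a builder stage
RUN npm ci --omit=dev
# 2. Source changes invalidate only the layers from here down
COPY . .
RUN npm run build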

Cache invalidation triggers

| Trigger | Effect | Mitigation |
| --- | --- | --- |
| Any instruction text change | Miss at that layer + all following | Pin versions; avoid churn in early layers |
| COPY . . early in Dockerfile | Any file change busts entire cache | Copy manifests first; use .dockerignore |
| Build context includes node_modules | Context hash changes constantly | .dockerignore excludes heavy dirs |
| --no-cache flag | Full rebuild | Use only when debugging cache issues |
| Base image tag updated (:latest) | Miss from FROM onward | Pin to digest or specific patch tag |

⚠️ Pitfall

COPY . . as the first instruction after FROM is the most common cache killer. Every code edit rebuilds dependency installation. Always copy lockfiles first.

💡 Pro Tip

Use docker build --progress=plain to see cache hit/miss per step. In CI, use registry cache (--cache-from) so cold runners still benefit from previous pipeline runs.

Dockerfile instructions deep dive

Every instruction has runtime vs build-time semantics, layer implications, and production pitfalls. The table below is your reference—then we walk the critical ones in detail.

| Instruction | Purpose | Example | Pitfall |
| --- | --- | --- | --- |
| FROM | Base image; starts new stage | FROM eclipse-temurin:21-jre@sha256:… | :latest breaks reproducibility |
| RUN | Execute command; creates layer | RUN apt-get update && apt-get install -y curl | Each RUN = layer; chain with && |
| COPY | Copy from build context | COPY --chown=app:app target/app.jar . | Large context slows hash computation |
| ADD | Copy + tar auto-extract + URL fetch | ADD https://…/file.tar.gz /tmp/ | Prefer COPY; ADD surprises in review |
| WORKDIR | Set working directory | WORKDIR /app | Use absolute paths only |
| ENV | Environment variable for later build steps and runtime | ENV NODE_ENV=production | Visible in docker inspect |
| ARG | Build-time variable only | ARG MAVEN_VERSION=3.9 | Visible in docker history; never use for secrets |
| EXPOSE | Document intended port | EXPOSE 8080 | Does NOT publish port to host |
| ENTRYPOINT | Main executable (PID 1 target) | ENTRYPOINT ["java","-jar","app.jar"] | Shell form makes sh PID 1 |
| CMD | Default command, or default args to ENTRYPOINT | CMD ["--spring.profiles.active=prod"] | Overridden by docker run … cmd |
| USER | Run as non-root user | USER 1001 | Set before final CMD/ENTRYPOINT |
| HEALTHCHECK | Container health probe | HEALTHCHECK CMD curl -f http://localhost/actuator/health | Missing in prod = blind orchestration |
| LABEL | Image metadata | LABEL org.opencontainers.image.version="1.2.0" | Use OCI label schema for tooling |
| ONBUILD | Trigger for child images | ONBUILD COPY . /app | Surprising inheritance; rare today |
| SHELL | Override default shell for shell-form instructions | SHELL ["/bin/bash","-c"] | Affects shell-form RUN, CMD, and ENTRYPOINT only; exec form ignores it |

FROM — base image and stages

Every Dockerfile begins with FROM (only comments, parser directives, and ARG may precede it). Special bases: scratch (empty, used for static Go binaries), distroless (minimal runtime, no shell). Always pin to digest in production: FROM debian:bookworm-slim@sha256:….

RUN — shell form vs exec form

Shell form: RUN npm install → runs as /bin/sh -c "npm install". Exec form: RUN ["npm", "install"] — no shell, no variable expansion. Chain commands with && in one RUN to avoid extra layers and ensure fail-fast.

dockerfile
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl ca-certificates \
 && rm -rf /var/lib/apt/lists/*

COPY vs ADD

COPY is explicit—files from build context only. ADD additionally auto-extracts local tar archives and can fetch URLs (without cache benefits of COPY). Docker official best practice: use COPY unless you specifically need tar extraction.

ARG vs ENV — build-time vs runtime

ARG exists only during docker build—not in running containers. ENV persists into runtime and appears in docker inspect. Neither is safe for secrets: ARG shows in history, ENV shows in inspect and child images.

ENTRYPOINT vs CMD — PID 1 semantics

| Form | Example | PID 1 process | SIGTERM behavior |
| --- | --- | --- | --- |
| Exec ENTRYPOINT | ENTRYPOINT ["java","-jar","app.jar"] | java | JVM receives SIGTERM → graceful shutdown |
| Shell CMD | CMD java -jar app.jar | /bin/sh | SIGTERM to sh; java may not exit cleanly → SIGKILL after 10s |
| ENTRYPOINT + CMD | ENTRYPOINT ["java","-jar"] + CMD ["app.jar"] | java | CMD args append; override at runtime |

⚠️ Pitfall

Shell form CMD/ENTRYPOINT wraps your app in /bin/sh -c. The shell becomes PID 1—it does not forward signals to children. Spring Boot graceful shutdown requires exec form or tini/--init.
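
When a container needs setup steps before the app starts, a common pattern is a small entrypoint script that ends with exec, so the app replaces the shell instead of running as its child. The sketch below shows that property outside Docker. The script name `entrypoint.sh` is an assumption, and the "app" is a stand-in shell that only prints its own PID.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)

# Hypothetical entrypoint wrapper: setup would go before the last line.
# exec replaces the wrapper shell with the command, so the PID is unchanged.
cat > "$workdir/entrypoint.sh" <<'EOF'
#!/bin/sh
set -e
echo "wrapper pid: $$"
exec "$@"
EOF

# Stand-in for the application: a shell that reports its own PID.
output=$(sh "$workdir/entrypoint.sh" sh -c 'echo "app pid: $$"')
rm -rf "$workdir"

echo "$output"

wrapper_pid=$(printf '%s\n' "$output" | sed -n 's/^wrapper pid: //p')
app_pid=$(printf '%s\n' "$output" | sed -n 's/^app pid: //p')

if [ "$wrapper_pid" = "$app_pid" ]; then
  echo "same PID: the app took over the wrapper's process"
fi
```

Inside a container this would typically be wired up as ENTRYPOINT ["/entrypoint.sh"] with the real command in CMD, so the process that inherits PID 1 is the app and SIGTERM reaches it directly.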

USER — non-root by default in production

Set USER before the final CMD/ENTRYPOINT. Prefer numeric UID (USER 1001) over username to avoid base-image-specific user tables. Use COPY --chown=1001:1001 during build—don't chown at runtime.

HEALTHCHECK

Defines how Docker and Compose determine whether the container is healthy. Kubernetes ignores HEALTHCHECK, so translate it into liveness/readiness probes there. Parameters: --interval, --timeout, --retries, --start-period. Unhealthy status appears in docker ps STATUS column.
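
Scripts can read the same status that docker ps shows. The helper below is a hedged sketch rather than an official tool: `wait_healthy` is a hypothetical name, the one-second poll and default of 30 tries are arbitrary, and it relies on the `.State.Health.Status` field that docker inspect reports for containers with a HEALTHCHECK.

```shell
#!/bin/sh

# wait_healthy CONTAINER [MAX_TRIES]
# Polls the health status Docker records for a container.
# Returns 0 once the status is "healthy", 1 on "unhealthy" or timeout.
wait_healthy() {
  container=$1
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    # Containers without a HEALTHCHECK have no Health field; treat that as unknown.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null) || status=unknown
    case $status in
      healthy)
        echo "$container is healthy"
        return 0
        ;;
      unhealthy)
        echo "$container is unhealthy" >&2
        return 1
        ;;
    esac
    tries=$((tries - 1))
    sleep 1
  done
  echo "timed out waiting for $container" >&2
  return 1
}

# Example call (the container name "web" is a placeholder):
#   wait_healthy web 60 && echo "safe to run smoke tests"
```

While the container is still inside its --start-period the status reads "starting", so the loop keeps polling until it becomes healthy or unhealthy.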

🔒 Security

Secrets in layers: ARG API_KEY=xxx and ENV API_KEY=xxx persist in image history forever. Use BuildKit --mount=type=secret at build time and runtime secret injection (Vault, K8s Secrets) for credentials.

Layer explorer

Each instruction in the Dockerfile below adds to the layer stack, like git commits building on each other. Notice how dependency layers cache independently from source code.

dockerfile
# Node.js production build (cache-optimized)
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY . .
RUN npm run build
USER node
CMD ["npm", "start"]

Multi-stage builds

Problem: build tools (Maven, gcc, npm devDependencies) bloat production images and expand attack surface. Solution: compile in a builder stage, copy only artifacts into a minimal runtime stage.

Named stages and COPY --from

Each FROM begins a new stage. Name stages with AS builder. COPY --from=builder pulls files from a previous stage—not from the build context. Only the final stage becomes the tagged image (unless you --target a specific stage).

Java — Maven build → JRE runtime

dockerfile
# syntax=docker/dockerfile:1.6
FROM maven:3.9-eclipse-temurin-21 AS builder
WORKDIR /build
COPY pom.xml .
RUN mvn -B dependency:go-offline
COPY src ./src
RUN mvn -B package -DskipTests

FROM eclipse-temurin:21-jre-alpine
WORKDIR /app
COPY --from=builder /build/target/*.jar app.jar
USER 1001
ENTRYPOINT ["java", "-jar", "app.jar"]

Node.js — build → alpine runtime

dockerfile
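# Builder and runtime share the Alpine (musl) base so native addons stay compatible
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
# Build with devDependencies, then drop them before node_modules is copied out
RUN npm run build && npm prune --omit=dev

FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
USER node
CMD ["node", "dist/server.js"]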

Go — static binary → scratch

dockerfile
FROM golang:1.22-alpine AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /app .

FROM scratch
COPY --from=builder /app /app
USER 65532:65532
ENTRYPOINT ["/app"]

Test stage — fail build if tests fail

dockerfile
# Continues from the Maven builder stage defined in the Java example
FROM builder AS test
RUN mvn -B test

FROM eclipse-temurin:21-jre-alpine AS runtime
COPY --from=builder /build/target/*.jar /app.jar

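CI runs docker build --target test to execute tests in the build graph without shipping the test stage. BuildKit only builds stages the target depends on, so building the runtime stage alone does not run the tests.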
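
A CI job can chain the two builds so the runtime image is only produced after the test stage succeeds. This is a minimal sketch: `ci_build` is a hypothetical function name, the stage name test comes from the Dockerfile above, and the image name is whatever your pipeline passes in.

```shell
#!/bin/sh

# ci_build IMAGE [CONTEXT]
# Builds the test stage first, then the final stage if the tests passed.
ci_build() {
  image=$1
  context=${2:-.}

  # A failing test command in the test stage makes this build exit non-zero.
  docker build --target test "$context" || return 1

  # Tests passed: build the final (runtime) stage and tag it.
  docker build -t "$image" "$context"
}

# Example call (image name is a placeholder):
#   ci_build myregistry/myapp:1.0 .
```

The second build reuses the cached builder stage from the first, so running both costs little more than running one.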

💡 Pro Tip

Spring Boot 2.3+ supports layered JARs: extract dependencies, spring-boot-loader, and application into separate COPY layers so application code changes don't invalidate the dependency layers. See Production Dockerfile Patterns.

Base image selection

Your base image is the floor for size, CVE count, and compatibility. The wrong base costs weeks of musl/glibc debugging or bloated scans. Choose deliberately per workload—not reflexively "Alpine because small."

| Base | Approx size | Best for | Watch out |
| --- | --- | --- | --- |
| ubuntu:22.04 / debian:bookworm | ~78 MB / ~117 MB | General purpose, apt packages, glibc | Large; many packages = more CVEs |
| debian:bookworm-slim | ~74 MB | Debian compatibility, fewer packages | Still glibc; needs apt hygiene in RUN |
| alpine:3.19 | ~7 MB | Static binaries, Go, Node without native addons | musl ≠ glibc; Java/native libs may break |
| gcr.io/distroless/* | ~20–50 MB | Production Java/Node/Go: no shell, minimal CVEs | No shell for debugging; use debug variants |
| eclipse-temurin / amazoncorretto | Varies by tag | JVM apps with vendor support | Use -jre not -jdk in runtime stage |
| Red Hat UBI | ~80 MB | Enterprise/OpenShift, RHEL-compatible | Subscription not required to run; good governance |
| scratch | 0 B | Static Go/Rust binaries only | No libc, no CA certs unless you COPY them |

Alpine and musl gotchas

Alpine uses musl libc instead of glibc. Many prebuilt native binaries (Oracle JDK, some Python wheels, Node native modules) assume glibc. Java on Alpine needs a musl-aware build or an extra glibc compatibility layer, often negating size wins. For JVM production, prefer distroless/java21 or eclipse-temurin:21-jre-alpine with tested native deps.

Distroless — Google's minimal production base

Distroless images contain only your app and runtime dependencies—no shell, no package manager, no curl. Attack surface shrinks dramatically. Debug with :debug tags (include busybox shell) during development only.

⚖️ Trade-off

Image size vs compatibility: Alpine saves MB but costs engineering time when native deps break. Distroless saves security review time but complicates ad-hoc docker exec debugging. Full Debian/Ubuntu maximizes compatibility at scan and transfer cost.

📦 Real World

Google runs distroless for most internal services. Netflix maintains curated base images with pre-approved packages and automated CVE patching. Platform teams often publish one blessed base per language—developers inherit governance by default.

Image size optimization

Smaller images pull faster, scan faster, and deploy faster. Measure first—optimize what actually matters on the critical path.

Measure before optimizing

bash
# List images with their sizes
docker image ls --format 'table {{.Repository}}\t{{.Tag}}\t{{.Size}}'

# Per-layer breakdown
docker history --human --no-trunc myapp:latest

# Deep analysis (install dive: https://github.com/wagoodman/dive)
dive myapp:latest

Techniques ranked by impact

| Technique | Typical savings | Example |
| --- | --- | --- |
| Multi-stage builds | 50–90% (removes build tools) | Maven builder → JRE runtime stage |
| Slim/distroless base | 30–70% vs full OS | gcr.io/distroless/java21-debian12 |
| Combine RUN commands | Fewer layers; remove apt cache | rm -rf /var/lib/apt/lists/* in same RUN |
| .dockerignore | Smaller context + faster cache hash | Exclude .git, node_modules, target/ |
| COPY --chown | Avoids extra chown layer | COPY --chown=1001:1001 app.jar . |
| docker build --squash (legacy builder) | Merges layers (loses cache granularity) | Experimental; not supported by BuildKit; rarely needed with multi-stage |

.dockerignore essentials

text
.git
.gitignore
node_modules
npm-debug.log
target/
*.md
.env
.env.*
coverage/
.idea/
.vscode/
**/*_test.go
Dockerfile*

⚠️ Pitfall

apt-get without cache cleanup: RUN apt-get install -y curl leaves /var/lib/apt/lists/* in the layer forever—often 50+ MB. Always delete package manager caches in the same RUN layer.

💡 Pro Tip

dive shows layer efficiency score and wasted space. Aim for >95% efficiency in production images. If one RUN layer adds 200 MB, that's your optimization target—not shaving 1 MB off a label.

BuildKit

BuildKit is Docker's modern build engine (default Docker 23+). It parallelizes independent stages, mounts caches and secrets without polluting layers, and exports build artifacts flexibly.

Enable BuildKit

bash
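# Per-build (only needed before Docker 23, where BuildKit is not yet the default)
DOCKER_BUILDKIT=1 docker build -t myapp .

# Permanent: add to /etc/docker/daemon.json, then restart the daemon
# { "features": { "buildkit": true } }

# Dockerfile syntax directive (unlocks latest features); must be the first line of the Dockerfile
# syntax=docker/dockerfile:1.6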

Cache mounts — persist across builds

Unlike regular layers, cache mounts are not committed to the image. Maven, npm, and Go module caches survive between builds without bloating the final image.

dockerfile
# syntax=docker/dockerfile:1.6
FROM maven:3.9-eclipse-temurin-21 AS builder
WORKDIR /build
COPY pom.xml .
RUN --mount=type=cache,target=/root/.m2 \
    mvn -B dependency:go-offline
COPY src ./src
RUN --mount=type=cache,target=/root/.m2 \
    mvn -B package -DskipTests

Secret mounts — never in layers

bash
# Build with secret (not stored in image)
docker build --secret id=npmrc,src=$HOME/.npmrc -t myapp .

# Dockerfile
# RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci

SSH mounts — private git clones

RUN --mount=type=ssh forwards your SSH agent into the build—clone private repos without embedding keys in the image. Build with docker build --ssh default.
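
As a rough sketch of how the pieces fit together, the script below writes a small Dockerfile that clones over SSH and prints the matching build command. The repository URL `example-org/private-repo`, the image name `myapp`, and the Alpine base are placeholders chosen for illustration. The script only writes the file; it does not run a build.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)

# Write a Dockerfile that clones a private repo through the forwarded agent.
cat > "$workdir/Dockerfile" <<'EOF'
# syntax=docker/dockerfile:1.6
FROM alpine:3.19
RUN apk add --no-cache git openssh-client
# Record GitHub's host key so the clone does not stop at a host-key prompt
RUN mkdir -p -m 0700 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
# The forwarded agent socket is available only during this RUN step
RUN --mount=type=ssh git clone git@github.com:example-org/private-repo.git /src
EOF

build_cmd="docker build --ssh default -t myapp $workdir"

echo "Dockerfile written to $workdir/Dockerfile"
echo "With a key loaded in ssh-agent, build with:"
echo "  $build_cmd"
```

Only the RUN step that carries --mount=type=ssh can reach the agent, so the key itself is never written into a layer.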

Registry cache — fast CI

bash
docker buildx build \
  --cache-from type=registry,ref=myregistry/myapp:buildcache \
  --cache-to type=registry,ref=myregistry/myapp:buildcache,mode=max \
  --push -t myregistry/myapp:latest .

BuildKit features reference

| Feature | Purpose | Example |
| --- | --- | --- |
| Parallel stages | Independent stages build concurrently | Frontend + backend multi-stage in one Dockerfile |
| --mount=type=cache | Persistent package manager caches | target=/root/.m2, /root/.npm |
| --mount=type=secret | Build-time credentials | NPM token, pip index password |
| --mount=type=ssh | SSH agent forwarding | Private GitHub dependencies |
| --output type=local | Export files without image | dest=./dist for static sites |
| Inline cache | Embed cache metadata in pushed image | --cache-to type=inline |
| Provenance/SBOM | Supply chain attestations | --attest type=sbom |

mermaid
flowchart LR
  CTX[Build context]
  BK[BuildKit solver]
  S1[Stage: builder]
  S2[Stage: runtime]
  CACHE[(Cache mounts\n.m2 / npm)]
  SEC[Secret mounts]
  REG[(Registry cache)]
  IMG[Final image]
  CTX --> BK
  BK --> S1
  BK --> S2
  S1 --> CACHE
  S1 --> SEC
  BK --> REG
  S2 --> IMG

🔬 Under the Hood

BuildKit uses a DAG solver—only rebuilds nodes whose inputs changed. Legacy builder was linear instruction-by-instruction. That's why independent stages and cache mounts dramatically outperform old docker build on large projects.

🎯 Interview Tip

"How do you pass secrets to a Docker build?" — Wrong: ARG/ENV. Right: BuildKit --mount=type=secret (build) + runtime secret managers (deploy). Mention secrets never appear in docker history.