Docker in Production

When to use Docker standalone vs Kubernetes

Docker Engine plus Compose is a legitimate production stack—not a stepping stone you must outgrow. The question is whether you need cluster APIs (scheduling, service discovery, rolling updates at scale) or whether a single well-run host (or a tiny Swarm) delivers enough reliability for your SLA.

Decision signals

Stay on Compose / single server — one or two services, predictable traffic, team < 5, no multi-AZ requirement, deploy cadence weekly or slower
Consider Docker Swarm — 3–10 nodes, need HA without hiring a platform team, rolling updates across a small fleet
Move to Kubernetes — dozens of services, autoscaling, CRDs/operators, multi-tenant namespaces, GitOps as standard, or compliance mandates centralized policy

Compose / single server vs Swarm vs Kubernetes

Dimension	Compose / single server	Docker Swarm	Kubernetes
Operational complexity	Low — one docker compose up	Medium — managers + workers, overlay network	High — control plane, CNI, ingress, upgrades
HA / multi-node	Manual (second VPS + load balancer)	Built-in replication + rolling updates	Native scheduling, PDBs, multi-AZ
Service discovery	Compose DNS on one host	Swarm VIP / DNSRR	kube-dns / CoreDNS, headless Services
Secrets / config	Env files, Compose file-based secrets (bind-mounted, not encrypted)	Swarm secrets (encrypted at rest)	Secrets, ConfigMaps, external operators
Autoscaling	Vertical resize or manual second instance	Limited (replica count only)	HPA, VPA, cluster autoscaler
Ecosystem	Traefik, nginx, Portainer	Declining third-party focus	CNCF landscape, cloud-managed control planes
Cost (small scale)	One $20–80/mo VPS often suffices	3+ nodes minimum for quorum	Managed K8s + node pool overhead
Same OCI images?	Yes	Yes	Yes — runtime changes, image does not

flowchart TD
  A["Need production containers?"]
  B["Single host / VPS\nCompose + reverse proxy"]
  C["3–10 nodes, small team\nDocker Swarm"]
  D["Platform team / multi-service\nKubernetes"]
  A --> Q1{"Multi-AZ or\nautoscaling required?"}
  Q1 -->|No| Q2{"More than one\nphysical host?"}
  Q1 -->|Yes| D
  Q2 -->|No| B
  Q2 -->|Yes| Q3{"Want K8s-level\necosystem?"}
  Q3 -->|No| C
  Q3 -->|Yes| D
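
The flowchart above can be read as a three-question decision function. The sketch below is one way to encode it in Python; the Requirements field names and the returned labels are illustrative assumptions, not part of any Docker or Kubernetes tooling.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Answers to the three flowchart questions (hypothetical field names)."""
    needs_multi_az_or_autoscaling: bool
    more_than_one_host: bool
    wants_k8s_ecosystem: bool


def choose_platform(req: Requirements) -> str:
    """Ask the questions in the same order as the flowchart."""
    if req.needs_multi_az_or_autoscaling:
        return "Kubernetes"
    if not req.more_than_one_host:
        return "Compose + reverse proxy"
    if req.wants_k8s_ecosystem:
        return "Kubernetes"
    return "Docker Swarm"
```

For example, choose_platform(Requirements(False, True, False)) returns "Docker Swarm": several hosts, but no need for autoscaling or the wider Kubernetes ecosystem.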

⚖️ Trade-off

Compose is not "toy infra." Many SaaS products run profitably on a single hardened host with automated backups, external monitoring, and immutable deploys. Kubernetes shines when the coordination cost of many services exceeds the cost of running the platform, not when resume-driven architecture demands it.

🎯 Interview Tip

"When would you not use Kubernetes?" — Strong answer: bounded service count, team lacks K8s ops depth, SLA achievable with Compose + LB + health checks, or cost/latency of a control plane outweighs benefits. Mention you still use OCI images and the same CI pipeline.

Logging strategy

Containers are ephemeral; logs must not be. The 12-factor rule applies: treat logs as event streams—write to stdout/stderr, let the runtime capture them. Never bake log files into the image or rely on the container writable layer for retention.

The 12-factor logging contract

Application writes structured (JSON) or plain text lines to stdout/stderr
Docker logging driver captures the stream per container
Host agent or driver forwards to centralized store (Loki, CloudWatch, ELK, Datadog)
Retention, search, and alerting live outside the container lifecycle

Default: json-file driver

Unless configured otherwise, Docker uses the json-file driver—each log line becomes a JSON object in /var/lib/docker/containers/<id>/<id>-json.log. Convenient for docker logs, dangerous without rotation (disk fills → node down).

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "5",
    "labels": "service,env"
  }
}

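Set globally in /etc/docker/daemon.json or per-container with --log-opt max-size=10m --log-opt max-file=5. Daemon-level changes need a daemon restart and apply only to containers created afterwards; existing containers keep their old logging config. max-size rotates when a file exceeds the threshold; max-file caps the number of log files kept (roughly 50 MB total per container in the example above).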
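
The rotation settings bound disk use per container, so the host-wide budget scales with the number of containers. A minimal Python sketch of that arithmetic follows. The function names are hypothetical, and the unit base is a parameter because this sketch does not assume whether the driver reads "m" as 1000² or 1024² bytes; the difference is under 5% at megabyte scale.

```python
import re


def parse_size(value: str, base: int = 1000) -> int:
    """Convert a size such as '10m' to bytes; base (1000 or 1024) is an assumption."""
    match = re.fullmatch(r"(\d+)([kmg]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, suffix = match.groups()
    exponent = {"": 0, "k": 1, "m": 2, "g": 3}[suffix]
    return int(number) * base ** exponent


def log_budget_bytes(max_size: str, max_file: int,
                     containers: int = 1, base: int = 1000) -> int:
    """Worst-case json-file disk use: max-size x max-file x container count."""
    return parse_size(max_size, base) * max_file * containers
```

With the daemon.json values above, log_budget_bytes("10m", 5, containers=12) gives 600,000,000 bytes, a useful number to compare against free space on /var/lib/docker.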

Alternative logging drivers

Driver	Destination	When to use	Caveat
json-file	Local rotated files on host	Default; dev and small prod with log shipper sidecar/agent	Disk pressure if rotation omitted
journald	systemd journal	RHEL/Fedora hosts already centralized on journald	docker logs still works; journal quota must be set
fluentd	Fluentd / Fluent Bit forward	Direct push to aggregation pipeline	By default the container fails to start if Fluentd is unreachable; set fluentd-async and consider mode=non-blocking
awslogs	Amazon CloudWatch Logs	ECS/EKS-adjacent EC2, AWS-native ops	Requires IAM role on host; per-group costs
syslog	Remote syslog receiver	Legacy SIEM integration	No structured metadata unless app logs JSON

Per-service override in Compose

services:
  api:
    image: myorg/api:1.4.2
    logging:
      driver: json-file
      options:
        max-size: "20m"
        max-file: "3"
        tag: "{{.Name}}/{{.ID}}"

⚠️ Pitfall

Logging inside the container filesystem (/var/log/app.log) survives container re-creation (which happens on every deploy) only if you mount a volume, and you still need a shipper. Prefer stdout. If you must file-log, mount a volume and run Promtail/Fluent Bit as a sidecar or host agent with a volume scrape config.

💡 Pro Tip

Emit one JSON object per line with timestamp, level, trace_id, and msg. Plain text logs are fine for dev; production search/filtering at scale assumes structure. Correlate with container labels added at deploy time (service=api,env=prod).
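
As a rough illustration of the one-object-per-line convention, the Python sketch below builds a record with the four fields named above and writes it to stdout. The helper names format_event and log are hypothetical; a real service would more likely configure its logging library to do the same.

```python
import json
import sys
from datetime import datetime, timezone


def format_event(level: str, msg: str, trace_id: str, **fields) -> str:
    """Serialize one log event as a single JSON line (no embedded newlines)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "trace_id": trace_id,
        "msg": msg,
        **fields,  # extra context, e.g. service="api"
    }
    # json.dumps escapes newlines inside strings, so the output stays on one line.
    return json.dumps(record, separators=(",", ":"))


def log(level: str, msg: str, trace_id: str, **fields) -> None:
    """Write the event to stdout so the Docker logging driver captures it."""
    sys.stdout.write(format_event(level, msg, trace_id, **fields) + "\n")
    sys.stdout.flush()
```

Calling log("info", "order created", "abc123", service="api") emits a single line that a shipper can parse without multiline handling.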

📦 Real World

Common pattern on a Compose VPS: json-file + rotation on the host, Promtail tailing /var/lib/docker/containers/*/*-json.log, shipping to Grafana Loki. No app changes, only daemon.json rotation and an agent unit file.

Health checks in production

A process running inside a container is not the same as a healthy service. Health checks tell the orchestrator when to route traffic, when to restart, and when to wait during startup. Docker's built-in HEALTHCHECK maps loosely to Kubernetes liveness; readiness is usually an external concern (load balancer or K8s probe).

HEALTHCHECK parameters

Parameter	Default	Meaning
--interval	30s	Time between probe attempts
--timeout	30s	Max time for a single probe to complete
--start-period	0s	Grace period after start; failures don't count toward retries
--retries	3	Consecutive failures before marking unhealthy
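
How the four parameters interact is easier to see as a small state machine. The Python sketch below is a simplified model, not Docker's implementation: time is expressed as a count of probes rather than seconds, and health_status and start_period_probes are hypothetical names. It follows the documented rules that failures inside the start period are ignored until the first success, and that --retries consecutive counted failures mark the container unhealthy.

```python
from typing import Iterable


def health_status(results: Iterable[bool], retries: int = 3,
                  start_period_probes: int = 0) -> str:
    """Return 'starting', 'healthy', or 'unhealthy' after a sequence of probes.

    results: probe outcomes in order (True means exit code 0).
    start_period_probes: how many initial probes fall inside --start-period.
    """
    status = "starting"
    failing_streak = 0
    started = False  # becomes True on the first successful probe
    for index, ok in enumerate(results):
        if ok:
            status, failing_streak, started = "healthy", 0, True
            continue
        if index < start_period_probes and not started:
            continue  # grace period: this failure is not counted
        failing_streak += 1
        if failing_streak >= retries:
            status = "unhealthy"
    return status
```

For instance, health_status([False, False, True], start_period_probes=2) is "healthy", while three counted failures in a row with the default retries=3 yield "unhealthy".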

Probe types

Type	Example	Best for
HTTP	CMD curl -f http://localhost:8080/health || exit 1	REST APIs with a dedicated health endpoint
TCP	CMD nc -z localhost 5432 || exit 1	Databases, brokers: port open but no HTTP
Exec	CMD pg_isready -U app || exit 1	Vendor-provided CLI health tools
None	HEALTHCHECK NONE	Disable inherited check from base image

Dockerfile and Compose examples

# Lightweight HTTP check; curl must exist in the final runtime image (a builder-stage install is not enough)
HEALTHCHECK --interval=15s --timeout=3s --start-period=30s --retries=3 \
  CMD curl -fsS http://127.0.0.1:8080/actuator/health || exit 1

services:
  api:
    image: myorg/api:1.4.2
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/health"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 40s

Liveness vs readiness

Probe	Question it answers	Failure action	Docker / Compose	Kubernetes
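Liveness	Is the process deadlocked or stuck?	Restart the container	HEALTHCHECK marks the container unhealthy; plain Docker and Compose do not restart it (restart policies act on exit only), Swarm replaces the task	livenessProbe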
Readiness	Can this instance accept traffic?	Remove from load balancer / Service endpoints	Compose depends_on: condition: service_healthy; LB upstream checks	readinessProbe
Startup	Has slow boot finished?	Suppress liveness kills during boot	--start-period	startupProbe

$ docker inspect --format '{{json .State.Health}}' api
{"Status":"healthy","FailingStreak":0,"Log":[{"ExitCode":0,...}]}

$ docker ps --filter health=unhealthy
CONTAINER ID   IMAGE             STATUS
a1b2c3d4e5f6   myorg/api:1.4.2   Up 2 minutes (unhealthy)

⚠️ Pitfall

Liveness checks that hit dependencies (DB, cache) cause restart loops when a downstream blips. Liveness should verify the process itself. Readiness may check dependencies—failure removes traffic, not the pod/container.
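
One way to keep the two concerns apart is to expose separate endpoints. The Python sketch below uses only the standard library; the /livez and /readyz paths, probe_status, and check_dependencies are assumptions chosen for illustration, and the dependency check is a placeholder that always succeeds.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def probe_status(path: str, dependencies_ok: bool) -> int:
    """Map a probe path to an HTTP status code."""
    if path == "/livez":
        return 200  # process is up and able to answer; ignores dependencies
    if path == "/readyz":
        return 200 if dependencies_ok else 503  # stop traffic, do not restart
    return 404


def check_dependencies() -> bool:
    """Placeholder: replace with a cheap DB or cache ping."""
    return True


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        self.send_response(probe_status(self.path, check_dependencies()))
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Blocking server loop; call this from the service entry point."""
    HTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

A HEALTHCHECK or livenessProbe would then target /livez, while the load balancer or readinessProbe targets /readyz.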

🔬 Under the Hood

Docker runs health check commands in the container namespace but outside your main PID 1 process. A passing check only means the probe command exited 0—it does not guarantee your app thread pool isn't exhausted. Combine health endpoints with saturation metrics (queue depth, thread pool active count).

Zero-downtime updates

Replacing a container stops the old process—unless something else holds connections open and shifts traffic only after the new instance is ready. Strategies range from Compose recreate with health gates to blue-green pairs behind nginx or Traefik weight labels.

Compose: pull and up (rolling recreate)

docker compose pull && docker compose up -d recreates containers whose image digest changed. With depends_on: condition: service_healthy, dependent services wait for upstream health. Default behavior briefly drops connections during container swap—acceptable for internal tools, risky for public APIs without a reverse proxy buffer.

# Typical immutable deploy on a single host
docker compose pull
docker compose up -d --remove-orphans --wait

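# --wait: blocks until services are running and, where a healthcheck is defined, healthy (implies -d).
# Run `docker compose up --help` to confirm your Compose v2 release supports it.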

Blue-green with nginx

Run two identical stacks (blue and green) on different host ports or Docker networks. The nginx upstream points at the active color; a deploy updates the idle color, health-checks it, then switches the upstream to it and reloads nginx (nginx -s reload is graceful: old workers finish in-flight requests before exiting).

sequenceDiagram
  participant LB as nginx
  participant Blue as api-blue :8081
  participant Green as api-green :8082
  LB->>Blue: 100% traffic
  Note over Green: Deploy new version
  Green->>Green: healthcheck passes
  LB->>Green: reload upstream
  LB->>Green: 100% traffic
  Note over Blue: drain then stop

upstream api_active {
  server api-blue:8080;
  # server api-green:8080;  # uncomment to switch
}

server {
  location / {
    proxy_pass http://api_active;
    proxy_next_upstream error timeout http_502;
  }
}
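
Editing comments by hand is error-prone, so the switch is often scripted. The Python sketch below renders the upstream block for a given color; render_upstream and other_color are hypothetical helpers, and it assumes the service names api-blue and api-green from the config above.

```python
COLORS = ("blue", "green")


def other_color(active: str) -> str:
    """Return the idle color, i.e. the one a deploy should update next."""
    if active not in COLORS:
        raise ValueError(f"unknown color: {active!r}")
    return "green" if active == "blue" else "blue"


def render_upstream(active: str, port: int = 8080) -> str:
    """Render the nginx upstream block pointing at the active color."""
    if active not in COLORS:
        raise ValueError(f"unknown color: {active!r}")
    return (
        "upstream api_active {\n"
        f"  server api-{active}:{port};\n"
        "}\n"
    )
```

A deploy script could write render_upstream("green") to a file that nginx.conf includes, run nginx -t to validate it, and only then run nginx -s reload.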

Traefik: labels for traffic shifting

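Traefik discovers backends via Docker labels. Run blue and green services with different router priorities, or shift traffic gradually with weighted round-robin services. Weighted services are part of open-source Traefik v2+, but they are declared through the file provider (or Kubernetes CRD) rather than through Docker labels.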

services:
  api-green:
    image: myorg/api:1.5.0
    labels:
      - traefik.enable=true
      - traefik.http.routers.api.rule=Host(`api.example.com`)
      - traefik.http.routers.api.priority=100
      - traefik.http.services.api.loadbalancer.server.port=8080
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/health"]
      start_period: 30s

  api-blue:
    image: myorg/api:1.4.2
    labels:
      - traefik.enable=true
      - traefik.http.routers.api-blue.rule=Host(`api.example.com`)
      - traefik.http.routers.api-blue.priority=50
      - traefik.http.services.api-blue.loadbalancer.server.port=8080

Strategy	Downtime risk	Complexity	Rollback
Compose recreate	Seconds during swap	Low	Re-pull previous tag
Blue-green + nginx	Near zero with health gate	Medium	Flip upstream back
Traefik priority / weights	Near zero	Medium	Lower green priority, stop green
Kubernetes RollingUpdate	Near zero at scale	High	kubectl rollout undo

💡 Pro Tip

Always verify readiness before switching traffic. Automate the flip in CI only after curl -f https://api-green.internal/health succeeds, or use Traefik / nginx health-aware upstreams so unhealthy backends never receive new connections.
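
A health gate in a deploy script can be as small as a retry loop. The Python sketch below is one possible shape, using only the standard library; wait_until_healthy, http_probe, and the choice of two consecutive successes are assumptions, not a prescribed procedure.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout_s: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, or 4xx/5xx response


def wait_until_healthy(probe: Callable[[], bool], attempts: int = 10,
                       delay_s: float = 3.0, required_successes: int = 2,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the probe passes required_successes times in a row."""
    streak = 0
    for attempt in range(attempts):
        streak = streak + 1 if probe() else 0
        if streak >= required_successes:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

The flip would run only if wait_until_healthy(lambda: http_probe("https://api-green.internal/health")) returns True; otherwise the script exits non-zero and blue keeps serving.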

Resource governance

Running containers without limits on a shared host is a capacity incident waiting to happen. Set memory and CPU limits at deploy time, size JVM heaps to cgroup boundaries, and export metrics with docker stats, cAdvisor, and Prometheus before OOM kills become your primary alerting signal.

Memory and CPU limits

Flag / Compose key	Effect	Production guidance
--memory / mem_limit	Hard RAM cap; OOM kill in cgroup	Always set; leave 10–15% headroom below host RAM for kernel/cache
--memory-swap	RAM + swap combined cap	Set equal to memory to disable swap for latency-sensitive apps
--cpus / cpus	CFS quota (e.g. 1.5 = 150% of one core)	Prefer over --cpu-shares for predictable ceilings
--pids-limit	Max processes in container	Guard against fork bombs and thread leaks
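
The --cpus and --memory-swap rows are simple arithmetic over cgroup values. The Python sketch below shows the conversion, assuming the default CFS period of 100,000 microseconds; the function names are hypothetical.

```python
def cfs_quota_us(cpus: float, period_us: int = 100_000) -> int:
    """CPU time (microseconds) the container may use per scheduling period."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return int(round(cpus * period_us))


def swap_allowance_bytes(memory: int, memory_swap: int) -> int:
    """Swap available to the container: --memory-swap minus --memory."""
    if memory_swap < memory:
        raise ValueError("memory-swap cannot be smaller than memory")
    return memory_swap - memory
```

So --cpus=1.5 corresponds to a quota of 150,000 µs per 100,000 µs period, and setting --memory-swap equal to --memory leaves a swap allowance of zero.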

services:
  api:
    image: myorg/api:1.4.2
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 2G
        reservations:
          cpus: "0.5"
          memory: 512M

JVM sizing example

A container with --memory=2g must not run a JVM that assumes host RAM. Use container-aware flags so the heap respects cgroup limits:

ENV JAVA_TOOL_OPTIONS="-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=50.0"
# 2g limit → ~1.5g heap, ~500m for metaspace, threads, native buffers

⚠️ Pitfall

-Xmx2g with --memory=2g ignores non-heap memory—container OOMs while the JVM believes it is fine. Use MaxRAMPercentage or set -Xmx to ~70–75% of the cgroup limit.
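
The sizing rule can be checked with a few lines of arithmetic. The Python sketch below computes the heap implied by MaxRAMPercentage and flags an -Xmx that leaves too little non-heap room; the helper names and the 75% default ceiling are illustrative assumptions taken from the guidance above.

```python
GIB = 1024 ** 3


def jvm_heap_bytes(limit_bytes: int, max_ram_percentage: float = 75.0) -> int:
    """Maximum heap the JVM derives from the cgroup limit and MaxRAMPercentage."""
    return int(limit_bytes * max_ram_percentage / 100)


def non_heap_headroom_bytes(limit_bytes: int, max_ram_percentage: float = 75.0) -> int:
    """Memory left for metaspace, thread stacks, and native buffers."""
    return limit_bytes - jvm_heap_bytes(limit_bytes, max_ram_percentage)


def xmx_is_risky(xmx_bytes: int, limit_bytes: int, ceiling: float = 0.75) -> bool:
    """True when a fixed -Xmx exceeds the chosen share of the cgroup limit."""
    return xmx_bytes > limit_bytes * ceiling
```

For a 2 GiB limit this gives a 1.5 GiB heap with 512 MiB of headroom, and xmx_is_risky(2 * GIB, 2 * GIB) is True, which is the pitfall described above.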

Observability stack

Tool	Scope	Key metrics
docker stats	Live CLI per container	CPU %, MEM USAGE / LIMIT, NET I/O, BLOCK I/O
cAdvisor	Host + container metrics HTTP API	Same as stats, historical, labels; scraped by Prometheus
Prometheus	Time-series store + alerting	container_memory_usage_bytes, CPU throttling, OOM events
node_exporter	Host-level hardware	Disk, pressure stall, overall memory—context for container limits

# Quick capacity check
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

# cAdvisor (publish metrics on :8080)
docker run -d --name=cadvisor --privileged \
  -p 8080:8080 \
  -v /:/rootfs:ro -v /var/run:/var/run:ro \
  -v /sys:/sys:ro -v /var/lib/docker/:/var/lib/docker:ro \
  gcr.io/cadvisor/cadvisor:latest   # pin a specific release tag in production

Example Prometheus alert: fire when container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.9 for 5m—action before OOM kill.
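
The alert condition has two edge cases worth spelling out: a container without a limit, and the "for 5m" requirement that every sample in the window breaches the threshold. The Python sketch below models both. It assumes a missing limit is reported as 0, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def memory_ratio(usage_bytes: int, limit_bytes: int) -> Optional[float]:
    """usage / limit, or None when no limit is set (assumed to be reported as 0)."""
    if limit_bytes <= 0:
        return None
    return usage_bytes / limit_bytes


def should_alert(window_usage: Sequence[int], limit_bytes: int,
                 threshold: float = 0.9) -> bool:
    """Fire only if every sample in the window is above the threshold."""
    if not window_usage:
        return False
    ratios = [memory_ratio(usage, limit_bytes) for usage in window_usage]
    return all(ratio is not None and ratio > threshold for ratio in ratios)
```

A single spike above 90% therefore does not fire; five consecutive one-minute samples above it do.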

📦 Real World

On Compose hosts, a lightweight pattern is cAdvisor + Prometheus + Grafana as a monitoring stack service in the same compose file, with recording rules for CPU throttling (container_cpu_cfs_throttled_seconds_total)—high throttle rate means your --cpus ceiling is too low.
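
The throttle rate behind such a recording rule is the change in the counter divided by elapsed time. The Python sketch below computes it from two scrapes of container_cpu_cfs_throttled_seconds_total. The function name is hypothetical, and the reset handling is a simplification of what a Prometheus rate() does.

```python
def throttled_fraction(prev_throttled_s: float, curr_throttled_s: float,
                       elapsed_s: float) -> float:
    """Seconds spent throttled per second of wall time between two scrapes."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    delta = curr_throttled_s - prev_throttled_s
    if delta < 0:
        # Counter went backwards: the container restarted, so count from zero.
        delta = curr_throttled_s
    return delta / elapsed_s
```

A result of 0.25 over a 60-second scrape interval means the container was held back for 15 of those seconds, a sign that the --cpus ceiling deserves a second look.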

Docker on RHEL / OpenShift

Red Hat Enterprise Linux 8+ and OpenShift do not ship Docker Engine as the supported container runtime. Podman (daemonless, rootless-first), Buildah (image builds), and CRI-O (Kubernetes CRI) form the supported stack—while your OCI images and Dockerfiles remain portable.

Tooling map

Tool	Replaces	Role
Podman	docker run, docker ps	Daemonless container runtime; fork/exec per container
podman-docker	Docker CLI	Compatibility package that installs a docker command which forwards to podman
Buildah	docker build	OCI image build without a daemon; used under the hood by Podman
Skopeo	docker pull/push (inspect)	Copy images between registries without running a daemon
CRI-O	containerd + dockershim on K8s nodes	Lightweight CRI implementation for Kubernetes/OpenShift nodes; delegates to an OCI runtime such as runc or crun

Podman vs Docker Engine

Aspect	Docker Engine	Podman (RHEL default)
Daemon	Central rootful dockerd	None—containers are child processes of your user session
Rootless	Opt-in	Default and recommended
socket exposure	/var/run/docker.sock = root equivalent	No central socket attack surface
Compose	docker compose	podman compose or podman-compose
systemd integration	Manual unit files	podman generate systemd for user units (deprecated since Podman 4.4 in favor of Quadlet)

# RHEL: install compatibility shim
sudo dnf install podman podman-docker buildah skopeo

# Familiar workflow — docker is podman
docker run -d --name api -p 8080:8080 registry.example.com/api:1.4.2
docker ps

# Rootless podman (default for normal users)
podman run -d quay.io/podman/hello

# Generate systemd user service for a container
podman generate systemd --new --name api > ~/.config/systemd/user/api.service
systemctl --user enable --now api.service

OpenShift specifics

CRI-O runs workloads on worker nodes; developers typically never SSH to nodes
ImageStreams — OpenShift-native image abstraction: tracks tags, triggers redeploys on new push, mirrors external registries
Restricted SCCs — Security Context Constraints replace ad-hoc --cap-add; containers run as arbitrary non-root UIDs
BuildConfigs — cluster-native builds (Dockerfile, S2I) producing ImageStream tags—similar goals to CI pipelines elsewhere
Routes — HAProxy-based ingress with TLS edge termination; analogous to Ingress + cert-manager on vanilla K8s

OpenShift concept	Rough Docker / K8s equivalent
ImageStream	Internal registry + tag tracking + webhook trigger
ImageStreamTag	myapp:1.4.2 pointing at a digest
DeploymentConfig	Deployment + rollout trigger on IS change (legacy; prefer Deployment)
Route	Ingress + LoadBalancer hostname
SCC	PodSecurity + capabilities policy

🔒 Security

On RHEL/OpenShift, rootless Podman maps container root to an unprivileged host UID range (/etc/subuid). Bind mounts must respect mapped ownership—files owned by host root may be unreadable inside the container unless UID namespaces align.
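
The mapping is plain arithmetic, which helps when debugging bind-mount permissions. The Python sketch below assumes the usual rootless layout: container UID 0 maps to the invoking user's own UID, and container UIDs from 1 upward map into that user's /etc/subuid range. The function names and the example entry are hypothetical.

```python
from typing import Tuple


def parse_subuid(line: str) -> Tuple[str, int, int]:
    """Parse one /etc/subuid entry of the form 'user:start:count'."""
    name, start, count = line.strip().split(":")
    return name, int(start), int(count)


def host_uid(container_uid: int, user_uid: int,
             subuid_start: int, subuid_count: int) -> int:
    """Host UID that a UID inside a rootless container corresponds to."""
    if container_uid == 0:
        return user_uid  # container root is the unprivileged invoking user
    if 1 <= container_uid <= subuid_count:
        return subuid_start + container_uid - 1
    raise ValueError(f"container UID {container_uid} is outside the mapped range")
```

With an entry such as app:100000:65536 and a user UID of 1000, a process running as UID 1000 inside the container appears as host UID 100999, so a bind-mounted directory must be accessible to that UID.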

⚖️ Trade-off

Portable images, environment-specific runtime. Keep Dockerfiles CI-standard; let RHEL use Podman/Buildah and OpenShift use CRI-O + ImageStreams. Fighting for dockerd on RHEL means unsupported packages and SELinux friction; Podman is first-class in RHEL documentation and support contracts.