Docker in Production

Running containers in production is not the same as docker run on a laptop. You need a deliberate logging strategy, health probes that match orchestrator semantics, resource limits that prevent noisy-neighbor failures, and a deployment model that fits your scale—Compose on a VPS, Swarm for small clusters, or Kubernetes when platform APIs become the product.


When to use Docker standalone vs Kubernetes

Docker Engine plus Compose is a legitimate production stack—not a stepping stone you must outgrow. The question is whether you need cluster APIs (scheduling, service discovery, rolling updates at scale) or whether a single well-run host (or a tiny Swarm) delivers enough reliability for your SLA.

Decision signals

  • Stay on Compose / single server — one or two services, predictable traffic, team < 5, no multi-AZ requirement, deploy cadence weekly or slower
  • Consider Docker Swarm — 3–10 nodes, need HA without hiring a platform team, rolling updates across a small fleet
  • Move to Kubernetes — dozens of services, autoscaling, CRDs/operators, multi-tenant namespaces, GitOps as standard, or compliance mandates centralized policy

Compose / single server vs Swarm vs Kubernetes

Dimension | Compose / single server | Docker Swarm | Kubernetes
Operational complexity | Low: one docker compose up | Medium: managers + workers, overlay network | High: control plane, CNI, ingress, upgrades
HA / multi-node | Manual (second VPS + load balancer) | Built-in replication + rolling updates | Native scheduling, PDBs, multi-AZ
Service discovery | Compose DNS on one host | Swarm VIP / DNSRR | kube-dns / CoreDNS, headless Services
Secrets / config | Env files, Compose file-based secrets (bind-mounted, not encrypted) | Swarm secrets (encrypted at rest) | Secrets, ConfigMaps, external operators
Autoscaling | Vertical resize or manual second instance | None built in (manual replica count) | HPA, VPA, cluster autoscaler
Ecosystem | Traefik, nginx, Portainer | Declining third-party focus | CNCF landscape, cloud-managed control planes
Cost (small scale) | One $20–80/mo VPS often suffices | 3+ manager nodes for quorum-based HA | Managed K8s + node pool overhead
Same OCI images? | Yes | Yes | Yes (runtime changes, image does not)
flowchart TD
  A["Need production containers?"]
  B["Single host / VPS\nCompose + reverse proxy"]
  C["3–10 nodes, small team\nDocker Swarm"]
  D["Platform team / multi-service\nKubernetes"]
  A --> Q1{"Multi-AZ or\nautoscaling required?"}
  Q1 -->|No| Q2{"More than one\nphysical host?"}
  Q1 -->|Yes| D
  Q2 -->|No| B
  Q2 -->|Yes| Q3{"Want K8s-level\necosystem?"}
  Q3 -->|No| C
  Q3 -->|Yes| D
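The flowchart above can be read as a three-question function. This is a minimal sketch, not a tool. The Requirements fields are hypothetical names chosen to mirror the diagram. Real decisions also weigh team skills and cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical inputs mirroring the flowchart's three questions."""
    needs_multi_az: bool
    needs_autoscaling: bool
    host_count: int
    wants_k8s_ecosystem: bool


def choose_platform(req: Requirements) -> str:
    # Q1: multi-AZ or autoscaling goes straight to Kubernetes.
    if req.needs_multi_az or req.needs_autoscaling:
        return "kubernetes"
    # Q2: a single physical host stays on Compose + reverse proxy.
    if req.host_count <= 1:
        return "compose"
    # Q3: several hosts; Swarm unless the K8s ecosystem is wanted.
    return "kubernetes" if req.wants_k8s_ecosystem else "swarm"


if __name__ == "__main__":
    vps = Requirements(False, False, 1, False)
    print(choose_platform(vps))  # compose
```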
⚖️ Trade-off

Compose is not "toy infra." Many SaaS products run profitably on a single hardened host with automated backups, external monitoring, and immutable deploys. Kubernetes shines when coordination cost of many services exceeds platform cost—not when resume-driven architecture demands it.

🎯 Interview Tip

"When would you not use Kubernetes?" — Strong answer: bounded service count, team lacks K8s ops depth, SLA achievable with Compose + LB + health checks, or cost/latency of a control plane outweighs benefits. Mention you still use OCI images and the same CI pipeline.

Logging strategy

Containers are ephemeral; logs must not be. The 12-factor rule applies: treat logs as event streams—write to stdout/stderr, let the runtime capture them. Never bake log files into the image or rely on the container writable layer for retention.

The 12-factor logging contract

  1. Application writes structured (JSON) or plain text lines to stdout/stderr
  2. Docker logging driver captures the stream per container
  3. Host agent or driver forwards to centralized store (Loki, CloudWatch, ELK, Datadog)
  4. Retention, search, and alerting live outside the container lifecycle

Default: json-file driver

Unless configured otherwise, Docker uses the json-file driver—each log line becomes a JSON object in /var/lib/docker/containers/<id>/<id>-json.log. Convenient for docker logs, dangerous without rotation (disk fills → node down).

json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "5",
    "labels": "service,env"
  }
}

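Set globally in /etc/docker/daemon.json, or per container with --log-opt max-size=10m --log-opt max-file=5. Daemon changes require a daemon restart and apply only to containers created afterwards. max-size rotates the file once it exceeds the threshold. max-file keeps at most N files, so the example above caps each container at roughly 50 MB of logs.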
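To sanity-check disk usage, multiply the rotation settings out per container and per host. This is a minimal sketch under two assumptions:

- Docker parses max-size with binary multipliers (k, m, g are 1024-based).
- Every container inherits the daemon.json defaults.

parse_size and log_budget are illustrative names.

```python
import json

# The json-file driver's max-size uses binary multipliers (k, m, g).
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(value: str) -> int:
    """Convert a max-size string such as "10m" into bytes."""
    value = value.strip().lower()
    if value and value[-1] in _UNITS:
        return int(value[:-1]) * _UNITS[value[-1]]
    return int(value)  # bare number: already bytes


def log_budget(daemon_json: str, containers: int) -> tuple[int, int]:
    """Return (worst-case bytes per container, worst case for all containers)."""
    opts = json.loads(daemon_json).get("log-opts", {})
    if "max-size" not in opts:
        raise ValueError("no max-size: json-file logs grow without bound")
    # max-file defaults to 1 when only max-size is set
    per_container = parse_size(opts["max-size"]) * int(opts.get("max-file", "1"))
    return per_container, per_container * containers


if __name__ == "__main__":
    config = '{"log-driver": "json-file", "log-opts": {"max-size": "10m", "max-file": "5"}}'
    per, total = log_budget(config, containers=20)
    print(per // 1024**2, total // 1024**2)  # 50 1000
```

With 20 containers, the worst case is about 1000 MiB of container logs on the host. Any per-container --log-opt overrides would need to be added to that sum.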

Alternative logging drivers

Driver | Destination | When to use | Caveat
json-file | Local rotated files on host | Default; dev and small prod with a log-shipper agent or sidecar | Disk pressure if rotation is omitted
journald | systemd journal | RHEL/Fedora hosts already centralized on journald | docker logs still works; set a journal size limit (SystemMaxUse)
fluentd | Fluentd / Fluent Bit forward | Direct push to aggregation pipeline | Container stops at startup if Fluentd is unreachable; set fluentd-async=true and consider mode=non-blocking
awslogs | Amazon CloudWatch Logs | ECS/EKS-adjacent EC2, AWS-native ops | Requires IAM credentials (instance role) on host; CloudWatch ingestion and storage costs
syslog | Remote syslog receiver | Legacy SIEM integration | No structured metadata unless the app logs JSON

Per-service override in Compose

yaml
services:
  api:
    image: myorg/api:1.4.2
    logging:
      driver: json-file
      options:
        max-size: "20m"
        max-file: "3"
        tag: "{{.Name}}/{{.ID}}"
⚠️ Pitfall

Logging inside the container filesystem (/var/log/app.log) survives restarts only if you mount a volume—and you still need a shipper. Prefer stdout. If you must file-log, mount a volume and run Promtail/Fluent Bit as a sidecar or host agent with a volume scrape config.

💡 Pro Tip

Emit one JSON object per line with timestamp, level, trace_id, and msg. Plain text logs are fine for dev; production search/filtering at scale assumes structure. Correlate with container labels added at deploy time (service=api,env=prod).
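Here is a minimal sketch of the one-object-per-line contract using Python's standard logging module. The field names follow the tip above. Passing trace_id through extra= is an assumption about how your request context is plumbed, not a fixed convention.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Render each record as one JSON object on a single line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname.lower(),
            # trace_id arrives via extra=...; null when the caller omits it
            "trace_id": getattr(record, "trace_id", None),
            "msg": record.getMessage(),
        }
        # json.dumps escapes embedded newlines, so one record = one line
        return json.dumps(entry, separators=(",", ":"))


def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)  # stdout: let Docker capture it
    handler.setFormatter(JsonLineFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


if __name__ == "__main__":
    log = make_logger("api")
    log.info("order created", extra={"trace_id": "4bf92f3577b34da6"})
```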

📦 Real World

A common pattern on a Compose VPS:

- json-file with rotation on the host.
- Promtail tailing /var/lib/docker/containers/*/*-json.log and shipping to Grafana Loki.

No app changes are needed, only daemon.json rotation and an agent unit file.

Health checks in production

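A process running inside a container is not the same as a healthy service. Health checks tell the orchestrator when to route traffic, when to restart, and when to wait during startup. Docker's built-in HEALTHCHECK maps loosely to Kubernetes liveness, with one caveat. Standalone Docker only reports the unhealthy status, because restart policies react to exits, not health. Swarm, by contrast, replaces unhealthy tasks. Readiness is usually an external concern (load balancer or K8s probe).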

HEALTHCHECK parameters

Parameter | Default | Meaning
--interval | 30s | Time between probe attempts
--timeout | 30s | Max time for a single probe to complete
--start-period | 0s | Grace period after start; failures don't count toward retries until the first successful probe
--retries | 3 | Consecutive failures before marking unhealthy
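Here is a simplified model of the status transitions, showing how the parameters interact. It is a sketch assuming the documented semantics:

- Failures inside --start-period are not counted until the first success.
- --retries consecutive counted failures mark the container unhealthy.

It ignores --timeout (Docker treats a timed-out probe as a failure). HealthModel is a hypothetical name, not a Docker API.

```python
from dataclasses import dataclass


@dataclass
class HealthModel:
    """Simplified model of Docker's health-status transitions."""
    retries: int = 3
    start_period: float = 0.0  # seconds since container start
    status: str = "starting"
    failing_streak: int = 0

    def record(self, elapsed: float, passed: bool) -> str:
        """Apply one probe result observed `elapsed` seconds after start."""
        if passed:
            # Any success resets the streak and ends the "starting" phase.
            self.failing_streak = 0
            self.status = "healthy"
            return self.status
        # Failures are ignored only while still starting AND inside the grace period.
        in_grace = self.status == "starting" and elapsed < self.start_period
        if not in_grace:
            self.failing_streak += 1
            if self.failing_streak >= self.retries:
                self.status = "unhealthy"
        return self.status


if __name__ == "__main__":
    m = HealthModel(retries=3, start_period=30)
    for t in (15, 30, 45, 60):
        print(t, m.record(t, passed=False))
```

Take the Dockerfile example's values from later in this section (interval 15s, start period 30s, retries 3). A container that never passes is marked unhealthy at the fourth probe, about 60s after start.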

Probe types

  • HTTP: CMD curl -f http://localhost:8080/health || exit 1 (REST APIs with a dedicated health endpoint)
  • TCP: CMD nc -z localhost 5432 || exit 1 (databases, brokers: port open but no HTTP)
  • Exec: CMD pg_isready -U app || exit 1 (vendor-provided CLI health tools)
  • None: HEALTHCHECK NONE (disable a check inherited from the base image)

The || exit 1 suffix maps any failure to exit code 1, since Docker reserves exit code 2 for health checks.

Dockerfile and Compose examples

dockerfile
# Lightweight HTTP check: curl must exist in the final runtime image, because the probe runs inside the container
HEALTHCHECK --interval=15s --timeout=3s --start-period=30s --retries=3 \
  CMD curl -fsS http://127.0.0.1:8080/actuator/health || exit 1
yaml
services:
  api:
    image: myorg/api:1.4.2
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/health"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 40s

Liveness vs readiness

Probe | Question it answers | Failure action | Docker / Compose | Kubernetes
Liveness | Is the process deadlocked or stuck? | Restart the container | HEALTHCHECK marks it unhealthy; Swarm replaces the task, standalone Docker only reports it (restart policies act on exits, not health) | livenessProbe
Readiness | Can this instance accept traffic? | Remove from load balancer / Service endpoints | Compose depends_on: condition: service_healthy; LB upstream checks | readinessProbe
Startup | Has slow boot finished? | Suppress liveness kills during boot | --start-period | startupProbe
terminal — health status
$ docker inspect --format '{{json .State.Health}}' api
{"Status":"healthy","FailingStreak":0,"Log":[{"ExitCode":0,...}]}

$ docker ps --filter health=unhealthy
CONTAINER ID   IMAGE             STATUS
a1b2c3d4e5f6   myorg/api:1.4.2   Up 2 minutes (unhealthy)
⚠️ Pitfall

Liveness checks that hit dependencies (DB, cache) cause restart loops when a downstream blips. Liveness should verify the process itself. Readiness may check dependencies—failure removes traffic, not the pod/container.
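The liveness/readiness split can be sketched with the standard library's HTTP server. The /health and /ready paths, port 8080, and the AppState flag are illustrative assumptions; a background task would update the flag from real dependency checks. /health never touches dependencies. /ready returns 503 when a dependency is down, so a load balancer can drain the instance.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class AppState:
    """Hypothetical shared state; a background task would refresh
    dependency_ok from real database/cache checks."""

    def __init__(self) -> None:
        self.dependency_ok = True


def make_handler(state: AppState):
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path == "/health":
                # Liveness: answering at all proves the request loop works.
                # Deliberately no dependency checks here.
                self._reply(200, {"status": "alive"})
            elif self.path == "/ready":
                # Readiness: may check dependencies; 503 should drain, not restart.
                ok = state.dependency_ok
                self._reply(200 if ok else 503, {"ready": ok})
            else:
                self._reply(404, {"error": "not found"})

        def _reply(self, code: int, body: dict) -> None:
            payload = json.dumps(body).encode()
            self.send_response(code)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, *args) -> None:
            pass  # keep probe traffic out of stdout logs

    return HealthHandler


def serve(state: AppState, port: int = 8080) -> ThreadingHTTPServer:
    """Start the probe server on a daemon thread and return it."""
    server = ThreadingHTTPServer(("0.0.0.0", port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```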

🔬 Under the Hood

Docker runs the health check command as a separate process inside the container's namespaces (much like docker exec), not inside your main PID 1 process. A passing check only means the probe command exited 0. It does not guarantee your app's thread pool isn't exhausted. Combine health endpoints with saturation metrics (queue depth, active thread count).

Zero-downtime updates

Replacing a container stops the old process, so in-flight requests are dropped. The exception is when something else holds connections open and shifts traffic only after the new instance is ready. Strategies range from Compose recreate with health gates to blue-green pairs behind nginx or Traefik router priorities.

Compose: pull and up (rolling recreate)

docker compose pull && docker compose up -d recreates containers whose image digest changed. With depends_on: condition: service_healthy, dependent services wait for upstream health. Default behavior briefly drops connections during container swap—acceptable for internal tools, risky for public APIs without a reverse proxy buffer.

bash
# Typical immutable deploy on a single host
docker compose pull
docker compose up -d --remove-orphans --wait

# --wait: block until services are running/healthy (implies detached mode)

Blue-green with nginx

Run two identical stacks (blue and green) on different host ports or Docker networks. nginx upstream points at the active color; deploy updates the idle color, health-check it, then flip proxy_pass and reload nginx (nginx -s reload is graceful—existing connections drain).

sequenceDiagram
  participant LB as nginx
  participant Blue as api-blue :8081
  participant Green as api-green :8082
  LB->>Blue: 100% traffic
  Note over Green: Deploy new version
  Green->>Green: healthcheck passes
  LB->>Green: reload upstream
  LB->>Green: 100% traffic
  Note over Blue: drain then stop
nginx
upstream api_active {
  server api-blue:8080;
  # server api-green:8080;  # to switch: comment blue, uncomment green, then nginx -s reload
}

server {
  listen 80;

  location / {
    proxy_pass http://api_active;
    proxy_set_header Host $host;
    proxy_next_upstream error timeout http_502;
  }
}
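Instead of hand-editing comments, a deploy script can render the upstream for the active color. It then reloads nginx only after nginx -t validates the result. This is a minimal sketch under two assumptions:

- The upstream block lives in its own file included from nginx.conf.
- The script runs where the nginx binary is reachable, for example inside the nginx container via docker exec.

render_upstream and switch_to are hypothetical helpers.

```python
import subprocess
from pathlib import Path

COLORS = ("blue", "green")


def render_upstream(active: str, port: int = 8080) -> str:
    """Return an upstream block that sends all traffic to one color."""
    if active not in COLORS:
        raise ValueError(f"unknown color: {active!r}")
    return f"upstream api_active {{\n  server api-{active}:{port};\n}}\n"


def switch_to(active: str, conf_path: Path, reload: bool = True) -> None:
    """Write the upstream file, validate it with nginx -t, then reload."""
    previous = conf_path.read_text() if conf_path.exists() else None
    conf_path.write_text(render_upstream(active))
    if not reload:
        return
    if subprocess.run(["nginx", "-t"]).returncode != 0:
        # Restore the old file so a broken config never goes live.
        if previous is None:
            conf_path.unlink()
        else:
            conf_path.write_text(previous)
        raise RuntimeError("nginx -t failed; upstream file restored")
    # Graceful reload: old workers finish in-flight requests, then exit.
    subprocess.run(["nginx", "-s", "reload"], check=True)
```

Rolling back is the same call with the previous color, which keeps the flip symmetric and scriptable.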

Traefik: labels for traffic shifting

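Traefik discovers backends via Docker labels. Run blue and green as separate services whose routers share the same rule but differ in priority. The highest-priority router receives traffic, so removing green's router (or lowering its priority below blue's) hands traffic back to blue. Traefik's weighted round-robin services can shift traffic gradually. However, they are defined through the file provider (or Kubernetes CRDs), not Docker labels.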

yaml
services:
  api-green:
    image: myorg/api:1.5.0
    labels:
      - traefik.enable=true
      - traefik.http.routers.api.rule=Host(`api.example.com`)
      - traefik.http.routers.api.priority=100
      - traefik.http.services.api.loadbalancer.server.port=8080
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/health"]
      start_period: 30s

  api-blue:
    image: myorg/api:1.4.2
    labels:
      - traefik.enable=true
      - traefik.http.routers.api-blue.rule=Host(`api.example.com`)
      - traefik.http.routers.api-blue.priority=50
      - traefik.http.services.api-blue.loadbalancer.server.port=8080
Strategy | Downtime risk | Complexity | Rollback
Compose recreate | Seconds during swap | Low | Re-pull previous tag
Blue-green + nginx | Near zero with health gate | Medium | Flip upstream back
Traefik priority / weights | Near zero | Medium | Lower green priority, stop green
Kubernetes RollingUpdate | Near zero at scale | High | kubectl rollout undo
💡 Pro Tip

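Always gate the traffic switch on readiness. Automate the flip in CI only after curl -f https://api-green.internal/health succeeds. Alternatively, rely on health-aware routing so unhealthy backends stop receiving new connections:

- Traefik ignores containers whose Docker health status is starting or unhealthy.
- Open-source nginx offers passive checks via max_fails and fail_timeout.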
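The health gate from the tip can be written as a small polling step in CI. This sketch assumes the https://api-green.internal/health URL from the tip and uses arbitrary retry defaults. As with curl -f, any non-2xx status or connection error counts as not ready. The non-zero exit code stops the pipeline before the flip. wait_healthy is a hypothetical helper.

```python
import sys
import time
import urllib.request


def wait_healthy(url: str, attempts: int = 10, delay: float = 2.0,
                 timeout: float = 3.0) -> bool:
    """Poll `url` until it answers 2xx, like `curl -f` in a retry loop."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            # URLError/HTTPError subclass OSError: refused, timeout, 4xx/5xx
            pass
        if attempt < attempts:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    # Hypothetical CI step: fail the job so traffic is never flipped.
    sys.exit(0 if wait_healthy("https://api-green.internal/health") else 1)
```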

Resource governance

Unlimited containers on a shared host are a capacity incident waiting to happen. Set memory and CPU limits at deploy time and size JVM heaps to cgroup boundaries. Watch usage with docker stats and export metrics through cAdvisor and Prometheus, so that OOM kills never become your primary alerting signal.

Memory and CPU limits

Flag / Compose key | Effect | Production guidance
--memory / mem_limit | Hard RAM cap; OOM kill in cgroup | Always set; leave 10–15% of host RAM unallocated for kernel and page cache
--memory-swap | RAM + swap combined cap | Set equal to --memory to disable swap for latency-sensitive apps
--cpus / cpus | CFS quota (e.g. 1.5 = up to 1.5 cores of CPU time) | Prefer over --cpu-shares for predictable ceilings
--pids-limit / pids_limit | Max processes in container | Guard against fork bombs and thread leaks
yaml
services:
  api:
    image: myorg/api:1.4.2
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 2G
        reservations:
          cpus: "0.5"
          memory: 512M
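The --cpus to CFS mapping and the host headroom rule are simple arithmetic. This sketch assumes Docker's default 100 ms CFS period: --cpus=1.5 is documented as equivalent to --cpu-period=100000 --cpu-quota=150000. It also reads the 10–15% guidance as headroom left after summing all hard memory limits. The function names are illustrative.

```python
CFS_PERIOD_US = 100_000  # Docker's default --cpu-period (100 ms)


def cpus_to_quota(cpus: float, period_us: int = CFS_PERIOD_US) -> int:
    """--cpus=N behaves like --cpu-quota = N * --cpu-period."""
    return round(cpus * period_us)


def memory_headroom_ok(limits_bytes: list[int], host_bytes: int,
                       headroom: float = 0.10) -> bool:
    """True if the summed hard limits leave `headroom` of host RAM free."""
    return sum(limits_bytes) <= host_bytes * (1 - headroom)


if __name__ == "__main__":
    gib = 1024**3
    print(cpus_to_quota(2.0))  # 200000
    print(memory_headroom_ok([2 * gib] * 4, 8 * gib))  # False: no room left
```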

JVM sizing example

A container with --memory=2g must not run a JVM that assumes host RAM. Use container-aware flags so the heap respects cgroup limits:

dockerfile
ENV JAVA_TOOL_OPTIONS="-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=50.0"
# 2g limit → ~1.5g heap, ~500m for metaspace, threads, native buffers
⚠️ Pitfall

-Xmx2g with --memory=2g ignores non-heap memory—container OOMs while the JVM believes it is fine. Use MaxRAMPercentage or set -Xmx to ~70–75% of the cgroup limit.
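The sizing rule can be checked numerically. This sketch assumes the heap ceiling is simply limit × MaxRAMPercentage / 100; the JVM also rounds to its own alignment, so real values differ slightly. It uses 75% as the -Xmx safety threshold from the pitfall above. jvm_budget and xmx_is_safe are illustrative names.

```python
def jvm_budget(limit_mib: int, max_ram_percentage: float = 75.0) -> dict:
    """Split a cgroup memory limit the way -XX:MaxRAMPercentage does (approx.).

    Heap ceiling = limit * percentage / 100; the remainder must cover
    metaspace, thread stacks, code cache, and direct/native buffers.
    """
    heap = int(limit_mib * max_ram_percentage / 100)
    return {"heap_mib": heap, "non_heap_mib": limit_mib - heap}


def xmx_is_safe(xmx_mib: int, limit_mib: int, max_fraction: float = 0.75) -> bool:
    """Flag -Xmx values that leave too little room below the cgroup limit."""
    return xmx_mib <= limit_mib * max_fraction


if __name__ == "__main__":
    print(jvm_budget(2048))         # {'heap_mib': 1536, 'non_heap_mib': 512}
    print(xmx_is_safe(2048, 2048))  # False: -Xmx2g inside --memory=2g
```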

Observability stack

Tool | Scope | Key metrics
docker stats | Live CLI per container | CPU %, MEM USAGE / LIMIT, NET I/O, BLOCK I/O
cAdvisor | Host + container metrics HTTP API | Same as stats, historical, labels; scraped by Prometheus
Prometheus | Time-series store + alerting | container_memory_usage_bytes, CPU throttling, OOM events
node_exporter | Host-level hardware | Disk, pressure stall, overall memory: context for container limits
bash
# Quick capacity check
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

# cAdvisor (publishes metrics on :8080); set CADVISOR_VERSION to a release tag
docker run -d --name=cadvisor --privileged \
  --device=/dev/kmsg \
  -p 8080:8080 \
  -v /:/rootfs:ro -v /var/run:/var/run:ro \
  -v /sys:/sys:ro -v /var/lib/docker/:/var/lib/docker:ro \
  -v /dev/disk/:/dev/disk:ro \
  "gcr.io/cadvisor/cadvisor:${CADVISOR_VERSION}"

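Example Prometheus alert: fire when container_memory_working_set_bytes / container_spec_memory_limit_bytes > 0.9 for 5m, so you can act before the OOM kill. container_memory_usage_bytes also works, but it includes reclaimable page cache and can alert early. Exclude containers with no limit (keep only container_spec_memory_limit_bytes > 0), whose ratio is meaningless.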
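This sketch mimics the "for 5m" clause over scraped samples. The condition must hold at every evaluation for five minutes, and a single sample below the threshold resets the clock. It assumes container_memory_working_set_bytes as the numerator; swap in container_memory_usage_bytes to match a usage-based rule. A zero limit is treated as unlimited. Sample and should_fire are hypothetical names, and Prometheus itself evaluates rules on its own schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    ts: float           # unix seconds
    working_set: float  # bytes (container_memory_working_set_bytes)
    limit: float        # bytes (container_spec_memory_limit_bytes; 0 = unlimited)


def should_fire(samples: list[Sample], threshold: float = 0.9,
                for_seconds: float = 300) -> bool:
    """Approximate `ratio > threshold` held continuously for `for_seconds`."""
    breach_start = None
    for s in sorted(samples, key=lambda x: x.ts):
        breached = s.limit > 0 and s.working_set / s.limit > threshold
        if not breached:
            breach_start = None  # any healthy sample resets the pending timer
            continue
        if breach_start is None:
            breach_start = s.ts
        if s.ts - breach_start >= for_seconds:
            return True
    return False
```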

📦 Real World

On Compose hosts, a lightweight pattern is cAdvisor + Prometheus + Grafana as a monitoring stack service in the same compose file, with recording rules for CPU throttling (container_cpu_cfs_throttled_seconds_total)—high throttle rate means your --cpus ceiling is too low.

Docker on RHEL / OpenShift

Red Hat Enterprise Linux 8+ and OpenShift do not ship Docker Engine as the supported container runtime. Podman (daemonless, rootless-first), Buildah (image builds), and CRI-O (Kubernetes CRI) form the supported stack—while your OCI images and Dockerfiles remain portable.

Tooling map

Tool | Replaces | Role
Podman | docker run, docker ps | Daemonless container runtime; fork/exec per container
podman-docker | Docker CLI | Compatibility package providing a docker command that forwards to podman (docker → podman)
Buildah | docker build | OCI image builds without a daemon; used under the hood by Podman
Skopeo | docker pull/push (inspect) | Inspect and copy images between registries without running a daemon
CRI-O | containerd + dockershim on K8s nodes | Lightweight CRI implementation for Kubernetes/OpenShift nodes (launches containers via an OCI runtime such as runc or crun)

Podman vs Docker Engine

Aspect | Docker Engine | Podman (RHEL default)
Daemon | Central rootful dockerd | None: containers are child processes of your user session
Rootless | Opt-in | Default and recommended
Socket exposure | /var/run/docker.sock = root equivalent | No always-on central socket (the podman.socket API service is opt-in)
Compose | docker compose | podman compose or podman-compose
systemd integration | Manual unit files | podman generate systemd for user units (now deprecated in favor of Quadlet .container files)
bash
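# RHEL: install Podman, the docker compatibility shim, and build/copy tools
sudo dnf install -y podman podman-docker buildah skopeo

# Familiar workflow: the docker command now runs podman
docker run -d --name api -p 8080:8080 registry.example.com/api:1.4.2
docker ps

# Rootless podman (default for normal users); hello prints a message and exits
podman run --rm quay.io/podman/hello

# Generate a systemd user service for the container
# (podman generate systemd still works but is deprecated in favor of Quadlet)
mkdir -p ~/.config/systemd/user
podman generate systemd --new --name api > ~/.config/systemd/user/api.service
systemctl --user daemon-reload
systemctl --user enable --now api.service
loginctl enable-linger "$USER"   # keep user services running after logout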

OpenShift specifics

  • CRI-O runs workloads on worker nodes; developers typically never SSH to nodes
  • ImageStreams — OpenShift-native image abstraction: tracks tags, triggers redeploys on new push, mirrors external registries
  • Restricted SCCs — Security Context Constraints replace ad-hoc --cap-add; containers run as arbitrary non-root UIDs
  • BuildConfigs — cluster-native builds (Dockerfile, S2I) producing ImageStream tags—similar goals to CI pipelines elsewhere
  • Routes — HAProxy-based ingress with TLS edge termination; analogous to Ingress + cert-manager on vanilla K8s
OpenShift concept | Rough Docker / K8s equivalent
ImageStream | Internal registry + tag tracking + webhook trigger
ImageStreamTag | myapp:1.4.2 pointing at a digest
DeploymentConfig | Deployment + rollout trigger on ImageStream change (legacy; prefer Deployment)
Route | Ingress + LoadBalancer hostname
SCC | PodSecurity + capabilities policy
🔒 Security

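On RHEL/OpenShift, rootless Podman maps container root to your own unprivileged UID. The remaining container UIDs map onto a subordinate range from /etc/subuid. Bind mounts must respect that mapping. Files owned by host root show up as nobody inside the container and may be unreadable unless their ownership falls inside the mapped range. podman unshare chown can fix ownership from within the user namespace.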
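The default rootless mapping is easy to compute by hand. This sketch assumes Podman's default user namespace (no --userns=keep-id or custom --uidmap) and a single /etc/subuid entry for the user. On a real host, podman unshare cat /proc/self/uid_map shows the actual mapping. parse_subuid and host_uid are illustrative names.

```python
def parse_subuid(line: str) -> tuple[str, int, int]:
    """Parse an /etc/subuid entry of the form user:start:count."""
    user, start, count = line.strip().split(":")
    return user, int(start), int(count)


def host_uid(container_uid: int, user_uid: int, subuid_line: str) -> int:
    """Host UID backing `container_uid` under the default rootless mapping.

    Container UID 0 maps to the invoking user's own UID; container UIDs
    1..count map onto the subordinate range beginning at `start`.
    """
    _, start, count = parse_subuid(subuid_line)
    if container_uid == 0:
        return user_uid
    if 1 <= container_uid <= count:
        return start + container_uid - 1
    raise ValueError(f"UID {container_uid} is outside the mapped range")


if __name__ == "__main__":
    entry = "alice:100000:65536"
    print(host_uid(0, 1000, entry))     # 1000 (container root = alice)
    print(host_uid(1001, 1000, entry))  # 101000
```

This is why a bind-mounted directory owned by alice (UID 1000) is writable by container root but not by an app user such as UID 1001 inside the container.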

⚖️ Trade-off

Portable images, environment-specific runtime. Keep Dockerfiles CI-standard, and let RHEL use Podman/Buildah and OpenShift use CRI-O + ImageStreams. Fighting for dockerd on RHEL means unsupported packages and SELinux friction. Podman, by contrast, is first-class in RHEL documentation and support contracts.