Docker in Production
Running containers in production is not the same as docker run on a laptop. You need a deliberate logging strategy, health probes that match orchestrator semantics, resource limits that prevent noisy-neighbor failures, and a deployment model that fits your scale—Compose on a VPS, Swarm for small clusters, or Kubernetes when platform APIs become the product.
When to use Docker standalone vs Kubernetes
Docker Engine plus Compose is a legitimate production stack—not a stepping stone you must outgrow. The question is whether you need cluster APIs (scheduling, service discovery, rolling updates at scale) or whether a single well-run host (or a tiny Swarm) delivers enough reliability for your SLA.
Decision signals
- Stay on Compose / single server — one or two services, predictable traffic, team < 5, no multi-AZ requirement, deploy cadence weekly or slower
- Consider Docker Swarm — 3–10 nodes, need HA without hiring a platform team, rolling updates across a small fleet
- Move to Kubernetes — dozens of services, autoscaling, CRDs/operators, multi-tenant namespaces, GitOps as standard, or compliance mandates centralized policy
Compose / single server vs Swarm vs Kubernetes
| Dimension | Compose / single server | Docker Swarm | Kubernetes |
|---|---|---|---|
| Operational complexity | Low — one docker compose up | Medium — managers + workers, overlay network | High — control plane, CNI, ingress, upgrades |
| HA / multi-node | Manual (second VPS + load balancer) | Built-in replication + rolling updates | Native scheduling, PDBs, multi-AZ |
| Service discovery | Compose DNS on one host | Swarm VIP / DNSRR | kube-dns / CoreDNS, headless Services |
| Secrets / config | Env files, Docker secrets (Swarm mode) | Swarm secrets (encrypted at rest) | Secrets, ConfigMaps, external operators |
| Autoscaling | Vertical resize or manual second instance | Limited (replica count only) | HPA, VPA, cluster autoscaler |
| Ecosystem | Traefik, nginx, Portainer | Declining third-party focus | CNCF landscape, cloud-managed control planes |
| Cost (small scale) | One $20–80/mo VPS often suffices | 3+ nodes minimum for quorum | Managed K8s + node pool overhead |
| Same OCI images? | Yes | Yes | Yes — runtime changes, image does not |
flowchart TD
A["Need production containers?"]
B["Single host / VPS\nCompose + reverse proxy"]
C["3–10 nodes, small team\nDocker Swarm"]
D["Platform team / multi-service\nKubernetes"]
A --> Q1{"Multi-AZ or\nautoscaling required?"}
Q1 -->|No| Q2{"More than one\nphysical host?"}
Q1 -->|Yes| D
Q2 -->|No| B
Q2 -->|Yes| Q3{"Want K8s-level\necosystem?"}
Q3 -->|No| C
Q3 -->|Yes| D
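The flowchart can also be read as a three-question function. The sketch below is only an illustration of that decision order; the function name and the string labels it returns are hypothetical, not part of any tool.

```python
def choose_platform(needs_multi_az_or_autoscaling: bool,
                    more_than_one_host: bool,
                    wants_k8s_ecosystem: bool) -> str:
    """Ask the three flowchart questions in the same order."""
    if needs_multi_az_or_autoscaling:
        return "kubernetes"
    if not more_than_one_host:
        return "compose"
    # More than one host: Swarm unless the K8s ecosystem is wanted.
    return "kubernetes" if wants_k8s_ecosystem else "swarm"
```

For example, `choose_platform(False, True, False)` lands on Swarm: several hosts, no autoscaling requirement, no appetite for the Kubernetes ecosystem.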
Compose is not "toy infra." Many SaaS products run profitably on a single hardened host with automated backups, external monitoring, and immutable deploys. Kubernetes shines when the coordination cost of running many services exceeds the cost of operating the platform, not when resume-driven architecture demands it.
"When would you not use Kubernetes?" — Strong answer: bounded service count, team lacks K8s ops depth, SLA achievable with Compose + LB + health checks, or cost/latency of a control plane outweighs benefits. Mention you still use OCI images and the same CI pipeline.
Logging strategy
Containers are ephemeral; logs must not be. The 12-factor rule applies: treat logs as event streams—write to stdout/stderr, let the runtime capture them. Never bake log files into the image or rely on the container writable layer for retention.
The 12-factor logging contract
- Application writes structured (JSON) or plain text lines to stdout/stderr
- Docker logging driver captures the stream per container
- Host agent or driver forwards to centralized store (Loki, CloudWatch, ELK, Datadog)
- Retention, search, and alerting live outside the container lifecycle
Default: json-file driver
Unless configured otherwise, Docker uses the json-file driver—each log line becomes a JSON object in /var/lib/docker/containers/<id>/<id>-json.log. Convenient for docker logs, dangerous without rotation (disk fills → node down).
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "5",
    "labels": "service,env"
  }
}
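Set globally in /etc/docker/daemon.json or per-container with --log-opt max-size=10m --log-opt max-file=5. Daemon-level changes require a daemon restart and apply only to containers created afterwards; existing containers keep their old logging configuration until they are recreated. max-size rotates when a file exceeds the threshold; max-file keeps N rotated files (roughly 50 MB total per container in the example above).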
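The rotation settings translate into a per-container disk ceiling, and the host total scales with the number of containers. The sketch below works through that arithmetic; the helper names are hypothetical, and it assumes decimal units (1m = 1,000,000 bytes), which differs from binary units by under 5% and does not matter for a rough budget.

```python
# Assumption: decimal multipliers; treat the result as an estimate.
UNITS = {"k": 1000, "m": 1000 ** 2, "g": 1000 ** 3}


def parse_size(value: str) -> int:
    """Convert a size string such as '10m' into bytes."""
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def log_budget_bytes(max_size: str, max_file: int, containers: int = 1) -> int:
    """Upper bound on json-file log disk usage across `containers` containers."""
    return parse_size(max_size) * max_file * containers
```

With max-size 10m and max-file 5, one container can hold about 50 MB of logs; twenty such containers can hold about 1 GB, which is the number to compare against free space under /var/lib/docker.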
Alternative logging drivers
| Driver | Destination | When to use | Caveat |
|---|---|---|---|
| json-file | Local rotated files on host | Default; dev and small prod with log shipper sidecar/agent | Disk pressure if rotation omitted |
| journald | systemd journal | RHEL/Fedora hosts already centralized on journald | docker logs still works; journal quota must be set |
| fluentd | Fluentd / Fluent Bit forward | Direct push to aggregation pipeline | Container start fails or log writes block if Fluentd is unreachable; set fluentd-async=true and consider mode=non-blocking |
| awslogs | Amazon CloudWatch Logs | ECS/EKS-adjacent EC2, AWS-native ops | Requires IAM role on host; per-group costs |
| syslog | Remote syslog receiver | Legacy SIEM integration | No structured metadata unless app logs JSON |
Per-service override in Compose
services:
  api:
    image: myorg/api:1.4.2
    logging:
      driver: json-file
      options:
        max-size: "20m"
        max-file: "3"
        tag: "{{.Name}}/{{.ID}}"
Logging inside the container filesystem (/var/log/app.log) survives restarts only if you mount a volume—and you still need a shipper. Prefer stdout. If you must file-log, mount a volume and run Promtail/Fluent Bit as a sidecar or host agent with a volume scrape config.
Emit one JSON object per line with timestamp, level, trace_id, and msg. Plain text logs are fine for dev; production search/filtering at scale assumes structure. Correlate with container labels added at deploy time (service=api,env=prod).
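As an illustration of the one-object-per-line shape, here is a minimal sketch using only the Python standard library. The class and function names are hypothetical, and the trace_id is assumed to arrive through the logger's `extra` argument; a real service would likely take it from its tracing middleware.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Render each record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            # trace_id is optional; it is None when the caller did not pass one.
            "trace_id": getattr(record, "trace_id", None),
            "msg": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name: str = "api", stream=sys.stdout) -> logging.Logger:
    """Return a logger that writes JSON lines to stdout (or a given stream)."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Usage: `build_logger().info("order accepted", extra={"trace_id": "abc123"})` writes one line to stdout, which the logging driver then captures unchanged.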
Common pattern on a Compose VPS: json-file + rotation on the host, Promtail tailing /var/lib/docker/containers/*/*-json.log, shipping to Grafana Loki. No app changes are needed, only daemon.json rotation and an agent unit file.
Health checks in production
A process running inside a container is not the same as a healthy service. Health checks tell the orchestrator when to route traffic, when to restart, and when to wait during startup. Docker's built-in HEALTHCHECK maps loosely to Kubernetes liveness; readiness is usually an external concern (load balancer or K8s probe).
HEALTHCHECK parameters
| Parameter | Default | Meaning |
|---|---|---|
| --interval | 30s | Time between probe attempts |
| --timeout | 30s | Max time for a single probe to complete |
| --start-period | 0s | Grace period after start; failures don't count toward retries |
| --retries | 3 | Consecutive failures before marking unhealthy |
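These parameters combine into a detection delay: how long a hung service can keep its healthy status. The sketch below is a rough upper bound, assuming every failing probe runs for the full timeout and the next probe starts one interval after the previous one completes; the function name is hypothetical.

```python
def worst_case_detection_seconds(interval: float, timeout: float, retries: int) -> float:
    """Rough upper bound on the time from failure to 'unhealthy' status.

    Each of the `retries` consecutive failures costs one interval of
    waiting plus up to one timeout of probe runtime.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    return retries * (interval + timeout)
```

With the defaults from the table this gives 3 × (30 + 30) = 180 seconds. Tightening to a 15s interval and a 3s timeout brings it down to 54 seconds, which is why production images rarely keep the defaults.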
Probe types
| Type | Example | Best for |
|---|---|---|
| HTTP | CMD curl -f http://localhost:8080/health \|\| exit 1 | REST APIs with a dedicated health endpoint |
| TCP | CMD nc -z localhost 5432 \|\| exit 1 | Databases and brokers (port open, no HTTP) |
| Exec | CMD pg_isready -U app \|\| exit 1 | Vendor-provided CLI health tools |
| None | HEALTHCHECK NONE | Disable inherited check from base image |
Dockerfile and Compose examples
# Lightweight HTTP check; curl must be present in the final runtime image, not only in a builder stage
HEALTHCHECK --interval=15s --timeout=3s --start-period=30s --retries=3 \
CMD curl -fsS http://127.0.0.1:8080/actuator/health || exit 1
services:
  api:
    image: myorg/api:1.4.2
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/health"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 40s
Liveness vs readiness
| Probe | Question it answers | Failure action | Docker / Compose | Kubernetes |
|---|---|---|---|---|
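| Liveness | Is the process deadlocked or stuck? | Restart the container | HEALTHCHECK marks the container unhealthy; standalone Engine and Compose do not restart it (restart policies react to exits only), so Swarm or a watchdog container must act on the status | livenessProbe |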
| Readiness | Can this instance accept traffic? | Remove from load balancer / Service endpoints | Compose depends_on: condition: service_healthy; LB upstream checks | readinessProbe |
| Startup | Has slow boot finished? | Suppress liveness kills during boot | --start-period | startupProbe |
$ docker inspect --format '{{json .State.Health}}' api
{"Status":"healthy","FailingStreak":0,"Log":[{"ExitCode":0,...}]}
$ docker ps --filter health=unhealthy
CONTAINER ID   IMAGE           STATUS
a1b2c3d4e5f6   myorg/api:1.4   Up 2 min (unhealthy)
Liveness checks that hit dependencies (DB, cache) cause restart loops when a downstream blips. Liveness should verify the process itself. Readiness may check dependencies—failure removes traffic, not the pod/container.
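One way to keep the two concerns apart is to serve them from separate endpoints. The sketch below shows only the decision logic, not a web server; the paths /livez and /readyz and the function name are hypothetical, and the dependency checks are passed in as plain callables.

```python
from typing import Callable, Dict, Tuple


def health_response(path: str,
                    dependency_checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Return (status_code, body) for a health request."""
    if path == "/livez":
        # Liveness: the process answered, which is all this proves.
        # Dependencies are deliberately not consulted here.
        return 200, "alive"
    if path == "/readyz":
        # Readiness: every dependency must report OK to accept traffic.
        failed = [name for name, check in dependency_checks.items() if not check()]
        if failed:
            return 503, "not ready: " + ", ".join(sorted(failed))
        return 200, "ready"
    return 404, "unknown path"
```

When the database blips, /readyz returns 503 and the proxy can stop sending traffic, while /livez keeps returning 200 so nothing restarts a process that is otherwise fine.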
Docker runs health check commands in the container namespace but outside your main PID 1 process. A passing check only means the probe command exited 0—it does not guarantee your app thread pool isn't exhausted. Combine health endpoints with saturation metrics (queue depth, thread pool active count).
Zero-downtime updates
Replacing a container stops the old process—unless something else holds connections open and shifts traffic only after the new instance is ready. Strategies range from Compose recreate with health gates to blue-green pairs behind nginx or Traefik weight labels.
Compose: pull and up (rolling recreate)
docker compose pull && docker compose up -d recreates containers whose image digest changed. With depends_on: condition: service_healthy, dependent services wait for upstream health. Default behavior briefly drops connections during container swap—acceptable for internal tools, risky for public APIs without a reverse proxy buffer.
# Typical immutable deploy on a single host
docker compose pull
docker compose up -d --remove-orphans --wait
# --wait (Compose v2): implies detached mode and blocks until services are running or healthy
Blue-green with nginx
Run two identical stacks (blue and green) on different host ports or Docker networks. The nginx upstream points at the active color; a deploy updates the idle color, health-checks it, then switches the upstream server entry and reloads nginx (nginx -s reload is graceful: existing connections drain on the old workers).
sequenceDiagram
participant LB as nginx
participant Blue as api-blue :8081
participant Green as api-green :8082
LB->>Blue: 100% traffic
Note over Green: Deploy new version
Green->>Green: healthcheck passes
LB->>Green: reload upstream
LB->>Green: 100% traffic
Note over Blue: drain then stop
upstream api_active {
    server api-blue:8080;
    # server api-green:8080;  # uncomment to switch
}
server {
    location / {
        proxy_pass http://api_active;
        proxy_next_upstream error timeout http_502;
    }
}
Traefik: labels for traffic shifting
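Traefik discovers backends via Docker labels. Run blue and green services with different router priorities (the higher priority wins when both rules match), or shift traffic gradually with weighted round-robin services, which open-source Traefik v2+ supports but expects to be defined through the file provider rather than Docker labels.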
services:
  api-green:
    image: myorg/api:1.5.0
    labels:
      - traefik.enable=true
      - traefik.http.routers.api.rule=Host(`api.example.com`)
      - traefik.http.routers.api.priority=100
      - traefik.http.services.api.loadbalancer.server.port=8080
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/health"]
      start_period: 30s
  api-blue:
    image: myorg/api:1.4.2
    labels:
      - traefik.enable=true
      - traefik.http.routers.api-blue.rule=Host(`api.example.com`)
      - traefik.http.routers.api-blue.priority=50
      - traefik.http.services.api-blue.loadbalancer.server.port=8080
| Strategy | Downtime risk | Complexity | Rollback |
|---|---|---|---|
| Compose recreate | Seconds during swap | Low | Re-pull previous tag |
| Blue-green + nginx | Near zero with health gate | Medium | Flip upstream back |
| Traefik priority / weights | Near zero | Medium | Lower green priority, stop green |
| Kubernetes RollingUpdate | Near zero at scale | High | kubectl rollout undo |
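Always verify readiness before switching traffic. Automate the flip in CI only after curl -f https://api-green.internal/health succeeds, or rely on proxy-level health checks so unhealthy backends stop receiving new connections. Note that open-source nginx offers only passive checks (max_fails / fail_timeout); active probing needs NGINX Plus or an external gate.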
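A CI gate for the flip can be as small as a loop that demands several consecutive successful probes before it reports success. The sketch below assumes the probe is any zero-argument callable returning True or False (for instance a wrapper around an HTTP request to the green stack); the function name and default values are hypothetical.

```python
import time
from typing import Callable


def wait_until_healthy(probe: Callable[[], bool],
                       required_successes: int = 3,
                       max_attempts: int = 20,
                       delay_seconds: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once `probe` succeeds `required_successes` times in a row."""
    streak = 0
    for attempt in range(max_attempts):
        # Any failure resets the streak, so a flapping backend never passes.
        streak = streak + 1 if probe() else 0
        if streak >= required_successes:
            return True
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    return False
```

Requiring a streak rather than a single success guards against flipping traffic to an instance that answered once during warm-up and then fell over. The pipeline should switch the upstream only when this returns True, and leave blue untouched otherwise.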
Resource governance
Running containers without limits on a shared host is a capacity incident waiting to happen. Set memory and CPU limits at deploy time, size JVM heaps to cgroup boundaries, and export metrics with docker stats, cAdvisor, and Prometheus before OOM kills become your primary alerting signal.
Memory and CPU limits
| Flag / Compose key | Effect | Production guidance |
|---|---|---|
| --memory / mem_limit | Hard RAM cap; OOM kill in cgroup | Always set; leave 10–15% headroom below host RAM for kernel/cache |
| --memory-swap | RAM + swap combined cap | Set equal to memory to disable swap for latency-sensitive apps |
| --cpus / cpus | CFS quota (e.g. 1.5 = 150% of one core) | Prefer over --cpu-shares for predictable ceilings |
| --pids-limit | Max processes in container | Guard against fork bombs and thread leaks |
services:
  api:
    image: myorg/api:1.4.2
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 2G
        reservations:
          cpus: "0.5"
          memory: 512M
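The --cpus value is shorthand for a CFS quota over a scheduling period. The sketch below shows that conversion, assuming Docker's default period of 100,000 microseconds; the function name is hypothetical.

```python
from typing import Tuple


def cpus_to_cfs(cpus: float, period_us: int = 100_000) -> Tuple[int, int]:
    """Translate a --cpus value into (quota_us, period_us).

    The container may consume `quota_us` microseconds of CPU time in
    every `period_us` window before it is throttled.
    """
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return int(round(cpus * period_us)), period_us
```

So cpus: "2.0" allows 200,000 µs of CPU time per 100,000 µs window, and --cpus=1.5 is equivalent to --cpu-period=100000 --cpu-quota=150000. A multi-threaded process can burn its whole quota early in the window and then stall until the next one, which is what shows up as throttling.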
JVM sizing example
A container with --memory=2g must not run a JVM that assumes host RAM. Use container-aware flags so the heap respects cgroup limits:
ENV JAVA_TOOL_OPTIONS="-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=50.0"
# 2g limit → ~1.5g heap, ~500m for metaspace, threads, native buffers
-Xmx2g with --memory=2g ignores non-heap memory—container OOMs while the JVM believes it is fine. Use MaxRAMPercentage or set -Xmx to ~70–75% of the cgroup limit.
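The heap arithmetic is worth doing explicitly for each limit you deploy. The sketch below reproduces it, assuming the limit is given in bytes and that MaxRAMPercentage is applied to the cgroup limit; the function names are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def jvm_heap_bytes(cgroup_limit_bytes: int, max_ram_percentage: float = 75.0) -> int:
    """Maximum heap the JVM would pick for a given container memory limit."""
    return int(cgroup_limit_bytes * max_ram_percentage / 100.0)


def non_heap_headroom_bytes(cgroup_limit_bytes: int,
                            max_ram_percentage: float = 75.0) -> int:
    """Memory left for metaspace, thread stacks, and native buffers."""
    return cgroup_limit_bytes - jvm_heap_bytes(cgroup_limit_bytes, max_ram_percentage)
```

For a 2 GiB limit at 75%, the heap tops out at 1.5 GiB and 512 MiB remains for everything else. At 100% (the equivalent of -Xmx2g) the headroom is zero, so any non-heap allocation pushes the container over its limit.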
Observability stack
| Tool | Scope | Key metrics |
|---|---|---|
| docker stats | Live CLI per container | CPU %, MEM USAGE / LIMIT, NET I/O, BLOCK I/O |
| cAdvisor | Host + container metrics HTTP API | Same as stats, historical, labels; scraped by Prometheus |
| Prometheus | Time-series store + alerting | container_memory_usage_bytes, CPU throttling, OOM events |
| node_exporter | Host-level hardware | Disk, pressure stall, overall memory—context for container limits |
# Quick capacity check
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
# cAdvisor (publish metrics on :8080)
docker run -d --name=cadvisor --privileged \
-p 8080:8080 \
-v /:/rootfs:ro -v /var/run:/var/run:ro \
-v /sys:/sys:ro -v /var/lib/docker/:/var/lib/docker:ro \
gcr.io/cadvisor/cadvisor:latest
Example Prometheus alert: fire when container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.9 for 5m—action before OOM kill.
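To make the "for 5m" condition concrete, the sketch below evaluates the same rule over a list of samples. It only approximates how Prometheus handles a pending alert; the function name and the sample tuple layout are hypothetical, and a limit of 0 is treated as "no limit set" rather than as a breach.

```python
from typing import Iterable, Optional, Tuple


def should_fire(samples: Iterable[Tuple[float, float, float]],
                threshold: float = 0.9,
                hold_seconds: float = 300.0) -> bool:
    """samples: (timestamp_s, usage_bytes, limit_bytes), ordered by time.

    Fires when usage / limit stays above `threshold` continuously for
    at least `hold_seconds`.
    """
    breach_start: Optional[float] = None
    for ts, usage, limit in samples:
        if limit > 0 and usage / limit > threshold:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= hold_seconds:
                return True
        else:
            # Any sample at or below the threshold resets the timer.
            breach_start = None
    return False
```

A short spike to 95% during a batch job does not fire; five uninterrupted minutes above 90% does, which leaves time to act before the kernel does.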
On Compose hosts, a lightweight pattern is cAdvisor + Prometheus + Grafana as a monitoring stack service in the same compose file, with recording rules for CPU throttling (container_cpu_cfs_throttled_seconds_total)—high throttle rate means your --cpus ceiling is too low.
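The throttling counter only becomes meaningful as a rate. The sketch below derives throttled seconds per second from two samples of container_cpu_cfs_throttled_seconds_total; the function name is hypothetical, and the counter-reset handling is a simplification of what a Prometheus rate() does.

```python
from typing import Tuple


def throttled_seconds_per_second(earlier: Tuple[float, float],
                                 later: Tuple[float, float]) -> float:
    """earlier / later: (timestamp_s, counter_value) samples of the counter."""
    (t0, v0), (t1, v1) = earlier, later
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    if v1 < v0:
        # Counter went backwards: the container restarted, so count from zero.
        v0 = 0.0
    return (v1 - v0) / (t1 - t0)
```

A result of 0.5 means the container spent half of each wall-clock second unable to run because its quota was exhausted. What counts as too high depends on the workload, so treat any alert threshold as something to tune per service.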
Docker on RHEL / OpenShift
Red Hat Enterprise Linux 8+ and OpenShift do not ship Docker Engine as the supported container runtime. Podman (daemonless, rootless-first), Buildah (image builds), and CRI-O (Kubernetes CRI) form the supported stack—while your OCI images and Dockerfiles remain portable.
Tooling map
| Tool | Replaces | Role |
|---|---|---|
| Podman | docker run, docker ps | Daemonless container runtime; fork/exec per container |
| podman-docker | Docker CLI | Symlink package providing docker → podman compatibility |
| Buildah | docker build | OCI image build without a daemon; used under the hood by Podman |
| Skopeo | docker pull/push (inspect) | Copy images between registries without running a daemon |
| CRI-O | containerd + dockershim on K8s nodes | Lightweight CRI implementation for Kubernetes/OpenShift nodes; delegates to an OCI runtime such as runc or crun |
Podman vs Docker Engine
| Aspect | Docker Engine | Podman (RHEL default) |
|---|---|---|
| Daemon | Central rootful dockerd | None—containers are child processes of your user session |
| Rootless | Opt-in | Default and recommended |
| socket exposure | /var/run/docker.sock = root equivalent | No central socket attack surface |
| Compose | docker compose | podman compose or podman-compose |
| systemd integration | Manual unit files | podman generate systemd for user units (deprecated in recent Podman releases in favor of Quadlet) |
# RHEL: install compatibility shim
sudo dnf install podman podman-docker buildah skopeo
# Familiar workflow — docker is podman
docker run -d --name api -p 8080:8080 registry.example.com/api:1.4.2
docker ps
# Rootless podman (default for normal users)
podman run -d quay.io/podman/hello
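# Generate systemd user service for a container
mkdir -p ~/.config/systemd/user
podman generate systemd --new --name api > ~/.config/systemd/user/api.service
# Make systemd pick up the new unit before enabling it
systemctl --user daemon-reload
systemctl --user enable --now api.service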
OpenShift specifics
- CRI-O runs workloads on worker nodes; developers typically never SSH to nodes
- ImageStreams — OpenShift-native image abstraction: tracks tags, triggers redeploys on new push, mirrors external registries
- Restricted SCCs — Security Context Constraints replace ad-hoc --cap-add; containers run as arbitrary non-root UIDs
- BuildConfigs — cluster-native builds (Dockerfile, S2I) producing ImageStream tags—similar goals to CI pipelines elsewhere
- Routes — HAProxy-based ingress with TLS edge termination; analogous to Ingress + cert-manager on vanilla K8s
| OpenShift concept | Rough Docker / K8s equivalent |
|---|---|
| ImageStream | Internal registry + tag tracking + webhook trigger |
| ImageStreamTag | myapp:1.4.2 pointing at a digest |
| DeploymentConfig | Deployment + rollout trigger on IS change (legacy; prefer Deployment) |
| Route | Ingress + LoadBalancer hostname |
| SCC | PodSecurity + capabilities policy |
On RHEL/OpenShift, rootless Podman maps container root to an unprivileged host UID range (/etc/subuid). Bind mounts must respect mapped ownership—files owned by host root may be unreadable inside the container unless UID namespaces align.
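The mapping is simple enough to compute by hand when debugging a permission error. The sketch below assumes the default rootless layout: container UID 0 maps to the invoking user's own UID, and container UIDs from 1 upward map onto the range listed for that user in /etc/subuid. The function names and the example user are hypothetical.

```python
from typing import Tuple


def parse_subuid_line(line: str) -> Tuple[str, int, int]:
    """Split an /etc/subuid entry 'user:start:count' into its parts."""
    user, start, count = line.strip().split(":")
    return user, int(start), int(count)


def host_uid(container_uid: int, user_uid: int,
             subuid_start: int, subuid_count: int) -> int:
    """Host UID that a rootless container UID resolves to."""
    if container_uid == 0:
        # Container root is the unprivileged user who started Podman.
        return user_uid
    if 1 <= container_uid <= subuid_count:
        return subuid_start + container_uid - 1
    raise ValueError("container UID is outside the mapped range")
```

With a user at UID 1000 and a subuid entry of alice:100000:65536, a process running as UID 1000 inside the container appears on the host as UID 100999. A bind-mounted directory owned by host UID 1000 is therefore owned by root, not by UID 1000, from the container's point of view.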
Portable images, environment-specific runtime. Keep Dockerfiles CI-standard; let RHEL use Podman/Buildah and OpenShift use CRI-O + ImageStreams. Fighting for dockerd on RHEL means unsupported packages and SELinux friction, whereas Podman is first-class in RHEL documentation and support contracts.