Docker in CI/CD
CI is where images are born—every merge should produce one immutable, scanned, signed artifact that staging and production promote, never rebuild. This guide covers GitHub Actions and GitLab CI pipelines, layer cache strategies for cold runners, Dockerfile linting with hadolint, and the promotion pattern that keeps prod deploys trustworthy.
Docker in GitHub Actions
GitHub-hosted runners ship with Docker pre-installed, but production pipelines use docker/setup-buildx-action for BuildKit features: parallel stages, cache mounts, multi-platform builds, and registry-backed layer cache. The canonical flow is lint → build → scan → push → sign—fail fast before an image ever reaches a registry.
Core actions in order
| Action | Purpose | Why it matters |
|---|---|---|
| docker/setup-buildx-action | Creates a BuildKit builder instance | Enables GHA/registry cache, multi-arch, and advanced Dockerfile features |
| docker/login-action | Authenticates to GHCR, ECR, Docker Hub, etc. | Required before push: true; use OIDC for cloud registries |
| docker/metadata-action | Generates tags and OCI labels from git context | Consistent sha- tags, semver, branch pointers |
| docker/build-push-action | Builds and optionally pushes in one step | Outputs image digest for scan/sign steps downstream |
Registry login
For GHCR, the built-in GITHUB_TOKEN works when the workflow has packages: write. For ECR or GAR, prefer OIDC federation—no long-lived access keys in repository secrets.
permissions:
  contents: read
  packages: write
  id-token: write # required for cosign keyless / OIDC

- name: Log in to GHCR
  uses: docker/login-action@v3
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}
Metadata and tagging
The metadata action turns git events into tag strings. Pin production deploys to type=sha tags; use type=raw,value=main only as a moving pointer for dev environments.
- name: Docker meta
  id: meta
  uses: docker/metadata-action@v5
  with:
    images: ghcr.io/${{ github.repository }}
    tags: |
      type=sha,prefix=sha-,format=short
      type=semver,pattern={{version}}
      type=raw,value=main,enable={{is_default_branch}}
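Downstream jobs often need to reconstruct the sha- tag from the commit hash. The helper below is a minimal sketch; short_sha_tag is a hypothetical name, and it assumes the metadata action's default short length of 7 characters (the full github.sha is 40 characters, so the two do not match as-is).

```shell
#!/bin/sh
# Hypothetical helper: derive the tag produced by type=sha,prefix=sha-,format=short.
# Assumes the default short SHA length of 7 characters.
short_sha_tag() {
  full_sha=$1
  printf 'sha-%s\n' "$(printf '%s' "$full_sha" | cut -c1-7)"
}

# In GitHub Actions, GITHUB_SHA holds the full commit hash:
#   short_sha_tag "$GITHUB_SHA"
```

Where possible, passing the digest output of the build step between jobs avoids this reconstruction entirely.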
Build, cache, and multi-arch
cache-from: type=gha and cache-to: type=gha,mode=max store BuildKit cache in GitHub's cache service—fast for single-repo workflows. For cross-runner or cross-workflow reuse at scale, add registry cache (type=registry,ref=…:buildcache).
Multi-arch builds set platforms: linux/amd64,linux/arm64. Apple Silicon developers and Graviton/ARM nodes in AWS both need arm64 images in the same manifest list.
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3

- name: Build and push
  id: build
  uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    tags: ${{ steps.meta.outputs.tags }}
    labels: ${{ steps.meta.outputs.labels }}
    platforms: linux/amd64,linux/arm64
    cache-from: |
      type=gha
      type=registry,ref=ghcr.io/${{ github.repository }}:buildcache
    cache-to: |
      type=gha,mode=max
      type=registry,ref=ghcr.io/${{ github.repository }}:buildcache,mode=max
Full pipeline: hadolint → build → trivy → push → cosign
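The workflow below is a template to adapt. PRs lint, build, and scan without pushing; merges to main publish a SHA-tagged image, fail on fixable critical or high CVEs, and sign with Sigstore keyless cosign so clusters can enforce signature policy. Because a multi-platform build cannot be loaded into the runner's local image store by default, the main-branch job pushes first and then scans the pushed digest. A failed scan stops the job before signing, so a signature policy keeps images that failed the scan out of clusters.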
name: Container CI

on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:
    branches: [main]

permissions:
  contents: read
  packages: write
  id-token: write
  attestations: write

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint Dockerfile
        uses: hadolint/hadolint-action@v3.1.0
        with:
          dockerfile: Dockerfile
          config: .hadolint.yaml
          failure-threshold: warning
  build-scan-push:
    runs-on: ubuntu-latest
    needs: [lint]
    if: github.event_name == 'push'
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=sha-,format=short
            type=semver,pattern={{version}}
            type=raw,value=main,enable={{is_default_branch}}
      - name: Build and push
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          platforms: linux/amd64,linux/arm64
          cache-from: |
            type=gha
            type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:buildcache
          cache-to: |
            type=gha,mode=max
            type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:buildcache,mode=max
      - name: Scan image with Trivy
        uses: aquasecurity/trivy-action@0.28.0
        with:
          # Scan the pushed digest: the short sha- tag does not match the full github.sha
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ steps.build.outputs.digest }}
          format: table
          exit-code: 1
          severity: CRITICAL,HIGH
          ignore-unfixed: true
      - name: Install cosign
        uses: sigstore/cosign-installer@v3
      - name: Sign image with cosign
        run: |
          cosign sign --yes \
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ steps.build.outputs.digest }}
      - name: Attest build provenance
        uses: actions/attest-build-provenance@v2
        with:
          subject-name: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          subject-digest: ${{ steps.build.outputs.digest }}
          push-to-registry: true
  pr-build:
    runs-on: ubuntu-latest
    needs: [lint]
    if: github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - name: Build (no push)
        uses: docker/build-push-action@v6
        with:
          context: .
          push: false
          # Load the result into the local Docker image store so Trivy can find it
          load: true
          tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:pr-${{ github.event.pull_request.number }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - name: Scan PR image
        uses: aquasecurity/trivy-action@0.28.0
        with:
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:pr-${{ github.event.pull_request.number }}
          exit-code: 1
          severity: CRITICAL,HIGH
Always scan by digest (${{ steps.build.outputs.digest }}) when possible—tags are mutable pointers. Trivy's action accepts image-ref with digest for exact artifact verification.
Cosign keyless signing binds the image to the GitHub OIDC identity of the workflow. Combined with Trivy gate and provenance attestation, you get a supply-chain trail: who built it, from which commit, with what vulnerabilities at publish time.
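On the consuming side, a keyless signature is only meaningful if the verifier pins the expected identity. The sketch below wraps cosign verify with an identity pattern for a GitHub Actions workflow. The my-org/api repository, the allowed refs, and the verify_image name are assumptions for illustration; the issuer URL is the GitHub Actions OIDC issuer.

```shell
#!/bin/sh
# Hypothetical values: replace my-org/api and the allowed refs with your own.
IDENTITY_REGEX='^https://github\.com/my-org/api/\.github/workflows/.+@refs/(heads/main|tags/v.+)$'
OIDC_ISSUER='https://token.actions.githubusercontent.com'

# Verify a keyless signature on an image referenced by digest.
# Exits non-zero when no signature matches the expected identity and issuer.
verify_image() {
  image_ref=$1   # e.g. ghcr.io/my-org/api@sha256:<digest>
  cosign verify \
    --certificate-identity-regexp "$IDENTITY_REGEX" \
    --certificate-oidc-issuer "$OIDC_ISSUER" \
    "$image_ref"
}
```

Anchoring the pattern to one repository and a small set of refs keeps a signature produced by a fork or a feature branch from satisfying the check.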
"How do you speed up Docker builds in CI?" — Mention layer cache (GHA or registry), Dockerfile ordering (deps before source), multi-stage builds, and BuildKit cache mounts for package managers. Cold runners without cache rebuild every layer from scratch.
Docker in GitLab CI
GitLab CI offers three common Docker build strategies: Docker-in-Docker (dind), socket binding to the host daemon, and Kaniko (daemonless). Each trades security, speed, and compatibility differently—the platform team's choice affects every team's pipeline.
Docker-in-Docker (dind)
The job runs inside a docker:24-cli image and talks to a docker:24-dind service container. The dind container runs a nested Docker daemon that is isolated from the host's daemon, but it typically needs privileged mode enabled on the runner.
build-image:
  image: docker:24-cli
  services:
    - docker:24-dind
  variables:
    DOCKER_TLS_CERTDIR: '/certs'
    DOCKER_HOST: tcp://docker:2376
    DOCKER_TLS_VERIFY: '1'
    DOCKER_CERT_PATH: '$DOCKER_TLS_CERTDIR/client'
    DOCKER_DRIVER: overlay2
  before_script:
    # Pass the password on stdin so it does not appear in the process list
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
Privileged dind on shared runners is a security concern—a malicious Dockerfile or build script can escape the nested daemon in some configurations. Prefer Kaniko or dedicated isolated runners for untrusted code (fork MRs).
Socket binding risks
Mounting /var/run/docker.sock into the job container gives direct access to the host Docker daemon. Any command in the job can start privileged containers, mount host paths, or read secrets from other containers on the same host. This is fast and simple—but equivalent to root on the runner.
| Approach | Pros | Cons |
|---|---|---|
| dind service | Full Docker CLI; BuildKit; familiar | Privileged runner; slower startup; TLS setup |
| Socket bind | Fast; no nested daemon | Host root equivalent; multi-tenant unsafe |
| Kaniko | Daemonless; no privileged flag | No docker run in same job; fewer BuildKit features |
Never mount docker.sock on shared GitLab runners. If you must use socket binding, restrict to dedicated single-tenant runners with MR pipelines from forks disabled or isolated review environments.
Kaniko (daemonless builds)
Kaniko executes each Dockerfile instruction in userspace, pushing layers directly to a registry—no Docker daemon required. Ideal for Kubernetes executors and security-conscious platforms.
build-kaniko:
  image:
    # The -debug variant includes a shell, which GitLab Runner needs to run the script
    name: gcr.io/kaniko-project/executor:v1.23.2-debug
    entrypoint: ['']
  script:
    - /kaniko/executor
      --context $CI_PROJECT_DIR
      --dockerfile $CI_PROJECT_DIR/Dockerfile
      --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
      --cache=true
      --cache-repo=$CI_REGISTRY_IMAGE/cache
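Kaniko has no docker login, so depending on your kaniko and GitLab versions the executor may need registry credentials supplied as a Docker-style config file before it can push. The function below is a minimal sketch that could run at the start of the job script; write_kaniko_auth is a hypothetical name, and /kaniko/.docker is assumed to be the directory the executor reads its config from.

```shell
#!/bin/sh
# Write a Docker-style config.json so the kaniko executor can push to the registry.
# Relies on GitLab's predefined CI_REGISTRY, CI_REGISTRY_USER and CI_REGISTRY_PASSWORD.
write_kaniko_auth() {
  config_dir=${1:-/kaniko/.docker}
  # Basic-auth token: base64 of "user:password" with line wrapping removed
  auth=$(printf '%s:%s' "$CI_REGISTRY_USER" "$CI_REGISTRY_PASSWORD" | base64 | tr -d '\n')
  mkdir -p "$config_dir"
  printf '{"auths":{"%s":{"auth":"%s"}}}\n' "$CI_REGISTRY" "$auth" > "$config_dir/config.json"
}

# In the job script, before invoking /kaniko/executor:
#   write_kaniko_auth
```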
GitLab Container Registry
Every GitLab project gets a built-in OCI registry at registry.gitlab.com/<group>/<project>. CI jobs receive CI_REGISTRY_USER, CI_REGISTRY_PASSWORD, and CI_REGISTRY_IMAGE automatically—no extra secret wiring for basic push/pull.
| Variable | Value | Use |
|---|---|---|
| CI_REGISTRY | registry.gitlab.com | Login target |
| CI_REGISTRY_IMAGE | Full image path for project | Tag and push destination |
| CI_COMMIT_SHA | Git commit hash | Immutable image tag (promotion source) |
GitHub Actions + buildx vs GitLab + Kaniko: GHA's ecosystem (metadata, GHA cache, attestations) is richer for multi-repo orgs. GitLab's integrated registry and Kaniko default suit self-hosted, security-first platforms. Both should produce the same artifact contract: SHA-tagged, scanned, promoted—not rebuilt per environment.
Layer cache in CI
CI runners are ephemeral—every job starts with an empty local layer store unless you explicitly import cache. Without cache, a five-minute local build becomes a twenty-minute pipeline. The fix is choosing the right cache backend and ordering Dockerfile instructions so the slow layers change least often.
Cache backends compared
| Backend | Mechanism | Best for | Limitation |
|---|---|---|---|
| GHA cache | type=gha — GitHub Actions cache API | Single-repo GHA workflows; simple setup | 10 GB limit per repo; evicted after 7 days inactive |
| Registry cache | type=registry,ref=…:buildcache | Cross-runner, cross-workflow, self-hosted runners | Requires registry write; cache image grows over time |
| Inline cache | Cache metadata embedded in pushed image manifest | Reuse cache from last production image tag | Requires pushing with cache-to: type=inline; supports min mode only, so intermediate stages are not cached |
| Local (none) | Runner's Docker layer store only | Self-hosted runners with persistent disks | Lost on every GitHub-hosted runner spin-up |
Registry cache
Push a dedicated :buildcache tag (or separate cache repo) alongside your app image. Subsequent builds pull cache metadata before compiling layers—runners share cache even when GHA cache is cold or evicted.
cache-from: type=registry,ref=ghcr.io/my-org/api:buildcache
cache-to: type=registry,ref=ghcr.io/my-org/api:buildcache,mode=max
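Outside GitHub Actions (for example in a GitLab job), the same cache settings map onto docker buildx build flags. The function below is a sketch under assumptions: build_with_registry_cache and the DRY_RUN switch are hypothetical, and the active builder must support registry cache export (for instance one created with docker buildx create --use).

```shell
#!/bin/sh
# Print (DRY_RUN=1) or run a buildx build that imports and exports registry cache.
build_with_registry_cache() {
  image=$1      # e.g. ghcr.io/my-org/api
  tag=$2        # e.g. sha-7f3a2b1
  cache_ref="$image:buildcache"
  # Reuse the positional parameters as the command line to run
  set -- docker buildx build \
    --cache-from "type=registry,ref=$cache_ref" \
    --cache-to "type=registry,ref=$cache_ref,mode=max" \
    --tag "$image:$tag" \
    --push .
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Setting DRY_RUN=1 prints the command instead of running it, which is handy for checking the flags in a pipeline log before enabling the real build.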
Inline cache
Inline cache stores BuildKit cache metadata in the image you already push. The next build references that image as cache source—useful when you cannot maintain a separate cache tag.
cache-from: type=registry,ref=ghcr.io/my-org/api:main
cache-to: type=inline
GHA cache
Zero extra infrastructure—add type=gha to build-push-action. Use mode=max to cache all intermediate layers, not just final stage. Combine with registry cache for resilience when GHA cache misses.
Dockerfile ordering for cache hits
Docker invalidates cache from the first changed instruction downward. Order instructions from least frequent change to most frequent change:
1. FROM (pin digest for reproducibility)
2. RUN apt/apk install system deps ← changes rarely
3. COPY package.json / lock files ← changes when deps change
4. RUN npm ci / pip install ← cache hit if lock unchanged
5. COPY source code ← changes every commit
6. RUN build / compile ← invalidated with source
7. COPY --from=builder (multi-stage) ← final slim image
Put COPY . . as late as possible. A single early COPY . . before RUN npm ci busts the dependency layer on every source edit—one of the most common CI slowdowns.
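BuildKit cache is not the same as layer reuse in the classic builder. Cache entries are keyed by the LLB (low-level build) graph, and the layer results can be imported across machines when exported to GHA or registry backends. The contents of RUN --mount=type=cache mounts are not part of that export: they persist only on the builder that created them, so on ephemeral runners they start empty unless you preserve them separately.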
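The ordering can be sketched as a concrete Dockerfile, written here through a shell heredoc so it can be dropped into a scratch directory. The Node.js base image, the package-lock.json file, the npm run build script, and the dist/ output path are all assumptions for illustration, and the base image is tagged rather than digest-pinned to avoid inventing a digest.

```shell
#!/bin/sh
# Write an example Dockerfile that orders instructions from least to most
# frequently changed. Image name and paths are illustrative assumptions.
cat > Dockerfile.example <<'EOF'
# syntax=docker/dockerfile:1
FROM node:20-bookworm-slim AS builder
WORKDIR /app
# Dependency manifests first: this layer is reused until the lock file changes
COPY package.json package-lock.json ./
# The cache mount keeps npm's download cache on the builder across builds
RUN --mount=type=cache,target=/root/.npm npm ci
# Source last: edits here invalidate only the instructions below
COPY . .
RUN npm run build

FROM node:20-bookworm-slim
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
USER node
CMD ["node", "dist/index.js"]
EOF
```

With this layout a source-only commit reuses the npm ci layer from cache and reruns only the final COPY and build steps.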
Hadolint
Hadolint is a Dockerfile linter that encodes Docker and ShellCheck best practices as rule codes. Run it in CI before docker build—a bad Dockerfile wastes runner minutes and ships anti-patterns (unpinned packages, root user, curl|bash) into production images.
Key rules: DL3008 and DL3025
| Rule | Message | Why it matters | Fix |
|---|---|---|---|
| DL3008 | Pin versions in apt-get install | Unpinned apt install pulls whatever is latest—non-reproducible builds, surprise breakages | apt-get install -y curl=7.88.1-10+deb12u5 or use version pinning in apt preferences |
| DL3025 | Use arguments JSON notation for CMD and ENTRYPOINT | Shell form (CMD npm start) wraps in /bin/sh -c—signals don't reach the app, PID 1 issues | CMD ["npm", "start"] exec form |
Other high-value rules
- DL3006: Always tag the FROM image explicitly (no implicit latest)
- DL3009: Delete apt cache after install to reduce layer size
- DL3018: Pin versions in apk add
- DL3020: Use COPY instead of ADD for plain files
- DL3002: Last USER should not be root
- SC2015: ShellCheck: A && B || C is not if-then-else; C may run even when A succeeds
.hadolint.yaml configuration
Place at repo root. Ignore rules that conflict with your base image strategy, but document why; silent ignores accumulate into tech debt.
failure-threshold: warning
ignored:
  - DL3018 # alpine: we pin in a separate lock step
override:
  error:
    - DL3008
    - DL3025
    - DL3006
trustedRegistries:
  - docker.io
  - ghcr.io
  - gcr.io
CI integration
Run hadolint as the first job—fail the pipeline before any build consumes cache or registry quota.
# Local
docker run --rm -i hadolint/hadolint < Dockerfile

# GitHub Actions
- uses: hadolint/hadolint-action@v3.1.0
  with:
    dockerfile: Dockerfile
    config: .hadolint.yaml

# GitLab CI
hadolint:
  image: hadolint/hadolint:latest-debian
  script:
    - hadolint --config .hadolint.yaml Dockerfile
Teams often set failure-threshold: error during adoption, so only error-level findings break the build, then tighten to warning once existing Dockerfiles are cleaned up. Override DL3008/DL3025 to error immediately: they directly affect reproducibility and signal handling in production.
Image promotion pattern
The cardinal rule of container CD: build once, promote many. CI produces a single image tagged with the git SHA. Staging and production retag that exact digest; they never run docker build again. What ran in staging is byte-identical to prod.
Why rebuild in prod is an anti-pattern
| Rebuild per env | Build once + promote |
|---|---|
| Different layer hashes per environment | Same digest everywhere |
| Staging test ≠ production artifact | Staging validates the exact prod binary |
| Cannot trace prod image to CI run | SHA tag maps 1:1 to git commit + CI job |
| Re-scan/re-sign on every env deploy | Scan and sign once at publish |
Build once with SHA tag
On merge to main, CI pushes ghcr.io/my-org/api:sha-7f3a2b1 (and optionally records the digest). This is the golden artifact—all downstream deploys reference it.
git push main
│
▼
┌─────────────┐ tag: sha-7f3a2b1
│ CI build │ ──► ghcr.io/my-org/api:sha-7f3a2b1 (digest abc123…)
└─────────────┘
│
│ promote (retag same digest, no rebuild)
▼
┌─────────────┐ tag: staging-7f3a2b1 or staging (pointer)
│ Staging │ ──► deploy sha-7f3a2b1
└─────────────┘
│ smoke / integration tests pass
▼
┌─────────────┐ tag: prod-7f3a2b1 or v1.4.2
│ Production │ ──► deploy sha-7f3a2b1 (SAME digest abc123…)
└─────────────┘
Retag staging → prod
Promotion is a registry metadata operation—copy manifest from SHA tag to environment tag. No compiler runs; no apt-get update executes; no supply-chain surface is reintroduced.
# Promote by retagging (same digest, new pointer)
SOURCE=ghcr.io/my-org/api:sha-7f3a2b1
TARGET=ghcr.io/my-org/api:prod-7f3a2b1
docker buildx imagetools create -t $TARGET $SOURCE
# Or with crane (no local docker daemon)
crane copy ghcr.io/my-org/api:sha-7f3a2b1 ghcr.io/my-org/api:v1.4.2
# Deploy job only updates the image reference — never builds
# kubectl set image deploy/api api=ghcr.io/my-org/api:sha-7f3a2b1
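After a retag it is cheap to confirm that both tags resolve to the same manifest. The sketch below compares digests with crane digest; assert_same_digest is a hypothetical helper name, and it assumes crane is installed and already authenticated to the registry.

```shell
#!/bin/sh
# Fail unless two image references resolve to the same manifest digest.
assert_same_digest() {
  source_digest=$(crane digest "$1") || return 1
  target_digest=$(crane digest "$2") || return 1
  if [ "$source_digest" != "$target_digest" ]; then
    echo "digest mismatch: $1=$source_digest $2=$target_digest" >&2
    return 1
  fi
  echo "ok: $source_digest"
}

# Example, using the tags from the promotion commands:
#   assert_same_digest ghcr.io/my-org/api:sha-7f3a2b1 ghcr.io/my-org/api:prod-7f3a2b1
```

Running this as the last step of a promotion job turns "same digest everywhere" from a convention into a checked invariant.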
GitHub Actions promotion job
promote-prod:
  runs-on: ubuntu-latest
  needs: [deploy-staging]
  environment: production
  steps:
    - uses: docker/login-action@v3
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    - name: Promote SHA image to prod tag
      run: |
        # CI tagged the image with the 7-character short SHA (format=short), not the full github.sha
        SHORT_SHA="${GITHUB_SHA::7}"
        IMAGE="ghcr.io/${{ github.repository }}"
        # To add a release tag such as v1.4.2, pass another -t when promoting from a version tag
        docker buildx imagetools create \
          -t "${IMAGE}:prod-${SHORT_SHA}" \
          "${IMAGE}:sha-${SHORT_SHA}"
Using mutable tags like :latest or :prod as the only deploy reference makes rollbacks ambiguous—you cannot know which SHA is running. Use mutable tags as convenience pointers, but pin deploy manifests to SHA or digest.
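One way to pin a deploy manifest is to rewrite a tag reference into a digest reference once the digest is known (for example from the build step output). The function below is a small sketch; pin_to_digest is a hypothetical name and the digests shown are placeholders.

```shell
#!/bin/sh
# Turn repo[:tag] plus a digest into an immutable repo@digest reference.
pin_to_digest() {
  ref=$1
  digest=$2               # e.g. sha256:<hex>
  repo=${ref%@*}          # drop any digest already present
  last=${repo##*/}        # final path component, the only place a tag can appear
  case $last in
    *:*) repo=${repo%:*} ;;   # drop the tag, leaving a registry port untouched
  esac
  printf '%s@%s\n' "$repo" "$digest"
}

# Example:
#   pin_to_digest ghcr.io/my-org/api:sha-7f3a2b1 "sha256:<digest from CI>"
```

Checking only the last path component for a colon means a registry host with a port, such as localhost:5000/api, is not mistaken for a tagged reference.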
Mature teams gate promotion on staging health checks plus manual approval (GitHub environment protection rules). The promote job is a single imagetools create call: no Dockerfile, no build-args, no "prod-only" config baked at build time. Environment-specific config belongs in runtime (env vars, ConfigMaps, secrets), not in a second image build.