Docker in CI/CD

CI is where images are born—every merge should produce one immutable, scanned, signed artifact that staging and production promote, never rebuild. This guide covers GitHub Actions and GitLab CI pipelines, layer cache strategies for cold runners, Dockerfile linting with hadolint, and the promotion pattern that keeps prod deploys trustworthy.

Docker in GitHub Actions

GitHub-hosted runners ship with Docker pre-installed, but production pipelines use docker/setup-buildx-action for BuildKit features: parallel stages, cache mounts, multi-platform builds, and registry-backed layer cache. The canonical flow is lint → build → scan → push → sign—fail fast before an image ever reaches a registry.

Core actions in order

| Action | Purpose | Why it matters |
| --- | --- | --- |
| docker/setup-buildx-action | Creates a BuildKit builder instance | Enables GHA/registry cache, multi-arch, and advanced Dockerfile features |
| docker/login-action | Authenticates to GHCR, ECR, Docker Hub, etc. | Required before push: true; use OIDC for cloud registries |
| docker/metadata-action | Generates tags and OCI labels from git context | Consistent sha- tags, semver, branch pointers |
| docker/build-push-action | Builds and optionally pushes in one step | Outputs image digest for scan/sign steps downstream |

Registry login

For GHCR, the built-in GITHUB_TOKEN works when the workflow has packages: write. For ECR or GAR, prefer OIDC federation—no long-lived access keys in repository secrets.

yaml
permissions:
  contents: read
  packages: write
  id-token: write   # required for cosign keyless / OIDC

- name: Log in to GHCR
  uses: docker/login-action@v3
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}

Metadata and tagging

The metadata action turns git events into tag strings. Pin production deploys to type=sha tags; use type=raw,value=main only as a moving pointer for dev environments.

yaml
- name: Docker meta
  id: meta
  uses: docker/metadata-action@v5
  with:
    images: ghcr.io/${{ github.repository }}
    tags: |
      type=sha,prefix=sha-,format=short
      type=semver,pattern={{version}}
      type=raw,value=main,enable={{is_default_branch}}
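
To make the type=sha,prefix=sha-,format=short rule concrete, here is a minimal shell sketch of the tag it should yield for a commit. The function name sha_tag and the sample SHA are hypothetical, and the 7-character length assumes the action's default short-SHA length.

```shell
# Hypothetical helper: mirrors the tag that type=sha,prefix=sha-,format=short
# is expected to produce, assuming the default 7-character short SHA.
sha_tag() {
  # %.7s truncates the argument to its first 7 characters (POSIX printf).
  printf 'sha-%.7s\n' "$1"
}

# In a GitHub Actions run step, GITHUB_SHA holds the full 40-character commit SHA.
# The fallback value is a made-up SHA so the sketch also runs outside CI.
sha_tag "${GITHUB_SHA:-7f3a2b1c0d9e8f7a6b5c4d3e2f1a0b9c8d7e6f5a}"
```

Any later step that rebuilds the tag by hand has to truncate the same way: the full 40-character github.sha does not match a tag created with format=short.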

Build, cache, and multi-arch

cache-from: type=gha and cache-to: type=gha,mode=max store BuildKit cache in GitHub's cache service—fast for single-repo workflows. For cross-runner or cross-workflow reuse at scale, add registry cache (type=registry,ref=…:buildcache).

Multi-arch builds set platforms: linux/amd64,linux/arm64. Apple Silicon developers and Graviton/ARM nodes in AWS both need arm64 images in the same manifest list.

yaml
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3

- name: Build and push
  id: build
  uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    tags: ${{ steps.meta.outputs.tags }}
    labels: ${{ steps.meta.outputs.labels }}
    platforms: linux/amd64,linux/arm64
    cache-from: |
      type=gha
      type=registry,ref=ghcr.io/${{ github.repository }}:buildcache
    cache-to: |
      type=gha,mode=max
      type=registry,ref=ghcr.io/${{ github.repository }}:buildcache,mode=max

Full pipeline: hadolint → build → trivy → push → cosign

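The workflow below is a production-oriented template. PRs lint, build, and scan without pushing. Merges to main push a SHA-tagged image, scan it with Trivy (failing on fixable critical or high CVEs), and sign it with Sigstore keyless cosign so clusters can enforce signature policy. Because the main-branch job scans after the push, an image that fails the gate still exists in the registry, but it is never signed or attested.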

yaml
name: Container CI

on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:
    branches: [main]

permissions:
  contents: read
  packages: write
  id-token: write
  attestations: write

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint Dockerfile
        uses: hadolint/hadolint-action@v3.1.0
        with:
          dockerfile: Dockerfile
          config: .hadolint.yaml
          failure-threshold: warning

  build-scan-push:
    runs-on: ubuntu-latest
    needs: [lint]
    if: github.event_name == 'push'
    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=sha-,format=short
            type=semver,pattern={{version}}
            type=raw,value=main,enable={{is_default_branch}}

      - name: Build and push
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          platforms: linux/amd64,linux/arm64
          cache-from: |
            type=gha
            type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:buildcache
          cache-to: |
            type=gha,mode=max
            type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:buildcache,mode=max

      - name: Scan image with Trivy
        uses: aquasecurity/trivy-action@0.28.0
        with:
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ steps.build.outputs.digest }}
          format: table
          exit-code: 1
          severity: CRITICAL,HIGH
          ignore-unfixed: true

      - name: Install cosign
        uses: sigstore/cosign-installer@v3
      - name: Cosign sign
        run: |
          cosign sign --yes \
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ steps.build.outputs.digest }}

      - name: Attest build provenance
        uses: actions/attest-build-provenance@v2
        with:
          subject-name: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          subject-digest: ${{ steps.build.outputs.digest }}
          push-to-registry: true

  pr-build:
    runs-on: ubuntu-latest
    needs: [lint]
    if: github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - name: Build (no push)
        uses: docker/build-push-action@v6
        with:
          context: .
          push: false
          load: true   # export to the runner's Docker image store so Trivy can find the tag
          tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:pr-${{ github.event.pull_request.number }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - name: Scan PR image
        uses: aquasecurity/trivy-action@0.28.0
        with:
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:pr-${{ github.event.pull_request.number }}
          exit-code: 1
          severity: CRITICAL,HIGH
💡 Pro Tip

Always scan by digest (${{ steps.build.outputs.digest }}) when possible—tags are mutable pointers. Trivy's action accepts image-ref with digest for exact artifact verification.
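
As a sketch of the same idea outside the action, the helper below builds a digest-pinned reference and refuses anything that is not a sha256 digest. The function name digest_ref and the image name are hypothetical. The trivy flags in the usage comment mirror the action inputs used in the workflow.

```shell
# Hypothetical helper: build an immutable image reference from a repository and a digest.
# Rejects anything that is not "sha256:" followed by 64 lowercase hex characters.
digest_ref() {
  repo="$1"
  digest="$2"
  hex="${digest#sha256:}"
  # If stripping the prefix changed nothing, the prefix was missing.
  if [ "$hex" = "$digest" ] || [ "${#hex}" -ne 64 ]; then
    echo "not a sha256 digest: $digest" >&2
    return 1
  fi
  case "$hex" in
    *[!0-9a-f]*) echo "not a sha256 digest: $digest" >&2; return 1 ;;
  esac
  printf '%s@%s\n' "$repo" "$digest"
}

# Usage (DIGEST would come from the build step's digest output):
#   trivy image --exit-code 1 --severity CRITICAL,HIGH --ignore-unfixed \
#     "$(digest_ref ghcr.io/my-org/api "$DIGEST")"
```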

🔒 Security

Cosign keyless signing binds the image to the GitHub OIDC identity of the workflow. Combined with Trivy gate and provenance attestation, you get a supply-chain trail: who built it, from which commit, with what vulnerabilities at publish time.
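
On the consuming side, the signature is only useful if something checks it. A minimal sketch of that check, assuming cosign v2 is installed and the image was signed by a workflow in the named GitHub repository. The function name verify_signature and the repository are hypothetical.

```shell
# Hypothetical wrapper around cosign verify for keyless signatures
# produced by GitHub Actions workflows.
verify_signature() {
  image_ref="$1"   # image pinned by digest, e.g. ghcr.io/my-org/api@sha256:...
  repo="$2"        # GitHub owner/name whose workflows are trusted signers
  cosign verify \
    --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
    --certificate-identity-regexp "^https://github.com/${repo}/" \
    "$image_ref"
}

# Usage:
#   verify_signature "ghcr.io/my-org/api@$DIGEST" "my-org/api"
```

The identity regexp is deliberately scoped to one repository. A looser pattern would accept signatures from any workflow on GitHub, which defeats the purpose of the check.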

🎯 Interview Tip

"How do you speed up Docker builds in CI?" — Mention layer cache (GHA or registry), Dockerfile ordering (deps before source), multi-stage builds, and BuildKit cache mounts for package managers. Cold runners without cache rebuild every layer from scratch.

Docker in GitLab CI

GitLab CI offers three common Docker build strategies: Docker-in-Docker (dind), socket binding to the host daemon, and Kaniko (daemonless). Each trades security, speed, and compatibility differently—the platform team's choice affects every team's pipeline.

Docker-in-Docker (dind)

The job runs inside a docker:24-cli image and talks to a sibling docker:24-dind service. The dind container runs a nested Docker daemon that is isolated from the host, but the job typically needs privileged: true on the runner.

yaml
build-image:
  image: docker:24-cli
  services:
    - docker:24-dind
  variables:
    DOCKER_TLS_CERTDIR: '/certs'
    DOCKER_HOST: tcp://docker:2376
    DOCKER_TLS_VERIFY: '1'   # port 2376 is the TLS endpoint; the client must be told to use TLS
    DOCKER_CERT_PATH: '$DOCKER_TLS_CERTDIR/client'
    DOCKER_DRIVER: overlay2
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
⚠️ Pitfall

Privileged dind on shared runners is a security concern—a malicious Dockerfile or build script can escape the nested daemon in some configurations. Prefer Kaniko or dedicated isolated runners for untrusted code (fork MRs).

Socket binding risks

Mounting /var/run/docker.sock into the job container gives direct access to the host Docker daemon. Any command in the job can start privileged containers, mount host paths, or read secrets from other containers on the same host. This is fast and simple—but equivalent to root on the runner.

| Approach | Pros | Cons |
| --- | --- | --- |
| dind service | Full Docker CLI; BuildKit; familiar | Privileged runner; slower startup; TLS setup |
| Socket bind | Fast; no nested daemon | Host root equivalent; multi-tenant unsafe |
| Kaniko | Daemonless; no privileged flag | No docker run in same job; fewer BuildKit features |
🔒 Security

Never mount docker.sock on shared GitLab runners. If you must use socket binding, restrict to dedicated single-tenant runners with MR pipelines from forks disabled or isolated review environments.
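
One way to catch an accidental socket mount is a guard at the top of the job script. This is a sketch only: the function name docker_socket_exposed is hypothetical, and it assumes the socket would appear at the conventional path inside the job container.

```shell
# Hypothetical guard for jobs that are supposed to be daemonless:
# returns 0 when a Unix socket exists at the given path
# (default: /var/run/docker.sock), 1 otherwise.
docker_socket_exposed() {
  [ -S "${1:-/var/run/docker.sock}" ]
}

if docker_socket_exposed; then
  echo "WARNING: host Docker socket is mounted into this job" >&2
fi
```

In a before_script, replacing the warning with exit 1 turns the check into a hard failure on runners that should never expose the daemon.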

Kaniko (daemonless builds)

Kaniko executes each Dockerfile instruction in userspace, pushing layers directly to a registry—no Docker daemon required. Ideal for Kubernetes executors and security-conscious platforms.

yaml
build-kaniko:
  image:
    name: gcr.io/kaniko-project/executor:v1.23.2-debug   # debug variant ships the shell GitLab CI needs
    entrypoint: ['']
  script:
    - /kaniko/executor
        --context $CI_PROJECT_DIR
        --dockerfile $CI_PROJECT_DIR/Dockerfile
        --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
        --cache=true
        --cache-repo=$CI_REGISTRY_IMAGE/cache

GitLab Container Registry

Every GitLab project gets a built-in OCI registry at registry.gitlab.com/<group>/<project>. CI jobs receive CI_REGISTRY_USER, CI_REGISTRY_PASSWORD, and CI_REGISTRY_IMAGE automatically—no extra secret wiring for basic push/pull.

| Variable | Value | Use |
| --- | --- | --- |
| CI_REGISTRY | registry.gitlab.com | Login target |
| CI_REGISTRY_IMAGE | Full image path for project | Tag and push destination |
| CI_COMMIT_SHA | Git commit hash | Immutable image tag (promotion source) |
⚖️ Trade-off

GitHub Actions + buildx vs GitLab + Kaniko: GHA's ecosystem (metadata, GHA cache, attestations) is richer for multi-repo orgs. GitLab's integrated registry and Kaniko default suit self-hosted, security-first platforms. Both should produce the same artifact contract: SHA-tagged, scanned, promoted—not rebuilt per environment.

Layer cache in CI

CI runners are ephemeral—every job starts with an empty local layer store unless you explicitly import cache. Without cache, a five-minute local build becomes a twenty-minute pipeline. The fix is choosing the right cache backend and ordering Dockerfile instructions so the slow layers change least often.

Cache backends compared

| Backend | Mechanism | Best for | Limitation |
| --- | --- | --- | --- |
| GHA cache | type=gha (GitHub Actions cache API) | Single-repo GHA workflows; simple setup | 10 GB limit per repo; evicted after 7 days inactive |
| Registry cache | type=registry,ref=…:buildcache | Cross-runner, cross-workflow, self-hosted runners | Requires registry write; cache image grows over time |
| Inline cache | Cache metadata embedded in pushed image manifest | Reuse cache from last production image tag | Only works if you push with cache-to: type=inline; exports min-mode cache only |
| Local (none) | Runner's Docker layer store only | Self-hosted runners with persistent disks | Lost on every GitHub-hosted runner spin-up |

Registry cache

Push a dedicated :buildcache tag (or separate cache repo) alongside your app image. Subsequent builds pull cache metadata before compiling layers—runners share cache even when GHA cache is cold or evicted.

yaml
cache-from: type=registry,ref=ghcr.io/my-org/api:buildcache
cache-to: type=registry,ref=ghcr.io/my-org/api:buildcache,mode=max
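
Outside GitHub Actions (for example on a GitLab or self-hosted runner), the same cache settings map onto docker buildx build flags. A minimal sketch, assuming buildx is available and the job is already logged in to the registry. The function name build_with_registry_cache is hypothetical.

```shell
# Hypothetical wrapper: build and push an image while importing and exporting
# BuildKit cache through a dedicated :buildcache tag in the same repository.
build_with_registry_cache() {
  repo="$1"   # e.g. ghcr.io/my-org/api
  tag="$2"    # e.g. sha-7f3a2b1
  docker buildx build \
    --cache-from "type=registry,ref=${repo}:buildcache" \
    --cache-to "type=registry,ref=${repo}:buildcache,mode=max" \
    --tag "${repo}:${tag}" \
    --push \
    .
}

# Usage:
#   build_with_registry_cache ghcr.io/my-org/api sha-7f3a2b1
```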

Inline cache

Inline cache stores BuildKit cache metadata in the image you already push. The next build references that image as cache source—useful when you cannot maintain a separate cache tag.

yaml
cache-from: type=registry,ref=ghcr.io/my-org/api:main
cache-to: type=inline

GHA cache

Zero extra infrastructure: add type=gha to build-push-action. Use mode=max to cache all intermediate layers, not just the final stage. Combine with registry cache for resilience when the GHA cache misses.

Dockerfile ordering for cache hits

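Docker invalidates cache from the first changed instruction downward, so order instructions from least to most frequently changed: base image and system packages first, then dependency manifests and the dependency install, and application source last.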

💡 Pro Tip

Put COPY . . as late as possible. A single early COPY . . before RUN npm ci busts the dependency layer on every source edit—one of the most common CI slowdowns.

🔬 Under the Hood

BuildKit cache is not the same as layer reuse in classic builder. Cache entries store LLB (low-level build) snapshots—including RUN --mount=type=cache mount contents—that can be imported across machines when exported to GHA or registry backends.
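
The ordering advice and the cache mount can be seen together in one small Dockerfile. The script below writes an example file to a temporary directory. The Node.js base image, file names, npm cache path, and server.js entry point are assumptions for illustration, not taken from a real project.

```shell
# Write an example Dockerfile ordered from least to most frequently changed.
example_dir="$(mktemp -d)"
cat > "$example_dir/Dockerfile" <<'EOF'
# syntax=docker/dockerfile:1
FROM node:20-slim
WORKDIR /app

# 1. Dependency manifests change rarely: copy them alone first.
COPY package.json package-lock.json ./

# 2. Install dependencies; the cache mount keeps the package download cache between builds.
RUN --mount=type=cache,target=/root/.npm npm ci

# 3. Application source changes on almost every commit: copy it last.
COPY . .

CMD ["node", "server.js"]
EOF
echo "wrote $example_dir/Dockerfile"
```

With this layout a source-only commit reuses the layers for steps 1 and 2 and rebuilds only from COPY . . onward.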

Hadolint

Hadolint is a Dockerfile linter that encodes Docker and ShellCheck best practices as rule codes. Run it in CI before docker build—a bad Dockerfile wastes runner minutes and ships anti-patterns (unpinned packages, root user, curl|bash) into production images.

Key rules: DL3008 and DL3025

| Rule | Message | Why it matters | Fix |
| --- | --- | --- | --- |
| DL3008 | Pin versions in apt-get install | Unpinned apt install pulls whatever is latest: non-reproducible builds, surprise breakages | apt-get install -y curl=7.88.1-10+deb12u5 or use version pinning in apt preferences |
| DL3025 | Use arguments JSON notation for CMD and ENTRYPOINT | Shell form (CMD npm start) wraps in /bin/sh -c: signals don't reach the app, PID 1 issues | CMD ["npm", "start"] exec form |
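
To show the distinction DL3025 draws, here is a rough shell approximation that classifies a single CMD or ENTRYPOINT line. It is not hadolint's implementation, and the function name uses_exec_form is hypothetical.

```shell
# Rough approximation of the pattern DL3025 looks at; not hadolint's own logic.
# Returns 0 when a CMD or ENTRYPOINT line uses JSON (exec) form, 1 otherwise.
uses_exec_form() {
  case "$1" in
    "CMD ["*|"ENTRYPOINT ["*) return 0 ;;
    *) return 1 ;;
  esac
}

uses_exec_form 'CMD ["npm", "start"]' && echo "exec form: the process receives signals directly"
uses_exec_form 'CMD npm start'        || echo "shell form: wrapped in /bin/sh -c"
```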

Other high-value rules

  • DL3006 — Always tag the FROM image explicitly (no implicit latest)
  • DL3009 — Delete apt cache after install to reduce layer size
  • DL3018 — Pin versions in apk add
  • DL3020 — Use COPY instead of ADD for plain files
  • DL3002 — Last USER should not be root
  • SC2015 — ShellCheck: avoid A && B || C trap patterns in RUN

.hadolint.yaml configuration

Place at repo root. Ignore rules that conflict with your base image strategy, but document why: silent ignores accumulate into tech debt.

yaml
failure-threshold: warning

ignored:
  - DL3018   # alpine: we pin in a separate lock step

override:
  error:
    - DL3008
    - DL3025
    - DL3006

trustedRegistries:
  - docker.io
  - ghcr.io
  - gcr.io

CI integration

Run hadolint as the first job—fail the pipeline before any build consumes cache or registry quota.

bash
# Local
docker run --rm -i hadolint/hadolint < Dockerfile

# GitHub Actions
- uses: hadolint/hadolint-action@v3.1.0
  with:
    dockerfile: Dockerfile
    config: .hadolint.yaml

# GitLab CI
hadolint:
  image: hadolint/hadolint:latest-debian
  script:
    - hadolint --config .hadolint.yaml Dockerfile
📦 Real World

Teams often start with failure-threshold: error during adoption, so only error-level findings fail the build, then tighten to warning once existing Dockerfiles are cleaned up. Override DL3008/DL3025 to error immediately, because they directly affect reproducibility and signal handling in production.

Image promotion pattern

The cardinal rule of container CD: build once, promote many. CI produces a single image tagged with the git SHA. Staging and production retag that exact digest; they never run docker build again. What ran in staging is byte-identical to prod.

Why rebuild in prod is an anti-pattern

| Rebuild per env | Build once + promote |
| --- | --- |
| Different layer hashes per environment | Same digest everywhere |
| Staging test ≠ production artifact | Staging validates the exact prod binary |
| Cannot trace prod image to CI run | SHA tag maps 1:1 to git commit + CI job |
| Re-scan/re-sign on every env deploy | Scan and sign once at publish |

Build once with SHA tag

On merge to main, CI pushes ghcr.io/my-org/api:sha-7f3a2b1 (and optionally records the digest). This is the golden artifact—all downstream deploys reference it.

Retag staging → prod

Promotion is a registry metadata operation—copy manifest from SHA tag to environment tag. No compiler runs; no apt-get update executes; no supply-chain surface is reintroduced.

bash
# Promote by retagging (same digest, new pointer)
SOURCE=ghcr.io/my-org/api:sha-7f3a2b1
TARGET=ghcr.io/my-org/api:prod-7f3a2b1

docker buildx imagetools create -t "$TARGET" "$SOURCE"

# Or with crane (no local docker daemon)
crane copy ghcr.io/my-org/api:sha-7f3a2b1 ghcr.io/my-org/api:v1.4.2

# Deploy job only updates the image reference — never builds
# kubectl set image deploy/api api=ghcr.io/my-org/api:sha-7f3a2b1
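
After a retag it is cheap to confirm that both references resolve to the same digest. A minimal sketch, assuming crane is installed and authenticated against the registry. The function name same_digest is hypothetical.

```shell
# Hypothetical check: succeed only when two references resolve to the same digest.
same_digest() {
  src_digest="$(crane digest "$1")" || return 1
  dst_digest="$(crane digest "$2")" || return 1
  [ "$src_digest" = "$dst_digest" ]
}

# Usage (references from the example above):
#   same_digest ghcr.io/my-org/api:sha-7f3a2b1 ghcr.io/my-org/api:prod-7f3a2b1 \
#     && echo "promotion kept the digest"
```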

GitHub Actions promotion job

yaml
promote-prod:
  runs-on: ubuntu-latest
  needs: [deploy-staging]
  environment: production
  steps:
    - uses: docker/login-action@v3
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    - name: Promote SHA image to prod tag
      run: |
        # format=short tags use the first 7 characters of the commit SHA
        SHORT_SHA="${GITHUB_SHA:0:7}"
        IMAGE="ghcr.io/${{ github.repository }}"
        # ref_name is already the full tag name (e.g. v1.4.2) on a tag push
        docker buildx imagetools create \
          -t "${IMAGE}:prod-${SHORT_SHA}" \
          -t "${IMAGE}:${{ github.ref_name }}" \
          "${IMAGE}:sha-${SHORT_SHA}"
⚠️ Pitfall

Using mutable tags like :latest or :prod as the only deploy reference makes rollbacks ambiguous—you cannot know which SHA is running. Use mutable tags as convenience pointers, but pin deploy manifests to SHA or digest.
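
A deploy job can enforce this rule before it touches the cluster. The guard below is a sketch: the function name is_pinned_ref is hypothetical, and it assumes SHA tags follow the sha- prefix convention used in this guide.

```shell
# Hypothetical pre-deploy guard: accept only references pinned by digest
# or by a sha-<hex> tag; reject mutable tags such as :latest or :prod.
is_pinned_ref() {
  case "$1" in
    *@sha256:*) return 0 ;;   # digest reference
  esac
  tag="${1##*:}"              # text after the last colon
  case "$tag" in
    sha-*)
      hex="${tag#sha-}"
      case "$hex" in
        ""|*[!0-9a-f]*) return 1 ;;   # empty or non-hex suffix
        *) return 0 ;;
      esac ;;
    *) return 1 ;;
  esac
}

# Usage:
#   is_pinned_ref "$IMAGE_REF" || { echo "refusing mutable tag: $IMAGE_REF" >&2; exit 1; }
```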

📦 Real World

Mature teams gate promotion on staging health checks plus manual approval (GitHub environment protection rules). The promote job is a single imagetools create call: no Dockerfile, no build-args, no "prod-only" config baked at build time. Environment-specific config belongs in runtime (env vars, ConfigMaps, secrets), not in a second image build.