Docker in CI/CD

Docker in GitHub Actions

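GitHub-hosted runners ship with Docker pre-installed, but production pipelines add docker/setup-buildx-action for BuildKit features: parallel stages, cache mounts, multi-platform builds, and registry-backed layer cache. The usual flow is lint → build → scan → sign, with the push gated behind the scan where the image can be scanned locally. Multi-platform images typically cannot be loaded into the runner's local image store, so those are pushed first and the pushed digest is scanned before anything is signed or promoted. Either way, fail fast: an image that fails lint or scan should never be signed or deployed.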

Core actions in order

Action	Purpose	Why it matters
docker/setup-buildx-action	Creates a BuildKit builder instance	Enables GHA/registry cache, multi-arch, and advanced Dockerfile features
docker/login-action	Authenticates to GHCR, ECR, Docker Hub, etc.	Required before push: true; use OIDC for cloud registries
docker/metadata-action	Generates tags and OCI labels from git context	Consistent sha- tags, semver, branch pointers
docker/build-push-action	Builds and optionally pushes in one step	Outputs image digest for scan/sign steps downstream

Registry login

For GHCR, the built-in GITHUB_TOKEN works when the workflow has packages: write. For ECR or GAR, prefer OIDC federation—no long-lived access keys in repository secrets.

permissions:
  contents: read
  packages: write
  id-token: write   # required for cosign keyless / OIDC

- name: Log in to GHCR
  uses: docker/login-action@v3
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}

Metadata and tagging

The metadata action turns git events into tag strings. Pin production deploys to type=sha tags; use type=raw,value=main only as a moving pointer for dev environments.

- name: Docker meta
  id: meta
  uses: docker/metadata-action@v5
  with:
    images: ghcr.io/${{ github.repository }}
    tags: |
      type=sha,prefix=sha-,format=short
      type=semver,pattern={{version}}
      type=raw,value=main,enable={{is_default_branch}}
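
Later jobs (promotion, deploy) often need to reconstruct the tag that type=sha,prefix=sha-,format=short produced. The following is a minimal shell sketch, assuming the short SHA is the first 7 characters of the commit and that the repository path has to be lowercase, as image references require. The function names are illustrative and not part of any action.

```shell
#!/bin/sh
# Hypothetical helpers: rebuild the image reference that the metadata step
# is assumed to have produced (sha- prefix, 7-character short SHA).

# Lowercase an owner/repo string; image references do not allow uppercase paths.
image_repo() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Turn a full 40-character commit SHA into the short tag, e.g. sha-7f3a2b1.
short_sha_tag() {
  printf 'sha-%s' "$(printf '%s' "$1" | cut -c1-7)"
}

# Compose registry/repo:tag from registry host, owner/repo, and commit SHA.
image_ref() {
  printf '%s/%s:%s' "$1" "$(image_repo "$2")" "$(short_sha_tag "$3")"
}

image_ref ghcr.io My-Org/API 7f3a2b1c0d9e8f7a6b5c4d3e2f1a0b9c8d7e6f5a
echo
```

Inside a workflow the same helpers would take $GITHUB_REPOSITORY and $GITHUB_SHA as inputs.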

Build, cache, and multi-arch

cache-from: type=gha and cache-to: type=gha,mode=max store BuildKit cache in GitHub's cache service—fast for single-repo workflows. For cross-runner or cross-workflow reuse at scale, add registry cache (type=registry,ref=…:buildcache).

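Multi-arch builds set platforms: linux/amd64,linux/arm64. Apple Silicon developers and Graviton/ARM nodes in AWS both need arm64 images in the same manifest list. On an amd64 runner the arm64 variant is built under QEMU emulation, which requires docker/setup-qemu-action before the Buildx step and is noticeably slower than a native build.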

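- name: Set up QEMU   # emulation for the non-native arm64 platform
  uses: docker/setup-qemu-action@v3

- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3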

- name: Build and push
  id: build
  uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    tags: ${{ steps.meta.outputs.tags }}
    labels: ${{ steps.meta.outputs.labels }}
    platforms: linux/amd64,linux/arm64
    cache-from: |
      type=gha
      type=registry,ref=ghcr.io/${{ github.repository }}:buildcache
    cache-to: |
      type=gha,mode=max
      type=registry,ref=ghcr.io/${{ github.repository }}:buildcache,mode=max
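
The same build can be reproduced outside Actions with the buildx CLI, which helps when debugging cache behaviour. This is a sketch under assumptions: it uses only the registry cache, because type=gha relies on credentials that exist inside a GitHub Actions job, and build_image and DRY_RUN are illustrative names rather than buildx features.

```shell
#!/bin/sh
# Sketch: CLI equivalent of the build step, using the registry cache only.
# build_image and DRY_RUN are hypothetical names, not part of buildx.
build_image() {
  repo=$1   # e.g. ghcr.io/my-org/api
  tag=$2    # e.g. sha-7f3a2b1
  set -- buildx build \
    --platform linux/amd64,linux/arm64 \
    --cache-from "type=registry,ref=${repo}:buildcache" \
    --cache-to "type=registry,ref=${repo}:buildcache,mode=max" \
    --tag "${repo}:${tag}" \
    --push .
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'docker %s\n' "$*"   # print the command instead of running it
  else
    docker "$@"
  fi
}

# Print the command that would run, without touching Docker.
DRY_RUN=1 build_image ghcr.io/my-org/api sha-7f3a2b1
```

Dropping DRY_RUN runs the real build, which needs a logged-in Docker client and a builder that supports both platforms.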

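Full pipeline: hadolint → build and push → trivy → cosign

The workflow below is a starting template. PRs lint, build, and scan without pushing; pushes to main and v* tags publish a SHA-tagged image, scan the pushed digest and fail on fixable CRITICAL or HIGH CVEs, then sign with Sigstore keyless cosign so clusters can enforce signature policy. Because signing only happens after the scan passes, an image that fails the gate stays unsigned.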

name: Container CI

on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:
    branches: [main]

permissions:
  contents: read
  packages: write
  id-token: write
  attestations: write

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint Dockerfile
        uses: hadolint/hadolint-action@v3.1.0
        with:
          dockerfile: Dockerfile
          config: .hadolint.yaml
          failure-threshold: warning

  build-scan-push:
    runs-on: ubuntu-latest
    needs: [lint]
    if: github.event_name == 'push'
    steps:
      - uses: actions/checkout@v4

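      - name: Set up QEMU   # emulation for the non-native arm64 platform
        uses: docker/setup-qemu-action@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3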

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=sha-,format=short
            type=semver,pattern={{version}}
            type=raw,value=main,enable={{is_default_branch}}

      - name: Build and push
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          platforms: linux/amd64,linux/arm64
          cache-from: |
            type=gha
            type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:buildcache
          cache-to: |
            type=gha,mode=max
            type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:buildcache,mode=max

      - name: Scan image with Trivy
        uses: aquasecurity/trivy-action@0.28.0
        with:
          # Scan by digest: the sha- tag holds the 7-character short SHA, not github.sha
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ steps.build.outputs.digest }}
          format: table
          exit-code: 1
          severity: CRITICAL,HIGH
          ignore-unfixed: true

      - name: Install cosign
        uses: sigstore/cosign-installer@v3
      - name: Sign image with cosign
        run: |
          cosign sign --yes \
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ steps.build.outputs.digest }}

      - name: Attest build provenance
        uses: actions/attest-build-provenance@v2
        with:
          subject-name: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          subject-digest: ${{ steps.build.outputs.digest }}
          push-to-registry: true

  pr-build:
    runs-on: ubuntu-latest
    needs: [lint]
    if: github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - name: Build (no push)
        uses: docker/build-push-action@v6
        with:
          context: .
          push: false
          load: true   # export to the runner's Docker image store so Trivy can scan the tag
          tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:pr-${{ github.event.pull_request.number }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - name: Scan PR image
        uses: aquasecurity/trivy-action@0.28.0
        with:
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:pr-${{ github.event.pull_request.number }}
          exit-code: 1
          severity: CRITICAL,HIGH

💡 Pro Tip

Always scan by digest (${{ steps.build.outputs.digest }}) when possible—tags are mutable pointers. Trivy's action accepts image-ref with digest for exact artifact verification.
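
Composing the repo@digest reference by hand is easy to get wrong, so a small guard helps. The sketch below assumes the digest arrives as a sha256:<64 hex> string, as build steps report it; digest_ref is an illustrative helper and the digest in the example is a placeholder value.

```shell
#!/bin/sh
# Hypothetical helper: build an immutable repo@digest reference, refusing
# anything that is not of the form sha256:<64 hex characters>.
digest_ref() {
  repo=$1
  digest=$2
  if printf '%s\n' "$digest" | grep -Eq '^sha256:[0-9a-f]{64}$'; then
    printf '%s@%s\n' "$repo" "$digest"
  else
    echo "not a sha256 digest: $digest" >&2
    return 1
  fi
}

# Placeholder digest for illustration only.
digest_ref ghcr.io/my-org/api \
  sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

# With Trivy installed, the reference can be scanned directly:
#   trivy image --exit-code 1 --severity CRITICAL,HIGH "$(digest_ref "$REPO" "$DIGEST")"
```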

🔒 Security

Cosign keyless signing binds the image to the GitHub OIDC identity of the workflow. Combined with Trivy gate and provenance attestation, you get a supply-chain trail: who built it, from which commit, with what vulnerabilities at publish time.
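
A signature is only useful if something verifies it. The sketch below shows one way a deploy step might check the keyless signature, assuming cosign v2 or later is installed. The repository name and workflow path are placeholders, and workflow_identity and verify_image are illustrative helpers.

```shell
#!/bin/sh
# Sketch: verify a keyless signature before deploying.
# Assumes cosign v2+ on PATH; repo and workflow path below are placeholders.

# Certificate identity of a GitHub Actions workflow run on a given ref:
# https://github.com/<owner>/<repo>/<workflow path>@<ref>
workflow_identity() {
  printf 'https://github.com/%s/%s@%s' "$1" "$2" "$3"
}

# Fails unless the image was signed by the expected workflow identity.
verify_image() {
  image=$1      # repo@sha256:... reference
  identity=$2   # expected signer identity
  cosign verify \
    --certificate-identity "$identity" \
    --certificate-oidc-issuer https://token.actions.githubusercontent.com \
    "$image"
}

workflow_identity my-org/api .github/workflows/container-ci.yml refs/heads/main
echo
# verify_image "ghcr.io/my-org/api@sha256:<digest>" \
#   "$(workflow_identity my-org/api .github/workflows/container-ci.yml refs/heads/main)"
```

Pinning the identity to one workflow file and ref is stricter than matching any workflow in the repository, which is usually what a production admission policy wants.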

🎯 Interview Tip

"How do you speed up Docker builds in CI?" — Mention layer cache (GHA or registry), Dockerfile ordering (deps before source), multi-stage builds, and BuildKit cache mounts for package managers. Cold runners without cache rebuild every layer from scratch.

Docker in GitLab CI

GitLab CI offers three common Docker build strategies: Docker-in-Docker (dind), socket binding to the host daemon, and Kaniko (daemonless). Each trades security, speed, and compatibility differently—the platform team's choice affects every team's pipeline.

Docker-in-Docker (dind)

The job runs inside a docker:24-cli image and talks to a docker:24-dind service container. The dind container runs a nested Docker daemon that is isolated from the host daemon, but the runner typically has to run the job in privileged mode for it to start.

build-image:
  image: docker:24-cli
  services:
    - docker:24-dind
  variables:
    DOCKER_TLS_CERTDIR: '/certs'
    DOCKER_HOST: tcp://docker:2376
    DOCKER_CERT_PATH: '$DOCKER_TLS_CERTDIR/client'
    DOCKER_DRIVER: overlay2
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

⚠️ Pitfall

Privileged dind on shared runners is a security concern—a malicious Dockerfile or build script can escape the nested daemon in some configurations. Prefer Kaniko or dedicated isolated runners for untrusted code (fork MRs).

Socket binding risks

Mounting /var/run/docker.sock into the job container gives direct access to the host Docker daemon. Any command in the job can start privileged containers, mount host paths, or read secrets from other containers on the same host. This is fast and simple—but equivalent to root on the runner.

Approach	Pros	Cons
dind service	Full Docker CLI; BuildKit; familiar	Privileged runner; slower startup; TLS setup
Socket bind	Fast; no nested daemon	Host root equivalent; multi-tenant unsafe
Kaniko	Daemonless; no privileged flag	No docker run in same job; fewer BuildKit features

🔒 Security

Never mount docker.sock on shared GitLab runners. If you must use socket binding, restrict to dedicated single-tenant runners with MR pipelines from forks disabled or isolated review environments.

Kaniko (daemonless builds)

Kaniko executes each Dockerfile instruction in userspace, pushing layers directly to a registry—no Docker daemon required. Ideal for Kubernetes executors and security-conscious platforms.

build-kaniko:
  image:
    name: gcr.io/kaniko-project/executor:v1.23.2-debug   # debug variant includes the shell GitLab needs to run script
    entrypoint: ['']
  script:
    - /kaniko/executor
        --context $CI_PROJECT_DIR
        --dockerfile $CI_PROJECT_DIR/Dockerfile
        --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
        --cache=true
        --cache-repo=$CI_REGISTRY_IMAGE/cache
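
Kaniko reads registry credentials from /kaniko/.docker/config.json. When the executor cannot obtain them on its own, for example when pushing to a registry other than the project's, the job has to write that file before the build. The sketch assumes the standard Docker config format with a base64 user:password entry; kaniko_auth_json is an illustrative name.

```shell
#!/bin/sh
# Hypothetical helper: print a Docker-style config.json for one registry.
# The auth field is base64("user:password"), the format Docker clients use.
kaniko_auth_json() {
  registry=$1
  user=$2
  password=$3
  # tr strips the line wrapping some base64 implementations add.
  auth=$(printf '%s:%s' "$user" "$password" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"auth":"%s"}}}\n' "$registry" "$auth"
}

# In a GitLab job this would run before /kaniko/executor:
#   kaniko_auth_json "$CI_REGISTRY" "$CI_REGISTRY_USER" "$CI_REGISTRY_PASSWORD" \
#     > /kaniko/.docker/config.json
kaniko_auth_json registry.example.com user pass
```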

GitLab Container Registry

On GitLab.com, every project gets a built-in OCI registry at registry.gitlab.com/<group>/<project>; self-managed instances expose it under their own configured hostname. CI jobs receive CI_REGISTRY_USER, CI_REGISTRY_PASSWORD, and CI_REGISTRY_IMAGE automatically, so basic push and pull need no extra secret wiring.

Variable	Value	Use
CI_REGISTRY	registry.gitlab.com (on GitLab.com)	Login target
CI_REGISTRY_IMAGE	Full image path for project	Tag and push destination
CI_COMMIT_SHA	Git commit hash	Immutable image tag (promotion source)

⚖️ Trade-off

GitHub Actions + buildx vs GitLab + Kaniko: GHA's ecosystem (metadata, GHA cache, attestations) is richer for multi-repo orgs. GitLab's integrated registry and daemonless Kaniko builds suit self-hosted, security-first platforms. Both should produce the same artifact contract: SHA-tagged, scanned, and promoted, not rebuilt per environment.

Layer cache in CI

CI runners are usually ephemeral: every job starts with an empty local layer store unless you explicitly import cache. Without cache, every layer is rebuilt and every dependency re-downloaded on each run, so a build that takes minutes on a warm laptop can take several times longer in the pipeline. The fix is choosing the right cache backend and ordering Dockerfile instructions so the slow layers change least often.

Cache backends compared

Backend	Mechanism	Best for	Limitation
GHA cache	type=gha — GitHub Actions cache API	Single-repo GHA workflows; simple setup	10 GB limit per repo; evicted after 7 days inactive
Registry cache	type=registry,ref=…:buildcache	Cross-runner, cross-workflow, self-hosted runners	Requires registry write; cache image grows over time
Inline cache	Cache metadata embedded in the pushed image's config	Reuse cache from last production image tag	Requires pushing with cache-to: type=inline; min mode only, so intermediate stages are not cached
Local (none)	Runner's Docker layer store only	Self-hosted runners with persistent disks	Lost on every GitHub-hosted runner spin-up

Registry cache

Push a dedicated :buildcache tag (or separate cache repo) alongside your app image. Subsequent builds pull cache metadata before compiling layers—runners share cache even when GHA cache is cold or evicted.

cache-from: type=registry,ref=ghcr.io/my-org/api:buildcache
cache-to: type=registry,ref=ghcr.io/my-org/api:buildcache,mode=max

Inline cache

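Inline cache stores BuildKit cache metadata in the image you already push. The next build references that image as its cache source, which is useful when you cannot maintain a separate cache tag. The inline exporter only supports min mode, so layers from intermediate stages of a multi-stage build are not cached; use registry cache with mode=max when those stages are the slow part.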

cache-from: type=registry,ref=ghcr.io/my-org/api:main
cache-to: type=inline

GHA cache

Zero extra infrastructure—add type=gha to build-push-action. Use mode=max to cache all intermediate layers, not just final stage. Combine with registry cache for resilience when GHA cache misses.

Dockerfile ordering for cache hits

Docker invalidates cache from the first changed instruction downward. Order instructions from least frequent change to most frequent change:

1. FROM (pin digest for reproducibility)
2. RUN apt/apk install system deps     ← changes rarely
3. COPY package.json / lock files      ← changes when deps change
4. RUN npm ci / pip install            ← cache hit if lock unchanged
5. COPY source code                    ← changes every commit
6. RUN build / compile                 ← invalidated with source
7. COPY --from=builder (multi-stage)   ← final slim image
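
The ordering works because a COPY step is keyed on the content of the files it copies, not on the commit. The sketch below is an illustration of that idea rather than what BuildKit runs internally: it fingerprints only the dependency manifests with cksum and shows that a source-only edit leaves the fingerprint unchanged. deps_fingerprint is a hypothetical helper.

```shell
#!/bin/sh
# Illustration only: approximate the "did the dependency inputs change?"
# question that decides whether the install layer can be reused.
# deps_fingerprint is a hypothetical helper, not a Docker command.
deps_fingerprint() {
  dir=$1
  # Checksum only the files an early "COPY package*.json" step would copy.
  cat "$dir/package.json" "$dir/package-lock.json" | cksum | cut -d' ' -f1
}

demo=$(mktemp -d)
echo '{"name":"api"}' > "$demo/package.json"
echo '{"lockfileVersion":3}' > "$demo/package-lock.json"
echo 'console.log("v1")' > "$demo/index.js"

before=$(deps_fingerprint "$demo")
echo 'console.log("v2")' > "$demo/index.js"   # source edit only
after=$(deps_fingerprint "$demo")

if [ "$before" = "$after" ]; then
  echo "dependency layer reusable"
else
  echo "dependency layer invalidated"
fi
rm -rf "$demo"
```

If the Dockerfile copied the whole context before installing dependencies, index.js would be part of the fingerprint and every commit would invalidate the install step.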

💡 Pro Tip

Put COPY . . as late as possible. A single early COPY . . before RUN npm ci busts the dependency layer on every source edit—one of the most common CI slowdowns.

🔬 Under the Hood

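BuildKit cache is not the same as layer reuse in the classic builder. Cache entries are keyed on the LLB (low-level build) graph and the checksums of its inputs, and the resulting layer blobs can be exported to GHA or registry backends and imported on another machine. The contents of RUN --mount=type=cache mounts are the exception: they live only on the builder that created them and are not part of the exported cache, so on ephemeral runners they start empty unless you persist them separately.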

Hadolint

Hadolint is a Dockerfile linter that encodes Docker and ShellCheck best practices as rule codes. Run it in CI before docker build—a bad Dockerfile wastes runner minutes and ships anti-patterns (unpinned packages, root user, curl|bash) into production images.

Key rules: DL3008 and DL3025

Rule	Message	Why it matters	Fix
DL3008	Pin versions in apt-get install	Unpinned apt install pulls whatever is latest—non-reproducible builds, surprise breakages	apt-get install -y curl=7.88.1-10+deb12u5 or use version pinning in apt preferences
DL3025	Use arguments JSON notation for CMD and ENTRYPOINT	Shell form (CMD npm start) wraps in /bin/sh -c—signals don't reach the app, PID 1 issues	CMD ["npm", "start"] exec form

Other high-value rules

DL3006 — Always tag the FROM image explicitly (no implicit latest)
DL3009 — Delete apt cache after install to reduce layer size
DL3018 — Pin versions in apk add
DL3020 — Use COPY instead of ADD for plain files
DL3002 — Last USER should not be root
SC2015 — ShellCheck: avoid A && B || C trap patterns in RUN

.hadolint.yaml configuration

Place at repo root. Ignore rules that conflict with your base image strategy, but document why: silent ignores accumulate into tech debt.

failure-threshold: warning

ignored:
  - DL3018   # alpine: we pin in a separate lock step

override:
  error:
    - DL3008
    - DL3025
    - DL3006

trustedRegistries:
  - docker.io
  - ghcr.io
  - gcr.io

CI integration

Run hadolint as the first job—fail the pipeline before any build consumes cache or registry quota.

# Local
docker run --rm -i hadolint/hadolint < Dockerfile

# GitHub Actions
- uses: hadolint/hadolint-action@v3.1.0
  with:
    dockerfile: Dockerfile
    config: .hadolint.yaml

# GitLab CI
hadolint:
  image: hadolint/hadolint:latest-debian
  script:
    - hadolint --config .hadolint.yaml Dockerfile
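
Repositories with more than one Dockerfile need the linter pointed at each of them. The sketch below assumes hadolint is on PATH and that Dockerfiles follow the common Dockerfile, Dockerfile.<suffix>, or <name>.Dockerfile naming patterns. list_dockerfiles and lint_all are illustrative helpers.

```shell
#!/bin/sh
# Sketch: lint every Dockerfile in a repo, not just the root one.
# Assumes hadolint is on PATH; helper names are hypothetical.

# Print matching files under a directory, skipping .git and node_modules.
list_dockerfiles() {
  find "${1:-.}" \( -name .git -o -name node_modules \) -prune -o \
    -type f \( -name 'Dockerfile' -o -name 'Dockerfile.*' -o -name '*.Dockerfile' \) \
    -print | sort
}

# Run hadolint on each file and fail if any file fails.
lint_all() {
  list_dockerfiles "${1:-.}" | (
    status=0
    while IFS= read -r file; do
      hadolint --config .hadolint.yaml "$file" < /dev/null || status=1
    done
    exit "$status"
  )
}

# Usage in a CI script:
#   lint_all .
```

Linting all files before reporting failure surfaces every finding in a single pipeline run instead of one file at a time.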

📦 Real World

Teams often start with failure-threshold: error during adoption, so only error-level findings fail the build, then tighten to warning once existing Dockerfiles are cleaned up. Override DL3008/DL3025 to error immediately, since they directly affect reproducibility and signal handling in production.
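
The threshold is easy to read backwards: a finding fails the run when its severity is at or above the threshold, so warning is the stricter setting and error the more lenient one. The sketch models only the four named severities; severity_rank and fails_build are illustrative helpers, not hadolint code.

```shell
#!/bin/sh
# Illustration of failure-threshold semantics. Only error, warning, info,
# and style are modelled; helper names are hypothetical.
severity_rank() {
  case "$1" in
    error)   echo 4 ;;
    warning) echo 3 ;;
    info)    echo 2 ;;
    style)   echo 1 ;;
    *)       echo 0 ;;   # unknown or unmodelled value
  esac
}

# Succeeds (exit 0) when a finding of severity $1 fails a build whose
# threshold is $2.
fails_build() {
  sev=$(severity_rank "$1")
  thr=$(severity_rank "$2")
  [ "$sev" -gt 0 ] && [ "$thr" -gt 0 ] && [ "$sev" -ge "$thr" ]
}

for threshold in error warning; do
  if fails_build warning "$threshold"; then
    echo "threshold=$threshold: a warning fails the build"
  else
    echo "threshold=$threshold: a warning is reported only"
  fi
done
```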

Image promotion pattern

The cardinal rule of container CD: build once, promote many. CI produces a single image tagged with the git SHA. Staging and production retag that exact digest; they never run docker build again. What ran in staging is byte-identical to prod.

Why rebuild in prod is an anti-pattern

Rebuild per env	Build once + promote
Different layer hashes per environment	Same digest everywhere
Staging test ≠ production artifact	Staging validates the exact prod binary
Cannot trace prod image to CI run	SHA tag maps 1:1 to git commit + CI job
Re-scan/re-sign on every env deploy	Scan and sign once at publish

Build once with SHA tag

On merge to main, CI pushes ghcr.io/my-org/api:sha-7f3a2b1 (and optionally records the digest). This is the golden artifact—all downstream deploys reference it.

  git push main
       │
       ▼
  ┌─────────────┐     tag: sha-7f3a2b1
  │  CI build   │ ──► ghcr.io/my-org/api:sha-7f3a2b1  (digest abc123…)
  └─────────────┘
       │
       │  promote (retag same digest, no rebuild)
       ▼
  ┌─────────────┐     tag: staging-7f3a2b1  or  staging (pointer)
  │   Staging   │ ──► deploy sha-7f3a2b1
  └─────────────┘
       │  smoke / integration tests pass
       ▼
  ┌─────────────┐     tag: prod-7f3a2b1  or  v1.4.2
  │ Production  │ ──► deploy sha-7f3a2b1  (SAME digest abc123…)
  └─────────────┘

Retag staging → prod

Promotion is a registry metadata operation—copy manifest from SHA tag to environment tag. No compiler runs; no apt-get update executes; no supply-chain surface is reintroduced.

# Promote by retagging (same digest, new pointer)
SOURCE=ghcr.io/my-org/api:sha-7f3a2b1
TARGET=ghcr.io/my-org/api:prod-7f3a2b1

docker buildx imagetools create -t "$TARGET" "$SOURCE"

# Or with crane (no local docker daemon)
crane copy ghcr.io/my-org/api:sha-7f3a2b1 ghcr.io/my-org/api:v1.4.2

# Deploy job only updates the image reference — never builds
# kubectl set image deploy/api api=ghcr.io/my-org/api:sha-7f3a2b1
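
Promotion is only safe if the new tag really points at the same digest, which can be checked right after the retag. The sketch assumes crane is installed and authenticated; same_digest and assert_promoted are illustrative helpers.

```shell
#!/bin/sh
# Sketch: after promotion, confirm both tags resolve to the same digest.
# Assumes crane is installed and logged in; helper names are hypothetical.

# True only when both digests are non-empty and identical.
same_digest() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Resolve both references with crane and compare them.
assert_promoted() {
  src=$(crane digest "$1") || return 1
  dst=$(crane digest "$2") || return 1
  if same_digest "$src" "$dst"; then
    echo "promoted: $2 -> $src"
  else
    echo "digest mismatch: $1=$src $2=$dst" >&2
    return 1
  fi
}

# Usage:
#   assert_promoted ghcr.io/my-org/api:sha-7f3a2b1 ghcr.io/my-org/api:prod-7f3a2b1
```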

GitHub Actions promotion job

promote-prod:
  runs-on: ubuntu-latest
  needs: [deploy-staging]
  environment: production
  permissions:
    packages: write   # retagging writes a new tag to the registry
  steps:
    - uses: docker/login-action@v3
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    - name: Promote SHA image to prod tag
      run: |
        # Match the short sha- tag from the build job and lowercase the repo path.
        REPO="ghcr.io/$(echo "$GITHUB_REPOSITORY" | tr '[:upper:]' '[:lower:]')"
        SHORT_SHA="$(echo "$GITHUB_SHA" | cut -c1-7)"
        docker buildx imagetools create \
          -t "$REPO:prod-$SHORT_SHA" \
          "$REPO:sha-$SHORT_SHA"

⚠️ Pitfall

Using mutable tags like :latest or :prod as the only deploy reference makes rollbacks ambiguous—you cannot know which SHA is running. Use mutable tags as convenience pointers, but pin deploy manifests to SHA or digest.
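
This rule can be enforced with a pre-deploy check that flags image references pinned to neither a digest nor a sha- tag. The sketch assumes plain Kubernetes-style manifests with one image: field per line; unpinned_images is an illustrative helper and the manifest content is a made-up example.

```shell
#!/bin/sh
# Sketch: list image references in a manifest that are pinned to neither a
# digest nor a sha- tag. unpinned_images is a hypothetical helper.
# Exit status is 0 when at least one unpinned reference is found.
unpinned_images() {
  grep -E '^[[:space:]]*(- )?image:' "$1" \
    | grep -Ev '@sha256:[0-9a-f]{64}|:sha-[0-9a-f]{7,40}'
}

# Made-up manifest for demonstration.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
containers:
  - name: api
    image: ghcr.io/my-org/api:sha-7f3a2b1
  - name: worker
    image: ghcr.io/my-org/worker:prod
EOF

if unpinned_images "$manifest"; then
  echo "pin the references above to a sha- tag or digest" >&2
fi
rm -f "$manifest"
```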

📦 Real World

Mature teams gate promotion on staging health checks + manual approval (GitHub environment protection rules). The promote job is a one-line docker buildx imagetools create: no Dockerfile, no build-args, no "prod-only" config baked at build time. Environment-specific config belongs in runtime (env vars, ConfigMaps, secrets), not in a second image build.