Logs & centralized logging

kubectl logs is enough for one pod—for production you need every replica in one searchable place. This guide implements structured JSON logs, ships them with Promtail → Loki, queries with LogQL, and ties lines to trace_id so a Prometheus alert becomes a filtered log view in seconds.

Prerequisites: Observability explained and a service that logs to stdout (container-friendly).

After reading, you should be able to:

Emit JSON logs from the app to container stdout, ship them to Loki with Promtail, and query them by trace_id with LogQL in Grafana.
Explain the division of labor: apps write to stdout, the platform ships and indexes, and engineers query in Grafana instead of SSHing into individual pods.

Step 1 — Structured log schema

Pick a small, consistent set of fields. Every line should parse as JSON:

{
  "timestamp": "2026-06-05T14:22:01.123Z",
  "level": "info",
  "service": "checkout-api",
  "msg": "payment captured",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "order_id": "ord_9182",
  "duration_ms": 142,
  "http_status": 200
}

Field      Purpose
level      Filter errors in LogQL (| json | level="error")
service    Identify which deployment (Loki label)
trace_id   Jump from alert → all services in one request
msg        Human-readable event name (short and stable, not free-form prose)
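
One way to keep the schema from drifting is to check sample lines in a unit test or CI step. The sketch below is a minimal illustration, not part of any library: checkLogLine, REQUIRED_FIELDS, and LEVELS are hypothetical names, and the required set is an assumption based on the fields in the table plus timestamp.

```javascript
// Hypothetical helper: checks one log line against the Step 1 schema.
// The required set and the level names are assumptions; adjust to your own schema.
const REQUIRED_FIELDS = ["timestamp", "level", "service", "msg", "trace_id"];
const LEVELS = new Set(["trace", "debug", "info", "warn", "error", "fatal"]);

function checkLogLine(line) {
  let record;
  try {
    record = JSON.parse(line);
  } catch (err) {
    return { ok: false, problems: ["not valid JSON"] };
  }
  if (record === null || typeof record !== "object" || Array.isArray(record)) {
    return { ok: false, problems: ["not a JSON object"] };
  }
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    if (typeof record[field] !== "string" || record[field] === "") {
      problems.push(`missing or non-string field: ${field}`);
    }
  }
  if (typeof record.level === "string" && !LEVELS.has(record.level)) {
    problems.push(`unknown level: ${record.level}`);
  }
  if (typeof record.timestamp === "string" && Number.isNaN(Date.parse(record.timestamp))) {
    problems.push("timestamp is not a parseable date");
  }
  return { ok: problems.length === 0, problems };
}
```

Running it over a captured sample of real output catches numeric levels or renamed keys before they break LogQL filters.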

Step 2 — Instrument the app (JSON to stdout)

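npm install express pino pino-http

const crypto = require("node:crypto");
const express = require("express");
const pino = require("pino");
const pinoHttp = require("pino-http");

const logger = pino({
  level: process.env.LOG_LEVEL || "info",
  base: { service: "checkout-api" },
  // Emit level as a string ("info"); pino defaults to numbers, which level="error" would never match
  formatters: { level: (label) => ({ level: label }) },
  // Write the schema's "timestamp" key; pino's built-in isoTime helper writes "time"
  timestamp: () => `,"timestamp":"${new Date().toISOString()}"`,
});

const app = express();
app.use(
  pinoHttp({
    logger,
    // Reuse the caller's trace id when present, otherwise generate one
    genReqId: (req) => req.headers["x-trace-id"] || crypto.randomUUID(),
    customProps: (req) => ({ trace_id: req.id }),
  })
);

app.get("/health", (req, res) => {
  req.log.info({ order_id: "ord_demo" }, "health ok");
  res.json({ ok: true });
});

app.listen(8080);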

In containers, log only to stdout and let the platform ship the logs. Never write or rotate log files inside the container.

Step 3 — What not to log

Redact in middleware: authorization header → [REDACTED].
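
A minimal sketch of that redaction step, assuming headers arrive as a plain object (as req.headers does in Node). redactHeaders and SENSITIVE_HEADERS are hypothetical names; only authorization comes from the text above, and the other header names are assumptions about what a typical service would also want hidden.

```javascript
// Hypothetical helper: returns a copy of the headers that is safe to log.
// Only "authorization" is named in this guide; the other entries are assumptions.
const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "set-cookie", "x-api-key"]);

function redactHeaders(headers) {
  const safe = {};
  for (const [name, value] of Object.entries(headers)) {
    // Header names are case-insensitive, so compare in lowercase
    safe[name] = SENSITIVE_HEADERS.has(name.toLowerCase()) ? "[REDACTED]" : value;
  }
  return safe;
}
```

The function copies rather than mutates, so the request itself still carries the real header for authentication. pino also has a built-in redact option that can do the same thing declaratively by path, which may be the simpler choice if you already use pino.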

Step 4 — Local stack: Loki + Promtail + Grafana

docker-compose.logging.yml

services:
  loki:
    image: grafana/loki:2.9.6
    ports: ["3100:3100"]
    command: -config.file=/etc/loki/local-config.yaml

  promtail:
    image: grafana/promtail:2.9.6
    volumes:
      - ./promtail-config.yml:/etc/promtail/config.yml
      - /var/run/docker.sock:/var/run/docker.sock
    command: -config.file=/etc/promtail/config.yml

  grafana:
    image: grafana/grafana:10.4.2
    ports: ["3000:3000"]
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin

promtail-config.yml (Docker SD):

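server:
  http_listen_port: 9080

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      # Docker reports names as "/checkout-api"; drop the leading slash
      - source_labels: ["__meta_docker_container_name"]
        regex: "/(.*)"
        target_label: container
      - source_labels: ["__meta_docker_container_log_stream"]
        target_label: stream
    pipeline_stages:
      # Promote the JSON "service" field to a Loki label so {service="..."} selectors match
      - json:
          expressions:
            service: service
      - labels:
          service:

docker compose -f docker-compose.logging.yml up -d
# Grafana: http://localhost:3000 (add a Loki data source with URL http://loki:3100)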

Step 5 — LogQL queries (Grafana Explore)

{container=~".*checkout.*"} |= "error"
{service="checkout-api"} | json | level="error" | line_format "{{.msg}} order={{.order_id}}"
{service="checkout-api"} | json | trace_id="4bf92f3577b34da6a3ce929d0e0e4736"
sum(rate({service="checkout-api"} | json | level="error" [5m]))

The last query turns logs into a metric inside Loki, which is useful for dashboards when the app does not yet expose its own counters.
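
The same queries can be run outside Grafana through Loki's HTTP API, which is handy in scripts and runbooks. The sketch below only builds the request URL for the query_range endpoint; buildQueryRangeUrl and traceQuery are hypothetical helper names, the base URL and default limit are assumptions, and it assumes service is a Loki label as in the queries above.

```javascript
// Hypothetical helpers for calling Loki's query_range endpoint from a script.
function traceQuery(service, traceId) {
  // Assumes both values are plain identifiers with no quotes that would need escaping
  return `{service="${service}"} | json | trace_id="${traceId}"`;
}

function buildQueryRangeUrl(baseUrl, logql, startMs, endMs, limit = 100) {
  const url = new URL("/loki/api/v1/query_range", baseUrl);
  url.searchParams.set("query", logql);
  // Loki accepts nanosecond Unix epochs; append six zeros to a millisecond value
  url.searchParams.set("start", `${startMs}000000`);
  url.searchParams.set("end", `${endMs}000000`);
  url.searchParams.set("limit", String(limit));
  url.searchParams.set("direction", "backward"); // newest lines first
  return url.toString();
}
```

Against the local stack from Step 4, something like fetch(buildQueryRangeUrl("http://localhost:3100", traceQuery("checkout-api", id), Date.now() - 3600000, Date.now())) should return the matching streams as JSON.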

Step 6 — Kubernetes log collection

Pods log to stdout/stderr; kubelet writes files under /var/log/pods. Promtail runs as a DaemonSet on each node:

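# promtail-values.yaml (Helm grafana/promtail)
config:
  clients:
    - url: http://loki-gateway.monitoring.svc.cluster.local/loki/api/v1/push
  snippets:
    scrapeConfigs: |
      - job_name: kubernetes-pods
        pipeline_stages:
          - cri: {}   # strip the container runtime prefix so the JSON payload is the log line
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_container_name]
            target_label: container
          - source_labels: [__meta_kubernetes_pod_label_app]
            target_label: app
          # Each Promtail pod should only tail pods scheduled on its own node
          - source_labels: [__meta_kubernetes_pod_node_name]
            target_label: __host__
          # Without __path__ Promtail discovers pods but tails no files
          - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
            separator: /
            replacement: /var/log/pods/*$1/*.log
            target_label: __path__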
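
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Loki: the grafana/loki chart provides the loki-gateway service used in the client URL above
helm install loki grafana/loki -n monitoring --create-namespace -f loki-values.yaml   # loki-values.yaml is a placeholder for your own storage settings
# Promtail: apply the values file shown above
helm install promtail grafana/promtail -n monitoring -f promtail-values.yaml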

Match labels to how you query—app from pod labels is the usual filter.

6.1 — kubectl when centralized logging is down

kubectl logs deploy/checkout-api -n prod --tail=200
kubectl logs checkout-api-7f8b9c-xyz -c api -n prod --previous   # crashed container
kubectl logs -l app=checkout-api -n prod --since=10m --tail=-1 | grep 4bf92f3577b34da6a3ce929d0e0e4736   # one trace_id across replicas

These commands come from K8s debugging; keep them in the runbook even when Loki is available.

Step 7 — Correlate with metrics and traces

  1. Prometheus alert includes service and time range.
  2. Grafana dashboard link: {app="checkout-api"} | json | level="error", with the time range shifted to the alert window.
  3. Copy trace_id from a log line → trace backend (next guide) or search all services: {namespace="prod"} | json | trace_id="...".

Propagate the incoming traceparent or X-Trace-Id header from ingress on every downstream call.
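
A minimal sketch of that propagation, assuming Node's lowercased header names. extractTraceId and downstreamHeaders are hypothetical helpers; the traceparent layout (version, 32-hex trace id, 16-hex parent id, flags) follows the W3C Trace Context format, and falling back to X-Trace-Id is this guide's convention rather than part of that standard.

```javascript
// W3C traceparent layout: version-traceid-parentid-flags, all lowercase hex
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Hypothetical helper: picks the trace id to attach to every log line.
function extractTraceId(headers) {
  const match = TRACEPARENT_RE.exec(headers["traceparent"] || "");
  // An all-zero trace id is invalid, so treat it as absent
  if (match && !/^0+$/.test(match[2])) return match[2];
  return headers["x-trace-id"] || null;
}

// Hypothetical helper: copies only the tracing headers onto an outgoing call.
function downstreamHeaders(incoming) {
  const out = {};
  for (const name of ["traceparent", "tracestate", "x-trace-id"]) {
    if (incoming[name]) out[name] = incoming[name];
  }
  return out;
}
```

Forwarding traceparent unchanged keeps the trace id consistent across services, which is all the log correlation here needs. A full tracing SDK would also replace the parent id with a new span id on each hop.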

Step 8 — Retention and cost

Tier                    Typical retention   Notes
Hot (Loki default)      7–30 days           Fast queries, label indexes only
Warm / object storage   90 days             S3-backed chunks, slower queries
Compliance archive      1–7 years           Cheap storage, separate pipeline; avoid querying in incident path

# loki config fragment (Loki 2.9: limits_config.retention_period is enforced by the compactor)
compactor:
  retention_enabled: true
limits_config:
  retention_period: 720h   # 30d

High-volume debug logs belong behind a feature flag—not in default retention.
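
To see why debug volume matters for cost, a back-of-the-envelope estimate helps. The function below is a rough sketch, not a Loki sizing tool: estimateStoredGiB is a hypothetical name, and every input (line rate, average line size, compression ratio) is an assumption you would replace with measured values.

```javascript
// Rough retention sizing. All inputs are assumptions to replace with measured values.
function estimateStoredGiB({ linesPerSecond, avgLineBytes, retentionDays, compressionRatio }) {
  const rawBytesPerDay = linesPerSecond * avgLineBytes * 86400; // seconds per day
  const storedBytes = (rawBytesPerDay * retentionDays) / compressionRatio;
  return storedBytes / 1024 ** 3; // bytes to GiB
}
```

Comparing the result for info-level traffic against the same service with debug enabled makes the case for the feature flag in concrete numbers.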

Step 9 — Log-based alerts (sparingly)

# Loki ruler alert — use when no metric exists yet
groups:
  - name: logs
    rules:
      - alert: CheckoutErrorBurst
        expr: |
          sum(rate({app="checkout-api"} | json | level="error" [5m])) > 1
        for: 5m
        annotations:
          summary: "checkout error log rate high"

Prefer metric alerts from app counters when possible—log parsing is heavier and brittle to format changes.
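
For comparison, an app-side counter can be very small. The class below is a dependency-free sketch that renders the Prometheus text exposition format; ErrorCounter and the metric name in the usage note are hypothetical, and a real service would more likely use an existing Prometheus client library.

```javascript
// Minimal in-process counter with one label. A sketch, not a client library.
class ErrorCounter {
  constructor(name, help) {
    this.name = name;
    this.help = help;
    this.counts = new Map(); // route label value -> count
  }

  inc(route) {
    this.counts.set(route, (this.counts.get(route) || 0) + 1);
  }

  // Renders Prometheus text exposition format for a /metrics handler.
  // Assumes route values contain no quotes or backslashes that would need escaping.
  render() {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    for (const [route, count] of this.counts) {
      lines.push(`${this.name}{route="${route}"} ${count}`);
    }
    return lines.join("\n") + "\n";
  }
}
```

With a counter such as new ErrorCounter("checkout_errors_total", "Errors by route") served on /metrics, the alert becomes a plain rate() over a metric and no longer depends on the log format.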

Step 10 — Troubleshooting

Symptom                Fix
No logs in Loki        Promtail targets DOWN; wrong client URL; clock skew
json parser errors     App printed non-JSON lines (stack traces); use | pattern or fix logger
Query timeout          Time range too wide; add label filters first
Missing pods           RBAC for promtail ServiceAccount; path mounts on containerd
Duplicate timestamps   Batching delay (normal); sort in UI
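
The non-JSON stack trace case is usually fixed at the source by putting the stack inside the JSON record. A minimal sketch, assuming the Step 1 field names; errorLogLine, error_type, and stack are hypothetical names, not part of the schema above.

```javascript
// Hypothetical helper: serializes an Error into one JSON line so the stack
// trace never spills across multiple log lines.
function errorLogLine(err, fields = {}) {
  return JSON.stringify({
    timestamp: new Date().toISOString(),
    level: "error",
    ...fields,
    msg: err.message,
    error_type: err.name,
    stack: err.stack, // newlines are escaped as \n inside the JSON string
  });
}
```

If you use pino as in Step 2, passing the error under the err key (for example req.log.error({ err }, "payment failed")) should give a similar single-line result through its standard serializers.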

Step 11 — Anti-patterns

Interview phrase: “Apps emit JSON to stdout with trace_id; Promtail labels the lines with K8s metadata and ships them to Loki; we query with LogQL in Grafana and keep 30-day hot retention. Incidents start from metrics, drill into logs by trace_id, then move to traces.”

The one line to remember

Structured logs on stdout, centralized by the platform, queryable by labels and trace_id—not by SSHing to a pod.