Observability

Know what your systems are doing in production: metrics for aggregates, logs for events, traces for request paths, and SLOs so that alerts signal customer pain, not noise. This topic connects to Kubernetes day-2 operations and CI/CD gates.

Guides

Five guides, from observability concepts through Prometheus metrics, centralized logging, and distributed tracing to SLO-based on-call.

Guide
Observability explained

Metrics, logs, traces, golden signals, SLIs and SLOs, error budgets, and how observability fits CI/CD and Kubernetes.

Guide
Metrics & Prometheus

App instrumentation, scrape configs, PromQL, recording rules, and Grafana dashboards for RED metrics.

Guide
Logs & centralized logging

Structured logging, aggregation, retention, correlation with traces, and Kubernetes log collection.

Guide
Distributed tracing

OpenTelemetry SDK, context propagation, Jaeger or Tempo, and tracing HTTP services in Kubernetes.

Guide
SLOs, alerting & on-call

Alert design, burn-rate alerts, runbooks, paging policy, and tying SLOs to deployment gates.
