Metrics & Prometheus

The Observability explained guide introduced metrics. Here you implement them with Prometheus: expose a /metrics endpoint, configure scrapes, write PromQL for RED (rate, errors, duration), add recording rules, and build Grafana dashboards that on-call engineers actually open.

Prerequisites: a service you can run locally (or in Kubernetes) and basic HTTP familiarity.

After reading, you should be able to:

Instrument an HTTP API with counters and histograms.
Configure Prometheus static and Kubernetes service discovery scrapes.
Write PromQL for request rate, error ratio, and latency percentiles.
Add recording rules and a symptom-based alert.
Sketch a minimal RED Grafana dashboard.

Figure: pull-based scraping. Prometheus polls your app's /metrics endpoint and evaluates rules, Grafana queries Prometheus for visualization, and Alertmanager routes the alerts that Prometheus fires.

Step 1 — Naming metrics (RED-friendly)

RED	Metric type	Name example
Rate	Counter	`http_requests_total`
Errors	Counter (label `status`)	same series, filter `status=~"5.."`
Duration	Histogram	`http_request_duration_seconds`

Use labels sparingly: method, route (template path like /users/:id), status—not raw URLs with IDs.
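When the framework cannot report the matched route, a fallback is to collapse ID-like path segments yourself. The sketch below is one way to do that; the `route_template` helper and its two patterns (numeric IDs and UUIDs) are illustrative assumptions, not part of any Prometheus client library.

```python
import re

# Hypothetical patterns: adjust to the ID formats your API actually uses.
_NUMERIC = re.compile(r"^\d+$")
_UUID = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def route_template(path: str) -> str:
    """Collapse ID-like path segments into a fixed ':id' placeholder."""
    path = path.split("?", 1)[0]  # drop any query string
    segments = []
    for seg in path.split("/"):
        if _NUMERIC.match(seg) or _UUID.match(seg):
            segments.append(":id")
        else:
            segments.append(seg)
    return "/".join(segments) or "/"


print(route_template("/users/42"))           # /users/:id
print(route_template("/users/42/orders/7"))  # /users/:id/orders/:id
print(route_template("/health"))             # /health
```

Prefer the framework's own matched route when it is available; a regex fallback like this can miss ID formats it was not written for.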

Step 2 — Instrument the app

npm install prom-client express

const express = require("express");
const client = require("prom-client");

const register = new client.Registry();
client.collectDefaultMetrics({ register });

const httpRequests = new client.Counter({
  name: "http_requests_total",
  help: "Total HTTP requests",
  labelNames: ["method", "route", "status"],
  registers: [register],
});

const httpDuration = new client.Histogram({
  name: "http_request_duration_seconds",
  help: "HTTP latency",
  labelNames: ["method", "route", "status"],
  buckets: [0.005, 0.01, 0.05, 0.1, 0.5, 1, 2],
  registers: [register],
});

const app = express();

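app.use((req, res, next) => {
  // Start the timer without labels: status and the matched route are only known at "finish".
  const end = httpDuration.startTimer();
  res.on("finish", () => {
    // Label with the matched route template (e.g. /users/:id), not the raw path,
    // so IDs in URLs do not create a new series per request.
    const route = req.route ? req.baseUrl + req.route.path : "unmatched";
    const labels = { method: req.method, route, status: String(res.statusCode) };
    httpRequests.inc(labels);
    end(labels);
  });
  next();
});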

app.get("/health", (_, res) => res.json({ ok: true }));
app.get("/metrics", async (_, res) => {
  res.set("Content-Type", register.contentType);
  res.end(await register.metrics());
});

app.listen(8080, () => console.log("listening :8080"));

pip install prometheus-client flask

import time

from flask import Flask, g, request
from prometheus_client import Counter, Histogram, generate_latest, CONTENT_TYPE_LATEST

app = Flask(__name__)

REQUESTS = Counter(
    "http_requests_total",
    "HTTP requests",
    ["method", "route", "status"],
)
DURATION = Histogram(
    "http_request_duration_seconds",
    "HTTP latency",
    ["method", "route", "status"],
    buckets=(0.005, 0.01, 0.05, 0.1, 0.5, 1, 2),
)

@app.before_request
def _start():
    # Record only the start time; status and route are labeled once the response exists.
    g.start_time = time.perf_counter()

@app.after_request
def _observe(resp):
    # Label with the matched URL rule (e.g. /users/<id>), not the raw path.
    route = request.url_rule.rule if request.url_rule else "unmatched"
    labels = dict(method=request.method, route=route, status=str(resp.status_code))
    REQUESTS.labels(**labels).inc()
    DURATION.labels(**labels).observe(time.perf_counter() - g.start_time)
    return resp

@app.get("/metrics")
def metrics():
    return generate_latest(), 200, {"Content-Type": CONTENT_TYPE_LATEST}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)

curl -s localhost:8080/metrics | head
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
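To check the endpoint in a script rather than by eye, you can scan the `# TYPE` lines of the output. This is a minimal standard-library sketch; the helper names and the `SAMPLE` text are illustrative, and it only reads type declarations rather than parsing the full exposition format.

```python
def declared_metrics(exposition: str) -> dict[str, str]:
    """Map metric name -> type, read from '# TYPE <name> <type>' lines."""
    found = {}
    for line in exposition.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            found[parts[2]] = parts[3]
    return found


def missing_metrics(exposition: str, expected: dict[str, str]) -> list[str]:
    """Return expected metric names that are absent or have the wrong type."""
    declared = declared_metrics(exposition)
    return [name for name, kind in expected.items() if declared.get(name) != kind]


# Illustrative sample of what /metrics might return.
SAMPLE = """\
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",route="/health",status="200"} 3
# HELP http_request_duration_seconds HTTP latency
# TYPE http_request_duration_seconds histogram
"""

EXPECTED = {
    "http_requests_total": "counter",
    "http_request_duration_seconds": "histogram",
}

print(missing_metrics(SAMPLE, EXPECTED))  # []
```

In practice you would pass the body of `curl -s localhost:8080/metrics` instead of `SAMPLE`.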

Step 3 — Local Prometheus (Docker)

prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: api
    static_configs:
      - targets: ["host.docker.internal:8080"]   # Docker Desktop (Mac/Windows); on Linux use the host IP or add --add-host=host.docker.internal:host-gateway to docker run

docker run -d --name prometheus -p 9090:9090 \
  -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus:v2.51.0

open http://localhost:9090/targets   # State should be UP (open is macOS; use xdg-open on Linux)
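The same check can be scripted against the Prometheus HTTP API, which lists targets at `/api/v1/targets`. The sketch below assumes Prometheus is reachable at `http://localhost:9090` as in the Docker command above; the helper names and the `SAMPLE` payload are illustrative and trimmed to the fields used.

```python
import json
from urllib.request import urlopen


def unhealthy_targets(payload: dict) -> list[tuple[str, str]]:
    """Return (scrapeUrl, lastError) for each active target whose health is not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", "?"), t.get("lastError", ""))
        for t in targets
        if t.get("health") != "up"
    ]


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch the target list from a running Prometheus (not called in this sketch)."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Illustrative response, reduced to the fields this sketch reads.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "scrapeUrl": "http://host.docker.internal:8080/metrics",
                "health": "down",
                "lastError": "connection refused",
            },
            {"scrapeUrl": "http://localhost:9090/metrics", "health": "up", "lastError": ""},
        ]
    },
}

for url, err in unhealthy_targets(SAMPLE):
    print(f"DOWN {url}: {err}")
```

Against a live server, call `unhealthy_targets(fetch_targets())`; an empty list means every target is UP.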

Step 4 — Scrape on Kubernetes

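Pod annotations (the prometheus.io/* convention; these only take effect if your Prometheus scrape config has relabel rules that read them, and the Prometheus Operator does not honor them by default):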

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"

Or a ServiceMonitor (Prometheus Operator):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api
  labels:
    release: monitoring   # must match the Helm release name used in Step 9
spec:
  selector:
    matchLabels:
      app: api
  endpoints:
    - port: http          # name of the port in the Service spec, not the number
      path: /metrics
      interval: 15s

Step 5 — PromQL for RED

Open Prometheus → Graph and try these (adjust label names to match your app):

# Request rate (per second, 5m window)
sum(rate(http_requests_total[5m])) by (route)

# Error ratio (5xx / all)
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))

# p95 latency
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route)
)

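Histograms: always compute percentiles with histogram_quantile over the _bucket series, keeping le in the by (...) clause when you aggregate. Never average precomputed percentiles across instances or routes; the result is not a percentile.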
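To see why the bucket layout matters, it helps to look at how a percentile is estimated from cumulative buckets. The sketch below is a simplified reimplementation of the linear interpolation that histogram_quantile applies to classic histograms. It assumes 0 < q <= 1 and a final +Inf bucket, and the bucket counts are made up for illustration.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    Simplified sketch: expects 0 < q <= 1 and a last bucket with bound +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, cum) in enumerate(buckets):
        if cum >= rank:
            break
    if math.isinf(upper):
        # Quantile falls beyond the largest finite bucket: the estimate is capped.
        return buckets[-2][0]
    lower = buckets[i - 1][0] if i > 0 else 0.0
    below = buckets[i - 1][1] if i > 0 else 0.0
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (cum - below)


# Made-up cumulative counts: 50 requests <= 0.1s, 90 <= 0.5s, 100 <= 1s.
counts = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, counts))
print(histogram_quantile(0.95, [(0.1, 10), (math.inf, 100)]))
```

The second call shows the capping behavior: when most requests are slower than the largest finite bucket, the reported p95 is just that bucket's bound, which is a reason to choose buckets that bracket your latency target.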

Step 6 — Recording rules (precompute expensive queries)

rules/api-red.yml

groups:
  - name: api_red
    interval: 30s
    rules:
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (route)
      - record: job:http_errors:ratio5m
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total[5m]))
      - record: job:http_latency:p95_5m
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route)
          )

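Reference the rule files in prometheus.yml, and mount the rules directory into the container (for the Docker setup in Step 3, add -v $(pwd)/rules:/etc/prometheus/rules to docker run):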

rule_files:
  - /etc/prometheus/rules/*.yml

Grafana panels can query job:http_latency:p95_5m directly, which makes dashboards faster and keeps dashboard and alert math consistent.

Step 7 — Alert rule (symptom, not CPU)

groups:
  - name: api_alerts
    rules:
      - alert: HighErrorRate
        expr: job:http_errors:ratio5m > 0.05
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "API 5xx ratio above 5% for 5m"
          runbook: "https://wiki.example/runbooks/api-5xx"

Route the alert via Alertmanager to Slack or PagerDuty, and derive the threshold from your SLO error budget (see the SLOs & on-call guide).
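The `for: 5m` clause means the expression must stay true across consecutive evaluations for five minutes before the alert fires; a single evaluation below the threshold resets the clock. The sketch below simulates that pending-to-firing behavior in simplified form. The `alert_states` function and the sample values are illustrative, and it ignores details such as resolved notifications.

```python
def alert_states(samples, threshold=0.05, for_seconds=300):
    """Return (timestamp, state) per evaluation.

    samples: iterable of (timestamp_seconds, value) in evaluation order.
    """
    active_since = None
    states = []
    for ts, value in samples:
        if value > threshold:  # NaN compares False, so "no traffic" never alerts
            if active_since is None:
                active_since = ts
            state = "firing" if ts - active_since >= for_seconds else "pending"
        else:
            active_since = None  # any dip resets the pending timer
            state = "inactive"
        states.append((ts, state))
    return states


# Error ratio evaluated every 60s; one dip at t=120 restarts the 5m wait.
samples = [(0, 0.10), (60, 0.10), (120, 0.01), (180, 0.10), (240, 0.10),
           (300, 0.10), (360, 0.10), (420, 0.10), (480, 0.10)]

for ts, state in alert_states(samples):
    print(ts, state)
```

In this run the alert first fires at t=480, five minutes after the ratio went back above the threshold at t=180.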

Step 8 — Grafana RED dashboard (panels)

Add Prometheus data source → create dashboard with three rows:

Panel	Query	Visualization
Traffic	`sum(rate(http_requests_total[5m]))`	Time series
Errors	`job:http_errors:ratio5m`	Stat or gauge %
Latency p95	`job:http_latency:p95_5m`	Time series per route

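Add a template variable route populated from label_values(http_requests_total, route) and use it to filter the panels whose queries carry a route label. Link the dashboard URL in the alert annotations.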

Step 9 — kube-prometheus-stack (cluster-wide)

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack \
  -n monitoring --create-namespace

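Ships Prometheus, Alertmanager, Grafana, node-exporter, and kube-state-metrics. For your ServiceMonitor to be picked up, its labels must match the selector the chart configures; by default that is a release label equal to the Helm release name (release: monitoring for the install above), but it varies by install.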

Step 10 — Troubleshooting

Symptom	Fix
Target DOWN	Network policy, wrong port, app not on `0.0.0.0`
`connection refused` from Docker	Use `host.docker.internal` or host network on Linux
Empty graphs	Generate traffic; check time range; verify metric names in /metrics
`histogram_quantile` NaN	No buckets scraped yet; increase traffic or wait 5m
Cardinality explosion	Stop labeling with user IDs; use bounded `route` templates
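A quick way to reason about the cardinality row is to estimate how many time series a histogram can produce. The sketch below computes an upper bound from assumed label value counts; the numbers are illustrative, and the bound ignores client-specific extras and the fact that not every label combination occurs.

```python
from math import prod


def histogram_series(label_values: dict[str, int], finite_buckets: int) -> int:
    """Upper bound on series for one histogram metric.

    label_values maps label name -> number of distinct values.
    """
    combos = prod(label_values.values())
    # Per label combination: finite buckets, the +Inf bucket, _sum, and _count.
    per_combo = finite_buckets + 1 + 2
    return combos * per_combo


# Assumed counts: 4 methods, 20 route templates, 10 status codes, 7 buckets.
bounded = histogram_series({"method": 4, "route": 20, "status": 10}, 7)
print(bounded)  # 8000

# Adding a hypothetical user_id label with 10,000 distinct values.
exploded = histogram_series(
    {"method": 4, "route": 20, "status": 10, "user_id": 10_000}, 7
)
print(exploded)  # 80000000
```

One unbounded label multiplies every other dimension, which is why bounded route templates matter more than trimming a bucket or two.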

Step 11 — Anti-patterns

High-cardinality labels on counters (email, session id).
Alerting on up == 0 without excluding planned deploys (use burn-rate SLO alerts later).
Encoding the environment in the metric name instead of using an env label.
Grafana dashboards nobody owns—stale queries after renames.

Interview phrase: “We expose Prometheus histograms for latency, counters for RED, scrape with ServiceMonitors in K8s, precompute p95 with recording rules, and page on error-ratio alerts tied to a dashboard and runbook—not raw CPU.”

The one line to remember

Instrument RED → scrape reliably → PromQL + recording rules → Grafana for humans, Alertmanager for symptoms.