Metrics and Dashboards: Prometheus and Grafana

If Logs tell you "What" happened, Metrics tell you "How much" is happening. You don't read metrics; you watch them on a Dashboard.

In this lesson, we learn to track the "Vital Signs" of our FastAPI application using the industry-standard duo: Prometheus and Grafana.

1. The RED Method

When monitoring an API, you should always track three things (The RED Method):

R (Rate): Number of requests per second.
E (Errors): Number of failed requests (4xx and 5xx).
D (Duration): How long each request takes (Latency).
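To make the three signals concrete, here is a minimal sketch that computes them from a handful of in-memory request records. The `RequestRecord` structure and the `red_summary` helper are hypothetical names invented for illustration; in a real setup these numbers come from Prometheus, not from hand-written code. The `error_from_status` parameter defaults to 400 to match the definition above (4xx and 5xx); pass 500 if you only want to count server-side failures.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RequestRecord:
    """One handled request (hypothetical structure, for illustration only)."""
    timestamp: float  # seconds since the start of the observation window
    status: int       # HTTP status code returned to the client
    duration: float   # seconds spent handling the request


def red_summary(
    records: list[RequestRecord],
    window_seconds: float,
    error_from_status: int = 400,
) -> dict[str, float]:
    """Compute Rate, Errors, and Duration over one observation window."""
    total = len(records)
    failed = sum(1 for r in records if r.status >= error_from_status)
    return {
        # R: requests per second over the window
        "rate_per_second": total / window_seconds,
        # E: share of requests that failed
        "error_ratio": failed / total if total else 0.0,
        # D: a typical (median) request duration
        "median_duration_seconds": median(r.duration for r in records) if records else 0.0,
    }


records = [
    RequestRecord(0.5, 200, 0.040),
    RequestRecord(1.2, 200, 0.050),
    RequestRecord(2.8, 500, 0.060),
    RequestRecord(3.1, 404, 0.030),
]
summary = red_summary(records, window_seconds=10.0)
print(summary)
```

With these four sample requests in a 10-second window, the rate is 0.4 requests per second and half of the requests count as errors under the 4xx-and-5xx definition.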

2. Exposing Metrics in FastAPI

We use the prometheus-fastapi-instrumentator library. It automatically hooks into your app and adds a /metrics endpoint that a Prometheus server can "Scrape."

from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

# Attach the metrics middleware, then add the /metrics endpoint.
Instrumentator().instrument(app).expose(app)

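Now, if you visit localhost:8000/metrics, you will see a plain-text list of counters and histograms. With the library's default settings the route is reported in a label named handler, so a typical line looks like this: http_request_duration_seconds_count{handler="/users",method="GET"} 124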
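Each line on the /metrics page has the same shape: a metric name, an optional set of labels in braces, and a numeric value. The sketch below splits one such line into those three parts. It is a deliberately simplified parser written for this lesson (`parse_sample` is a hypothetical helper, and it ignores comment lines, timestamps, and escaped quotes); the label names in the sample line are illustrative and depend on how the instrumentator is configured.

```python
import re

# metric_name{label="value",...} 123  (labels are optional)
SAMPLE_PATTERN = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_PATTERN = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>[^"]*)"')


def parse_sample(line: str) -> tuple[str, dict[str, str], float]:
    """Split one metric line into (name, labels, value)."""
    match = SAMPLE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a metric sample: {line!r}")
    labels = {
        m["key"]: m["val"]
        for m in LABEL_PATTERN.finditer(match["labels"] or "")
    }
    return match["name"], labels, float(match["value"])


name, labels, value = parse_sample(
    'http_request_duration_seconds_count{handler="/users",method="GET"} 124'
)
print(name, labels, value)
```

Reading the result back: the name says what is being counted, the labels say which slice of traffic it belongs to, and the value is the running total for that slice.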

3. Prometheus: The Database for Numbers

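Prometheus is a special type of database (Time-Series) that stores values over time. It doesn't store text; it stores numeric samples, each tagged with a timestamp and a set of labels. At a configurable interval (15 seconds is a common choice), it pulls your API's /metrics endpoint and asks: "What are your totals right now?" Counters only ever go up, so Prometheus works out "requests per second" at query time by comparing successive scrapes, for example with the rate() function.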
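Counters exposed on /metrics are cumulative totals, so a per-second rate has to be derived by comparing scrapes. The sketch below shows the core idea under simplifying assumptions: `per_second_rate` is a hypothetical helper, the sample values are made up, and the real PromQL rate() function additionally extrapolates to the edges of the query window. A drop in the counter is treated as a reset, which is what happens when the app restarts.

```python
def per_second_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second increase of a cumulative counter.

    Each sample is (timestamp_seconds, counter_value), in chronological order.
    A decrease between two samples is treated as a counter reset, so the
    post-reset value is counted as new increase.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive amount of time")

    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # Counter went backwards: the process restarted and began at zero.
            increase += current
    return increase / elapsed


# Five scrapes, 15 seconds apart; the app restarts between t=30 and t=45.
scrapes = [(0.0, 100.0), (15.0, 160.0), (30.0, 220.0), (45.0, 30.0), (60.0, 90.0)]
rate = per_second_rate(scrapes)
print(rate)
```

Here the counter grew by 60 + 60 + 30 + 60 = 210 requests over 60 seconds, which gives 3.5 requests per second even though the raw value dropped after the restart.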

4. Grafana: The Beautiful Visualization

Grafana connects to Prometheus and turns those raw numbers into graphs, gauges, and alerts.

Alerting: You can configure Grafana to send you a Slack message if your API's error rate goes above 5% for more than 2 minutes.
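The "for more than 2 minutes" part matters: it keeps a brief spike from paging anyone. The sketch below illustrates that logic on a list of (timestamp, error ratio) samples. It is a simplified model written for this lesson, not Grafana's actual evaluation code; `alert_fires` and its parameters are hypothetical names.

```python
def alert_fires(
    samples: list[tuple[float, float]],
    threshold: float = 0.05,
    hold_seconds: float = 120.0,
) -> bool:
    """Return True if the error ratio stays above threshold for more than hold_seconds.

    samples: (timestamp_seconds, error_ratio) pairs in chronological order.
    """
    breach_started = None  # timestamp of the first sample in the current breach
    for timestamp, error_ratio in samples:
        if error_ratio > threshold:
            if breach_started is None:
                breach_started = timestamp
            if timestamp - breach_started > hold_seconds:
                return True
        else:
            # Back under the threshold: the breach is over, start from scratch.
            breach_started = None
    return False


# A short spike that recovers within a minute.
brief_spike = [(0.0, 0.01), (30.0, 0.08), (60.0, 0.09), (90.0, 0.02), (120.0, 0.01)]

# An error ratio that stays above 5% from t=30 onward.
sustained = [
    (0.0, 0.01), (30.0, 0.08), (60.0, 0.07), (90.0, 0.09),
    (120.0, 0.06), (150.0, 0.08), (180.0, 0.10),
]

print(alert_fires(brief_spike))
print(alert_fires(sustained))
```

The brief spike never holds for long enough to fire, while the sustained breach fires once it has lasted more than 120 seconds.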

Visualizing the Monitoring Pipeline

graph LR
    A["FastAPI App"] -- "/metrics (scraped)" --> B["Prometheus (Storage)"]
    B -- "Queries" --> C["Grafana (Dashboard)"]
    C -- "Alert!" --> D["PagerDuty / Slack"]

Summary

Metrics: Quantitative data about your system.
RED Method: Rate, Errors, Duration.
Prometheus: The scraper and storage engine.
Grafana: The visualization and alerting layer.

In the next lesson, we’ll look at Distributed Tracing with OpenTelemetry, which lets you follow a single request across multiple microservices.

Exercise: The Latency Alarm

Your API normally responds in 50ms. Suddenly, it starts responding in 2000ms.

Which metric in the RED method would show this change?
Why is an "Average" latency often a bad metric to watch? (Hint: Research the concept of Tail Latency or p99).