
Prometheus and Grafana: The Observability Standard
Master the data engine of DevOps. Learn to install Prometheus, write powerful PromQL queries, and build Grafana dashboards that make your AI infrastructure transparent.
Prometheus and Grafana: The Visual Control Center
In the previous lesson, we learned about the Metrics Server. It's great for seeing what is happening right now. But what if you want to know:
- "How much memory did my AI training job use at 3:00 AM last Tuesday?"
- "Is my API latency slowly increasing over the last 30 days?"
- "Which version of my prompt resulted in the highest CPU usage?"
For these questions, you need a Time-Series Database. In the Kubernetes world, that means Prometheus. And to visualize that data, you need Grafana.
Prometheus and Grafana are the "Gold Standard" of cloud-native observability. In this lesson, we will master the Pull-based Architecture of Prometheus, learn to write PromQL queries, and build a beautiful Grafana Dashboard that gives you a "God's Eye View" of your entire AI cluster.
1. The Prometheus Architecture: Why "Pull" is Better
Traditional monitoring tools (like Nagios or Zabbix) use "Push" or "Active Probes."
Prometheus is different. It uses a Pull Model.
- Your application (FastAPI) exposes a simple text page at /metrics.
- Prometheus is configured to "Scrape" that page every 15 seconds.
- Prometheus stores those numbers in its local, ultra-compressed database.
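For context, the /metrics page that Prometheus scrapes is plain, human-readable text in the Prometheus exposition format. A scrape might return lines like the following (the metric names here are illustrative):

```text
# HELP ai_request_total Total AI requests processed
# TYPE ai_request_total counter
ai_request_total 42.0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 17.38
```

Each sample is just a name, optional labels, and a number. That simplicity is why almost any app can expose metrics.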
Why Pull?
- Service Discovery: Prometheus talks to the K8s API to find new pods. You don't have to manually add every new pod to your monitoring list.
- No Agent Required: You don't need a heavy agent running inside your container. Your app just needs to output some text.
- Resilience: In a push model, a slow or misbehaving app can flood the monitoring system with requests. With pull, Prometheus controls the scrape rate and simply collects data when it's ready.
2. Installing the Prometheus Operator (Kube-Prometheus-Stack)
Installing Prometheus manually is complex. We use the Prometheus Operator, which introduces a custom resource called a ServiceMonitor.
# Using Helm (The industry standard)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack
This one command installs:
- Prometheus: The database.
- Alertmanager: To send PagerDuty/Slack alerts.
- Grafana: The dashboard UI.
- Node Exporter: To get hardware-level metrics of the physical servers.
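The ServiceMonitor mentioned above is how you tell the Operator which Services to scrape. A minimal sketch, assuming your AI service carries the label app: ai-agent and names its metrics port http (both are assumptions about your setup):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ai-agent-monitor
  labels:
    release: prometheus   # must match your Helm release name so Prometheus selects it
spec:
  selector:
    matchLabels:
      app: ai-agent       # assumed label on your Kubernetes Service
  endpoints:
    - port: http          # assumed name of the port exposing /metrics
      path: /metrics
      interval: 15s
```

Apply it with kubectl, and the Operator reconfigures Prometheus automatically; no restart required.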
3. Writing Your First PromQL Query
PromQL is the query language for Prometheus. It is incredibly powerful for calculating rates and averages.
Example: Per-second CPU usage, averaged over the last 5 minutes (note that rate() returns a per-second rate, not a raw average)
rate(container_cpu_usage_seconds_total[5m])
Example: Finding Memory Leaks
sum(container_memory_usage_bytes{pod=~"ai-agent-.*"}) by (pod)
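To build intuition for what rate() computes, here is a stdlib-only Python sketch (the sample data is invented): given two scrapes of a counter, the per-second rate is simply the increase divided by the elapsed time.

```python
def counter_rate(samples):
    """Per-second rate between the first and last (timestamp, value) samples.

    Mimics the core idea of PromQL's rate(); the real function also handles
    counter resets and range-boundary extrapolation, which we skip here.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Two scrapes of container_cpu_usage_seconds_total, 15 seconds apart (made-up values):
# the counter grew by 3 CPU-seconds in 15 wall-clock seconds -> 0.2 cores used.
samples = [(0, 120.0), (15, 123.0)]
print(counter_rate(samples))
```

This is why raw counters are rarely graphed directly: the interesting signal is always the rate of change.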
4. Grafana: Turning Numbers into Insight
Grafana is the "Face" of your monitoring. It connects to Prometheus as a data source and allows you to build panels.
The "Perfect" AI Dashboard:
- Row 1: Request Rate: How many AI queries are we handling per second?
- Row 2: Latency (P99): The 99th percentile response time. (Are users waiting too long?)
- Row 3: GPU Temp & VRAM: The health of our most expensive hardware.
- Row 4: HPA Status: A graph showing the number of pods changing over time.
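The P99 latency in Row 2 is usually computed with PromQL's histogram_quantile(). A sketch, assuming your app exposes a latency Histogram named ai_latency_seconds (the _bucket series and the metric name are assumptions about your instrumentation):

```
histogram_quantile(0.99, sum(rate(ai_latency_seconds_bucket[5m])) by (le))
```

Aggregating by the le (less-than-or-equal) label before taking the quantile lets you combine buckets across all pods of the service.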
5. Visualizing the Scrape Loop
graph TD
    subgraph "Your Application"
        App["FastAPI Pod (App A)"] -- "/metrics" --> OS["Text Output"]
    end
    subgraph "Monitoring Namespace"
        Prom["Prometheus Server"] -- "Scrape every 15s" --> OS
        Dist["Prometheus TSDB"]
        Prom -- "Store" --> Dist
    end
    UI["Grafana Dash"] -- "PromQL Query" --> Prom
    Admin["Engineer"] -- "View" --> UI
6. Practical Example: Custom Metrics in Python
To monitor your LangChain agent, you need to expose custom metrics. We use the prometheus_client library.
from prometheus_client import start_http_server, Counter, Summary
import time

# Create metrics
REQUEST_COUNT = Counter('ai_request_total', 'Total AI requests processed')
LATENCY = Summary('ai_latency_seconds', 'Time spent processing AI request')

@LATENCY.time()
def process_ai_query(query):
    REQUEST_COUNT.inc()
    # Your LangGraph / Bedrock logic here
    time.sleep(1)
    return "Done"

if __name__ == "__main__":
    # Start the /metrics endpoint on port 8000 and keep the process alive
    start_http_server(8000)
    while True:
        process_ai_query("ping")
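Conceptually, these metric types are small: a Counter is a monotonically increasing number, and a Summary records a count plus a running sum of observations. A stdlib-only sketch of the decorator mechanics (names are illustrative, not the real library's internals):

```python
import functools
import time

class MiniSummary:
    """Toy stand-in for prometheus_client.Summary: tracks call count and total seconds."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def time(self):
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return fn(*args, **kwargs)
                finally:
                    # Record the observation even if the function raises
                    self.count += 1
                    self.total += time.perf_counter() - start
            return wrapper
        return decorator

LATENCY = MiniSummary()

@LATENCY.time()
def handle(query):
    return f"answered: {query}"

handle("hello")
print(LATENCY.count)
```

From these two numbers, Prometheus can later derive average latency as rate(sum) / rate(count).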
7. AI Implementation: Predictive Alerting
In a high-stakes AI environment, you don't want an alert after the database is full. You want an alert when the database is predicted to be full in 4 hours.
The PromQL predict_linear function:
predict_linear(node_filesystem_free_bytes[1h], 4 * 3600) < 0
This query analyzes the last hour of disk usage. It draws a straight line into the future. If that line hits zero within 4 hours, Prometheus will trigger an alert. This gives your DevOps team plenty of time to expand the volume (Module 6.3) before the app crashes.
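Under the hood, predict_linear() is ordinary least-squares extrapolation. A stdlib Python sketch of the same arithmetic (the disk-usage samples are invented), fitting a line to (timestamp, free_bytes) points and projecting 4 hours past the last sample:

```python
def predict_linear(samples, horizon_seconds):
    """Least-squares fit over (t, v) samples, extrapolated horizon_seconds
    past the last sample -- the same idea as PromQL's predict_linear()."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in samples)
             / sum((t - mean_t) ** 2 for t, _ in samples))
    intercept = mean_v - slope * mean_t
    return slope * (samples[-1][0] + horizon_seconds) + intercept

# Made-up hour of data: free space shrinking by 1 GiB every 15 minutes
gib = 1024 ** 3
samples = [(i * 900, (10 - i) * gib) for i in range(5)]
projected = predict_linear(samples, 4 * 3600)
print(projected < 0)  # the fitted line crosses zero within 4 hours -> alert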
8. Summary and Key Takeaways
- Prometheus: The time-series database that pulls metrics from your pods.
- Grafana: The visualization engine for creating real-time dashboards.
- ServiceMonitor: The K8s resource that tells Prometheus which pods to watch.
- PromQL: The calculation engine for rates, averages, and predictions.
- Observability: Moving beyond "Is it up?" to "How is it performing over time?"
In the next lesson, we will look at the other side of observability: Centralized logging with Loki.
9. SEO Metadata & Keywords
Focus Keywords: Kubernetes Prometheus Grafana tutorial, installing kube-prometheus-stack helm, PromQL query examples for K8s, Python prometheus_client FastAPI, building a Kubernetes observability dashboard, predicting disk failures PromQL.
Meta Description: Master the industry-standard observability stack for Kubernetes. Learn how to install Prometheus and Grafana, write powerful PromQL queries, and build visually stunning dashboards to monitor the performance and health of your AI microservices.