Module 13 Lesson 5: Monitoring Performance Metrics

Visualizing the health of your cluster. Using Prometheus and Grafana to track tokens-per-second and VRAM usage.

Monitoring: The AI Dashboard

If you are running a scaled AI system for a team, a terminal is not enough. You need live graphs of your hardware health and your token throughput.

1. Key Metrics to Track

  • Tokens Per Second (t/s): The generation speed users feel most directly, and usually the most important metric for user satisfaction.
  • VRAM Utilization: How close are you to running out of GPU memory and crashing?
  • Queue Length: How many people are currently waiting for an answer?
  • Model Distribution: Which models are being used the most (Llama vs CodeLlama)?
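
Tokens per second can be derived from a single generation response. The following is a minimal sketch, assuming the final response from Ollama's /api/generate (or /api/chat) carries eval_count (tokens generated) and eval_duration (time in nanoseconds), as the Ollama API documentation describes. The function name tokens_per_second and the sample values are illustrative only.

```python
def tokens_per_second(response: dict) -> float:
    """Return generation speed in tokens per second.

    Assumes `response` is the final JSON object of a generation call,
    with `eval_count` (tokens) and `eval_duration` (nanoseconds).
    """
    eval_count = response.get("eval_count", 0)
    eval_duration_ns = response.get("eval_duration", 0)
    if eval_duration_ns <= 0:
        # No timing data (e.g. a partial streaming chunk): report zero.
        return 0.0
    return eval_count / eval_duration_ns * 1e9


# Illustrative values, not real measurements.
sample = {"eval_count": 290, "eval_duration": 4_709_213_000}
print(f"{tokens_per_second(sample):.1f} t/s")
```

Averaging this value over recent requests gives a steadier number for a dashboard than any single response.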

2. Using the /api/ps Endpoint

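Ollama's /api/ps endpoint (listed in the API documentation as "List Running Models") provides a JSON snapshot of what is running.

  • Which models are currently loaded into memory?
  • How much VRAM is each model using (the size_vram field)?
  • When does the keep_alive timer expire (the expires_at field)?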

You can write a simple Python script to poll this every 5 seconds and send it to a database.
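
A polling script along those lines might look like the sketch below. It assumes Ollama is listening on its default local address (http://localhost:11434) and that the response contains a "models" list whose entries have name, size, size_vram, and expires_at fields. The function names are hypothetical, and print stands in for whatever database write you use.

```python
import json
import time
import urllib.request

# Default local Ollama address; adjust for your own host.
OLLAMA_PS_URL = "http://localhost:11434/api/ps"


def summarize_running_models(payload: dict) -> list[dict]:
    """Reduce an /api/ps payload to the fields worth storing."""
    rows = []
    for model in payload.get("models", []):
        rows.append({
            "name": model.get("name"),
            "size_bytes": model.get("size", 0),
            "vram_bytes": model.get("size_vram", 0),
            "expires_at": model.get("expires_at"),
        })
    return rows


def fetch_running_models(url: str = OLLAMA_PS_URL, timeout: float = 3.0) -> list[dict]:
    """Fetch /api/ps once and return the summarized rows."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return summarize_running_models(json.load(resp))


def poll(interval_s: float = 5.0, iterations: int = 3) -> None:
    """Poll the endpoint a fixed number of times, pausing between calls."""
    for _ in range(iterations):
        try:
            for row in fetch_running_models():
                print(row)  # placeholder: replace with a database insert
        except OSError as exc:
            # URLError and socket timeouts are both subclasses of OSError.
            print(f"Ollama not reachable: {exc}")
        time.sleep(interval_s)
```

Calling poll() runs three polls five seconds apart. A long-running collector would loop indefinitely and write each row with a timestamp.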


3. The Prometheus + Grafana Stack

A common production setup uses Prometheus to collect metrics and Grafana to visualize them.

  • Ollama Exporter: There are community tools on GitHub (like ollama-exporter) that expose Ollama's state as metrics Prometheus can scrape.
  • Visuals: You can build a dashboard that shows a "Big Green Number" for your current cluster speed.
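
To make the exporter idea concrete, here is a minimal sketch of the text format Prometheus scrapes. It is not the code of any particular community exporter. The metric names ollama_model_vram_bytes and ollama_loaded_models are hypothetical, and the input is assumed to be a list of dicts with "name" and "vram_bytes" keys.

```python
def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_prometheus_metrics(models: list[dict]) -> str:
    """Render per-model VRAM usage and a loaded-model count as gauges."""
    lines = [
        "# HELP ollama_model_vram_bytes VRAM used by each loaded model.",
        "# TYPE ollama_model_vram_bytes gauge",
    ]
    for model in models:
        name = escape_label(str(model["name"]))
        vram = int(model["vram_bytes"])
        lines.append(f'ollama_model_vram_bytes{{model="{name}"}} {vram}')
    lines.append("# HELP ollama_loaded_models Number of models currently loaded.")
    lines.append("# TYPE ollama_loaded_models gauge")
    lines.append(f"ollama_loaded_models {len(models)}")
    return "\n".join(lines) + "\n"


# Illustrative input, not real measurements.
print(render_prometheus_metrics([{"name": "llama3:8b", "vram_bytes": 5_000_000_000}]))
```

Serving this text over HTTP at a /metrics path is what lets Prometheus scrape it. Grafana then queries Prometheus to draw the panels.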

4. Setting Up Alerting

Monitoring should warn you before the system fails.

  • Alert: "VRAM > 95% for 1 minute."
  • Alert: "Average response time > 10 seconds."

This allows you to either free up memory (for example, by unloading idle models) or tell your teammates: "The AI is under heavy load right now, expect delays."
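
The "for 1 minute" part of an alert matters because it stops brief spikes from paging anyone. In a Prometheus setup this would normally be an alerting rule. The sketch below only illustrates the same logic in plain Python. The class name SustainedThresholdAlert and the sample readings are hypothetical.

```python
class SustainedThresholdAlert:
    """Fire only when a value stays above a threshold for a hold period."""

    def __init__(self, threshold: float, hold_seconds: float):
        self.threshold = threshold
        self.hold_seconds = hold_seconds
        self._breach_started = None  # timestamp when the value first exceeded the threshold

    def update(self, value: float, now: float) -> bool:
        """Record a reading taken at time `now` (seconds); return True if the alert fires."""
        if value <= self.threshold:
            # Back under the limit: reset the timer.
            self._breach_started = None
            return False
        if self._breach_started is None:
            self._breach_started = now
        return now - self._breach_started >= self.hold_seconds


# "VRAM > 95% for 1 minute", with VRAM expressed as a fraction of total.
vram_alert = SustainedThresholdAlert(threshold=0.95, hold_seconds=60)
print(vram_alert.update(0.97, now=0))   # False: breach has just started
print(vram_alert.update(0.98, now=30))  # False: only 30 seconds so far
print(vram_alert.update(0.99, now=60))  # True: above 95% for a full minute
```

The same class covers the response-time alert by feeding it average latency in seconds with a threshold of 10.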


Key Takeaways

  • Monitoring ensures your local AI cluster stays healthy and fast.
  • Tokens Per Second is your primary KPI (Key Performance Indicator).
  • The /api/ps endpoint is the source of truth for runtime state.
  • Grafana dashboards make AI performance easy to read, including for non-technical stakeholders.
