Module 13 Lesson 5: Monitoring Performance Metrics
Visualizing the health of your cluster. Using Prometheus and Grafana to track tokens-per-second and VRAM usage.
Monitoring: The AI Dashboard
If you are running a scaled AI system for a team, you need more than just a terminal. You need to see "Live" graphs of your hardware health and your token economy.
1. Key Metrics to Track
- Tokens Per Second (t/s): The most important metric for user satisfaction (a worked example follows this list).
- VRAM Utilization: Are you close to a crash?
- Queue Length: How many people are currently waiting for an answer?
- Model Distribution: Which models are being used the most (Llama vs CodeLlama)?
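You can measure tokens per second directly: every non-streaming /api/generate response includes eval_count (tokens generated) and eval_duration (nanoseconds spent generating them). A minimal sketch, assuming Ollama on its default port and a llama3 model already pulled locally:

```python
import requests

# Ask Ollama for a completion and compute t/s from the timing fields
# in the /api/generate response (eval_duration is in nanoseconds).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
    timeout=120,
)
data = resp.json()

tokens = data["eval_count"]            # tokens generated
seconds = data["eval_duration"] / 1e9  # nanoseconds -> seconds
print(f"{tokens} tokens in {seconds:.2f}s = {tokens / seconds:.1f} t/s")
```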
2. Using the /api/ps Endpoint
Ollama's /api/ps endpoint (the JSON counterpart of the ollama ps command) provides a snapshot of what is running. It answers questions like:
- Which models are currently in RAM?
- How much VRAM is each model using?
- When does the keep_alive timer expire?
You can write a simple Python script to poll this every 5 seconds and send it to a database.
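Here is a minimal sketch of that polling loop. The name, size_vram, and expires_at fields come from Ollama's documented /api/ps response; swap the print for an insert into your database of choice:

```python
import time
import requests

OLLAMA = "http://localhost:11434"  # assumes the default Ollama port

def poll_loaded_models():
    """Print a one-line snapshot of every model currently in memory."""
    models = requests.get(f"{OLLAMA}/api/ps", timeout=5).json().get("models", [])
    for m in models:
        vram_gb = m.get("size_vram", 0) / 1e9
        print(f'{m["name"]}: {vram_gb:.1f} GB VRAM, expires {m.get("expires_at", "?")}')

while True:
    poll_loaded_models()
    time.sleep(5)
```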
3. The Prometheus + Grafana Stack
Professional AI engineers use Prometheus to collect metrics and Grafana to visualize them.
- Ollama Exporter: There are community tools on GitHub (like ollama-exporter) that connect Prometheus directly to Ollama (a do-it-yourself sketch follows this list).
- Visuals: You can build a dashboard that shows a "Big Green Number" for your current cluster speed.
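If you would rather not depend on a community exporter, a few lines of Python with the official prometheus_client library do the same job. This is a sketch, not a production exporter: the metric names and port 9877 are arbitrary choices, and it assumes Ollama on its default port.

```python
import time
import requests
from prometheus_client import Gauge, start_http_server

# Hypothetical metric names -- rename to fit your own scheme.
VRAM_BYTES = Gauge("ollama_model_vram_bytes", "VRAM used per loaded model", ["model"])
LOADED = Gauge("ollama_models_loaded", "Number of models currently in memory")

def collect():
    """Refresh the gauges from Ollama's /api/ps snapshot."""
    models = requests.get("http://localhost:11434/api/ps", timeout=5).json().get("models", [])
    LOADED.set(len(models))
    for m in models:
        VRAM_BYTES.labels(model=m["name"]).set(m.get("size_vram", 0))

if __name__ == "__main__":
    start_http_server(9877)  # Prometheus scrapes http://<host>:9877/metrics
    while True:
        collect()
        time.sleep(5)
```

Point a Prometheus scrape job at port 9877 and the gauges become queryable, which is exactly what the dashboard and the alerts below build on.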
4. Setting Up Alerting
Monitoring should warn you BEFORE the system fails.
- Alert: "VRAM > 95% for 1 minute."
- Alert: "Average response time > 10 seconds."
This allows you to either clear the cache or tell your teammates: "The AI is under heavy load right now, expect delays."
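The first alert translates into a Prometheus rule like the sketch below. It uses the hypothetical ollama_model_vram_bytes metric from the exporter above and assumes a 24 GB card; adjust the threshold for your GPU.

```yaml
groups:
  - name: ollama
    rules:
      - alert: OllamaVramNearLimit
        # sum of per-model VRAM from the exporter sketch;
        # 22.8e9 bytes is 95% of a 24 GB card
        expr: sum(ollama_model_vram_bytes) > 22.8e9
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Ollama VRAM above 95% for 1 minute"
```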
Key Takeaways
- Monitoring ensures your local AI cluster stays healthy and fast.
- Tokens Per Second is your primary KPI (Key Performance Indicator).
- The /api/ps endpoint is the source of truth for runtime state.
- Grafana is the best way to visualize AI performance for non-technical stakeholders.