Monitoring the Vital Signs: CPU and Memory

Is your server dying? Learn to read the real-time performance of your Linux system. Master 'top' and 'htop' to find resource-hungry processes, and decode the 'free' command to understand why Linux uses all your RAM for Cache.

CPU and Memory: Monitoring the Pulse of your Server

Processes in Linux are like athletes—they need "Oxygen" (CPU cycles) and "Space" (RAM) to perform. When your system feels slow or a website stops responding, it's usually because one of these athletes is exhausted or taking up too much room.

As an administrator, you need to be able to look at the system's "Heart Rate Monitor" and immediately identify the bottleneck. Is the CPU working too hard? Is the system "Swapping" because it ran out of memory?

In this lesson, we will master the real-time monitoring tools: top, htop, and free.


1. top: The Default Monitor

top is included on virtually every Linux system. It provides a real-time, interactive view of the system's performance.

Decoding the Header:

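  1. Load Average: Three numbers representing the average system load over the last 1, 5, and 15 minutes. Load counts processes that are running or waiting to run (on Linux, also those blocked in uninterruptible I/O). A load of "1.0" on a 1-core CPU means the core is fully occupied.
  2. Tasks: Total number of processes, broken down into running, sleeping, stopped, and zombie.
  3. %Cpu(s):
    • us: User time (your apps).
    • sy: System time (the kernel).
    • id: Idle time (free capacity).
    • wa: I/O Wait (the CPU is idle while waiting for disk I/O to finish). A persistently high 'wa' usually points to slow or overloaded storage.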
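
To make the %Cpu(s) fields less abstract, here is a minimal Python sketch of how such percentages can be derived from the kernel's tick counters in /proc/stat. The function names (parse_cpu_line, cpu_percentages, read_cpu_sample) are made up for this illustration, and the arithmetic is a simplification rather than a copy of what top does internally.

```python
# Field order of the aggregate "cpu" line in /proc/stat (first eight counters).
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Return the first eight tick counters of an aggregate 'cpu' line as a dict."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Pad with zeros in case the kernel reports fewer counters.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Share of elapsed ticks spent in each state between two samples."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: round(100.0 * d / total, 1) for k, d in deltas.items()}


def read_cpu_sample(path="/proc/stat"):
    """Read one sample from a live Linux system (assumes /proc is mounted)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

On a live system you would call read_cpu_sample(), wait a second, call it again, and pass both samples to cpu_percentages(). The counters only ever grow, so it is the difference between two samples that tells you where the CPU spent its time.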

2. htop: The Professional's Choice

If top is a basic monitor, htop is an OLED dashboard. It supports color, mouse clicks, and horizontal scrolling. (You usually need to install it; on Debian or Ubuntu: sudo apt install htop).

Why htop is better:

  • Visual Bars: Instantly see CPU usage per core.
  • Process Killing: You can search for a process (F3) and kill it (F9) directly without typing its PID.
  • Tree View: See which process started which sub-process (F5).
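
The tree view boils down to grouping processes by their parent PID. As a rough illustration (not htop's actual code), the sketch below builds an indented tree from a list of (pid, ppid, name) tuples. The function names and the sample data in the usage note are invented for the example.

```python
from collections import defaultdict


def build_tree(procs):
    """Group (pid, ppid, name) tuples into {ppid: [(pid, name), ...]}."""
    children = defaultdict(list)
    for pid, ppid, name in procs:
        children[ppid].append((pid, name))
    for kids in children.values():
        kids.sort()  # stable, PID-ordered output
    return children


def render_tree(procs, root_ppid=0):
    """Return indented lines, starting from processes whose parent is root_ppid."""
    children = build_tree(procs)
    lines = []

    def walk(ppid, depth):
        for pid, name in children.get(ppid, []):
            lines.append("  " * depth + f"{pid} {name}")
            walk(pid, depth + 1)

    walk(root_ppid, 0)
    return lines
```

For example, render_tree([(1, 0, "systemd"), (200, 1, "sshd"), (201, 200, "bash")]) nests bash under sshd and sshd under systemd. That parent-child chain is what you need when deciding which process to stop.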

3. free: Understanding Your RAM

The free command shows you how much memory is used and available.

free -h # -h prints human-readable units (Mi/Gi) instead of raw KiB

The "RAM Mystery": Used vs. Available

You might see that your system has 16GB of RAM but only 500MB listed as "free," with most of the rest sitting under "buff/cache." Don't panic!

Linux is very smart. It uses spare RAM to "Cache" frequently used files from the disk because reading from RAM is orders of magnitude faster than reading from even an SSD.

  • free: Memory that is literally doing nothing.
  • buff/cache: Memory used to speed up your system (most of it can be reclaimed quickly if an app needs it).
  • available: The "True" number. It is the kernel's estimate of how much memory new applications can use without swapping: roughly free plus the cache that could be reclaimed.
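
The three numbers above come from /proc/meminfo. The sketch below shows one way to recompute them in Python. It assumes buff/cache is Buffers + Cached + SReclaimable, which matches recent procps-ng versions of free as far as I know, and the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_KiB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        tokens = rest.split()
        if not sep or not tokens:
            continue
        info[name.strip()] = int(tokens[0])
    return info


def summarize_memory(info):
    """Return the figures that the 'free' command reports, in KiB."""
    # Assumption: buff/cache = Buffers + Cached + SReclaimable.
    buff_cache = (
        info.get("Buffers", 0) + info.get("Cached", 0) + info.get("SReclaimable", 0)
    )
    return {
        "total": info["MemTotal"],
        "free": info["MemFree"],
        "buff_cache": buff_cache,
        "available": info["MemAvailable"],
    }


def read_memory_summary(path="/proc/meminfo"):
    """Read the live values on a Linux system (assumes /proc is mounted)."""
    with open(path) as f:
        return summarize_memory(parse_meminfo(f.read()))
```

A system where "free" is tiny but "available" is large is healthy: the gap is cache that the kernel will give back on demand.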

4. Load Average: The "Bridge" Analogy

How do you know if a Load Average of 4.0 is bad? Imagine a bridge with 4 lanes.

  • Load 2.0: Half the lanes are full. Everything is smooth.
  • Load 4.0: The bridge is exactly full. No traffic jams, but no room for more.
  • Load 10.0: There are 4 cars on the bridge and 6 cars waiting in line. This is a bottleneck.

# Get the load average quickly without entering 'top'
uptime
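
The bridge analogy amounts to dividing the load by the number of cores. Here is a small Python sketch of that check. The 0.7 "comfortable" cutoff and the function names are assumptions chosen for illustration, not an official threshold.

```python
import os


def load_per_core(load, cores):
    """Normalize a load average by the number of CPU cores (the 'lanes')."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


def describe_load(load, cores):
    """Classify a load average relative to the core count."""
    ratio = load_per_core(load, cores)
    if ratio < 0.7:  # assumed cutoff for "plenty of headroom"
        return "comfortable"
    if ratio <= 1.0:
        return "at or near capacity"
    return "overloaded: tasks are queuing"


def current_status():
    """Classify the 5-minute load average of this machine (Unix only)."""
    _one, five, _fifteen = os.getloadavg()
    return describe_load(five, os.cpu_count() or 1)
```

With 4 cores, describe_load(2.0, 4) reports "comfortable", describe_load(4.0, 4) reports "at or near capacity", and describe_load(10.0, 4) reports an overload. These are the same three cases as the bridge.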

5. Practical: Finding the "Memory Hog"

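When your system is slow, run this command to list the top 5 memory-consuming processes:

# -b: batch mode, -n 1: print one snapshot and exit, -o +%MEM: sort by memory, highest first
# (inside interactive top, Shift+M applies the same sort)
# The default header takes 7 lines, so 12 lines leaves 5 processes
top -b -n 1 -o +%MEM | head -n 12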
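
If you want the same ranking inside a script, one option is to read each process's resident memory (VmRSS) from /proc using only the standard library. This is a sketch under the assumption of a Linux /proc filesystem, and the helper names are hypothetical.

```python
import os


def read_rss_kib(pid, proc_root="/proc"):
    """Return (name, rss_kib) for one PID, or None if it is unavailable."""
    name, rss = None, None
    try:
        with open(os.path.join(proc_root, str(pid), "status")) as f:
            for line in f:
                if line.startswith("Name:"):
                    name = line.split(":", 1)[1].strip()
                elif line.startswith("VmRSS:"):
                    rss = int(line.split()[1])  # value is reported in kB
    except OSError:
        return None  # process exited or is not readable
    if name is None or rss is None:
        return None  # kernel threads have no VmRSS line
    return name, rss


def list_processes(proc_root="/proc"):
    """Collect (pid, name, rss_kib) for every readable process."""
    rows = []
    for entry in os.listdir(proc_root):
        if entry.isdigit():
            info = read_rss_kib(entry, proc_root)
            if info is not None:
                rows.append((int(entry), info[0], info[1]))
    return rows


def top_memory(rows, n=5):
    """Return the n rows with the largest resident memory, biggest first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]
```

On a Linux host, top_memory(list_processes()) returns the five largest processes by resident memory. The numbers will not match top's %MEM column exactly, because top expresses the same RSS as a share of total RAM.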

6. Example: A Performance Threshold Alerter (Python)

If you are running a production server, you want an alert before the RAM runs out. Here is a Python script that checks CPU and memory usage, prints a warning when memory passes 75% or CPU passes 85%, and flags memory above 90% as critical.

import sys

try:
    import psutil  # Third-party library: pip install psutil
except ImportError:
    # The import itself is what fails when psutil is missing, so catch it here
    sys.exit("This script requires the 'psutil' library. Run: pip install psutil")

# Alert thresholds, in percent
MEM_CRITICAL = 90
MEM_WARNING = 75
CPU_WARNING = 85


def check_system_vitals():
    """Print current CPU and memory usage and warn when a threshold is exceeded."""
    # 1. CPU check: average utilization sampled over one second
    cpu_usage = psutil.cpu_percent(interval=1)

    # 2. Memory check: 'percent' is derived from available memory,
    #    so reclaimable cache does not count as used
    mem = psutil.virtual_memory()
    mem_percent = mem.percent

    print("Current Vital Signs:")
    print(f"  CPU Usage: {cpu_usage}%")
    print(f"  RAM Usage: {mem_percent}% ({mem.available / (1024**3):.2f} GB available)")

    # Alert logic
    if mem_percent > MEM_CRITICAL:
        print("\n[!!! DANGER !!!] Memory usage is critical!")
    elif mem_percent > MEM_WARNING:
        print("\n[!] WARNING: Memory usage is high.")

    if cpu_usage > CPU_WARNING:
        print("[!] WARNING: CPU usage is high.")


if __name__ == "__main__":
    check_system_vitals()

7. Professional Tip: Use 'nmon' for Long-term Analysis

If you need to monitor a server for an entire day to find a "hidden" performance spike, use nmon. In recording mode it writes system stats to a comma-separated .nmon file, which you can later load into a spreadsheet tool such as Excel to see graphs of your server's health over time.


8. Summary

Performance monitoring is about distinguishing between "Healthy Stress" and "System Failure."

  • Use top for a quick check.
  • Use htop for active troubleshooting.
  • Look at the available memory, not just the free memory.
  • Monitor Load Average relative to your number of CPU cores.

In the next lesson, we will shift from temporary RAM to permanent storage as we learn to Monitor Disk Usage with df and du.

Quiz Questions

  1. What does the "wa" (I/O Wait) metric in top indicate?
  2. If a server has 8 CPU cores, is a Load Average of 6.0 cause for alarm?
  3. Why does Linux use "Free" RAM for "Cache"?

Continue to Lesson 3: Monitoring Disk Usage—df, du, and lsblk.
