The Heart Rate Monitor: Advanced top and htop

Master the most common tools in the Linux world. Go beyond basic sorting. Learn to use 'top' in batch mode, color-code your 'htop' view, and understand the difference between VIRT, RES, and SHR memory. Identify which process is truly killing your server.

Mastering top and htop: Reading the Vitals

When you type top, you are opening the dashboard of your server's soul. But most people only look at the %CPU column. If that's all you do, you are missing most of the information you need to fix a system.

In this lesson, we will move from "Amateur" to "Expert" at reading process lists. We will explore the modern, colorful htop and learn how to extract data from the classic top.


1. top vs. htop: Which should you use?

  • top: Pre-installed on nearly every Linux system. Use it when you are on a strange server or when you need to "Batch" data into a script.
  • htop: Visual, colorful, and interactive, but it usually has to be installed from your distribution's package manager. Use it as your primary dashboard for daily monitoring. It shows every CPU core individually and allows you to "Kill" processes with a single key.

2. Decoding the Memory Columns

This is the #1 area of confusion in Linux performance.

  • VIRT (Virtual): The total address space the app has mapped. This includes memory-mapped files, shared libraries, swapped-out pages, and memory it has reserved but never touched. A large VIRT is rarely a problem on its own, so you can usually ignore this number.
  • RES (Resident): How much memory the app is actually holding in your physical RAM chips right now. This is the important number.
  • SHR (Shared): The part of RES that could be shared with other processes (like standard libraries and memory-mapped files).
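
To see where these three numbers come from, here is a minimal Python sketch that reads them for the current process. It assumes a Linux /proc filesystem, where /proc/<pid>/statm lists sizes in pages (total, resident, shared). The helper name parse_statm is made up for this illustration.

```python
import os

def parse_statm(text, page_size):
    """Convert the first three fields of /proc/<pid>/statm from pages to KiB."""
    size, resident, shared = (int(v) for v in text.split()[:3])
    to_kib = page_size // 1024
    return {"VIRT": size * to_kib, "RES": resident * to_kib, "SHR": shared * to_kib}

if __name__ == "__main__":
    path = "/proc/self/statm"  # Linux only
    if os.path.exists(path):
        with open(path) as f:
            print(parse_statm(f.read(), os.sysconf("SC_PAGE_SIZE")))
```

Run it and compare the output with the row for the same PID in top: VIRT should be far larger than RES, and SHR should be no bigger than RES.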

3. Advanced top Shortcuts (The Secret Menu)

While top is running, press these keys:

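  • M: Sort by memory usage (%MEM, which is based on RES).
  • P: Sort by CPU usage (%CPU).
  • 1: Toggle between the combined CPU summary and one line per CPU core.
  • c: Toggle between the program name and the full command line (to see which script is running).
  • k: Kill a process. top prompts for the PID and then for the signal to send (SIGTERM by default).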
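
If you want the same "full command" information inside a script, the kernel exposes it in /proc/<pid>/cmdline as arguments separated by NUL bytes. The following is a small sketch under that assumption (Linux only); format_cmdline is a hypothetical helper name, not part of any library.

```python
import os

def format_cmdline(raw):
    """Turn the NUL-separated bytes from /proc/<pid>/cmdline into one readable string."""
    args = [a.decode(errors="replace") for a in raw.split(b"\x00") if a]
    return " ".join(args)

if __name__ == "__main__":
    path = "/proc/self/cmdline"  # Linux only
    if os.path.exists(path):
        with open(path, "rb") as f:
            print(format_cmdline(f.read()))
```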

4. htop: Visual Troubleshooting

htop allows you to see the "Tree" of processes. If you see 50 php-fpm processes, htop will show you the "Parent" process that created them.

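  • F5 (Tree View): Groups child processes under the parent that started them, so you can see which service owns a runaway worker.
  • F6 (Sort): Opens a menu where you pick the sort column, such as CPU%, MEM%, or a disk read column if you have added one in Setup (F2).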
  • F9 (Kill): Send a signal like SIGTERM (Nice kill) or SIGKILL (Force kill).
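
The same escalation can be scripted: ask politely with SIGTERM, wait, and only then force the issue with SIGKILL. The sketch below shows one way to do that in Python for a child process you started yourself. The function name stop_process and the 5-second grace period are arbitrary choices for illustration, and the negative return code assumes a POSIX system.

```python
import signal
import subprocess
import sys

def stop_process(proc, grace_seconds=5.0):
    """Send SIGTERM to a child process; escalate to SIGKILL if it does not exit in time."""
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX; cannot be caught or ignored
        proc.wait()
        return "SIGKILL"

if __name__ == "__main__":
    # A stand-in for a misbehaving service: a child that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_process(child), child.returncode)
```

SIGTERM gives the program a chance to flush files and close connections. SIGKILL removes it immediately, so keep it as the last resort.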

5. Practical: The "Batch Mode" Capture

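What if your server gets slow at 3 AM? You aren't there to look at top. You can tell top to run in "Batch Mode" (-b), which prints plain-text snapshots that you can redirect to a file. The -n flag sets how many snapshots to take and -d sets the delay between them in seconds, so -d 60 gives you one snapshot per minute.

# Take 5 snapshots, one per minute, and save them to a file
top -b -d 60 -n 5 > top_report.txt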
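
Once you have a report file, you will usually want to look at one snapshot at a time. Here is a rough Python sketch, assuming the common procps version of top, where every batch iteration begins with a line starting with "top - ". The function name split_snapshots and the sample text are illustrative only, not real server output.

```python
def split_snapshots(report):
    """Split the text produced by `top -b` into one string per iteration."""
    snapshots = []
    current = []
    for line in report.splitlines():
        # A new header line closes the snapshot collected so far.
        if line.startswith("top - ") and current:
            snapshots.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        snapshots.append("\n".join(current).strip())
    return [s for s in snapshots if s]

if __name__ == "__main__":
    # Made-up, heavily shortened sample input.
    sample = (
        "top - 03:00:01 up 1 day\nTasks: 2 total\n\n"
        "top - 03:01:01 up 1 day\nTasks: 3 total\n"
    )
    for snap in split_snapshots(sample):
        print(snap.splitlines()[0])
```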

6. Identifying "Zombie" and "Sleeping" Processes

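In the header of top, the Tasks line counts processes by state.

  • Running (R): On a CPU right now, or ready to run as soon as one is free.
  • Sleeping (S): Waiting for something to happen (normal).
  • Zombie (Z): A dead process whose exit status hasn't been collected by its parent. A zombie holds no memory, only a slot in the process table, so a handful is harmless. If you see 1,000 Zombies, the parent process has a bug: it is not cleaning up after its children.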
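
You can reproduce that count yourself. The sketch below is a minimal Python example that assumes a Linux /proc filesystem, where the field right after the parenthesized command name in /proc/<pid>/stat is the one-letter state. The names parse_state and count_states are hypothetical helpers for this lesson.

```python
import os
from collections import Counter

def parse_state(stat_line):
    """Return the one-letter state (R, S, Z, ...) from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces or ")",
    # so split on the last ")" and take the first field after it.
    return stat_line.rsplit(")", 1)[1].split()[0]

def count_states(proc_dir="/proc"):
    """Count processes by state by scanning the numeric directories in /proc."""
    counts = Counter()
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_dir, entry, "stat"), errors="replace") as f:
                counts[parse_state(f.read())] += 1
        except OSError:
            pass  # the process exited while we were scanning
    return counts

if __name__ == "__main__":
    if os.path.isdir("/proc"):
        print(count_states())
```

A steadily growing Z count between runs is the signal worth chasing. Look at the parent of those zombies, not at the zombies themselves.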

7. Example: A Top-Process Logger (Python)

If your server becomes stressed, you want to know which process was the culprit. Here is a Python script that takes a "Snapshot" of the top 3 offenders and identifies them by name and memory usage.

import psutil

def get_top_offenders():
    """
    Finds the 3 processes using the most RAM.
    """
    print("--- Top 3 Memory Consumers ---")
    
    processes = []
    for proc in psutil.process_iter(['pid', 'name', 'memory_info']):
        # psutil reports None for fields it is not allowed to read
        if proc.info['memory_info'] is not None:
            processes.append(proc.info)

    # Sort by RSS (Resident Set Size - actual RAM)
    top_3 = sorted(processes, key=lambda x: x['memory_info'].rss, reverse=True)[:3]

    for p in top_3:
        # Convert bytes to MB
        ram_mb = p['memory_info'].rss / (1024 * 1024)
        name = p['name'] or '?'
        print(f"PID: {p['pid']:5} | NAME: {name:15} | RAM: {ram_mb:8.2f} MB")

if __name__ == "__main__":
    get_top_offenders()

8. Professional Tip: Check 'ni' (Nice)

In the top header, look for ni on the %Cpu(s) line. This is the share of CPU time spent on "Nice" processes: tasks that have been given a lower priority (a positive nice value). If your server looks busy, but ni is high, it means the server is doing background work (like backups) that the scheduler will push aside when higher-priority work needs the CPU. Don't panic when you see 100% CPU usage if it's mostly in the ni category!
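
The ni figure is a difference between two readings of the kernel's CPU counters. As a hedged sketch, assuming Linux, where the first line of /proc/stat lists cumulative ticks in the order user, nice, system, idle, iowait, irq, softirq, steal, the calculation looks roughly like this. The function names and the half-second sampling window are illustrative choices.

```python
import os
import time

def cpu_fields(line):
    """Return the first eight counters from the aggregate 'cpu' line of /proc/stat."""
    return [int(v) for v in line.split()[1:9]]

def nice_percent(before, after):
    """Percentage of CPU time spent on niced tasks between two 'cpu' lines."""
    deltas = [b - a for a, b in zip(cpu_fields(before), cpu_fields(after))]
    total = sum(deltas)
    return 100.0 * deltas[1] / total if total else 0.0  # index 1 is the nice counter

def read_cpu_line(path="/proc/stat"):
    with open(path) as f:
        return f.readline()

if __name__ == "__main__":
    if os.path.exists("/proc/stat"):  # Linux only
        first = read_cpu_line()
        time.sleep(0.5)
        print(f"ni: {nice_percent(first, read_cpu_line()):.1f}%")
```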


9. Summary

Monitoring is about knowing which number to ignore.

  • RES is your real memory usage.
  • htop is the best interactive tool; top -b is the best automation tool.
  • Tree View helps identify parent/child issues.
  • Load Average measures how long the line is; %CPU measures how busy the desk is.
  • Zombies indicate a programming bug in a parent service.

In the next lesson, we will look at the hidden bottleneck: iotop, iostat, and the Disk Killers.

Quiz Questions

  1. Why is the "VIRT" number in top often much larger than the actual RAM in the server?
  2. What is the difference between a SIGTERM (15) and a SIGKILL (9)?
  3. How do you find the "Full Path" of a command in top?

Continue to Lesson 3: Disk Performance—Finding the Disk Killers with iotop.
