The Quiet Bottleneck: iotop and iostat

Solve the mysterious slowness. Master the tools for disk performance analysis: learn to use 'iostat' to measure disk throughput and latency, and 'iotop' to identify exactly which process is thrashing your drive. Understand why a saturated disk can leave a 16-core CPU sitting idle.

Disk I/O: Finding the Silent Killer

Have you ever looked at top and seen that your CPU is 98% Idle, but the server still feels like it's stuck in thick mud? You click a button, and it takes 10 seconds to respond.

More often than not, this is a Disk I/O Bottleneck.

The CPU is fast, but storage (especially a traditional spinning HDD) is slow. If a database is trying to read 1 million records while a backup script is trying to write 10GB of data, every process that needs the disk simply waits for the mechanical drive to catch up. Time the CPU spends idle while disk requests are still outstanding is reported as %iowait. In this lesson, we will learn to find the specific processes thrashing your disks.


1. iostat: Seeing the Big Picture

iostat (part of the sysstat package) tells you what your physical disks are doing.

# -x: Extended stats, -z: Skip idle disks, 1: Refresh every 1 second
sudo iostat -xz 1

The Three Columns that Matter:

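  1. %util: The most important. It is the percentage of time the device had at least one request in flight. On a single spinning disk, 100% means it is working as fast as it physically can and any new request has to wait in line. SSDs, NVMe drives and RAID arrays serve many requests in parallel, so they can show 100% while still having headroom; confirm with await.
  2. await: The average time (in milliseconds) for a disk request to complete, including time spent waiting in the queue. Newer sysstat releases split it into r_await and w_await. As a rough rule of thumb, await consistently above 10ms on an SSD or 50ms on an HDD points to a serious problem.
  3. rkB/s and wkB/s: How many kilobytes are read or written per second.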
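iostat derives these columns from cumulative counters the kernel exposes in /proc/diskstats. As a rough sketch of that arithmetic, assuming Linux and the standard /proc/diskstats field layout (which always counts 512-byte sectors, whatever the drive's physical sector size), the Python below takes two samples one second apart and computes %util, await and read/write throughput per device. The helper names read_diskstats and disk_rates are ours, not part of any tool.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def read_diskstats(path="/proc/diskstats"):
    """Return {device: [11 counters]} taken from the fields after the device name."""
    stats = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 14:
                continue  # very old kernels print a shorter format for partitions
            stats[fields[2]] = [int(x) for x in fields[3:14]]
    return stats


def disk_rates(before, after, interval_s):
    """Compute iostat-style metrics for one device from two counter lists.

    Counter order: reads, reads_merged, sectors_read, ms_reading,
    writes, writes_merged, sectors_written, ms_writing,
    in_flight, ms_doing_io, weighted_ms.
    in_flight (index 8) is a gauge, not a counter, so its delta is unused.
    """
    d = [a - b for a, b in zip(after, before)]
    ios = d[0] + d[4]  # completed reads + completed writes
    return {
        "rkB/s": d[2] * SECTOR_BYTES / 1024 / interval_s,
        "wkB/s": d[6] * SECTOR_BYTES / 1024 / interval_s,
        # Average ms per completed request, including time spent queued.
        "await": (d[3] + d[7]) / ios if ios else 0.0,
        # Share of wall-clock time the device had at least one request in flight.
        "%util": 100.0 * d[9] / (interval_s * 1000),
    }


if __name__ == "__main__":
    t0 = read_diskstats()
    time.sleep(1)
    t1 = read_diskstats()
    for dev in t1:
        if dev in t0:
            print(dev, disk_rates(t0[dev], t1[dev], 1.0))
```

Because %util only records whether anything was in flight, a drive that serves many requests at once can reach 100% long before its await rises, which is why the two columns are worth reading together.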

2. iotop: Finding the Culprit

iostat tells you that the disk is busy; iotop tells you who is making it busy. It looks just like top, but it sorts processes by their disk read and write rates, and it needs root privileges to run.

# -o: Only show active processes, -P: Show processes (not threads)
sudo iotop -oP

Common Disk Killers:

  • Database Indexing: Look for mysqld or postgres.
  • System Updates: Look for apt, dnf, or dpkg.
  • Journal/Logs: Look for systemd-journald or rsyslogd.
  • File Indexing and Searches: Look for updatedb (which builds the locate database) or find.
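On Linux, the kernel's per-process I/O counters live in /proc/<pid>/io (reading another user's file requires root). In that file, read_bytes and write_bytes count bytes actually fetched from or sent to the storage layer, while rchar and wchar also include reads and writes served by the page cache. A minimal parser, assuming the usual "key: value" layout; parse_proc_io and proc_io are hypothetical helper names:

```python
import os


def parse_proc_io(text):
    """Turn the 'key: value' lines of /proc/<pid>/io into {key: int}."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)  # int() tolerates surrounding spaces
    return counters


def proc_io(pid):
    """Read and parse /proc/<pid>/io for one process (Linux only)."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())


if __name__ == "__main__":
    # Our own process is readable without root.
    io = proc_io(os.getpid())
    print("bytes read from storage:", io["read_bytes"])
    print("bytes sent to storage:  ", io["write_bytes"])
```

Sampling write_bytes twice and dividing by the interval gives a per-process write rate, which is the kind of number iotop shows in its disk write column.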

3. The iowait Trap

%wa in the top header is the share of time the CPU sat idle while disk I/O was outstanding: it has nothing to run because the processes it could run are all waiting on the disk. That is why adding more CPU cores to a server with high iowait won't make it any faster! You need a faster disk (SSD/NVMe) or you need to reduce the amount of disk I/O your application does.
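The %wa figure comes from the iowait column of the aggregate "cpu" line in /proc/stat. A sketch that measures it over one second, assuming the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal, followed by guest counters that are already included in user and nice, so they are skipped here); read_cpu_line and iowait_percent are our own names:

```python
import time


def read_cpu_line(path="/proc/stat"):
    """Return the first eight jiffy counters of the aggregate 'cpu' line."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                # user nice system idle iowait irq softirq steal; guest fields are
                # skipped because they are already counted inside user and nice.
                return [int(x) for x in line.split()[1:9]]
    raise RuntimeError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent idle with I/O outstanding between two samples."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


if __name__ == "__main__":
    t0 = read_cpu_line()
    time.sleep(1)
    t1 = read_cpu_line()
    print(f"iowait: {iowait_percent(t0, t1):.1f}%")
```

A high value here alongside a high idle percentage is the signature of the "idle but sluggish" server described at the start of this lesson.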


4. Practical: Limiting the Damage with ionice

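If you have a background task (like a backup) that is slowing down your website, you can tell the Linux kernel: "Only let this task use the disk when nothing else wants it." That is what ionice's idle class does. I/O priorities only take effect with an I/O scheduler that supports them: BFQ does, while the none scheduler, common on NVMe drives, does not.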

# Run a backup script with 'Idle' disk priority
# Class 3: Idle priority
sudo ionice -c 3 rsync -a /data /backup
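Whether the idle class makes a real difference depends on the I/O scheduler attached to each device. BFQ implements I/O priority classes; with none the class is ignored, and support in other schedulers varies by kernel version. The active scheduler is the bracketed entry in /sys/block/<device>/queue/scheduler. A small sketch that lists it per device, assuming Linux sysfs; active_scheduler and list_schedulers are our own helper names:

```python
import os


def active_scheduler(text):
    """Return the bracketed (active) entry of a queue/scheduler file, e.g. 'bfq'."""
    names = text.split()
    for name in names:
        if name.startswith("["):
            return name.strip("[]")
    # A device offering a single option may print it without brackets.
    return names[0] if len(names) == 1 else None


def list_schedulers(sys_block="/sys/block"):
    """Map each block device to its active I/O scheduler."""
    result = {}
    for dev in sorted(os.listdir(sys_block)):
        path = os.path.join(sys_block, dev, "queue", "scheduler")
        try:
            with open(path) as f:
                result[dev] = active_scheduler(f.read())
        except OSError:
            continue  # device without a request queue
    return result


if __name__ == "__main__":
    for dev, sched in list_schedulers().items():
        print(f"{dev}: {sched}")
```

If the kernel provides BFQ, root can switch a device to it by writing bfq into that same scheduler file. For a process that is already running, ionice -c 3 -p <PID> applies the idle class without restarting it.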

5. Identifying "Block Device" Congestion

Requests waiting for a block device sit in a queue managed by the kernel's I/O scheduler, and iostat reports how many requests are queued or being serviced on average.

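# Look at the 'avgqu-sz' column in iostat ('aqu-sz' in newer sysstat releases).
# On a single HDD, a value consistently above ~2 usually means the disk is saturated;
# NVMe drives routinely run much deeper queues.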
sudo iostat -x 1
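The queue-size column is derived from the "weighted time spent doing I/Os" counter in /proc/diskstats (the 14th field). The kernel grows that counter by the number of requests in flight multiplied by the milliseconds elapsed, so its growth per millisecond of wall-clock time is the average queue depth. A sketch of that arithmetic, assuming Linux; the device name sda and the helper names are hypothetical placeholders:

```python
import time


def weighted_io_ms(device, path="/proc/diskstats"):
    """Return the cumulative weighted I/O time (ms) for one device."""
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 14 and fields[2] == device:
                return int(fields[13])
    raise KeyError(device)


def avg_queue_size(weighted_before, weighted_after, interval_s):
    """Average number of requests queued or in service over the interval."""
    return (weighted_after - weighted_before) / (interval_s * 1000)


if __name__ == "__main__":
    dev = "sda"  # hypothetical device name; use one listed by 'iostat -x'
    w0 = weighted_io_ms(dev)
    time.sleep(1)
    w1 = weighted_io_ms(dev)
    print(f"{dev} aqu-sz: {avg_queue_size(w0, w1, 1.0):.2f}")
```

For example, if the weighted counter grows by 3000 ms during a one-second window, the device averaged three outstanding requests over that second.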

6. Example: An I/O Hog Alert (Python)

Ideally, a process that writes more than 10MB/s for more than a minute should trigger an alert. As a starting point, here is a Python script that uses psutil to sample per-process write rates over two seconds and list the heaviest writers.

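import time

import psutil


def snapshot_write_bytes():
    """Return {pid: write_bytes} for every process we are allowed to inspect."""
    counters = {}
    for p in psutil.process_iter():
        try:
            counters[p.pid] = p.io_counters().write_bytes
        except (psutil.NoSuchProcess, psutil.AccessDenied, AttributeError):
            # Process exited, belongs to another user (run as root to see all),
            # or this platform has no per-process I/O counters.
            continue
    return counters


def track_io_hogs(interval=2.0, threshold_kb=100):
    """Print processes writing more than threshold_kb KB/s, busiest first."""
    print("--- Disk I/O Watchdog ---")

    before = snapshot_write_bytes()
    time.sleep(interval)  # sampling window used to turn byte counts into a rate
    after = snapshot_write_bytes()

    rates = []
    for pid, written in after.items():
        if pid in before:
            # Rate = (new write bytes - old write bytes) / seconds elapsed
            kb_per_s = (written - before[pid]) / interval / 1024
            if kb_per_s > threshold_kb:
                rates.append((kb_per_s, pid))

    print(f"{'PID':<7} | {'NAME':<15} | WRITE RATE")
    for kb_per_s, pid in sorted(rates, reverse=True):
        try:
            name = psutil.Process(pid).name()
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            name = "?"
        print(f"{pid:<7} | {name:<15} | {kb_per_s:.2f} KB/s")


if __name__ == "__main__":
    track_io_hogs()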
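The watchdog above takes a single two-second sample. To get closer to the "more than 10MB/s for more than a minute" rule, one option is to keep a short history of rates per PID and alert only when every recent sample is over the threshold. SustainedIoAlert below is a hypothetical helper, not part of psutil; feed it one {pid: bytes_per_second} dict per sample, for example the per-PID write deltas divided by the interval (before converting to KB/s).

```python
from collections import defaultdict, deque


class SustainedIoAlert:
    """Flag PIDs whose write rate stays above a threshold for N consecutive samples.

    With a 2-second sample interval, window=30 approximates one minute.
    """

    def __init__(self, threshold_bps=10_000_000, window=30):  # 10 MB/s
        self.threshold_bps = threshold_bps
        self.window = window
        self.history = defaultdict(lambda: deque(maxlen=window))

    def update(self, rates):
        """Record one sample and return the sorted PIDs that are sustained hogs."""
        # Forget PIDs missing from this sample (exited or filtered out).
        for pid in list(self.history):
            if pid not in rates:
                del self.history[pid]
        alerts = []
        for pid, bps in rates.items():
            samples = self.history[pid]
            samples.append(bps)
            if len(samples) == self.window and min(samples) > self.threshold_bps:
                alerts.append(pid)
        return sorted(alerts)


if __name__ == "__main__":
    alert = SustainedIoAlert(threshold_bps=100, window=3)
    for sample in ({1: 500, 2: 50}, {1: 600, 2: 500}, {1: 700, 2: 900}):
        print(alert.update(sample))  # prints [], [], then [1]
```

Requiring every sample in the window to exceed the threshold keeps short bursts (a log rotation, a quick cache flush) from paging anyone, at the cost of reacting a full window late.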

7. Professional Tip: Check 'Dirty Pages'

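When you write a file in Linux, the data usually isn't written to the disk immediately. It goes into the page cache in RAM first (as "Dirty Pages") and the kernel writes it back in batches. Once dirty data passes vm.dirty_background_ratio, background writeback starts; if it reaches vm.dirty_ratio, processes that are writing get throttled until enough has been flushed, which can feel like the system suddenly "freezing". You can tune these thresholds with sysctl.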
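To see how much dirty data is waiting right now, compare the Dirty and Writeback lines of /proc/meminfo with the vm.dirty_background_ratio and vm.dirty_ratio settings, which are readable under /proc/sys/vm/. A minimal sketch, assuming Linux; parse_meminfo and read_sysctl are our own helper names. If the vm.dirty_bytes or vm.dirty_background_bytes variants have been set, the corresponding ratio reads as 0.

```python
def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-style text (most values are kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key] = int(parts[0])
    return info


def read_sysctl(name):
    """Read an integer sysctl such as vm.dirty_ratio via /proc/sys."""
    with open("/proc/sys/" + name.replace(".", "/")) as f:
        return int(f.read())


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    print(f"Dirty:     {mem['Dirty'] / 1024:.1f} MiB")
    print(f"Writeback: {mem['Writeback'] / 1024:.1f} MiB")
    # Both ratios are percentages of reclaimable ("dirtyable") memory, not total RAM.
    for knob in ("vm.dirty_background_ratio", "vm.dirty_ratio"):
        print(f"{knob} = {read_sysctl(knob)}%")
```

If Dirty regularly climbs toward the dirty_ratio limit just before the system stalls, lowering the thresholds makes writeback start earlier and in smaller batches.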


8. Summary

Disk I/O is the "Friction" of your system.

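  • %util at 100% on a spinning disk means your hardware is the bottleneck (on SSDs and RAID arrays, confirm with await).
  • await tells you how long each request takes (in milliseconds), which is the latency your users feel.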
  • iotop identifies the specific "Hog."
  • ionice allows you to preserve performance for important tasks.
  • iowait in top is a signal to stop looking at the CPU and start looking at the disk.

In the next lesson, we will move to the invisible wires: Network Optimization and TCP Tuning.

Quiz Questions

  1. Why does high "Disk Wait" make the system feel slow even if CPU usage is low?
  2. What is the difference between iostat and iotop?
  3. How can you use ionice to prevent a system backup from slowing down a live database?

Continue to Lesson 4, Network Optimization: TCP Tuning and Buffers.
