
The Immortal Disk: RAID and Redundancy
Prepare for the inevitable. Hardware fails, but your data doesn't have to. Master Linux Software RAID with 'mdadm'. Learn the differences between RAID 1 (Mirroring), RAID 5 (Parity), and RAID 10. Understand how to replace a failed drive without losing a single bit.
RAID: Redundant Array of Independent Disks
Every hard drive in existence has a 100% chance of failing eventually. It is not a matter of "If," but "When." In a high-stakes server environment, you cannot afford to wait for a backup restoration. You need the server to keep running even if a drive physically explodes.
This is the job of RAID.
RAID combines multiple physical disks into a single "Super-Disk" that can survive the death of one (or more) of its members. In Linux, we use the powerful mdadm utility to create "Software RAID."
1. The RAID Levels: Choosing Your Shield
| Level | Name | Minimum Disks | Usable Storage | Survivability |
|---|---|---|---|---|
| 0 | Striping | 2 | 100% | Zero. If one disk dies, ALL data is gone. Fast, but dangerous. |
| 1 | Mirroring | 2 | 50% (with 2 disks) | High. Disks are identical copies. One can die without downtime. |
| 5 | Parity | 3 | (N-1)/N: 67% with 3 disks, 90% with 10 | Standard. Can lose 1 disk. Uses math (parity) to rebuild data. |
| 10 | 1 + 0 | 4 | 50% | Extreme. Speed of RAID 0 and safety of RAID 1: survives one failure per mirrored pair. |
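The storage column follows from simple arithmetic on the number of disks. Here is a minimal sketch of that arithmetic, assuming equal-sized disks and the usual two-copy layout for RAID 10. The function name `usable_capacity` is hypothetical, and RAID 6 is included only because the summary mentions it.

```python
def usable_capacity(level: int, disks: int, disk_size_tb: float) -> float:
    """Return usable capacity in TB for an array of equal-sized disks."""
    # Minimum disk counts as listed in the table (RAID 6 added as an assumption).
    minimum = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == 0:
        return disks * disk_size_tb          # no redundancy at all
    if level == 1:
        return disk_size_tb                  # every disk holds the same copy
    if level == 5:
        return (disks - 1) * disk_size_tb    # one disk's worth of parity
    if level == 6:
        return (disks - 2) * disk_size_tb    # two disks' worth of parity
    return disks * disk_size_tb / 2          # RAID 10: two copies of each block


if __name__ == "__main__":
    for level, disks in [(0, 2), (1, 2), (5, 3), (5, 10), (10, 4)]:
        cap = usable_capacity(level, disks, 1.0)
        print(f"RAID {level:>2} x{disks}: {cap:.0f} TB usable ({cap / disks:.0%})")
```

The RAID 5 fraction grows with the number of disks, which is why the table cannot give a single percentage for it.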
2. Practical: Creating a RAID 1 Mirror
Let's imagine we have two identical 1TB drives: /dev/sdb and /dev/sdc.
# 1. Create the RAID device 'md0'
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
# 2. Watch the initial sync happen
cat /proc/mdstat
# 3. Format and use it!
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /mnt/raid_storage
sudo mount /dev/md0 /mnt/raid_storage
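While the initial sync runs, /proc/mdstat shows a progress line for the array. The sketch below pulls the percentage and the estimated time left out of that line. The function name `sync_progress` is hypothetical, and the sample text uses made-up numbers that only mimic the usual layout, so treat the pattern as an assumption to check against your own kernel's output.

```python
import re

# Matches e.g. "resync = 12.6% (...) finish=85.3min speed=..."
PROGRESS_RE = re.compile(
    r"(?P<action>resync|recovery|reshape|check)\s*=\s*(?P<percent>[\d.]+)%"
    r".*?finish=(?P<finish>[\d.]+)min"
)


def sync_progress(mdstat_text):
    """Return (action, percent_done, minutes_left), or None when no sync is running."""
    match = PROGRESS_RE.search(mdstat_text)
    if match is None:
        return None
    return match["action"], float(match["percent"]), float(match["finish"])


# Illustrative sample only; the numbers are invented.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdc[1] sdb[0]
      976630464 blocks super 1.2 [2/2] [UU]
      [==>..................]  resync = 12.6% (123456789/976630464) finish=85.3min speed=166666K/sec

unused devices: <none>
"""

if __name__ == "__main__":
    print(sync_progress(SAMPLE))
```

The array is usable during the sync, so formatting and mounting do not have to wait for it to finish.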
3. The "Hot Spare": Automated Repair
A professional RAID setup includes a Hot Spare. This is a disk that sits in the server, attached to the array but holding no data, waiting. If a drive in the RAID array fails, the kernel's md driver immediately swaps the spare in and starts rebuilding the data before you even realize there's a problem.
# Adding a hot spare to an existing array
sudo mdadm /dev/md0 --add /dev/sdd
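To confirm that the new disk really is a spare, look at the array's line in /proc/mdstat: spares carry an (S) suffix and failed members an (F) suffix. This is a small sketch that sorts the members of one such line by role. The function name `member_roles` is hypothetical and the sample line is illustrative.

```python
import re

# A member looks like "sdd[2](S)": device name, slot number, optional flags.
MEMBER_RE = re.compile(r"(?P<dev>[\w-]+)\[(?P<slot>\d+)\](?P<flags>(?:\([A-Z]\))*)")


def member_roles(md_line):
    """Split one 'mdX : ...' line into active, spare and faulty members."""
    roles = {"active": [], "spare": [], "faulty": []}
    for m in MEMBER_RE.finditer(md_line):
        flags = m["flags"]
        if "(F)" in flags:
            roles["faulty"].append(m["dev"])
        elif "(S)" in flags:
            roles["spare"].append(m["dev"])
        else:
            roles["active"].append(m["dev"])
    return roles


if __name__ == "__main__":
    # Illustrative line for a two-disk mirror with one spare attached.
    print(member_roles("md0 : active raid1 sdd[2](S) sdc[1] sdb[0]"))
```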
4. Disaster Recovery: Replacing a Failed Drive
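If a disk fails, the kernel marks it as faulty, shown as (F) in /proc/mdstat, and the array is reported as "DEGRADED."
- Mark the disk as failed, if the kernel has not already done so:
sudo mdadm /dev/md0 --fail /dev/sdb
- Remove the bad disk:
sudo mdadm /dev/md0 --remove /dev/sdb
- Physically swap the drive.
- Add the new disk:
sudo mdadm /dev/md0 --add /dev/sdb
- The rebuild starts automatically.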
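What the rebuild does depends on the level. A mirror simply copies from the surviving disk, while RAID 5 recomputes the missing blocks from parity. The toy sketch below shows the parity idea with XOR on three small blocks. It is only an illustration: a real array works on large chunks and rotates the parity across all disks, and the helper name `xor_blocks` is made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, one per disk, plus the parity block computed from them.
data = [b"LINUX", b"RAID5", b"MDADM"]
parity = xor_blocks(data)

# The disk holding data[1] dies: XOR the survivors with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

if __name__ == "__main__":
    print(rebuilt == data[1])
```

With two blocks missing there is a single parity equation and two unknowns, so the same trick cannot recover them. That is why RAID 5 tolerates exactly one failed disk.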
5. Checking the Health
The raw "Truth" about your RAID is always found in the kernel's virtual file:
cat /proc/mdstat
If you see [UU], you are healthy. If you see [_U], one disk is missing!
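On a machine with several arrays it helps to know which one is unhealthy. The sketch below maps each array name to the two fields at the end of its status line: the [expected/working] device counts and the [UU] flags. The function name `array_health` is hypothetical and the sample text is illustrative, not captured from a real system.

```python
import re

HEADER_RE = re.compile(r"^(md\w+)\s*:")                    # "md0 : active raid1 ..."
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")   # "[2/1] [_U]"


def array_health(mdstat_text):
    """Map each array name to (expected_devices, working_devices, flags)."""
    health = {}
    current = None
    for line in mdstat_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected, working, flags = status.groups()
            health[current] = (int(expected), int(working), flags)
    return health


# Illustrative sample: one healthy RAID 5 and one degraded mirror.
SAMPLE = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid5 sdg[3] sdf[1] sde[0]
      2093056 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

md0 : active raid1 sdc[1] sdb[0](F)
      1046528 blocks super 1.2 [2/1] [_U]

unused devices: <none>
"""

if __name__ == "__main__":
    for name, (expected, working, flags) in array_health(SAMPLE).items():
        state = "DEGRADED" if "_" in flags else "healthy"
        print(f"{name}: {working}/{expected} devices [{flags}] {state}")
```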
6. Example: A RAID Health Notifier (Python)
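You should never have to manually check /proc/mdstat. Here is a Python script that checks the RAID status and prints a critical error if an array becomes degraded. It exits with a non-zero status in that case, so a cron job or systemd timer can turn the result into a notification.
import re
import sys

# Matches a member-status field such as [UU], [_U] or [U_U].
STATUS_RE = re.compile(r"\[([U_]+)\]")

def monitor_raid(path="/proc/mdstat"):
    """
    Parses /proc/mdstat for a status field such as '[_U]' indicating a failure.
    Returns True if every array is healthy, False otherwise.
    """
    print("--- RAID Watchdog Active ---")
    with open(path, "r") as f:
        content = f.read()
    # Look only inside status fields, so an underscore elsewhere in the
    # file (for example in a device name) cannot trigger a false alarm.
    degraded = [s for s in STATUS_RE.findall(content) if "_" in s]
    if degraded:
        print("[!!!] CRITICAL: RAID ARRAY DEGRADED!")
        print("      Check 'mdadm --detail /dev/md0' immediately.")
        return False
    print("[OK] All RAID arrays are healthy.")
    return True

if __name__ == "__main__":
    sys.exit(0 if monitor_raid() else 1)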
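The script above performs a single check. To watch continuously, one option is a polling loop that logs only when the state changes, so a degraded array does not produce an alert every minute. The sketch below assumes a hypothetical `watch` function that takes any zero-argument callable returning mdstat text. The logger name, the interval and the `max_checks` parameter are illustrative choices.

```python
import logging
import re
import time

DEGRADED_RE = re.compile(r"\[[U_]*_[U_]*\]")   # a status field containing "_"


def read_proc_mdstat():
    """Default reader: return the current contents of /proc/mdstat."""
    with open("/proc/mdstat", "r") as f:
        return f.read()


def watch(read_mdstat=read_proc_mdstat, interval_s=60, max_checks=None,
          sleep=time.sleep):
    """Poll mdstat and log once per state change. Returns the alert count."""
    log = logging.getLogger("raid-watchdog")
    was_degraded = False
    alerts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        degraded = bool(DEGRADED_RE.search(read_mdstat()))
        if degraded and not was_degraded:
            log.critical("RAID array degraded; check 'mdadm --detail /dev/md0'")
            alerts += 1
        elif was_degraded and not degraded:
            log.info("RAID arrays are healthy again")
        was_degraded = degraded
        checks += 1
        if max_checks is None or checks < max_checks:
            sleep(interval_s)
    return alerts
```

Calling `watch()` with no arguments polls forever. Passing a fake reader and `max_checks` makes the loop easy to test without a real array. mdadm also ships its own monitor mode (`mdadm --monitor`), which is worth considering before running a custom loop in production.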
7. Professional Tip: Hardware vs. Software RAID
Many servers come with a "RAID Card" (Hardware RAID). While it offloads the work from the CPU and can be slightly faster, Software RAID (mdadm) is often preferred by modern sysadmins. Why? Because if the RAID card burns out, you usually need to find a compatible card, often from the same vendor, to get your data back. With Software RAID, the array metadata lives on the disks themselves, so you can move the disks to any Linux machine with mdadm installed and reassemble the array there with 'mdadm --assemble --scan'.
8. Summary
RAID is your insurance policy against hardware failure.
- RAID 1 for absolute reliability on small disks.
- RAID 5/6 for efficient storage on large arrays.
- mdadm is the master tool for management.
- Hot Spares provide peace of mind.
- /proc/mdstat is your diagnostic window.
In the final lesson of this module, we will learn how to predict a failure before it happens using SMART Monitoring and Disk Health.
Quiz Questions
- If you have four 2TB drives and put them in a RAID 10 array, how much usable space do you have?
- What happens if two disks fail at the same time in a RAID 5 array?
- Why is it important to save your RAID configuration once it is created?