The Onion FS: Understanding OverlayFS

Peel back the layers of a container. Master 'OverlayFS' and Union Filesystems. Learn how Linux stacks multiple directories to create a single 'View'. Understand 'Copy-on-Write' and why container images are so space-efficient.

1. OverlayFS: The Science of Stacking

When you download a Docker image, you see it downloading 5 or 10 different "Layers." When you start the container, those layers magically merge into a single, clean filesystem (/).

How does this work?

Linux uses a specific type of filesystem called a Union Filesystem, and the modern standard is OverlayFS.

Think of it like an Onion (or transparent overhead projector slides). You have a "Bottom" layer that contains the OS, a "Middle" layer that contains your app code, and a "Top" layer where your changes are saved. The kernel looks from the top down and creates a "Unified View."


2. The Three Layers of an Overlay

  1. LowerDir (Read-Only): The base layers, such as the layers of a Docker image. The overlay never modifies them.
  2. UpperDir (Writeable): This is where everything you "Save" inside the container goes. It exists only for that specific container.
  3. Merged (The View): This is where you actually work. It's the "Combined" view of Lower + Upper.
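To make the "top down" lookup concrete, here is a minimal sketch that models each layer as a Python dictionary mapping paths to contents. It is only an analogy for how the merged view resolves names, not how the kernel is implemented; the function name merged_view and the sample paths are made up for illustration.

```python
def merged_view(lower_layers, upper):
    """Build a unified view. lower_layers is ordered bottom to top."""
    view = {}
    for layer in lower_layers:
        view.update(layer)   # a higher lower layer hides the same path below it
    view.update(upper)       # the writable upper layer always wins
    return view

# Hypothetical layers: an OS base, an app layer, and the container's upper layer
base = {"/etc/os-release": "base OS", "/app/config": "default"}
app = {"/app/main.py": "app code", "/app/config": "app override"}
upper = {"/tmp/scratch": "runtime data"}

view = merged_view([base, app], upper)
print(view["/app/config"])   # prints "app override"
```

The file /app/config exists in two layers, but the view only ever shows the copy from the highest layer that contains it.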

3. Practical: Creating a Manual Overlay

You can create an "Onion" filesystem in Linux without any container software.

# 1. Create the folders
mkdir lower upper work merged

# 2. Add a file to the 'Base' (lower) layer
echo "Hello from Base" > lower/base.txt

# 3. Mount them as an Overlay
sudo mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work merged

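Now look in the merged folder. You will see base.txt. If you create a new file test.txt in merged, it is actually stored in the upper folder, and the lower folder remains untouched. The work folder is scratch space that the kernel uses while preparing changes; it must be an empty directory on the same filesystem as upper.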
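A Docker image has several read-only layers, and the same mount syntax handles that: lowerdir accepts a colon-separated list, with the top-most layer first. The sketch below only builds the argument list for the mount command; it does not mount anything, and the helper name overlay_mount_argv is an assumption made for this example.

```python
import os

def overlay_mount_argv(lower_dirs, upper, work, merged):
    """Build the argv for an overlay mount. lower_dirs is ordered top-most first."""
    lower = ":".join(os.path.abspath(d) for d in lower_dirs)
    opts = (
        f"lowerdir={lower},"
        f"upperdir={os.path.abspath(upper)},"
        f"workdir={os.path.abspath(work)}"
    )
    return ["mount", "-t", "overlay", "overlay", "-o", opts, os.path.abspath(merged)]

# Two read-only layers: 'app' sits on top of 'base'
argv = overlay_mount_argv(["app", "base"], "upper", "work", "merged")
print(" ".join(argv))
```

Actually performing the mount requires root, for example by passing the list to subprocess.run with sudo in front.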


4. Copy-on-Write (CoW) Logic

If you try to Edit a file that is in the lower layer:

  1. The kernel notices you are trying to write to a read-only layer.
  2. It copies the whole file from lower to upper (a "copy-up"). For a large file, this first write can be noticeably slow.
  3. It performs your edit on the copy in upper.
  4. From that moment on, the "Unified View" shows the upper version, effectively "Hiding" the original.
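The four steps above can be imitated in user space with two ordinary directories. This is a simulation of the idea, assuming hypothetical helpers write_through_overlay and read_through_overlay; the real copy-up happens inside the kernel.

```python
import os
import shutil
import tempfile

def write_through_overlay(lower, upper, name, text):
    """Append text to a file, copying it up from lower to upper first if needed."""
    upper_path = os.path.join(upper, name)
    lower_path = os.path.join(lower, name)
    if not os.path.exists(upper_path) and os.path.exists(lower_path):
        shutil.copy2(lower_path, upper_path)   # the copy-up step
    with open(upper_path, "a") as f:           # the edit only touches upper
        f.write(text)

def read_through_overlay(lower, upper, name):
    """Return the file from the highest layer that has it (upper first)."""
    for layer in (upper, lower):
        path = os.path.join(layer, name)
        if os.path.exists(path):
            with open(path) as f:
                return f.read()
    raise FileNotFoundError(name)

root = tempfile.mkdtemp()
lower = os.path.join(root, "lower")
upper = os.path.join(root, "upper")
os.mkdir(lower)
os.mkdir(upper)
with open(os.path.join(lower, "base.txt"), "w") as f:
    f.write("Hello from Base\n")

write_through_overlay(lower, upper, "base.txt", "edited\n")
merged_text = read_through_overlay(lower, upper, "base.txt")
with open(os.path.join(lower, "base.txt")) as f:
    lower_text = f.read()
shutil.rmtree(root)

print(repr(merged_text))  # 'Hello from Base\nedited\n'
print(repr(lower_text))   # 'Hello from Base\n'
```

The merged read returns the edited copy, while the file in lower still holds its original contents.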

5. Whiteouts: Deleting from a Read-Only Layer

If you can't touch the lower layer, how do you delete a file that is in it? The kernel creates a special "Marker" file in the upper directory (called a Whiteout). On disk, a whiteout is a character device with device number 0/0 that has the same name as the deleted file. This marker tells the filesystem: "I know this file exists in the base layer, but treat it as if it's dead."
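Because a whiteout is just a character device with device number 0/0, you can spot one with an ordinary stat call. The sketch below scans a directory for such markers; the function names are chosen for this example, and the demo runs against a temporary directory that contains only a regular file, so it finds nothing.

```python
import os
import stat
import tempfile

def is_whiteout(path):
    """True if path is a character device with device number 0/0."""
    st = os.lstat(path)
    return stat.S_ISCHR(st.st_mode) and st.st_rdev == 0

def find_whiteouts(upper_dir):
    """Return the paths of all whiteout markers below upper_dir."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(upper_dir):
        for name in filenames:   # os.walk lists device nodes among the files
            path = os.path.join(dirpath, name)
            if is_whiteout(path):
                found.append(path)
    return found

# Demo on a directory with one regular file: no whiteouts expected
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "plain.txt"), "w").close()
    demo_result = find_whiteouts(d)
print(demo_result)   # prints []
```

Pointing find_whiteouts at the upper folder of the manual overlay from section 3, after deleting base.txt through merged, should list a marker named base.txt.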


6. Identifying "Storage Bloat"

In Docker, if you run a container for 1 month and install 10GB of data, that 10GB is sitting in your UpperDir. If you delete the container, that data is gone (unless it's in a Volume).

You can find where these layers live on your disk:

# On most systems using Docker (the directory is usually readable only by root)
sudo ls /var/lib/docker/overlay2
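The folder names under overlay2 are opaque IDs, so it helps to ask Docker which folder belongs to which container. With the overlay2 storage driver, the JSON printed by docker inspect normally carries the paths under GraphDriver.Data. The sketch below parses that JSON; the sample input is hand-written for illustration (LAYER_ID is a placeholder, not a real ID), and the field layout is an assumption that may differ on other storage drivers or Docker versions.

```python
import json

def overlay_dirs(inspect_json):
    """Map container name to its upper and merged dirs from `docker inspect` JSON."""
    dirs = {}
    for entry in json.loads(inspect_json):
        driver = entry.get("GraphDriver") or {}
        if driver.get("Name") != "overlay2":
            continue   # skip containers that use another storage driver
        data = driver.get("Data") or {}
        dirs[entry.get("Name", "")] = {
            "upper": data.get("UpperDir"),
            "merged": data.get("MergedDir"),
        }
    return dirs

# Hand-written sample that mimics the assumed shape of the inspect output
sample = json.dumps([{
    "Name": "/web",
    "GraphDriver": {"Name": "overlay2", "Data": {
        "UpperDir": "/var/lib/docker/overlay2/LAYER_ID/diff",
        "MergedDir": "/var/lib/docker/overlay2/LAYER_ID/merged",
    }},
}])
result = overlay_dirs(sample)
print(result["/web"]["upper"])
```

On a real host, the input would be the output of docker inspect for the containers you care about.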

7. Example: An Overlay Layer Tracker (Python)

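If your disk is filling up, you need to know which layers take the most space. The overlay2 folder holds one sub-folder per layer, covering both read-only image layers and the writable "Upper" layers of containers. Here is a Python script that lists the largest layer folders on your system. Run it with sudo, because the folder is usually readable only by root.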

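import os
import subprocess

def find_largest_layers(docker_path="/var/lib/docker/overlay2", top=10):
    """
    Prints the largest layer directories under Docker's overlay2 storage.
    """
    print("--- Docker Layer Space Audit ---")

    if not os.path.isdir(docker_path):
        print("Docker overlay directory not found.")
        return

    try:
        # subprocess does not expand '*', so we list the sub-folders ourselves
        layers = [os.path.join(docker_path, name) for name in os.listdir(docker_path)]
        layers = [p for p in layers if os.path.isdir(p) and not os.path.islink(p)]
        if not layers:
            print("No layers found.")
            return

        # 'du -sk' reports sizes in KiB, which we can sort as integers
        # (human-readable sizes like '9.5G' and '10M' do not sort correctly as text)
        res = subprocess.run(["du", "-sk", *layers],
                             capture_output=True, text=True)

        sizes = []
        for line in res.stdout.splitlines():
            kib, path = line.split("\t", 1)
            sizes.append((int(kib), path))

        # Sort numerically and show the largest layers
        for kib, path in sorted(sizes, reverse=True)[:top]:
            print(f"{kib / 1024:10.1f} MiB  {path}")

    except (OSError, ValueError) as e:
        print(f"Error checking space: {e}")

if __name__ == "__main__":
    find_largest_layers()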

8. Professional Tip: Why 'Volumes' are Faster

An OverlayFS has a "Performance Tax": when you open a file, the kernel may have to search several layers to find the "Correct" version, and the first write to a file from a lower layer triggers a full copy-up. For write-heavy workloads such as databases, avoid storing data in the container layer. Use a Volume, which maps a folder directly to the host's native filesystem, bypassing the overlay logic.


9. Summary

OverlayFS is the secret to efficient container images.

  • LowerDir is the immutable history.
  • UpperDir is the ephemeral present.
  • Copy-on-Write ensures base images are never corrupted.
  • Whiteouts handle the illusion of deletion.
  • Volumes are the solution for when overlay performance isn't enough.

In the next lesson, we will look at the evolution of these technologies: From chroot to Docker.

Quiz Questions

  1. Why is it called an "Overlay" filesystem?
  2. What happens to the "Upper" directory when a Docker container is deleted?
  3. What is a "Whiteout" file and why is it necessary?

Continue to Lesson 4: Container Runtimes—From chroot to Docker.
