
The Titanium Shield: Container Security
Lock down your containers for production. Master Linux 'Capabilities' and 'Seccomp' profiles. Learn to remove the 'Root' power from within a container and restrict which kernel functions a process is allowed to call.
Container Security: Hardening the Runtime
In previous modules, we learned about standard permissions and MAC (SELinux). But containers have a special problem: they share the same Kernel as the host.
If a process in a container can talk to a dangerous part of the kernel (like the part that manages hardware drivers), it could potentially "Break Out" and take over the whole server.
To prevent this, Linux uses two advanced filtering technologies:
- Capabilities: Splitting the power of "Root" into roughly 40 separate privileges.
- Seccomp: A firewall for the kernel's brain (System Calls).
1. Linux Capabilities: Root is not Binary
In the old days, you were either a normal user (0 power) or Root (100% power). This was dangerous. A web server only needs the power to "Bind to Port 80." It doesn't need the power to "Load a kernel module" or "Reboot the server."
Linux Capabilities allow you to grant a process exactly what it needs and NO more.
Common Capabilities:
- CAP_NET_BIND_SERVICE: Can listen on ports < 1024.
- CAP_CHOWN: Can change file owners.
- CAP_SYS_BOOT: Can reboot the system.
# Docker by default REMOVES dangerous capabilities like SYS_BOOT
# You can manually drop all and add back only what the app needs
# (some images need a few more, such as CHOWN, SETGID, and SETUID):
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE nginx
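To make "Root is not Binary" concrete: each capability is one bit in a mask, and the kernel reports that mask in hexadecimal in the CapEff field of /proc/<pid>/status. The sketch below decodes such a mask into names. The table follows the numbering in linux/capability.h on recent kernels, and the helper name decode_caps is only an illustration, not part of any tool.

```python
# Capability names indexed by bit number, following linux/capability.h.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(mask):
    """Return the names of the capabilities whose bits are set in mask."""
    names = []
    for bit, name in enumerate(CAP_NAMES):
        if mask & (1 << bit):
            names.append(name)
    return names


if __name__ == "__main__":
    # A root process started with --cap-drop=ALL --cap-add=NET_BIND_SERVICE
    # should be left with only bit 10 set.
    print(decode_caps(1 << 10))
    # To decode a value copied from /proc/<pid>/status, parse it as hex first.
    print(decode_caps(int("0000000000000401", 16)))
```

A mask with every bit set corresponds to classic "full root"; the shorter the decoded list, the less an attacker inherits.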
2. Seccomp: The System Call Firewall
We learned in Module 17 that programs talk to the kernel via System Calls (strace).
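Seccomp (Secure Computing mode) allows you to create a "Whitelist" (allowlist) of system calls. If a containerized app tries to use a syscall that isn't on the list (even if it's running as root!), the kernel either kills the process or makes the call fail with an error, depending on the action configured in the profile.
Docker comes with a Default Seccomp Profile that blocks around 44 dangerous syscalls out of more than 300 (like mount, reboot, and swapon). A blocked call fails with "Operation not permitted" rather than killing the process.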
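To show what an allowlist looks like in practice, here is a minimal sketch that generates a profile in the JSON layout Docker accepts: a default action that denies, plus a rule that allows named syscalls. The syscall list is a hypothetical example chosen for brevity, and the function name build_profile is an assumption for illustration. A real application needs a much longer list.

```python
import json

# Hypothetical allowlist for illustration only. A real program needs far more
# syscalls than this, so a container using it would almost certainly not start.
ALLOWED_SYSCALLS = ["read", "write", "close", "exit", "exit_group", "rt_sigreturn"]


def build_profile(allowed):
    """Build a Docker-style seccomp profile that denies everything not listed."""
    return {
        # Any syscall not matched by a rule below fails with an error.
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {"names": sorted(allowed), "action": "SCMP_ACT_ALLOW"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_profile(ALLOWED_SYSCALLS), indent=2))
```

Saved to a file (say profile.json, a name chosen here for the example), a profile like this is passed with docker run --security-opt seccomp=profile.json. In practice it is usually easier to start from Docker's default profile and remove entries than to build a list from scratch.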
3. Practical: Testing for Security Gaps
How do you know if your container is "Too Powerful"? Use capsh to print the capabilities of the current process, and getcap to list the capabilities attached to a binary file.
# See which capabilities your current process has
capsh --print
4. The 'No-New-Privileges' Flag
This is a vital security setting. It prevents a process (and all its children) from ever gaining more power than they have right now: executing a SUID/SGID binary or a file that carries file capabilities no longer raises privileges.
docker run --security-opt=no-new-privileges nginx
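One way to confirm the flag took effect from inside the container is to read the NoNewPrivs field that recent kernels expose in /proc/self/status. The sketch below assumes that field is present (older kernels omit it) and uses a hypothetical helper name, no_new_privs_enabled.

```python
def no_new_privs_enabled(status_text):
    """Return True or False from the NoNewPrivs field, or None if it is absent."""
    for line in status_text.splitlines():
        if line.startswith("NoNewPrivs:"):
            return line.split(":", 1)[1].strip() == "1"
    return None


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            print("no_new_privs:", no_new_privs_enabled(f.read()))
    except FileNotFoundError:
        # Not running on Linux, or /proc is not mounted.
        print("/proc/self/status is not available")
```

A result of None means the kernel did not report the field, which is not the same as the flag being off.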
5. Identifying a Seccomp Blocking
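If an app works on your laptop but fails in a container with a mysterious "Operation not permitted" error (even when running as root!), suspect Seccomp or a dropped capability. Note that a profile which only returns an error, instead of killing the process, may leave nothing in the logs.
# Search for Seccomp violations in the system log (Debian/Ubuntu path;
# on other distributions try `journalctl -k` or /var/log/audit/audit.log)
sudo grep -i "seccomp" /var/log/syslog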
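Before digging through logs, it can help to confirm that a Seccomp filter is active at all. The kernel reports the mode in the Seccomp field of /proc/self/status (0 = disabled, 1 = strict, 2 = filter). The following is a small sketch with a hypothetical helper name, seccomp_mode. If it reports "disabled", the error is more likely a missing capability than Seccomp.

```python
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text):
    """Return the Seccomp mode name, or None if the field is not reported."""
    for line in status_text.splitlines():
        # Match "Seccomp:" exactly so "Seccomp_filters:" is not picked up.
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1])
            return SECCOMP_MODES.get(value, f"unknown ({value})")
    return None


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            print("seccomp mode:", seccomp_mode(f.read()))
    except FileNotFoundError:
        # Not running on Linux, or /proc is not mounted.
        print("/proc/self/status is not available")
```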
6. Example: A Capability Auditor (Python)
If a hacker gains a shell in your container, the first thing they will check is which "Dangerous" capabilities you left enabled. Here is a Python script that flags high-risk capabilities.
import subprocess


def audit_container_caps():
    """
    Checks the current process for dangerous kernel capabilities.
    """
    print("--- Container Capability Audit ---")
    try:
        res = subprocess.run(["capsh", "--print"], capture_output=True, text=True)
        # capsh prints both the current and the bounding set, so this scan also
        # flags capabilities that are only present in the bounding set.
        content = res.stdout.lower()
        dangerous = ["cap_sys_admin", "cap_sys_module", "cap_sys_rawio", "cap_net_admin"]
        has_danger = False
        for cap in dangerous:
            if cap in content:
                print(f"[!!!] DANGER: Process has {cap.upper()}! Escape is possible.")
                has_danger = True
        if not has_danger:
            print("[OK] No high-risk administrative capabilities detected.")
    except FileNotFoundError:
        # Minimal images often ship without libcap tools.
        print("Tool 'capsh' not found. Is it a minimal container?")


if __name__ == "__main__":
    audit_container_caps()
7. Professional Tip: Use 'ReadOnly' RootFS
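One of the best security moves for a container is to use a Read-Only Filesystem. If a hacker manages to run code, they can't save a "Backdoor" or a "Malware" file to the container's root filesystem because it is mounted read-only. Paths that must stay writable (such as temp or cache directories) can be provided with --tmpfs or a volume.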
docker run --read-only nginx
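A quick way to verify the setting from inside the container is to try creating a file in the directories you care about. The sketch below does that with a temporary file that is removed immediately. The helper name is_writable and the list of paths are assumptions for illustration. A failure can also mean ordinary permission denial, so run it as the same user as your application.

```python
import tempfile


def is_writable(path):
    """Return True if a temporary file can be created inside path."""
    try:
        # The file is deleted automatically when the with-block exits.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        # Covers read-only filesystems, missing directories, and permission errors.
        return False


if __name__ == "__main__":
    for path in ("/", "/etc", "/tmp"):
        state = "writable" if is_writable(path) else "read-only or not permitted"
        print(f"{path}: {state}")
```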
8. Summary
Container security is about "Restricting the Kernel surface."
- Capabilities break the power of root into granular pieces.
- Seccomp acts as a firewall for system calls.
- --cap-drop=ALL is the best starting point for any secure container.
- No-New-Privileges stops privilege escalation.
- Read-Only filesystems prevent persistent malware.
This concludes Module 18: Containers and Linux Internals. You now understand the deep kernel magic that makes modern DevOps possible.
In the next module, we will explore Linux for DevOps and Cloud: Automation at Scale.
Quiz Questions
- Why is CAP_SYS_ADMIN considered the most dangerous capability to grant a container?
- What happens when a process tries to use a system call that is blocked by Seccomp?
- What is the benefit of the --cap-drop=ALL strategy?