
Hard Limits: Resource Quotas and Runtime Security
Master the operational safety of AI agents. Learn how to implement CPU, Memory, and Disk quotas to protect your infrastructure from runaway processes.
Resource Limits and Security
In a production environment, an agent is just another process running on your server. But unlike a standard web server, an agent's resource usage is unpredictable. An agent might suddenly decide it needs to process a 10GB CSV file or run a Python script that calculates Pi to a trillion digits.
Without resource limits, a single confused agent can take down your entire cluster. In this lesson, we will learn how to set "Hard Borders" on what an agent can consume.
1. The Three Dimensions of Resource Control
To safeguard your host, you must limit three things:
A. Compute (CPU)
- The Risk: An infinite loop such as while True: pass will consume 100% of a CPU core, slowing down all other users.
- The Limit: We assign fractional CPUs (e.g., 0.2, or 200m in K8s).
- The Result: The agent runs slightly slower, but it never "steals" cycles from the rest of the application.
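The two notations above describe the same quantity. As a minimal sketch, the helpers below (hypothetical names, not part of any library) convert a fractional CPU count into a Kubernetes millicore string and into a CFS quota, assuming the common default scheduler period of 100,000 microseconds:

```python
def to_millicores(cpus: float) -> str:
    """Format a fractional CPU count as a Kubernetes millicore string."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return f"{round(cpus * 1000)}m"


def to_cfs_quota(cpus: float, period_us: int = 100_000) -> int:
    """Return the CPU time (in microseconds) allowed per scheduler period."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return round(cpus * period_us)


print(to_millicores(0.2))  # 200m
print(to_cfs_quota(0.5))   # 50000
```

A quota of 50,000 out of a 100,000 microsecond period is what "half a core" means in practice: the process may run for at most half of each period and is throttled for the rest.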
B. Memory (RAM)
- The Risk: Loading a massive dataset into memory can trigger an OOM (Out Of Memory) crash. If the host itself runs out of memory, the kernel's OOM killer may terminate unrelated processes, or the whole machine may become unresponsive.
- The Limit: We set a hard RAM ceiling (e.g., 256MB).
- The Result: If the agent goes over, ONLY that agent's container is killed. The rest of your system stays alive.
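When a container is killed for exceeding its ceiling, the orchestrator still has to tell that apart from an ordinary failure. The sketch below assumes the orchestrator has already collected the container's exit code and the OOMKilled flag that docker inspect reports under State; the function name and the returned messages are illustrative only:

```python
SIGKILL_EXIT_CODE = 128 + 9  # "killed by signal N" is reported as 128 + N


def describe_exit(exit_code: int, oom_killed: bool) -> str:
    """Turn a container's exit status into a short, human-readable reason."""
    if oom_killed:
        return "memory limit exceeded"
    if exit_code == 0:
        return "ok"
    if exit_code == SIGKILL_EXIT_CODE:
        return "killed (SIGKILL), cause unknown"
    return f"failed with exit code {exit_code}"


print(describe_exit(137, oom_killed=True))  # memory limit exceeded
```

Checking the flag first matters because an OOM kill and a manual kill both surface as exit code 137.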
C. Disk (Storage)
- The Risk: An agent writes millions of lines of logs to a .txt file until the disk is full.
- The Limit: We use ephemeral storage or tmpfs (RAM-based filesystems) to limit write capacity.
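One way to apply the disk limit with Docker is to make the container's root filesystem read-only and give the agent a single size-capped tmpfs scratch directory. The helper below is a sketch that only builds the arguments; the function name, the /tmp mount point, and the 64MB default are assumptions chosen for illustration:

```python
def scratch_disk_args(mount_point: str = "/tmp", size_mb: int = 64) -> list[str]:
    """Build docker run arguments for a read-only root plus a capped tmpfs."""
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    return [
        "--read-only",  # the root filesystem cannot be written to
        "--tmpfs",
        f"{mount_point}:rw,noexec,nosuid,size={size_mb}m",  # capped scratch space
    ]


print(scratch_disk_args())
```

Once the tmpfs is full, further writes fail inside the container instead of filling the host's disk.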
2. Implementing Limits in Docker
When you launch an isolated agent, you use these flags:
# --cpus: max 50% of one core
# --memory: max 512MB RAM
# --memory-swap: same value as --memory, which disables swapping to disk (swap slows things down)
# --pids-limit: prevent "fork bombs" (creating thousands of processes)
# --ulimit nofile: limit open file handles
docker run \
  --cpus=".5" \
  --memory="512m" \
  --memory-swap="512m" \
  --pids-limit=50 \
  --ulimit nofile=1024 \
  my-agent-runtime
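An orchestrator usually assembles this command in code rather than typing it by hand. The following is a minimal sketch of that idea: AgentLimits and docker_run_argv are hypothetical names, and the defaults simply mirror the flags shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentLimits:
    cpus: float = 0.5       # fraction of one core
    memory: str = "512m"    # hard RAM ceiling
    pids: int = 50          # max processes (fork-bomb guard)
    open_files: int = 1024  # max open file handles


def docker_run_argv(image: str, limits: AgentLimits) -> list[str]:
    """Build the docker run argument list for one isolated agent."""
    return [
        "docker", "run",
        f"--cpus={limits.cpus}",
        f"--memory={limits.memory}",
        f"--memory-swap={limits.memory}",  # equal to --memory: no swap
        f"--pids-limit={limits.pids}",
        "--ulimit", f"nofile={limits.open_files}",
        image,
    ]


print(" ".join(docker_run_argv("my-agent-runtime", AgentLimits())))
```

Passing the list to subprocess.run (rather than a shell string) avoids quoting problems if an image name or limit ever comes from configuration.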
3. The "Watchdog" Pattern
Even with hard limits, an agent might "hang" (consume its allowed CPU but never finish). For this, we use a Timeout Watchdog.
Implementation Strategy
- The Orchestrator starts a timer.
- If the tool/container doesn't return a result in 30 seconds, the Orchestrator sends a SIGKILL to the container.
- The Orchestrator informs the LLM: "ERROR: The task took too long and was terminated. Please simplify your approach."
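The three steps above can be sketched with the standard library alone. This version watches an ordinary child process rather than a container, which is an assumption made to keep the example runnable anywhere; run_with_watchdog is a hypothetical name. For a real container, the orchestrator would issue docker kill on the container instead, which sends SIGKILL by default.

```python
import subprocess
import sys

TIMEOUT_MESSAGE = (
    "ERROR: The task took too long and was terminated. "
    "Please simplify your approach."
)


def run_with_watchdog(cmd: list[str], timeout_s: float = 30.0) -> str:
    """Run cmd; return its output, or an error message if it exceeds timeout_s."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    try:
        output, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()         # SIGKILL on POSIX systems
        proc.communicate()  # reap the process and drain its pipes
        return TIMEOUT_MESSAGE
    return output


if __name__ == "__main__":
    # An infinite loop is cut off after one second.
    print(run_with_watchdog([sys.executable, "-c", "while True: pass"], 1.0))
```

Returning the error as a plain string lets the orchestrator feed it straight back to the LLM as the tool result.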
4. Syscalls and Kernel Security
A "Container" is not a 100% perfect security barrier. It shares the same Kernel as the host. Advanced agents might attempt to exploit kernel vulnerabilities.
Seccomp and AppArmor
In high-security environments, we use:
- Seccomp Filters: Restrict which "System Calls" the agent can make. For example, prevent the agent from using the mount or ptrace calls.
- Rootless Docker: Running the entire Docker daemon as a non-privileged user.
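To make the seccomp idea concrete, here is a sketch that generates a small profile in the JSON format Docker accepts through --security-opt seccomp=<file>. It is a deny-list layered on an allow-everything default, chosen purely for brevity; Docker's built-in default profile is an allowlist and is generally the stricter starting point. The function name is hypothetical.

```python
import json


def deny_syscalls_profile(names: list[str]) -> str:
    """Return a seccomp profile (JSON) that fails the given syscalls."""
    profile = {
        "defaultAction": "SCMP_ACT_ALLOW",  # illustration only: allow the rest
        "syscalls": [
            {"names": sorted(names), "action": "SCMP_ACT_ERRNO"},
        ],
    }
    return json.dumps(profile, indent=2)


print(deny_syscalls_profile(["mount", "ptrace"]))
```

With SCMP_ACT_ERRNO the blocked call returns an error to the agent's process instead of killing it, so the failure shows up as an ordinary exception inside the tool.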
5. Network Egress: The "Wall"
An agent should never have open access to your local network (private ranges such as 192.168.x.x or 10.x.x.x).
The Default-Deny Policy
- Deny ALL network traffic from the agent container.
- Specifically Whitelist only the APIs it needs (e.g., api.openai.com, google-search.graphql.com).
- This prevents a "compromised" agent from scanning your internal databases for vulnerabilities.
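The default-deny logic itself is small. The sketch below shows it as an application-level check that an orchestrator might run before proxying a request; the names are hypothetical and the allowlist holds a single example host. It is not a replacement for enforcement at the network layer (firewall rules or an egress proxy), since an allowed hostname could still resolve to an internal address.

```python
import ipaddress

ALLOWED_HOSTS = frozenset({"api.openai.com"})  # example allowlist


def is_egress_allowed(
    host: str, allowed_hosts: frozenset[str] = ALLOWED_HOSTS
) -> bool:
    """Default deny: only exact hostname matches on the allowlist pass."""
    host = host.strip().lower().rstrip(".")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as a hostname.
        return host in allowed_hosts
    # Raw IP addresses (including 192.168.x.x and 10.x.x.x) are never allowed.
    return False


print(is_egress_allowed("api.openai.com"))  # True
print(is_egress_allowed("192.168.1.10"))    # False
```

Rejecting every IP literal is a deliberately blunt rule: it closes the easiest path to internal services without needing a list of private ranges.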
6. The "Burner" Principle
Every agent session should use a unique, fresh identity.
- Do not reuse containers across users.
- Do not reuse temporary volumes.
- Security Goal: Treat every agent session as a "Burner Phone." Use it once, break it, and throw it away.
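A minimal sketch of the burner idea, using only the standard library: each session gets a random identifier and a scratch directory that is deleted when the session ends. burner_session is a hypothetical helper; in a container setup the identifier would typically become the container name and the directory its only writable mount.

```python
import contextlib
import os
import tempfile
import uuid


@contextlib.contextmanager
def burner_session():
    """Yield a fresh (session_id, workdir) pair and destroy workdir afterwards."""
    session_id = f"agent-{uuid.uuid4().hex}"
    with tempfile.TemporaryDirectory(prefix=f"{session_id}-") as workdir:
        yield session_id, workdir
    # Leaving the inner "with" block has already removed workdir and its contents.


with burner_session() as (session_id, workdir):
    print(session_id, os.path.isdir(workdir))
```

Because nothing survives the with block, a second user can never read files left behind by the first.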
Summary and Mental Model
Think of Resource Limits as Insurance.
- You hope the agent behaves well.
- But if it goes "Insane," you have a system in place that protects your house (The Host) from the fire.
As a Production AI Engineer, your value is not just in making the AI smart, but in making it Manageable.
Exercise: Limit Design
- Memory Allocation: You are building an agent that uses the Pandas library to process a 50MB CSV file.
- Would you set the memory limit to 50MB? (Hint: Think about the size of the Python runtime and the libraries themselves).
- CPU Management: How would you detect if an agent is in an "Infinite Loop"?
- List two metrics you would monitor in your dashboard (Module 16).
- Security Policy: Why is it safer to "Block All Internet" by default and only open specific ports?
- Give an example of a "Domain Whitelist" for a travel agent.
Ready to handle the secrets? Let's move to Environment Config.