Resource requests and limits

Master the economics of the cluster. Learn to size your applications correctly, prevent 'Noisy Neighbor' syndrome, and ensure your AI workloads never crash the entire node.

Resource Requests and Limits: The Economics of the Cluster

In a shared Kubernetes cluster, resources like CPU and Memory are your currency. If you are too generous, you waste money on idle servers. If you are too stingy, your applications will crash, slow down, or—even worse—cause other applications on the same server to crash.

How do you guarantee that your FastAPI backend always has enough "Brainpower" to process an AI query? How do you stop a buggy Next.js frontend from consuming 100% of a server's RAM?

The solution is a two-tiered boundary system: Requests and Limits. In this lesson, we will master the art of resource sizing. We will look at how the Scheduler uses requests to find a home for your pods, how the Kubelet enforces limits to prevent "Noisy Neighbor" syndrome, and the different Quality of Service (QoS) classes that K8s uses to decide who to kill first when the cluster runs out of memory.


1. Requests vs. Limits: The "Guaranteed" vs. The "Ceiling"

Kubernetes asks you to define two numbers for both CPU and Memory.

A. Resource Requests (The "Contract")

The Request is the minimum amount of resources that Kubernetes guarantees your container.

  • Impact on Scheduling: When you run a pod with a request of 512Mi of RAM, the Scheduler will only look for worker nodes that have at least 512Mi of unallocated RAM.
  • Reservation: Even if your app is only using 10Mi right now, K8s counts the full 512Mi as allocated on that node, so the Scheduler will never promise that capacity to another pod.

B. Resource Limits (The "Ceiling")

The Limit is the maximum amount of resources your container is allowed to consume.

  • Impact on CPU: If your app tries to use more CPU than its limit, it is Throttled (it slows down, but keeps running).
  • Impact on Memory: If your app tries to use more RAM than its limit, it is Killed (OOMKill - Out of Memory Kill).
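
In YAML, both numbers live side by side under the container's resources field. Here is a minimal sketch (the names and values are illustrative, not a recommendation):

spec:
  containers:
  - name: web
    image: myrepo/web:latest
    resources:
      requests:
        cpu: "250m"      # The "Contract": the Scheduler reserves this much
        memory: "256Mi"
      limits:
        cpu: "500m"      # The "Ceiling": enforced as a hard maximum
        memory: "512Mi"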

2. Managing CPU: Millicores and Throttling

CPU in Kubernetes is measured in millicores (m).

  • 1.0 or 1000m = One full core on the underlying server.
  • 500m = Half a core.

The Throttling Mechanism

CPU is a "Compressible" resource. Unlike memory, you can "squeeze" CPU usage. If your app hits its limit, the Linux kernel simply stops giving it CPU cycles for a few milliseconds at a time.

  • Effect: Your API response might take 500ms instead of 50ms, but your app won't crash.
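
If you suspect throttling, the counters are visible inside the container's cgroup. A quick check, assuming a cgroup v2 node and a pod named ai-agent (both placeholders; the numbers shown are illustrative):

kubectl exec ai-agent -- cat /sys/fs/cgroup/cpu.stat
# nr_periods 12000        <- scheduling periods observed
# nr_throttled 340        <- periods in which the container hit its CPU limit
# throttled_usec 4800000  <- total time spent throttled, in microseconds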

3. Managing Memory: Megabytes and OOMKills

Memory is an "Incompressible" resource. If your container needs even 1 byte more than its limit, the system has only one choice: Kill the process.

The "Noisy Neighbor" Problem

If you don't set a limit, a single buggy pod can consume ALL the RAM on a worker node. When the node hits 100% usage, the OOM Killer starts killing the most "guilty" processes. This could include your database, your API, or even critical cluster components. Always set limits.
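
If a container has already been OOMKilled, Kubernetes records the reason in its last termination state. A quick way to check, assuming a pod named ai-agent (placeholder):

kubectl get pod ai-agent -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# Prints "OOMKilled" if the kernel killed the container for exceeding its memory limit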


4. QoS Classes: The "Social Hierarchy" of Pods

Kubernetes categorizes every Pod into one of three Quality of Service (QoS) classes based on its requests and limits. This classification determines which pods K8s will kill first during a resource shortage.

A. Guaranteed (The VIPs)

A Pod is "Guaranteed" if every container has Requests == Limits for both CPU and Memory.

  • Priority: Highest. K8s will only kill these pods as a last resort.

B. Burstable (The Middle Class)

A Pod is "Burstable" if it has at least one request set, but requests are lower than limits.

  • Priority: Medium. These pods can "burst" into unused space, but they will be sacrificed to protect "Guaranteed" pods.

C. BestEffort (The Peasants)

A Pod is "BestEffort" if it has zero requests and zero limits.

  • Priority: Lowest. K8s will kill these pods first if the node gets even slightly stressed.
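
You don't have to work out the class by hand: Kubernetes records its decision on the Pod's status. For example (the pod name is a placeholder):

kubectl get pod ai-agent -o jsonpath='{.status.qosClass}'
# Prints Guaranteed, Burstable, or BestEffort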

5. Visualizing Resource Allocation

graph TD
    subgraph "Node Capacity (1000m CPU / 2Gi RAM)"
        G1["Pod A (Guaranteed): Req 250m / Lim 250m"]
        B1["Pod B (Burstable): Req 250m / Lim 500m"]
        BE1["Pod C (BestEffort): No requests or limits"]
        Free["Unallocated Pool"]
    end
    
    style G1 fill:#9f9,stroke:#333
    style B1 fill:#ff9,stroke:#333
    style BE1 fill:#f99,stroke:#333
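
To see this picture for a real node, i.e. how much of its capacity is already promised through requests, describe the node (the node name is a placeholder):

kubectl describe node worker-1 | grep -A 8 "Allocated resources"
# Shows total requested CPU/memory vs. the node's allocatable capacity, with percentages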

6. Practical Example: Sizing a FastAPI AI Agent

AI inference is CPU/GPU intensive. Here is a recommended configuration for a LangChain backend:

spec:
  containers:
  - name: ai-agent
    image: myrepo/ai-agent:latest
    resources:
      requests:
        cpu: "1000m" # Guarantee 1 full core
        memory: "2Gi"  # Guarantee 2GB RAM
      limits:
        cpu: "2000m" # Allow bursting to 2 cores for heavy queries
        memory: "4Gi"  # Don't let a memory leak pass 4GB

7. AI Implementation: Finding the "Optimal" Sizing

How do you know what to set for requests? If you guess too high, you waste money. If you guess too low, you get OOMKills.

The Strategy:

  1. Deployment: Start with generous limits and small requests.
  2. Load Test: Send 1,000 queries to your AI agent.
  3. Monitor: Run kubectl top pods.
  4. Analyze:
    • If your app consistently uses 800m CPU and 1.5Gi RAM, set your Requests at that level.
    • Set your Limits ~50% higher to handle unexpected spikes.
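
A compact version of that loop, with the load generator and all numbers purely illustrative (kubectl top requires the metrics-server add-on):

# 1. Generate load against the agent endpoint (hey is one option; any load tool works)
hey -n 1000 -c 20 https://my-cluster.example.com/agent/query

# 2. Watch real consumption while the test runs
kubectl top pods
# NAME        CPU(cores)   MEMORY(bytes)
# ai-agent    812m         1498Mi

# 3. Usage hovers around 800m / 1.5Gi -> set requests there,
#    and limits roughly 50% higher (e.g. 1200m CPU / 2.25Gi memory)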

8. Summary and Key Takeaways

  • Requests: Guaranteed resources. Used by the Scheduler for placement.
  • Limits: Hard ceilings. CPU causes throttling; Memory causes OOMKills.
  • QoS Classes:
    • Guaranteed: Req == Lim (Safest).
    • Burstable: Req < Lim (Flexible).
    • BestEffort: No Req/Lim (Dangerous).
  • Stability: Always define requests and limits to ensure your cluster remains predictable and healthy.

Congratulations!

You have completed Module 4: Deploying Applications. You now have the operational skills to deploy software at scale, manage versions, organize resources, and protect your hardware.

Next Stop: In Module 5: Networking in Kubernetes, we will move beyond basic services and master Ingress, Network Policies, and Cross-Cluster Routing.


9. SEO Metadata & Keywords

Focus Keywords: Kubernetes resource requests vs limits, K8s CPU millicores tutorial, memory limits and OOMKills, QoS classes Guaranteed Burstable BestEffort, kubectl top pods tutorial, Kubernetes sizing best practices.

Meta Description: Master the resource management system of Kubernetes. Learn how to use requests and limits to safeguard your cluster, understand the impact of CPU throttling and OOMKills, and ensure your production AI and web services have the power they need to perform.
