
Module 8 Exercises: Advanced Autoscaling
Build an elastic cluster. Configure HPA for high demand, use VPA to right-size your containers, and trigger cluster-level growth.
In Module 8, we learned how to make our infrastructure responsive to the real world. You learned how to scale Pods (HPA/VPA) and Nodes (Cluster Autoscaler). These exercises will walk you through setting up a truly elastic environment.
Exercise 1: Triggering the HPA Surge
- Deployment: Create a Deployment named `load-test` with 1 replica and a CPU request of `100m`.
- HPA: Create an HPA for that deployment with `minReplicas: 1`, `maxReplicas: 10`, and a target CPU of 50%.
- Stress Test: Run a "Load Generator" pod (e.g., `busybox` running `while true; do wget -q -O- http://load-test; done`).
- Observation:
  - Watch the HPA status with `kubectl get hpa -w`.
  - How long did it take for the first new pod to be created?
  - Once the new pods were running, did the "CPU %" in `kubectl get hpa` drop or rise? Why?
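The first two steps above can be sketched as the manifests below. This is a minimal illustration, not a prescribed solution: the container image, labels, and the Service that makes `http://load-test` resolvable for the load generator are all assumptions.

```yaml
# Illustrative setup for Exercise 1 (image, labels, and Service are assumptions).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: load-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: load-test
  template:
    metadata:
      labels:
        app: load-test
    spec:
      containers:
      - name: web
        image: nginx:1.25        # any small HTTP server works here
        resources:
          requests:
            cpu: 100m            # the HPA's 50% target is relative to this request
---
apiVersion: v1
kind: Service
metadata:
  name: load-test                # gives the load generator a stable http://load-test URL
spec:
  selector:
    app: load-test
  ports:
  - port: 80
    targetPort: 80
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: load-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: load-test
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

Note that the HPA needs the metrics-server installed in the cluster, or `kubectl get hpa` will show `<unknown>` for the current utilization.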
Exercise 2: Analyzing VPA Advice
- Deployment: Create a Pod that intentionally has "stingy" resources (e.g., requests of 50m CPU and 50Mi RAM) but runs a heavy workload.
- VPA: Create a VerticalPodAutoscaler with `updateMode: "Off"`.
- Observation: Wait 10 minutes, then run `kubectl describe vpa`.
- Action: What are the "Target" recommendations? If you were to switch the VPA to `Auto` mode, would the pod restart immediately?
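A recommendation-only VPA for this exercise might look like the following sketch (the VPA CRDs must already be installed in the cluster; the object names are illustrative):

```yaml
# Illustrative VPA in recommendation-only mode (names are assumptions).
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: stingy-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: stingy-app       # the workload with the deliberately low requests
  updatePolicy:
    updateMode: "Off"      # record Target recommendations only; never evict pods
```

With `updateMode: "Off"`, the recommender still populates the `Status.Recommendation` section you will read with `kubectl describe vpa`, but the updater never touches running pods.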
Exercise 3: Cluster "Headroom" Analysis
- Scenario: You have 3 nodes, each with 4 cores. Your pods are currently requesting a total of 10 cores.
- Question: If you decide to scale a Deployment to 50 replicas, each requesting `200m` CPU, what will happen?
  - New requested CPU = 50 * 200m = 10 cores.
  - Existing pods = 10 cores.
  - Total = 20 cores, against only 12 cores of node capacity.
- Prediction: Which component will catch this? What will be the final state of the 50 new pods?
- Action: Describe the "Scale Up" event you would see in the Cluster Autoscaler logs.
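One way to watch the prediction play out on a live cluster (the pod name is a placeholder; the status ConfigMap assumes the Cluster Autoscaler's default configuration):

```shell
# List pods the scheduler could not place
kubectl get pods --field-selector=status.phase=Pending

# The scheduler's "Insufficient cpu" message and, once the Cluster
# Autoscaler reacts, a TriggeredScaleUp event appear in the pod's Events
kubectl describe pod <pending-pod-name>

# The Cluster Autoscaler also records its activity on this ConfigMap
kubectl describe configmap cluster-autoscaler-status -n kube-system
```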
Exercise 4: Scaling for "Zero" (Idling)
- Goal: Configure an HPA that can scale your deployment down to 0 replicas when there is no traffic.
- Investigation: By default, does `minReplicas: 0` work in the standard Kubernetes HPA? (Hint: check Lesson 1.)
- Solution: If the standard HPA only supports `minReplicas: 1`, what external project would you use to allow "Scale to Zero"? (Hint: it starts with 'K'.)
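As a preview of the solution, a scale-to-zero setup with that external project typically looks like the sketch below. Everything here is an assumption for illustration: the trigger type, the Prometheus address, and the query all depend on your environment.

```yaml
# Illustrative scale-to-zero configuration (trigger, address, and query are assumptions).
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: load-test-scaler
spec:
  scaleTargetRef:
    name: load-test              # the Deployment to manage
  minReplicaCount: 0             # unlike the standard HPA, this can idle the workload
  maxReplicaCount: 10
  triggers:
  - type: prometheus             # example trigger; requires a Prometheus install
    metadata:
      serverAddress: http://prometheus.monitoring:9090
      query: sum(rate(http_requests_total{service="load-test"}[1m]))
      threshold: "5"
```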
Solutions (Self-Check)
Exercise 1 Answer:
- It usually takes 30-60 seconds to trigger: the metrics-server must first scrape the new CPU usage, and the HPA control loop only re-evaluates every 15 seconds by default.
- The "CPU %" will drop. As more pods are added, the average utilization of the group decreases, which is the goal of the HPA.
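Both observations follow from the HPA's core formula, `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`. A quick sketch of the arithmetic (the 200% starting utilization is an illustrative assumption):

```python
import math

def desired_replicas(current_replicas: int, current_util: float, target_util: float) -> int:
    """Core HPA scaling formula: ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_util / target_util)

# 1 replica running at 200% of its CPU request, target 50%:
print(desired_replicas(1, 200, 50))  # -> 4

# The same total load spread across 4 pods averages ~50%,
# so the HPA holds steady instead of scaling further:
print(desired_replicas(4, 50, 50))   # -> 4
```

This is why the "CPU %" column drops as replicas are added: the numerator (total load) stays roughly constant while the number of pods sharing it grows.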
Exercise 2 Hint:
If you switch to `Auto`, the Pod will be restarted (evicted) immediately only if its current requests differ significantly from the recommendation; the VPA updater's change threshold defaults to roughly 10%.
Exercise 3 Logic:
The Cluster Autoscaler will catch the "Pending" status of the new pods. It will see that the cluster needs 8 more cores and will trigger the cloud provider to spin up 2-3 additional worker nodes.
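The arithmetic behind "8 more cores" and "2-3 nodes", assuming for simplicity that the full 4 cores per node are allocatable to pods:

```python
import math

nodes, cores_per_node = 3, 4
capacity = nodes * cores_per_node        # 12 allocatable cores (simplifying assumption)
existing_requests = 10                   # cores already requested by running pods
new_requests = 50 * 0.2                  # 50 new pods at 200m each = 10 cores

free = capacity - existing_requests      # 2 cores -> roughly 10 of the new pods fit
pending = new_requests - free            # 8 cores' worth of pods stay Pending
extra_nodes = math.ceil(pending / cores_per_node)
print(extra_nodes)                       # -> 2
```

In practice system pods (kube-proxy, CNI agents, etc.) consume part of each node, which is why the realistic answer is 2-3 nodes rather than exactly 2.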
Exercise 4 Solution:
- The standard HPA only scales down to 1 replica; `minReplicas: 0` is rejected unless the alpha `HPAScaleToZero` feature gate is enabled.
- To scale to 0, you need KEDA (Kubernetes Event-Driven Autoscaling). KEDA is the industry standard for serverless-style scaling on K8s.
Summary of Module 8
You have built an elastic system.
- You can handle traffic surges automatically with HPA.
- You can save cloud costs and prevent OOMKills with VPA.
- You can grow the physical cluster with the Cluster Autoscaler.
- You understand the math of the scaling algorithm.
In Module 9: Logging, Monitoring, and Observability, we will learn how to build the dashboards that visualize this dynamic behavior.