
API server, etcd, controller manager, and scheduler
Master the internals of the Kubernetes control plane. Understand how these four services collaborate to maintain the desired state of your global infrastructure.
The Anatomy of the Brain: API Server, etcd, Controller Manager, and Scheduler
If you want to move from being someone who "uses" Kubernetes to being a Kubernetes "Expert," you must understand the internal machinery of the Control Plane. These four components—the API Server, etcd, Controller Manager, and Scheduler—are the gears and pistons that power every high-availability cluster on the planet.
In this lesson, we will perform a "Virtual Dissection" of these four services. We will look at how they talk, how they crash, and how they recover. We will explore the algorithms the Scheduler uses to make decisions, the consistency models etcd uses to protect your data, and the loop-logic that keeps the Controller Manager running 24/7.
1. kube-apiserver: The Gateway and the Gatekeeper
The kube-apiserver is the central management hub of Kubernetes. No matter how you interact with your cluster—whether it’s through the kubectl CLI, a web dashboard, or an automated CI/CD pipeline—every single request goes through the API Server.
Its Primary Functions:
- Authentication: Who are you? (Certificates, Tokens, or OIDC).
- Authorization: What can you do? (RBAC - Role Based Access Control).
- Admission Control: Should this request be allowed based on cluster-wide policies? (e.g., "Don't allow pods without a 'team' label").
- Schema Validation: Is your YAML actually valid Kubernetes syntax?
- Synchronization: Coordinating the state changes between the user and the storage (etcd).
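To make the first two checks tangible, here is a minimal sketch (using the official Kubernetes Python client and assuming a working kubeconfig) that asks the API Server's authorization layer whether your current identity may create pods:

# Asking the API Server's authorization layer: "Am I allowed to do this?"
from kubernetes import client, config

config.load_kube_config()                  # Authentication: your kubeconfig credentials
authz = client.AuthorizationV1Api()

review = client.V1SelfSubjectAccessReview(
    spec=client.V1SelfSubjectAccessReviewSpec(
        resource_attributes=client.V1ResourceAttributes(
            namespace="default", verb="create", resource="pods"
        )
    )
)
result = authz.create_self_subject_access_review(body=review)
print("Allowed to create pods:", result.status.allowed)   # Authorization: the RBAC verdict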
The Lifecycle of an API Request
Imagine you run kubectl apply -f my-app.yaml. Here is exactly what the API Server does:
graph TD
A["HTTP Request (POST/PUT)"] --> B["Authentication (Cert/Token)"]
B --> C["Authorization (RBAC)"]
C --> D["Mutating Admission Controllers (Modify the YAML)"]
D --> E["Schema Validation (JSON/YAML check)"]
E --> F["Validating Admission Controllers (Policy check)"]
F --> G["Persistence (Write to etcd)"]
G --> H["Success Response (201 Created)"]
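Every client travels this same pipeline. As a rough illustration (assuming the official Kubernetes Python client and a valid kubeconfig; the ConfigMap name is a hypothetical placeholder), here is the same kind of write request issued programmatically:

# A minimal sketch: one write request travelling the pipeline above
from kubernetes import client, config

config.load_kube_config()        # Step B: authenticate with your local credentials
v1 = client.CoreV1Api()

config_map = client.V1ConfigMap(
    metadata=client.V1ObjectMeta(name="demo-config"),
    data={"greeting": "hello"},
)

# The API Server authorizes the request, runs mutating admission, validates the
# schema, runs validating admission, writes the object to etcd, and returns 201.
v1.create_namespaced_config_map(namespace="default", body=config_map)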
Why It Scales
The API Server is Stateless. This is a critical design choice. Because it doesn't store data locally (it uses etcd), you can run multiple copies of the API Server behind a load balancer. If one crashes, the others pick up the slack without losing a single bit of information.
2. etcd: The Infinite Memory (The Cluster's Soul)
etcd is the most important component in your cluster. It is a distributed, reliable key-value store used to store all of Kubernetes' internal state.
The "Single Source of Truth"
Every pod, every secret, every deployment, and every node's health status is stored in etcd. If you delete etcd, you delete the "Memory" of the cluster. Your containers will still be running on their nodes, but the Control Plane will have no idea they exist, how to scale them, or how to route traffic to them.
Consistency vs. Availability (The CAP Theorem)
etcd prioritizes Consistency over Availability. It uses the Raft Consensus Algorithm to keep every member's copy of the data in agreement.
- In a cluster of 3 etcd nodes, you must have at least 2 healthy nodes to maintain a "Quorum."
- If 2 nodes out of 3 die, etcd stops accepting writes. It would rather fail than risk giving you out-of-date or "Dirty" data.
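A quick way to reason about quorum: Raft needs a strict majority (n // 2 + 1) of members to agree before a write is committed. The plain-Python sketch below is purely illustrative:

# Raft quorum math: how many etcd members can fail before writes stop?
def quorum(members: int) -> int:
    return members // 2 + 1        # strict majority needed to commit a write

for members in (1, 3, 5, 7):
    print(f"{members} members -> quorum {quorum(members)}, "
          f"tolerates {members - quorum(members)} failure(s)")

# Output:
# 1 members -> quorum 1, tolerates 0 failure(s)
# 3 members -> quorum 2, tolerates 1 failure(s)
# 5 members -> quorum 3, tolerates 2 failure(s)
# 7 members -> quorum 4, tolerates 3 failure(s)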
Security Consideration
Because etcd contains every Secret (usernames, passwords, AWS keys) in your cluster, it is the #1 target for attackers.
- Best Practice: Always enable Encryption at Rest for the data stored in etcd (especially Secrets) and ensure all Control Plane communication uses mTLS (Mutual TLS).
3. kube-controller-manager: The Eternal Loop
The Controller Manager is the "Enforcer." It is responsible for making sure the Actual State of the cluster matches the Desired State.
The Loop Logic (Control Loops)
Inside the Controller Manager, there are dozens of smaller "Controllers." Each one runs a simple, infinite loop that looks like this:
- Observe: Look at the current state (Query the API Server).
- Compare: How does this differ from the Desired State (YAML)?
- Act: If different, take the smallest possible step to fix it.
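In code terms, every controller is a variation of the same skeleton. The sketch below uses hypothetical helper functions and plain Python just to show the Observe → Compare → Act shape:

# The universal control-loop skeleton (illustrative only)
import time

def reconcile(get_desired_replicas, get_actual_replicas, create_pod, delete_pod):
    while True:
        desired = get_desired_replicas()   # Observe: what does the spec ask for?
        actual = get_actual_replicas()     # Observe: what actually exists?

        if actual < desired:               # Compare + Act: too few pods
            create_pod()
        elif actual > desired:             # Compare + Act: too many pods
            delete_pod()

        time.sleep(5)                      # ...and do it all again, forever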
Major Controllers Explained:
- Node Controller: Monitors the health of worker nodes. If a node stops sending heartbeats for 40 seconds (the default grace period), it marks the node as "Unreachable." If the node stays unreachable for about 5 minutes, the pods on it are evicted and recreated elsewhere.
- Replication Controller: Does the count match? If you asked for 10 replicas and there are only 8, it creates 2 more.
- Endpoint Controller: This is the bridge between Pods and Services. When a matching pod becomes ready, the Endpoint Controller adds its IP to the Service's endpoint list so it can receive traffic.
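You can watch the Endpoint Controller's work directly, because the IPs it wires into a Service are readable through the API. Here is a minimal sketch using the official Kubernetes Python client; the Service name "my-service" is a hypothetical placeholder.

# Inspecting the Pod IPs the Endpoint Controller has attached to a Service
# (assumes a Service named "my-service" exists in the "default" namespace)
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

endpoints = v1.read_namespaced_endpoints(name="my-service", namespace="default")
for subset in endpoints.subsets or []:
    for address in subset.addresses or []:
        print(f"my-service routes traffic to {address.ip}")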
Example: Building a Custom Controller with the Kubernetes Python Client
As a developer, you can actually build your own controllers! Imagine you want an "Auto-Cleanup" controller that deletes any pod that has been running for more than 24 hours.
# A conceptual custom K8s controller using the official Kubernetes Python client
from kubernetes import client, config, watch
import datetime

def run_cleanup_controller():
    config.load_kube_config()  # Or config.load_incluster_config() when running inside the cluster
    v1 = client.CoreV1Api()

    # "Watch" for pod events in real time
    w = watch.Watch()
    for event in w.stream(v1.list_pod_for_all_namespaces):
        pod = event['object']
        start_time = pod.status.start_time

        # Calculate the pod's age and delete anything older than 24 hours
        if start_time:
            age = datetime.datetime.now(datetime.timezone.utc) - start_time
            if age.days >= 1:
                print(f"Pod {pod.metadata.name} is too old. Deleting...")
                v1.delete_namespaced_pod(pod.metadata.name, pod.metadata.namespace)

if __name__ == "__main__":
    run_cleanup_controller()
4. kube-scheduler: The Matchmaker
The Scheduler has one job: To find a home for new Pods. When a pod is created without a nodeName assigned to it, the Scheduler wakes up and begins its decision-making process.
The Two-Phase Scheduling Algorithm
The Scheduler uses a sophisticated algorithm to pick the "Winning" node for your pod.
Phase 1: Filtering (Predicates)
First, it filters out all nodes that cannot run the pod.
- Resource Check: Does the node have enough free CPU and RAM?
- Port Check: Is the host port the pod needs already taken on this node?
- Node Selectors: Did the user specifically ask for a node with a GPU?
Phase 2: Scoring (Priorities)
Once it has a list of "Feasible" nodes, it scores them based on dozens of criteria to find the best fit.
- Least Requested: Prefers nodes that are mostly empty (spreading out the load).
- Most Requested: Prefers nodes that are already somewhat full (compacting for cost savings - "Bin Packing").
- Affinity: Prefers nodes where other related pods are already running (for low-latency communication).
Visualizing the Scoring Logic
graph TD
A["New Pod created"] --> B["Filtering Phase"]
B --> B1["Node 1: Not enough RAM (Fail)"]
B --> B2["Node 2: Fits (Feasible)"]
B --> B3["Node 3: Fits (Feasible)"]
B2 & B3 --> C["Scoring Phase"]
C --> C1["Node 2 Score: 8/10 (Better CPU)"]
C --> C2["Node 3 Score: 5/10 (High Latency)"]
C1 --> D["Binding: Assign Pod to Node 2"]
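To make the two phases concrete, here is a toy sketch of filter-then-score with invented node data; the real Scheduler uses pluggable predicates and scoring plugins, not this simplified arithmetic:

# Toy version of the Scheduler's filter-then-score decision (illustrative only)
nodes = [
    {"name": "node-1", "free_cpu": 0.5, "free_mem_gi": 1,  "zone": "a"},
    {"name": "node-2", "free_cpu": 4.0, "free_mem_gi": 16, "zone": "a"},
    {"name": "node-3", "free_cpu": 2.0, "free_mem_gi": 8,  "zone": "b"},
]
pod = {"cpu": 1.0, "mem_gi": 4}

# Phase 1: Filtering - drop every node that simply cannot run the pod
feasible = [n for n in nodes
            if n["free_cpu"] >= pod["cpu"] and n["free_mem_gi"] >= pod["mem_gi"]]

# Phase 2: Scoring - rank the survivors ("least requested": prefer the emptiest node)
winner = max(feasible, key=lambda n: n["free_cpu"] + n["free_mem_gi"])
print(f"Binding pod to {winner['name']}")   # -> Binding pod to node-2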
5. Integrating the Control Plane into your Dev Workflow
As a developer, you might wonder: "Why do I need to know this for my React or Next.js frontend?"
Because you can build Internal Developer Platforms (IDPs) that talk to the API Server. Imagine building a dashboard for your company where a manager can click a button to "Deploy AI Agent." Your Next.js app sends a request to your FastAPI backend, which then uses the Kubernetes Python Client to talk to the API Server.
The API Server then delegates to the etcd for storage, the Scheduler for assignment, and the Controller Manager for maintenance. You aren't just writing an app; you are orchestrating an entire system.
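As a rough sketch of that flow (the endpoint path, Deployment name, and container image are all hypothetical; it assumes FastAPI plus the official Kubernetes Python client), a single backend route is enough to turn a button click into an API Server request:

# Hypothetical FastAPI backend: one button click becomes one Deployment
from fastapi import FastAPI
from kubernetes import client, config

app = FastAPI()
config.load_incluster_config()   # assumes the backend itself runs inside the cluster

@app.post("/deploy-agent")
def deploy_agent(name: str):
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": "agent", "image": "registry.example.com/ai-agent:latest"}]},
            },
        },
    }
    # The Control Plane takes it from here: etcd stores it, the Scheduler places it,
    # and the Controller Manager keeps it alive.
    client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
    return {"status": "scheduled", "deployment": name}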
6. Failure Scenarios and Recovery
What happens when things break?
- API Server goes down: You can't change anything. kubectl will error, but your existing applications keep running because the Kubelet on each worker node already knows what to do. (A small reachability probe is sketched after this list.)
- Scheduler goes down: New pods will stay in a "Pending" state because no one is there to assign them to nodes.
- Controller Manager goes down: Drift goes uncorrected. If a pod is deleted or a node dies, nothing recreates the missing replicas. Your cluster loses its "Self-Healing" ability.
- etcd goes down: Total catastrophe. The Control Plane can no longer read or write cluster state, and if the data is lost, the cluster cannot be rebuilt without a backup.
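One practical consequence: you can detect the "API Server goes down" case from your own tooling simply by trying to talk to it. A minimal probe, assuming the official Kubernetes Python client, might look like this:

# Crude reachability check for the Control Plane's front door (illustrative only)
from kubernetes import client, config
from kubernetes.client.rest import ApiException
import urllib3

config.load_kube_config()
v1 = client.CoreV1Api()

try:
    v1.list_namespace(limit=1)           # any cheap, authenticated read will do
    print("API Server is reachable")
except (ApiException, urllib3.exceptions.HTTPError) as err:
    # kubectl would fail the same way; existing pods keep running regardless
    print(f"API Server appears to be down: {err}")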
7. AI Implementation: High-Performance GPU Scheduling
In the world of LLMs and AWS Bedrock, resource scheduling is critical.
If you have a Python app using LangChain that requires 16GB of VRAM to run a local embedding model, you must define Resource Requests.
# How the Scheduler sees your AI pod requirements
resources:
  requests:
    nvidia.com/gpu: 1
    memory: "16Gi"
    cpu: "4000m"
  limits:
    nvidia.com/gpu: 1   # Extended resources like GPUs must also be declared as limits
The Scheduler reads this and ignores any worker nodes that don't have a high-end GPU or enough available RAM. This ensures your AI workloads never end up on "Burstable" instances that would crash under the load.
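You can inspect the same data the Scheduler filters on by reading each node's allocatable resources with the official Python client; the nvidia.com/gpu key only appears on nodes where the NVIDIA device plugin is installed:

# What the Scheduler sees: allocatable resources advertised by each node
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    alloc = node.status.allocatable      # e.g. {"cpu": "8", "memory": "32Gi", "nvidia.com/gpu": "1"}
    gpus = alloc.get("nvidia.com/gpu", "0")
    print(f"{node.metadata.name}: cpu={alloc['cpu']}, memory={alloc['memory']}, gpu={gpus}")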
8. Summary and Key Takeaways
- kube-apiserver: Front door, stateless, highly available.
- etcd: The cluster's database, strict consistency, the source of truth.
- kube-controller-manager: The enforcer of the "Desired State" through constant loops.
- kube-scheduler: The matchmaker that uses filtering and scoring to place pods on the best hardware.
Together, these four services create the "Autonomous Orchestration" that makes Kubernetes the choice for planetary-scale applications.
In the next lesson, we will look at the components that sit on the Worker Nodes: the Kubelet, kube-proxy, and Container Runtime.