Services and endpoints

Look under the hood of Kubernetes load balancing. Learn how kube-proxy manipulates Linux kernel rules to route traffic and how EndpointSlices manage high-scale connectivity.

Services and Endpoints: Decoding the Kernel-Level Load Balancer

In Module 3, we learned how to use a Service and saw that it provides a stable IP for our Pods. But have you ever wondered how it actually works?

When you send a request to a Service IP (e.g., 10.96.0.1), where does that packet go? There is no "Service Container" running at that IP. There is no physical load balancer appliance inside the cluster. Instead, Kubernetes performs a bit of "Networking Magic" by manipulating the Linux kernel of every single worker node.

In this lesson, we will look "Under the Hood." We will master the concepts of iptables and IPVS, understand the role of the kube-proxy, and learn how EndpointSlices allow Kubernetes to scale to thousands of pods without breaking the network.


1. The Virtual IP (VIP)

A Service IP is not a "Real" IP. If you run ip addr (or the older ifconfig) on a worker node, you won't see the Service IP assigned to any network interface.

It is a Virtual IP. It only exists in the "Routing Rules" of the host operating system. When a packet leaves your FastAPI container and heads for the Service IP, the Linux kernel intercepts it and says: "I know where this is actually supposed to go!" and redirects it to a live Pod IP.
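
You can verify this yourself. Here is a minimal check, assuming a Service named my-service (the name is illustrative) and shell access to a worker node:

# Find the Service's virtual IP
kubectl get svc my-service -o jsonpath='{.spec.clusterIP}'

# On the worker node: the IP is assigned to no network interface
ip addr | grep "10.96.0.1"   # substitute the IP from above; expect no output

# Yet connections to that IP still reach a Pod, because the kernel rewrites the destination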


2. The Engine: kube-proxy

The kube-proxy is the component responsible for this interception. It runs on every single worker node.

How it works:

  1. It watches the API Server for changes to Services and their EndpointSlices (the current set of healthy Pods).
  2. When a Service is created or its Pods change, kube-proxy writes a series of rules into the node's kernel.
  3. These rules say: "If any traffic hits IP X, pick one of these Pod IPs (Y1, Y2, Y3) and send it there."
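
You can see this agent on your own cluster. A quick check, assuming a kubeadm-style cluster where kube-proxy runs as a DaemonSet with the label k8s-app=kube-proxy (other distributions may label, or even replace, it):

# One kube-proxy Pod per worker node, managed by a DaemonSet
kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide

# Its logs record the rule syncs triggered by Service and Pod changes
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=20   # on large clusters, pick a single Pod instead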

kube-proxy Modes:

  • iptables Mode (The Standard): Uses the built-in Linux firewall (iptables) to handle routing. It is simple, but the rules are evaluated sequentially, so it can get slow if you have thousands of Services.
  • IPVS Mode (The High-Performance Option): Uses the in-kernel load balancer (IPVS). Lookups are hash-based, so it stays fast at scale and supports better load-balancing algorithms (like "Least Connections").
  • kernelspace Mode: For Windows-based worker nodes.
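
To find out which mode your cluster is running, here is a sketch assuming a kubeadm-style cluster, where the kube-proxy configuration lives in a ConfigMap:

# Inspect the configured mode; an empty value means the platform default (iptables on Linux)
kubectl get configmap kube-proxy -n kube-system -o yaml | grep "mode:"

# On a node running in IPVS mode, the virtual servers and their backends are visible
ipvsadm -Ln   # requires the ipvsadm tool installed on the node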

3. The "Brain" of Discovery: EndpointSlices

A Service needs to know which Pods are healthy. It does this through the Endpoints (or the modern EndpointSlices) object.

When a pod:

  • Starts up successfully.
  • Passes its readinessProbe.
  • Has a label that matches the Service's selector.

...it is added to the EndpointSlice. Conversely, as soon as a pod begins terminating or fails its probe, it is removed. This ensures that kube-proxy never sends a user to a dead container.
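
You can inspect this list directly. EndpointSlices carry a label pointing back to their Service, so (with my-service again as an illustrative name):

# List the slices that belong to the Service; the label is maintained by Kubernetes itself
kubectl get endpointslices -l kubernetes.io/service-name=my-service

# Show the Pod IPs and readiness conditions inside one slice (the suffix is random)
kubectl describe endpointslice my-service-abc12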


4. Visualizing the Traffic Interception

graph LR
    P_App["Application Container"] -- "Call 10.96.0.1" --> NetStack["Linux Kernel Network Stack"]
    
    subgraph "The Kernel (The Invisible Switch)"
        NetStack -- "Matches Iptables Rule" --> Redir["Redirect to 10.244.1.5"]
    end
    
    Redir --> P_Target["Target Pod (A)"]
    
    KubeProxy["kube-proxy"] -- "Update Rules" --> NetStack

5. Load Balancing Algorithms

How does the Service decide which Pod to send the traffic to?

  • Round Robin: The default. Traffic is shared equally. (In iptables mode the choice is actually random with equal probability; IPVS gives you true round robin.)
  • Session Affinity: Ensuring the same client stays on the same pod (Lesson 3.3).
  • Topology-Aware Routing: (Modern K8s) Telling the Service to prefer pods on the same node or in the same availability zone to save on cloud cross-zone data costs. This is critical for high-performance AI deployments. A sketch of the last two settings follows this list.
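
Here is what those two settings look like on a Service manifest. The names are illustrative; sessionAffinity and the topology-mode annotation (the Kubernetes 1.27+ spelling) are the standard fields:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    service.kubernetes.io/topology-mode: Auto   # prefer endpoints in the caller's zone
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
  sessionAffinity: ClientIP            # pin each client IP to one Pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800            # affinity window (the default: 3 hours)
EOF

Note that kube-proxy enforces ClientIP affinity inside the kernel rules themselves, so no application changes are needed.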

6. Practical Example: Debugging the iptables Rules

If you ever suspect that your Service is "Lying" to you, you can look at the actual kernel rules on a worker node.

# SSH into a node and run:
iptables-save | grep MY-SERVICE-NAME

You will see many rules that look like this: -A KUBE-SVC-XXX -m statistic --mode random --probability 0.333 -j KUBE-SEP-YYY. This is the kernel saying: "For this Service, there's a 33% chance I'll send you to the Pod behind chain KUBE-SEP-YYY."
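
Note that the probabilities cascade rather than being equal on every rule. For a hypothetical Service with three endpoints (the chain hashes here are invented), the full set looks like this:

# 1/3 of all traffic matches the first rule
-A KUBE-SVC-XXX -m statistic --mode random --probability 0.33333 -j KUBE-SEP-AAA
# Half of the remaining 2/3 (another 1/3 overall) matches the second
-A KUBE-SVC-XXX -m statistic --mode random --probability 0.50000 -j KUBE-SEP-BBB
# Whatever is left (the final 1/3) falls through to the third
-A KUBE-SVC-XXX -j KUBE-SEP-CCC

Each KUBE-SEP-* chain then performs the actual DNAT, rewriting the packet's destination to one specific Pod IP and port.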


7. AI Implementation: Real-time Endpoint Scaling

For an application like a Video Transcoding Agent or a Large Language Model (LLM) server, starting a pod is slow.

If your Service has hundreds of Endpoints, the old Endpoints API re-sent the entire list to every node whenever a single Pod changed; across 100 nodes, that churn adds up fast. This is why EndpointSlices were invented. The list is broken into small "Slices" (up to 100 endpoints each by default), so when one pod changes, only the slice containing it is re-transmitted.

  • Why it matters: It allows your AI system to scale from 10 to 1,000 pods without the cluster's network becoming a bottleneck.
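
You can watch this machinery react in real time. A small experiment, assuming a Deployment named my-app behind the Service my-service (both names illustrative):

# Terminal 1: watch the Service's slices update as Pods come and go
kubectl get endpointslices -l kubernetes.io/service-name=my-service -w

# Terminal 2: scale the workload; only the affected slices are re-sent to the nodes
kubectl scale deployment my-app --replicas=20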

8. Summary and Key Takeaways

  • Virtual IPs: Service IPs are logical, not physical.
  • kube-proxy: The worker node agent that manages traffic redirection.
  • iptables vs IPVS: Use IPVS for massive scale (1000+ services).
  • EndpointSlices: The dynamic list of healthy pods used to update the kernel.
  • Topology Awareness: Optimize traffic to stay in the same location to reduce latency and cost.

In the next lesson, we will transition from internal cluster networking to the public face of your app: Ingress Controllers and Rules.


