The Tale of Kube-proxy: An Epic Exploration of Networking in Kubernetes Clusters
Yes, back with a technical topic!
In the land of cloud orchestration, where containers dwell, and clusters flourish, a tale unfolds—one that beckons the minds of engineers far and wide. Dear readers, lend me your ears, for the road ahead winds back to the technical heartland, away from the personal musings and challenges of learning and writing that have lately filled these pages (a delightful detour, yet a detour nonetheless).
In the bustling marketplaces and grand conference halls, where technologies clash and meld, your humble narrator has been tasked with a noble pursuit. As a Sales Engineer at Isovalent, I find myself heralding the virtues of replacing kube-proxy with Cilium's innovative implementation. A task, mind you, that has led to inquiries aplenty! (Such is the way of curiosity among the wise, you know, haha!)
A few days passed, and under the watchful eyes of the virtual cloud, I encountered a knowledgeable steward keen to understand the mysteries of kube-proxy. "As you now know, kube-proxy has some scalability issues on large Kubernetes clusters," I began, only to be met with a swift retort: "No, I don't know, actually, tell me more." Ah, dear friends, that simple inquiry spiraled into a three-hour adventure—a dance through the intricacies of kube-proxy and a spirited exploration of networking in their majestic cluster.
As the storytellers of yore spun epics around firesides, so did we delve into the workings of kube-proxy. The time flew like an eagle over the mountains, and our discussion became a tapestry of understanding woven with insight and inquiry. This post shall unfold from the notes of that fateful meeting, a treasure trove from a journey into the very heart of Kubernetes.
May these words illuminate the path for those seeking to grasp how kube-proxy operates, the impact it may bestow upon your clusters, and the wisdom gleaned from venturing deep into the implementation's core. Join me, for the adventure has only just begun.
The Land of Kubernetes and the Role of Kube-proxy
Kubernetes—you've heard of it, right? If not, I'll forgive you this once, but only because you're here to embark on an epic exploration with me. If you find yourself eager to dig into the vast landscape of Kubernetes, look no further than the tome penned by Nigel Poulton. His book is a veritable treasure trove for those seeking to conquer Kubernetes. But enough of that, let's get to the real adventure here: kube-proxy.
What's kube-proxy, You Ask?
In the bustling realm of Kubernetes, kube-proxy is like the traffic cop of the network, directing the flow of data and ensuring everything gets where it needs to go. It's a vital part of Kubernetes networking, taking care of the Service abstraction by managing all the intricate rules and routes that keep your cluster running smoothly.
Three Hats of kube-proxy: Userspace, iptables, IPVS
Kube-proxy has been known to wear a few hats, depending on the occasion. You might see it using the Userspace mode, iptables, or IPVS. Each of these has its quirks and features, but in this tale, we'll focus on the iptables mode, as that's where the real action happens.
Iptables: The Heart of the Matter
Iptables is kube-proxy's go-to tool for managing how packets move within the cluster. Think of it as the wizard behind the curtain, conjuring up rules to guide the traffic and perform Network Address Translation (NAT). It's like a magical GPS for your data, ensuring every packet finds its way to the right Pod.
But Why kube-proxy?
Kube-proxy might seem like just another part of the Kubernetes machinery, but make no mistake: its role in translating Service IPs to Pod IPs is crucial for enabling communication within the cluster. Without kube-proxy, your Pods would be like ships lost at sea, unable to find their way.
A Hint of Things to Come
In this article, we won't just scratch the surface. Oh no, we'll dive deep into the workings of kube-proxy, uncovering the secrets that lie beneath. From creating rules to the nitty-gritty of packet capturing, we'll journey through the network wilderness together.
So grab your virtual hiking boots, and let's begin the epic exploration of networking in Kubernetes clusters. The tale of kube-proxy awaits, and trust me, you won't want to miss a beat.
The Three Roads: Userspace, iptables, and IPVS
In the bustling city of Kubernetes networking, kube-proxy stands as a vigilant gatekeeper, orchestrating traffic flow with finesse and precision. But like any seasoned conductor, kube-proxy has more than one way to lead the orchestra. It has three distinct modes of operation, each with its unique characteristics. Let's break them down:
Userspace Mode: The Old Guard
In the early days, kube-proxy trod the path of Userspace. It was a simpler time, with kube-proxy managing connections by proxying them through a user-level process. While Userspace mode has charm and simplicity, it's like riding a bike with training wheels: functional, but less efficient than the other modes, since every packet must cross between kernel space and userspace.
iptables Mode: The Middle Path
Enter iptables, the protagonist of our tale. The iptables mode pushes the heavy lifting down into the kernel. By crafting rules and using the kernel's netfilter framework, kube-proxy weaves a web of routes that guide packets like a skilled navigator.
- Rule Creation: Think of this as drawing the map. Rules are laid down to ensure that every packet knows its path.
- Packet Capturing: Here's where the magic happens. As packets roll in, they are examined, sorted, and sent on their way, all in the blink of an eye.
Stay tuned; we will venture deep into this forest of iptables rules in the next chapter.
IPVS Mode: The Road Less Traveled
The latest entrant to the kube-proxy modes is IPVS, short for IP Virtual Server. Sleek, fast, and built for high performance, IPVS leverages advanced data structures like hash tables to accelerate routing decisions.
- Built for Scale: If iptables is a seasoned explorer, IPVS is a rocket ship, designed to zoom through large-scale clusters without breaking a sweat.
- Kernel-Based: Like iptables, IPVS operates within the kernel but adds a touch of elegance and speed to the routing game.
While IPVS Mode is a beacon of performance and scalability, its path is not trodden by all. Its reliance on specific kernel features may create compatibility roadblocks, and the complexity of its setup can daunt even seasoned Kubernetes navigators. Transitioning from other modes like iptables can resemble a winding, challenging trail rather than a straightforward journey. Furthermore, the availability of community support, tooling, and education around IPVS might be more limited. For those managing smaller clusters or not grappling with the high demands of large-scale routing, the trusted iptables mode may still be the favored guidepost. It's a less traveled road that offers a unique landscape for those who venture down its path.
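If you want to peek down that road yourself, the `ipvsadm` utility can list the virtual servers that kube-proxy programs in IPVS mode. A quick read-only sketch, assuming a node running kube-proxy in IPVS mode and `ipvsadm` installed:

```shell
# List the IPVS virtual servers and their backend ("real") servers.
# In IPVS mode, each Service ClusterIP shows up as a virtual server
# with one entry per backing Pod, instead of a chain of iptables rules.
sudo ipvsadm -Ln

# Show packet/byte counters per virtual server, handy for spotting
# which Services actually carry traffic.
sudo ipvsadm -Ln --stats
```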
Choose Your Adventure
Each of these modes has a place in the grand story of Kubernetes, and choosing the right one is akin to picking the right tool for the job. But fear not, dear reader, for our journey will guide you through the intricate details of the iptables mode, unlocking secrets and gaining wisdom along the way.
The customer asks: How can I know which mode I am using?
You can determine the mode that kube-proxy is running in by checking the configuration on any of your cluster nodes.
1. SSH into one of your Kubernetes nodes where kube-proxy runs.
2. Find the kube-proxy Pod name in the kube-system namespace by running:
kubectl get pods -n kube-system -l k8s-app=kube-proxy
3. Describe the kube-proxy Pod to see its configuration. Replace `<pod-name>` with the actual Pod name:
kubectl describe pod <pod-name> -n kube-system
4. Look for the `--proxy-mode` flag in the command-line arguments. The value of this flag will tell you the mode that kube-proxy is using. If the flag is not set, kube-proxy defaults to using iptables mode.
5. Alternatively, you can also check the config map used by kube-proxy:
kubectl get configmap kube-proxy -n kube-system -o yaml
Look for the `mode` field under the `config.conf` key. If it's not set, you can assume the iptables mode is being used as it's the default.
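As one more quick check, kube-proxy reports its active mode over HTTP on its metrics port (10249 by default). A minimal sketch, assuming you can run a shell on the node:

```shell
# Ask kube-proxy directly which mode it is running in.
# The /proxyMode endpoint returns a plain string such as "iptables" or "ipvs".
curl -s http://localhost:10249/proxyMode
```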
iptables in kube-proxy: A Deep Dive into Implementation
kube-proxy plays a central role in Kubernetes networking, where iptables is one of the modes used to implement Service abstraction. This deep dive explores how kube-proxy leverages iptables to perform its functions.
1. Initialization of kube-proxy:
kube-proxy Start: When kube-proxy starts, it initializes itself with the current state of Services and Endpoints from the Kubernetes API Server.
Connecting to the API Server: kube-proxy uses client libraries to connect to the Kubernetes API Server.
Requesting Services and Endpoints: It constructs requests to fetch the Services and Endpoints. This is done using specific API paths like `/api/v1/services` and `/api/v1/endpoints`.
Handling API Responses: The response from the API Server will include the details of all the Services and Endpoints, often in JSON format. kube-proxy will parse this information.
Mapping to Internal Structures: After parsing the JSON, kube-proxy maps the data into its internal data structures representing Services and Endpoints.
Exploring the actual kube-proxy code in the Kubernetes codebase would provide a comprehensive view of all these details. You can find the kube-proxy code in the Kubernetes GitHub repository, particularly within the `pkg/proxy` directory.
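You can reproduce the API calls kube-proxy makes with `kubectl get --raw`, which hits the same paths through the API Server. A small sketch of what those responses look like:

```shell
# Fetch the same resources kube-proxy watches, straight from the API Server.
# Both return JSON lists that kube-proxy parses into its internal structures.
kubectl get --raw /api/v1/services | head -c 500; echo
kubectl get --raw /api/v1/endpoints | head -c 500; echo
```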
Iptables Flushing: If kube-proxy is set to use iptables mode, it will flush existing iptables rules related to Kubernetes services, preparing to create new rules.
Iptables Interface Initialization: kube-proxy utilizes a specific interface to interact with iptables. This interface is defined in the package `k8s.io/kubernetes/pkg/util/iptables` and is used to create an object that can execute iptables commands.
Flushing Specific Chains: kube-proxy ensures that certain chains are flushed. It doesn't flush all chains but focuses on those that it manages. For example, in the iptables proxier (`pkg/proxy/iptables/proxier.go`), it might flush chains like `KUBE-SERVICES`, `KUBE-EXTERNAL-SERVICES`, etc.
Chain Flushing Implementation: The actual flushing of a chain is done by calling the `FlushChain` method on the iptables interface, which takes the table and the chain name as parameters.
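To see those managed chains on a node, you can dump the `nat` table and filter for the `KUBE-` prefix. A read-only sketch (listing, not flushing, so it's safe on a live node):

```shell
# List every kube-proxy-managed chain in the nat table.
sudo iptables -t nat -S | grep '^-N KUBE-'

# Inspect the top-level dispatch chain that Service traffic enters first.
sudo iptables -t nat -L KUBE-SERVICES -n --line-numbers
```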
2. Configuration and Rule Creation:
Configuration Loading: kube-proxy loads its configuration, including the desired mode (e.g., iptables).
Service and Endpoint Discovery: kube-proxy watches the Kubernetes API for changes in Services and Endpoints (more on this in section 3).
Rule Generation: Based on the discovered Services and Endpoints, kube-proxy generates iptables rules to handle traffic routing, load balancing, NAT, health checking, and more.
Determine the Appropriate Chains and Tables: Rule generation starts with determining the relevant chains and tables (e.g., nat table or filter table). These are often defined as constants within the kube-proxy code.
Build Rules for Services: For each service, kube-proxy creates rules to handle the traffic. This includes handling different service types (ClusterIP, NodePort, etc.) and features like load balancing.
Use iptables Interface: kube-proxy uses the iptables interface to execute the commands; see the proxier implementation (`pkg/proxy/iptables/proxier.go`) for details.
Custom Chains Creation: To optimize rule processing, kube-proxy can create custom chains in iptables.
When kube-proxy creates rules through iptables, it's actually defining rules within Netfilter's tables inside the Linux kernel.
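To make this concrete, here is roughly what the generated rules look like for a ClusterIP Service backed by two Pods. The chain suffixes are hashes in reality, so the names and IPs below are illustrative, not literal:

```shell
# Sketch of kube-proxy's generated nat rules for one ClusterIP Service
# (chain names and IPs are made up for illustration).

# 1. KUBE-SERVICES matches the Service's virtual IP and jumps to a per-Service chain.
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m tcp --dport 80 -j KUBE-SVC-EXAMPLE

# 2. The per-Service chain load-balances across per-endpoint chains using the
#    statistic module: the first rule matches with probability 1/2, and the
#    fall-through rule catches the rest.
-A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.5 -j KUBE-SEP-POD1
-A KUBE-SVC-EXAMPLE -j KUBE-SEP-POD2

# 3. Each per-endpoint chain DNATs to the actual Pod IP and port.
-A KUBE-SEP-POD1 -p tcp -m tcp -j DNAT --to-destination 10.244.1.5:8080
-A KUBE-SEP-POD2 -p tcp -m tcp -j DNAT --to-destination 10.244.2.7:8080
```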
3. Synchronization and Event Handling:
Synchronization Loop: kube-proxy maintains a continuous synchronization loop to ensure that iptables rules align with the current state of Services and Endpoints. You can find this loop in the `SyncLoop` method in the `pkg/proxy/iptables/proxier.go` file.
Event-Driven Updates: Changes in Services and Endpoints trigger events in kube-proxy, leading to immediate updates in iptables rules.
Periodic Reconciliation: Regular checks ensure that iptables rules are consistent with the desired state.
When kube-proxy synchronizes iptables rules, it's managing Netfilter's in-kernel structures.
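One way to watch this synchronization at work is to scale a workload and observe the per-endpoint chains change. A hedged sketch, assuming a Deployment named `web` exposed by a Service:

```shell
# Count the per-endpoint (KUBE-SEP-*) rules before scaling.
sudo iptables-save -t nat | grep -c '^-A KUBE-SEP'

# Scale the backing Deployment; kube-proxy's event handling should
# add or remove KUBE-SEP chains within its sync interval.
kubectl scale deployment web --replicas=5

# Count again and compare.
sudo iptables-save -t nat | grep -c '^-A KUBE-SEP'
```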
4. Utilizing Iptables Command Line Interface (CLI):
Iptables Invocation: kube-proxy invokes the iptables CLI to add, delete, or update rules in the kernel's packet filtering subsystem.
Transaction-Based Changes: Changes are often done in a transactional manner, ensuring atomicity and consistency.
The iptables CLI is a user-space utility that communicates with the Netfilter subsystem in the kernel. When kube-proxy invokes iptables commands, it's interacting with Netfilter.
The invocation of iptables within kube-proxy happens through a series of function calls and interactions. It makes use of the `iptables.Interface` to manipulate iptables rules. Here's a breakdown of how this happens:
Iptables Interface Creation: kube-proxy creates an interface to iptables, often utilizing the `pkg/util/iptables` package. This interface provides methods for interacting with iptables.
Building Rules: kube-proxy builds iptables rules as strings using various methods and functions in the `pkg/proxy/iptables` package.
Adding, Deleting, and Checking Rules: Through the iptables interface, kube-proxy can add, delete, and check the existence of rules.
Committing Changes: After building the rules, kube-proxy commits them to iptables. This is done through the iptables interface.
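In practice, the transactional commit works by feeding the whole rule set to `iptables-restore` in one shot, rather than running one `iptables` command per rule: either the whole batch applies or none of it does. A minimal sketch of that pattern (the rule and addresses are placeholders):

```shell
# Apply a batch of rules atomically. --noflush keeps rules in chains
# that the batch doesn't mention; chains declared here are rebuilt.
sudo iptables-restore --noflush <<'EOF'
*nat
:KUBE-SERVICES - [0:0]
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m tcp --dport 80 -j KUBE-SVC-EXAMPLE
COMMIT
EOF
```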
5. Complex Networking Tasks:
NAT: kube-proxy programs iptables to handle DNAT and SNAT/Masquerading.
Load Balancing: Service-specific rules are created to distribute traffic among available Pods.
Connection Tracking: Utilizing the kernel's conntrack module, rules are created to remember and handle the state of individual connections.
Conntrack Utility Invocation: kube-proxy might invoke the conntrack utility to manage connection tracking entries. This is handled through functions in the `pkg/proxy/conntrack` package.
Conntrack Rules: In iptables, connection tracking rules can be created using the `-m conntrack` and `-m state` modules. You'll find these rules being created within the `pkg/proxy/iptables` package.
Deleting Stale Connections: When endpoints change, kube-proxy might need to delete stale connections to ensure that packets are not sent to old endpoints.
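The same `conntrack` utility that kube-proxy calls is available for inspection by hand. A sketch, assuming the tool is installed on the node (the IPs are placeholders):

```shell
# List tracked connections destined for a Service or Pod IP.
sudo conntrack -L -d 10.96.0.10

# Delete stale entries toward a Pod that no longer exists, mimicking
# what kube-proxy does when endpoints change (especially for UDP flows).
sudo conntrack -D -d 10.244.1.5
```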
As packets travel through the system, Netfilter's hooks in the kernel are responsible for triggering the appropriate iptables rules programmed by kube-proxy. NAT, load balancing, and connection tracking are implemented through Netfilter hooks at different stages of the packet-processing pathway.
The customer asks: Knowing the components, what does that mean for my packets?
Arrival at Node: A packet arrives at a Kubernetes node destined for a Service IP.
Once the packet is received by the kernel's network stack, it is passed to the Netfilter subsystem, where the rules and chains defined by kube-proxy (through iptables) come into play.
Pre-routing and NAT Rules: The packet hits the `PREROUTING` chain in the `nat` table.
DNAT Rule Evaluation: If the destination matches a service's virtual IP, a DNAT rule translates it to an endpoint's Pod IP.
Custom Chains: The packet might traverse custom chains created for specific services or endpoints, ensuring efficient rule processing.
Routing Decision: The kernel makes a routing decision based on the translated destination IP.
Local Pod: If destined for a local Pod, it's routed to the `INPUT` chain.
Remote Pod: If destined for a Pod on another node, it's routed to the `FORWARD` chain.
Connection Tracking: The conntrack module evaluates the packet.
New Connections: The module identifies new connections and tracks their state.
Existing Connections: If part of an existing connection, the packet may be fast-tracked to the appropriate rule, aiding in session affinity.
Service and Endpoint Handling: The packet traverses rules related to service and endpoint selection.
Load Balancing: Rules implement simple load balancing, deciding the destination Pod.
Health Checking: Rules steer traffic away from endpoints that are not ready, avoiding unhealthy Pods.
Session Affinity Handling: If session affinity is configured, rules ensure the packet reaches the Pod previously selected for the session.
SNAT/Masquerading: If the packet leaves the node (for a remote Pod), a rule in the `POSTROUTING` chain may alter the source IP.
Final Delivery:
- Local Pod: The packet is delivered locally to the Pod.
- Remote Pod: The packet is forwarded to another node, where the same Netfilter processing (from the `PREROUTING` chain onward) repeats.
Return Traffic: The response follows a similar path in reverse. Connection tracking aids in ensuring the return packet follows the correct path back.
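If you want to watch this journey on a real node, Netfilter's `TRACE` target logs every chain a packet traverses. A sketch to use carefully on a test node only; the IP and port are placeholders, and where the trace output lands depends on your kernel's logging backend:

```shell
# Trace packets for one Service VIP through every table and chain.
# TRACE lives in the raw table and logs to the kernel log.
sudo iptables -t raw -A PREROUTING -p tcp -d 10.96.0.10 --dport 80 -j TRACE

# Watch the per-chain log lines as a request comes in.
sudo dmesg -wT | grep TRACE &
curl http://10.96.0.10

# Clean up the trace rule afterwards.
sudo iptables -t raw -D PREROUTING -p tcp -d 10.96.0.10 --dport 80 -j TRACE
```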
The customer asks: You mentioned Network Policies; where would they be used in this scenario?
Service and Endpoint Handling:
Pod-to-Pod Communication: Network policies are crucial at this stage when defining which Pods can communicate with each other. They can specify rules to allow or deny traffic based on namespace, Pod labels, or IP ranges.
Ingress and Egress Rules: Policies can be defined to control incoming (ingress) and outgoing (egress) traffic to and from Pods, enforcing specific security requirements for different parts of the application.
Final Delivery to a Local Pod:
Policy Enforcement: Before the packet is delivered to the destination Pod, network policies are applied to decide if the packet should be allowed or dropped. This decision is based on the policies that match the source and destination Pods, the protocols used, and the ports involved.
Isolation: By default, Pods accept traffic from any source. Network policies enable administrators to change this behavior, allowing specific communication paths and isolating others.
Flexibility and Control: Network policies offer fine-grained control, allowing exceptions and tailoring communication paths based on application needs.
Return Traffic:
Bidirectional Rules: Network policies are not only unidirectional. They can be used to enforce rules on return traffic, ensuring that response packets follow the same policy constraints and maintaining a consistent security posture.
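To ground this, here is a minimal sketch of a policy that isolates a set of Pods and re-allows a single ingress path; the names and labels are illustrative:

```shell
# Apply a NetworkPolicy that only lets Pods labeled role=frontend reach
# Pods labeled app=web on TCP 8080; all other ingress to app=web is dropped.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-web
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: web
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend
      ports:
        - protocol: TCP
          port: 8080
EOF
```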
The customer asks: What would happen if I delete kube-proxy?
If you delete kube-proxy from a Kubernetes node, it can have several significant impacts on the cluster's networking. Here's what could happen:
1. Loss of Service Routing: kube-proxy is responsible for maintaining the iptables or IPVS rules that handle routing traffic to the correct pods for services within the cluster. Without kube-proxy, these rules won't be updated or maintained, and new rules won't be created for new services.
2. Existing Connections May Continue: Depending on the exact setup and what was in place at the time kube-proxy was deleted, existing connections might continue to work for a while. However, this would likely be temporary, and issues could arise over time.
3. Failure to Handle Endpoint Changes: kube-proxy watches for changes to endpoints (such as Pods scaling up or down) and updates the rules accordingly. Without kube-proxy, these updates would cease, and the routing information would become stale, leading to incorrect routing.
4. Potential Impact on Network Policies: Depending on the CNI plugin being used and how it interacts with kube-proxy, there could also be impacts on network policies and how they are enforced.
Existing Network Policies: If the CNI plugin relies on kube-proxy's behavior or its management of iptables/IPVS for enforcing network policies, deleting kube-proxy could disrupt that enforcement. Existing network policies might not be applied correctly, leading to unexpected network behavior.
New Network Policies: Without kube-proxy, the creation or modification of network policies might also be affected. If the rules for new network policies are dependent on the state managed by kube-proxy, those rules might not be implemented correctly or at all.
No Impact in Some Cases: In some configurations and with certain CNI plugins, deleting kube-proxy might not impact network policies at all. If the CNI plugin handles network policies entirely separately from kube-proxy's management of service routing, then network policies (both existing and new) might continue to function normally.
5. Replacement with Other Solutions: If you're intentionally deleting kube-proxy as part of replacing it with another solution (like Cilium, as mentioned earlier), the effects would depend on how that replacement is handled. Ideally, the new solution would take over the responsibilities of kube-proxy, and the transition would be managed smoothly.
6. Potential Cluster Instability: Deleting kube-proxy without a proper replacement or understanding of how your specific cluster is configured could lead to instability and unpredictable behavior in the cluster's networking. Troubleshooting and recovery might be complex.
7. Node-Level Impact: It's worth noting that kube-proxy runs on each node, so deleting it from a single node would impact only that node's ability to route service traffic correctly. Deleting it cluster-wide would have a more widespread effect.
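For the curious, here is a rough sketch of what a deliberate removal looks like when Cilium takes over kube-proxy's duties. Flag names vary between Cilium versions, and the host/port values are placeholders, so treat this as illustrative rather than a runbook:

```shell
# Remove kube-proxy so it stops reconciling iptables rules.
kubectl -n kube-system delete daemonset kube-proxy
kubectl -n kube-system delete configmap kube-proxy

# Install Cilium with its kube-proxy replacement enabled, pointing it
# at the API Server directly since Services no longer resolve via kube-proxy.
helm install cilium cilium/cilium --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=<api-server-host> \
  --set k8sServicePort=<api-server-port>
```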
kube-proxy and iptables: Performance Challenges in a Symbiotic Relationship
The collaboration between kube-proxy and iptables in Kubernetes is a vital aspect of cluster networking, ensuring that service requests are accurately routed to the right pods. However, this partnership is not without its challenges, particularly when it comes to performance. Here's a closer look at what can cause performance bottlenecks:
Large Number of Services and Endpoints
As the Kubernetes cluster grows, so does the number of iptables rules that kube-proxy must manage. Because iptables evaluates rules sequentially, lookup cost grows linearly with the number of Services and Endpoints, which can slow down packet processing and add latency.
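A quick way to gauge this on your own nodes is to count the kube-proxy-managed rules; thousands of Services translate into tens of thousands of lines here:

```shell
# Count kube-proxy's rules in the nat table; this number grows with
# every Service and every endpoint in the cluster.
sudo iptables-save -t nat | grep -c 'KUBE-'
```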
Frequent Changes to Services or Endpoints
A dynamic cluster with frequent updates to services or endpoints means constant updates to iptables rules. This continual churn, including full resynchronizations in some cases, can degrade performance.
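kube-proxy exposes Prometheus metrics that put a number on this churn. A sketch, assuming access to the node's metrics port (10249 by default):

```shell
# How long each full iptables resync takes, and when rules last changed.
curl -s http://localhost:10249/metrics | grep -E 'sync_proxy_rules_duration|sync_proxy_rules_last'
```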
Usage of Specific Features
Features like connection tracking and session affinity, while powerful, add overhead. This additional complexity can lead to performance issues, especially with many connections or complex configurations.
Kernel-Level Overhead
The netfilter subsystem in the kernel, interfacing with iptables, might become a bottleneck. This situation is particularly true under high loads or with extensive rule sets, leading to delays.
Cluster-Wide Synchronization Issues
Ensuring consistency across nodes for the iptables rules can become challenging, particularly in large or highly dynamic clusters. This inconsistency can translate into unpredictable behavior and performance degradation.
Potential Misconfigurations or Bugs
Inefficient rules or specific versions of kube-proxy with known performance bugs can exacerbate these challenges, leading to further slowdowns.
These performance problems underline the importance of proper tuning, monitoring, and understanding of your specific cluster's behavior. They also make a case for considering alternative solutions or modes, such as IPVS mode in kube-proxy, particularly in complex or large-scale deployments. Being aware of these potential bottlenecks enables a more robust and responsive Kubernetes networking infrastructure, tailored to the unique demands of your applications and services.
THE END
As we journey back from our extensive exploration of kube-proxy, I feel a sense of accomplishment and excitement. This deep dive into technical terrain marks a return to the roots of what so many of you have asked for – the nitty-gritty, the code, the inner workings of Kubernetes networking.
From the mysterious lands of iptables to the busy highways of connection tracking, we've unraveled the secrets that lie beneath the Kubernetes clusters we interact with daily. It's been an adventure filled with code, concepts, and even some Tolkien-inspired storytelling (My customer is a big fan, sorry for those who didn't like the intro!).
For those of you who know me, my conversation with the customer wouldn't have been complete without some scribbles and drawings. If you're curious to see my hand-drawn diagrams that accompanied this technical discussion, feel free to reach out. They might add another layer of understanding to this intricate subject.
In closing, I hope this article has shed light on kube-proxy's implementation and its impact on your Kubernetes clusters. The world of technology is vast and ever-changing, and it's invigorating to delve into these depths. Whether you're a seasoned Kubernetes veteran or just starting your journey, I trust that this exploration has enriched your knowledge.
Thank you for accompanying me on this epic quest. Until our next technical adventure, keep exploring, keep learning, and never hesitate to dive into the code.