High Availability

High availability (HA) ensures your Kubernetes cluster continues operating even when individual components fail. In a single-node control plane, if that node fails, the entire cluster becomes unusable. High availability distributes control plane components across multiple nodes, so the failure of one node doesn’t bring down the cluster.

Think of high availability like having multiple engines on an airplane. If one engine fails, the others keep the plane flying. Similarly, with multiple control plane nodes, if one fails, the others continue serving the cluster.

What Is High Availability?

High availability in Kubernetes means:

  • Multiple Control Plane Nodes - Run API server, controller manager, and scheduler on multiple nodes
  • Clustered etcd - Run etcd as a distributed cluster (typically 3 or 5 nodes)
  • Load Balanced API Traffic - Distribute API server requests across control plane nodes
  • Node Redundancy - Run worker nodes across multiple availability zones
  • Automatic Failover - Components automatically use healthy nodes when others fail

```mermaid
graph TB
  subgraph "High Availability Cluster"
    subgraph "Control Plane"
      A1[API Server 1]
      A2[API Server 2]
      A3[API Server 3]
      E1[etcd 1]
      E2[etcd 2]
      E3[etcd 3]
      C1[Controller Manager]
      C2[Controller Manager]
      C3[Controller Manager]
      S1[Scheduler]
      S2[Scheduler]
      S3[Scheduler]
    end
    LB[Load Balancer]
    subgraph "Worker Nodes"
      W1[Node 1<br/>Zone A]
      W2[Node 2<br/>Zone B]
      W3[Node 3<br/>Zone C]
    end
  end
  LB --> A1
  LB --> A2
  LB --> A3
  A1 --> E1
  A2 --> E2
  A3 --> E3
  E1 <--> E2
  E2 <--> E3
  E3 <--> E1
  W1 --> LB
  W2 --> LB
  W3 --> LB
  style LB fill:#e1f5ff
  style A1 fill:#fff4e1
  style A2 fill:#fff4e1
  style A3 fill:#fff4e1
  style E1 fill:#e8f5e9
  style E2 fill:#e8f5e9
  style E3 fill:#e8f5e9
```

Control Plane High Availability

The control plane consists of several components, each with different HA requirements:

API Server

The API server is stateless and can run multiple instances. A load balancer distributes requests across all API server instances. If one API server fails, requests automatically go to the others.

etcd

etcd is stateful and requires clustering for HA. etcd uses a consensus algorithm (Raft) that requires a quorum (majority) of nodes to operate:

  • 3-node etcd - Can tolerate 1 node failure (needs 2 of 3 for quorum)
  • 5-node etcd - Can tolerate 2 node failures (needs 3 of 5 for quorum)
  • 7-node etcd - Can tolerate 3 node failures (needs 4 of 7 for quorum)

More nodes provide better fault tolerance, but every write must be replicated to a quorum, so larger clusters add latency and operational complexity. Most clusters use 3-node etcd.

```mermaid
graph LR
  subgraph "3-Node etcd Cluster"
    E1[etcd 1]
    E2[etcd 2]
    E3[etcd 3]
  end
  E1 <-->|Raft Consensus| E2
  E2 <-->|Raft Consensus| E3
  E3 <-->|Raft Consensus| E1
  Q[Quorum: 2 of 3<br/>Tolerates 1 failure]
  style E1 fill:#e8f5e9
  style E2 fill:#e8f5e9
  style E3 fill:#e8f5e9
  style Q fill:#fff4e1
```
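You can see quorum in practice by querying the cluster with etcdctl. The sketch below is illustrative only: the endpoint addresses and certificate paths are assumptions based on a typical kubeadm-managed stacked etcd, and it assumes etcdctl is available on a control plane node (it is also common to run these commands via kubectl exec into an etcd pod).

```bash
# Assumed endpoints and kubeadm-style certificate paths; adjust for your cluster.
export ETCDCTL_API=3
ENDPOINTS=https://10.0.0.11:2379,https://10.0.0.12:2379,https://10.0.0.13:2379

# List the members that form the cluster (3 here, so quorum is 2).
etcdctl --endpoints="$ENDPOINTS" \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list -w table

# Check each endpoint; the cluster stays writable as long as a
# majority (2 of 3) of members report healthy.
etcdctl --endpoints="$ENDPOINTS" \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health
```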

Controller Manager and Scheduler

These components use leader election—only one instance is active at a time, but multiple instances run for redundancy. If the active instance fails, another instance takes over automatically.
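Leader election is easy to observe: each of these components holds a Lease object in the kube-system namespace, and the holder identity names the currently active instance. A minimal check, assuming only a working kubeconfig:

```bash
# The HOLDER column shows which replica currently owns the election lock;
# when that instance fails, another replica acquires the lease and takes over.
kubectl -n kube-system get lease kube-controller-manager kube-scheduler
```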

etcd Topologies

How etcd is deployed affects availability:

Stacked etcd Topology

etcd runs on the same nodes as control plane components. This is simpler but couples etcd availability with control plane availability.

Advantages:

  • Simpler setup (fewer nodes)
  • Lower resource requirements
  • Easier to manage

Disadvantages:

  • etcd and the API server share fate (if a node fails, both are affected)
  • More complex recovery (need to restore both)

External etcd Topology

etcd runs on separate nodes from control plane components. This provides better isolation and is recommended for production.

Advantages:

  • Better isolation (etcd and control plane failures are independent)
  • Can scale etcd separately
  • More resilient to failures

Disadvantages:

  • More nodes to manage
  • Higher resource requirements
  • More complex setup
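With kubeadm, the external topology is selected by pointing the control plane at an existing etcd cluster in the ClusterConfiguration. The sketch below is not a complete setup: the etcd endpoints, load balancer DNS name, and client certificate paths are assumptions you would replace with your own, and the v1beta3 API version may differ by kubeadm release.

```bash
# Hypothetical values throughout; substitute your etcd endpoints,
# load balancer address, and client certificate locations.
cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "lb.example.com:6443"
etcd:
  external:
    endpoints:
      - https://10.0.1.21:2379
      - https://10.0.1.22:2379
      - https://10.0.1.23:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
EOF

# Initialize the first control plane node against the external etcd cluster.
kubeadm init --config kubeadm-config.yaml --upload-certs
```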

Load Balancing Control Plane Traffic

Everything that talks to the API server (kubelets, kube-proxy, controllers, and human users) needs a stable address to reach it. In an HA setup, a load balancer distributes this traffic across the API server instances:

```mermaid
graph TB
  subgraph "Clients"
    K1[kubelet 1]
    K2[kubelet 2]
    U1[Users]
    C1[Controllers]
  end
  LB[Load Balancer<br/>VIP: 10.0.0.100]
  subgraph "Control Plane"
    A1[API Server 1<br/>10.0.0.11]
    A2[API Server 2<br/>10.0.0.12]
    A3[API Server 3<br/>10.0.0.13]
  end
  K1 --> LB
  K2 --> LB
  U1 --> LB
  C1 --> LB
  LB --> A1
  LB --> A2
  LB --> A3
  style LB fill:#e1f5ff
  style A1 fill:#fff4e1
  style A2 fill:#fff4e1
  style A3 fill:#fff4e1
```

The load balancer must:

  • Health check API servers
  • Distribute traffic evenly
  • Handle API server failures gracefully
  • Provide a stable endpoint (VIP) that doesn’t change

Failure Scenarios

High availability protects against various failure scenarios:

Single Control Plane Node Failure

  • API server: Load balancer routes to other API servers (no impact)
  • Controller manager/scheduler: Another instance takes over via leader election (brief pause)
  • etcd (stacked): Cluster continues with remaining etcd nodes (if quorum maintained)

etcd Node Failure

  • 3-node etcd: Cluster continues with 2 nodes (quorum maintained)
  • 5-node etcd: Cluster continues with 4 nodes (quorum maintained)
  • If quorum is lost: etcd stops accepting writes, so cluster state can no longer change; running workloads continue, but the cluster is effectively down
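After losing a member you can confirm that the surviving cluster still has a leader and still accepts writes. A hedged sketch, reusing the assumed endpoints and certificate paths from the earlier etcdctl example:

```bash
export ETCDCTL_API=3
ENDPOINTS=https://10.0.0.11:2379,https://10.0.0.12:2379,https://10.0.0.13:2379  # assumed

# Endpoint status shows the current leader and Raft term per member;
# a surviving majority keeps exactly one leader.
etcdctl --endpoints="$ENDPOINTS" \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint status -w table

# A write succeeds only while quorum holds, so this doubles as a quorum test.
etcdctl --endpoints="$ENDPOINTS" \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  put ha-quorum-check "$(date +%s)"
```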

Load Balancer Failure

  • Single point of failure
  • Mitigate with redundant load balancers or DNS-based failover

Availability Zone Failure

  • Distribute control plane nodes across zones
  • Distribute worker nodes across zones
  • Use Pod Disruption Budgets to maintain application availability
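Pod Disruption Budgets can be created imperatively. The sketch below assumes a hypothetical application labelled app=web running at least three replicas spread across zones.

```bash
# Keep at least 2 pods of the (hypothetical) app=web workload available
# during voluntary disruptions such as node drains during zone maintenance.
kubectl create poddisruptionbudget web-pdb \
  --selector=app=web \
  --min-available=2

# Confirm how many disruptions are currently allowed.
kubectl get pdb web-pdb
```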

HA Setup with kubeadm

kubeadm supports HA cluster setup:

  1. Initialize first control plane node - Creates certificates and initial configuration
  2. Copy certificates - Share certificates to other control plane nodes
  3. Join additional control plane nodes - Use kubeadm join with the --control-plane flag
  4. Configure load balancer - Set up load balancer pointing to all API servers
  5. Update kubeconfig - Point to load balancer VIP instead of single node
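In command form, the flow looks roughly like this. The load balancer address, token, hash, and certificate key are all placeholders; kubeadm prints the real join command (including the certificate key when --upload-certs is used) at the end of kubeadm init.

```bash
# Steps 1-2: initialize the first control plane node against the LB endpoint
# and upload the certificates so other control plane nodes can fetch them.
sudo kubeadm init \
  --control-plane-endpoint "lb.example.com:6443" \
  --upload-certs

# Step 3: on each additional control plane node, join as a control plane
# (the values below are placeholders printed by the init command above).
sudo kubeadm join lb.example.com:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <certificate-key>

# Step 5: the kubeconfig should point at the load balancer, not a single node.
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
```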

Best Practices

  1. Use 3 or 5 etcd nodes - An odd number of members avoids tied votes; adding an even member raises the quorum size without improving fault tolerance
  2. Distribute across zones - Place nodes in different availability zones
  3. Monitor etcd health - Watch etcd cluster health and quorum status
  4. Test failure scenarios - Regularly test node failures to verify HA works
  5. Document procedures - Document how to add/remove control plane nodes
  6. Use external etcd for production - Better isolation than stacked topology
  7. Configure proper load balancing - Use health checks and proper algorithms
  8. Plan for upgrades - Upgrade HA clusters one node at a time
  9. Monitor leader election - Ensure controller manager and scheduler have leaders
  10. Backup etcd regularly - Even with HA, backups are essential
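For the last point, etcd snapshots are taken with etcdctl. A minimal sketch, again assuming kubeadm-style certificate paths and a local etcd member; in practice you would schedule this and ship the snapshot off the node.

```bash
# Take a point-in-time snapshot of etcd (paths are assumed kubeadm defaults).
ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot is readable before relying on it.
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-$(date +%F).db -w table
```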
