kubeadm Production Readiness: HA Support and Upgrade Paths

K8s Guru
5 min read

Introduction

By mid-2017, kubeadm had evolved from its beta origins into a tool capable of bootstrapping production-grade clusters. With Kubernetes 1.7 and 1.8, kubeadm gained high availability (HA) control plane support and upgrade workflows that addressed the “single-master only” limitation that constrained early adopters.

This mattered because teams choosing between kubeadm and tools like kops now had a clearer decision framework: kubeadm offered a cloud-agnostic, infrastructure-light path that worked on-premises, while kops remained the AWS-optimized choice for teams already committed to that ecosystem.

kubeadm Improvements in 2017

High Availability Support

  • Stacked Control Plane: kubeadm 1.7+ supports multiple master nodes with etcd co-located on control plane nodes (stacked topology).
  • External etcd: For larger deployments, kubeadm can configure masters to use an external etcd cluster.
  • Load Balancer Integration: HA setups require an external load balancer (HAProxy, nginx, or a cloud LB) in front of the API servers; a minimal HAProxy sketch follows this list.
  • Certificate Management: kubeadm generates and distributes certificates across all master nodes automatically.
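
The load balancer in front of the API servers can be as simple as a TCP-mode HAProxy instance. The sketch below is illustrative only: the master IP addresses and config path are assumptions, not values from a specific deployment, and production setups typically pair the LB with keepalived or a second instance to avoid a single point of failure.

# Minimal HAProxy config fronting three kube-apiservers
# (IP addresses and file path are placeholders)
cat <<EOF > /etc/haproxy/haproxy.cfg
defaults
    mode tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend kubernetes-api
    bind *:6443
    default_backend kube-apiservers

backend kube-apiservers
    balance roundrobin
    option tcp-check
    server master-1 10.0.0.11:6443 check
    server master-2 10.0.0.12:6443 check
    server master-3 10.0.0.13:6443 check
EOF
systemctl restart haproxy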

Upgrade Workflows

  • kubeadm upgrade Command: Introduced in Kubernetes 1.8, enabling in-place upgrades from one minor version to the next.
  • Planned Downtime: HA upgrades still require careful coordination; the upgrade process drains and updates masters sequentially.
  • Version Skew Policy: kubeadm enforces Kubernetes version skew rules (masters can be one minor version ahead of nodes).
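
In practice, the 1.8-era workflow boiled down to a handful of commands per node. The sketch below assumes a Debian/Ubuntu host and a v1.8.x target; exact package versions varied by release.

# On the first master: see what the cluster can be upgraded to
kubeadm upgrade plan

# Apply the upgrade to the control plane components on this master
kubeadm upgrade apply v1.8.0

# Then, per worker node: drain, upgrade the kubelet package, uncordon
kubectl drain <node-name> --ignore-daemonsets
apt-get update && apt-get install -y kubelet=1.8.0-00
systemctl restart kubelet
kubectl uncordon <node-name>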

Production Patterns

  • Configuration Files: The kubeadm config API (v1alpha1) allows declarative cluster configuration, enabling GitOps-friendly setups (see the config sketch after this list).
  • Add-On Management: Networking (CNI), DNS (CoreDNS), and other add-ons remain manual post-init steps, but patterns emerged for automation.
  • Certificate Rotation: Manual process in 2017; automatic rotation would arrive in later releases.
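
For the configuration-file workflow, a cluster could be described declaratively and passed to kubeadm init --config. The field names below follow the v1alpha1 MasterConfiguration of that era as a sketch; this API changed between releases, so verify the fields against the kubeadm version in use.

# Declarative cluster config (v1alpha1-era sketch; field names may differ
# slightly between kubeadm releases -- verify against your version)
cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.8.0
api:
  advertiseAddress: 10.0.0.11
apiServerCertSANs:
- "LOAD_BALANCER_DNS"
networking:
  podSubnet: 10.244.0.0/16
EOF
kubeadm init --config kubeadm-config.yaml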

kubeadm vs kops: When to Choose Each

Capability           | kubeadm                                | kops (AWS)
---------------------|----------------------------------------|----------------------------------
Infrastructure Scope | Cloud-agnostic, works on-premises      | AWS-specific (EC2, VPC, IAM)
Control Plane        | Self-managed on VMs/bare metal         | Managed via EC2 instances + ASGs
HA Setup             | Manual LB + stacked/external etcd      | Automated multi-AZ with ELB
Upgrades             | kubeadm upgrade (manual coordination)  | kops rolling-update (automated)
Networking           | Manual CNI installation                | Integrated add-on management
State Management     | Local config files                     | S3-backed cluster state
Customization        | Deep (edit manifests, configs)         | Deep (cluster spec YAML)
Learning Curve       | Lower (fewer abstractions)             | Higher (AWS concepts required)
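
For contrast with the kubeadm commands shown later, the kops side of the table condenses to a few CLI calls on AWS. The cluster name, S3 bucket, and zone below are placeholders.

# kops workflow on AWS (cluster name, bucket, and zone are placeholders)
export KOPS_STATE_STORE=s3://my-kops-state-bucket
kops create cluster --name=k8s.example.com --zones=us-east-1a --node-count=3
kops update cluster --name=k8s.example.com --yes
# Later, upgrades roll through the instance groups automatically
kops rolling-update cluster --name=k8s.example.com --yes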

Practical Considerations

HA Setup Complexity

Setting up HA with kubeadm in 2017 required:

  1. Provisioning 3+ master nodes (or external etcd cluster)
  2. Configuring an external load balancer
  3. Running kubeadm init on the first master with HA flags
  4. Joining additional masters with kubeadm join --control-plane
  5. Installing CNI and other add-ons manually

Teams found that while kubeadm simplified the control plane setup, the surrounding infrastructure (load balancers, networking) still demanded operational expertise.

Upgrade Reality

The kubeadm upgrade workflow in 2017 was functional but required:

  • Sequential master updates: Upgrade one master at a time, ensuring API server availability
  • Node coordination: Upgrade nodes after masters, respecting version skew
  • Add-on compatibility: Verify CNI, DNS, and other components support the target Kubernetes version
  • Rollback planning: No automated rollback; operators needed backup/restore procedures
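
Because rollback was not automated, most teams paired every upgrade with an etcd snapshot. A minimal sketch, assuming etcd v3 and a locally reachable endpoint; the certificate paths are placeholders for a TLS-secured cluster.

# Snapshot etcd before upgrading (endpoint and cert paths are placeholders)
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/client.crt \
  --key=/etc/kubernetes/pki/etcd/client.key \
  snapshot save /var/backups/etcd-snapshot-$(date +%F).db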

On-Premises Fit

kubeadm’s cloud-agnostic nature made it attractive for:

  • Bare metal deployments: No cloud provider lock-in
  • VMware/vSphere environments: Works with any virtualization platform
  • Hybrid cloud: Consistent tooling across cloud and on-premises
  • Air-gapped environments: Could be adapted for offline deployments (though tooling was still emerging)

A typical on-premises HA deployment layered up as follows:

  1. Infrastructure Layer: Provision VMs/bare metal nodes with Docker/CRI-O, kubelet, and kubeadm installed (a package-install sketch follows this list).
  2. Load Balancer: Deploy HAProxy or nginx in front of API servers (or use cloud LB if available).
  3. kubeadm Init: Run kubeadm init on first master with HA configuration.
  4. Join Masters: Use kubeadm join --control-plane for additional masters.
  5. CNI Installation: Install Calico, Flannel, or Weave Net immediately after init.
  6. Add-Ons: Deploy CoreDNS, Dashboard, and monitoring stack via manifests.
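
For the infrastructure layer (step 1), node preparation on Debian/Ubuntu hosts in this era generally followed the upstream apt repository; package versions below are illustrative.

# Node prep (Debian/Ubuntu, 2017-era upstream repo; versions illustrative)
apt-get update && apt-get install -y docker.io apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" \
  > /etc/apt/sources.list.d/kubernetes.list
apt-get update && apt-get install -y kubelet kubeadm kubectl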

Getting Started with HA kubeadm

# On first master node
kubeadm init \
  --control-plane-endpoint "LOAD_BALANCER_DNS:6443" \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16

# On additional master nodes
kubeadm join LOAD_BALANCER_DNS:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <key>

# Install CNI (example: Calico)
kubectl apply -f https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
kubectl apply -f https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
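
Once the joins complete and the CNI is running, a quick sanity check confirms that all control plane nodes are Ready and the system pods are healthy:

# Verify the control plane after setup
kubectl get nodes -o wide
kubectl get pods -n kube-system
kubectl get componentstatuses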

Caveats & Lessons Learned

  • Load Balancer Dependency: HA setups require a reliable load balancer; single points of failure here can cause cluster outages.
  • Certificate Management: kubeadm generates certificates with 1-year validity; plan for renewal before expiration (a quick expiry check is sketched after this list).
  • Add-On Lifecycle: CNI and DNS add-ons aren’t managed by kubeadm; upgrades require separate coordination.
  • Stateful Workloads: During upgrades, ensure StatefulSets and persistent volumes are backed up; some CNIs require special handling.
  • Network Policies: If using NetworkPolicy, verify CNI compatibility before enforcing policies cluster-wide.
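
There was no built-in expiry report in 2017, so operators typically checked the generated certificates with openssl; the path below is kubeadm's default PKI directory.

# Check API server certificate expiry (kubeadm default PKI path)
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -enddate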

Common Failure Modes

  • “API server unavailable during upgrade”: Upgrading all masters simultaneously without proper load balancer configuration causes downtime.
  • “Certificate expiration”: Not tracking certificate validity leads to sudden cluster failures.
  • “CNI version mismatch”: Upgrading Kubernetes without verifying CNI compatibility breaks pod networking.

Conclusion

By mid-2017, kubeadm had matured enough to be a viable choice for production on-premises deployments. Its HA support and upgrade workflows addressed critical gaps, though operational complexity remained higher than managed services. Teams choosing kubeadm gained cloud-agnostic flexibility and deep customization at the cost of more day-2 operational overhead compared to kops or managed Kubernetes services.

The trade-off mirrored the broader industry shift: managed services (GKE, and soon EKS/AKS) would handle control plane operations, while kubeadm would become the foundation for distributions, on-premises platforms, and teams needing full control over their Kubernetes infrastructure.