kubeadm Production Readiness: HA Support and Upgrade Paths

K8s Guru
5 min read

Introduction

By mid-2017, kubeadm had evolved from its beta origins into a tool capable of bootstrapping production-grade clusters. With Kubernetes 1.7 and 1.8, kubeadm gained high availability (HA) control plane support and upgrade workflows that addressed the “single-master only” limitation that constrained early adopters.

This mattered because teams choosing between kubeadm and tools like kops now had a clearer decision framework: kubeadm offered a cloud-agnostic, infrastructure-light path that worked on-premises, while kops remained the AWS-optimized choice for teams already committed to that ecosystem.

kubeadm Improvements in 2017

High Availability Support

  • Stacked Control Plane: kubeadm 1.7+ supports multiple master nodes with etcd co-located on control plane nodes (stacked topology).
  • External etcd: For larger deployments, kubeadm can configure masters to use an external etcd cluster.
  • Load Balancer Integration: HA setups require an external load balancer (HAProxy, nginx, or a cloud LB) in front of the API servers; a minimal HAProxy sketch follows this list.
  • Certificate Management: kubeadm generates and distributes certificates across all master nodes automatically.
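
The load balancer in front of the API servers can be as simple as a TCP-mode HAProxy instance. The sketch below is illustrative only: the master IP addresses and config path are assumptions, not values from a specific deployment, and production setups typically pair the LB with keepalived or a second instance to avoid a single point of failure.

# Minimal HAProxy config fronting three kube-apiservers
# (IP addresses and file path are placeholders)
cat <<EOF > /etc/haproxy/haproxy.cfg
defaults
    mode tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend kubernetes-api
    bind *:6443
    default_backend kube-apiservers

backend kube-apiservers
    balance roundrobin
    option tcp-check
    server master-1 10.0.0.11:6443 check
    server master-2 10.0.0.12:6443 check
    server master-3 10.0.0.13:6443 check
EOF
systemctl restart haproxy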

Upgrade Workflows

  • kubeadm upgrade Command: Introduced in Kubernetes 1.8, enabling in-place upgrades from one minor version to the next.
  • Planned Downtime: HA upgrades still require careful coordination; the upgrade process drains and updates masters sequentially.
  • Version Skew Policy: kubeadm enforces Kubernetes version skew rules (masters can be one minor version ahead of nodes).
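
In practice, the 1.8-era workflow boiled down to a handful of commands per node. The sketch below assumes a Debian/Ubuntu host and a v1.8.x target; exact package versions varied by release.

# On the first master: see what the cluster can be upgraded to
kubeadm upgrade plan

# Apply the upgrade to the control plane components on this master
kubeadm upgrade apply v1.8.0

# Then, per worker node: drain, upgrade the kubelet package, uncordon
kubectl drain <node-name> --ignore-daemonsets
apt-get update && apt-get install -y kubelet=1.8.0-00
systemctl restart kubelet
kubectl uncordon <node-name>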

Production Patterns

  • Configuration Files: The kubeadm config API (v1alpha1) allows declarative cluster configuration, enabling GitOps-friendly setups (see the config sketch after this list).
  • Add-On Management: Networking (CNI), DNS (CoreDNS), and other add-ons remain manual post-init steps, but patterns emerged for automation.
  • Certificate Rotation: Manual process in 2017; automatic rotation would arrive in later releases.
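
For the configuration-file workflow, a cluster could be described declaratively and passed to kubeadm init --config. The field names below follow the v1alpha1 MasterConfiguration of that era as a sketch; this API changed between releases, so verify the fields against the kubeadm version in use.

# Declarative cluster config (v1alpha1-era sketch; field names may differ
# slightly between kubeadm releases -- verify against your version)
cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.8.0
api:
  advertiseAddress: 10.0.0.11
apiServerCertSANs:
- "LOAD_BALANCER_DNS"
networking:
  podSubnet: 10.244.0.0/16
EOF
kubeadm init --config kubeadm-config.yaml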

kubeadm vs kops: When to Choose Each

Capability           | kubeadm                                | kops (AWS)
---------------------|----------------------------------------|----------------------------------
Infrastructure Scope | Cloud-agnostic, works on-premises      | AWS-specific (EC2, VPC, IAM)
Control Plane        | Self-managed on VMs/bare metal         | Managed via EC2 instances + ASGs
HA Setup             | Manual LB + stacked/external etcd      | Automated multi-AZ with ELB
Upgrades             | kubeadm upgrade (manual coordination)  | kops rolling-update (automated)
Networking           | Manual CNI installation                | Integrated add-on management
State Management     | Local config files                     | S3-backed cluster state
Customization        | Deep (edit manifests, configs)         | Deep (cluster spec YAML)
Learning Curve       | Lower (fewer abstractions)             | Higher (AWS concepts required)
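
For contrast with the kubeadm commands shown later, the kops side of the table condenses to a few CLI calls on AWS. The cluster name, S3 bucket, and zone below are placeholders.

# kops workflow on AWS (cluster name, bucket, and zone are placeholders)
export KOPS_STATE_STORE=s3://my-kops-state-bucket
kops create cluster --name=k8s.example.com --zones=us-east-1a --node-count=3
kops update cluster --name=k8s.example.com --yes
# Later, upgrades roll through the instance groups automatically
kops rolling-update cluster --name=k8s.example.com --yes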

Practical Considerations

HA Setup Complexity

Setting up HA with kubeadm in 2017 required:

  1. Provisioning 3+ master nodes (or external etcd cluster)
  2. Configuring an external load balancer
  3. Running kubeadm init on the first master with HA flags
  4. Joining additional masters with kubeadm join --control-plane
  5. Installing CNI and other add-ons manually

Teams found that while kubeadm simplified the control plane setup, the surrounding infrastructure (load balancers, networking) still demanded operational expertise.

Upgrade Reality

The kubeadm upgrade workflow in 2017 was functional but required:

  • Sequential master updates: Upgrade one master at a time, ensuring API server availability
  • Node coordination: Upgrade nodes after masters, respecting version skew
  • Add-on compatibility: Verify CNI, DNS, and other components support the target Kubernetes version
  • Rollback planning: No automated rollback; operators needed backup/restore procedures
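
Because rollback was not automated, most teams paired every upgrade with an etcd snapshot. A minimal sketch, assuming etcd v3 and a locally reachable endpoint; the certificate paths are placeholders for a TLS-secured cluster.

# Snapshot etcd before upgrading (endpoint and cert paths are placeholders)
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/client.crt \
  --key=/etc/kubernetes/pki/etcd/client.key \
  snapshot save /var/backups/etcd-snapshot-$(date +%F).db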

On-Premises Fit

kubeadm’s cloud-agnostic nature made it attractive for:

  • Bare metal deployments: No cloud provider lock-in
  • VMware/vSphere environments: Works with any virtualization platform
  • Hybrid cloud: Consistent tooling across cloud and on-premises
  • Air-gapped environments: Could be adapted for offline deployments (though tooling was still emerging)

A typical on-premises HA deployment layered up as follows:

  1. Infrastructure Layer: Provision VMs/bare metal nodes with Docker/CRI-O, kubelet, and kubeadm installed (a package-install sketch follows this list).
  2. Load Balancer: Deploy HAProxy or nginx in front of API servers (or use cloud LB if available).
  3. kubeadm Init: Run kubeadm init on first master with HA configuration.
  4. Join Masters: Use kubeadm join --control-plane for additional masters.
  5. CNI Installation: Install Calico, Flannel, or Weave Net immediately after init.
  6. Add-Ons: Deploy CoreDNS, Dashboard, and monitoring stack via manifests.
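
For the infrastructure layer (step 1), node preparation on Debian/Ubuntu hosts in this era generally followed the upstream apt repository; package versions below are illustrative.

# Node prep (Debian/Ubuntu, 2017-era upstream repo; versions illustrative)
apt-get update && apt-get install -y docker.io apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" \
  > /etc/apt/sources.list.d/kubernetes.list
apt-get update && apt-get install -y kubelet kubeadm kubectl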

Getting Started with HA kubeadm

# On first master node
kubeadm init \
  --control-plane-endpoint "LOAD_BALANCER_DNS:6443" \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16

# On additional master nodes
kubeadm join LOAD_BALANCER_DNS:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <key>

# Install CNI (example: Calico)
kubectl apply -f https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
kubectl apply -f https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
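
Once the joins complete and the CNI is running, a quick sanity check confirms that all control plane nodes are Ready and the system pods are healthy:

# Verify the control plane after setup
kubectl get nodes -o wide
kubectl get pods -n kube-system
kubectl get componentstatuses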

Caveats & Lessons Learned

  • Load Balancer Dependency: HA setups require a reliable load balancer; single points of failure here can cause cluster outages.
  • Certificate Management: kubeadm generates certificates with 1-year validity; plan for renewal before expiration (a quick expiry check is sketched after this list).
  • Add-On Lifecycle: CNI and DNS add-ons aren’t managed by kubeadm; upgrades require separate coordination.
  • Stateful Workloads: During upgrades, ensure StatefulSets and persistent volumes are backed up; some CNIs require special handling.
  • Network Policies: If using NetworkPolicy, verify CNI compatibility before enforcing policies cluster-wide.
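
There was no built-in expiry report in 2017, so operators typically checked the generated certificates with openssl; the path below is kubeadm's default PKI directory.

# Check API server certificate expiry (kubeadm default PKI path)
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -enddate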

Common Failure Modes

  • “API server unavailable during upgrade”: Upgrading all masters simultaneously without proper load balancer configuration causes downtime.
  • “Certificate expiration”: Not tracking certificate validity leads to sudden cluster failures.
  • “CNI version mismatch”: Upgrading Kubernetes without verifying CNI compatibility breaks pod networking.

Conclusion

By mid-2017, kubeadm had matured enough to be a viable choice for production on-premises deployments. Its HA support and upgrade workflows addressed critical gaps, though operational complexity remained higher than managed services. Teams choosing kubeadm gained cloud-agnostic flexibility and deep customization at the cost of more day-2 operational overhead compared to kops or managed Kubernetes services.

The trade-off mirrored the broader industry shift: managed services (GKE, and soon EKS/AKS) would handle control plane operations, while kubeadm would become the foundation for distributions, on-premises platforms, and teams needing full control over their Kubernetes infrastructure.