Cluster API Managed Topologies: Declarative Cluster Lifecycle at Scale

Introduction
By mid-2022, Cluster API managed topologies had become the recommended pattern for declarative cluster lifecycle management. Introduced in v1beta1, managed topologies enabled teams to manage cluster creation, upgrades, scaling, and configuration through simple spec updates, treating clusters as truly declarative resources.
This mattered because managed topologies solved the operational complexity of managing dozens or hundreds of clusters. Instead of imperative, per-cluster operations (hand-editing MachineDeployments, driving kubeadm upgrades step by step), teams could declare desired state and let Cluster API handle the orchestration. Combined with ClusterClass templates, managed topologies enabled consistent, repeatable cluster operations at scale.
Historical note: Managed topologies were introduced in Cluster API v1beta1 (2021) but became the production-recommended pattern in 2022 as teams adopted them for large-scale deployments. This post focuses on production patterns and best practices that emerged from real-world usage.
Managed Topologies Deep Dive
What Are Managed Topologies?
Managed topologies use the topology field in Cluster resources to enable declarative cluster lifecycle management. The topology field defines:
- Kubernetes Version: Desired Kubernetes version for the cluster.
- Control Plane Configuration: Control plane replicas, metadata, and patches.
- Worker Configuration: Machine deployment classes, replicas, and metadata.
- Upgrade Behavior: Version changes are rolled out by the topology controller, control plane first and then workers.
- Scaling Behavior: Replica changes are reconciled continuously against the declared counts.
Basic Managed Topology Example
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-cluster
spec:
  topology:
    class: standard-cluster-class
    version: v1.24.0
    controlPlane:
      replicas: 3
      metadata:
        labels:
          environment: production
          tier: control-plane
    workers:
      machineDeployments:
        - class: default-worker
          name: worker-pool-1
          replicas: 10
          metadata:
            labels:
              pool: default
              tier: worker
ClusterClass Composition Patterns
Pattern 1: Environment-Specific ClusterClasses
Create separate ClusterClasses for different environments:
# Development ClusterClass
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: dev-cluster-class
spec:
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AWSClusterTemplate
      name: dev-aws-template
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: dev-controlplane-template
  workers:
    machineDeployments:
      - class: default-worker
        template:
          bootstrap:
            ref:
              apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
              kind: KubeadmConfigTemplate
              name: dev-worker-bootstrap # bootstrap template is required; name assumed
          infrastructure:
            ref:
              apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
              kind: AWSMachineTemplate
              name: dev-worker-template
---
# Production ClusterClass
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: prod-cluster-class
spec:
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AWSClusterTemplate
      name: prod-aws-template
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: prod-controlplane-template
  workers:
    machineDeployments:
      - class: default-worker
        template:
          bootstrap:
            ref:
              apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
              kind: KubeadmConfigTemplate
              name: prod-worker-bootstrap # bootstrap template is required; name assumed
          infrastructure:
            ref:
              apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
              kind: AWSMachineTemplate
              name: prod-worker-template
      - class: gpu-worker
        template:
          bootstrap:
            ref:
              apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
              kind: KubeadmConfigTemplate
              name: prod-gpu-worker-bootstrap # name assumed
          infrastructure:
            ref:
              apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
              kind: AWSMachineTemplate
              name: prod-gpu-worker-template
Pattern 2: Composable ClusterClasses
Build ClusterClasses from reusable components:
# Base infrastructure template
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSClusterTemplate
metadata:
  name: base-aws-template
spec:
  template:
    spec:
      region: us-west-2
      networkSpec:
        vpc:
          cidrBlock: "10.0.0.0/16"
---
# Base control plane template
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlaneTemplate
metadata:
  name: base-controlplane-template
spec:
  template:
    spec:
      machineTemplate:
        infrastructureRef:
          apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
          kind: AWSMachineTemplate
          name: controlplane-machine-template
      kubeadmConfigSpec:
        clusterConfiguration:
          apiServer:
            extraArgs:
              audit-log-maxage: "30"
---
# Composed ClusterClass
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: composed-cluster-class
spec:
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AWSClusterTemplate
      name: base-aws-template
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: base-controlplane-template
Pattern 3: Multi-Region ClusterClasses
Create region-specific ClusterClasses:
# US West ClusterClass
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: us-west-cluster-class
spec:
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AWSClusterTemplate
      name: us-west-aws-template
---
# EU West ClusterClass
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: eu-west-cluster-class
spec:
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AWSClusterTemplate
      name: eu-west-aws-template
Topology-Based Upgrades
Declarative Upgrades
Upgrading clusters with managed topologies is a simple spec update:
# Upgrade cluster to v1.25.0
kubectl patch cluster production-cluster --type merge -p '{
  "spec": {
    "topology": {
      "version": "v1.25.0"
    }
  }
}'
Upgrade Strategies
Rolling Upgrade (Default)
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-cluster
spec:
  topology:
    class: standard-cluster-class
    version: v1.25.0
    controlPlane:
      replicas: 3
      # Rolling upgrade by default
Upgrade with Pause
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-cluster
spec:
  topology:
    class: standard-cluster-class
    version: v1.25.0
    controlPlane:
      replicas: 3
      # Pause upgrade after control plane
      upgradeAfter: "2022-06-20T10:00:00Z"
Upgrade Workflow
- Control Plane Upgrade: Cluster API upgrades control plane nodes first.
- Health Validation: Validates control plane health after upgrade.
- Worker Upgrade: Upgrades worker nodes after control plane.
- Final Validation: Validates cluster health after complete upgrade.
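Progress through these phases is reflected in conditions on the Cluster object. As a quick check, a minimal sketch (assuming the production-cluster example above and a Cluster API release that sets the TopologyReconciled condition):
# Confirm the topology controller has reconciled the spec change
kubectl get cluster production-cluster \
  -o jsonpath='{.status.conditions[?(@.type=="TopologyReconciled")]}'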
Monitoring Upgrades
# Watch upgrade progress
kubectl get cluster production-cluster -w
# Check control plane status (the KubeadmControlPlane name is generated, so select by cluster label)
kubectl get kubeadmcontrolplane -l cluster.x-k8s.io/cluster-name=production-cluster
# Check machine status
kubectl get machines -l cluster.x-k8s.io/cluster-name=production-cluster
Topology-Based Scaling
Scaling Control Plane
# Scale control plane from 3 to 5 replicas
kubectl patch cluster production-cluster --type merge -p '{
  "spec": {
    "topology": {
      "controlPlane": {
        "replicas": 5
      }
    }
  }
}'
Scaling Worker Pools
# Scale worker pool
kubectl patch cluster production-cluster --type merge -p '{
  "spec": {
    "topology": {
      "workers": {
        "machineDeployments": [{
          "class": "default-worker",
          "replicas": 20,
          "name": "worker-pool-1"
        }]
      }
    }
  }
}'
Adding New Worker Pools
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-cluster
spec:
  topology:
    workers:
      machineDeployments:
        - class: default-worker
          replicas: 10
          name: worker-pool-1
        - class: gpu-worker
          replicas: 5
          name: gpu-pool-1 # New pool
Environment-Specific Cluster Templates
Development Environment
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: dev-cluster
spec:
  topology:
    class: dev-cluster-class
    version: v1.24.0
    controlPlane:
      replicas: 1 # Single control plane for dev
      metadata:
        labels:
          environment: development
    workers:
      machineDeployments:
        - class: default-worker
          replicas: 2 # Minimal workers for dev
          name: dev-workers
          metadata:
            labels:
              environment: development
Staging Environment
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: staging-cluster
spec:
  topology:
    class: staging-cluster-class
    version: v1.24.0
    controlPlane:
      replicas: 3 # HA control plane for staging
      metadata:
        labels:
          environment: staging
    workers:
      machineDeployments:
        - class: default-worker
          replicas: 5 # Moderate workers for staging
          name: staging-workers
          metadata:
            labels:
              environment: staging
Production Environment
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: prod-cluster
spec:
  topology:
    class: prod-cluster-class
    version: v1.24.0
    controlPlane:
      replicas: 3 # HA control plane for prod
      metadata:
        labels:
          environment: production
    workers:
      machineDeployments:
        - class: default-worker
          replicas: 20 # Scale workers for prod
          name: prod-workers
          metadata:
            labels:
              environment: production
              pool: default
        - class: gpu-worker
          replicas: 5 # GPU workers for ML workloads
          name: gpu-workers
          metadata:
            labels:
              environment: production
              pool: gpu
GitOps Integration
ArgoCD Integration
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-clusters
spec:
  source:
    repoURL: https://github.com/org/cluster-definitions
    path: clusters/production
    targetRevision: main
  destination:
    server: https://management-cluster:6443
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
FluxCD Integration
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: production-clusters
spec:
  sourceRef:
    kind: GitRepository
    name: cluster-definitions
  path: ./clusters/production
  interval: 5m
  prune: true
  wait: true
Git Repository Structure
cluster-definitions/
├── clusterclasses/
│   ├── dev-clusterclass.yaml
│   ├── staging-clusterclass.yaml
│   └── prod-clusterclass.yaml
├── clusters/
│   ├── development/
│   │   └── dev-cluster.yaml
│   ├── staging/
│   │   └── staging-cluster.yaml
│   └── production/
│       ├── prod-us-west.yaml
│       ├── prod-us-east.yaml
│       └── prod-eu-west.yaml
└── templates/
    ├── aws-templates/
    └── controlplane-templates/
Large-Scale Fleet Management
Managing 100+ Clusters
With managed topologies, large fleets can be operated from a handful of shared definitions:
# Regional production clusters
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: prod-us-west-2
  labels:
    environment: production
spec:
  topology:
    class: prod-cluster-class
    version: v1.24.0
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
        - class: default-worker
          name: worker-pool-1 # name is required for each pool
          replicas: 20
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: prod-us-east-1
  labels:
    environment: production
spec:
  topology:
    class: prod-cluster-class
    version: v1.24.0
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
        - class: default-worker
          name: worker-pool-1 # name is required for each pool
          replicas: 20
---
# ... 98 more clusters with the same pattern
Bulk Operations
# Upgrade all production clusters
for cluster in $(kubectl get clusters -l environment=production -o name); do
  kubectl patch $cluster --type merge -p '{
    "spec": {
      "topology": {
        "version": "v1.25.0"
      }
    }
  }'
done

# Scale all worker pools
# Note: a merge patch replaces the whole machineDeployments list,
# so include every pool (with its name) that should remain.
for cluster in $(kubectl get clusters -l environment=production -o name); do
  kubectl patch $cluster --type merge -p '{
    "spec": {
      "topology": {
        "workers": {
          "machineDeployments": [{
            "class": "default-worker",
            "name": "worker-pool-1",
            "replicas": 25
          }]
        }
      }
    }
  }'
done
Best Practices
ClusterClass Design
- Reusability: Design ClusterClasses for multiple use cases.
- Composition: Build ClusterClasses from reusable templates.
- Versioning: Version ClusterClasses for controlled updates (see the rebase sketch after this list).
- Documentation: Document ClusterClass purpose and usage.
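One way to put the versioning guidance into practice is to publish ClusterClasses under versioned names and rebase clusters by updating spec.topology.class. A minimal sketch, assuming a hypothetical prod-cluster-class-v2 has already been created and is compatible with the existing topology:
# Rebase the cluster onto a newer ClusterClass revision
# (prod-cluster-class-v2 is an assumed name for illustration)
kubectl patch cluster prod-cluster --type merge -p '{
  "spec": {
    "topology": {
      "class": "prod-cluster-class-v2"
    }
  }
}'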
Managed Topology Usage
- Version Pinning: Pin exact Kubernetes versions in spec.topology.version and bump them deliberately.
- Gradual Upgrades: Upgrade clusters incrementally.
- Testing: Test topology changes in non-production first (see the plan sketch after this list).
- Monitoring: Monitor cluster health during changes.
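For the testing guidance above, clusterctl ships an alpha command that previews the objects a topology change would produce before anything is applied. A sketch, assuming clusterctl v1.2+ and that modified-cluster.yaml (an assumed filename) contains the updated Cluster definition:
# Dry-run a topology change and inspect the computed objects
clusterctl alpha topology plan -f modified-cluster.yaml -o out/
ls out/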
Fleet Management
- Standardization: Use ClusterClasses to enforce standards.
- Automation: Automate cluster operations through GitOps.
- Monitoring: Monitor fleet health and configuration drift (see the sketch after this list).
- Documentation: Document cluster definitions and patterns.
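A lightweight way to spot class and version drift across the fleet is to project the topology fields for every cluster, a sketch using standard kubectl output formatting:
# Fleet-wide view of class, version, and phase for drift detection
kubectl get clusters -A \
  -o custom-columns='NAME:.metadata.name,CLASS:.spec.topology.class,VERSION:.spec.topology.version,PHASE:.status.phase'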
Practical Considerations
Management Cluster Requirements
- High Availability: HA management cluster for production fleets.
- Resources: Adequate resources for Cluster API controllers.
- Network Access: Access to cloud provider APIs and workload clusters.
- Backup: Regular backups of management cluster state.
Cluster Lifecycle Management
- Creation: Use managed topologies for consistent cluster creation.
- Upgrades: Leverage topology-based upgrades for controlled rollouts.
- Scaling: Use topology-based scaling for capacity management.
- Deletion: Plan for cluster decommissioning and resource cleanup.
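For decommissioning, deleting the Cluster object is usually enough: Cluster API cascades the deletion through owner references, so machines and provider infrastructure are cleaned up by their controllers. A sketch, reusing the dev-cluster name from the earlier examples:
# Delete the cluster and watch until finalizers complete cleanup
kubectl delete cluster dev-cluster --wait=false
kubectl get cluster dev-cluster -w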
Observability
- Cluster Health: Monitor cluster status and conditions.
- Upgrade Progress: Track upgrade progress and failures.
- Configuration Drift: Detect and alert on configuration drift.
- Resource Usage: Monitor resource usage across clusters.
Caveats & Lessons Learned
Common Pitfalls
- ClusterClass Not Found: Ensure ClusterClass exists before creating clusters.
- Topology Validation: Validate topology settings before applying (see the dry-run sketch after this list).
- Upgrade Failures: Monitor upgrade progress and have rollback plans.
- Scaling Limits: Be aware of cloud provider scaling limits.
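For the validation pitfall above, a server-side dry-run exercises the Cluster API admission webhooks, including ClusterClass and topology validation, without persisting the change. A sketch, assuming prod-cluster.yaml holds the modified Cluster definition:
# Validate a topology change against the webhooks without applying it
kubectl apply --dry-run=server -f prod-cluster.yaml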
Best Practices Learned
- Start Simple: Begin with basic ClusterClasses and expand.
- Test Thoroughly: Test topology changes in non-production.
- Monitor Closely: Monitor cluster health during operations.
- Document Patterns: Document successful patterns for reuse.
Conclusion
Cluster API managed topologies in 2022 became the production-recommended pattern for declarative cluster lifecycle management. The combination of ClusterClass templates and managed topologies enabled teams to manage large fleets of clusters with consistency, repeatability, and automation.
The declarative model of managed topologies—where cluster lifecycle is managed through simple spec updates—represented a fundamental shift from imperative cluster management. Teams could now treat clusters like any other Kubernetes resource: declaratively, with version control, and through GitOps workflows.
For organizations managing dozens or hundreds of clusters, managed topologies provided the foundation for scalable, consistent, and maintainable cluster operations. The patterns and practices that emerged in 2022—ClusterClass composition, topology-based upgrades, and GitOps integration—would become standard approaches as Cluster API continued to mature.
Managed topologies weren’t just a feature; they were the operational model that made Cluster API viable for enterprise-scale deployments. By mid-2022, managed topologies had proven that declarative cluster management at scale was not just possible, but practical and powerful.