AKS Autoscaling
Autoscaling on AKS involves scaling at multiple levels: pods (HPA/VPA), nodes (Cluster Autoscaler), and serverless scaling (virtual nodes). AKS provides several autoscaling solutions that work together to ensure your applications have the right resources at the right time while optimizing costs.
Autoscaling Overview
AKS autoscaling operates at several levels:
- Horizontal Pod Autoscaler (HPA) - scales the number of pod replicas based on observed metrics
- Vertical Pod Autoscaler (VPA) - adjusts pod CPU and memory requests based on usage
- Cluster Autoscaler - adds and removes nodes in node pools to match scheduling demand
- Virtual nodes - serverless burst capacity on Azure Container Instances
Cluster Autoscaler
Cluster Autoscaler automatically adjusts the size of node pools based on pod scheduling demands. When pods can’t be scheduled due to insufficient resources, it adds nodes. When nodes are underutilized, it removes them.
How Cluster Autoscaler Works
The autoscaler watches for pods the scheduler marks as unschedulable (stuck in Pending because no node has enough capacity) and scales the affected node pool up. It also periodically checks node utilization; when a node has been underutilized for a sustained period and its pods can run elsewhere, the node is drained and removed. All decisions respect the pool's min/max limits.
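A quick way to see what the autoscaler is reacting to is to look for Pending pods and the scale-up events recorded on them (the pod name below is a placeholder):
# Pods stuck in Pending are what drive a scale-up
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
# The autoscaler records its decision as an event on the pending pod
kubectl describe pod <pending-pod-name>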
Enabling Cluster Autoscaler
Using Azure CLI:
# Enable auto-scaling on node pool
az aks nodepool update \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name general-pool \
--enable-cluster-autoscaler \
--min-count 1 \
--max-count 10
When Creating Node Pool:
# Create node pool with auto-scaling
az aks nodepool add \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name general-pool \
--enable-cluster-autoscaler \
--min-count 1 \
--max-count 10 \
--node-count 3
Cluster Autoscaler Configuration
Scaling Parameters:
- min-count - Minimum number of nodes in the pool
- max-count - Maximum number of nodes in the pool
- node-count - Initial number of nodes (optional)
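To change the limits on a pool that already has the autoscaler enabled, or to turn it off, the nodepool update command accepts dedicated flags. A sketch using the same resource names as above:
# Change the limits on a pool that already has the autoscaler enabled
az aks nodepool update \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name general-pool \
--update-cluster-autoscaler \
--min-count 1 \
--max-count 5
# Disable the autoscaler for the pool
az aks nodepool update \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name general-pool \
--disable-cluster-autoscaler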
Scaling Behavior:
- Scales up when pods can’t be scheduled
- Scales down when nodes are underutilized
- Respects min/max node limits
- Uses conservative scale-down to prevent thrashing
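Scale-down timing and aggressiveness are tunable through the cluster autoscaler profile, which applies to all autoscaler-enabled pools in the cluster. A sketch with a few commonly adjusted settings (the values shown are illustrative, not recommendations):
# Tune cluster-wide autoscaler behavior via the autoscaler profile
az aks update \
--resource-group myResourceGroup \
--name myAKSCluster \
--cluster-autoscaler-profile scan-interval=30s scale-down-delay-after-add=10m scale-down-unneeded-time=10m scale-down-utilization-threshold=0.5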
Horizontal Pod Autoscaler (HPA)
HPA automatically scales the number of pod replicas based on observed metrics like CPU, memory, or custom metrics.
Basic HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
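Assuming the manifest is saved as web-hpa.yaml (the filename is arbitrary), apply it and watch the current versus target metrics in the TARGETS column:
kubectl apply -f web-hpa.yaml
kubectl get hpa web-hpa --watch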
HPA Behavior:
- Scale Down - Conservative scaling down to prevent thrashing
- Scale Up - Aggressive scaling up to handle traffic spikes
- Stabilization Window - Period over which previous scaling recommendations are considered before acting, which smooths out flapping
Custom Metrics HPA
Scale based on custom application metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
Requires:
- Metrics Server (for resource metrics)
- Prometheus Adapter (for custom metrics)
- External Metrics API (only needed for External-type metrics, such as queue length from an external system)
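Whether these APIs are actually available can be checked directly; each call fails if the corresponding component is not installed:
# Resource metrics (Metrics Server)
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
# Custom metrics (Prometheus Adapter)
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"
# External metrics
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"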
Vertical Pod Autoscaler (VPA)
VPA automatically adjusts CPU and memory requests and limits for pods based on historical usage.
Installation
# Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/
# Run the installation script
./hack/vpa-up.sh
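After the script finishes, the recommender, updater, and admission controller should be running in kube-system. AKS also offers a managed VPA add-on (az aks update --enable-vpa), which may be preferable to installing the upstream components by hand; availability depends on your CLI and cluster versions.
# Verify the VPA components are running
kubectl get pods --namespace kube-system | grep vpa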
VPA Configuration
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"  # Auto, Off, Initial, Recreate
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
Update Modes:
- Auto - Automatically update pod resources (requires pod restart)
- Off - Only provide recommendations
- Initial - Set resources on pod creation only
- Recreate - Recreate pods with new resources
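Regardless of mode, the recommendations VPA has computed can be read from the object itself, which is the main use of Off mode:
# Read the current recommendations for the target workload
kubectl describe vpa web-vpa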
Note: VPA and HPA should not be used together for the same resource (CPU/memory). Use VPA for resource optimization, HPA for replica scaling.
Virtual Nodes
Virtual nodes provide serverless scaling using Azure Container Instances (ACI) without managing node pools.
Architecture
The add-on registers a virtual kubelet node in the cluster, backed by Azure Container Instances. Pods scheduled to that node run as ACI container groups in a dedicated subnet of the cluster's virtual network, so no VMs are provisioned for them.
Enabling Virtual Nodes
# Enable the virtual nodes add-on
az aks enable-addons \
--resource-group myResourceGroup \
--name myAKSCluster \
--addons virtual-node \
--subnet-name myVirtualNodeSubnet
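Once the add-on is enabled, the virtual node shows up alongside the regular agent nodes; it is typically registered as virtual-node-aci-linux (the output shape below is illustrative):
kubectl get nodes
# NAME                       STATUS   ROLES   AGE   VERSION
# aks-nodepool1-...          Ready    agent   ...   ...
# virtual-node-aci-linux     Ready    agent   ...   ...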
Using Virtual Nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: serverless-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: serverless-app
  template:
    metadata:
      labels:
        app: serverless-app
    spec:
      nodeSelector:
        kubernetes.azure.com/aci: "true"
      # Virtual nodes are tainted; pods must tolerate the virtual kubelet taint
      tolerations:
      - key: virtual-kubelet.io/provider
        operator: Exists
      - key: azure.com/aci
        effect: NoSchedule
      containers:
      - name: app
        image: my-app:latest
Virtual Node Features:
- Serverless container scaling
- Pay-per-second billing
- Rapid scaling without node provisioning
- No node pool management
- Automatic scaling
Spot VM Integration
Cluster Autoscaler with Spot VMs
Configure node pools for Spot VMs:
# Create node pool with Spot VMs
az aks nodepool add \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name spot-pool \
--priority Spot \
--eviction-policy Delete \
--enable-cluster-autoscaler \
--min-count 0 \
--max-count 20 \
--node-count 3 \
--node-taints kubernetes.azure.com/scalesetpriority=spot:NoSchedule \
--node-labels kubernetes.azure.com/scalesetpriority=spot
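The pool above pays up to the current pay-as-you-go price for each Spot VM. For a hard ceiling, az aks nodepool add also accepts --spot-max-price (in US dollars; the default of -1 means no cap other than the on-demand price).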
Pod Tolerations:
apiVersion: v1
kind: Pod
metadata:
  name: spot-workload
spec:
  tolerations:
  - key: kubernetes.azure.com/scalesetpriority
    operator: Equal
    value: "spot"
    effect: NoSchedule
  containers:
  - name: app
    image: my-app:latest
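The toleration only allows the pod onto Spot nodes; it does not steer it there. To make the scheduler prefer Spot capacity, node affinity on the label applied to the pool can be merged into the same pod spec; a sketch:
# Add under spec: alongside the tolerations above
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: kubernetes.azure.com/scalesetpriority
          operator: In
          values:
          - spot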
Cost Optimization Strategies
Right-Sizing
Use VPA to right-size pod resources:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: optimize-resources
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
Spot VMs
Use Spot VMs for cost savings:
- Up to 90% savings compared with pay-as-you-go prices
- Evicted nodes are removed according to the pool's eviction policy; workloads must tolerate interruption
- Integrates with the Cluster Autoscaler (min-count can be 0)
- Keep an on-demand node pool available as a fallback for evictions and capacity shortages
Virtual Nodes
Use virtual nodes for variable workloads:
- Pay-per-second billing
- No node infrastructure costs
- Rapid scaling
- Perfect for burst workloads
Scheduled Scaling
Scale down during off-hours:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down
spec:
  schedule: "0 20 * * *"  # 8 PM daily
  jobTemplate:
    spec:
      template:
        spec:
          # Example ServiceAccount with permission to scale the Deployment (see below)
          serviceAccountName: scale-manager
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command:
            - kubectl
            - scale
            - deployment
            - web
            - --replicas=1
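The Job's pod needs permission to scale the Deployment. A minimal sketch of the supporting ServiceAccount and RBAC (the names and the default namespace are illustrative, matching the serviceAccountName used above); a second CronJob scheduled in the morning can scale the Deployment back up:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: scale-manager
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: scale-deployments
  namespace: default
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scale-manager-binding
  namespace: default
subjects:
- kind: ServiceAccount
  name: scale-manager
  namespace: default
roleRef:
  kind: Role
  name: scale-deployments
  apiGroup: rbac.authorization.k8s.io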
Scaling Best Practices
- Set Appropriate Limits - Configure min/max replicas and node limits
- Use Multiple Metrics - Combine CPU, memory, and custom metrics
- Configure Behavior - Tune scale-up and scale-down policies
- Monitor Scaling - Watch HPA and Cluster Autoscaler behavior
- Test Scaling - Verify autoscaling works before production
- Use Spot VMs - For cost optimization where appropriate
- Right-Size Resources - Use VPA to optimize resource requests
- Plan for Spikes - Configure aggressive scale-up for traffic spikes
- Prevent Thrashing - Use stabilization windows and conservative scale-down
- Combine Solutions - Use HPA for pods, Cluster Autoscaler for nodes, virtual nodes for serverless
Common Issues
HPA Not Scaling
Problem: HPA not scaling pods
Solutions:
- Verify Metrics Server is running
- Check HPA target metrics
- Verify resource requests are set
- Check HPA status and events
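Most of these checks come down to a few commands (resource names follow the earlier HPA example):
# Fails if Metrics Server is not serving resource metrics
kubectl top pods
# TARGETS shows current vs. target values; <unknown> usually means missing metrics or missing resource requests
kubectl get hpa web-hpa
# Events explain why the HPA did or did not scale
kubectl describe hpa web-hpa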
Cluster Autoscaler Not Scaling Nodes
Problem: Nodes not being added
Solutions:
- Verify auto-scaling is enabled
- Check min/max node limits
- Verify pods are unschedulable
- Check subscription quotas
- Review Azure Activity Log
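Two useful places to look are the events on the pending pods and the autoscaler status ConfigMap published in kube-system (the pod name below is a placeholder):
# Look for TriggeredScaleUp / NotTriggerScaleUp events
kubectl describe pod <pending-pod-name>
# Autoscaler health and per-pool status
kubectl describe configmap cluster-autoscaler-status --namespace kube-system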
Virtual Nodes Not Working
Problem: Pods not scheduling on virtual nodes
Solutions:
- Verify virtual nodes add-on is enabled
- Check node selector matches virtual node label
- Verify subnet configuration
- Check ACI provider permissions
- Review Azure Activity Log
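A quick sanity check is whether the virtual node is registered and the ACI connector is healthy (the node and pod names shown are the typical defaults and may differ by add-on version):
# The virtual node should be listed and Ready
kubectl get nodes
kubectl describe node virtual-node-aci-linux
# The ACI connector runs in kube-system
kubectl get pods --namespace kube-system | grep aci-connector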
See Also
- Node Management - Node pool configuration
- Observability - Monitoring autoscaling
- Troubleshooting - Autoscaling issues