AKS Autoscaling

Autoscaling on AKS operates at multiple levels: pods (HPA/VPA), nodes (Cluster Autoscaler), and serverless infrastructure (virtual nodes). These solutions work together to give your applications the right resources at the right time while keeping costs under control.

Autoscaling Overview

AKS autoscaling operates at different levels:

graph TB
    subgraph pod_level[Pod Level]
        HPA[Horizontal Pod Autoscaler] --> SCALE_PODS[Scale Pod Replicas]
        VPA[Vertical Pod Autoscaler] --> ADJUST_RESOURCES[Adjust Pod Resources]
    end
    subgraph node_level[Node Level]
        CA[Cluster Autoscaler] --> SCALE_NODES[Scale Node Pools]
        VN[Virtual Nodes] --> SERVERLESS[Serverless Scaling]
    end
    subgraph cluster_level[Cluster Level]
        SCALE_PODS --> TRIGGER_NODES[Trigger Node Scaling]
        ADJUST_RESOURCES --> OPTIMIZE[Optimize Resource Usage]
        SCALE_NODES --> ADD_NODES[Add/Remove Nodes]
        SERVERLESS --> ACI[Azure Container Instances]
    end
    style HPA fill:#e1f5ff
    style CA fill:#fff4e1
    style VN fill:#e8f5e9

Cluster Autoscaler

Cluster Autoscaler automatically adjusts the size of node pools based on pod scheduling demands. When pods can’t be scheduled due to insufficient resources, it adds nodes. When nodes are underutilized, it removes them.

How Cluster Autoscaler Works

graph LR
    A[Pod Pending] --> B{Resources<br/>Available?}
    B -->|No| C[Cluster Autoscaler<br/>Detects]
    C --> D[Increase Node Pool<br/>Node Count]
    D --> E[New Nodes Created]
    E --> F[Pods Scheduled]
    G[Node Underutilized] --> H{Can Pods<br/>Move?}
    H -->|Yes| I[Cluster Autoscaler<br/>Detects]
    I --> J[Drain Node]
    J --> K[Decrease Node Count]
    K --> L[Node Terminated]
    style A fill:#e1f5ff
    style E fill:#fff4e1
    style L fill:#e8f5e9

Enabling Cluster Autoscaler

Using Azure CLI:

# Enable auto-scaling on node pool
az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name general-pool \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10
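
To confirm the change, you can read the autoscaler settings back from the node pool. The query below assumes the standard enableAutoScaling/minCount/maxCount fields of the node pool resource:

# Verify autoscaling is enabled and check the configured limits
az aks nodepool show \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name general-pool \
  --query "{autoscaling:enableAutoScaling, min:minCount, max:maxCount}" \
  --output table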

When Creating Node Pool:

# Create node pool with auto-scaling
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name general-pool \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10 \
  --node-count 3

Cluster Autoscaler Configuration

Scaling Parameters:

  • min-count - Minimum number of nodes in pool
  • max-count - Maximum number of nodes in pool
  • node-count - Initial number of nodes (optional)

Scaling Behavior:

  • Scales up when pods can’t be scheduled
  • Scales down when nodes are underutilized
  • Respects min/max node limits
  • Uses conservative scale-down to prevent thrashing
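
Scale-down behavior can also be tuned cluster-wide through the autoscaler profile. A minimal sketch, assuming the default AKS profile keys (the values shown are illustrative, not recommendations):

# Tune cluster autoscaler timing and thresholds (applies to all autoscaled pools)
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --cluster-autoscaler-profile \
      scan-interval=30s \
      scale-down-delay-after-add=10m \
      scale-down-unneeded-time=10m \
      scale-down-utilization-threshold=0.5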

Horizontal Pod Autoscaler (HPA)

HPA automatically scales the number of pod replicas based on observed metrics like CPU, memory, or custom metrics.

Basic HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max

HPA Behavior:

  • Scale Down - Conservative; in this example at most 50% of replicas are removed per minute, preventing thrashing
  • Scale Up - Aggressive; replicas can double or grow by 4 pods every 15 seconds, whichever is greater, to absorb traffic spikes
  • Stabilization Window - How long past recommendations are considered before acting; 300 seconds delays scale-down, 0 seconds allows immediate scale-up
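
After applying the HPA, its current metrics, replica counts, and scaling events can be inspected with kubectl:

# Current targets and replica count
kubectl get hpa web-hpa

# Conditions and recent scaling events
kubectl describe hpa web-hpa

# Watch replicas change under load
kubectl get hpa web-hpa --watch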

Custom Metrics HPA

Scale based on custom application metrics:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"

Requires:

  • Metrics Server (for CPU/memory resource metrics; deployed by default on AKS)
  • Prometheus Adapter (serves application metrics such as http_requests_per_second through the Custom Metrics API)
  • External Metrics API (only needed for metrics sourced outside the cluster)
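
As a sketch of the custom-metrics side, the rule below (in the Prometheus Adapter rules format) derives http_requests_per_second from an http_requests_total counter; it assumes the application actually exports that counter with pod and namespace labels:

# Prometheus Adapter rule: expose request rate as a per-pod custom metric
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'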

Vertical Pod Autoscaler (VPA)

VPA automatically adjusts CPU and memory requests and limits for pods based on historical usage.

Installation

# Get the VPA source
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/

# Run the installation script
./hack/vpa-up.sh
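
Once the script completes, the VPA components should be running in kube-system:

# Verify the VPA components (recommender, updater, admission controller)
kubectl get pods -n kube-system | grep vpa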

VPA Configuration

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"  # Auto, Off, Initial, Recreate
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
      controlledResources: ["cpu", "memory"]

Update Modes:

  • Auto - Automatically update pod resources (requires pod restart)
  • Off - Only provide recommendations
  • Initial - Set resources on pod creation only
  • Recreate - Recreate pods with new resources

Note: VPA and HPA should not be used together for the same resource (CPU/memory). Use VPA for resource optimization, HPA for replica scaling.
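
To see what VPA would change (especially useful in Off mode), read its recommendations from the object status:

# Shows target, lower bound, and upper bound recommendations per container
kubectl describe vpa web-vpa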

Virtual Nodes

Virtual nodes provide serverless scaling using Azure Container Instances (ACI) without managing node pools.

Architecture

graph TB
    POD[Pod] --> VN[Virtual Node]
    VN --> ACI[Azure Container Instances]
    ACI --> SCALE[Auto-Scale]
    AKS[AKS Cluster] --> VN
    style POD fill:#e1f5ff
    style VN fill:#fff4e1
    style ACI fill:#e8f5e9

Enabling Virtual Nodes

# Install virtual nodes add-on
az aks enable-addons \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --addons virtual-node \
  --subnet-name myVirtualNodeSubnet
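
The add-on requires Azure CNI networking and a dedicated subnet; once enabled, the virtual node appears alongside the regular nodes:

# The ACI-backed virtual node shows up as an extra node
kubectl get nodes
# Expect an entry similar to: virtual-node-aci-linux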

Using Virtual Nodes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: serverless-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: serverless-app
  template:
    metadata:
      labels:
        app: serverless-app
    spec:
      nodeSelector:
        kubernetes.azure.com/aci: "true"
      # Virtual nodes are tainted; tolerate the virtual-kubelet taint so pods can be placed there
      tolerations:
      - key: virtual-kubelet.io/provider
        operator: Exists
      containers:
      - name: app
        image: my-app:latest

Virtual Node Features:

  • Serverless container scaling
  • Pay-per-second billing
  • Rapid scaling without node provisioning
  • No node pool management
  • Automatic scaling

Spot VM Integration

Cluster Autoscaler with Spot VMs

Configure node pools for Spot VMs:

# Create node pool with Spot VMs
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name spot-pool \
  --priority Spot \
  --eviction-policy Delete \
  --enable-cluster-autoscaler \
  --min-count 0 \
  --max-count 20 \
  --node-count 3 \
  --node-taints kubernetes.azure.com/scalesetpriority=spot:NoSchedule \
  --node-labels kubernetes.azure.com/scalesetpriority=spot

Pod Tolerations:

apiVersion: v1
kind: Pod
metadata:
  name: spot-workload
spec:
  tolerations:
  - key: kubernetes.azure.com/scalesetpriority
    operator: Equal
    value: "spot"
    effect: NoSchedule
  containers:
  - name: app
    image: my-app:latest
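
To make spot-tolerant workloads prefer the spot pool while still allowing fallback to on-demand nodes, node affinity against the label applied above is one option; a minimal sketch:

apiVersion: v1
kind: Pod
metadata:
  name: spot-preferred-workload
spec:
  tolerations:
  - key: kubernetes.azure.com/scalesetpriority
    operator: Equal
    value: "spot"
    effect: NoSchedule
  affinity:
    nodeAffinity:
      # Prefer spot nodes when they exist; otherwise schedule on on-demand pools
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: kubernetes.azure.com/scalesetpriority
            operator: In
            values:
            - spot
  containers:
  - name: app
    image: my-app:latest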

Cost Optimization Strategies

Right-Sizing

Use VPA to right-size pod resources:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: optimize-resources
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"

Spot VMs

Use Spot VMs for cost savings:

  • Up to 90% savings compared to pay-as-you-go pricing
  • Evicted pods are rescheduled by Kubernetes onto remaining capacity
  • Works with the Cluster Autoscaler (min-count 0 lets the pool scale to zero)
  • Fall back to on-demand by running an on-demand pool alongside the spot pool

Virtual Nodes

Use virtual nodes for variable workloads:

  • Pay-per-second billing
  • No node infrastructure costs
  • Rapid scaling
  • Perfect for burst workloads

Scheduled Scaling

Scale down during off-hours:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down
spec:
  schedule: "0 20 * * *"  # 8 PM daily
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command:
            - kubectl
            - scale
            - deployment
            - web
            - --replicas=1
          restartPolicy: OnFailure
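
The kubectl container runs under a service account that must be allowed to change the deployment's scale. A minimal RBAC sketch with illustrative names (deployment-scaler), referenced from the CronJob's pod spec via serviceAccountName:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: deployment-scaler        # illustrative name
  namespace: default             # adjust to the CronJob's namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-scaler
  namespace: default
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-scaler
  namespace: default
subjects:
- kind: ServiceAccount
  name: deployment-scaler
  namespace: default
roleRef:
  kind: Role
  name: deployment-scaler
  apiGroup: rbac.authorization.k8s.io

# Then set `serviceAccountName: deployment-scaler` in the CronJob's pod spec.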

Scaling Best Practices

  1. Set Appropriate Limits - Configure min/max replicas and node limits

  2. Use Multiple Metrics - Combine CPU, memory, and custom metrics

  3. Configure Behavior - Tune scale-up and scale-down policies

  4. Monitor Scaling - Watch HPA and Cluster Autoscaler behavior

  5. Test Scaling - Verify autoscaling works before production

  6. Use Spot VMs - For cost optimization where appropriate

  7. Right-Size Resources - Use VPA to optimize resource requests

  8. Plan for Spikes - Configure aggressive scale-up for traffic spikes

  9. Prevent Thrashing - Use stabilization windows and conservative scale-down

  10. Combine Solutions - Use HPA for pods, Cluster Autoscaler for nodes, virtual nodes for serverless

Common Issues

HPA Not Scaling

Problem: HPA not scaling pods

Solutions:

  • Verify Metrics Server is running
  • Check HPA target metrics
  • Verify resource requests are set
  • Check HPA status and events
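
A few commands that cover these checks:

# Is the metrics API registered and available?
kubectl get apiservices | grep metrics

# Are pod-level metrics being collected?
kubectl top pods

# What does the HPA itself report? Check Conditions and Events
kubectl describe hpa web-hpa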

Cluster Autoscaler Not Scaling Nodes

Problem: Nodes not being added

Solutions:

  • Verify auto-scaling is enabled
  • Check min/max node limits
  • Verify pods are unschedulable
  • Check subscription quotas
  • Review Azure Activity Log
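
The autoscaler records its decisions as events on the pending pod, which is usually the quickest place to look:

# <pending-pod> is a placeholder for the name of the unschedulable pod
kubectl describe pod <pending-pod>
# Look for events such as FailedScheduling, TriggeredScaleUp, or NotTriggerScaleUp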

Virtual Nodes Not Working

Problem: Pods not scheduling on virtual nodes

Solutions:

  • Verify virtual nodes add-on is enabled
  • Check node selector matches virtual node label
  • Verify subnet configuration
  • Check ACI provider permissions
  • Review Azure Activity Log

See Also