Vertical Pod Autoscaler (VPA) Alpha: Right-Sizing Pod Resources

Introduction

By late 2017, Horizontal Pod Autoscaler (HPA) had solved the “how many replicas?” question, and Cluster Autoscaler handled “how many nodes?” But there was still a gap: what if each pod was requesting too much or too little CPU and memory?

The Vertical Pod Autoscaler (VPA), introduced as alpha in Kubernetes 1.9 (December 2017), addressed this by automatically adjusting pod resource requests and limits based on observed usage patterns. Where HPA scales horizontally (more pods), VPA scales vertically (more resources per pod).

What made VPA compelling (even in alpha) was that it tackled a real operational pain: developers often over-provision resources “just to be safe,” leading to wasted capacity, or under-provision, causing OOM kills and throttling. VPA promised to learn from actual usage and set requests automatically.

Why this mattered in 2017

  • Resource waste was expensive: over-provisioned pods meant paying for unused capacity across hundreds of nodes.
  • OOM kills were common: Java apps with heap sizing, batch jobs with variable memory needs, and microservices with unpredictable spikes all suffered from static resource requests.
  • Manual tuning didn’t scale: as workloads multiplied, keeping resource requests accurate became a full-time job for platform teams.
  • HPA needed accurate requests: HPA makes scaling decisions based on resource requests; if requests are wrong, HPA decisions are wrong too.

VPA Modes

VPA operates in four distinct modes, each suited to different operational scenarios:

  • Off: VPA only provides recommendations without modifying pods. Useful for auditing and understanding what resource requests should be before making changes.
  • Initial: VPA sets resource requests only when pods are first created. Existing pods are not modified. Best for new deployments where you want automatic right-sizing from day one.
  • Auto: VPA automatically applies updated requests to running pods, which in practice means evicting and recreating them. This is the most aggressive mode and requires careful coordination with HPA.
  • Recreate: Like Auto, but guarantees that updates are applied by recreating pods. Useful when you need predictable update timing and want pods restarted whenever their requests change.
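
The safest starting point is a recommendation-only VPA. A minimal sketch, using the same API schema as the full example later in this article (names are illustrative); note that "Off" must be quoted, because a bare Off is parsed as a YAML boolean:

```yaml
apiVersion: autoscaling.k8s.io/v1alpha1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-audit
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # must be quoted: unquoted Off is YAML boolean false
```

In this mode VPA computes and publishes recommendations in the object's status but never touches running pods.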

How VPA Works

  1. Metrics Collection: VPA collects historical CPU and memory usage for pods from the cluster metrics pipeline—Heapster (the default in 2017) or the newer metrics-server.
  2. Recommendation Engine: Analyzes usage patterns over a configurable window (default 8 days) to compute recommended requests and limits.
  3. Admission Controller: When a pod is created, the VPA admission controller intercepts the request and injects recommended resource values.
  4. Update Controller: In Auto or Recreate mode, periodically evicts pods to apply updated recommendations as usage patterns change.
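
Once the recommender has enough history, its output surfaces in the VPA object's status as a per-container recommendation with a lower bound, target, and upper bound. A hypothetical status fragment (values illustrative) might look like:

```yaml
status:
  recommendation:
    containerRecommendations:
    - containerName: app
      lowerBound:        # below this, the updater considers the pod under-provisioned
        cpu: 150m
        memory: 300Mi
      target:            # the value the admission controller injects as the request
        cpu: 250m
        memory: 400Mi
      upperBound:        # above this, the pod is considered over-provisioned
        cpu: "1"
        memory: 1Gi
```

The updater uses the lower and upper bounds to decide whether an existing pod is far enough from the target to be worth evicting.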

VPA Resource Policy

VPA allows fine-grained control over which resources it can adjust:

apiVersion: autoscaling.k8s.io/v1alpha1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
      controlledResources: ["cpu", "memory"]

Relationship with HPA

VPA and HPA can conflict when used together on the same pods:

  • The conflict: HPA scales based on resource requests. If VPA changes those requests, HPA’s scaling decisions become unpredictable.
  • Coordination strategies:
    • Use VPA in Off mode to get recommendations, then manually set requests and let HPA handle scaling.
    • Use VPA for CPU/memory optimization and HPA for custom metrics (QPS, queue depth) that VPA doesn’t understand.
    • Run VPA and HPA on different container sets within the same pod (if your architecture allows).
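
One way to implement the separation-of-concerns strategy is to restrict VPA to a resource that HPA does not scale on. A sketch, assuming HPA scales replicas on CPU utilization, so VPA is limited to memory via controlledResources (names are illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1alpha1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-mem
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
    - containerName: app
      controlledResources: ["memory"]   # leave CPU requests alone so HPA's math stays stable
```

Because VPA never rewrites the CPU request, HPA's utilization calculations remain consistent while memory is still right-sized.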

Use Cases

  • Java applications: JVM heap sizing is notoriously hard to get right. VPA learns actual heap usage and sets memory requests accordingly.
  • Batch jobs: Variable memory needs based on input size make static requests impractical. VPA adapts to each job’s requirements.
  • Microservices with spiky traffic: Services that see occasional bursts benefit from VPA adjusting requests as patterns emerge.
  • New deployments: Use VPA in Initial mode to automatically right-size new services without manual tuning.

A practical rollout pattern

  1. Start with Off mode: Deploy VPA in recommendation-only mode to audit current resource requests and understand what changes it would make.
  2. Validate recommendations: Compare VPA suggestions against actual usage in monitoring tools. Look for patterns: are recommendations consistently higher/lower than current requests?
  3. Pilot with Initial mode: Enable VPA on a few non-critical deployments to see how it behaves with new pods.
  4. Gradually enable Auto: Only after validating that VPA recommendations align with your expectations, enable automatic updates on production workloads.
  5. Set resource policies: Use minAllowed and maxAllowed to prevent VPA from making extreme recommendations that could break workloads.

Deployment checklist

  1. Metrics Pipeline: Heapster (default in 2017) or metrics-server provides usage data to VPA.
  2. VPA Components: Deploy the VPA recommender, updater, and admission controller as separate components for granular control.
  3. Resource Policies: Define VerticalPodAutoscaler resources with conservative minAllowed/maxAllowed bounds.
  4. Monitoring: Track VPA recommendations vs. actual usage to catch misconfigurations early.

Caveats & Tuning

  • Alpha limitations: VPA was alpha in 1.9—APIs could change, and production use required careful testing.
  • Pod eviction overhead: Auto mode evicts pods to apply updates, causing brief service interruptions. Not suitable for latency-sensitive workloads without careful scheduling.
  • HPA conflicts: Running VPA and HPA together requires careful coordination or separation of concerns (VPA for resources, HPA for replicas).
  • Stateful workloads: VPA eviction can disrupt StatefulSets; use Off or Initial mode, or ensure proper PDBs are configured.
  • Cold start patterns: VPA recommendations are based on historical data. New workloads or traffic pattern changes may lead to suboptimal initial recommendations.
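
For the eviction-related caveats above, a PodDisruptionBudget bounds how much churn the VPA updater can cause at once, since evictions go through the eviction API and therefore honor PDBs. A minimal sketch (labels and counts are illustrative):

```yaml
apiVersion: policy/v1beta1   # the PDB API version available in the Kubernetes 1.9 era
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2            # keep at least 2 replicas serving while VPA evicts pods
  selector:
    matchLabels:
      app: my-app
```

With this in place, Auto mode can still right-size pods, but never drains more capacity than the budget allows at any moment.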

Common failure modes (learned the hard way)

  • “VPA keeps evicting my pods”: Auto mode with aggressive update policies can cause constant pod churn. Increase updatePolicy.minReplicas or switch to Initial mode.
  • “OOM kills after VPA update”: VPA recommendations can be too low if historical data doesn’t capture peak usage. Set minAllowed memory bounds based on known minimum requirements.
  • “HPA scaling is broken”: VPA changing resource requests while HPA is scaling causes feedback loops. Separate VPA and HPA to different workloads or use VPA in Off mode.
  • “VPA recommendations seem wrong”: VPA needs sufficient historical data (default 8 days). New workloads or recently changed traffic patterns may produce inaccurate recommendations.

Conclusion

VPA alpha in Kubernetes 1.9 introduced a new dimension to autoscaling: not just “how many” or “how big a cluster,” but “how much should each pod request?” While alpha limitations and HPA coordination challenges made it experimental in 2017, VPA laid the groundwork for automatic resource right-sizing that would mature in later releases, eventually becoming a critical tool for cost optimization and reliability in production Kubernetes clusters.