Metrics Server

Metrics Server is a cluster-wide aggregator of resource usage data. It collects CPU and memory metrics from each node’s kubelet and makes them available via the Kubernetes Metrics API. Metrics Server is essential for features such as the Horizontal Pod Autoscaler (HPA) and the kubectl top command.

What is Metrics Server?

Metrics Server is a lightweight, in-memory metrics collector that:

  • Scrapes metrics from kubelets on each node
  • Aggregates CPU and memory usage for nodes and pods
  • Exposes data via the Kubernetes metrics API
  • Provides data for HPA and Vertical Pod Autoscaler (VPA)

graph TB
  A[Metrics Server] --> B[Metrics API]
  B --> C[HPA]
  B --> D[kubectl top]
  B --> E[VPA]
  F[Node 1 Kubelet] --> A
  G[Node 2 Kubelet] --> A
  H[Node 3 Kubelet] --> A
  F --> F1[CPU Usage]
  F --> F2[Memory Usage]
  G --> G1[CPU Usage]
  G --> G2[Memory Usage]
  H --> H1[CPU Usage]
  H --> H2[Memory Usage]
  style A fill:#e1f5ff
  style B fill:#e8f5e9
  style C fill:#fff4e1
  style D fill:#f3e5f5
  style E fill:#ffe1e1

Why Metrics Server is Essential

Metrics Server enables several critical Kubernetes features:

1. Horizontal Pod Autoscaler (HPA)

HPA uses Metrics Server data to automatically scale pods based on CPU or memory usage:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Without Metrics Server, HPA cannot automatically scale based on resource metrics.
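The scaling decision itself follows the formula documented for the HPA controller: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A minimal sketch in Python (the function name and sample numbers are illustrative):

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float) -> int:
    """Apply the HPA scaling formula:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# 4 replicas averaging 90% CPU against a 70% target -> scale up
print(desired_replicas(4, 90, 70))   # 6 (ceil of 5.14...)

# 4 replicas averaging 35% CPU against a 70% target -> scale down
print(desired_replicas(4, 35, 70))   # 2
```

In practice the controller also clamps the result to minReplicas/maxReplicas (2 and 10 in the manifest above).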

2. kubectl top Commands

Metrics Server enables the kubectl top family of commands for viewing resource usage:

# View node resource usage
kubectl top nodes

# View pod resource usage
kubectl top pods

# View pod resource usage in specific namespace
kubectl top pods -n production

3. Kubernetes Dashboard

The Kubernetes Dashboard uses Metrics Server to display resource usage graphs and charts.

Architecture

Metrics Server collects metrics using the Summary API provided by each kubelet:

sequenceDiagram
  participant MS as Metrics Server
  participant K1 as Kubelet 1
  participant K2 as Kubelet 2
  participant API as Metrics API
  MS->>K1: Request Summary API
  K1->>MS: CPU/Memory metrics
  MS->>K2: Request Summary API
  K2->>MS: CPU/Memory metrics
  MS->>MS: Aggregate metrics
  MS->>API: Store metrics
  API->>HPA: Provide metrics
  API->>kubectl: Provide metrics

Data Collection Flow

  1. Metrics Server scrapes each kubelet’s Summary API endpoint
  2. Kubelets collect metrics from cAdvisor (container metrics) and the OS (node metrics)
  3. Metrics Server aggregates and stores metrics in memory
  4. Clients (HPA, kubectl) query the Metrics API
  5. Metrics expire after a configurable time (default: 15 minutes)
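As a toy illustration of step 3, aggregation is essentially summing per-container usage into pod-level totals. The sample data below is hypothetical, not real Summary API output:

```python
from collections import defaultdict

# Hypothetical per-container samples: (pod, container, cpu_millicores, memory_bytes)
samples = [
    ("my-app-x2k9j", "app",     40, 100 * 1024**2),
    ("my-app-x2k9j", "sidecar", 10,  28 * 1024**2),
    ("my-app-z7m3n", "app",     45, 120 * 1024**2),
]

def aggregate_pods(samples):
    """Sum container-level CPU/memory into pod-level totals."""
    totals = defaultdict(lambda: {"cpu_m": 0, "mem_bytes": 0})
    for pod, _container, cpu_m, mem in samples:
        totals[pod]["cpu_m"] += cpu_m
        totals[pod]["mem_bytes"] += mem
    return dict(totals)

pods = aggregate_pods(samples)
print(pods["my-app-x2k9j"])  # {'cpu_m': 50, 'mem_bytes': 134217728}
```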

Metrics Collected

Metrics Server collects:

  • Node metrics: CPU and memory usage for each node
  • Pod metrics: CPU and memory usage for each pod
  • Container metrics: CPU and memory usage for each container
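For reference, a pod metrics object served by the Metrics API has roughly this shape (names, timestamps, and values here are illustrative):

```yaml
apiVersion: metrics.k8s.io/v1beta1
kind: PodMetrics
metadata:
  name: my-app-7d8f9b5c4-x2k9j
  namespace: production
timestamp: "2024-01-01T12:00:00Z"
window: 30s
containers:
- name: my-app
  usage:
    cpu: 50m
    memory: 128Mi
```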

Installation

Metrics Server is typically installed as a cluster add-on. The installation method depends on your cluster setup:

Standard Kubernetes Cluster

# Apply the Metrics Server manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Cloud Provider Clusters

Cloud-managed clusters (EKS, GKE, AKS) may have Metrics Server pre-installed or require provider-specific installation methods.

Verification

After installation, verify Metrics Server is running:

# Check if Metrics Server pod is running
kubectl get pods -n kube-system | grep metrics-server

# Test kubectl top command
kubectl top nodes

# Check Metrics Server API
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes

Configuration

Metrics Server is configured via command-line flags set on its Deployment:

Common Configuration Options

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        image: registry.k8s.io/metrics-server/metrics-server:v0.6.3
        args:
        - --kubelet-insecure-tls  # Only for testing, use proper certs in production
        - --metric-resolution=15s  # Scrape interval (default: 60s)
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname

Important Flags

  • --kubelet-insecure-tls: Skip TLS verification (NOT recommended for production)
  • --metric-resolution: How often to scrape metrics (default: 60s)
  • --kubelet-preferred-address-types: Preferred kubelet address types
  • --kubelet-port: Port to use for kubelet connections (default: 10250)

TLS Configuration

For production clusters, configure proper TLS:

args:
- --kubelet-certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
- --kubelet-client-certificate=/etc/ssl/metrics-server/tls.crt
- --kubelet-client-key=/etc/ssl/metrics-server/tls.key

Relationship with HPA

Metrics Server is a dependency for HPA when using resource-based metrics:

graph TB
  A[HPA Controller] --> B{Need Metrics?}
  B -->|Yes| C[Metrics API]
  C --> D[Metrics Server]
  D --> E[Kubelets]
  E --> F[Container Metrics]
  C --> G[Resource Metrics]
  C --> H[Custom Metrics]
  G --> I[CPU Usage]
  G --> J[Memory Usage]
  style A fill:#e1f5ff
  style D fill:#e8f5e9
  style C fill:#fff4e1

Viewing Metrics

kubectl top nodes

View resource usage for all nodes:

kubectl top nodes

Example output:

NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
worker-node1   245m         12%    2.1Gi           54%
worker-node2   189m         9%     1.8Gi           47%
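The CPU(cores) and MEMORY(bytes) columns use Kubernetes quantity notation: m means millicores, and Ki/Mi/Gi are binary memory units. A small parser sketch, handling only the suffixes shown above:

```python
def parse_cpu(q: str) -> float:
    """Convert a CPU quantity like '245m' or '2' to cores."""
    return float(q[:-1]) / 1000 if q.endswith("m") else float(q)

def parse_memory(q: str) -> int:
    """Convert a memory quantity like '2.1Gi' or '128Mi' to bytes.
    Handles only the binary suffixes Ki/Mi/Gi for brevity."""
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
    for suffix, factor in units.items():
        if q.endswith(suffix):
            return int(float(q[:-2]) * factor)
    return int(q)  # plain bytes

print(parse_cpu("245m"))       # 0.245
print(parse_memory("128Mi"))   # 134217728
```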

kubectl top pods

View resource usage for pods:

# Current namespace
kubectl top pods

# All namespaces
kubectl top pods -A

# Specific namespace
kubectl top pods -n production

# With labels
kubectl top pods -l app=my-app

# Sort by memory usage
kubectl top pods --sort-by=memory

Example output:

NAME                          CPU(cores)   MEMORY(bytes)
my-app-7d8f9b5c4-x2k9j        50m          128Mi
my-app-7d8f9b5c4-z7m3n        45m          120Mi

Using Metrics API Directly

Query the Metrics API programmatically:

# Get node metrics
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes

# Get pod metrics
kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods

# Get pod metrics in namespace
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/production/pods
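The raw endpoints return JSON. As a sketch, the snippet below turns a NodeMetricsList response into a kubectl top-style table; the response body is a trimmed, hypothetical example:

```python
import json

# Trimmed, hypothetical NodeMetricsList response
raw = '''{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "items": [
    {"metadata": {"name": "worker-node1"},
     "usage": {"cpu": "245m", "memory": "2201170Ki"}},
    {"metadata": {"name": "worker-node2"},
     "usage": {"cpu": "189m", "memory": "1887436Ki"}}
  ]
}'''

def top_nodes(body: str):
    """Return (name, cpu, memory) rows from a NodeMetricsList JSON body."""
    doc = json.loads(body)
    return [(item["metadata"]["name"],
             item["usage"]["cpu"],
             item["usage"]["memory"])
            for item in doc["items"]]

for name, cpu, mem in top_nodes(raw):
    print(f"{name:<15}{cpu:<10}{mem}")
```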

Troubleshooting

Metrics Server Not Running

# Check pod status
kubectl get pods -n kube-system -l k8s-app=metrics-server

# Check pod logs
kubectl logs -n kube-system -l k8s-app=metrics-server

# Check events
kubectl get events -n kube-system --sort-by='.lastTimestamp'

kubectl top Returns No Metrics

Common causes:

  • Metrics Server not installed or not running
  • Metrics Server can’t reach kubelets (network/firewall issues)
  • TLS certificate issues
  • Metrics Server still collecting initial data (wait a few minutes)

TLS Certificate Errors

If you see certificate errors:

# Check Metrics Server logs
kubectl logs -n kube-system -l k8s-app=metrics-server

# Common fix: Add --kubelet-insecure-tls flag (development only)
# Or configure proper certificates for production

Metrics Not Updating

Metrics Server has a default scrape interval of 60 seconds. Metrics may take time to appear or update:

# Check Metrics Server configuration
kubectl get deployment metrics-server -n kube-system -o yaml

# Verify scrape interval
# Look for --metric-resolution flag (default: 60s)

HPA Not Scaling

If HPA isn’t working:

# Check HPA status
kubectl describe hpa <hpa-name>

# Verify Metrics API is available
kubectl get --raw /apis/metrics.k8s.io/v1beta1

# Check if Metrics Server can provide metrics
kubectl top pods

Best Practices

  1. Install Metrics Server early - Install it before setting up HPA or other features that depend on it

  2. Configure proper TLS - Never use --kubelet-insecure-tls in production

  3. Set appropriate scrape interval - Balance between accuracy and resource usage (default 60s is usually fine)

  4. Monitor Metrics Server - Ensure the Metrics Server pod is healthy and running

  5. Resource limits - Set resource requests and limits for Metrics Server

  6. High availability - Run multiple Metrics Server replicas for production clusters

  7. Regular updates - Keep Metrics Server updated to the latest version
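For practice 5, a container-level resources stanza along these lines is a reasonable starting point; the exact values depend on cluster size and are illustrative:

```yaml
# Fragment for the metrics-server container spec; values are illustrative
resources:
  requests:
    cpu: 100m
    memory: 200Mi
  limits:
    memory: 400Mi
```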

Limitations

  • In-memory only - Metrics are not persisted; they expire after 15 minutes
  • Limited metrics - Only CPU and memory; custom metrics require additional components
  • Resource overhead - Adds some CPU/memory usage to the cluster
  • Not a monitoring solution - For long-term metrics storage, use Prometheus or similar tools

See Also