Kubelet & CRI

Kubelet is the primary node agent that runs on each node, managing pod lifecycles and communicating with the container runtime. The Container Runtime Interface (CRI) provides the abstraction layer between kubelet and container runtimes. This guide covers troubleshooting kubelet and CRI issues.

Kubelet Overview

Kubelet is responsible for:

  • Managing pod lifecycle
  • Pulling container images
  • Starting and stopping containers
  • Reporting node and pod status
  • Executing liveness and readiness probes
  • Managing volumes

graph TB
    A[Kubelet] --> B[API Server]
    A --> C[Container Runtime]
    A --> D[Pod Lifecycle]
    B --> B1[Watch Pods]
    B --> B2[Report Status]
    C --> C1[Pull Images]
    C --> C2[Create Containers]
    C --> C3[Run Containers]
    D --> D1[Start Pods]
    D --> D2[Stop Pods]
    D --> D3[Restart Containers]
    style A fill:#e1f5ff
    style B fill:#e8f5e9
    style C fill:#fff4e1
    style D fill:#f3e5f5

Common Kubelet Issues

Node Not Registered

Symptoms:

  • Node doesn’t appear in kubectl get nodes
  • Node status unknown
  • API server cannot communicate with kubelet

Diagnosis:

# Check if node is registered
kubectl get nodes

# Check kubelet status (on node)
systemctl status kubelet

# Check kubelet logs
journalctl -u kubelet -n 100

# Check kubelet configuration
cat /var/lib/kubelet/config.yaml

Common Causes:

  • Kubelet not running
  • Certificate/authentication issues
  • API server connectivity problems
  • Configuration errors

Solutions:

  • Start kubelet: systemctl start kubelet
  • Check certificates: Verify the kubelet client certificates (for example under /var/lib/kubelet/pki/) exist and have not expired
  • Verify API server connectivity
  • Check kubelet configuration
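
If the kubelet was simply stopped or its unit is disabled, a minimal recovery sequence looks like the following (a sketch assuming a systemd-managed kubelet; <node-name> is a placeholder):

# Start the kubelet now and enable it on boot
systemctl enable --now kubelet

# Watch registration attempts in the logs
journalctl -u kubelet -f | grep -i -E "register|certificate|connection"

# From a machine with cluster access, confirm the node appears and turns Ready
kubectl get nodes -o wide
kubectl describe node <node-name> | grep -A 5 Conditions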

Image Pull Failures

Symptoms:

  • Pods stuck in ImagePullBackOff
  • Pod events show pull errors
  • Containers fail to start

Diagnosis:

# Check pod events
kubectl describe pod <pod-name>

# Check kubelet logs for pull errors
journalctl -u kubelet | grep -i "pull\|image"

# Check image pull secrets
kubectl get pod <pod-name> -o jsonpath='{.spec.imagePullSecrets}'

# Test image pull manually
crictl pull <image-name>

Common Causes:

  • Image doesn’t exist
  • Wrong image name or tag
  • Private registry without credentials
  • Network connectivity issues
  • Registry authentication failures

Solutions:

  • Verify image name and tag
  • Check image pull secrets
  • Verify network connectivity to registry
  • Test image pull manually
  • Check registry authentication
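
For a private registry, the usual fix is an image pull secret referenced by the pod or its service account. A minimal sketch; registry.example.com, regcred, and the default namespace are illustrative placeholders:

# Create a docker-registry secret holding the registry credentials
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=<user> \
  --docker-password=<password> \
  --namespace=default

# Attach it to the namespace's default service account so new pods use it
kubectl patch serviceaccount default -n default \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'

# Verify the pull works from the node itself
crictl pull --creds <user>:<password> registry.example.com/<image>:<tag>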

Pod Startup Failures

Symptoms:

  • Pods stuck in Pending or ContainerCreating
  • Containers fail to start
  • Pod events show startup errors

Diagnosis:

# Check pod status
kubectl get pod <pod-name>

# Check pod events
kubectl describe pod <pod-name>

# Check kubelet logs
journalctl -u kubelet | grep <pod-name>

# Check container runtime logs
journalctl -u containerd | grep <pod-name>

Common Causes:

  • Image pull failures
  • Volume mount issues
  • Resource constraints
  • Configuration errors
  • Container runtime issues

Solutions:

  • Check image pull status
  • Verify volume mounts
  • Check resource limits
  • Review pod configuration
  • Check container runtime status
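
To narrow down which layer is failing, compare what the API server reports with what the runtime on the node actually created. A sketch assuming crictl is installed on the node:

# Events for this pod only, as seen by the API server
kubectl get events --field-selector involvedObject.name=<pod-name> --sort-by=.lastTimestamp

# Pod sandboxes and containers as the runtime sees them
crictl pods --name <pod-name>
crictl ps -a | grep <pod-name>

# Logs of a container that was created but keeps crashing
crictl logs <container-id>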

CRI Issues

Symptoms:

  • Containers not starting
  • CRI errors in kubelet logs
  • Container runtime unresponsive

Diagnosis:

# Check container runtime status
systemctl status containerd
# or
systemctl status docker

# Check CRI socket
ls -la /var/run/containerd/containerd.sock
# or
ls -la /var/run/docker.sock

# Test CRI connectivity
crictl version

# Check kubelet logs for CRI errors
journalctl -u kubelet | grep -i "cri\|runtime"

Common Causes:

  • Container runtime not running
  • CRI socket not accessible
  • Version incompatibility
  • Resource exhaustion

Solutions:

  • Start container runtime
  • Verify CRI socket permissions
  • Check version compatibility
  • Restart container runtime
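
A common cause of "runtime unresponsive" errors is kubelet and crictl pointing at different sockets. A sketch for aligning them on containerd; the paths below are common defaults and may differ on your distribution:

# crictl reads its endpoint from /etc/crictl.yaml; it should name the same
# socket, e.g. runtime-endpoint: unix:///var/run/containerd/containerd.sock
cat /etc/crictl.yaml

# Confirm the kubelet uses the same endpoint (flag or config-file field)
grep -E "containerRuntimeEndpoint|container-runtime-endpoint" \
  /var/lib/kubelet/config.yaml /var/lib/kubelet/kubeadm-flags.env 2>/dev/null

# Restart the runtime and kubelet after changing either setting
systemctl restart containerd kubelet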

Kubelet Logs

Accessing Kubelet Logs

On the Node

# View kubelet logs (systemd)
journalctl -u kubelet -n 100

# Follow kubelet logs
journalctl -u kubelet -f

# View logs since specific time
journalctl -u kubelet --since "1 hour ago"

# View logs without the pager (useful for piping into grep)
journalctl -u kubelet --since "1 hour ago" --no-pager

From Kubernetes API

If the kubelet's system log handler is enabled (the default) and you have access to the nodes/proxy resource, the node's /var/log directory can be listed through the API server:

# List node log files served by the kubelet
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/logs/"

Log File Location

# Kubelet log file (if configured)
tail -f /var/log/kubelet.log

# Systemd journal
journalctl -u kubelet

Kubelet Configuration

Configuration File

# Check kubelet configuration
cat /var/lib/kubelet/config.yaml

# Check kubelet command-line arguments
ps aux | grep kubelet

# Check kubelet config file
cat /etc/kubernetes/kubelet.conf

Important Configuration Options

  • --config - Path to the kubelet configuration file (most settings now live here)
  • --kubeconfig - Kubeconfig used to reach the API server (the old --api-servers flag has been removed)
  • --container-runtime-endpoint - CRI socket endpoint (containerRuntimeEndpoint in the config file)
  • --node-ip - IP address the node should report
  • --pod-manifest-path - Static pod manifest directory (staticPodPath in the config file)
  • --pod-infra-container-image - Pause (pod infrastructure) container image
  • --image-pull-progress-deadline - Image pull timeout (dockershim-only; not available in recent releases)

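Most of these flags have config-file equivalents, and current releases expect them there. An illustrative excerpt of what /var/lib/kubelet/config.yaml might contain (values are examples, not recommendations; containerRuntimeEndpoint moved into the config file in newer releases):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
staticPodPath: /etc/kubernetes/manifests            # replaces --pod-manifest-path
containerRuntimeEndpoint: unix:///var/run/containerd/containerd.sock
cgroupDriver: systemd
clusterDomain: cluster.local
clusterDNS:
  - 10.96.0.10
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
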
Container Runtime Interface (CRI)

CRI Overview

CRI provides abstraction between kubelet and container runtimes:

graph TB
    A[Kubelet] --> B[CRI Interface]
    B --> C[containerd]
    B --> D[CRI-O]
    B --> E[Docker]
    C --> C1[Containers]
    D --> D1[Containers]
    E --> E1[Containers]
    style A fill:#e1f5ff
    style B fill:#e8f5e9
    style C fill:#fff4e1
    style D fill:#f3e5f5
    style E fill:#ffe1e1

Supported Runtimes

  • containerd - Default in many distributions
  • CRI-O - Lightweight CRI implementation
  • Docker Engine - Via the external cri-dockerd adapter (the in-tree dockershim was removed in Kubernetes 1.24)

CRI Socket Location

# containerd socket
/var/run/containerd/containerd.sock

# CRI-O socket
/var/run/crio/crio.sock

# Docker socket (legacy dockershim; not used by recent kubelets)
/var/run/docker.sock

Troubleshooting CRI Issues

Checking CRI Status

# Check container runtime status
systemctl status containerd
# or
systemctl status crio

# Test CRI connectivity
crictl version

# Check CRI socket
ls -la /var/run/containerd/containerd.sock
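
Beyond crictl version, crictl can inspect most of what the runtime manages, which is useful when kubectl cannot reach the node. A few representative read-only commands:

# List pod sandboxes and containers known to the runtime
crictl pods
crictl ps -a

# List images already present on the node
crictl images

# Inspect a container and fetch its logs
crictl inspect <container-id>
crictl logs <container-id>

# Runtime status and configuration as seen over the CRI socket
crictl info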

Common CRI Problems

Container Runtime Not Running

# Check status
systemctl status containerd

# Start container runtime
systemctl start containerd

# Enable on boot
systemctl enable containerd

CRI Socket Not Accessible

# Check that the socket exists and note its ownership and mode
ls -la /var/run/containerd/containerd.sock

# The kubelet runs as root on most distributions, so a missing or stale socket
# is the usual cause; restarting the runtime recreates it
systemctl restart containerd

# Avoid chmod 666 on the socket: it grants root-equivalent access to every local
# user; fix ownership or group membership instead if permissions really differ

Version Incompatibility

# Check versions
crictl version
kubelet --version
kubectl version

# Check kubelet logs for version errors
journalctl -u kubelet | grep -i version

Debugging Kubelet Connectivity

API Server Connectivity

# Check API server connectivity from node
curl -k https://<api-server-ip>:6443/healthz

# Check kubelet can reach API server
journalctl -u kubelet | grep -i "connection\|unreachable"

# Verify kubelet certificate
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -text -noout
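
Two details worth checking explicitly are whether the client certificate has expired and which API server address the kubelet was told to use. A sketch assuming the default kubeadm file locations:

# Expiry date of the kubelet client certificate
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate

# API server address the kubelet connects to
grep server /etc/kubernetes/kubelet.conf

# Probe that address directly from the node
curl -k https://<api-server-ip>:6443/version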

Node Registration

# Check node registration
kubectl get nodes

# Check node conditions
kubectl describe node <node-name>

# Check kubelet logs for registration
journalctl -u kubelet | grep -i "register\|certificate"
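
If a node is stuck registering because its certificate signing request was never approved (common when automatic approval is disabled), the pending CSR can be approved manually:

# List certificate signing requests; unapproved kubelet CSRs show as Pending
kubectl get csr

# Approve a specific request
kubectl certificate approve <csr-name>

# The kubelet should then receive its certificate and finish registering
journalctl -u kubelet | grep -i certificate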

Resource Management Problems

Resource Limits

Kubelet enforces resource limits:

# Check node capacity
kubectl describe node <node-name> | grep -A 5 "Capacity\|Allocatable"

# Check resource usage
kubectl top node <node-name>

# Check for resource pressure
kubectl describe node <node-name> | grep -i "pressure"
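
Allocatable is capacity minus whatever the kubelet reserves for itself and the system, so when allocatable looks unexpectedly low it is worth checking those reservations. A sketch assuming they are set in the kubelet config file:

# Reserved resources configured for system daemons and the kubelet
grep -A 3 -E "systemReserved|kubeReserved" /var/lib/kubelet/config.yaml

# Compare capacity and allocatable on the node object
kubectl get node <node-name> -o jsonpath='{.status.capacity}{"\n"}{.status.allocatable}{"\n"}'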

Eviction Policies

Kubelet evicts pods under resource pressure:

# Check eviction policy
grep -i eviction /var/lib/kubelet/config.yaml

# Check for evicted pods
kubectl get pods --all-namespaces | grep Evicted

# Check eviction events
kubectl get events --all-namespaces | grep Evicted
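
The hard-eviction thresholds (by default memory.available below 100Mi and nodefs.available below 10%) can be confirmed or overridden in the kubelet config, and eviction reasons are recorded on the failed pods. A quick way to correlate them:

# Eviction thresholds currently configured
grep -A 6 -E "evictionHard|evictionSoft" /var/lib/kubelet/config.yaml

# Which node condition triggered the pressure (MemoryPressure, DiskPressure, PIDPressure)
kubectl describe node <node-name> | grep -A 10 Conditions

# Reason and message recorded for evicted pods
kubectl get pods --all-namespaces --field-selector status.phase=Failed \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,REASON:.status.reason,MESSAGE:.status.message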

Pod Startup Failures

Diagnosing Startup Issues

# Step 1: Check pod status
kubectl get pod <pod-name>

# Step 2: Check pod events
kubectl describe pod <pod-name>

# Step 3: Check kubelet logs
journalctl -u kubelet | grep <pod-name>

# Step 4: Check container runtime logs
journalctl -u containerd | grep <pod-name>

Common Startup Problems

Image Pull Issues

# Check image pull status
kubectl describe pod <pod-name> | grep -A 5 "Events"

# Check kubelet logs
journalctl -u kubelet | grep -i "pull\|image"

# Test image pull
crictl pull <image-name>

Volume Mount Issues

# Check volume mounts
kubectl describe pod <pod-name> | grep -A 5 "Volumes\|Mounts"

# Check volume events
kubectl get events --field-selector involvedObject.name=<pod-name> | grep -i volume

# Verify volume exists
kubectl get pv,pvc

Resource Constraints

# Check resource requests
kubectl describe pod <pod-name> | grep -A 5 "Requests\|Limits"

# Check node resources
kubectl describe node <node-name> | grep -A 5 "Allocatable"

# Check for resource pressure
kubectl describe node <node-name> | grep -i "pressure"

Best Practices

  1. Monitor kubelet health - Set up monitoring for kubelet metrics

  2. Regular log rotation - Configure log rotation for kubelet logs

  3. Resource reservations - Reserve CPU and memory for the kubelet and system daemons (kubeReserved/systemReserved)

  4. Certificate management - Regularly rotate and verify certificates

  5. Version compatibility - Keep kubelet and container runtime versions compatible

  6. Configuration management - Use kubelet config files for better management

  7. Health checks - Probe the kubelet's health endpoint as part of node monitoring (see the example below)
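
A minimal health check sketch; the kubelet's healthz endpoint listens on 127.0.0.1:10248 by default (healthzBindAddress and healthzPort in the config file), so it has to be probed from the node itself:

# Local health endpoint exposed by the kubelet
curl -s http://127.0.0.1:10248/healthz && echo

# Service-level health from systemd
systemctl is-active kubelet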

Troubleshooting Checklist

  • Check kubelet service status
  • Review kubelet logs for errors
  • Verify API server connectivity
  • Check node registration
  • Verify container runtime status
  • Check CRI socket accessibility
  • Verify image pull functionality
  • Check resource constraints
  • Review pod startup logs
  • Verify certificates and authentication

See Also