Networking

Network issues in Kubernetes can be complex due to the layered networking architecture. This guide covers common networking problems, diagnostic tools, and solutions for pod-to-pod communication, service connectivity, DNS, and network policies.

Network Troubleshooting Methodology

Classify the issue first, then work through the checks for that category:

graph TB
    A[Network Issue] --> B{Issue Type?}
    B -->|Pod-to-Pod| C[Pod Connectivity]
    B -->|Service| D[Service Connectivity]
    B -->|DNS| E[DNS Resolution]
    B -->|External| F[External Access]

    C --> C1[Check CNI]
    C --> C2[Check Network Policies]
    C --> C3[Check Pod IPs]

    D --> D1[Check Service]
    D --> D2[Check Endpoints]
    D --> D3[Check Selector]

    E --> E1[Check CoreDNS]
    E --> E2[Check DNS Config]
    E --> E3[Test Resolution]

    F --> F1[Check Ingress]
    F --> F2[Check LoadBalancer]
    F --> F3[Check Firewall]

    style A fill:#e1f5ff
    style C fill:#e8f5e9
    style D fill:#fff4e1
    style E fill:#f3e5f5
    style F fill:#ffe1e1

Common Network Issues

Pod-to-Pod Communication

Symptoms:

  • Pods cannot ping each other
  • Services cannot reach backend pods
  • Network timeouts

Common Causes:

  • CNI plugin issues
  • NetworkPolicy blocking traffic
  • IP address conflicts
  • Routing problems

Service Connectivity

Symptoms:

  • Cannot access services
  • Service endpoints empty
  • Service IP not responding

Common Causes:

  • Service selector mismatch
  • No pod endpoints
  • NetworkPolicy blocking
  • kube-proxy issues

DNS Failures

Symptoms:

  • DNS resolution failures
  • Services not resolvable
  • nslookup failures

Common Causes:

  • CoreDNS not running
  • DNS configuration issues
  • NetworkPolicy blocking DNS
  • DNS service problems

Ingress Issues

Symptoms:

  • Ingress not routing traffic
  • 502/503 errors
  • Certificates not working

Common Causes:

  • Ingress controller not running
  • Backend service issues
  • Certificate problems
  • Configuration errors

Diagnostic Tools

Basic Connectivity Tests

# Test pod-to-pod connectivity (-c 3 stops after three probes)
kubectl run test-pod --image=busybox --rm -it -- ping -c 3 <target-pod-ip>

# Test service connectivity (-T 5 gives up after five seconds)
kubectl run test-pod --image=busybox --rm -it -- wget -T 5 -O- <service-name>

# Test DNS resolution
kubectl run test-pod --image=busybox --rm -it -- nslookup <service-name>

Network Debug Pod

Create a persistent network debugging pod:

apiVersion: v1
kind: Pod
metadata:
  name: netshoot
  namespace: default
spec:
  containers:
  - name: netshoot
    image: nicolaka/netshoot
    command: ["sleep", "3600"]

Use it for debugging:

# Execute into debug pod
kubectl exec -it netshoot -- bash

# Now you have network tools:
# - ping, traceroute
# - curl, wget
# - dig, nslookup
# - tcpdump
# - netstat, ss

kubectl exec Tools

# Execute into pod
kubectl exec -it <pod-name> -- /bin/sh

# Test connectivity (the tools must exist in the container image)
kubectl exec <pod-name> -- ping -c 3 <target-ip>
kubectl exec <pod-name> -- curl http://<service-name>

# Check DNS
kubectl exec <pod-name> -- nslookup <service-name>
kubectl exec <pod-name> -- dig <service-name>

# Check network interfaces
kubectl exec <pod-name> -- ip addr
kubectl exec <pod-name> -- netstat -an

Pod-to-Pod Communication

Checking Pod IPs

# Get pod IP addresses
kubectl get pods -o wide

# Get specific pod IP
kubectl get pod <pod-name> -o jsonpath='{.status.podIP}'

# List all pod IPs
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.podIP}{"\n"}{end}'
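A pod that never received an IP cannot be reached at all, so it helps to filter the list for empty IP columns before testing connectivity. The following is a minimal sketch; the function name `pods_without_ip` is hypothetical, and it only post-processes the name/IP listing produced by the jsonpath command above.

```shell
# pods_without_ip reads "name<TAB>podIP" lines on stdin and prints the
# names of pods whose IP column is empty. (Hypothetical helper name.)
pods_without_ip() {
  awk -F '\t' '$2 == "" { print $1 }'
}

# Usage against a live cluster (same jsonpath as above):
#   kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.podIP}{"\n"}{end}' | pods_without_ip
```

Pods listed here are typically still Pending or waiting on the CNI plugin, so `kubectl describe pod` on them is a reasonable next step.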

Testing Connectivity

# Create test pod
kubectl run test-pod --image=busybox --rm -it -- sh

# Inside pod, test connectivity:
ping <target-pod-ip>
wget -O- http://<target-pod-ip>:8080
telnet <target-pod-ip> 8080

CNI Plugin Issues

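# Check CNI plugin pods (names and namespaces vary by plugin, e.g. calico-node, cilium, kube-flannel)
kubectl get pods -A -o wide | grep -iE 'cni|calico|cilium|flannel|weave'

# Check CNI plugin logs (the label key and value depend on the plugin)
kubectl logs -n kube-system -l app=<cni-plugin>

# Check CNI configuration (run on the node itself, not through kubectl)
ls -la /etc/cni/net.d/
cat /etc/cni/net.d/*.conf*

# Check CNI binaries (run on the node itself)
ls -la /opt/cni/bin/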
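The configuration check can be scripted for use on a node. This is a sketch under the assumption that the runtime reads its CNI configuration from `/etc/cni/net.d` (the path used above) and accepts `.conf`, `.conflist`, and `.json` files; the function name `cni_config_present` is hypothetical.

```shell
# cni_config_present [DIR]
# Succeeds if DIR (default /etc/cni/net.d) holds at least one CNI network
# configuration file. Run this on the node. (Hypothetical helper name.)
cni_config_present() {
  dir="${1:-/etc/cni/net.d}"
  for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
    # An unmatched glob stays literal, so test that the file really exists.
    [ -f "$f" ] && return 0
  done
  return 1
}

# Usage:
#   cni_config_present || echo "no CNI configuration found on this node"
```

An empty directory usually means the CNI plugin's installer pod has not run successfully on that node.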

Service Troubleshooting

Service Status

# Check service
kubectl get service <service-name>

# Describe service
kubectl describe service <service-name>

# Get service details
kubectl get service <service-name> -o yaml

Service Endpoints

# Check service endpoints
kubectl get endpoints <service-name>

# Describe endpoints
kubectl describe endpoints <service-name>

# Check if endpoints are empty
kubectl get endpoints <service-name> -o jsonpath='{.subsets}'

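Empty endpoints indicate:

  • No pods match the service selector
  • Matching pods are not ready (readiness probe failing)
  • A named targetPort does not match any container port name (a wrong numeric targetPort leaves the endpoints populated, but connections fail)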
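To tell these causes apart, it can help to look at ready and not-ready addresses separately. The sketch below assumes the Endpoints fields `subsets[].addresses` and `subsets[].notReadyAddresses`; the function name `endpoints_state` and its messages are hypothetical.

```shell
# endpoints_state SERVICE [NAMESPACE]
# Reports whether the Service has ready addresses, only not-ready
# addresses, or none at all. (Hypothetical helper name.)
endpoints_state() {
  svc="$1"; ns="${2:-default}"
  ready=$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')
  notready=$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].notReadyAddresses[*].ip}')
  if [ -n "$ready" ]; then
    echo "ready: $ready"
  elif [ -n "$notready" ]; then
    echo "not-ready only: $notready (check readiness probes)"
  else
    echo "no addresses (check the Service selector against pod labels)"
  fi
}

# Usage:
#   endpoints_state <service-name> <namespace>
```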

Service Selector

# Check service selector
kubectl get service <service-name> -o jsonpath='{.spec.selector}'

# Check pods matching selector (write the selector as comma-separated key=value pairs)
kubectl get pods -l <key>=<value>

# Verify labels match
kubectl get pods --show-labels
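The selector printed by the jsonpath query is a JSON map, which `-l` does not accept directly. One way to bridge the two, sketched here with a hypothetical function name `service_selector`, is to render the map as `key=value` pairs with a go-template.

```shell
# service_selector SERVICE [NAMESPACE]
# Prints the Service selector as a comma-separated key=value list that
# can be passed to `kubectl get pods -l`. (Hypothetical helper name.)
service_selector() {
  svc="$1"; ns="${2:-default}"
  kubectl get service "$svc" -n "$ns" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}' \
    | sed 's/,$//'
}

# Usage:
#   kubectl get pods -n <namespace> -l "$(service_selector <service-name> <namespace>)"
```

If the second command lists no pods, the selector and the pod labels do not match. An empty result from `service_selector` means the Service has no selector at all, and passing it to `-l` would list every pod.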

Testing Service Connectivity

# Test service from pod
kubectl run test-pod --image=busybox --rm -it -- \
  wget -O- http://<service-name>.<namespace>.svc.cluster.local

# Test service IP directly
kubectl get service <service-name> -o jsonpath='{.spec.clusterIP}'
kubectl run test-pod --image=busybox --rm -it -- \
  wget -O- http://<cluster-ip>

# Test with port
kubectl run test-pod --image=busybox --rm -it -- \
  wget -O- http://<service-name>:<port>

DNS Troubleshooting

CoreDNS Status

# Check CoreDNS pods
kubectl get pods -n kube-system | grep coredns

# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns

# Check CoreDNS configuration
kubectl get configmap coredns -n kube-system -o yaml

DNS Configuration

# Check pod DNS configuration
kubectl run test-pod --image=busybox --rm -it -- cat /etc/resolv.conf

# Expected output:
# nameserver <coredns-service-ip>
# search <namespace>.svc.cluster.local svc.cluster.local cluster.local
# options ndots:5
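Comparing a pod's actual resolv.conf against the expected values can be scripted. The sketch below assumes the cluster domain is `cluster.local`, as in the expected output above; the function name `check_resolv_conf` is hypothetical. Clusters that run a node-local DNS cache will show a different nameserver than the kube-dns Service IP.

```shell
# check_resolv_conf FILE EXPECTED_NAMESERVER
# Verifies the first nameserver entry and the presence of the
# svc.cluster.local search domain. (Hypothetical helper name.)
check_resolv_conf() {
  file="$1"; expected_ns="$2"
  ns=$(awk '$1 == "nameserver" { print $2; exit }' "$file")
  if [ "$ns" != "$expected_ns" ]; then
    echo "nameserver is '$ns', expected '$expected_ns'"
    return 1
  fi
  if ! grep -q '^search.*svc\.cluster\.local' "$file"; then
    echo "search list lacks svc.cluster.local"
    return 1
  fi
  echo "ok"
}

# Usage:
#   kubectl exec <pod-name> -- cat /etc/resolv.conf > /tmp/resolv.conf
#   check_resolv_conf /tmp/resolv.conf \
#     "$(kubectl get service kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')"
```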

DNS Resolution Tests

# Test DNS resolution
kubectl run test-pod --image=busybox --rm -it -- nslookup <service-name>

# Test with dig (busybox does not ship dig; nicolaka/netshoot does)
kubectl run test-pod --image=nicolaka/netshoot --rm -it -- \
  dig <service-name>.<namespace>.svc.cluster.local

# Test FQDN
kubectl run test-pod --image=busybox --rm -it -- \
  nslookup <service-name>.<namespace>.svc.cluster.local

Common DNS Issues

DNS Not Resolving

# Check CoreDNS is running
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Check DNS service
kubectl get service kube-dns -n kube-system

# Test CoreDNS directly
kubectl run test-pod --image=busybox --rm -it -- \
  nslookup kubernetes.default.svc.cluster.local <coredns-service-ip>

Slow DNS Resolution

# Check CoreDNS performance
kubectl logs -n kube-system -l k8s-app=kube-dns | grep -i "slow\|timeout"

# Check DNS configuration
kubectl get configmap coredns -n kube-system -o yaml

# Consider optimizing DNS
# - Reduce ndots
# - Add DNS caching
# - Scale CoreDNS
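Why reducing ndots helps: with `ndots:5`, any name containing fewer than five dots is first tried against every search domain before it is tried as written. The sketch below models that ordering as a glibc-style resolver applies it (other resolvers, such as musl, may differ); the function name `lookup_order` is hypothetical.

```shell
# lookup_order NAME NDOTS SEARCH_DOMAIN...
# Prints the fully qualified names a glibc-style resolver would try,
# in order. (Hypothetical helper name.)
lookup_order() {
  name="$1"; ndots="$2"; shift 2
  # A trailing dot marks the name as absolute: no search list is applied.
  case "$name" in
    *.) echo "$name"; return 0 ;;
  esac
  # Count the dots in the name.
  dots_only=$(printf '%s' "$name" | tr -cd '.')
  dots=${#dots_only}
  # Names with at least ndots dots are tried as written first.
  [ "$dots" -ge "$ndots" ] && echo "$name."
  for domain in "$@"; do
    echo "$name.$domain."
  done
  # Names with fewer dots are tried as written last.
  [ "$dots" -lt "$ndots" ] && echo "$name."
  return 0
}

# Usage:
#   lookup_order api.example.com 5 default.svc.cluster.local svc.cluster.local cluster.local
```

With the default search list, an external name such as `api.example.com` costs three failed lookups before the real one. Lowering ndots, or writing external names with a trailing dot, avoids those extra queries.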

NetworkPolicy Debugging

Checking Network Policies

# List network policies
kubectl get networkpolicies -A

# Get network policy details
kubectl get networkpolicy <policy-name> -o yaml

# Describe network policy
kubectl describe networkpolicy <policy-name>

Testing Network Policies

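# NetworkPolicy rules apply to TCP, UDP, and SCTP; ICMP handling depends on the
# CNI plugin, so test the application port rather than relying on ping.

# Test from allowed pod (an HTTP port should return a response)
kubectl exec -it <allowed-pod> -- wget -qO- -T 3 http://<target-pod-ip>:<port>

# Test from blocked pod (should time out)
kubectl exec -it <blocked-pod> -- wget -qO- -T 3 http://<target-pod-ip>:<port>

# Check NetworkPolicy logs (if CNI supports it)
kubectl logs -n kube-system -l app=<cni-plugin> | grep -i networkpolicy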

Common NetworkPolicy Issues

  • Too restrictive - Blocking legitimate traffic
  • Missing rules - Not blocking intended traffic
  • Label mismatches - Selectors not matching pods
  • Namespace issues - Policies not applying to correct namespace
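Label mismatches are the easiest of these to check mechanically. The sketch below compares a selector against a pod's labels, both given as comma-separated `key=value` lists; it covers only equality selectors (`matchLabels`), not `matchExpressions`, and the function name `labels_match` is hypothetical.

```shell
# labels_match SELECTOR LABELS
# Succeeds only if every key=value pair in SELECTOR also appears in
# LABELS. An empty SELECTOR matches everything. (Hypothetical helper name.)
labels_match() {
  selector="$1"; labels=",$2,"
  old_ifs=$IFS; IFS=','
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;
      *) IFS=$old_ifs; echo "missing label: $pair"; return 1 ;;
    esac
  done
  IFS=$old_ifs
  return 0
}

# Usage (the LABELS column of `kubectl get pods --show-labels` is already
# in this comma-separated form):
#   labels_match "app=web,tier=frontend" "app=web,pod-template-hash=abc123"
```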

Ingress Troubleshooting

Ingress Status

# Check ingress
kubectl get ingress <ingress-name>

# Describe ingress
kubectl describe ingress <ingress-name>

# Get ingress details
kubectl get ingress <ingress-name> -o yaml

Ingress Controller

# Check ingress controller pods
kubectl get pods -n <ingress-namespace> | grep ingress

# Check ingress controller logs
kubectl logs -n <ingress-namespace> -l app=<ingress-controller>

# Check ingress controller service
kubectl get service -n <ingress-namespace> | grep ingress

Testing Ingress

# Test ingress from outside
curl -H "Host: <hostname>" http://<ingress-ip>

# Test ingress from pod (busybox wget sets headers with --header, not -H)
kubectl run test-pod --image=busybox --rm -it -- \
  wget -O- --header "Host: <hostname>" http://<ingress-ip>

# Check ingress backend
kubectl describe ingress <ingress-name> | grep -A 5 "Backends\|Rules"
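When the symptom is a 502 or 503, capturing only the status code makes repeated tests easier to compare. The sketch below wraps the curl test above; the function names are hypothetical, and the suggested causes are common first suspects rather than a definitive mapping, since behavior differs between ingress controllers.

```shell
# ingress_status HOSTNAME INGRESS_IP
# Prints only the HTTP status code returned through the ingress.
# (Hypothetical helper name.)
ingress_status() {
  curl -s -o /dev/null -m 10 -w '%{http_code}' -H "Host: $1" "http://$2/"
}

# explain_status CODE
# Maps a status code to a common first suspect. (Hypothetical helper name.)
explain_status() {
  case "$1" in
    000) echo "no HTTP response: check the controller Service, LoadBalancer, and firewall" ;;
    404) echo "no rule matched: check the host and path in the Ingress" ;;
    502) echo "backend connection failed: check the backend pods and the Service port" ;;
    503) echo "no backend available: check the Service endpoints" ;;
    2??|3??) echo "the ingress is routing this host" ;;
    *)   echo "status $1: check the ingress controller logs" ;;
  esac
}

# Usage:
#   explain_status "$(ingress_status <hostname> <ingress-ip>)"
```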

kube-proxy Troubleshooting

kube-proxy Status

# Check kube-proxy pods
kubectl get pods -n kube-system | grep kube-proxy

# Check kube-proxy logs
kubectl logs -n kube-system -l k8s-app=kube-proxy

# Check kube-proxy mode
kubectl get configmap kube-proxy -n kube-system -o yaml | grep mode

Common kube-proxy Issues

  • iptables rules - Check iptables rules for services
  • IPVS mode - Verify IPVS configuration
  • Connection tracking - Check conntrack table
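These checks run on a node rather than through kubectl. The sketch below assumes kube-proxy is in iptables mode, where it tags each NAT rule with a comment beginning with `<namespace>/<service-name>`; the function name `service_nat_rules` is hypothetical. The `ipvsadm` and `conntrack` tools named in the comments must be installed on the node.

```shell
# service_nat_rules NAMESPACE/NAME
# Filters `iptables-save -t nat` output (read from stdin) down to the
# rules whose comment names the given Service. (Hypothetical helper name.)
service_nat_rules() {
  grep -E "\"$1[ :\"]"
}

# Usage, on a node:
#   iptables-save -t nat | service_nat_rules <namespace>/<service-name>
#
# IPVS mode: list virtual servers and their backends
#   ipvsadm -Ln
#
# Connection tracking: list tracked connections to a Service IP
#   conntrack -L -d <cluster-ip>
```

No matching rules for a Service that has endpoints suggests kube-proxy is not syncing on that node, or that it runs in a mode other than iptables.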

Network Debugging Examples

Complete Connectivity Test

# Step 1: Create debug pod and open a shell in it (the remaining steps run inside this shell)
kubectl run netshoot --image=nicolaka/netshoot --rm -it -- bash

# Step 2: Test DNS
nslookup kubernetes.default.svc.cluster.local
dig <service-name>.<namespace>.svc.cluster.local

# Step 3: Test service connectivity
curl http://<service-name>.<namespace>.svc.cluster.local
wget -O- http://<service-name>:<port>

# Step 4: Test pod-to-pod
ping <target-pod-ip>
telnet <target-pod-ip> <port>

# Step 5: Capture traffic
tcpdump -i any -n

Service Discovery Test

# Test service discovery
kubectl run test-pod --image=busybox --rm -it -- sh

# Inside pod:
# Short name (same namespace)
wget -O- http://<service-name>

# Short name (different namespace)
wget -O- http://<service-name>.<namespace>

# FQDN
wget -O- http://<service-name>.<namespace>.svc.cluster.local
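The three name forms can be generated and tested in a loop, which shows at which level resolution stops working. This is a minimal sketch assuming the cluster domain is `cluster.local` unless overridden; the function name `service_names` is hypothetical.

```shell
# service_names SERVICE NAMESPACE [CLUSTER_DOMAIN]
# Prints the short name, the namespace-qualified name, and the FQDN of
# a Service, one per line. (Hypothetical helper name.)
service_names() {
  name="$1"; ns="$2"; domain="${3:-cluster.local}"
  echo "$name"
  echo "$name.$ns"
  echo "$name.$ns.svc.$domain"
}

# Usage:
#   for n in $(service_names <service-name> <namespace>); do
#     kubectl exec <pod-name> -- nslookup "$n"
#   done
```

If only the FQDN resolves, the pod's search list is the likely problem. If only the bare short name fails, check whether the test pod runs in a different namespace than the Service.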

Best Practices

  1. Use network debug pods - Keep debugging tools ready

  2. Test incrementally - Start from basic connectivity, then add layers

  3. Check logs - Review CNI, kube-proxy, and CoreDNS logs

  4. Verify configurations - Check service selectors, NetworkPolicies, DNS config

  5. Document network topology - Understand your CNI and network setup

  6. Monitor network metrics - Track network performance and errors

  7. Test regularly - Proactively test network connectivity

Troubleshooting Checklist

  • Verify pod IPs are assigned
  • Test pod-to-pod connectivity
  • Check service endpoints
  • Verify service selector matches pods
  • Test DNS resolution
  • Check CoreDNS is running
  • Review NetworkPolicies
  • Verify CNI plugin is working
  • Check kube-proxy status
  • Test Ingress configuration
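Part of this checklist can be automated. The sketch below covers only the items that reduce to "does this query return anything"; the function names `report` and `network_checklist` are hypothetical, and the label selectors are the ones used earlier in this guide. Clusters that replace kube-proxy with another dataplane will fail the kube-proxy item by design.

```shell
# report DESCRIPTION COMMAND...
# Prints PASS if the command succeeds and prints something, else FAIL.
# (Hypothetical helper name.)
report() {
  desc="$1"; shift
  out=$("$@" 2>/dev/null) || out=""
  if [ -n "$out" ]; then
    echo "PASS  $desc"
  else
    echo "FAIL  $desc"
    return 1
  fi
}

# network_checklist SERVICE NAMESPACE
# Runs a subset of the checklist and returns non-zero if any item fails.
# (Hypothetical helper name.)
network_checklist() {
  svc="$1"; ns="$2"; failed=0
  report "CoreDNS pods in Running phase" \
    kubectl get pods -n kube-system -l k8s-app=kube-dns \
      --field-selector=status.phase=Running -o name || failed=1
  report "kube-proxy pods in Running phase" \
    kubectl get pods -n kube-system -l k8s-app=kube-proxy \
      --field-selector=status.phase=Running -o name || failed=1
  report "Service $ns/$svc exists" \
    kubectl get service "$svc" -n "$ns" -o name || failed=1
  report "Service $ns/$svc has ready endpoints" \
    kubectl get endpoints "$svc" -n "$ns" \
      -o jsonpath='{.subsets[*].addresses[*].ip}' || failed=1
  return "$failed"
}

# Usage:
#   network_checklist <service-name> <namespace>
```

The remaining items (pod-to-pod connectivity, NetworkPolicies, Ingress) depend on application ports and policy intent, so they are left as manual steps.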

See Also