CNI Performance Benchmark: Choosing the Right Plugin

Introduction

CNI plugin performance directly impacts application latency, cluster scalability, and resource efficiency. The choice between eBPF-based solutions, overlay networks, and cloud-native CNIs can mean the difference between sub-millisecond and multi-millisecond packet processing, or between supporting hundreds vs. thousands of pods per node.

This benchmark compares the performance characteristics of major CNI plugins: Cilium (eBPF), Calico (BGP/iptables), Antrea (OVS), AWS VPC CNI (native VPC), and Flannel (VXLAN overlay). Understanding these differences helps teams choose the right CNI based on performance requirements.


Performance Dimensions

We’ll compare CNIs across these dimensions (a minimal latency-probe sketch follows the list):

  1. Pod-to-Pod Latency: Inter-pod communication latency
  2. Throughput: Maximum bandwidth between pods
  3. CPU Usage: CPU overhead for network processing
  4. Memory Usage: Memory footprint of CNI components
  5. Pod Startup Time: Time to allocate network resources
  6. Scalability: Maximum pods per node and cluster size
  7. Network Policy Performance: Policy enforcement overhead
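
To make the latency dimension concrete, the sketch below is a minimal pod-to-pod round-trip probe: run it as a server in one pod and as a client in another, then compare the reported percentiles across CNIs. The port, sample count, and single-byte payload are arbitrary choices for illustration; for publishable numbers use a dedicated tool such as netperf or iperf3.

```go
// latencyprobe.go - rough pod-to-pod RTT probe (illustrative only).
// Run "latencyprobe server" in one pod and
// "latencyprobe client <server-pod-ip>:9000" in another.
package main

import (
    "fmt"
    "io"
    "net"
    "os"
    "sort"
    "time"
)

const samples = 1000

func server(addr string) error {
    ln, err := net.Listen("tcp", addr)
    if err != nil {
        return err
    }
    for {
        conn, err := ln.Accept()
        if err != nil {
            return err
        }
        // Echo every byte back so the client can time a full round trip.
        go func(c net.Conn) { defer c.Close(); io.Copy(c, c) }(conn)
    }
}

func client(addr string) error {
    conn, err := net.Dial("tcp", addr)
    if err != nil {
        return err
    }
    defer conn.Close()
    buf := make([]byte, 1)
    rtts := make([]time.Duration, 0, samples)
    for i := 0; i < samples; i++ {
        start := time.Now()
        if _, err := conn.Write([]byte{'x'}); err != nil {
            return err
        }
        if _, err := io.ReadFull(conn, buf); err != nil {
            return err
        }
        rtts = append(rtts, time.Since(start))
    }
    sort.Slice(rtts, func(i, j int) bool { return rtts[i] < rtts[j] })
    fmt.Printf("p50=%v p99=%v\n", rtts[len(rtts)/2], rtts[len(rtts)*99/100])
    return nil
}

func main() {
    switch {
    case len(os.Args) == 2 && os.Args[1] == "server":
        if err := server(":9000"); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    case len(os.Args) == 3 && os.Args[1] == "client":
        if err := client(os.Args[2]); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    default:
        fmt.Fprintln(os.Stderr, "usage: latencyprobe server | latencyprobe client <addr>")
    }
}
```

Collect at least a few thousand samples per run; individual samples are dominated by scheduling noise, so compare medians and tail percentiles rather than single values.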

Cilium (eBPF)

Performance Characteristics

  • Latency: Excellent (~0.1ms pod-to-pod)
  • Throughput: Excellent (near line-rate)
  • CPU Usage: Low (kernel-level processing)
  • Memory Usage: Low (~10-20MB per node)
  • Pod Startup: Fast (milliseconds)
  • Scalability: High (thousands of pods per node)

Why It’s Fast

  • Kernel-Level Processing: eBPF programs run in kernel, avoiding user-space overhead
  • No Overlay: Direct routing without encapsulation
  • Efficient Policy Enforcement: Policy checks in kernel, not iptables chains
  • Connection Tracking: Optimized connection tracking in kernel

Trade-offs

  • Kernel Requirements: Requires Linux 4.19+ with eBPF support (a quick pre-flight check is sketched after this list)
  • Learning Curve: eBPF concepts needed for advanced troubleshooting
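
Because the kernel floor is usually the main adoption gate, a pre-flight check on every node is worth automating. The sketch below reads /proc/sys/kernel/osrelease and compares it against the 4.19 minimum referenced above; treat that threshold as an assumption and confirm the exact minimum for the Cilium version you plan to deploy.

```go
// kernelcheck.go - pre-flight kernel check before rolling out an eBPF CNI
// (illustrative; confirm the real minimum in your CNI's documentation).
package main

import (
    "fmt"
    "os"
    "strconv"
    "strings"
)

// Assumed minimum; 4.19 mirrors the requirement cited above.
const minMajor, minMinor = 4, 19

func main() {
    raw, err := os.ReadFile("/proc/sys/kernel/osrelease") // e.g. "5.15.0-91-generic"
    if err != nil {
        fmt.Fprintln(os.Stderr, "cannot read kernel version:", err)
        os.Exit(1)
    }
    release := strings.TrimSpace(string(raw))
    parts := strings.SplitN(release, ".", 3)
    if len(parts) < 2 {
        fmt.Fprintln(os.Stderr, "unexpected version string:", release)
        os.Exit(1)
    }
    major, err1 := strconv.Atoi(parts[0])
    minor, err2 := strconv.Atoi(parts[1])
    if err1 != nil || err2 != nil {
        fmt.Fprintln(os.Stderr, "unexpected version string:", release)
        os.Exit(1)
    }
    ok := major > minMajor || (major == minMajor && minor >= minMinor)
    fmt.Printf("kernel %s (minimum %d.%d for an eBPF datapath): supported=%v\n",
        release, minMajor, minMinor, ok)
    if !ok {
        os.Exit(1)
    }
}
```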

Calico (BGP/iptables)

Performance Characteristics

  • Latency: Good (~0.5ms pod-to-pod, iptables mode)
  • Throughput: Good (high throughput, iptables can bottleneck)
  • CPU Usage: Medium (iptables processing overhead)
  • Memory Usage: Medium (~50-100MB per node)
  • Pod Startup: Fast (milliseconds)
  • Scalability: Good (hundreds of pods per node)

Performance Modes

iptables Mode:

  • Lower latency but CPU overhead increases with policy count
  • iptables chains can become bottlenecks

eBPF Mode (Calico 3.16+):

  • Better performance, closer to Cilium
  • Still some overhead compared to native eBPF CNIs

Trade-offs

  • Policy Scaling: iptables performance degrades as policy and endpoint counts grow (see the rule-count sketch after this list)
  • eBPF Mode: Better performance but requires kernel support
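
A crude but useful proxy for iptables-mode overhead is the number of rules Felix has programmed on a node, since long chains are what slow the packet path as policies and endpoints grow. The sketch below shells out to iptables-save and counts rule lines; it assumes it runs with enough privileges on the node (or in a privileged pod) and that Calico's chains carry the usual cali- prefix.

```go
// rulecount.go - count iptables rules on a node and how many belong to
// Calico chains (illustrative; needs privileges to run iptables-save).
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "os/exec"
    "strings"
)

func main() {
    out, err := exec.Command("iptables-save").Output()
    if err != nil {
        fmt.Println("iptables-save failed:", err)
        return
    }
    total, calico := 0, 0
    scanner := bufio.NewScanner(bytes.NewReader(out))
    scanner.Buffer(make([]byte, 1<<20), 1<<20) // individual rule lines can be long
    for scanner.Scan() {
        line := scanner.Text()
        if !strings.HasPrefix(line, "-A ") { // count only rule entries, not chain headers
            continue
        }
        total++
        if strings.Contains(line, "cali-") { // Felix-programmed chains use a cali- prefix
            calico++
        }
    }
    fmt.Printf("iptables rules: %d total, %d referencing Calico chains\n", total, calico)
}
```

Tracking this count over time (and alongside latency measurements) shows whether policy growth, rather than traffic, is what is eating CPU on busy nodes.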

Antrea (OVS)

Performance Characteristics

  • Latency: Good (~0.3-0.5ms pod-to-pod)
  • Throughput: Good (OVS kernel datapath is efficient)
  • CPU Usage: Medium (OVS processing overhead)
  • Memory Usage: Medium (~100-200MB per node)
  • Pod Startup: Fast (milliseconds)
  • Scalability: Good (hundreds of pods per node)

Why It’s Fast

  • OVS Kernel Datapath: Fast-path processing in kernel
  • Flow Caching: OVS flow tables cache routing decisions
  • Efficient Switching: OVS optimized for virtual switching

Trade-offs

  • OVS Overhead: Some overhead compared to native kernel routing
  • Memory Usage: OVS components consume more memory
  • Complexity: OVS troubleshooting can be complex

AWS VPC CNI

Performance Characteristics

  • Latency: Excellent (~0.1ms pod-to-pod, no overlay)
  • Throughput: Excellent (native VPC performance)
  • CPU Usage: Low (minimal processing, uses AWS networking)
  • Memory Usage: Low (~20-50MB per node)
  • Pod Startup: Variable (fast when the warm IP/ENI pool has capacity; slower when new ENIs or secondary IPs must be attached via EC2 API calls)
  • Scalability: Limited (constrained by ENI and IP limits)

Why It’s Fast

  • No Overlay: Direct VPC networking, no encapsulation
  • AWS Hardware: Leverages AWS network hardware acceleration
  • Native Integration: Uses AWS networking primitives directly

Trade-offs

  • IP Address Limits: VPC subnet IP limits constrain scalability
  • ENI Limits: EC2 instance ENI limits cap pod density per node (see the calculation after this list)
  • AWS-Only: Works only in AWS environments
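
The density ceiling can be computed directly. In the default secondary-IP mode, EKS uses maxPods = ENIs × (IPv4 addresses per ENI − 1) + 2: each ENI's primary address is unavailable to pods, and the +2 is commonly explained as the host-network system pods that count against the limit without consuming VPC IPs. The sketch below applies that formula; the per-instance limits are illustrative values, so confirm them for your instance types, and note that prefix delegation or custom networking changes the math.

```go
// maxpods.go - default VPC CNI pod-density ceiling per instance type (illustrative).
package main

import "fmt"

// ENI and per-ENI IPv4 limits; values here are examples, confirm against
// AWS documentation for your instance types.
var limits = map[string]struct{ enis, ipsPerENI int }{
    "m5.large":  {3, 10},
    "m5.xlarge": {4, 15},
}

// maxPods mirrors the default EKS formula:
// ENIs * (IPv4 addresses per ENI - 1) + 2.
// The -1 is each ENI's primary address; the +2 covers host-network system pods.
func maxPods(enis, ipsPerENI int) int {
    return enis*(ipsPerENI-1) + 2
}

func main() {
    for name, l := range limits {
        fmt.Printf("%-10s max pods = %d\n", name, maxPods(l.enis, l.ipsPerENI))
    }
}
```

For m5.large this works out to 29 pods, which is why pod density, rather than CPU or memory, is often the binding constraint on smaller instance types.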

Flannel (VXLAN)

Performance Characteristics

  • Latency: Fair (~1-2ms pod-to-pod, overlay overhead)
  • Throughput: Fair (VXLAN encapsulation overhead)
  • CPU Usage: Medium (VXLAN encapsulation/decapsulation)
  • Memory Usage: Low (~20-50MB per node)
  • Pod Startup: Fast (milliseconds)
  • Scalability: Good (hundreds of pods per node)

Why It’s Slower

  • VXLAN Overlay: Encapsulation adds latency and CPU overhead (the header arithmetic is shown after this list)
  • User-Space Control Plane: The flanneld daemon manages subnet leases and routes in user space (the legacy UDP backend even forwards packets there)
  • Simple Implementation: Prioritizes simplicity over performance
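
The overlay cost is easy to put a number on: each tunneled packet carries an extra outer IP, UDP, and VXLAN header plus the inner Ethernet header, which is why Flannel's VXLAN interface typically drops the pod MTU from 1500 to 1450 and why a few percent of every full-size packet's wire bytes are pure encapsulation. A back-of-the-envelope sketch:

```go
// vxlan_overhead.go - back-of-the-envelope VXLAN encapsulation cost (illustrative).
package main

import "fmt"

const (
    outerIPv4  = 20 // outer IPv4 header
    outerUDP   = 8  // outer UDP header
    vxlan      = 8  // VXLAN header
    innerEther = 14 // inner Ethernet frame header carried inside the tunnel

    overhead = outerIPv4 + outerUDP + vxlan + innerEther // 50 bytes
    linkMTU  = 1500                                      // typical underlay MTU
)

func main() {
    podMTU := linkMTU - overhead // why Flannel's VXLAN interface defaults to 1450
    fmt.Printf("pod MTU: %d bytes\n", podMTU)
    // Fraction of each full-size packet's wire bytes spent on encapsulation.
    fmt.Printf("header overhead on full-size packets: %.1f%%\n",
        100*float64(overhead)/float64(linkMTU))
}
```

On a 1500-byte underlay this prints a 1450-byte pod MTU and roughly 3.3% header overhead, before counting the CPU spent on encapsulation and decapsulation.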

Trade-offs

  • Simplicity: Easiest to understand and operate
  • Performance: Overlay overhead limits performance
  • Features: Limited features compared to other CNIs

Performance Comparison Matrix

| Metric             | Cilium    | Calico (iptables) | Calico (eBPF) | Antrea | AWS VPC CNI | Flannel |
|--------------------|-----------|-------------------|---------------|--------|-------------|---------|
| Pod-to-Pod Latency | ~0.1ms    | ~0.5ms            | ~0.2ms        | ~0.3ms | ~0.1ms      | ~1-2ms  |
| Throughput         | Excellent | Good              | Excellent     | Good   | Excellent   | Fair    |
| CPU Usage          | Low       | Medium            | Low           | Medium | Low         | Medium  |
| Memory Usage       | Low       | Medium            | Medium        | Medium | Low         | Low     |
| Pod Startup        | Fast      | Fast              | Fast          | Fast   | Variable    | Fast    |
| Policy Performance | Excellent | Degrades          | Good          | Good   | N/A         | Limited |
| Scalability        | High      | Good              | High          | Good   | Limited     | Good    |

Benchmark Results (Typical Scenarios)

Small Cluster (< 100 pods)

  • Winner: Any CNI performs well
  • Recommendation: Choose based on features, not performance

Medium Cluster (100-1000 pods)

  • Winner: Cilium or Calico (eBPF mode)
  • Recommendation: eBPF-based CNIs show advantages

Large Cluster (1000+ pods)

  • Winner: Cilium
  • Recommendation: eBPF kernel-level processing scales best

High Policy Count (> 100 policies)

  • Winner: Cilium
  • Recommendation: eBPF policy enforcement doesn’t degrade like iptables

Low Latency Requirements (< 1ms)

  • Winner: Cilium or AWS VPC CNI
  • Recommendation: Avoid overlay networks (Flannel)

High Throughput Requirements

  • Winner: Cilium, AWS VPC CNI, or Calico (eBPF)
  • Recommendation: Avoid iptables-based policy enforcement
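
If raw throughput is the deciding factor, measure it in your own cluster rather than relying on published comparisons. The sketch below is a minimal bulk-transfer check between two pods (run the sink in one, the sender in another); the port, transfer size, and buffer size are arbitrary, and a real benchmark should use iperf3 or netperf with multiple parallel streams.

```go
// tputprobe.go - rough pod-to-pod throughput check (illustrative only).
// Run "tputprobe sink" in one pod and
// "tputprobe send <sink-pod-ip>:9100" in another.
package main

import (
    "fmt"
    "io"
    "net"
    "os"
    "time"
)

const totalBytes = 1 << 30 // send 1 GiB

func sink(addr string) error {
    ln, err := net.Listen("tcp", addr)
    if err != nil {
        return err
    }
    for {
        conn, err := ln.Accept()
        if err != nil {
            return err
        }
        // Discard everything; the sender does the timing.
        go func(c net.Conn) { defer c.Close(); io.Copy(io.Discard, c) }(conn)
    }
}

func send(addr string) error {
    conn, err := net.Dial("tcp", addr)
    if err != nil {
        return err
    }
    defer conn.Close()
    buf := make([]byte, 256<<10) // 256 KiB writes
    start := time.Now()
    for sent := 0; sent < totalBytes; sent += len(buf) {
        if _, err := conn.Write(buf); err != nil {
            return err
        }
    }
    secs := time.Since(start).Seconds()
    fmt.Printf("%.0f MiB/s\n", float64(totalBytes)/(1<<20)/secs)
    return nil
}

func main() {
    switch {
    case len(os.Args) == 2 && os.Args[1] == "sink":
        if err := sink(":9100"); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    case len(os.Args) == 3 && os.Args[1] == "send":
        if err := send(os.Args[2]); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    default:
        fmt.Fprintln(os.Stderr, "usage: tputprobe sink | tputprobe send <addr>")
    }
}
```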

Use Case Recommendations

Choose Cilium if:

  • You need the best performance (latency, throughput)
  • You have many network policies (> 50)
  • You’re running large clusters (1000+ pods)
  • Low latency is critical (< 1ms)

Choose Calico (eBPF mode) if:

  • You need good performance with Calico features
  • You have moderate policy counts
  • You want BGP integration
  • Kernel supports eBPF

Choose Antrea if:

  • You’re using VMware infrastructure
  • OVS performance meets your needs
  • You want OVS features
  • You have moderate performance requirements

Choose AWS VPC CNI if:

  • You’re running EKS exclusively
  • You need native AWS integration
  • IP address limits are acceptable
  • You want best AWS-native performance

Choose Flannel if:

  • Simplicity is more important than performance
  • You have low performance requirements
  • You want the simplest CNI
  • Overlay overhead is acceptable

Operational Considerations

  • Kernel Requirements: eBPF-based CNIs require newer kernels
  • Resource Planning: Consider CPU and memory overhead
  • Scalability Planning: Understand pod density limits
  • Policy Impact: Many policies can impact iptables-based CNIs
  • Monitoring: Monitor CNI performance metrics
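
Most CNIs expose Prometheus metrics from their node agents, and drops or datapath errors usually show up there before they surface as application latency. The sketch below simply dumps the metrics matching a prefix from an agent endpoint; the URL, port, and cilium_ prefix are placeholders to adapt to whichever CNI and metrics configuration you actually run.

```go
// scrape.go - quick look at a CNI agent's Prometheus metrics endpoint
// (illustrative; URL and metric prefix depend on your CNI and its config).
package main

import (
    "bufio"
    "fmt"
    "net/http"
    "os"
    "strings"
)

func main() {
    url := "http://127.0.0.1:9962/metrics" // placeholder; point at your agent's metrics address
    prefix := "cilium_"                    // placeholder metric prefix
    if len(os.Args) > 1 {
        url = os.Args[1]
    }
    if len(os.Args) > 2 {
        prefix = os.Args[2]
    }
    resp, err := http.Get(url)
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    defer resp.Body.Close()
    scanner := bufio.NewScanner(resp.Body)
    scanner.Buffer(make([]byte, 1<<20), 1<<20)
    for scanner.Scan() {
        line := scanner.Text()
        if strings.HasPrefix(line, prefix) {
            fmt.Println(line)
        }
    }
}
```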

Summary

Performance varies significantly across CNI plugins. Cilium provides the best overall performance with eBPF-based processing, making it ideal for high-performance, large-scale deployments. Calico in eBPF mode offers good performance with Calico’s feature set. AWS VPC CNI excels in AWS environments with native VPC performance. Antrea provides good performance with OVS features. Flannel prioritizes simplicity over performance.

The choice depends on your performance requirements: for the best performance at scale, Cilium is unmatched. For AWS-only environments, VPC CNI provides excellent native performance. For balanced performance and features, Calico (eBPF) or Antrea are solid choices. For simplicity-first deployments, Flannel remains viable.