CNI Performance Benchmark: Choosing the Right Plugin

Introduction

CNI plugin performance directly impacts application latency, cluster scalability, and resource efficiency. The choice between eBPF-based solutions, overlay networks, and cloud-native CNIs can mean the difference between sub-millisecond and multi-millisecond packet processing, or between supporting hundreds vs. thousands of pods per node.

This benchmark compares the performance characteristics of major CNI plugins: Cilium (eBPF), Calico (BGP/iptables), Antrea (OVS), AWS VPC CNI (native VPC), and Flannel (VXLAN overlay). Understanding these differences helps teams choose the right CNI based on performance requirements.


Performance Dimensions

We’ll compare CNIs across these dimensions (a minimal latency-probe sketch follows the list):

  1. Pod-to-Pod Latency: Inter-pod communication latency
  2. Throughput: Maximum bandwidth between pods
  3. CPU Usage: CPU overhead for network processing
  4. Memory Usage: Memory footprint of CNI components
  5. Pod Startup Time: Time to allocate network resources
  6. Scalability: Maximum pods per node and cluster size
  7. Network Policy Performance: Policy enforcement overhead
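
To make the latency dimension concrete, the sketch below is a minimal pod-to-pod round-trip probe: run it as a server in one pod and as a client in another, then compare the reported percentiles across CNIs. The port, sample count, and single-byte payload are arbitrary choices for illustration; for publishable numbers use a dedicated tool such as netperf or iperf3.

```go
// latencyprobe.go - rough pod-to-pod RTT probe (illustrative only).
// Run "latencyprobe server" in one pod and
// "latencyprobe client <server-pod-ip>:9000" in another.
package main

import (
    "fmt"
    "io"
    "net"
    "os"
    "sort"
    "time"
)

const samples = 1000

func server(addr string) error {
    ln, err := net.Listen("tcp", addr)
    if err != nil {
        return err
    }
    for {
        conn, err := ln.Accept()
        if err != nil {
            return err
        }
        // Echo every byte back so the client can time a full round trip.
        go func(c net.Conn) { defer c.Close(); io.Copy(c, c) }(conn)
    }
}

func client(addr string) error {
    conn, err := net.Dial("tcp", addr)
    if err != nil {
        return err
    }
    defer conn.Close()
    buf := make([]byte, 1)
    rtts := make([]time.Duration, 0, samples)
    for i := 0; i < samples; i++ {
        start := time.Now()
        if _, err := conn.Write([]byte{'x'}); err != nil {
            return err
        }
        if _, err := io.ReadFull(conn, buf); err != nil {
            return err
        }
        rtts = append(rtts, time.Since(start))
    }
    sort.Slice(rtts, func(i, j int) bool { return rtts[i] < rtts[j] })
    fmt.Printf("p50=%v p99=%v\n", rtts[len(rtts)/2], rtts[len(rtts)*99/100])
    return nil
}

func main() {
    switch {
    case len(os.Args) == 2 && os.Args[1] == "server":
        if err := server(":9000"); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    case len(os.Args) == 3 && os.Args[1] == "client":
        if err := client(os.Args[2]); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    default:
        fmt.Fprintln(os.Stderr, "usage: latencyprobe server | latencyprobe client <addr>")
    }
}
```

Collect at least a few thousand samples per run; individual samples are dominated by scheduling noise, so compare medians and tail percentiles rather than single values.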

Cilium (eBPF)

Performance Characteristics

  • Latency: Excellent (~0.1ms pod-to-pod)
  • Throughput: Excellent (near line-rate)
  • CPU Usage: Low (kernel-level processing)
  • Memory Usage: Low (~10-20MB per node)
  • Pod Startup: Fast (milliseconds)
  • Scalability: High (thousands of pods per node)

Why It’s Fast

  • Kernel-Level Processing: eBPF programs run in kernel, avoiding user-space overhead
  • No Overlay: Direct routing without encapsulation
  • Efficient Policy Enforcement: Policy checks in kernel, not iptables chains
  • Connection Tracking: Optimized connection tracking in kernel

Trade-offs

  • Kernel Requirements: Requires Linux 4.19+ with eBPF support (a quick pre-flight check is sketched after this list)
  • Learning Curve: eBPF concepts needed for advanced troubleshooting
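
Because the kernel floor is usually the main adoption gate, a pre-flight check on every node is worth automating. The sketch below reads /proc/sys/kernel/osrelease and compares it against the 4.19 minimum referenced above; treat that threshold as an assumption and confirm the exact minimum for the Cilium version you plan to deploy.

```go
// kernelcheck.go - pre-flight kernel check before rolling out an eBPF CNI
// (illustrative; confirm the real minimum in your CNI's documentation).
package main

import (
    "fmt"
    "os"
    "strconv"
    "strings"
)

// Assumed minimum; 4.19 mirrors the requirement cited above.
const minMajor, minMinor = 4, 19

func main() {
    raw, err := os.ReadFile("/proc/sys/kernel/osrelease") // e.g. "5.15.0-91-generic"
    if err != nil {
        fmt.Fprintln(os.Stderr, "cannot read kernel version:", err)
        os.Exit(1)
    }
    release := strings.TrimSpace(string(raw))
    parts := strings.SplitN(release, ".", 3)
    if len(parts) < 2 {
        fmt.Fprintln(os.Stderr, "unexpected version string:", release)
        os.Exit(1)
    }
    major, err1 := strconv.Atoi(parts[0])
    minor, err2 := strconv.Atoi(parts[1])
    if err1 != nil || err2 != nil {
        fmt.Fprintln(os.Stderr, "unexpected version string:", release)
        os.Exit(1)
    }
    ok := major > minMajor || (major == minMajor && minor >= minMinor)
    fmt.Printf("kernel %s (minimum %d.%d for an eBPF datapath): supported=%v\n",
        release, minMajor, minMinor, ok)
    if !ok {
        os.Exit(1)
    }
}
```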

Calico (BGP/iptables)

Performance Characteristics

  • Latency: Good (~0.5ms pod-to-pod, iptables mode)
  • Throughput: Good (high throughput, iptables can bottleneck)
  • CPU Usage: Medium (iptables processing overhead)
  • Memory Usage: Medium (~50-100MB per node)
  • Pod Startup: Fast (milliseconds)
  • Scalability: Good (hundreds of pods per node)

Performance Modes

iptables Mode:

  • Lower latency but CPU overhead increases with policy count
  • iptables chains can become bottlenecks

eBPF Mode (Calico 3.16+):

  • Better performance, closer to Cilium
  • Still some overhead compared to native eBPF CNIs

Trade-offs

  • Policy Scaling: iptables performance degrades as policy and endpoint counts grow (see the rule-count sketch after this list)
  • eBPF Mode: Better performance but requires kernel support
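
A crude but useful proxy for iptables-mode overhead is the number of rules Felix has programmed on a node, since long chains are what slow the packet path as policies and endpoints grow. The sketch below shells out to iptables-save and counts rule lines; it assumes it runs with enough privileges on the node (or in a privileged pod) and that Calico's chains carry the usual cali- prefix.

```go
// rulecount.go - count iptables rules on a node and how many belong to
// Calico chains (illustrative; needs privileges to run iptables-save).
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "os/exec"
    "strings"
)

func main() {
    out, err := exec.Command("iptables-save").Output()
    if err != nil {
        fmt.Println("iptables-save failed:", err)
        return
    }
    total, calico := 0, 0
    scanner := bufio.NewScanner(bytes.NewReader(out))
    scanner.Buffer(make([]byte, 1<<20), 1<<20) // individual rule lines can be long
    for scanner.Scan() {
        line := scanner.Text()
        if !strings.HasPrefix(line, "-A ") { // count only rule entries, not chain headers
            continue
        }
        total++
        if strings.Contains(line, "cali-") { // Felix-programmed chains use a cali- prefix
            calico++
        }
    }
    fmt.Printf("iptables rules: %d total, %d referencing Calico chains\n", total, calico)
}
```

Tracking this count over time (and alongside latency measurements) shows whether policy growth, rather than traffic, is what is eating CPU on busy nodes.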

Antrea (OVS)

Performance Characteristics

  • Latency: Good (~0.3-0.5ms pod-to-pod)
  • Throughput: Good (OVS kernel datapath is efficient)
  • CPU Usage: Medium (OVS processing overhead)
  • Memory Usage: Medium (~100-200MB per node)
  • Pod Startup: Fast (milliseconds)
  • Scalability: Good (hundreds of pods per node)

Why It’s Fast

  • OVS Kernel Datapath: Fast-path processing in kernel
  • Flow Caching: OVS flow tables cache routing decisions
  • Efficient Switching: OVS optimized for virtual switching

Trade-offs

  • OVS Overhead: Some overhead compared to native kernel routing
  • Memory Usage: OVS components consume more memory
  • Complexity: OVS troubleshooting can be complex

AWS VPC CNI

Performance Characteristics

  • Latency: Excellent (~0.1ms pod-to-pod, no overlay)
  • Throughput: Excellent (native VPC performance)
  • CPU Usage: Low (minimal processing, uses AWS networking)
  • Memory Usage: Low (~20-50MB per node)
  • Pod Startup: Variable (fast when the warm IP/ENI pool has capacity; slower when new ENIs or secondary IPs must be attached via EC2 API calls)
  • Scalability: Limited (constrained by ENI and IP limits)

Why It’s Fast

  • No Overlay: Direct VPC networking, no encapsulation
  • AWS Hardware: Leverages AWS network hardware acceleration
  • Native Integration: Uses AWS networking primitives directly

Trade-offs

  • IP Address Limits: VPC subnet IP limits constrain scalability
  • ENI Limits: EC2 instance ENI limits cap pod density per node (see the calculation after this list)
  • AWS-Only: Works only in AWS environments
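
The density ceiling can be computed directly. In the default secondary-IP mode, EKS uses maxPods = ENIs × (IPv4 addresses per ENI − 1) + 2: each ENI's primary address is unavailable to pods, and the +2 is commonly explained as the host-network system pods that count against the limit without consuming VPC IPs. The sketch below applies that formula; the per-instance limits are illustrative values, so confirm them for your instance types, and note that prefix delegation or custom networking changes the math.

```go
// maxpods.go - default VPC CNI pod-density ceiling per instance type (illustrative).
package main

import "fmt"

// ENI and per-ENI IPv4 limits; values here are examples, confirm against
// AWS documentation for your instance types.
var limits = map[string]struct{ enis, ipsPerENI int }{
    "m5.large":  {3, 10},
    "m5.xlarge": {4, 15},
}

// maxPods mirrors the default EKS formula:
// ENIs * (IPv4 addresses per ENI - 1) + 2.
// The -1 is each ENI's primary address; the +2 covers host-network system pods.
func maxPods(enis, ipsPerENI int) int {
    return enis*(ipsPerENI-1) + 2
}

func main() {
    for name, l := range limits {
        fmt.Printf("%-10s max pods = %d\n", name, maxPods(l.enis, l.ipsPerENI))
    }
}
```

For m5.large this works out to 29 pods, which is why pod density, rather than CPU or memory, is often the binding constraint on smaller instance types.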

Flannel (VXLAN)

Performance Characteristics

  • Latency: Fair (~1-2ms pod-to-pod, overlay overhead)
  • Throughput: Fair (VXLAN encapsulation overhead)
  • CPU Usage: Medium (VXLAN encapsulation/decapsulation)
  • Memory Usage: Low (~20-50MB per node)
  • Pod Startup: Fast (milliseconds)
  • Scalability: Good (hundreds of pods per node)

Why It’s Slower

  • VXLAN Overlay: Encapsulation adds latency and CPU overhead (the header arithmetic is shown after this list)
  • User-Space Control Plane: The flanneld daemon manages subnet leases and routes in user space (the legacy UDP backend even forwards packets there)
  • Simple Implementation: Prioritizes simplicity over performance
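
The overlay cost is easy to put a number on: each tunneled packet carries an extra outer IP, UDP, and VXLAN header plus the inner Ethernet header, which is why Flannel's VXLAN interface typically drops the pod MTU from 1500 to 1450 and why a few percent of every full-size packet's wire bytes are pure encapsulation. A back-of-the-envelope sketch:

```go
// vxlan_overhead.go - back-of-the-envelope VXLAN encapsulation cost (illustrative).
package main

import "fmt"

const (
    outerIPv4  = 20 // outer IPv4 header
    outerUDP   = 8  // outer UDP header
    vxlan      = 8  // VXLAN header
    innerEther = 14 // inner Ethernet frame header carried inside the tunnel

    overhead = outerIPv4 + outerUDP + vxlan + innerEther // 50 bytes
    linkMTU  = 1500                                      // typical underlay MTU
)

func main() {
    podMTU := linkMTU - overhead // why Flannel's VXLAN interface defaults to 1450
    fmt.Printf("pod MTU: %d bytes\n", podMTU)
    // Fraction of each full-size packet's wire bytes spent on encapsulation.
    fmt.Printf("header overhead on full-size packets: %.1f%%\n",
        100*float64(overhead)/float64(linkMTU))
}
```

On a 1500-byte underlay this prints a 1450-byte pod MTU and roughly 3.3% header overhead, before counting the CPU spent on encapsulation and decapsulation.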

Trade-offs

  • Simplicity: Easiest to understand and operate
  • Performance: Overlay overhead limits performance
  • Features: Limited features compared to other CNIs

Performance Comparison Matrix

| Metric             | Cilium    | Calico (iptables) | Calico (eBPF) | Antrea | AWS VPC CNI | Flannel |
|--------------------|-----------|-------------------|---------------|--------|-------------|---------|
| Pod-to-Pod Latency | ~0.1ms    | ~0.5ms            | ~0.2ms        | ~0.3ms | ~0.1ms      | ~1-2ms  |
| Throughput         | Excellent | Good              | Excellent     | Good   | Excellent   | Fair    |
| CPU Usage          | Low       | Medium            | Low           | Medium | Low         | Medium  |
| Memory Usage       | Low       | Medium            | Medium        | Medium | Low         | Low     |
| Pod Startup        | Fast      | Fast              | Fast          | Fast   | Variable    | Fast    |
| Policy Performance | Excellent | Degrades          | Good          | Good   | N/A         | Limited |
| Scalability        | High      | Good              | High          | Good   | Limited     | Good    |

Benchmark Results (Typical Scenarios)

Small Cluster (< 100 pods)

  • Winner: Any CNI performs well
  • Recommendation: Choose based on features, not performance

Medium Cluster (100-1000 pods)

  • Winner: Cilium or Calico (eBPF mode)
  • Recommendation: eBPF-based CNIs show advantages

Large Cluster (1000+ pods)

  • Winner: Cilium
  • Recommendation: eBPF kernel-level processing scales best

High Policy Count (> 100 policies)

  • Winner: Cilium
  • Recommendation: eBPF policy enforcement doesn’t degrade like iptables

Low Latency Requirements (< 1ms)

  • Winner: Cilium or AWS VPC CNI
  • Recommendation: Avoid overlay networks (Flannel)

High Throughput Requirements

  • Winner: Cilium, AWS VPC CNI, or Calico (eBPF)
  • Recommendation: Avoid iptables-based policy enforcement
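
If raw throughput is the deciding factor, measure it in your own cluster rather than relying on published comparisons. The sketch below is a minimal bulk-transfer check between two pods (run the sink in one, the sender in another); the port, transfer size, and buffer size are arbitrary, and a real benchmark should use iperf3 or netperf with multiple parallel streams.

```go
// tputprobe.go - rough pod-to-pod throughput check (illustrative only).
// Run "tputprobe sink" in one pod and
// "tputprobe send <sink-pod-ip>:9100" in another.
package main

import (
    "fmt"
    "io"
    "net"
    "os"
    "time"
)

const totalBytes = 1 << 30 // send 1 GiB

func sink(addr string) error {
    ln, err := net.Listen("tcp", addr)
    if err != nil {
        return err
    }
    for {
        conn, err := ln.Accept()
        if err != nil {
            return err
        }
        // Discard everything; the sender does the timing.
        go func(c net.Conn) { defer c.Close(); io.Copy(io.Discard, c) }(conn)
    }
}

func send(addr string) error {
    conn, err := net.Dial("tcp", addr)
    if err != nil {
        return err
    }
    defer conn.Close()
    buf := make([]byte, 256<<10) // 256 KiB writes
    start := time.Now()
    for sent := 0; sent < totalBytes; sent += len(buf) {
        if _, err := conn.Write(buf); err != nil {
            return err
        }
    }
    secs := time.Since(start).Seconds()
    fmt.Printf("%.0f MiB/s\n", float64(totalBytes)/(1<<20)/secs)
    return nil
}

func main() {
    switch {
    case len(os.Args) == 2 && os.Args[1] == "sink":
        if err := sink(":9100"); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    case len(os.Args) == 3 && os.Args[1] == "send":
        if err := send(os.Args[2]); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    default:
        fmt.Fprintln(os.Stderr, "usage: tputprobe sink | tputprobe send <addr>")
    }
}
```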

Use Case Recommendations

Choose Cilium if:

  • You need the best performance (latency, throughput)
  • You have many network policies (> 50)
  • You’re running large clusters (1000+ pods)
  • Low latency is critical (< 1ms)

Choose Calico (eBPF mode) if:

  • You need good performance with Calico features
  • You have moderate policy counts
  • You want BGP integration
  • Kernel supports eBPF

Choose Antrea if:

  • You’re using VMware infrastructure
  • OVS performance meets your needs
  • You want OVS features
  • You have moderate performance requirements

Choose AWS VPC CNI if:

  • You’re running EKS exclusively
  • You need native AWS integration
  • IP address limits are acceptable
  • You want best AWS-native performance

Choose Flannel if:

  • Simplicity is more important than performance
  • You have low performance requirements
  • You want the simplest CNI
  • Overlay overhead is acceptable

Operational Considerations

  • Kernel Requirements: eBPF-based CNIs require newer kernels
  • Resource Planning: Consider CPU and memory overhead
  • Scalability Planning: Understand pod density limits
  • Policy Impact: Many policies can impact iptables-based CNIs
  • Monitoring: Monitor CNI performance metrics
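
Most CNIs expose Prometheus metrics from their node agents, and drops or datapath errors usually show up there before they surface as application latency. The sketch below simply dumps the metrics matching a prefix from an agent endpoint; the URL, port, and cilium_ prefix are placeholders to adapt to whichever CNI and metrics configuration you actually run.

```go
// scrape.go - quick look at a CNI agent's Prometheus metrics endpoint
// (illustrative; URL and metric prefix depend on your CNI and its config).
package main

import (
    "bufio"
    "fmt"
    "net/http"
    "os"
    "strings"
)

func main() {
    url := "http://127.0.0.1:9962/metrics" // placeholder; point at your agent's metrics address
    prefix := "cilium_"                    // placeholder metric prefix
    if len(os.Args) > 1 {
        url = os.Args[1]
    }
    if len(os.Args) > 2 {
        prefix = os.Args[2]
    }
    resp, err := http.Get(url)
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    defer resp.Body.Close()
    scanner := bufio.NewScanner(resp.Body)
    scanner.Buffer(make([]byte, 1<<20), 1<<20)
    for scanner.Scan() {
        line := scanner.Text()
        if strings.HasPrefix(line, prefix) {
            fmt.Println(line)
        }
    }
}
```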

Summary

Performance varies significantly across CNI plugins. Cilium provides the best overall performance with eBPF-based processing, making it ideal for high-performance, large-scale deployments. Calico in eBPF mode offers good performance with Calico’s feature set. AWS VPC CNI excels in AWS environments with native VPC performance. Antrea provides good performance with OVS features. Flannel prioritizes simplicity over performance.

The choice depends on your performance requirements: for the best performance at scale, Cilium is unmatched. For AWS-only environments, VPC CNI provides excellent native performance. For balanced performance and features, Calico (eBPF) or Antrea are solid choices. For simplicity-first deployments, Flannel remains viable.