Cilium Cluster Mesh: eBPF-Based Multi-Cluster Networking

Introduction
On April 25, 2023, Cilium released Cluster Mesh, a multi-cluster networking solution that enables seamless connectivity between Kubernetes clusters using eBPF. Unlike VPN-based approaches, Cilium Cluster Mesh provides transparent service communication and policy enforcement across cluster boundaries, making multiple clusters feel like a single logical cluster.
As organizations adopt multi-cluster architectures for high availability, geographic distribution, or workload isolation, the challenge of connecting clusters becomes critical. Cilium Cluster Mesh addresses this by extending Cilium’s eBPF-based networking across cluster boundaries, providing the same performance and security guarantees as single-cluster deployments.
Why Cluster Mesh?
- Transparent Connectivity: Services in different clusters communicate as if they’re in the same cluster.
- eBPF Performance: Kernel-level networking provides high performance across clusters.
- Policy Enforcement: Network policies and service mesh features work across cluster boundaries.
- No VPN Overhead: Direct cluster-to-cluster connectivity without VPN tunnels.
Core Architecture
- Cluster Identity: Each cluster has a unique identity for policy and routing.
- Service Discovery: Automatic service discovery across clusters using Kubernetes Services.
- eBPF Programs: Kernel-level programs handle cross-cluster packet forwarding.
- Cluster Peering: Secure peering between clusters using mutual TLS.
- Global Services: Services can be exposed globally across all clusters.
Getting Started
Enable Cluster Mesh in Cilium:
helm install cilium cilium/cilium --version 1.13.0 \
  --namespace kube-system \
  --set cluster.name=cluster1 \
  --set cluster.id=1 \
  --set clusterMesh.enabled=true
Configure cluster peering:
cilium clustermesh connect --context cluster1 --destination-context cluster2
Verify connectivity:
cilium clustermesh status
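Each cluster joining the mesh needs a unique cluster.name and a unique cluster.id (an integer in the 1–255 range). A minimal Helm values sketch for the second cluster might look like this (the file name and values are illustrative):

```yaml
# values-cluster2.yaml — illustrative; cluster.id must be unique across the mesh
cluster:
  name: cluster2
  id: 2
clusterMesh:
  enabled: true
```

Install with helm install cilium cilium/cilium --version 1.13.0 --namespace kube-system -f values-cluster2.yaml before running the connect command above.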
Key Features
- Transparent Service Communication: Services in different clusters communicate using standard Kubernetes Service DNS.
- Global Services: Expose services across all clusters with automatic load balancing.
- Cross-Cluster Policies: Network policies can reference services in other clusters.
- Service Mesh Integration: Cilium service mesh features work across clusters.
- High Performance: eBPF-based forwarding provides low latency across clusters.
Use Cases
High Availability
Deploy applications across multiple clusters for redundancy and failover.
Geographic Distribution
Distribute workloads across regions while maintaining service connectivity.
Workload Isolation
Isolate workloads in separate clusters while enabling controlled communication.
Hybrid Cloud
Connect on-premises clusters with cloud clusters for hybrid deployments.
Comparison with Alternatives
| Approach | Cilium Cluster Mesh | Submariner | VPN |
|---|---|---|---|
| Performance | High (eBPF) | Medium | Low |
| Transparency | Full | Partial | Limited |
| Policy Support | Native | Limited | None |
| Complexity | Medium | Low | Low |
| Overhead | Low | Medium | High |
Service Discovery
Cluster Mesh extends Kubernetes Service discovery across clusters:
apiVersion: v1
kind: Service
metadata:
  name: global-service
  annotations:
    io.cilium/global-service: "true"
spec:
  selector:
    app: my-app
  ports:
  - port: 80
Services annotated with io.cilium/global-service: "true" are available across all clusters in the mesh.
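Global services are matched by name and namespace, so each participating cluster defines a Service with the same identity. A cluster can also consume remote backends without contributing its own; a sketch using the io.cilium/shared-service annotation (the same older io.cilium/* annotation convention as above — newer Cilium releases also accept service.cilium.io/* equivalents):

```yaml
# Sketch: same name/namespace as in the other clusters, but this cluster's
# endpoints are not shared into the mesh — pods here use remote backends only.
apiVersion: v1
kind: Service
metadata:
  name: global-service
  annotations:
    io.cilium/global-service: "true"
    io.cilium/shared-service: "false"
spec:
  selector:
    app: my-app
  ports:
  - port: 80
```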
Network Policies
Network policies can reference endpoints in other clusters. Standard Kubernetes NetworkPolicy cannot express cluster boundaries, so cross-cluster rules use Cilium's CiliumNetworkPolicy, which matches a peer's cluster via the io.cilium.k8s.policy.cluster label:
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-cross-cluster
spec:
  endpointSelector:
    matchLabels:
      app: frontend
  egress:
  - toEndpoints:
    - matchLabels:
        app: backend
        io.cilium.k8s.policy.cluster: cluster2
This allows frontend pods to send traffic only to backend pods running in cluster2.
Operational Considerations
- Cluster Peering: Secure peering requires proper network connectivity between clusters.
- Latency: Cross-cluster communication adds network latency; consider geographic proximity.
- Bandwidth: Inter-cluster traffic consumes bandwidth; plan for capacity.
- Troubleshooting: Debugging cross-cluster issues requires understanding both clusters.
Common Patterns
- Active-Active: Run services in multiple clusters with automatic load balancing.
- Active-Passive: Primary cluster with standby cluster for failover.
- Regional Distribution: Deploy services in different regions with global service discovery.
- Workload Isolation: Isolate workloads while enabling controlled cross-cluster communication.
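The active-passive pattern above can be sketched with Cilium's service-affinity annotation, which prefers local endpoints and fails over to another cluster's backends when no local ones are healthy (annotation shown in the older io.cilium/* form; newer releases use service.cilium.io/affinity):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: global-service
  annotations:
    io.cilium/global-service: "true"
    io.cilium/service-affinity: "local"  # prefer local backends, fail over to remote
spec:
  selector:
    app: my-app
  ports:
  - port: 80
```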
Limitations
- Cilium Requirement: Every cluster in the mesh must run Cilium as its CNI.
- Network Requirements: Nodes in all clusters need direct IP reachability, and PodCIDR ranges must not overlap between clusters.
- Latency: Cross-cluster communication adds latency compared to intra-cluster.
- Complexity: Multi-cluster setups are more complex than single-cluster.
Looking Ahead
Cilium Cluster Mesh continues to evolve, with:
- Performance Improvements: Continued eBPF optimizations for cross-cluster traffic.
- Policy Enhancements: More sophisticated cross-cluster policy capabilities.
- Service Mesh Integration: Enhanced service mesh features across clusters.
- Ecosystem Growth: More tools and integrations supporting Cluster Mesh.
Summary
| Aspect | Details |
|---|---|
| Release Date | April 25, 2023 |
| Key Innovations | eBPF-based multi-cluster networking, transparent service communication, cross-cluster policies |
| Significance | Demonstrated that multi-cluster networking could achieve single-cluster performance and transparency using eBPF |
Cilium Cluster Mesh proved that multi-cluster networking didn’t have to sacrifice performance or transparency. By extending eBPF-based networking across cluster boundaries, it provided seamless service communication and policy enforcement that made multiple clusters feel like a single logical cluster, setting a new standard for multi-cluster architectures.