Cilium Cluster Mesh: eBPF-Based Multi-Cluster Networking

Introduction

On April 25, 2023, Cilium released Cluster Mesh, a multi-cluster networking solution that enables seamless connectivity between Kubernetes clusters using eBPF. Unlike VPN-based approaches, Cilium Cluster Mesh provides transparent service communication and policy enforcement across cluster boundaries, making multiple clusters feel like a single logical cluster.

As organizations adopt multi-cluster architectures for high availability, geographic distribution, or workload isolation, the challenge of connecting clusters becomes critical. Cilium Cluster Mesh addresses this by extending Cilium’s eBPF-based networking across cluster boundaries, providing the same performance and security guarantees as single-cluster deployments.


Why Cluster Mesh?

  • Transparent Connectivity: Services in different clusters communicate as if they’re in the same cluster.
  • eBPF Performance: Kernel-level networking provides high performance across clusters.
  • Policy Enforcement: Network policies and service mesh features work across cluster boundaries.
  • No VPN Overhead: Direct cluster-to-cluster connectivity without VPN tunnels.

Core Architecture

  • Cluster Identity: Each cluster has a unique identity for policy and routing.
  • Service Discovery: Automatic service discovery across clusters using Kubernetes Services.
  • eBPF Programs: Kernel-level programs handle cross-cluster packet forwarding.
  • Cluster Peering: Secure peering between clusters using mutual TLS.
  • Global Services: Services can be exposed globally across all clusters.

Getting Started

Enable Cluster Mesh in Cilium:

helm install cilium cilium/cilium --version 1.13.0 \
  --namespace kube-system \
  --set cluster.name=cluster1 \
  --set cluster.id=1 \
  --set clusterMesh.enabled=true
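A mesh needs at least two clusters, each with a unique, non-zero cluster ID and a distinct cluster name. A minimal sketch of installing Cilium into the second cluster, using the same flags as above ("cluster2" is a hypothetical kubectl context name):

```shell
# Install Cilium in the second cluster with a distinct cluster name and ID.
# Cluster IDs must be unique across the mesh and non-zero.
helm install cilium cilium/cilium --version 1.13.0 \
  --kube-context cluster2 \
  --namespace kube-system \
  --set cluster.name=cluster2 \
  --set cluster.id=2 \
  --set clusterMesh.enabled=true
```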

Configure cluster peering:

cilium clustermesh connect --context cluster1 --destination-context cluster2

Verify connectivity:

cilium clustermesh status
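Beyond the status check, the cilium CLI can run an end-to-end connectivity test that spans both clusters. A sketch, assuming the cluster1/cluster2 contexts from the peering step and flag names as in recent cilium-cli releases:

```shell
# Run end-to-end datapath checks across the mesh; --multi-cluster points
# the test at the second cluster's kubectl context.
cilium connectivity test --context cluster1 --multi-cluster cluster2
```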

Key Features

  1. Transparent Service Communication: Services in different clusters communicate using standard Kubernetes Service DNS.
  2. Global Services: Expose services across all clusters with automatic load balancing.
  3. Cross-Cluster Policies: Network policies can reference services in other clusters.
  4. Service Mesh Integration: Cilium service mesh features work across clusters.
  5. High Performance: eBPF-based forwarding provides low latency across clusters.
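The transparency in item 1 is visible from any pod: a global service resolves through ordinary in-cluster DNS, and eBPF load-balances requests across backends in every cluster. A hedged sketch (the global-service name and default namespace are assumptions carried over from the Service example later in this article):

```shell
# From a pod in either cluster, the global service resolves via the
# standard in-cluster DNS name; the selected backend may be local or remote.
kubectl --context cluster1 run curl-test --rm -it --restart=Never \
  --image=curlimages/curl -- \
  curl -s http://global-service.default.svc.cluster.local:80
```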

Use Cases

High Availability

Deploy applications across multiple clusters for redundancy and failover.

Geographic Distribution

Distribute workloads across regions while maintaining service connectivity.

Workload Isolation

Isolate workloads in separate clusters while enabling controlled communication.

Hybrid Cloud

Connect on-premises clusters with cloud clusters for hybrid deployments.


Comparison with Alternatives

  Approach        Cilium Cluster Mesh   Submariner   VPN
  Performance     High (eBPF)           Medium       Low
  Transparency    Full                  Partial      Limited
  Policy Support  Native                Limited      None
  Complexity      Medium                Low          Low
  Overhead        Low                   Medium       High

Service Discovery

Cluster Mesh extends Kubernetes Service discovery across clusters:

apiVersion: v1
kind: Service
metadata:
  name: global-service
  annotations:
    io.cilium/global-service: "true"
spec:
  selector:
    app: my-app
  ports:
  - port: 80

Services annotated with io.cilium/global-service: "true" are available across all clusters in the mesh.
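A related annotation controls whether a cluster contributes backends to a global service: setting io.cilium/shared-service to "false" lets a cluster consume the global service without exporting its own endpoints to the mesh. A sketch, extending the Service above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: global-service
  annotations:
    io.cilium/global-service: "true"
    # Consume the global service but do not export this
    # cluster's backends to the other clusters.
    io.cilium/shared-service: "false"
spec:
  selector:
    app: my-app
  ports:
  - port: 80
```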


Network Policies

Network policies can select endpoints in other clusters. Plain Kubernetes NetworkPolicy has no notion of clusters, so cross-cluster rules are written as a CiliumNetworkPolicy using the reserved io.cilium.k8s.policy.cluster label:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-cross-cluster
spec:
  endpointSelector:
    matchLabels:
      app: frontend
  egress:
  - toEndpoints:
    - matchLabels:
        app: backend
        io.cilium.k8s.policy.cluster: cluster2

Operational Considerations

  • Cluster Peering: Secure peering requires proper network connectivity between clusters.
  • Latency: Cross-cluster communication adds network latency; consider geographic proximity.
  • Bandwidth: Inter-cluster traffic consumes bandwidth; plan for capacity.
  • Troubleshooting: Debugging cross-cluster issues requires understanding both clusters.
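For the troubleshooting point above, a useful first step is comparing mesh state from both sides, since a peering can appear healthy from one cluster but not the other. A sketch using the contexts from the peering step earlier:

```shell
# Check mesh health from each cluster's perspective; both sides should
# report the other cluster as connected and synchronized.
cilium clustermesh status --context cluster1
cilium clustermesh status --context cluster2
```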

Common Patterns

  • Active-Active: Run services in multiple clusters with automatic load balancing.
  • Active-Passive: Primary cluster with standby cluster for failover.
  • Regional Distribution: Deploy services in different regions with global service discovery.
  • Workload Isolation: Isolate workloads while enabling controlled cross-cluster communication.
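The active-passive pattern above can be expressed with global service affinity: annotating a global service with io.cilium/service-affinity: "local" prefers backends in the local cluster and fails over to remote clusters only when no local backends are healthy. This annotation is available in newer Cilium releases; treat the manifest as a sketch:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: global-service
  annotations:
    io.cilium/global-service: "true"
    # Prefer backends in the local cluster; fail over to remote
    # clusters only when no local backends are healthy.
    io.cilium/service-affinity: "local"
spec:
  selector:
    app: my-app
  ports:
  - port: 80
```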

Limitations

  • Cilium Requirement: Cluster Mesh only works with Cilium CNI.
  • Network Requirements: Requires network connectivity between clusters.
  • Latency: Cross-cluster communication adds latency compared to intra-cluster.
  • Complexity: Multi-cluster setups are more complex than single-cluster.

Looking Ahead

Cilium Cluster Mesh continues to evolve with:

  • Performance Improvements: Continued eBPF optimizations for cross-cluster traffic.
  • Policy Enhancements: More sophisticated cross-cluster policy capabilities.
  • Service Mesh Integration: Enhanced service mesh features across clusters.
  • Ecosystem Growth: More tools and integrations supporting Cluster Mesh.

Summary

  Aspect           Details
  Release Date     April 25, 2023
  Key Innovations  eBPF-based multi-cluster networking, transparent service communication, cross-cluster policies
  Significance     Demonstrated that multi-cluster networking could achieve single-cluster performance and transparency using eBPF

Cilium Cluster Mesh proved that multi-cluster networking didn’t have to sacrifice performance or transparency. By extending eBPF-based networking across cluster boundaries, it provided seamless service communication and policy enforcement that made multiple clusters feel like a single logical cluster, setting a new standard for multi-cluster architectures.