Chaos Mesh 1.0: Chaos Engineering Platform

Introduction

Most outages aren’t triggered by a single catastrophic bug. They’re triggered by ordinary failures (a node reboot, a flaky network link, a disk that starts timing out) happening at the worst possible time. Chaos engineering is a disciplined way to rehearse those failures before production forces them on you.

Chaos Mesh 1.0, released on September 20, 2020, is a big step toward making that practice repeatable on Kubernetes: a single platform for injecting faults, running controlled experiments, and learning which assumptions in your systems are actually true.


How to start safely

  • Keep the blast radius small: select a single namespace and a narrow label selector before you ever target shared dependencies.
  • Treat experiments like deployments: schedule them, review them, and add clear rollback steps (even for “simple” pod-kill tests).
  • State a measurable hypothesis: define what “healthy” means (latency, error rate, SLO) before the experiment starts, so you can tell resilience from luck.
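
The scoping advice above can be sketched as a manifest. This is a minimal example, assuming a hypothetical `staging` namespace and an `app: checkout` label; neither name comes from Chaos Mesh itself. It targets one pod, makes it unavailable for 30 seconds, and repeats every 10 minutes.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: small-blast-radius-example   # hypothetical name
spec:
  action: pod-failure     # pod is unavailable for `duration`, then recovers
  mode: one               # affect a single matching pod, not all of them
  selector:
    namespaces:
      - staging           # assumption: a non-production namespace
    labelSelectors:
      app: checkout       # assumption: label of the one service under test
  duration: "30s"         # bounded fault window
  scheduler:
    cron: "@every 10m"    # repeat the fault every ten minutes
```

Once the single-pod run behaves as expected, the experiment can be widened by changing `mode` (for example to `fixed-percent` with a `value`), rather than by loosening the selector.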

Fault Injection Capabilities

  • Pod chaos kills pods or containers, or makes a pod unavailable for a set period, to test how the application recovers.
  • Network chaos simulates network partitions, latency, packet loss, duplication, and corruption.
  • I/O chaos injects file system faults such as delayed or failing reads and writes.
  • Time chaos shifts the clock seen by selected containers to test time-dependent behavior.
  • Kernel chaos injects kernel-level faults for advanced testing scenarios.
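
As an illustration of two of the fault types listed above, the following sketch defines a network delay and a clock shift. The label `app: my-app` and the container name `my-container` are placeholders, and the field names follow the `v1alpha1` API as documented for the 1.x releases; it is worth checking them against the CRDs installed in your cluster.

```yaml
# Add 10ms of latency to one matching pod's traffic for 30s, every 60s.
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-delay-example
spec:
  action: delay
  mode: one
  selector:
    namespaces:
      - default
    labelSelectors:
      app: my-app          # placeholder label
  delay:
    latency: "10ms"
    correlation: "100"
    jitter: "0ms"
  duration: "30s"
  scheduler:
    cron: "@every 60s"
---
# Shift CLOCK_REALTIME back by 10 minutes inside one container.
apiVersion: chaos-mesh.org/v1alpha1
kind: TimeChaos
metadata:
  name: time-shift-example
spec:
  mode: one
  selector:
    namespaces:
      - default
    labelSelectors:
      app: my-app          # placeholder label
  timeOffset: "-10m"
  clockIds:
    - CLOCK_REALTIME
  containerNames:
    - my-container         # placeholder container name
  duration: "30s"
  scheduler:
    cron: "@every 2m"
```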

Experiment Management

  1. The web UI (Chaos Dashboard) provides an interface for creating and managing chaos experiments.
  2. Scheduling support enables running experiments on schedules or triggers.
  3. Experiment templates simplify creating common chaos scenarios.
  4. Multi-cluster support enables running experiments across multiple clusters.
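
To make the scheduling item above concrete, here is a sketch of a recurring packet-loss experiment. In the 1.x API, `duration` and `scheduler` are normally set together: the fault lasts for `duration` and recurs on the `cron` schedule. The pause annotation shown is an assumption based on the Chaos Mesh documentation, so confirm that your installed version supports it before relying on it. The experiment name and label are placeholders.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: scheduled-loss-example      # hypothetical name
  annotations:
    # Assumption: setting this to "true" pauses the experiment without deleting it.
    experiment.chaos-mesh.org/pause: "false"
spec:
  action: loss
  mode: one
  selector:
    namespaces:
      - default
    labelSelectors:
      app: my-app         # placeholder label
  loss:
    loss: "25"            # drop roughly 25% of packets
    correlation: "25"
  duration: "1m"          # each run injects the fault for one minute
  scheduler:
    cron: "@every 15m"    # a new run starts every fifteen minutes
```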

Observability

  • Metrics integration exposes detailed chaos experiment metrics for Prometheus.
  • Event tracking provides comprehensive logs of all chaos operations.
  • Dashboard integration with Grafana provides visualization of chaos experiments.
  • Alerting support enables notifications when experiments complete or fail.

Getting Started

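Install Chaos Mesh:

# Run the install script against the cluster in your current kubectl context
curl -sSL https://mirrors.chaos-mesh.org/latest/install.sh | bash
# Apply the Chaos Mesh custom resource definitions (PodChaos, NetworkChaos, and so on)
kubectl apply -f https://mirrors.chaos-mesh.org/latest/crd.yaml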

Create a chaos experiment:

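apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-example
spec:
  action: pod-kill        # kill the selected pod
  mode: one               # pick one matching pod at random
  selector:
    namespaces:
    - default             # only consider pods in this namespace
    labelSelectors:
      app: my-app         # only consider pods carrying this label
  scheduler:
    cron: "@every 2m"     # repeat every two minutes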

Summary

Aspect             Details
Release Date       September 20, 2020
Headline Features  Comprehensive fault injection, experiment management, observability
Why it Matters     Provides a platform for testing and improving application resilience through chaos engineering

Chaos Mesh 1.0 gives teams a single set of tools for testing and improving the resilience of Kubernetes applications.