Litmus 2.0: Chaos Engineering for Kubernetes

Introduction

Litmus 2.0, a chaos engineering platform for Kubernetes, was released on August 25, 2021.

It is a practical update aimed at making day-to-day Kubernetes operations more predictable by turning failure testing into a routine, repeatable activity.

The release adds enhanced experiments, improved observability, and better GitOps integration.


Chaos Experiment Framework

  • Experiment catalog provides a library of pre-built chaos experiments for common failure scenarios.
  • Custom experiments enable teams to create domain-specific chaos tests tailored to their applications.
  • Experiment scheduling supports one-time, recurring, and event-driven chaos experiment execution.
  • Multi-cluster support enables chaos testing across distributed Kubernetes deployments.
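
Litmus 2.0 builds its chaos workflows on Argo Workflows, so one way a recurring experiment could be expressed is as an Argo CronWorkflow that creates a ChaosEngine on a schedule. The following is a generic Argo sketch under stated assumptions, not necessarily what the Litmus portal generates: the workflow name, the litmus namespace, the argo-chaos service account, and the cron expression are all illustrative, and the service account is assumed to have permission to create ChaosEngine resources.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: nightly-pod-delete          # hypothetical name
  namespace: litmus                 # assumed install namespace
spec:
  schedule: "0 2 * * *"             # run once a day at 02:00
  concurrencyPolicy: Forbid         # skip a run if the previous one is still active
  workflowSpec:
    entrypoint: run-chaos
    serviceAccountName: argo-chaos  # assumed; needs rights to create ChaosEngines
    templates:
    - name: run-chaos
      resource:
        action: create
        manifest: |
          apiVersion: litmuschaos.io/v1alpha1
          kind: ChaosEngine
          metadata:
            generateName: pod-delete-   # unique name per scheduled run
            namespace: default
          spec:
            engineState: active
            appinfo:
              appns: default
              applabel: app=nginx
              appkind: deployment
            chaosServiceAccount: litmus-admin
            experiments:
            - name: pod-delete
              spec:
                components:
                  env:
                  - name: TOTAL_CHAOS_DURATION
                    value: "30"
```

Using generateName rather than a fixed name avoids a conflict when a ChaosEngine from an earlier run is still present in the namespace.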

Experiment Types

  1. Pod chaos experiments inject pod failures, network partitions, and resource constraints.
  2. Network chaos tests validate resilience to network latency, packet loss, and DNS failures.
  3. Node chaos experiments simulate node failures, reboots, and resource exhaustion.
  4. Storage chaos tests validate behavior during storage failures and I/O issues.
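
As an illustration of the network chaos category above, the following sketch defines a ChaosEngine for the pod-network-latency experiment. It assumes an nginx deployment in the default namespace, an existing litmus-admin service account, that the pod-network-latency ChaosExperiment is already installed in that namespace, and that the nodes run containerd. The engine name, interface name, and latency value are illustrative.

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-network-latency     # hypothetical name
  namespace: default
spec:
  engineState: active
  appinfo:
    appns: default
    applabel: app=nginx
    appkind: deployment
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-network-latency
    spec:
      components:
        env:
        - name: NETWORK_INTERFACE       # interface inside the target pod
          value: "eth0"
        - name: NETWORK_LATENCY         # injected delay in milliseconds
          value: "2000"
        - name: TOTAL_CHAOS_DURATION    # seconds
          value: "60"
        - name: CONTAINER_RUNTIME       # assumed runtime; adjust for docker or crio
          value: "containerd"
        - name: SOCKET_PATH             # runtime socket on the node
          value: "/run/containerd/containerd.sock"
```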

Observability & Analysis

  • Experiment results provide detailed reports on system behavior during chaos experiments.
  • Metrics integration exposes chaos experiment metrics for Prometheus and Grafana dashboards.
  • Event tracking records all chaos experiment events for audit and analysis.
  • Recovery validation ensures systems recover correctly after chaos experiments complete.
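
Recovery validation of the kind described above can be approximated with a Litmus probe attached to an experiment. The sketch below adds an HTTP probe in EOT (end of test) mode to a pod-delete experiment, so the run only passes if the application answers with HTTP 200 once chaos has finished. The probe name and service URL are hypothetical, and the field names follow the probe schema as documented for the 2.x line; they should be checked against the CRD installed in the cluster.

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: pod-delete-with-probe     # hypothetical name
  namespace: default
spec:
  engineState: active
  appinfo:
    appns: default
    applabel: app=nginx
    appkind: deployment
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-delete
    spec:
      probe:
      - name: check-nginx-recovered           # hypothetical probe name
        type: httpProbe
        httpProbe/inputs:
          url: http://nginx.default.svc.cluster.local:80   # assumed service URL
          insecureSkipVerify: false
          method:
            get:
              criteria: "=="
              responseCode: "200"
        mode: EOT                             # evaluate after chaos injection ends
        runProperties:
          probeTimeout: 5
          interval: 2
          retry: 3
      components:
        env:
        - name: TOTAL_CHAOS_DURATION
          value: "30"
```

The probe outcome is folded into the experiment verdict recorded in the corresponding ChaosResult resource.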

GitOps & CI/CD Integration

  • GitOps workflows enable chaos experiments to be defined and managed through Git repositories.
  • CI/CD integration allows chaos testing to be part of automated deployment pipelines.
  • Policy enforcement ensures chaos experiments are approved and scheduled appropriately.
  • Audit trails provide a complete history of chaos experiments for compliance and analysis.

Safety & Controls

  • Experiment scoping limits chaos experiments to specific namespaces, labels, or resource selectors.
  • Safeguards prevent chaos experiments from affecting critical production workloads.
  • Rollback capabilities enable immediate termination of chaos experiments if issues are detected.
  • Dry-run mode previews experiment effects without actually injecting failures.
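
Some of the scoping and termination controls above map onto ChaosEngine fields. The sketch below is a minimal illustration, assuming the nginx deployment has been annotated with litmuschaos.io/chaos="true"; the engine name and percentage are illustrative. It does not cover dry-run behavior.

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: scoped-pod-delete         # hypothetical name
  namespace: default              # chaos is confined to this namespace
spec:
  engineState: active             # patch to "stop" to abort a running experiment
  annotationCheck: "true"         # only target workloads annotated litmuschaos.io/chaos="true"
  appinfo:
    appns: default
    applabel: app=nginx           # label selector that narrows the target set
    appkind: deployment
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-delete
    spec:
      components:
        env:
        - name: PODS_AFFECTED_PERC    # share of matching pods to target
          value: "50"
        - name: TOTAL_CHAOS_DURATION
          value: "30"
```

Setting spec.engineState to stop on a live engine asks the operator to halt the experiment, which is the usual way to abort a run that is causing unexpected impact.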

Getting Started

Install Litmus 2.0 on the cluster:

kubectl apply -f https://litmuschaos.github.io/litmus/2.0.0/litmus-2.0.0.yaml

Run a pod chaos experiment:

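apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: pod-delete-chaos
spec:
  appinfo:
    appns: default                # namespace of the target application
    applabel: app=nginx           # label selector for the target pods
    appkind: deployment           # workload kind that owns the pods
  chaosServiceAccount: litmus-admin   # service account the experiment runs under
  monitoring: true
  jobCleanUpPolicy: retain        # keep the experiment job after it finishes
  experiments:
  - name: pod-delete
    spec:
      components:
        env:
        - name: TOTAL_CHAOS_DURATION   # total chaos window in seconds
          value: "30"
        - name: CHAOS_INTERVAL         # seconds between successive pod deletions
          value: "10"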

Summary

Aspect              Details
Release Date        August 25, 2021
Headline Features   Comprehensive chaos framework, enhanced experiments, GitOps integration, improved observability
Why it Matters      Provides a production-ready platform for validating system resilience and disaster recovery procedures

Litmus 2.0 empowers teams to proactively identify and fix weaknesses in their Kubernetes deployments through systematic chaos engineering practices.