etcd Basics
etcd is a distributed, consistent key-value store that serves as Kubernetes’ backing store. Understanding etcd is crucial because it’s where all cluster state lives—every pod, service, deployment, and configuration is stored in etcd. If etcd fails or loses data, your cluster loses its “memory” and can’t function properly.
Think of etcd like a filing cabinet for Kubernetes. Every piece of information the cluster needs to remember—what pods are running, what services exist, what configurations are set—is stored in etcd. Just as you can’t run a company without records of what’s happening, Kubernetes can’t function without etcd storing its state.
What is etcd?
etcd is a distributed key-value store designed for reliable storage of data that needs to be accessed by a distributed system or cluster of machines. It’s written in Go and uses the Raft consensus algorithm to ensure consistency across a cluster of etcd nodes.
Key Characteristics
- Distributed - Runs as a cluster of multiple nodes
- Consistent - Strong consistency guarantees (CP in CAP theorem)
- Reliable - Fault-tolerant and durable
- Fast - Optimized for read and write performance
- Simple - Key-value interface, easy to understand
Why Kubernetes Uses etcd
Kubernetes chose etcd as its backing store because it provides:
Consistency
Kubernetes needs strong consistency—all components must see the same state. etcd provides linearizable reads and writes, meaning all operations appear to happen in a single, well-defined order. This is essential for Kubernetes’ control plane to make correct decisions.
Reliability
etcd is designed to be fault-tolerant. It can survive node failures as long as a majority of nodes (quorum) remain available. For a 3-node cluster, 1 node can fail. For a 5-node cluster, 2 nodes can fail.
Watch Support
Kubernetes components need to be notified when state changes. etcd provides efficient watch functionality that allows clients to subscribe to changes in the key-value store. This is how controllers and other components react to changes.
Performance
etcd is optimized for the read-heavy workload of Kubernetes. Most operations are reads (checking current state), with writes happening less frequently (when resources are created or updated).
etcd in Kubernetes
In Kubernetes, etcd stores everything:
What Gets Stored
- All API objects - Pods, services, deployments, nodes, namespaces, etc.
- Configuration - ConfigMaps, Secrets, RBAC policies
- State - Current state of all resources (status fields)
- Metadata - Labels, annotations, resource versions
- Cluster configuration - Cluster-level settings
Storage Structure
etcd stores data in a hierarchical key structure:
/registry/pods/default/my-pod
/registry/services/default/my-service
/registry/deployments/production/my-app
/registry/nodes/node-1
/registry/namespaces/default
Each resource is stored as a serialized object (protobuf by default in recent Kubernetes versions, or JSON) containing:
- Spec - Desired state (what you want)
- Status - Current state (what actually is)
- Metadata - Name, labels, annotations, etc.
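To see these keys on a live cluster, you can list them directly with etcdctl. The endpoint and certificate paths below are typical kubeadm defaults and may differ in your setup; note that the stored values are protobuf-encoded and not directly human-readable:
# List all pod keys in the default namespace (paths assume a kubeadm-style installation)
ETCDCTL_API=3 etcdctl get /registry/pods/default --prefix --keys-only \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key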
etcd Operations
Reads
When you run kubectl get pods, the API server reads from etcd:
- API server receives GET request
- API server reads from etcd: /registry/pods/default/*
- API server filters and processes results
- API server returns response to kubectl
Reads are fast because etcd keeps data in memory (with disk persistence for durability).
Writes
When you create a pod, the API server writes to etcd:
- API server receives POST request
- API server validates request
- API server writes to etcd: /registry/pods/default/my-pod
- etcd confirms write
- API server returns success
Writes go through the Raft consensus algorithm to ensure all etcd nodes agree.
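To get a feel for a raw etcd write outside of Kubernetes, you can put and read back a scratch key with etcdctl. Never write under /registry by hand, since the API server owns those keys; certificate flags are omitted here for brevity but are required against a secured cluster:
# A plain key-value write; the put returns only after a quorum of members has accepted it
ETCDCTL_API=3 etcdctl put /demo/greeting "hello"
ETCDCTL_API=3 etcdctl get /demo/greeting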
Watches
Components watch etcd (through the API server) for changes:
- Component opens watch on /registry/pods/default
- etcd streams changes as they occur
- Component receives notification of changes
- Component reacts to changes (e.g., controller reconciles)
Watches are efficient—they only send changes, not full state.
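The same mechanism can be observed with etcdctl: a watch on a key prefix streams every change under it until cancelled (certificate flags omitted for brevity):
# Stream changes to all pod keys in the default namespace
ETCDCTL_API=3 etcdctl watch /registry/pods/default --prefix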
etcd Clustering
For production, etcd runs as a cluster for high availability and performance.
Cluster Size
Typical etcd cluster sizes:
- 1 node - Development/testing only (no fault tolerance)
- 3 nodes - Can survive 1 node failure (minimum for production)
- 5 nodes - Can survive 2 node failures (for larger clusters)
- 7 nodes - Rarely needed (for very large clusters)
Odd numbers are recommended: adding an even member raises the quorum size without increasing the number of failures the cluster can tolerate.
Quorum
etcd uses Raft consensus, which requires a majority (quorum) for writes:
- 3 nodes - Need 2 nodes for quorum (can lose 1)
- 5 nodes - Need 3 nodes for quorum (can lose 2)
- 7 nodes - Need 4 nodes for quorum (can lose 3)
If quorum is lost, etcd can no longer accept writes (and linearizable reads also fail) until enough members rejoin to restore a majority.
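The quorum size for an n-member cluster is floor(n/2) + 1; a quick shell loop makes the numbers above concrete:
# Print the quorum size and tolerable failures for common cluster sizes
for n in 1 3 5 7; do echo "$n nodes: quorum $((n / 2 + 1)), can lose $(((n - 1) / 2))"; done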
etcd Cluster Architecture
- Leader - Handles all write requests and replicates them to followers
- Followers - Receive replication from the leader and can serve read requests
- Election - If the leader fails, the followers elect a new leader
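You can see which member currently holds the leader role with etcdctl; the table output includes an IS LEADER column (endpoints and certificate flags depend on your setup):
# Show the status of every member, including which one is the leader
ETCDCTL_API=3 etcdctl endpoint status --cluster --write-out=table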
Raft Consensus
etcd uses the Raft consensus algorithm to ensure all nodes agree on the state.
How Raft Works
- Leader election - One node becomes leader
- Log replication - Leader replicates writes to followers
- Commit - Write committed when majority acknowledge
- Consistency - All nodes see same committed state
Why Raft?
Raft provides:
- Strong consistency - All nodes see same state
- Fault tolerance - Survives node failures
- Understandability - Simpler than alternatives like Paxos
- Performance - Efficient for Kubernetes’ workload
etcd and API Server
The API server is the only component that directly communicates with etcd:
Why this design?
- Security - etcd not exposed directly
- Abstraction - API server provides higher-level API
- Validation - All writes go through API server validation
- Versioning - API server handles API versioning
Components never talk to etcd directly—they always go through the API server.
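On a kubeadm-managed control plane, this relationship is visible in the kube-apiserver static pod manifest, which points the API server at etcd with flags like the following (the paths are kubeadm defaults and may differ in your environment):
# Excerpt of typical kube-apiserver flags for reaching etcd over mutual TLS
kube-apiserver \
  --etcd-servers=https://127.0.0.1:2379 \
  --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt \
  --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key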
etcd Performance
Read Performance
etcd is optimized for reads:
- In-memory storage - Fast access to current state
- Consistent reads - All reads see committed state
- Efficient watches - Only sends changes, not full state
Write Performance
Writes are slower than reads because they:
- Go through Raft consensus
- Must be replicated to majority
- Are persisted to disk
For Kubernetes, this is acceptable because writes are less frequent than reads.
Size Considerations
etcd performance degrades with size:
- Recommended - Keep etcd database under 8GB
- Maximum - Can handle up to ~50GB (with performance impact)
- Compaction - etcd compacts old revisions automatically
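Database size and quota problems can be checked from the command line: endpoint status reports the on-disk size per member, and alarm list shows whether the NOSPACE alarm has fired because the backend quota was exceeded (certificate flags omitted for brevity):
# Check database size per member and any active alarms
ETCDCTL_API=3 etcdctl endpoint status --cluster --write-out=table
ETCDCTL_API=3 etcdctl alarm list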
etcd Backup and Restore
Why Backup?
etcd contains all cluster state. If etcd data is lost:
- Cluster loses all configuration
- All resource definitions are gone
- Cluster must be rebuilt from scratch
Regular backups are essential for disaster recovery.
Backup Process
# Backup etcd
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
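It is worth verifying the snapshot immediately after taking it; the status subcommand reports its hash, revision, key count, and size (newer etcd releases also offer the same check via etcdutl):
# Verify the snapshot
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-snapshot.db --write-out=table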
Restore Process
# Restore etcd from backup
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
--data-dir=/var/lib/etcd-restore
Restore should be done carefully and typically requires cluster downtime. After the restore, etcd must be started with --data-dir pointing at the restored directory (in kubeadm clusters, by updating the etcd static pod manifest).
Backup Best Practices
- Regular backups - Daily or more frequent for production
- Test restores - Regularly test that backups can be restored
- Off-site storage - Store backups outside the cluster
- Automation - Automate the backup process (see the sketch after this list)
- Retention - Keep multiple backup versions
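A minimal automation sketch is a small wrapper script run daily from cron; the script path, backup location, and certificate paths below are assumptions to adapt to your environment:
#!/bin/sh
# Minimal etcd backup script (hypothetical path: /usr/local/bin/etcd-backup.sh)
# Example crontab entry to run it daily at 02:00:  0 2 * * * /usr/local/bin/etcd-backup.sh
set -e
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key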
etcd Maintenance
Compaction
etcd keeps a history of all changes. Over time, this history grows. Compaction removes old history:
# Compact etcd up to the current revision (all older revisions are discarded)
rev=$(ETCDCTL_API=3 etcdctl endpoint status --write-out=json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]+')
ETCDCTL_API=3 etcdctl compact $rev
Kubernetes typically handles this automatically.
Defragmentation
As etcd writes and deletes data, the database can become fragmented. Defragmentation reorganizes data:
# Defragment etcd
ETCDCTL_API=3 etcdctl defrag
Defragment one member at a time during a maintenance window, as the operation blocks reads and writes on that member while it runs.
Health Checks
Monitor etcd health:
- Node health - Check if nodes are responding
- Leader health - Ensure leader is functioning
- Database size - Monitor database growth
- Performance - Monitor read/write latency
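A quick command-line check for the first two items is etcdctl endpoint health, which reports whether each member responds within the timeout (certificate flags depend on your setup):
# Check that every member of the cluster is healthy
ETCDCTL_API=3 etcdctl endpoint health --cluster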
etcd Security
Authentication
etcd supports authentication:
- Client certificates - Mutual TLS authentication
- Username/password - Basic authentication (less secure)
Kubernetes typically uses client certificates.
Encryption
etcd data can be encrypted at rest:
- Encryption at rest - In Kubernetes, sensitive resources (such as Secrets) are typically encrypted by the API server before being written to etcd, via an EncryptionConfiguration
- TLS in transit - Encrypt communication between clients and members, and between peers
Access Control
etcd supports role-based access control (RBAC) to limit what clients can do.
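etcd's native roles and users are managed with etcdctl. The sketch below assumes you are setting this up by hand (the role and user names are hypothetical); Kubernetes clusters usually rely on mutual TLS rather than etcd RBAC:
# Create a root user (required before enabling auth), a read-only role, and a viewer user
ETCDCTL_API=3 etcdctl user add root
ETCDCTL_API=3 etcdctl role add read-only
ETCDCTL_API=3 etcdctl role grant-permission --prefix=true read-only read /registry/
ETCDCTL_API=3 etcdctl user add viewer
ETCDCTL_API=3 etcdctl user grant-role viewer read-only
ETCDCTL_API=3 etcdctl auth enable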
Key Takeaways
- etcd is Kubernetes’ backing store—all cluster state is stored there
- etcd provides strong consistency, which is essential for Kubernetes
- etcd runs as a cluster for high availability (typically 3 or 5 nodes)
- The API server is the only component that directly talks to etcd
- etcd uses Raft consensus to ensure all nodes agree on state
- Regular backups are essential—etcd data loss means cluster data loss
- etcd is optimized for reads, which matches Kubernetes’ workload
See Also
- Kubernetes Architecture - How etcd fits in
- Control Plane Components - How API server uses etcd
- Backup & Restore - Backing up etcd
- High Availability Overview - etcd clustering for HA