Backup & Restore (data)
Kubernetes does not back up persistent storage automatically. You need backup and restore strategies to protect your data from loss, corruption, and accidental deletion. This guide covers different approaches to backing up and restoring data stored in PersistentVolumes.
Why Backup is Important
PersistentVolumes provide data persistence, but they don’t protect against:
- Data corruption
- Accidental deletion
- Storage system failures
- Cluster disasters
- Application bugs that delete data
Having a backup strategy ensures you can recover from these scenarios.
Backup Strategies
There are two main approaches to backing up Kubernetes persistent storage:
1. Volume-Level Backups
Backing up at the storage/volume level using snapshots or storage system features.
Characteristics:
- Fast and efficient
- Captures entire volume state
- Requires storage system support
- Good for disaster recovery
2. Application-Level Backups
Backing up at the application level using application-specific backup tools.
Characteristics:
- Application-aware
- Can be selective (specific databases, tables)
- Application must support backups
- More granular control
Volume-Level Backup: CSI Snapshots
CSI volume snapshots provide a native Kubernetes way to back up volumes.
Creating a backup snapshot:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: backup-snapshot-class
driver: pd.csi.storage.gke.io
deletionPolicy: Retain  # Keep snapshot data
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: db-backup-20240115
spec:
  volumeSnapshotClassName: backup-snapshot-class
  source:
    persistentVolumeClaimName: postgres-data
Restoring from snapshot:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data-restored
spec:
  dataSource:
    name: db-backup-20240115
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
  - ReadWriteOnce
  # Must be provisioned by the same CSI driver that created the snapshot
  storageClassName: fast-ssd
  resources:
    requests:
      # Must be at least the size of the snapshotted volume
      storage: 200Gi
Advantages:
- Native Kubernetes API
- Fast snapshot creation
- Storage-efficient (incremental snapshots in many systems)
- Point-in-time copy of the whole volume (recovery is to the moment each snapshot was taken)
Limitations:
- Requires CSI driver with snapshot support
- Snapshot storage costs money
- May have performance impact during snapshot creation
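A snapshot can only serve as a restore source once the CSI driver reports it as ready. The following is a minimal sketch of waiting for that state, assuming `kubectl` is configured for the cluster. The function name `wait_for_snapshot` and its default timeout and polling interval are illustrative choices, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a VolumeSnapshot until status.readyToUse is "true".
# Usage: wait_for_snapshot NAME [NAMESPACE] [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_snapshot() {
  local name="$1" namespace="${2:-default}" timeout="${3:-300}" interval="${4:-5}"
  local waited=0 ready
  while [ "$waited" -le "$timeout" ]; do
    # readyToUse stays empty until the snapshot controller has populated status.
    ready="$(kubectl get volumesnapshot "$name" -n "$namespace" \
      -o jsonpath='{.status.readyToUse}' 2>/dev/null || true)"
    if [ "$ready" = "true" ]; then
      echo "snapshot $name is ready"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out waiting for snapshot $name" >&2
  return 1
}
```

For example, `wait_for_snapshot db-backup-20240115 default 600` could run before the restore PVC is created, so the restore does not start from a snapshot that is still being cut.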
Application-Level Backup: Database Example
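For databases, application-level backups are often the better fit: they produce logically consistent dumps, allow selective backups, and, when combined with transaction-log archiving (WAL in PostgreSQL, binary logs in MySQL), support point-in-time recovery.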
PostgreSQL Backup
Using pg_dump in a Job:
apiVersion: batch/v1
kind: Job
metadata:
  name: postgres-backup
spec:
  template:
    spec:
      containers:
      - name: backup
        image: postgres:14
        command:
        - /bin/bash
        - -c
        - |
          # Exit non-zero if pg_dump or gzip fails, so the Job is marked failed
          set -euo pipefail
          # Compute the file name once so the date cannot change between steps
          backup_file="/backup/db-$(date +%Y%m%d).sql.gz"
          pg_dump -h postgres-service -U postgres mydb | gzip > "$backup_file"
        env:
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        volumeMounts:
        - name: backup-storage
          mountPath: /backup
      volumes:
      - name: backup-storage
        persistentVolumeClaim:
          claimName: backup-storage
      restartPolicy: OnFailure
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-storage
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 100Gi
MySQL Backup
Using mysqldump:
apiVersion: batch/v1
kind: Job
metadata:
  name: mysql-backup
spec:
  template:
    spec:
      containers:
      - name: backup
        image: mysql:8.0
        command:
        - /bin/bash
        - -c
        - |
          # Exit non-zero if mysqldump or gzip fails, so the Job is marked failed
          set -euo pipefail
          # Compute the file name once so the date cannot change between steps
          backup_file="/backup/db-$(date +%Y%m%d).sql.gz"
          mysqldump -h mysql-service -u root -p"$MYSQL_ROOT_PASSWORD" --all-databases | gzip > "$backup_file"
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        volumeMounts:
        - name: backup-storage
          mountPath: /backup
      volumes:
      - name: backup-storage
        persistentVolumeClaim:
          claimName: backup-storage
      restartPolicy: OnFailure
Velero: Comprehensive Backup Solution
Velero (formerly Heptio Ark) is a popular open-source tool for backing up and restoring Kubernetes resources and persistent volumes.
Installing Velero
# Download Velero CLI
# Install Velero server (example for AWS)
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.7.0 \
  --bucket my-backup-bucket \
  --secret-file ./credentials-velero
Creating Backups
Backup all resources in a namespace:
velero backup create my-backup --include-namespaces myapp
Backup specific resources:
velero backup create db-backup \
  --include-resources persistentvolumeclaims \
  --selector app=postgres
Scheduled backups:
velero schedule create daily-backup \
  --schedule="0 2 * * *" \
  --include-namespaces production
Restoring from Velero Backup
# List backups
velero backup get
# Describe backup
velero backup describe my-backup
# Restore backup
velero restore create --from-backup my-backup
# Restore to different namespace
velero restore create restore-1 \
  --from-backup my-backup \
  --namespace-mappings production:staging
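Before restoring, it can help to confirm that the backup actually finished, since a failed or partially failed backup may be missing data. The sketch below reads the phase from Velero's Backup custom resource with `kubectl`. It assumes Velero runs in the `velero` namespace (the install default); the function name `backup_is_completed` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the Velero Backup reports phase "Completed".
# Usage: backup_is_completed NAME [VELERO_NAMESPACE]
backup_is_completed() {
  local name="$1" namespace="${2:-velero}" phase
  # Velero stores each backup as a Backup object in the backups.velero.io API.
  phase="$(kubectl get backups.velero.io "$name" -n "$namespace" \
    -o jsonpath='{.status.phase}' 2>/dev/null || true)"
  if [ "$phase" = "Completed" ]; then
    return 0
  fi
  echo "backup $name is not restorable (phase: ${phase:-unknown})" >&2
  return 1
}
```

A guarded restore could then be written as `backup_is_completed my-backup && velero restore create --from-backup my-backup`.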
Backup Strategy: The 3-2-1 Rule
Follow the 3-2-1 backup rule:
- 3 copies of your data
- 2 different media types (e.g., local storage + cloud)
- 1 copy offsite
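One way to apply the rule to a dump file is to copy it to a second storage medium and then upload it offsite. The following sketch assumes a second volume mounted at a directory of your choice, the AWS CLI with valid credentials, and an existing bucket. The function name `replicate_backup` and all paths are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replicate one backup file to a second volume and to object storage.
# Usage: replicate_backup FILE SECONDARY_DIR BUCKET
replicate_backup() {
  local file="$1" secondary_dir="$2" bucket="$3"
  local name
  name="$(basename "$file")"
  # Copy 2: a second storage medium mounted at SECONDARY_DIR.
  mkdir -p "$secondary_dir" || return 1
  cp -- "$file" "$secondary_dir/$name" || return 1
  # Copy 3: offsite object storage (assumes the AWS CLI is installed and authenticated).
  aws s3 cp "$file" "s3://$bucket/$name" || return 1
  echo "replicated $name"
}
```

The original file counts as the first copy, so a call such as `replicate_backup /backup/db-20240115.sql.gz /mnt/secondary my-backup-bucket` yields three copies on two media with one offsite.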
Complete Backup Workflow Example
Here’s a complete example combining volume snapshots and application backups:
# 1. Volume snapshot for quick recovery
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-volume-snapshot-20240115
spec:
  volumeSnapshotClassName: backup-snapshot-class
  source:
    persistentVolumeClaimName: postgres-data
---
# 2. Application-level backup job
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: postgres:14
            command:
            - /bin/bash
            - -c
            - |
              # Fail the run if pg_dump fails, even though gzip succeeds
              set -euo pipefail
              # Create a compressed database dump directly on the backup volume
              # (it could also be copied to S3, GCS, etc.)
              pg_dump -h postgres -U postgres mydb | gzip > /backup/db-$(date +%Y%m%d-%H%M%S).sql.gz
              # Keep only the last 30 days
              find /backup -name "db-*.sql.gz" -mtime +30 -delete
            env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            volumeMounts:
            - name: backup-storage
              mountPath: /backup
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: backup-storage
          restartPolicy: OnFailure
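Age-based pruning with `find -mtime +30` has one caveat: if backups stop succeeding for a month, it eventually deletes every remaining copy. A count-based policy avoids that. The sketch below keeps the newest N dumps by file name, relying on the `db-YYYYMMDD-HHMMSS.sql.gz` naming used in the CronJob; the function name `prune_backups` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the newest KEEP dumps in DIR, judged by file name.
# Relies on names like db-YYYYMMDD-HHMMSS.sql.gz sorting chronologically.
# Usage: prune_backups DIR KEEP
prune_backups() {
  local dir="$1" keep="$2" total excess
  total="$(find "$dir" -maxdepth 1 -type f -name 'db-*.sql.gz' | wc -l)"
  excess=$((total - keep))
  if [ "$excess" -le 0 ]; then
    return 0
  fi
  # Oldest names sort first, so the first EXCESS entries are the ones to drop.
  find "$dir" -maxdepth 1 -type f -name 'db-*.sql.gz' | sort | head -n "$excess" |
    while IFS= read -r old; do
      rm -f -- "$old"
      echo "removed $old"
    done
}
```

Calling `prune_backups /backup 30` after a successful dump would keep the 30 most recent files regardless of how old they are.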
Restore Procedures
Restore from Volume Snapshot
- Create PVC from snapshot:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data-restored
spec:
  dataSource:
    name: postgres-volume-snapshot-20240115
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
  - ReadWriteOnce
  # Must be provisioned by the same CSI driver that created the snapshot
  storageClassName: fast-ssd
  resources:
    requests:
      # Must be at least the size of the snapshotted volume
      storage: 200Gi
- Update deployment to use restored PVC
- Restart application
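The last two steps can be scripted. The following is a rough sketch that repoints a Deployment at the restored claim with a JSON patch and waits for the rollout. It assumes the data volume is the first entry (index 0) in the pod template's `volumes` list, which you should verify for your own manifest; the function name `use_restored_pvc` and the Deployment name in the usage example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point a Deployment at a restored PVC and wait for the rollout.
# Assumes the data volume is volumes[0] in the pod template.
# Usage: use_restored_pvc DEPLOYMENT CLAIM [NAMESPACE]
use_restored_pvc() {
  local deployment="$1" claim="$2" namespace="${3:-default}"
  local patch
  patch="[{\"op\":\"replace\",\"path\":\"/spec/template/spec/volumes/0/persistentVolumeClaim/claimName\",\"value\":\"$claim\"}]"
  # Changing the pod template triggers a restart of the Deployment's pods.
  kubectl patch deployment "$deployment" -n "$namespace" --type=json -p "$patch" || return 1
  # Block until the new pods are available or the timeout expires.
  kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout=300s
}
```

For instance, `use_restored_pvc postgres postgres-data-restored` would switch a Deployment named `postgres` to the claim created in the first step.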
Restore from Application Backup
- Create a new PVC for the restored data
- Start the application with the new PVC
- Restore the backup:
apiVersion: batch/v1
kind: Job
metadata:
  name: postgres-restore
spec:
  template:
    spec:
      containers:
      - name: restore
        image: postgres:14
        command:
        - /bin/bash
        - -c
        - |
          # Fail the Job if decompression or any SQL statement fails
          set -euo pipefail
          gunzip -c /backup/db-20240115.sql.gz | psql -v ON_ERROR_STOP=1 -h postgres -U postgres mydb
        env:
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        volumeMounts:
        - name: backup-storage
          mountPath: /backup
      volumes:
      - name: backup-storage
        persistentVolumeClaim:
          claimName: backup-storage
      restartPolicy: OnFailure
Testing Backups
Regularly test your backups to ensure they work:
- Test restore procedures - Periodically restore from backups to verify they work
- Verify data integrity - Check that restored data is complete and correct
- Measure restore time - Know how long restores take (RTO - Recovery Time Objective)
- Document procedures - Document restore steps so anyone can perform them
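A full restore test is the only real proof, but a cheap sanity check can catch obviously broken dumps earlier. The sketch below checks that a gzip-compressed, plain-format `pg_dump` file is non-empty, decompresses cleanly, and ends with the completion comment that `pg_dump` writes at the end of plain-format output. The function name `verify_dump` is hypothetical, and the check does not replace an actual restore.

```shell
#!/usr/bin/env bash
# Hypothetical helper: basic sanity checks for a gzip-compressed plain-format pg_dump file.
# Usage: verify_dump FILE
verify_dump() {
  local file="$1"
  # Reject missing or empty files early.
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  # gzip -t tests the compressed stream without writing the extracted data.
  if ! gzip -t "$file" 2>/dev/null; then
    echo "corrupt gzip stream: $file" >&2
    return 1
  fi
  # Plain-format dumps end with a "PostgreSQL database dump complete" comment;
  # its absence suggests the dump was cut short.
  if ! gunzip -c "$file" | tail -n 5 | grep -q 'PostgreSQL database dump complete'; then
    echo "dump looks truncated: $file" >&2
    return 1
  fi
  echo "ok: $file"
}
```

Running `verify_dump /backup/db-20240115.sql.gz` right after the backup step makes a truncated dump fail loudly instead of being discovered during a restore.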
Best Practices
- Automate backups - Use CronJobs or Velero schedules for regular backups
- Test restores regularly - Don’t wait for a disaster to test restore procedures
- Store backups offsite - Keep copies in different locations/clouds
- Encrypt backups - Protect sensitive backup data
- Version backups - Keep multiple versions (daily, weekly, monthly)
- Monitor backup jobs - Ensure backups complete successfully
- Document procedures - Document backup and restore procedures
- Set retention policies - Automatically clean up old backups
- Use both strategies - Combine volume-level and application-level backups
- Regular review - Review and update backup strategies regularly
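Monitoring can start with a simple freshness check on the backup CronJob. The sketch below reads `status.lastSuccessfulTime` from the CronJob and fails if the last success is older than a limit. It assumes GNU `date` (for the `-d` option) and a Kubernetes version that populates that status field; the function name `backup_is_fresh` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if a CronJob has not succeeded within MAX_AGE seconds.
# Assumes GNU date (for -d) and a cluster that sets status.lastSuccessfulTime.
# Usage: backup_is_fresh CRONJOB MAX_AGE_SECONDS [NAMESPACE]
backup_is_fresh() {
  local cronjob="$1" max_age="$2" namespace="${3:-default}"
  local last last_epoch now age
  last="$(kubectl get cronjob "$cronjob" -n "$namespace" \
    -o jsonpath='{.status.lastSuccessfulTime}' 2>/dev/null || true)"
  if [ -z "$last" ]; then
    echo "no successful run recorded for $cronjob" >&2
    return 1
  fi
  # Convert the RFC 3339 timestamp to epoch seconds and compare with now.
  last_epoch="$(date -u -d "$last" +%s)" || return 1
  now="$(date -u +%s)"
  age=$((now - last_epoch))
  if [ "$age" -gt "$max_age" ]; then
    echo "last successful backup is ${age}s old (limit ${max_age}s)" >&2
    return 1
  fi
  echo "last successful backup: $last"
}
```

With a daily schedule, something like `backup_is_fresh postgres-backup 90000` (25 hours) leaves a small margin before alerting.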
Backup Tools Comparison
| Tool | Type | Pros | Cons |
|---|---|---|---|
| CSI Snapshots | Volume-level | Native K8s, fast | Requires CSI support |
| Velero | Application + Volume | Comprehensive, cross-cloud | Additional component |
| Application tools | Application-level | Application-aware | Per-application setup |
| Storage system | Volume-level | Storage-native | Vendor-specific |
See Also
- Snapshots & Cloning - Volume snapshots and cloning
- PVs & PVCs - Persistent Volumes and Persistent Volume Claims
- CSI Persistent Volumes - Container Storage Interface