Backup & Restore (data)

Persistent storage in Kubernetes doesn’t automatically back up your data. You need to implement backup and restore strategies to protect your data from loss, corruption, or accidental deletion. This guide covers different approaches to backing up and restoring data stored in PersistentVolumes.

Why Backup is Important

PersistentVolumes provide data persistence, but they don’t protect against:

  • Data corruption
  • Accidental deletion
  • Storage system failures
  • Cluster disasters
  • Application bugs that delete data

Having a backup strategy ensures you can recover from these scenarios.

Backup Strategies

There are two main approaches to backing up Kubernetes persistent storage:

1. Volume-Level Backups

Backing up at the storage/volume level using snapshots or storage system features.

Characteristics:

  • Fast and efficient
  • Captures entire volume state
  • Requires storage system support
  • Good for disaster recovery

2. Application-Level Backups

Backing up at the application level using application-specific backup tools.

Characteristics:

  • Application-aware
  • Can be selective (specific databases, tables)
  • Application must support backups
  • More granular control

graph TB
  A[Backup Strategy] --> B[Volume-Level]
  A --> C[Application-Level]
  B --> D[CSI Snapshots]
  B --> E[Storage System Snapshots]
  B --> F[Velero]
  C --> G[Database Dumps]
  C --> H[Application Export]
  C --> I[File-Level Backup]
  style A fill:#e1f5ff
  style B fill:#fff4e1
  style C fill:#e8f5e9

Volume-Level Backup: CSI Snapshots

CSI volume snapshots provide a native Kubernetes way to back up volumes.

Creating a backup snapshot:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: backup-snapshot-class
driver: pd.csi.storage.gke.io
deletionPolicy: Retain  # Keep snapshot data
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: db-backup-20240115
spec:
  volumeSnapshotClassName: backup-snapshot-class
  source:
    persistentVolumeClaimName: postgres-data
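
The snapshot name above has the date typed in by hand. If snapshots are created from a script, the name can be generated instead. The following is a minimal sketch, not part of the original manifests: the helper name `render_snapshot` is hypothetical, while the PVC and snapshot class names are the ones used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a VolumeSnapshot manifest with a dated name.
# Usage: render_snapshot <pvc-name> <snapshot-class> [yyyymmdd]
render_snapshot() {
  local pvc="$1" class="$2"
  # Default to today's date when no date is passed in
  local stamp="${3:-$(date +%Y%m%d)}"
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: ${pvc}-${stamp}
spec:
  volumeSnapshotClassName: ${class}
  source:
    persistentVolumeClaimName: ${pvc}
EOF
}

# Print the manifest for a fixed date so the output is reproducible
render_snapshot postgres-data backup-snapshot-class 20240115
```

To create the snapshot rather than print it, the output can be piped to `kubectl apply -f -`.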

Restoring from snapshot:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data-restored
spec:
  dataSource:
    name: db-backup-20240115
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 200Gi

Advantages:

  • Native Kubernetes API
  • Fast snapshot creation
  • Storage-efficient (incremental snapshots in many systems)
  • Point-in-time copy (a restore returns the volume to the moment the snapshot was taken)

Limitations:

  • Requires CSI driver with snapshot support
  • Snapshot storage costs money
  • May have performance impact during snapshot creation

Application-Level Backup: Database Example

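For databases, application-level backups often provide features that volume snapshots lack, such as selective backups of individual databases or tables and, when combined with WAL or binary log archiving, point-in-time recovery.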

PostgreSQL Backup

Using pg_dump in a Job:

apiVersion: batch/v1
kind: Job
metadata:
  name: postgres-backup
spec:
  template:
    spec:
      containers:
      - name: backup
        image: postgres:14
        command:
        - /bin/bash
        - -c
        - |
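          # Exit non-zero if pg_dump or gzip fails, so the Job is retried
          set -euo pipefail
          # Build the file name once so a run that crosses midnight stays consistent
          BACKUP_FILE="/backup/db-$(date +%Y%m%d).sql.gz"
          pg_dump -h postgres-service -U postgres mydb | gzip > "$BACKUP_FILE"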
        env:
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        volumeMounts:
        - name: backup-storage
          mountPath: /backup
      volumes:
      - name: backup-storage
        persistentVolumeClaim:
          claimName: backup-storage
      restartPolicy: OnFailure
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-storage
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 100Gi
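
A dump file that exists is not necessarily a usable one. As a rough sketch of a post-backup check, the helper below (the name `verify_dump` is hypothetical) rejects files that are missing, empty, or fail gzip's integrity test. It only validates the compressed stream; it does not prove the SQL inside restores cleanly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 only if the file exists, is non-empty,
# and passes gzip's built-in integrity check.
verify_dump() {
  local file="$1"
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  gzip -t "$file" 2>/dev/null || { echo "corrupt archive: $file" >&2; return 1; }
  echo "ok: $file"
}

# Demo on throwaway files (no database needed)
tmp="$(mktemp -d)"
printf 'CREATE TABLE t (id int);\n' | gzip > "$tmp/good.sql.gz"
printf 'not a gzip stream' > "$tmp/bad.sql.gz"
verify_dump "$tmp/good.sql.gz"
verify_dump "$tmp/bad.sql.gz" || echo "rejected bad.sql.gz"
rm -rf "$tmp"
```

In a backup Job, a call such as `verify_dump /backup/db-20240115.sql.gz` could run as the last step so that a bad archive fails the Job.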

MySQL Backup

Using mysqldump:

apiVersion: batch/v1
kind: Job
metadata:
  name: mysql-backup
spec:
  template:
    spec:
      containers:
      - name: backup
        image: mysql:8.0
        command:
        - /bin/bash
        - -c
        - |
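          # Exit non-zero if mysqldump or gzip fails, so the Job is retried
          set -euo pipefail
          BACKUP_FILE="/backup/db-$(date +%Y%m%d).sql.gz"
          # --single-transaction gives a consistent dump of InnoDB tables without locking them
          mysqldump -h mysql-service -u root -p"$MYSQL_ROOT_PASSWORD" --single-transaction --all-databases | gzip > "$BACKUP_FILE"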
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        volumeMounts:
        - name: backup-storage
          mountPath: /backup
      volumes:
      - name: backup-storage
        persistentVolumeClaim:
          claimName: backup-storage
      restartPolicy: OnFailure

Velero: Comprehensive Backup Solution

Velero (formerly Heptio Ark) is a popular open-source tool for backing up and restoring Kubernetes resources and persistent volumes.

Installing Velero

# Download Velero CLI
# Install Velero server (example for AWS)
velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.7.0 \
    --bucket my-backup-bucket \
    --secret-file ./credentials-velero

Creating Backups

Back up all resources in a namespace:

velero backup create my-backup --include-namespaces myapp

Back up specific resources:

velero backup create db-backup \
  --include-resources persistentvolumeclaims \
  --selector app=postgres

Scheduled backups:

velero schedule create daily-backup \
  --schedule="0 2 * * *" \
  --include-namespaces production

Restoring from Velero Backup

# List backups
velero backup get

# Describe backup
velero backup describe my-backup

# Restore backup
velero restore create --from-backup my-backup

# Restore to different namespace
velero restore create restore-1 \
  --from-backup my-backup \
  --namespace-mappings production:staging

Backup Strategy: The 3-2-1 Rule

Follow the 3-2-1 backup rule:

  • 3 copies of your data
  • 2 different media types (e.g., local storage + cloud)
  • 1 copy offsite

graph TB
  A[Production Data] --> B[Backup 1: Local Snapshot]
  A --> C[Backup 2: Application Backup to Storage]
  C --> D[Backup 3: Offsite/Cloud Storage]
  style A fill:#e1f5ff
  style B fill:#fff4e1
  style C fill:#e8f5e9
  style D fill:#f3e5f5
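
The rule is easy to check mechanically once the copies are written down. The sketch below is illustrative only: the helper name `check_321` and the inventory format (one copy per line as `name media location`, with location either `onsite` or `offsite`) are assumptions, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical check: read "name media location" lines on stdin and report
# whether the inventory meets 3-2-1. Exit status is 0 when it does.
check_321() {
  awk '
    NF >= 3 { copies++; media[$2] = 1; if ($3 == "offsite") offsite++ }
    END {
      for (m in media) kinds++
      printf "copies=%d media=%d offsite=%d\n", copies, kinds, offsite
      exit !(copies >= 3 && kinds >= 2 && offsite >= 1)
    }'
}

# Example inventory: production volume, local snapshot, offsite dump
printf '%s\n' \
  'production block-storage onsite' \
  'snapshot block-storage onsite' \
  'pg-dump object-storage offsite' | check_321
# prints: copies=3 media=2 offsite=1
```

Here the production data counts as one of the three copies, matching the diagram above.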

Complete Backup Workflow Example

Here’s a complete example combining volume snapshots and application backups:

# 1. Volume snapshot for quick recovery
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-volume-snapshot-20240115
spec:
  volumeSnapshotClassName: backup-snapshot-class
  source:
    persistentVolumeClaimName: postgres-data
---
# 2. Application-level backup job
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: postgres:14
            command:
            - /bin/bash
            - -c
            - |
              # Fail the Job if pg_dump or gzip fails
              set -euo pipefail

              # Create database dump
              pg_dump -h postgres -U postgres mydb | gzip > /backup/db-$(date +%Y%m%d-%H%M%S).sql.gz
              
              # Copy to backup storage
              # (Could also copy to S3, GCS, etc.)
              
              # Keep only last 30 days
              find /backup -name "db-*.sql.gz" -mtime +30 -delete
            env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            volumeMounts:
            - name: backup-storage
              mountPath: /backup
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: backup-storage
          restartPolicy: OnFailure
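
The `find -mtime +30 -delete` step relies on file modification times, which can change if dumps are copied without preserving timestamps. A possible alternative, sketched here as a dry run, is to decide by the date embedded in the file name. The helper name `list_expired` is hypothetical, and it assumes the `db-YYYYMMDD...` naming used in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list dumps named db-YYYYMMDD[-HHMMSS].sql.gz whose date
# part is earlier than the cutoff (YYYYMMDD). Nothing is deleted.
list_expired() {
  local dir="$1" cutoff="$2" f base day
  for f in "$dir"/db-*.sql.gz; do
    [ -e "$f" ] || continue            # no matches: the glob stays literal
    base="${f##*/}"                    # strip the directory
    day="${base#db-}"                  # strip the prefix
    day="${day%%[-.]*}"                # keep only YYYYMMDD
    case "$day" in
      ''|*[!0-9]*) continue ;;         # skip names without a numeric date
    esac
    if [ "$day" -lt "$cutoff" ]; then
      echo "$f"
    fi
  done
  return 0
}

# Demo on empty placeholder files
tmp="$(mktemp -d)"
touch "$tmp/db-20231201-020000.sql.gz" "$tmp/db-20240110-020000.sql.gz"
list_expired "$tmp" 20231216
rm -rf "$tmp"
```

With GNU `date`, a 30-day cutoff could be computed as `date -d '30 days ago' +%Y%m%d`; that flag is not available in every `date` implementation, so treat it as an assumption about the container image.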

Restore Procedures

Restore from Volume Snapshot

  1. Create PVC from snapshot:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data-restored
spec:
  dataSource:
    name: postgres-volume-snapshot-20240115
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 200Gi
  2. Update deployment to use restored PVC
  3. Restart application
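
One way to point a Deployment at the restored PVC is a JSON patch. The sketch below only builds the patch document; the helper name `pvc_patch`, the Deployment name `postgres`, and the assumption that the data volume is the first entry in the pod's `volumes` list are all hypothetical and need checking against your own manifest.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a JSON patch that points the volume at the given
# index in the pod template to a different PVC.
# Usage: pvc_patch <volume-index> <claim-name>
pvc_patch() {
  local index="$1" claim="$2"
  printf '[{"op":"replace","path":"/spec/template/spec/volumes/%s/persistentVolumeClaim/claimName","value":"%s"}]\n' \
    "$index" "$claim"
}

# Print the patch for the restored claim from the example above
pvc_patch 0 postgres-data-restored
```

It could then be applied with `kubectl patch deployment postgres --type=json -p "$(pvc_patch 0 postgres-data-restored)"`. Because this changes the pod template, the Deployment rolls out new Pods, which also restarts the application.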

Restore from Application Backup

  1. Create a new PVC for the restored data
  2. Start the application with the new PVC
  3. Restore the backup:
apiVersion: batch/v1
kind: Job
metadata:
  name: postgres-restore
spec:
  template:
    spec:
      containers:
      - name: restore
        image: postgres:14
        command:
        - /bin/bash
        - -c
        - |
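          # Fail the Job if decompression fails or psql hits an SQL error
          set -euo pipefail
          gunzip -c /backup/db-20240115.sql.gz | psql -v ON_ERROR_STOP=1 -h postgres -U postgres mydb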
        env:
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        volumeMounts:
        - name: backup-storage
          mountPath: /backup
      volumes:
      - name: backup-storage
        persistentVolumeClaim:
          claimName: backup-storage
      restartPolicy: OnFailure

Testing Backups

Regularly test your backups to ensure they work:

  1. Test restore procedures - Periodically restore from backups to verify they work
  2. Verify data integrity - Check that restored data is complete and correct
  3. Measure restore time - Know how long a restore takes and compare it with your Recovery Time Objective (RTO)
  4. Document procedures - Document restore steps so anyone can perform them
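
Restore time is simple to record during a test restore. The following is a minimal sketch, assuming the restore can be run as a single command; the helper name `time_restore` is hypothetical. It reports the elapsed time in whole seconds and returns non-zero if the command fails or the target is exceeded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a restore command, report elapsed seconds, and
# fail if it takes longer than the target RTO (in seconds).
# Usage: time_restore <rto-seconds> <command> [args...]
time_restore() {
  local rto="$1" start end elapsed
  shift
  start="$(date +%s)"
  "$@" || { echo "restore command failed" >&2; return 1; }
  end="$(date +%s)"
  elapsed=$((end - start))
  echo "restore took ${elapsed}s (target ${rto}s)"
  [ "$elapsed" -le "$rto" ]
}

# Demo with a stand-in command; replace `sleep 1` with the real restore
time_restore 60 sleep 1
```

For the restore Job shown earlier, something like `time_restore 900 kubectl wait --for=condition=complete job/postgres-restore --timeout=15m` would measure the time until the Job completes, given a 15-minute target chosen purely for illustration.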

Best Practices

  1. Automate backups - Use CronJobs or Velero schedules for regular backups
  2. Test restores regularly - Don’t wait for a disaster to test restore procedures
  3. Store backups offsite - Keep copies in different locations/clouds
  4. Encrypt backups - Protect sensitive backup data
  5. Version backups - Keep multiple versions (daily, weekly, monthly)
  6. Monitor backup jobs - Ensure backups complete successfully
  7. Document procedures - Document backup and restore procedures
  8. Set retention policies - Automatically clean up old backups
  9. Use both strategies - Combine volume-level and application-level backups
  10. Regular review - Review and update backup strategies regularly

Backup Tools Comparison

Tool              | Type                 | Pros                       | Cons
------------------|----------------------|----------------------------|----------------------
CSI Snapshots     | Volume-level         | Native K8s, fast           | Requires CSI support
Velero            | Application + Volume | Comprehensive, cross-cloud | Additional component
Application tools | Application-level    | Application-aware          | Per-application setup
Storage system    | Volume-level         | Storage-native             | Vendor-specific

See Also