Jobs
Jobs run one or more pods until a specified number of them successfully terminate. Unlike Deployments and StatefulSets that run continuously, Jobs are designed for tasks that run to completion—one-time work like data processing, backups, database migrations, or batch operations.
What Are Jobs?
A Job creates one or more pods and ensures that a specified number of them successfully complete. When pods complete successfully, the Job is considered complete. If a pod fails, the Job can automatically retry by creating new pods.
Why Use Jobs?
Jobs are perfect for:
✅ One-time tasks - Run a task once and stop
✅ Batch processing - Process a batch of data
✅ Database migrations - Run migration scripts
✅ Backups - One-time backup operations
✅ Data transformations - ETL jobs and data processing
✅ Parallel processing - Run multiple pods in parallel
✅ Retry logic - Automatic retries on failure
Job vs Deployment
Jobs and Deployments serve different purposes:
Use Jobs when:
- Task runs to completion
- One-time or batch work
- Need to ensure task succeeds
Use Deployments when:
- Application runs continuously
- Need to maintain replica count
- Stateless service
Job Completion
A Job finishes in one of three ways:
- Success - the specified number of pods complete successfully
- Failure - the backoff limit or active deadline is exceeded
- Manual deletion - the Job is deleted (by default, its pods are deleted with it)
Basic Job Example
Here’s a simple Job that runs a task to completion:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
Key fields:
- spec.template - Pod template (required)
- restartPolicy - Must be Never or OnFailure (not Always)
- backoffLimit - Maximum number of retries (default: 6)
Job Types
Non-Parallel Jobs
Runs a single pod until successful completion:
apiVersion: batch/v1
kind: Job
metadata:
  name: single-task
spec:
  template:
    spec:
      containers:
      - name: task
        image: busybox
        command: ["sh", "-c", "echo 'Task completed' && sleep 5"]
      restartPolicy: Never
Parallel Jobs with Fixed Completion Count
Runs multiple pods in parallel until a specific number succeed:
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-fixed
spec:
  completions: 5   # Need 5 successful completions
  parallelism: 2   # Run 2 pods at a time
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo Processing item && sleep 10"]
      restartPolicy: Never
Parallel Jobs with Work Queue
Multiple pods process items from a shared queue. With completions left unset, the Job is complete once any one pod exits successfully and all pods have terminated:
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-queue
spec:
  parallelism: 3      # Run 3 pods in parallel
  completions: null   # No fixed completion count
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "process-queue-items"]
      restartPolicy: Never
Job Lifecycle
Job Completion Modes
NonIndexed (Default)
Each pod is independent. Job completes when the required number of pods succeed:
spec:
  completions: 5
  parallelism: 2
  completionMode: NonIndexed   # Default
Indexed
Each pod gets a unique index (0 to completions-1). Useful for partitioning work:
spec:
  completions: 5
  parallelism: 2
  completionMode: Indexed
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "process-item-$JOB_COMPLETION_INDEX"]
The JOB_COMPLETION_INDEX environment variable contains the pod’s index.
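Indexed mode pairs naturally with pre-partitioned input. As a sketch (the shard file names and the `shard-data` PVC are hypothetical), each pod can read the one shard matching its index:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-shards
spec:
  completions: 5
  parallelism: 5
  completionMode: Indexed
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        # Each pod processes exactly one shard:
        # index 0 reads /data/shard-0.csv, index 1 reads /data/shard-1.csv, ...
        command: ["sh", "-c", "wc -l /data/shard-$JOB_COMPLETION_INDEX.csv"]
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: shard-data   # hypothetical PVC holding the shards
      restartPolicy: Never
```

Because every index must succeed exactly once, a failed pod is replaced with a new pod carrying the same index, so no shard is skipped or processed twice.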
Retry and Backoff
Jobs automatically retry failed pods with exponential backoff:
spec:
  backoffLimit: 4              # Retry up to 4 times
  activeDeadlineSeconds: 300   # Kill job after 5 minutes
Backoff behavior:
- First retry: after 10 seconds
- Second retry: after 20 seconds
- Third retry: after 40 seconds
- And so on, doubling each time (exponential backoff, capped at six minutes)
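The schedule above can be sketched numerically: start at 10 seconds, double after each failure, and clamp at 360 seconds (six minutes):

```shell
# Sketch of the Job back-off delay schedule:
# 10s base, doubled per failure, capped at 360s.
delay=10
for retry in 1 2 3 4 5 6 7 8; do
  echo "retry $retry: ${delay}s"
  delay=$((delay * 2))
  if [ "$delay" -gt 360 ]; then delay=360; fi
done
```

Retries 1 through 6 wait 10s, 20s, 40s, 80s, 160s, and 320s; from there every further retry waits the capped 360s.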
Job Completion and Cleanup
By default, completed Jobs and their pods remain in the cluster. You can configure automatic cleanup:
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-example
spec:
  ttlSecondsAfterFinished: 100   # Delete job 100 seconds after completion
  template:
    spec:
      containers:
      - name: task
        image: busybox
        command: ["echo", "done"]
      restartPolicy: Never
Or use a CronJob’s successfulJobsHistoryLimit and failedJobsHistoryLimit for automatic cleanup.
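For recurring work, those CronJob history knobs look like this sketch (the name and schedule are placeholders):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-task
spec:
  schedule: "0 2 * * *"          # every night at 02:00
  successfulJobsHistoryLimit: 3  # keep only the last 3 successful Jobs
  failedJobsHistoryLimit: 1      # keep only the last failed Job
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: task
            image: busybox
            command: ["echo", "done"]
          restartPolicy: Never
```

Jobs beyond the history limits are garbage-collected automatically, along with their pods.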
Common Use Cases
1. Database Migration
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  template:
    spec:
      containers:
      - name: migration
        image: postgres:15
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: url
        command:
        - /bin/sh
        - -c
        - |
          psql $DATABASE_URL -f /migrations/001_schema.sql
          psql $DATABASE_URL -f /migrations/002_data.sql
        volumeMounts:
        - name: migrations
          mountPath: /migrations
      volumes:
      - name: migrations
        configMap:
          name: migration-scripts
      restartPolicy: Never
  backoffLimit: 3
  activeDeadlineSeconds: 600
2. Data Processing
apiVersion: batch/v1
kind: Job
metadata:
  name: data-processing
spec:
  completions: 10
  parallelism: 3
  template:
    spec:
      containers:
      - name: processor
        image: data-processor:latest
        command: ["process-batch"]
        env:
        - name: BATCH_SIZE
          value: "1000"
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
      restartPolicy: Never
3. Backup Job
apiVersion: batch/v1
kind: Job
metadata:
  name: backup
spec:
  template:
    spec:
      containers:
      - name: backup
        image: backup-tool:latest
        command:
        - /bin/sh
        - -c
        - |
          backup-database
          upload-to-s3 s3://backups/$(date +%Y%m%d).sql
        volumeMounts:
        - name: backup-dir
          mountPath: /backups
      volumes:
      - name: backup-dir
        emptyDir: {}
      restartPolicy: OnFailure
  backoffLimit: 2
  activeDeadlineSeconds: 3600
Best Practices
- Set an appropriate restartPolicy - Use Never or OnFailure, not Always
- Set backoffLimit - Control how many times to retry:
  backoffLimit: 4  # Retry up to 4 times
- Use activeDeadlineSeconds - Prevent jobs from running indefinitely:
  activeDeadlineSeconds: 3600  # 1 hour timeout
- Set resource limits - Jobs should have resource constraints:
  resources:
    requests:
      memory: "256Mi"
      cpu: "250m"
    limits:
      memory: "512Mi"
      cpu: "500m"
- Use ttlSecondsAfterFinished - Automatically clean up completed jobs:
  ttlSecondsAfterFinished: 300  # Delete after 5 minutes
- Handle failures gracefully - Make sure your application exits with proper exit codes:
  - Exit code 0: Success
  - Non-zero: Failure (triggers retry)
- Use parallel jobs wisely - Balance parallelism with resource availability
- Monitor job status - Check job completion and pod logs
- Use ConfigMaps/Secrets - Store configuration and credentials securely
- Consider CronJobs - For recurring tasks
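Pulling several of these practices together, a hardened Job spec might look like this sketch (the image name and command are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: hardened-task
spec:
  backoffLimit: 4                # retry at most 4 times
  activeDeadlineSeconds: 3600    # give up after 1 hour
  ttlSecondsAfterFinished: 300   # garbage-collect 5 minutes after finishing
  template:
    spec:
      containers:
      - name: task
        image: my-task:1.0       # hypothetical image
        command: ["run-task"]    # hypothetical entrypoint
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
      restartPolicy: OnFailure
```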
Common Operations
Create a Job
# Create from YAML
kubectl create -f job.yaml
# Create from command
kubectl create job myjob --image=busybox -- echo "Hello"
View Job Status
# List jobs
kubectl get jobs
# Detailed information
kubectl describe job myjob
# View job pods
kubectl get pods -l job-name=myjob
# View pod logs
kubectl logs -l job-name=myjob
Delete a Job
# Delete job (pods are also deleted)
kubectl delete job myjob
# Delete without cascading (orphans pods)
kubectl delete job myjob --cascade=orphan
Check Job Completion
# Check if job succeeded
kubectl get job myjob -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}'
# View job completion time
kubectl get job myjob -o jsonpath='{.status.completionTime}'
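In scripts (for example, CI pipelines) it is often simpler to block until the Job reports the Complete condition:

```shell
# Wait up to 2 minutes for the Job to complete; exits non-zero on timeout
kubectl wait --for=condition=complete job/myjob --timeout=120s
```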
Troubleshooting
Job Not Starting
# Check job events
kubectl describe job myjob
# Check for resource constraints
kubectl get events --sort-by=.metadata.creationTimestamp
# Verify pod template
kubectl get job myjob -o yaml
Pods Failing
# Check pod logs
kubectl logs -l job-name=myjob
# Check pod events
kubectl describe pod -l job-name=myjob
# Check exit codes
kubectl get pods -l job-name=myjob -o jsonpath='{.items[*].status.containerStatuses[*].state.terminated.exitCode}'
Job Hanging
# Check active deadline
kubectl get job myjob -o jsonpath='{.spec.activeDeadlineSeconds}'
# Check job status
kubectl describe job myjob
# Check if pods are stuck
kubectl get pods -l job-name=myjob
See Also
- CronJobs - For scheduled, recurring Jobs
- Deployments - For continuously running applications
- ConfigMaps - For job configuration
- Secrets - For sensitive job data