Capabilities
Linux capabilities break root privileges down into smaller, specific permissions. Instead of running as root (which has all privileges), containers can be granted only the specific capabilities they need. Think of it like giving someone a key to a specific room instead of a master key to the entire building.
What are Capabilities?
Traditional Unix permissions are binary: you’re either root (all-powerful) or a regular user (limited). Capabilities provide granular control, allowing processes to have specific privileges without full root access.
Common Capabilities
- NET_BIND_SERVICE - Bind to ports < 1024
- CHOWN - Change file ownership
- DAC_OVERRIDE - Bypass file read/write permissions
- NET_RAW - Create raw sockets (used by ping)
- SYS_ADMIN - A broad set of administrative operations (mount, swapon, and many others)
- SYS_TIME - Set system time
How Capabilities Work
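The kernel tracks capabilities per process in several sets (permitted, effective, inheritable, bounding, and ambient), and exposes each set as a hexadecimal bitmask in /proc/<pid>/status (fields such as CapEff). As a minimal sketch of that representation, the Python below decodes such a mask into capability names. The function name decode_caps and the sample mask are hypothetical; the bit positions follow linux/capability.h, and the names omit the CAP_ prefix, as Kubernetes manifests do.

```python
# Capability names indexed by bit number, as defined in linux/capability.h.
CAP_NAMES = [
    "CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID",
    "KILL", "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE",
    "NET_BIND_SERVICE", "NET_BROADCAST", "NET_ADMIN", "NET_RAW",
    "IPC_LOCK", "IPC_OWNER", "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT",
    "SYS_PTRACE", "SYS_PACCT", "SYS_ADMIN", "SYS_BOOT", "SYS_NICE",
    "SYS_RESOURCE", "SYS_TIME", "SYS_TTY_CONFIG", "MKNOD", "LEASE",
    "AUDIT_WRITE", "AUDIT_CONTROL", "SETFCAP", "MAC_OVERRIDE",
    "MAC_ADMIN", "SYSLOG", "WAKE_ALARM", "BLOCK_SUSPEND", "AUDIT_READ",
    "PERFMON", "BPF", "CHECKPOINT_RESTORE",
]


def decode_caps(hex_mask):
    """Return the names of the capabilities set in a hex bitmask string."""
    mask = int(hex_mask, 16)
    # Bit N of the mask corresponds to CAP_NAMES[N].
    return [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]


if __name__ == "__main__":
    # Hypothetical sample value: only bit 10 (NET_BIND_SERVICE) is set.
    print(decode_caps("0000000000000400"))
```

A mask to decode could be read from a running container with something like kubectl exec <pod-name> -- grep CapEff /proc/1/status, assuming grep is available in the image.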
Managing Capabilities
Dropping All Capabilities
The most secure approach is to drop everything, then add back only what’s needed:
apiVersion: v1
kind: Pod
metadata:
  name: minimal-caps
spec:
  containers:
  - name: app
    image: nginx:latest
    securityContext:
      capabilities:
        drop:
        - ALL
Adding Specific Capabilities
Add only the capabilities your application needs:
apiVersion: v1
kind: Pod
metadata:
  name: web-server
spec:
  containers:
  - name: nginx
    image: nginx:latest
    securityContext:
      capabilities:
        drop:
        - ALL
        add:
        - NET_BIND_SERVICE # Allow binding to port 80
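When manifests are generated by tooling, the drop-all-then-add pattern can be produced programmatically. The sketch below is one hedged way to do that in Python; build_security_context is a hypothetical helper, not part of any Kubernetes client library. It strips a leading CAP_ because Kubernetes manifests list capability names without that prefix.

```python
import json


def build_security_context(needed_caps):
    """Build a container securityContext that drops ALL and adds only needed_caps."""
    add = []
    for cap in needed_caps:
        name = cap.upper()
        # Kubernetes manifests use names without the CAP_ prefix.
        if name.startswith("CAP_"):
            name = name[len("CAP_"):]
        # Keep the first occurrence only, preserving order.
        if name not in add:
            add.append(name)
    capabilities = {"drop": ["ALL"]}
    if add:
        capabilities["add"] = add
    return {"capabilities": capabilities}


if __name__ == "__main__":
    print(json.dumps(build_security_context(["CAP_NET_BIND_SERVICE"]), indent=2))
```

The resulting dictionary has the same shape as the securityContext block in the manifest above and can be serialized to JSON or YAML by whatever tool writes the final manifest.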
Common Capability Patterns
Web Server (needs to bind to port 80)
securityContext:
  capabilities:
    drop:
    - ALL
    add:
    - NET_BIND_SERVICE
Database (may need file operations)
securityContext:
  capabilities:
    drop:
    - ALL
    add:
    - CHOWN
    - DAC_OVERRIDE
    - FOWNER
Network Tools (needs raw sockets)
securityContext:
  capabilities:
    drop:
    - ALL
    add:
    - NET_RAW
Complete Example
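Here’s a hardened pod with minimal capabilities. Because the container runs as a non-root user, the added capability may not reach the process’s effective set, so verify that the application can actually bind port 80 in your environment: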
apiVersion: v1
kind: Pod
metadata:
  name: secure-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
  containers:
  - name: app
    image: nginx:latest
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
        add:
        - NET_BIND_SERVICE
    ports:
    - containerPort: 80
Capability Categories
File System Capabilities
- CHOWN - Change file ownership
- DAC_OVERRIDE - Bypass file permissions
- DAC_READ_SEARCH - Bypass read/search permissions
- FOWNER - Bypass permission checks on operations that normally require the process to own the file (e.g., chmod)
- FSETID - Don’t clear setuid/setgid on file modification
- SETGID - Manipulate process GIDs
- SETUID - Manipulate process UIDs
Network Capabilities
- NET_BIND_SERVICE - Bind to ports < 1024
- NET_RAW - Create raw sockets
- NET_ADMIN - Network administration (configure interfaces, routing)
System Capabilities
- SYS_ADMIN - System administration (mount, swapon, etc.)
- SYS_TIME - Set system time
- SYS_MODULE - Load/unload kernel modules
- SYS_PTRACE - Trace processes
Best Practices
- Drop ALL first - Always start by dropping all capabilities
- Add minimum needed - Only add capabilities that are absolutely required
- Document exceptions - If privileged capabilities are needed, document why
- Test thoroughly - Verify applications work with minimal capabilities
- Regular audits - Review capabilities periodically
- Use alternatives - Consider alternatives (e.g., run on non-privileged ports instead of NET_BIND_SERVICE)
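The audit step in the list above can be partly automated. The following Python is a minimal sketch, assuming the Pod is available as a parsed dictionary (for example, loaded from the JSON printed by kubectl get pod <pod-name> -o json). The function names and the default allowlist are hypothetical choices for illustration, not a standard policy.

```python
def audit_container(container, allowed_adds=frozenset({"NET_BIND_SERVICE"})):
    """Return a list of findings for one container dict from a Pod spec."""
    findings = []
    sc = container.get("securityContext") or {}
    caps = sc.get("capabilities") or {}
    drops = {c.upper() for c in caps.get("drop") or []}
    adds = {c.upper() for c in caps.get("add") or []}

    if sc.get("privileged"):
        findings.append("runs privileged")
    if "ALL" not in drops:
        findings.append("does not drop ALL")
    extra = sorted(adds - set(allowed_adds))
    if extra:
        findings.append("adds capabilities outside the allowlist: " + ", ".join(extra))
    return findings


def audit_pod(pod):
    """Map container name to findings, for containers with at least one finding."""
    spec = pod.get("spec") or {}
    containers = (spec.get("initContainers") or []) + (spec.get("containers") or [])
    report = {}
    for container in containers:
        findings = audit_container(container)
        if findings:
            report[container.get("name", "<unnamed>")] = findings
    return report
```

An empty report means every container drops ALL, is not privileged, and adds nothing beyond the allowlist. It does not show that the remaining capabilities are actually required; that still needs a manual review.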
Common Mistakes
❌ Not Dropping Capabilities
# Bad: Container keeps the runtime's default capabilities and adds to them
securityContext:
  capabilities:
    add:
    - NET_BIND_SERVICE
✅ Dropping All First
# Good: Drop all, then add only what's needed
securityContext:
  capabilities:
    drop:
    - ALL
    add:
    - NET_BIND_SERVICE
❌ Using Privileged Instead
# Bad: Too permissive
securityContext:
  privileged: true
✅ Using Specific Capabilities
# Good: Minimal privileges
securityContext:
  capabilities:
    drop:
    - ALL
    add:
    - NET_BIND_SERVICE
Troubleshooting
Permission Denied Errors
If you see permission denied errors:
- Check which capability is needed
- Verify the capability is added
- Test with capsh to see current capabilities
# Check capabilities in a running container (requires capsh in the image)
kubectl exec <pod-name> -- capsh --print
Finding Required Capabilities
Use strace to identify system calls:
# Run application and trace system calls
kubectl exec <pod-name> -- strace -e trace=open,openat,chown <command>
# Look for EPERM (operation not permitted) errors
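Long strace logs are easier to read once the failing calls are tallied. The sketch below is a hedged Python filter that counts calls failing with EPERM (operation not permitted) or EACCES (permission denied); binding a privileged port, for example, typically fails with EACCES. The function name permission_failures and the sample lines are hypothetical, and the pattern assumes strace's usual one-call-per-line output.

```python
import re
from collections import Counter

# Matches failed calls such as:
#   chown("/data", 1000, 1000) = -1 EPERM (Operation not permitted)
# An optional "[pid N] " prefix appears when strace follows child processes.
FAILED_CALL = re.compile(
    r"^(?:\[pid\s+\d+\]\s+)?(?P<syscall>\w+)\(.*\)\s+=\s+-1\s+(?P<errno>EPERM|EACCES)\b"
)


def permission_failures(strace_lines):
    """Count (syscall, errno) pairs for calls that failed with EPERM or EACCES."""
    counts = Counter()
    for line in strace_lines:
        match = FAILED_CALL.match(line.strip())
        if match:
            counts[(match["syscall"], match["errno"])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines in strace's output format.
    sample = [
        'chown("/data", 1000, 1000) = -1 EPERM (Operation not permitted)',
        'openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3',
    ]
    for (syscall, errno), count in permission_failures(sample).items():
        print(syscall, errno, count)
```

The failing system call then points toward the capability to investigate: a chown failure suggests CHOWN, for instance. The mapping is not one-to-one, so treat the tally as a starting point.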
Testing Capabilities
Test if a capability works:
# Test NET_BIND_SERVICE capability
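# The override repeats the image because it replaces the generated container entry
kubectl run test-pod --image=nginx --overrides='
{
  "apiVersion": "v1",
  "spec": {
    "containers": [{
      "name": "nginx",
      "image": "nginx",
      "securityContext": {
        "capabilities": {
          "drop": ["ALL"],
          "add": ["NET_BIND_SERVICE"]
        }
      }
    }]
  }
}'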
Capability Alternatives
Sometimes you can avoid needing capabilities:
Instead of NET_BIND_SERVICE
Run on a non-privileged port and expose it through a Service:
containers:
- name: app
  ports:
  - containerPort: 8080 # Non-privileged port
---
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  ports:
  - port: 80
    targetPort: 8080
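Whether a port needs NET_BIND_SERVICE comes down to a threshold check. The sketch below states that rule in Python. The helper name is hypothetical, and the default of 1024 is the usual Linux value of the net.ipv4.ip_unprivileged_port_start sysctl, which some environments lower.

```python
def needs_net_bind_service(port, unprivileged_port_start=1024):
    """Return True if binding to `port` normally requires NET_BIND_SERVICE."""
    if not 0 < port <= 65535:
        raise ValueError(f"invalid port: {port}")
    # Ports below the threshold are privileged; all others can be bound freely.
    return port < unprivileged_port_start


if __name__ == "__main__":
    for port in (80, 443, 8080):
        print(port, needs_net_bind_service(port))
```

With the default threshold, port 8080 from the example above needs no capability, while port 80 on the Service side is handled by Kubernetes networking rather than by the container process.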
Instead of SYS_TIME
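Containers share the node’s clock, so rely on NTP or another external time synchronization service on the node instead of setting the time from inside a container.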
Instead of SYS_ADMIN
Avoid mounting filesystems from inside the container; use Kubernetes volumes instead.
See Also
- Workload Hardening - Overview of workload security
- SecurityContext - Pod security context
- Seccomp & AppArmor - System call restrictions
- Pod Security Standards - Built-in security profiles