Capabilities

Linux capabilities break root privileges down into smaller, specific permissions. Instead of running as root (which has all privileges), containers can be granted only the specific capabilities they need. Think of it like giving someone a key to a specific room instead of a master key to the entire building.

What are Capabilities?

Traditional Unix privilege checks are binary: a process is either root (all-powerful) or a regular user (limited). Capabilities provide finer-grained control, allowing a process to hold specific privileges without full root access.

Common Capabilities

  • NET_BIND_SERVICE - Bind to ports < 1024
  • CHOWN - Change file ownership
  • DAC_OVERRIDE - Bypass file read, write, and execute permission checks
  • NET_RAW - Create raw sockets (used by ping)
  • SYS_ADMIN - Broad set of administrative operations (mount, swapon, and many others)
  • SYS_TIME - Set system time

How Capabilities Work

flowchart TD
    A[Container Process] --> B{Needs Privilege?}
    B --> C{Has Capability?}
    C -->|Yes| D[Operation Allowed]
    C -->|No| E[Operation Denied]
    style A fill:#e1f5ff
    style D fill:#e8f5e9
    style E fill:#ffebee
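
The "Has Capability?" decision above comes down to a bit test: the kernel tracks each process's effective capabilities as a bitmask, which is visible as the CapEff line in /proc/<pid>/status. The sketch below is illustrative only. has_cap is a hypothetical helper name, the mask is a made-up example, and the bit numbers (10 for NET_BIND_SERVICE, 21 for SYS_ADMIN) are taken from the kernel's linux/capability.h header.

```shell
#!/bin/sh
# has_cap HEXMASK BIT
# Succeeds (exit 0) if capability number BIT is set in HEXMASK,
# where HEXMASK is a CapEff-style hex string without the 0x prefix.
has_cap() {
  mask=$((0x$1))
  [ $(( (mask >> $2) & 1 )) -eq 1 ]
}

# Example mask with only bit 10 (NET_BIND_SERVICE) set.
eff=0000000000000400

if has_cap "$eff" 10; then
  echo "NET_BIND_SERVICE: operation allowed"
else
  echo "NET_BIND_SERVICE: operation denied"
fi

if has_cap "$eff" 21; then
  echo "SYS_ADMIN: operation allowed"
else
  echo "SYS_ADMIN: operation denied"
fi
```

To check a real process, replace the example mask with the value reported by grep CapEff /proc/self/status.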

Managing Capabilities

Dropping All Capabilities

The most secure approach is to drop everything, then add back only what’s needed:

apiVersion: v1
kind: Pod
metadata:
  name: minimal-caps
spec:
  containers:
  - name: app
    image: nginx:latest
    securityContext:
      capabilities:
        drop:
        - ALL

Adding Specific Capabilities

Add only the capabilities your application needs:

apiVersion: v1
kind: Pod
metadata:
  name: web-server
spec:
  containers:
  - name: nginx
    image: nginx:latest
    securityContext:
      capabilities:
        drop:
        - ALL
        add:
        - NET_BIND_SERVICE  # Allow binding to port 80

Common Capability Patterns

Web Server (needs to bind to port 80)

securityContext:
  capabilities:
    drop:
    - ALL
    add:
    - NET_BIND_SERVICE

Database (may need file operations)

securityContext:
  capabilities:
    drop:
    - ALL
    add:
    - CHOWN
    - DAC_OVERRIDE
    - FOWNER

Network Tools (needs raw sockets)

securityContext:
  capabilities:
    drop:
    - ALL
    add:
    - NET_RAW

Complete Example

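Here’s a hardened pod with minimal capabilities. Because it runs as a non-root user, the added capability may not reach the process’s effective set unless the binary carries a matching file capability, so verify the result with capsh as described under Troubleshooting: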

apiVersion: v1
kind: Pod
metadata:
  name: secure-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
  containers:
  - name: app
    image: nginx:latest
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
        add:
        - NET_BIND_SERVICE
    ports:
    - containerPort: 80

Capability Categories

File System Capabilities

  • CHOWN - Change file ownership
  • DAC_OVERRIDE - Bypass file permissions
  • DAC_READ_SEARCH - Bypass read/search permissions
  • FOWNER - Bypass permission checks on operations that normally require the process to own the file (e.g., chmod)
  • FSETID - Don’t clear setuid/setgid bits when a file is modified

Process Identity Capabilities

  • SETGID - Manipulate process GIDs
  • SETUID - Manipulate process UIDs

Network Capabilities

  • NET_BIND_SERVICE - Bind to ports < 1024
  • NET_RAW - Create raw sockets
  • NET_ADMIN - Network administration (configure interfaces, routing)

System Capabilities

  • SYS_ADMIN - System administration (mount, swapon, etc.)
  • SYS_TIME - Set system time
  • SYS_MODULE - Load/unload kernel modules
  • SYS_PTRACE - Trace processes

Best Practices

  1. Drop ALL first - Always start by dropping all capabilities
  2. Add minimum needed - Only add capabilities that are absolutely required
  3. Document exceptions - If privileged capabilities are needed, document why
  4. Test thoroughly - Verify applications work with minimal capabilities
  5. Regular audits - Review capabilities periodically
  6. Use alternatives - Consider alternatives (e.g., run on non-privileged ports instead of NET_BIND_SERVICE)
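
For the periodic review in item 5, one possible starting point is to list what each pod adds and drops. This is a sketch, assuming kubectl is already configured for the cluster. audit_caps is a hypothetical helper name, and the column layout is only one way to present the data.

```shell
#!/bin/sh
# audit_caps: list the capabilities that each pod's containers add and
# drop, across all namespaces. Only regular containers are covered,
# not init or ephemeral containers.
audit_caps() {
  kubectl get pods --all-namespaces -o custom-columns='NAMESPACE:.metadata.namespace,POD:.metadata.name,ADD:.spec.containers[*].securityContext.capabilities.add,DROP:.spec.containers[*].securityContext.capabilities.drop'
}

# Usage (requires cluster access):
#   audit_caps
```

Rows whose DROP column does not include ALL are candidates for a closer look.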

Common Mistakes

❌ Not Dropping Capabilities

# Bad: Container keeps the runtime's default capability set
securityContext:
  capabilities:
    add:
    - NET_BIND_SERVICE

✅ Dropping All First

# Good: Drop all, then add only what's needed
securityContext:
  capabilities:
    drop:
    - ALL
    add:
    - NET_BIND_SERVICE

❌ Using Privileged Instead

# Bad: Too permissive (grants every capability plus access to host devices)
securityContext:
  privileged: true

✅ Using Specific Capabilities

# Good: Minimal privileges
securityContext:
  capabilities:
    drop:
    - ALL
    add:
    - NET_BIND_SERVICE

Troubleshooting

Permission Denied Errors

If you see permission denied errors:

  1. Check which capability is needed
  2. Verify the capability is added
  3. Test with capsh to see current capabilities

# Check capabilities in a running container
kubectl exec <pod-name> -- capsh --print
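
Minimal images often do not ship capsh. In that case the raw masks can usually still be read from /proc/1/status inside the container (for example with kubectl exec <pod-name> -- grep Cap /proc/1/status, if the image has grep) and decoded on a workstation. The helper below is a sketch: decode_caps is a hypothetical name, and its table covers only capability numbers 0 through 25 as defined in linux/capability.h, so any higher bits are ignored.

```shell
#!/bin/sh
# decode_caps HEXMASK
# Print one capability name per line for every bit set in HEXMASK
# (a CapEff-style hex string without the 0x prefix). The list is in
# capability-number order, starting at bit 0.
decode_caps() {
  mask=$((0x$1))
  bit=0
  for name in CHOWN DAC_OVERRIDE DAC_READ_SEARCH FOWNER FSETID KILL \
              SETGID SETUID SETPCAP LINUX_IMMUTABLE NET_BIND_SERVICE \
              NET_BROADCAST NET_ADMIN NET_RAW IPC_LOCK IPC_OWNER \
              SYS_MODULE SYS_RAWIO SYS_CHROOT SYS_PTRACE SYS_PACCT \
              SYS_ADMIN SYS_BOOT SYS_NICE SYS_RESOURCE SYS_TIME; do
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      echo "$name"
    fi
    bit=$((bit + 1))
  done
}

# A container that dropped ALL and added NET_BIND_SERVICE, running as
# root, would be expected to report this mask:
decode_caps 0000000000000400   # prints NET_BIND_SERVICE
```

Where libcap's tools are installed on the workstation, capsh --decode=<mask> gives the full list without a hand-written table.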

Finding Required Capabilities

Use strace to identify system calls:

# Run application and trace system calls
kubectl exec <pod-name> -- strace -e trace=open,openat,chown <command>

# Look for calls that fail with EPERM (operation not permitted)
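
To narrow a long trace down to the calls worth investigating, the output can be saved to a file (strace's -o option) and filtered for failures. The sketch below assumes such a file exists. failed_calls is a hypothetical helper name. EPERM is the error a missing capability usually produces, while EACCES tends to point at ordinary file permissions.

```shell
#!/bin/sh
# failed_calls FILE
# Print the lines of a saved strace log whose system call failed with
# EPERM (operation not permitted) or EACCES (permission denied).
failed_calls() {
  grep -E '= -1 (EPERM|EACCES) ' "$1"
}

# Usage, assuming the trace was saved to trace.log:
#   failed_calls trace.log
```

Each matching line names the system call and its arguments, which usually points to the capability that is missing (for example, a failing chown suggests CHOWN).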

Testing Capabilities

Test if a capability works:

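# Test NET_BIND_SERVICE capability
# With the default merge-style override the containers list is replaced,
# so the image is repeated inside the override
kubectl run test-pod --image=nginx --overrides='
{
  "spec": {
    "containers": [{
      "name": "nginx",
      "image": "nginx",
      "securityContext": {
        "capabilities": {
          "drop": ["ALL"],
          "add": ["NET_BIND_SERVICE"]
        }
      }
    }]
  }
}'

# Check whether the container starts with this capability set
kubectl get pod test-pod
kubectl logs test-pod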

Capability Alternatives

Sometimes you can avoid needing capabilities:

Instead of NET_BIND_SERVICE

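Run on a non-privileged port and expose it through a Service:

# Pod spec fragment: the container listens on a non-privileged port
containers:
- name: app
  ports:
  - containerPort: 8080  # Non-privileged port
---
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  selector:
    app: app  # Assumed label; must match the labels on your pod
  ports:
  - port: 80
    targetPort: 8080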

Instead of SYS_TIME

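Containers share the node’s clock, so keep the node itself synchronized with NTP or another external time source rather than setting the time from a container.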

Instead of SYS_ADMIN

Avoid mounting filesystems from inside the container; let Kubernetes mount them as volumes instead.

See Also