Datadog

Datadog is a cloud-native monitoring and security platform that provides comprehensive observability for applications and infrastructure. It offers log collection, metrics monitoring, distributed tracing, and security monitoring, all integrated into a single platform.

What is Datadog?

Datadog provides:

  • Log Management - Centralized log collection and analysis
  • APM - Application Performance Monitoring with distributed tracing
  • Infrastructure Monitoring - Metrics for hosts, containers, and services
  • Real User Monitoring - Frontend performance monitoring
  • Security Monitoring - Security analytics and threat detection
  • Dashboards - Customizable visualizations

graph TB
    A[Kubernetes Cluster] --> B[Datadog Agent]
    B --> C[Datadog Platform]
    D[Application Logs] --> B
    E[System Logs] --> B
    F[Container Metrics] --> B
    G[APM Traces] --> B
    C --> H[Log Management]
    C --> I[Metrics]
    C --> J[APM]
    C --> K[Dashboards]
    style A fill:#e1f5ff
    style B fill:#e8f5e9
    style C fill:#fff4e1
    style H fill:#f3e5f5

Datadog Agent

The Datadog Agent is a lightweight daemon that runs on each node (as a DaemonSet on Kubernetes) and collects logs, metrics, and traces:

  • Log collection - Tails logs from containers and applications
  • Metrics collection - System, container, and application metrics
  • APM - Receives traces from instrumented applications and forwards them to Datadog
  • Autodiscovery - Detects containers as they start and applies the matching check and log configurations
  • Kubernetes integration - Collects Kubernetes events and tags data with pod, namespace, and other cluster metadata

Installation

# Add Datadog Helm repository
helm repo add datadog https://helm.datadoghq.com
helm repo update

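# Create the namespace, then a secret holding the API key
# (the secret cannot be created in a namespace that does not exist yet)
kubectl create namespace datadog
kubectl create secret generic datadog-secret \
  --from-literal api-key=<YOUR_API_KEY> \
  --namespace datadog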

# Install Datadog Agent
helm install datadog-agent datadog/datadog \
  --namespace datadog \
  --create-namespace \
  --set datadog.apiKeyExistingSecret=datadog-secret \
  --set datadog.logs.enabled=true \
  --set datadog.logs.containerCollectAll=true \
  --set datadog.apm.enabled=true \
  --set clusterAgent.enabled=true
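Each --set flag addresses a key in the chart's nested values; datadog.logs.enabled=true is the enabled key under logs under datadog in values.yaml. The sketch below makes that mapping explicit with a hypothetical helper, to_set_flags, that builds the same flags from a nested dict. It is an illustration only; beyond a handful of settings, a values file passed with -f is easier to review.

```python
"""Build `helm install` --set flags from a nested values dict (hypothetical helper)."""


def to_set_flags(values, prefix=""):
    """Flatten nested dicts into Helm-style dotted keys, e.g. datadog.logs.enabled=true."""
    flags = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted path
            flags.extend(to_set_flags(value, path))
        else:
            # Write booleans in lowercase, as in the install command above
            text = str(value).lower() if isinstance(value, bool) else str(value)
            flags.extend(["--set", f"{path}={text}"])
    return flags


# The same settings as the install command above
VALUES = {
    "datadog": {
        "apiKeyExistingSecret": "datadog-secret",
        "logs": {"enabled": True, "containerCollectAll": True},
        "apm": {"enabled": True},
    },
    "clusterAgent": {"enabled": True},
}

if __name__ == "__main__":
    cmd = ["helm", "install", "datadog-agent", "datadog/datadog",
           "--namespace", "datadog", *to_set_flags(VALUES)]
    print(" ".join(cmd))
```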

Manual Deployment

Datadog Agent DaemonSet

apiVersion: v1
kind: Namespace
metadata:
  name: datadog
---
apiVersion: v1
kind: Secret
metadata:
  name: datadog-secret
  namespace: datadog
type: Opaque
stringData:
  api-key: <YOUR_DATADOG_API_KEY>
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: datadog-agent
  namespace: datadog
spec:
  selector:
    matchLabels:
      app: datadog-agent
  template:
    metadata:
      labels:
        app: datadog-agent
    spec:
      serviceAccountName: datadog-agent
      containers:
      - image: gcr.io/datadoghq/agent:7
        name: datadog-agent
        env:
        - name: DD_API_KEY
          valueFrom:
            secretKeyRef:
              name: datadog-secret
              key: api-key
        - name: DD_SITE
          value: datadoghq.com
        - name: DD_LOGS_ENABLED
          value: "true"
        - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
          value: "true"
        - name: DD_CONTAINER_EXCLUDE
          value: "name:datadog-agent"
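        - name: DD_APM_ENABLED
          value: "true"
        # Accept traces from application pods that reach the Agent through hostPort 8126
        - name: DD_APM_NON_LOCAL_TRAFFIC
          value: "true"
        - name: DD_COLLECT_KUBERNETES_EVENTS
          value: "true"
        - name: DD_LEADER_ELECTION
          value: "true"
        - name: KUBERNETES
          value: "true"
        - name: DD_KUBERNETES_KUBELET_HOST
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        ports:
        # Trace intake, reachable from other pods at <node IP>:8126
        - containerPort: 8126
          hostPort: 8126
          name: traceport
          protocol: TCP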
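        volumeMounts:
        # Docker runtime socket; on containerd or CRI-O nodes, mount that runtime's socket instead
        - name: dockersocket
          mountPath: /var/run/docker.sock
        - name: procdir
          mountPath: /host/proc
          readOnly: true
        - name: cgroups
          mountPath: /host/sys/fs/cgroup
          readOnly: true
        # Stores log file positions so collection resumes where it left off after a restart
        - name: pointerdir
          mountPath: /opt/datadog-agent/run
        # Container log files; on Docker nodes, files under /var/log/pods link into /var/lib/docker/containers
        - name: logpodpath
          mountPath: /var/log/pods
          readOnly: true
        - name: logcontainerpath
          mountPath: /var/lib/docker/containers
          readOnly: true
        resources:
          requests:
            memory: "256Mi"
            cpu: "200m"
          limits:
            memory: "512Mi"
            cpu: "500m"
      volumes:
      - name: dockersocket
        hostPath:
          path: /var/run/docker.sock
      - name: procdir
        hostPath:
          path: /proc
      - name: cgroups
        hostPath:
          path: /sys/fs/cgroup
      # A hostPath (not emptyDir) keeps log positions across Agent pod restarts
      - name: pointerdir
        hostPath:
          path: /opt/datadog-agent/run
          type: DirectoryOrCreate
      - name: logpodpath
        hostPath:
          path: /var/log/pods
      - name: logcontainerpath
        hostPath:
          path: /var/lib/docker/containers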
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: datadog-agent
  namespace: datadog
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datadog-agent
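rules:
- apiGroups: [""]
  resources:
  - services
  - events
  - endpoints
  - pods
  - nodes
  - componentstatuses
  verbs: ["get", "list", "watch"]
# Kubelet API access for node and pod metrics
- apiGroups: [""]
  resources:
  - nodes/metrics
  - nodes/spec
  - nodes/proxy
  - nodes/stats
  verbs: ["get"]
# Leader election (DD_LEADER_ELECTION) and Kubernetes event collection state
- apiGroups: [""]
  resources:
  - configmaps
  resourceNames:
  - datadogtoken
  - datadog-leader-election
  verbs: ["get", "update"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["create"]
- apiGroups: ["coordination.k8s.io"]
  resources:
  - leases
  verbs: ["get", "update", "create"]
- nonResourceURLs:
  - "/version"
  - "/healthz"
  verbs: ["get"]
- apiGroups: ["quota.openshift.io"]
  resources:
  - clusterresourcequotas
  verbs: ["get", "list"]
- apiGroups: ["apps"]
  resources:
  - deployments
  verbs: ["get", "list", "watch"]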
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datadog-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datadog-agent
subjects:
- kind: ServiceAccount
  name: datadog-agent
  namespace: datadog

Log Collection Configuration

Container Log Collection

With both variables below set to "true", the Agent collects logs from every container on its node, except containers excluded with DD_CONTAINER_EXCLUDE:

env:
- name: DD_LOGS_ENABLED
  value: "true"
- name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
  value: "true"

Selective Log Collection

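To collect logs only from specific containers, set DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL to "false" and annotate the containers you want. The annotation key embeds the container name (my-app here), and the value is a JSON list of log configurations: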

apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    ad.datadoghq.com/my-app.logs: '[{"source": "myapp", "service": "my-service"}]'
spec:
  containers:
  - name: my-app
    image: my-app:latest
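The annotation has a fixed shape: the key is ad.datadoghq.com/<container name>.logs and the value is a JSON list of log configurations. A small sketch that builds both with json.dumps avoids hand-quoting mistakes; logs_annotation is a hypothetical helper, and the container name passed to it must match the name in the Pod's containers list.

```python
import json


def logs_annotation(container_name, configs):
    """Return (key, value) for a Datadog Autodiscovery logs annotation (hypothetical helper).

    The key embeds the container name; the value is a JSON list of log configs.
    """
    key = f"ad.datadoghq.com/{container_name}.logs"
    value = json.dumps(configs)  # default separators give ", " and ": "
    return key, value


key, value = logs_annotation("my-app", [{"source": "myapp", "service": "my-service"}])
# Single quotes keep YAML from parsing the JSON as a list; annotation values must be strings
print(f"    {key}: '{value}'")
```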

Log Processing Rules

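Processing rules can also be set on the Agent itself through DD_LOGS_CONFIG_PROCESSING_RULES instead of per container. This multi_line rule starts a new log entry at each line that begins with a date, so continuation lines such as stack traces stay attached to the entry above them: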

env:
- name: DD_LOGS_CONFIG_PROCESSING_RULES
  value: |
    [{
      "type": "multi_line",
      "name": "log_start_with_date",
      "pattern": "\\d{4}-\\d{2}-\\d{2}"
    }]
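The pattern is escaped twice: the YAML block passes the text through literally, and JSON decoding turns \\d into the regex \d. The sketch below decodes the rule with json.loads and then models how a multi_line rule groups lines, assuming the pattern is matched at the start of each line. aggregate is a hypothetical name and this is a simplified model, not the Agent's implementation.

```python
import json
import re

# The env var value from the snippet above, as the Agent would receive it
RULES_JSON = r'''[{
  "type": "multi_line",
  "name": "log_start_with_date",
  "pattern": "\\d{4}-\\d{2}-\\d{2}"
}]'''


def aggregate(lines, pattern):
    """Group lines into entries (simplified multi_line model).

    A line matching `pattern` at its start opens a new entry; any other
    line is appended to the current entry.
    """
    start = re.compile(pattern)
    entries = []
    for line in lines:
        if start.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += "\n" + line
    return entries


rule = json.loads(RULES_JSON)[0]
print(rule["pattern"])  # prints \d{4}-\d{2}-\d{2}

logs = [
    "2024-05-01 12:00:00 ERROR request failed",
    "Traceback (most recent call last):",
    '  File "app.py", line 10, in handler',
    "2024-05-01 12:00:01 INFO retrying",
]
for entry in aggregate(logs, rule["pattern"]):
    print(repr(entry))
```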

Service Tags

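DD_TAGS sets host-level tags that apply to all data from this node, so on a node Agent that serves many workloads, keep it to node-wide tags such as env and team. Set service per workload instead, for example with the service field of the logs annotation:

env:
- name: DD_TAGS
  value: "env:production team:backend"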

Kubernetes Metadata

Enable Kubernetes metadata tags and set how often (in seconds) the Agent refreshes them:

env:
- name: DD_KUBERNETES_COLLECT_METADATA_TAGS
  value: "true"
- name: DD_KUBERNETES_METADATA_TAG_UPDATE_FREQ
  value: "60"

Autodiscovery

Autodiscovery applies log configurations to containers as they start, based on pod annotations or on configuration templates (such as files mounted from a ConfigMap):

Annotation-Based Configuration

apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    ad.datadoghq.com/my-app.logs: |
      [
        {
          "source": "python",
          "service": "my-service",
          "log_processing_rules": [
            {
              "type": "multi_line",
              "name": "log_start_with_date",
              "pattern": "\\d{4}-\\d{2}-\\d{2}"
            }
          ]
        }
      ]
spec:
  containers:
  - name: my-app
    image: my-app:latest

ConfigMap-Based Configuration

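Autodiscovery templates can also be files in the Agent's conf.d directory. Store the template in a ConfigMap and mount it into the Agent container; ad_identifiers matches the container's short image name (my-app for image my-app:latest):

apiVersion: v1
kind: ConfigMap
metadata:
  name: datadog-logs-config
  namespace: datadog
data:
  conf.yaml: |
    ad_identifiers:
      - my-app
    logs:
      # The path is read on the Agent's filesystem, so the file must be shared
      # with the Agent (for example, through a hostPath volume on both pods)
      - type: file
        path: /var/log/app.log
        source: python
        service: my-service
---
# In the Agent DaemonSet: volumeMounts under the Agent container, volumes under the pod spec
volumeMounts:
- name: logs-config
  mountPath: /etc/datadog-agent/conf.d/my-service.d
volumes:
- name: logs-config
  configMap:
    name: datadog-logs-config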

Log Queries

In Datadog Log Explorer:

  • Search: service:my-service
  • Filter by time range
  • Add facets for filtering

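Advanced Queries

status is the log's level (error, warn, info, and so on); HTTP status codes live in the @http.status_code attribute. Kubernetes tags added by the Agent use names such as kube_namespace and pod_name.

service:my-service status:error
source:nginx @http.status_code:>=400
env:production @http.status_code:[400 TO 499]
kube_namespace:production error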
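Query terms separated by spaces are combined with AND: tags are written key:value, attributes take an @ prefix, and numeric attributes accept comparisons and [low TO high] ranges. A hypothetical helper that assembles queries from those pieces, as a sketch of the syntax only (it does no escaping of special characters or values containing spaces):

```python
def log_query(tags=None, attributes=None, ranges=None, text=None):
    """Compose a Log Explorer query string (hypothetical helper).

    tags       -> key:value          (e.g. service:my-service)
    attributes -> @key:value         (value may be a comparison such as ">=400")
    ranges     -> @key:[low TO high] (inclusive numeric range)
    text       -> free-text term matched against the log message
    """
    parts = [f"{k}:{v}" for k, v in (tags or {}).items()]
    parts += [f"@{k}:{v}" for k, v in (attributes or {}).items()]
    parts += [f"@{k}:[{lo} TO {hi}]" for k, (lo, hi) in (ranges or {}).items()]
    if text:
        parts.append(text)
    # Space-separated terms are combined with AND
    return " ".join(parts)


print(log_query(tags={"env": "production"}, ranges={"http.status_code": (400, 499)}))
# env:production @http.status_code:[400 TO 499]
```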

Log Facets

Create facets for commonly filtered fields:

  • service
  • source
  • status
  • env
  • kube_namespace
  • pod_name

APM Integration

Enable APM for distributed tracing:

env:
- name: DD_APM_ENABLED
  value: "true"
- name: DD_APM_NON_LOCAL_TRAFFIC
  value: "true"

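Expose the trace port on each node with hostPort, and point applications at it by setting DD_AGENT_HOST to the node IP (status.hostIP through the downward API):

ports:
- containerPort: 8126
  hostPort: 8126
  name: apm
  protocol: TCP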
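In a DaemonSet setup, each application pod sends traces to the Agent on its own node. Datadog tracers read DD_AGENT_HOST and DD_TRACE_AGENT_PORT and fall back to localhost:8126. The sketch below models only that lookup; trace_agent_url is hypothetical and ignores other options the tracers support, such as Unix domain sockets.

```python
import os


def trace_agent_url(env=None):
    """Return the trace intake URL a tracer would use (hypothetical helper).

    Models only DD_AGENT_HOST / DD_TRACE_AGENT_PORT with their defaults.
    """
    env = os.environ if env is None else env
    host = env.get("DD_AGENT_HOST", "localhost")
    port = env.get("DD_TRACE_AGENT_PORT", "8126")
    return f"http://{host}:{port}"


print(trace_agent_url({"DD_AGENT_HOST": "10.0.0.5"}))  # http://10.0.0.5:8126
```

The localhost default explains why traces go missing when DD_AGENT_HOST is unset: the tracer looks for an Agent inside its own pod. Filling DD_AGENT_HOST from status.hostIP works the same way the Agent manifest fills DD_KUBERNETES_KUBELET_HOST.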

Application Instrumentation

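For Python applications (alternatively, start the app with ddtrace-run):

from ddtrace import patch_all

# Patch supported libraries as early as possible, before importing the ones you want traced
patch_all()

# Your application code

For Node.js applications:

// Initialize before requiring any other module so dd-trace can instrument them
const tracer = require('dd-trace').init({
  service: 'my-service',
  env: 'production'
});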

Dashboards

Creating Dashboards

  1. Go to Dashboards > New Dashboard
  2. Add widgets:
    • Timeseries - Metrics over time
    • Log Stream - Log events
    • Heatmap - Distribution visualization
    • Query Value - Single value
    • Top List - Ranked list

Log-Based Widgets

Log Volume:

  • Widget: Timeseries
  • Query: *
  • Group by: service

Error Rate:

  • Widget: Query Value
  • Queries: a = status:error (Count), b = * (Count)
  • Formula: 100 * a / b
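An error rate is the error count divided by the total count over the same window, times 100. A sketch of that arithmetic per service, as a grouped widget would rank it; error_rates and the sample counts are hypothetical.

```python
def error_rates(counts):
    """Map service -> (error_count, total_count) to service -> error percentage.

    Results are ordered highest rate first; a service with no logs gets 0.0.
    """
    rates = {
        service: (100.0 * errors / total if total else 0.0)
        for service, (errors, total) in counts.items()
    }
    return dict(sorted(rates.items(), key=lambda item: item[1], reverse=True))


# Hypothetical counts for one window: (logs with status:error, all logs)
print(error_rates({"web": (12, 4000), "api": (30, 1000), "idle": (0, 0)}))
# {'api': 3.0, 'web': 0.3, 'idle': 0.0}
```

Guarding the zero-total case matters: a service that logged nothing in the window would otherwise raise a division error instead of showing 0.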

Alerts and Monitors

Log-Based Monitors

  1. Go to Monitors > New Monitor
  2. Select Logs
  3. Configure:
    • Query: status:error
    • Alert conditions: Count > threshold
    • Notification channels

Alert Conditions

  • Threshold - Alert when count exceeds value
  • Anomaly - Alert on anomalies
  • Forecast - Alert based on predictions
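A threshold monitor counts matching logs over a rolling window and compares the count with its thresholds. The simplified model below uses a warning and a critical threshold; evaluate, the 5-minute window, and the threshold values are illustrative assumptions, and anomaly and forecast detection are not modeled.

```python
from datetime import datetime, timedelta


def evaluate(error_times, now, window=timedelta(minutes=5), warning=10, critical=50):
    """Return (count, state) for a count-over-window threshold monitor.

    Counts events in (now - window, now], then checks the critical
    threshold before the warning threshold.
    """
    start = now - window
    count = sum(1 for t in error_times if start < t <= now)
    if count > critical:
        state = "ALERT"
    elif count > warning:
        state = "WARN"
    else:
        state = "OK"
    return count, state


now = datetime(2024, 5, 1, 12, 0)
recent = [now - timedelta(seconds=10 * i) for i in range(20)]  # 20 errors in ~3 minutes
print(evaluate(recent, now))  # (20, 'WARN')
```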

Best Practices

1. Resource Management

Set requests and limits for the Agent container, then tune them to the usage you observe on your nodes:

resources:
  requests:
    memory: "256Mi"
    cpu: "200m"
  limits:
    memory: "512Mi"
    cpu: "500m"
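Kubernetes quantities use binary suffixes for memory (Mi is 2^20 bytes) and millicores for CPU (200m is 0.2 of a core). A small converter helps when comparing these settings with observed usage. memory_bytes and cpu_cores are hypothetical helpers that handle only the plain suffixes shown here, not the full Kubernetes quantity grammar.

```python
import re

BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def memory_bytes(quantity):
    """Convert a memory quantity such as "256Mi" or "1G" to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]i|[kMGT])?", quantity)
    if not match:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    # No suffix means plain bytes
    factor = BINARY.get(suffix) or DECIMAL.get(suffix) or 1
    return int(float(number) * factor)


def cpu_cores(quantity):
    """Convert a CPU quantity such as "200m" (millicores) or "1.5" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


print(memory_bytes("256Mi"), cpu_cores("200m"))  # 268435456 0.2
```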

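2. Log Sampling

The Agent's processing rules do not sample by percentage; they drop or keep logs that match a pattern before they leave the node. To keep only a fraction of a high-volume log stream, add an exclusion filter to the log index, which can exclude a set percentage of matching logs. At collection time, drop logs you never need:

env:
- name: DD_LOGS_CONFIG_PROCESSING_RULES
  value: |
    [{
      "type": "exclude_at_match",
      "name": "exclude_debug_logs",
      "pattern": "DEBUG"
    }]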
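Filtering at collection relies on the Agent's exclude_at_match and include_at_match rule types. The sketch models how they decide, line by line, whether a log is sent, assuming a regex search anywhere in the line and that a line must pass every rule. apply_filter_rules is a hypothetical name, not part of the Agent.

```python
import re


def apply_filter_rules(lines, rules):
    """Return the lines that survive all rules (simplified model).

    exclude_at_match drops a line when the pattern is found in it;
    include_at_match drops a line when the pattern is not found.
    """
    kept = []
    for line in lines:
        keep = True
        for rule in rules:
            found = re.search(rule["pattern"], line) is not None
            if rule["type"] == "exclude_at_match" and found:
                keep = False
            elif rule["type"] == "include_at_match" and not found:
                keep = False
        if keep:
            kept.append(line)
    return kept


lines = ["GET /healthz 200", "GET /api/orders 500", "DEBUG cache warm"]
rules = [{"type": "exclude_at_match", "name": "drop_health_checks", "pattern": "/healthz"}]
print(apply_filter_rules(lines, rules))  # ['GET /api/orders 500', 'DEBUG cache warm']
```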

3. Service Tags

Use consistent tagging:

  • env: environment (production, staging, dev)
  • service: service name
  • version: application version
  • team: team name

4. Log Parsing

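Configure log parsing:

  • Set source on each log configuration so the matching integration pipeline parses the logs
  • Emit JSON logs where possible; Datadog parses JSON logs automatically
  • Add Grok parsing rules in a custom pipeline for unstructured formats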
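Logging one JSON object per line keeps multi-line output such as tracebacks inside a single event, so no multi_line rule is needed. A sketch using Python's standard logging module; the field names status and error.stack are chosen to line up with Datadog's standard attributes, which is an assumption to check against your pipelines.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "status": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep the traceback inside the same JSON object instead of extra lines
            payload["error"] = {"stack": self.formatException(record.exc_info)}
        return json.dumps(payload)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("my-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order created")
try:
    1 / 0
except ZeroDivisionError:
    logger.exception("order failed")  # traceback goes to error.stack, still one line
```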

5. Cost Optimization

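  • Drop unneeded logs at collection with exclude_at_match rules
  • Sample verbose logs with index exclusion filters, which can exclude a percentage of matching logs
  • Exclude containers whose logs you never query
  • Set retention per index to match how long you actually need the data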

6. Security

  • Store API key in secrets
  • Use RBAC for agent permissions
  • Keep Agent traffic encrypted: the Agent sends data to Datadog over HTTPS, so preserve TLS if you route it through a proxy
  • Follow least privilege principle

7. Monitoring the Agent

Monitor Datadog Agent health:

  • Agent status in Datadog UI
  • Agent metrics in infrastructure monitoring
  • Alert on agent failures

Troubleshooting

Check Agent Status

# Check agent pods
kubectl get pods -n datadog

# Check agent logs
kubectl logs -n datadog -l app=datadog-agent

# Show detailed Agent status (checks, Logs Agent, APM, forwarder)
kubectl exec -n datadog <agent-pod> -- agent status

Verify Log Collection

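  1. Run agent status and check the Logs Agent section for errors
  2. Confirm in Log Explorer that logs from the expected hosts are arriving
  3. Check that the source and service tags are set as expected
  4. Run agent configcheck to see which configurations Autodiscovery has resolved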

Common Issues

No logs appearing:

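  • Verify DD_LOGS_ENABLED=true
  • Check DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL, or the logs annotations if it is disabled
  • Verify the API key is correct and DD_SITE matches your Datadog site
  • Check that /var/log/pods is mounted into the Agent container
  • Check network connectivity from the node to Datadog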

High agent resource usage:

  • Exclude unnecessary containers with DD_CONTAINER_EXCLUDE or DD_CONTAINER_EXCLUDE_LOGS
  • Drop noisy logs at collection with exclude_at_match rules
  • Adjust resource limits

See Also