Datadog
Datadog is a cloud-native monitoring and security platform that provides comprehensive observability for applications and infrastructure. It offers log collection, metrics monitoring, distributed tracing, and security monitoring, all integrated into a single platform.
What is Datadog?
Datadog provides:
- Log Management - Centralized log collection and analysis
- APM - Application Performance Monitoring with distributed tracing
- Infrastructure Monitoring - Metrics for hosts, containers, and services
- Real User Monitoring - Frontend performance monitoring
- Security Monitoring - Security analytics and threat detection
- Dashboards - Customizable visualizations
Datadog Agent
The Datadog Agent is a lightweight daemon that runs on each node to collect logs, metrics, and traces:
- Log collection - Collects logs from containers and applications
- Metrics collection - System, container, and application metrics
- APM - Distributed tracing
- Autodiscovery - Automatically discovers services and configurations
- Kubernetes integration - Native Kubernetes support
Installation
Using Helm (Recommended)
```bash
# Add the Datadog Helm repository
helm repo add datadog https://helm.datadoghq.com
helm repo update

# Create the namespace first, then a secret holding the API key
kubectl create namespace datadog
kubectl create secret generic datadog-secret \
  --from-literal api-key=<YOUR_API_KEY> \
  --namespace datadog

# Install the Datadog Agent
helm install datadog-agent datadog/datadog \
  --namespace datadog \
  --set datadog.apiKeyExistingSecret=datadog-secret \
  --set datadog.logs.enabled=true \
  --set datadog.logs.containerCollectAll=true \
  --set datadog.apm.enabled=true \
  --set clusterAgent.enabled=true
```
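The same settings can live in a values file instead of repeated `--set` flags; a minimal sketch (the file name `datadog-values.yaml` is an assumption, the keys mirror the flags above):

```yaml
# datadog-values.yaml (hypothetical file name)
datadog:
  apiKeyExistingSecret: datadog-secret
  logs:
    enabled: true
    containerCollectAll: true
  apm:
    enabled: true
clusterAgent:
  enabled: true
```

Install with `helm install datadog-agent datadog/datadog --namespace datadog --create-namespace -f datadog-values.yaml`.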
Manual Deployment
Datadog Agent DaemonSet
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: datadog-secret
  namespace: datadog
type: Opaque
stringData:
  api-key: <YOUR_DATADOG_API_KEY>
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: datadog-agent
  namespace: datadog
spec:
  selector:
    matchLabels:
      app: datadog-agent
  template:
    metadata:
      labels:
        app: datadog-agent
    spec:
      serviceAccountName: datadog-agent
      containers:
        - name: datadog-agent
          image: gcr.io/datadoghq/agent:7
          env:
            - name: DD_API_KEY
              valueFrom:
                secretKeyRef:
                  name: datadog-secret
                  key: api-key
            - name: DD_SITE
              value: datadoghq.com
            - name: DD_LOGS_ENABLED
              value: "true"
            - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
              value: "true"
            - name: DD_CONTAINER_EXCLUDE
              value: "name:datadog-agent"
            - name: DD_APM_ENABLED
              value: "true"
            - name: DD_COLLECT_KUBERNETES_EVENTS
              value: "true"
            - name: DD_LEADER_ELECTION
              value: "true"
            - name: KUBERNETES
              value: "true"
            - name: DD_KUBERNETES_KUBELET_HOST
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          volumeMounts:
            - name: dockersocket
              mountPath: /var/run/docker.sock
            - name: procdir
              mountPath: /host/proc
              readOnly: true
            - name: cgroups
              mountPath: /host/sys/fs/cgroup
              readOnly: true
            - name: pointerdir
              mountPath: /opt/datadog-agent/run
          resources:
            requests:
              memory: "256Mi"
              cpu: "200m"
            limits:
              memory: "512Mi"
              cpu: "500m"
      volumes:
        - name: dockersocket
          hostPath:
            path: /var/run/docker.sock
        - name: procdir
          hostPath:
            path: /proc
        - name: cgroups
          hostPath:
            path: /sys/fs/cgroup
        - name: pointerdir
          emptyDir: {}
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: datadog-agent
  namespace: datadog
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datadog-agent
rules:
  - apiGroups: [""]
    resources:
      - services
      - events
      - endpoints
      - pods
      - nodes
      - componentstatuses
    verbs: ["get", "list", "watch"]
  - apiGroups: ["quota.openshift.io"]
    resources:
      - clusterresourcequotas
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources:
      - deployments
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datadog-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datadog-agent
subjects:
  - kind: ServiceAccount
    name: datadog-agent
    namespace: datadog
```
Log Collection Configuration
Container Log Collection
The Datadog Agent automatically collects logs from all containers when enabled:
```yaml
env:
  - name: DD_LOGS_ENABLED
    value: "true"
  - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
    value: "true"
```
Selective Log Collection
Collect logs only from specific containers using annotations:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    ad.datadoghq.com/my-app.logs: '[{"source": "myapp", "service": "my-service"}]'
spec:
  containers:
    - name: my-app
      image: my-app:latest
```
Log Processing Rules
Configure log processing in the agent:
```yaml
env:
  - name: DD_LOGS_CONFIG_PROCESSING_RULES
    value: |
      [{
        "type": "multi_line",
        "name": "log_start_with_date",
        "pattern": "\\d{4}-\\d{2}-\\d{2}"
      }]
```
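A `multi_line` rule treats every line matching the pattern as the start of a new log entry and appends non-matching lines (stack traces, wrapped output) to the previous entry. A quick sketch of that grouping logic in Python, illustrative only and not the agent's implementation:

```python
import re

# Same pattern as the rule above: an entry starts with a YYYY-MM-DD date
START = re.compile(r"\d{4}-\d{2}-\d{2}")

def group_multiline(lines):
    """Group raw lines into log entries, as a multi_line rule would."""
    entries = []
    for line in lines:
        if START.match(line) or not entries:
            entries.append(line)          # line starts a new entry
        else:
            entries[-1] += "\n" + line    # continuation, e.g. a traceback
    return entries

raw = [
    "2024-05-01 12:00:00 ERROR boom",
    "Traceback (most recent call last):",
    '  File "app.py", line 1',
    "2024-05-01 12:00:01 INFO recovered",
]
print(len(group_multiline(raw)))  # → 2
```

The three-line traceback is shipped as a single entry instead of three separate events.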
Service Tags
Add service tags for better organization:
```yaml
env:
  - name: DD_TAGS
    value: "env:production service:my-app team:backend"
```
Kubernetes Metadata
Automatic Kubernetes metadata enrichment:
```yaml
env:
  - name: DD_KUBERNETES_COLLECT_METADATA_TAGS
    value: "true"
  - name: DD_KUBERNETES_METADATA_TAG_UPDATE_FREQ
    value: "60"
```
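Beyond static `DD_TAGS`, pod labels can be promoted to tags automatically with `DD_KUBERNETES_POD_LABELS_AS_TAGS`; a sketch (the label keys shown are assumptions):

```yaml
env:
  - name: DD_KUBERNETES_POD_LABELS_AS_TAGS
    value: '{"app.kubernetes.io/name": "service", "team": "team"}'
```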
Autodiscovery
Autodiscovery automatically configures log collection based on pod annotations:
Annotation-Based Configuration
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    ad.datadoghq.com/my-app.logs: |
      [
        {
          "source": "python",
          "service": "my-service",
          "log_processing_rules": [
            {
              "type": "multi_line",
              "name": "log_start_with_date",
              "pattern": "\\d{4}-\\d{2}-\\d{2}"
            }
          ]
        }
      ]
spec:
  containers:
    - name: my-app
      image: my-app:latest
```
ConfigMap-Based Configuration
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: datadog-logs-config
  namespace: datadog
data:
  my-service.yaml: |
    ad_identifiers:
      - my-app
    logs:
      - type: file
        path: /var/log/app.log
        source: python
        service: my-service
---
# In the agent configuration
env:
  - name: DD_LOGS_CONFIG_AUTODISCOVERY_PATHS
    value: "/etc/datadog-agent/conf.d/logs.d/auto-discovery"
```
Log Queries
Basic Log Search
In Datadog Log Explorer:
- Search: `service:my-service`
- Filter by time range
- Add facets for filtering
Advanced Queries
```
service:my-service status:error
source:nginx status:>=400
env:production @http.status_code:[400 TO 499]
kubernetes.namespace:production @message:error
```
Log Facets
Create facets for commonly filtered fields:
- `service`
- `source`
- `status`
- `env`
- `kubernetes.namespace`
- `kubernetes.pod_name`
APM Integration
Enable APM for distributed tracing:
```yaml
env:
  - name: DD_APM_ENABLED
    value: "true"
  - name: DD_APM_NON_LOCAL_TRAFFIC
    value: "true"
```
Expose APM port:
```yaml
ports:
  - containerPort: 8126
    name: apm
    protocol: TCP
```
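Application pods then need to reach the node-local agent. One common sketch is to also publish port 8126 as a `hostPort` on the agent and point each application at its node's IP through the Downward API (`DD_AGENT_HOST` and `DD_TRACE_AGENT_PORT` are the standard tracer environment variables):

```yaml
# In the application Deployment; assumes the agent container
# exposes containerPort 8126 as a hostPort as well
env:
  - name: DD_AGENT_HOST
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: DD_TRACE_AGENT_PORT
    value: "8126"
```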
Application Instrumentation
For Python applications:
```python
from ddtrace import patch_all

patch_all()

# Your application code
```
For Node.js applications:
```javascript
const tracer = require('dd-trace').init({
  service: 'my-service',
  env: 'production'
});
```
Dashboards
Creating Dashboards
- Go to Dashboards > New Dashboard
- Add widgets:
  - Timeseries - Metrics over time
  - Log Stream - Log events
  - Heatmap - Distribution visualization
  - Query Value - Single value
  - Top List - Ranked list
Log-Based Widgets
Log Volume:
- Widget: Timeseries
- Query: `*`
- Group by: `service`

Error Rate:
- Widget: Query Value
- Query: `status:error`
- Aggregation: Count
Alerts and Monitors
Log-Based Monitors
- Go to Monitors > New Monitor
- Select Logs
- Configure:
  - Query: `status:error`
  - Alert conditions: Count > threshold
  - Notification channels
Alert Conditions
- Threshold - Alert when count exceeds value
- Anomaly - Alert on anomalies
- Forecast - Alert based on predictions
Best Practices
1. Resource Management
Set appropriate resource limits:
```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "200m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```
2. Log Sampling
Use sampling for high-volume logs:
```yaml
env:
  - name: DD_LOGS_CONFIG_PROCESSING_RULES
    value: |
      [{
        "type": "sample_rate",
        "sample_rate": 0.1,
        "name": "sample_logs"
      }]
```
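A `sample_rate` of 0.1 keeps roughly one log in ten. The effect can be sketched in a few lines (illustrative only, not the agent's code):

```python
import random

def sample(lines, rate, rng):
    """Keep each line with probability `rate`, as a sample_rate rule would."""
    return [line for line in lines if rng.random() < rate]

rng = random.Random(42)  # seeded so the sketch is reproducible
kept = sample([f"log {i}" for i in range(10_000)], 0.1, rng)
print(len(kept))         # roughly 1,000 of the 10,000 lines
```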
3. Service Tags
Use consistent tagging:
- `env`: environment (production, staging, dev)
- `service`: service name
- `version`: application version
- `team`: team name
4. Log Parsing
Configure proper log parsing:
- Use source auto-detection when possible
- Add custom parsing rules for structured logs
- Parse JSON logs automatically
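Since JSON-formatted log lines are parsed into attributes automatically, emitting structured logs from the application avoids custom parsing rules entirely. A minimal sketch using Python's standard library (the field names chosen here are assumptions):

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so attributes parse automatically."""
    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "status": record.levelname.lower(),
            "message": record.getMessage(),
            "logger": record.name,
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("my-app")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("user logged in")  # prints a single JSON line
```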
5. Cost Optimization
- Use log sampling for verbose logs
- Filter unnecessary logs at collection
- Set appropriate log retention
- Use log exclusion filters
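Exclusion can be applied at two levels: skipping whole containers, or dropping matching lines before they are shipped. Two hedged examples (the container name and pattern are placeholders):

```yaml
env:
  # Skip log collection for entire containers by name
  - name: DD_CONTAINER_EXCLUDE_LOGS
    value: "name:istio-proxy"
  # Drop individual lines that match a pattern
  - name: DD_LOGS_CONFIG_PROCESSING_RULES
    value: |
      [{
        "type": "exclude_at_match",
        "name": "drop_health_checks",
        "pattern": "GET /healthz"
      }]
```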
6. Security
- Store API key in secrets
- Use RBAC for agent permissions
- Encrypt agent communication (TLS)
- Follow least privilege principle
7. Monitoring the Agent
Monitor Datadog Agent health:
- Agent status in Datadog UI
- Agent metrics in infrastructure monitoring
- Alert on agent failures
Troubleshooting
Check Agent Status
```bash
# Check agent pods
kubectl get pods -n datadog

# Check agent logs
kubectl logs -n datadog -l app=datadog-agent

# Test agent connectivity
kubectl exec -n datadog <agent-pod> -- agent status
```
Verify Log Collection
- Check agent configuration
- Verify logs are being sent to Datadog
- Check log source and service tags
- Verify autodiscovery is working
Common Issues
No logs appearing:
- Verify `DD_LOGS_ENABLED=true`
- Check `DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL`
- Verify the API key is correct
- Check network connectivity
High agent resource usage:
- Reduce log sampling rate
- Exclude unnecessary containers
- Adjust resource limits
- Filter logs at collection
See Also
- Log Solutions - Other log solutions
- Node & Sidecar logging - Log collection patterns
- Container & Pod logs - Basic log viewing