Tracing (OpenTelemetry)
OpenTelemetry is a unified standard for observability that provides APIs, SDKs, and tools to instrument, generate, collect, and export telemetry data (metrics, logs, and traces). It’s become the de facto standard for distributed tracing in modern applications and Kubernetes environments.
What is OpenTelemetry?
OpenTelemetry (OTel) is an open-source observability framework that:
- Unifies observability - Single standard for traces, metrics, and logs
- Vendor-neutral - Works with any observability backend
- Language support - SDKs for 10+ programming languages
- Automatic instrumentation - Reduces manual coding effort
- Cloud-native - Built for distributed systems like Kubernetes
Three Observability Signals
OpenTelemetry handles three types of telemetry data:
Traces
Traces show the path of a request through distributed services (a short sketch follows this list):
- Spans - Individual operations within a trace
- Trace context - Propagated across service boundaries
- Timing - Duration of each operation
- Relationships - Parent-child span relationships
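As a minimal sketch with the Python SDK (span names are illustrative, and a configured tracer provider is assumed; see Manual Instrumentation below), nested spans capture the parent-child relationships and per-operation timing listed above:
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# The outer span is the parent; spans started within its context become
# its children and share the same trace ID.
with tracer.start_as_current_span("handle-request"):
    with tracer.start_as_current_span("query-database") as child:
        child.set_attribute("db.system", "postgresql")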
Metrics
Metrics are numerical measurements over time (see the sketch after this list):
- Counters - Incrementing values (e.g., request count)
- Gauges - Point-in-time values (e.g., CPU usage)
- Histograms - Distribution of measurements
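A short sketch with the Python metrics API, assuming a configured meter provider (instrument names here are illustrative):
from opentelemetry import metrics

meter = metrics.get_meter(__name__)

# Counter: monotonically increasing value
request_counter = meter.create_counter("http.server.requests")
request_counter.add(1, {"http.route": "/orders"})

# Histogram: distribution of measurements
latency = meter.create_histogram("http.server.duration", unit="ms")
latency.record(42.0, {"http.route": "/orders"})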
Logs
Logs are structured event records (see the sketch after this list):
- Structured format - JSON or key-value pairs
- Correlation - Linked to traces via trace IDs
- Context - Rich contextual information
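For example, assuming the opentelemetry-instrumentation-logging package is installed, Python log records can be stamped with the active trace and span IDs so logs can be joined to traces:
import logging

from opentelemetry.instrumentation.logging import LoggingInstrumentor

# Adds otelTraceID/otelSpanID fields to the logging format so each
# log line can be correlated with the trace that produced it.
LoggingInstrumentor().instrument(set_logging_format=True)

logging.getLogger(__name__).info("order processed")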
OpenTelemetry Architecture
Components
- SDK - Language-specific library for instrumentation
- Collector - Receives, processes, and exports telemetry data
- Receivers - Accept data from SDKs or other sources
- Processors - Transform, filter, or batch data
- Exporters - Send data to observability backends
Automatic vs Manual Instrumentation
Automatic Instrumentation
Zero-code instrumentation for popular frameworks. In Kubernetes this is typically done with the OpenTelemetry Operator, which injects a language-specific auto-instrumentation agent into a pod based on an annotation:
# Example: auto-instrumentation injected by the OpenTelemetry Operator
# (assumes the Operator and an Instrumentation resource exist in the cluster)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-python: "true"
    spec:
      containers:
      - name: app
        image: my-app:latest
        env:
        - name: OTEL_SERVICE_NAME
          value: my-app
Automatic instrumentation supports:
- HTTP frameworks (Express, Django, Flask, etc.)
- Database drivers (PostgreSQL, MySQL, MongoDB, etc.)
- Message queues (Kafka, RabbitMQ, etc.)
- gRPC and REST APIs
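These integrations can also be enabled programmatically with a line or two. A sketch for Flask, assuming the opentelemetry-instrumentation-flask package is installed:
from flask import Flask

from opentelemetry.instrumentation.flask import FlaskInstrumentor

app = Flask(__name__)

# Creates a server span for every incoming request.
FlaskInstrumentor().instrument_app(app)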
Manual Instrumentation
Explicit instrumentation for custom code:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

# Setup (add a span processor and exporter so spans leave the process;
# see the Python Application example below)
tracer_provider = TracerProvider()
trace.set_tracer_provider(tracer_provider)
tracer = trace.get_tracer(__name__)

# Instrumentation
def process_order(order_id):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        # Business logic
        span.set_attribute("order.status", "completed")
OpenTelemetry Collector
The Collector is a vendor-neutral agent that processes telemetry data:
Deployment in Kubernetes
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector:latest
        volumeMounts:
        - name: config
          mountPath: /etc/otelcol
      volumes:
      - name: config
        configMap:
          name: otel-collector-config
Collector Configuration
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
  # Recent Collector releases removed the legacy `jaeger` exporter;
  # Jaeger ingests OTLP natively, so export over OTLP instead.
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
Instrumentation Examples
Go Application
package main

import (
    "context"
    "log"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// initTracer wires the global tracer provider to an OTLP/gRPC exporter.
// The legacy go.opentelemetry.io/otel/exporters/jaeger package is
// deprecated; Jaeger ingests OTLP natively.
func initTracer() {
    exporter, err := otlptracegrpc.New(context.Background(),
        otlptracegrpc.WithEndpoint("otel-collector:4317"),
        otlptracegrpc.WithInsecure(),
    )
    if err != nil {
        log.Fatalf("creating OTLP trace exporter: %v", err)
    }
    tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
    otel.SetTracerProvider(tp)
}
Python Application
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Setup
trace.set_tracer_provider(TracerProvider())

# Configure exporter (the Collector's OTLP gRPC port)
otlp_exporter = OTLPSpanExporter(endpoint="http://otel-collector:4317")
span_processor = BatchSpanProcessor(otlp_exporter)
trace.get_tracer_provider().add_span_processor(span_processor)

# Use
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("operation") as span:
    span.set_attribute("key", "value")
    # Your code
Java Application
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;

OpenTelemetry openTelemetry = OpenTelemetrySdk.builder()
    .setTracerProvider(
        SdkTracerProvider.builder()
            .addSpanProcessor(BatchSpanProcessor.builder(
                OtlpGrpcSpanExporter.builder()
                    .setEndpoint("http://otel-collector:4317")
                    .build())
                .build())
            .build())
    .build();
Kubernetes Deployment
Sidecar Pattern
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-with-otel
spec:
  template:
    spec:
      containers:
      - name: app
        image: my-app:latest
        env:
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://localhost:4317"
      - name: otel-collector
        image: otel/opentelemetry-collector:latest
        ports:
        - containerPort: 4317
          name: otlp-grpc
        - containerPort: 4318
          name: otlp-http
DaemonSet Pattern
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector:latest
        ports:
        - containerPort: 4317
          name: otlp-grpc
        - containerPort: 4318
          name: otlp-http
        volumeMounts:
        - name: config
          mountPath: /etc/otelcol
      volumes:
      - name: config
        configMap:
          name: otel-collector-config
Exporters and Backends
OpenTelemetry can export to many backends:
Prometheus (Metrics)
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
Jaeger (Traces)
exporters:
  # The legacy `jaeger` exporter was removed from recent Collector
  # releases; Jaeger ingests OTLP natively on port 4317.
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
Loki (Logs)
exporters:
  loki:   # shipped in the Collector "contrib" distribution
    endpoint: http://loki:3100/loki/api/v1/push
OTLP (Generic)
exporters:
  otlp:
    endpoint: backend.example.com:4317
    tls:
      cert_file: /etc/certs/client.crt
      key_file: /etc/certs/client.key
Trace Context Propagation
OpenTelemetry propagates trace context across service boundaries, by default via the W3C Trace Context headers (traceparent and tracestate). Instrumented HTTP clients inject these headers on outgoing requests, and instrumented servers extract them so downstream spans join the same trace.
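For cases where the HTTP client is not auto-instrumented, a minimal sketch of manual injection with the Python propagator API (the downstream call itself is omitted):
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer(__name__)

def call_downstream():
    with tracer.start_as_current_span("call-downstream"):
        headers = {}
        # Writes the current trace context into the carrier, e.g.
        # {"traceparent": "00-<trace-id>-<span-id>-01"}
        inject(headers)
        # pass `headers` to your HTTP client of choice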
Best Practices
- Start with automatic instrumentation - Use automatic instrumentation when possible
- Use the Collector - Deploy the Collector as a sidecar or DaemonSet
- Sample appropriately - Configure sampling to control data volume and costs
- Set resource attributes - Add service name, version, and environment info (see the sketch after this list)
- Correlate signals - Link logs and metrics to traces via trace IDs
- Instrument at the edge - Instrument API gateways and load balancers
- Monitor the Collector - Ensure the Collector is healthy and not dropping data
- Use semantic conventions - Follow OpenTelemetry semantic conventions for consistent naming
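For resource attributes, a minimal Python sketch (the attribute values are placeholders):
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

# Resource attributes follow the OpenTelemetry semantic conventions
# and are attached to every span this provider emits.
resource = Resource.create({
    "service.name": "my-app",
    "service.version": "1.2.3",
    "deployment.environment": "production",
})
trace.set_tracer_provider(TracerProvider(resource=resource))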
Sampling
Control data volume with sampling:
processors:
  probabilistic_sampler:        # shipped in the Collector "contrib" distribution
    sampling_percentage: 10     # sample 10% of traces
Or in code:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

# Sample 50% of traces by wiring the sampler into the provider
trace.set_tracer_provider(TracerProvider(sampler=TraceIdRatioBased(0.5)))
Troubleshooting
No Traces Appearing
# Check Collector logs
kubectl logs -n monitoring -l app=otel-collector
# Verify the Collector is receiving data via its internal metrics
kubectl port-forward -n monitoring svc/otel-collector 8888:8888
curl http://localhost:8888/metrics   # e.g. otelcol_receiver_accepted_spans
# Check application instrumentation
kubectl logs <app-pod> | grep -i otel
High Cardinality
Too many unique attribute values (high cardinality) can degrade backend performance:
- Use span attributes judiciously
- Avoid putting unique IDs in attribute names; keep names stable and put variable data in values (see the sketch below)
- Consider sampling for high-volume traces
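A small illustration of the naming pitfall; the commented-out line is the anti-pattern:
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def complete_order(order_id: str) -> None:
    with tracer.start_as_current_span("complete_order") as span:
        # Bad: a unique ID in the attribute *name* creates unbounded cardinality
        # span.set_attribute(f"order.{order_id}.status", "completed")

        # Good: stable attribute name, variable data in the value
        span.set_attribute("order.id", order_id)
        span.set_attribute("order.status", "completed")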
Missing Context
Ensure trace context is propagated (a manual extraction sketch follows this list):
- Check HTTP headers (traceparent, tracestate)
- Verify gRPC metadata propagation
- Test across service boundaries
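If the server framework is not extracting context automatically, the Python propagator API can do it by hand:
from opentelemetry import trace
from opentelemetry.propagate import extract

tracer = trace.get_tracer(__name__)

def handle_request(headers: dict):
    # Rebuild the caller's context from the incoming traceparent header
    ctx = extract(headers)
    # Spans started with this context join the caller's trace
    with tracer.start_as_current_span("handle-request", context=ctx):
        pass  # request handling goes here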
See Also
- Prometheus - Metrics backend for OpenTelemetry
- Grafana - Visualization platform
- Logging - Log collection and analysis