Loki
Grafana Loki is a horizontally scalable, highly available log aggregation system inspired by Prometheus. It is designed to be cost-effective and easy to operate, which makes it a strong choice for Kubernetes log management, especially if you already use Grafana for visualization.
What is Loki?
Loki is a log aggregation system that:
- Labels logs like Prometheus - Uses the same label model for easy correlation
- Cost-effective - Doesn’t index log content, only labels
- Integrated with Grafana - Works seamlessly with Grafana dashboards
- Simple to operate - Fewer moving parts than traditional solutions
- Horizontally scalable - Designed for scale-out architecture
Architecture
A Loki-based logging stack has two main parts:
- Loki - The log aggregation server (write path and read path)
- Promtail - The log collection agent that runs on each node and ships logs to Loki
Within Loki itself, the main components are:
- Distributor - Receives incoming logs and distributes them to ingesters
- Ingester - Batches logs into chunks and writes them to storage
- Querier - Executes LogQL queries against ingesters and long-term storage
- Query Frontend - Splits, queues, and caches queries for the queriers
Key Concepts
Labels
Loki uses labels (like Prometheus) instead of indexing log content:
{namespace="production", pod="my-app-123", container="app"}
Benefits:
- Fast queries on labels
- Low storage costs
- Easy correlation with metrics
Streams
A stream is a unique set of labels:
{namespace="production", pod="my-app-123"} # Stream 1
{namespace="production", pod="my-app-456"} # Stream 2
All logs with the same labels belong to the same stream.
Chunks
Logs are stored in chunks (compressed blocks) for efficiency.
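To see how labels, streams, and log lines fit together, you can push an entry straight to Loki's HTTP push API. This is only a quick sketch: it assumes Loki is reachable on localhost:3100 (for example via kubectl port-forward), and the namespace/pod label values are made up.
# Push one log line into the stream {namespace="production", pod="my-app-123"}
# (timestamp is in nanoseconds; date +%s%N works on GNU/Linux)
curl -s -X POST http://localhost:3100/loki/api/v1/push \
  -H "Content-Type: application/json" \
  -d '{
    "streams": [
      {
        "stream": { "namespace": "production", "pod": "my-app-123" },
        "values": [ [ "'"$(date +%s%N)"'", "hello from curl" ] ]
      }
    ]
  }'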
Installation
Using Helm (Recommended)
# Add Grafana Helm repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Install Loki
helm install loki grafana/loki-stack \
--namespace logging \
--create-namespace \
--set loki.enabled=true \
--set promtail.enabled=true \
--set grafana.enabled=false \
--set loki.persistence.enabled=true \
--set loki.persistence.size=50Gi
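After the chart is installed, a quick sanity check helps confirm everything came up (a sketch assuming the release and namespace names used above):
# Confirm the Loki and Promtail pods are running
kubectl get pods -n logging
# Port-forward the Loki service and check its readiness endpoint
kubectl port-forward -n logging svc/loki 3100:3100 &
curl http://localhost:3100/ready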
Manual Deployment
Loki Deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: loki
namespace: logging
spec:
serviceName: loki
replicas: 1
selector:
matchLabels:
app: loki
template:
metadata:
labels:
app: loki
spec:
containers:
- name: loki
image: grafana/loki:2.9.0
args:
- -config.file=/etc/loki/local-config.yaml
ports:
- containerPort: 3100
name: http
volumeMounts:
- name: config
mountPath: /etc/loki
- name: storage
mountPath: /loki
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "1Gi"
cpu: "1000m"
volumes:
- name: config
configMap:
name: loki-config
- name: storage
persistentVolumeClaim:
claimName: loki-pvc
---
apiVersion: v1
kind: ConfigMap
metadata:
name: loki-config
namespace: logging
data:
local-config.yaml: |
auth_enabled: false
server:
http_listen_port: 3100
grpc_listen_port: 9096
common:
instance_addr: 127.0.0.1
path_prefix: /loki
storage:
filesystem:
chunks_directory: /loki/chunks
rules_directory: /loki/rules
replication_factor: 1
ring:
kvstore:
store: inmemory
schema_config:
configs:
- from: 2020-10-24
store: tsdb
object_store: filesystem
schema: v13
index:
prefix: index_
period: 24h
ruler:
alertmanager_url: http://localhost:9093
# By default, Loki will send anonymous, but uniquely-identifiable usage and configuration
# analytics to Grafana Labs. These statistics are sent to https://stats.grafana.org/
#
# Statistics help us better understand how Loki is used, and they show us performance
# levels for most users. This helps us prioritize features and documentation.
# For more information on what's sent, look at
# https://github.com/grafana/loki/blob/main/pkg/analytics/stats.go
# Refer to the buildReport method to see what goes into a report.
#
# If you would like to disable reporting, uncomment the following lines:
#analytics:
# reporting_enabled: false
---
apiVersion: v1
kind: Service
metadata:
name: loki
namespace: logging
spec:
selector:
app: loki
ports:
- port: 3100
targetPort: 3100
name: http
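The StatefulSet above mounts a PersistentVolumeClaim named loki-pvc that is not defined in this page. A minimal claim could look like the sketch below; the access mode and size are assumptions to adjust for your storage class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki-pvc
  namespace: logging
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi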
Promtail DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: promtail
namespace: logging
spec:
selector:
matchLabels:
name: promtail
template:
metadata:
labels:
name: promtail
spec:
serviceAccountName: promtail
containers:
- name: promtail
image: grafana/promtail:2.9.0
args:
- -config.file=/etc/promtail/config.yml
volumeMounts:
- name: config
mountPath: /etc/promtail
- name: varlog
mountPath: /var/log
readOnly: true
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
volumes:
- name: config
configMap:
name: promtail-config
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
---
apiVersion: v1
kind: ConfigMap
metadata:
name: promtail-config
namespace: logging
data:
config.yml: |
server:
http_listen_port: 3101
grpc_listen_port: 9095
positions:
filename: /tmp/positions.yaml
clients:
- url: http://loki.logging.svc.cluster.local:3100/loki/api/v1/push
scrape_configs:
- job_name: kubernetes-pods
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels:
- __meta_kubernetes_pod_node_name
target_label: __host__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- action: replace
replacement: $1
separator: /
source_labels:
- __meta_kubernetes_namespace
- __meta_kubernetes_pod_name
target_label: job
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: namespace
- action: replace
source_labels:
- __meta_kubernetes_pod_name
target_label: pod
- action: replace
source_labels:
- __meta_kubernetes_pod_container_name
target_label: container
- action: replace
replacement: /var/log/pods/*$1/*.log
separator: /
source_labels:
- __meta_kubernetes_pod_uid
- __meta_kubernetes_pod_container_name
target_label: __path__
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: promtail
namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: promtail
rules:
- apiGroups: [""]
resources:
- nodes
- nodes/proxy
- services
- endpoints
- pods
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: promtail
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: promtail
subjects:
- kind: ServiceAccount
name: promtail
namespace: logging
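Once these manifests are applied, it is worth confirming that the DaemonSet scheduled a Promtail pod on every node (a sketch using the labels from the manifests above):
# One Promtail pod per node is expected
kubectl get daemonset promtail -n logging
kubectl get pods -n logging -l name=promtail -o wide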
Promtail Configuration
Kubernetes Discovery
Promtail automatically discovers pods using Kubernetes service discovery:
scrape_configs:
- job_name: kubernetes-pods
kubernetes_sd_configs:
- role: pod
relabel_configs:
# Extract labels from Kubernetes metadata
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
# Set namespace label
- action: replace
source_labels: [__meta_kubernetes_namespace]
target_label: namespace
# Set pod name label
- action: replace
source_labels: [__meta_kubernetes_pod_name]
target_label: pod
Pipeline Stages
Process and transform logs using pipeline stages:
pipeline_stages:
# Parse JSON logs
- json:
expressions:
level: level
message: message
# Add labels from parsed fields
- labels:
level:
# Drop logs matching pattern
- drop:
expression: ".*debug.*"
drop_counter_reason: "debug_logs"
Parsing Logs
Parse different log formats:
JSON Parsing:
pipeline_stages:
- json:
expressions:
timestamp: timestamp
level: level
message: message
Regex Parsing:
pipeline_stages:
- regex:
expression: '^(?P<timestamp>\d{4}-\d{2}-\d{2}) (?P<level>\w+) (?P<message>.*)'
Multiline Parsing:
pipeline_stages:
- multiline:
firstline: '^\d{4}-\d{2}-\d{2}'
max_wait_time: 3s
LogQL Queries
LogQL is Loki’s query language, inspired by PromQL.
Basic Queries
# All logs
{namespace="production"}
# Filter by label
{namespace="production", pod="my-app-123"}
# Text search
{namespace="production"} |= "error"
# Multiple filters
{namespace="production"} |= "error" |~ "timeout|connection"
Log Filters
Line filters:
- |= "text" - Line contains text
- != "text" - Line does not contain text
- |~ "regex" - Line matches regex
- !~ "regex" - Line does not match regex
Label filters:
- {label="value"} - Exact match
- {label!="value"} - Not equal
- {label=~"regex"} - Matches regex
- {label!~"regex"} - Does not match regex
Aggregations
# Count logs
count_over_time({namespace="production"}[5m])
# Rate of logs
rate({namespace="production"}[5m])
# Top 10 pods by log volume
topk(10, sum by (pod) (count_over_time({namespace="production"}[5m])))
Range Queries
# Count over time range
count_over_time({namespace="production"}[1h])
# Bytes over time
bytes_over_time({namespace="production"}[1h])
# Rate calculation
rate({namespace="production"} |= "error" [5m])
Examples
Error logs:
{namespace="production"} |= "error"
Error rate:
sum(rate({namespace="production"} |= "error" [5m]))
Top error pods:
topk(10, sum by (pod) (count_over_time({namespace="production"} |= "error" [5m])))
Logs by level:
{namespace="production", level="ERROR"}
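You can also run LogQL outside Grafana, directly against Loki's HTTP query API. A sketch assuming a port-forward to the loki service; GNU date is used to build the nanosecond timestamps.
kubectl port-forward -n logging svc/loki 3100:3100 &
# Query error logs from the last hour via the query_range endpoint
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={namespace="production"} |= "error"' \
  --data-urlencode "start=$(date -d '1 hour ago' +%s)000000000" \
  --data-urlencode "end=$(date +%s)000000000" \
  --data-urlencode "limit=100"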
Grafana Integration
Adding Loki as Data Source
- Go to Configuration > Data Sources
- Click Add data source
- Select Loki
- Configure:
  - URL: http://loki.logging.svc.cluster.local:3100
  - Access: Server (default)
- Click Save & Test
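If you manage Grafana declaratively, the same data source can be added through Grafana's provisioning mechanism instead of the UI. A sketch of a provisioning file (the path is the conventional provisioning directory; adjust for your Grafana deployment):
# /etc/grafana/provisioning/datasources/loki.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki.logging.svc.cluster.local:3100
    isDefault: false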
Exploring Logs
- Go to Explore
- Select Loki data source
- Enter LogQL query
- View logs in table or live view
Creating Log Panels
- Create new dashboard
- Add panel
- Select Loki data source
- Enter LogQL query
- Choose visualization (Logs, Table, etc.)
Correlating with Metrics
Link logs with Prometheus metrics (an example query pair follows this list):
- Use same label names in both systems
- Create dashboards with both metrics and logs
- Use Grafana’s correlation features
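For example, because both systems share the namespace and pod labels, a dashboard can show a PromQL panel next to a LogQL panel for the same workload. The metric name http_requests_total below is only an illustrative assumption:
# PromQL panel: request rate per pod
sum by (pod) (rate(http_requests_total{namespace="production"}[5m]))
# LogQL panel: error log rate per pod, using the same labels
sum by (pod) (rate({namespace="production"} |= "error" [5m]))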
Storage Configuration
Filesystem Storage (Development)
schema_config:
configs:
- from: 2020-10-24
store: tsdb
object_store: filesystem
schema: v13
index:
prefix: index_
period: 24h
storage_config:
filesystem:
directory: /loki/chunks
Object Storage (Production)
S3:
storage_config:
aws:
s3: s3://my-bucket/loki
region: us-east-1
GCS:
storage_config:
gcs:
bucket_name: my-loki-bucket
Azure:
storage_config:
azure:
account_name: myaccount
account_key: mykey
container_name: loki
Best Practices
1. Label Design
Good labels:
- namespace - Kubernetes namespace
- pod - Pod name
- container - Container name
- service - Service name
- level - Log level (if parsed)
Avoid high-cardinality labels (see the pipeline sketch after this list):
- Timestamps
- Request IDs (unless needed)
- User IDs (unless needed)
- Random values
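A Promtail pipeline can keep high-cardinality fields searchable without turning them into labels. The sketch below assumes JSON logs with a level field: only level is promoted to a label, while request IDs and similar values stay in the log line.
pipeline_stages:
  # Parse the JSON log line
  - json:
      expressions:
        level: level
  # Promote only the low-cardinality level field to a label
  - labels:
      level:
  # Request IDs, user IDs, etc. stay in the log line itself; search them with a
  # line filter such as {namespace="production"} |= "req-abc123" instead of a label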
2. Retention Policies
Configure retention:
limits_config:
retention_period: 720h # 30 days
The retention period is enforced by the compactor, which must have retention enabled:
compactor:
working_directory: /loki/compactor
compaction_interval: 10m
retention_enabled: true
retention_delete_delay: 2h
retention_delete_worker_count: 150
3. Resource Limits
Set appropriate limits:
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
4. High Availability
Run multiple Loki instances (a minimal ring configuration sketch follows this list):
- Multiple ingesters
- Multiple queriers
- Query frontend for load balancing
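A minimal sketch of the configuration side of this, assuming three replicas that form a ring over memberlist (the loki-memberlist service name is an assumption; adjust it to whatever headless service your deployment exposes):
common:
  replication_factor: 3
  ring:
    kvstore:
      store: memberlist
memberlist:
  join_members:
    - loki-memberlist.logging.svc.cluster.local:7946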
5. Performance
- Use object storage for production
- Enable query caching
- Use query frontend for optimization
- Monitor chunk size
6. Security
- Enable multi-tenancy (auth_enabled: true) and put an authenticating reverse proxy in front of Loki
- Use TLS for transport
- Configure RBAC in Grafana
- Secure Promtail communication
Troubleshooting
Check Loki Status
# Check Loki pods
kubectl get pods -n logging -l app=loki
# Check Loki logs
kubectl logs -n logging -l app=loki
# Test Loki API
kubectl port-forward -n logging svc/loki 3100:3100
curl http://localhost:3100/ready
Check Promtail Status
# Check Promtail pods
kubectl get pods -n logging -l name=promtail
# Check Promtail logs
kubectl logs -n logging -l name=promtail
# Check Promtail targets
kubectl port-forward -n logging <promtail-pod> 3101:3101
curl http://localhost:3101/targets
No Logs Appearing
- Verify Promtail is running
- Check Promtail configuration
- Verify Loki is accessible from Promtail
- Check that the labels in your queries match what Promtail attaches (see the commands after this list)
- Verify log files exist on nodes
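A quick way to work through the label check is to ask Loki which labels and values it has actually ingested (assuming the loki service from this page):
kubectl port-forward -n logging svc/loki 3100:3100 &
# List all label names Loki has ingested
curl http://localhost:3100/loki/api/v1/labels
# List the values for a specific label, e.g. namespace
curl http://localhost:3100/loki/api/v1/label/namespace/values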
Query Performance
- Use label filters before text search
- Limit time ranges
- Use query frontend for optimization
- Check chunk sizes
See Also
- Log Solutions - Other log solutions
- Grafana - Visualization platform for Loki
- Prometheus - Metrics (correlate with Loki)
- Node & Sidecar logging - Log collection patterns