Loki

Grafana Loki is a horizontally scalable, highly available log aggregation system inspired by Prometheus. It’s designed to be cost-effective and easy to operate, making it an excellent choice for Kubernetes log management, especially if you already use Grafana for visualization.

What is Loki?

Loki is a log aggregation system that:

  • Labels logs like Prometheus - Uses the same label model for easy correlation
  • Cost-effective - Doesn’t index log content, only labels
  • Integrated with Grafana - Works seamlessly with Grafana dashboards
  • Simple to operate - Fewer moving parts than traditional solutions
  • Horizontally scalable - Designed for scale-out architecture
graph TB
  A[Log Sources] --> B[Promtail]
  B --> C[Loki]
  D[Kubernetes Pods] --> B
  E[Node Logs] --> B
  F[Application Logs] --> B
  C --> G[Storage]
  G --> H[Object Storage]
  C --> I[Grafana]
  I --> J[LogQL Queries]
  I --> K[Dashboards]
  style A fill:#e1f5ff
  style B fill:#e8f5e9
  style C fill:#fff4e1
  style I fill:#f3e5f5

Architecture

Loki consists of several components:

  • Loki - Log aggregation system (write path and read path)
  • Promtail - Log collection agent (like Prometheus’s node_exporter)
  • Distributor - Receives logs and distributes to ingesters
  • Ingester - Writes logs to storage
  • Querier - Handles queries
  • Query Frontend - Query optimization and caching
graph TB
  A[Promtail] --> B[Distributor]
  B --> C[Ingester]
  C --> D[Storage]
  E[Querier] --> D
  F[Query Frontend] --> E
  F --> G[Grafana]
  D --> H[Object Storage]
  D --> I[Local Storage]
  style A fill:#e1f5ff
  style B fill:#e8f5e9
  style C fill:#fff4e1
  style E fill:#f3e5f5
  style F fill:#ffe1e1

Key Concepts

Labels

Loki uses labels (like Prometheus) instead of indexing log content:

{namespace="production", pod="my-app-123", container="app"}

Benefits:

  • Fast queries on labels
  • Low storage costs
  • Easy correlation with metrics

Streams

A stream is a unique set of labels:

{namespace="production", pod="my-app-123"}  # Stream 1
{namespace="production", pod="my-app-456"}  # Stream 2

All logs with the same labels belong to the same stream.

Chunks

Logs are stored in chunks (compressed blocks) for efficiency.
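
Chunk behaviour can be tuned in the ingester configuration. A minimal sketch (values are illustrative, not recommendations):

ingester:
  chunk_idle_period: 30m      # flush a chunk if no new logs arrive for this long
  chunk_target_size: 1572864  # aim for ~1.5 MB of compressed data per chunk
  max_chunk_age: 2h           # flush chunks older than this regardless of size
  chunk_encoding: snappy      # compression used for chunk data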

Installation

# Add Grafana Helm repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Install Loki
helm install loki grafana/loki-stack \
  --namespace logging \
  --create-namespace \
  --set loki.enabled=true \
  --set promtail.enabled=true \
  --set grafana.enabled=false \
  --set loki.persistence.enabled=true \
  --set loki.persistence.size=50Gi
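
A quick way to confirm the chart installed cleanly (assumes the logging namespace used above):

# Check that the Loki and Promtail pods are running
kubectl get pods -n logging
helm status loki -n logging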

Manual Deployment

Loki Deployment

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: loki
  namespace: logging
spec:
  serviceName: loki
  replicas: 1
  selector:
    matchLabels:
      app: loki
  template:
    metadata:
      labels:
        app: loki
    spec:
      containers:
      - name: loki
        image: grafana/loki:2.9.0
        args:
        - -config.file=/etc/loki/local-config.yaml
        ports:
        - containerPort: 3100
          name: http
        volumeMounts:
        - name: config
          mountPath: /etc/loki
        - name: storage
          mountPath: /loki
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
      volumes:
      - name: config
        configMap:
          name: loki-config
      - name: storage
        persistentVolumeClaim:
          claimName: loki-pvc
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-config
  namespace: logging
data:
  local-config.yaml: |
    auth_enabled: false
    
    server:
      http_listen_port: 3100
      grpc_listen_port: 9096
    
    common:
      instance_addr: 127.0.0.1
      path_prefix: /loki
      storage:
        filesystem:
          chunks_directory: /loki/chunks
          rules_directory: /loki/rules
      replication_factor: 1
      ring:
        kvstore:
          store: inmemory
    
    schema_config:
      configs:
        - from: 2020-10-24
          store: tsdb
          object_store: filesystem
          schema: v13
          index:
            prefix: index_
            period: 24h
    
    ruler:
      alertmanager_url: http://localhost:9093
    
    # By default, Loki will send anonymous, but uniquely-identifiable usage and configuration
    # analytics to Grafana Labs. These statistics are sent to https://stats.grafana.org/
    #
    # Statistics help us better understand how Loki is used, and they show us performance
    # levels for most users. This helps us prioritize features and documentation.
    # For more information on what's sent, look at
    # https://github.com/grafana/loki/blob/main/pkg/analytics/stats.go
    # Refer to the buildReport method to see what goes into a report.
    #
    # If you would like to disable reporting, uncomment the following lines:
    #analytics:
    #  reporting_enabled: false
---
apiVersion: v1
kind: Service
metadata:
  name: loki
  namespace: logging
spec:
  selector:
    app: loki
  ports:
  - port: 3100
    targetPort: 3100
    name: http
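
The StatefulSet above mounts a PersistentVolumeClaim named loki-pvc that is not defined in these manifests. A minimal claim might look like the following; the size matches the Helm example earlier, and the storage class is left to the cluster default:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki-pvc
  namespace: logging
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi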

Promtail DaemonSet

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: promtail
  namespace: logging
spec:
  selector:
    matchLabels:
      name: promtail
  template:
    metadata:
      labels:
        name: promtail
    spec:
      serviceAccountName: promtail
      containers:
      - name: promtail
        image: grafana/promtail:2.9.0
        args:
        - -config.file=/etc/promtail/config.yml
        volumeMounts:
        - name: config
          mountPath: /etc/promtail
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: promtail-config
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail-config
  namespace: logging
data:
  config.yml: |
    server:
      http_listen_port: 3101
      grpc_listen_port: 9095
    
    positions:
      filename: /tmp/positions.yaml
    
    clients:
      - url: http://loki.logging.svc.cluster.local:3100/loki/api/v1/push
    
    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels:
              - __meta_kubernetes_pod_node_name
            target_label: __host__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - action: replace
            replacement: $1
            separator: /
            source_labels:
              - __meta_kubernetes_namespace
              - __meta_kubernetes_pod_name
            target_label: job
          - action: replace
            source_labels:
              - __meta_kubernetes_namespace
            target_label: namespace
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_name
            target_label: pod
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_container_name
            target_label: container
          - action: replace
            replacement: /var/log/pods/*$1/*.log
            separator: /
            source_labels:
              - __meta_kubernetes_pod_uid
              - __meta_kubernetes_pod_container_name
            target_label: __path__
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: promtail
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: promtail
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: promtail
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: promtail
subjects:
- kind: ServiceAccount
  name: promtail
  namespace: logging

Promtail Configuration

Kubernetes Discovery

Promtail automatically discovers pods using Kubernetes service discovery:

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Extract labels from Kubernetes metadata
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      # Set namespace label
      - action: replace
        source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      # Set pod name label
      - action: replace
        source_labels: [__meta_kubernetes_pod_name]
        target_label: pod

Pipeline Stages

Process and transform logs using pipeline stages:

pipeline_stages:
  # Parse JSON logs
  - json:
      expressions:
        level: level
        message: message
  
  # Add labels from parsed fields
  - labels:
      level:
  
  # Drop logs matching pattern
  - drop:
      expression: ".*debug.*"
      drop_counter_reason: "debug_logs"
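
In Promtail, pipeline_stages belong to a scrape job. A minimal sketch of where the stages above would sit:

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    pipeline_stages:
      - json:
          expressions:
            level: level
      - labels:
          level: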

Parsing Logs

Parse different log formats:

JSON Parsing:

pipeline_stages:
  - json:
      expressions:
        timestamp: timestamp
        level: level
        message: message

Regex Parsing:

pipeline_stages:
  - regex:
      expression: '^(?P<timestamp>\d{4}-\d{2}-\d{2}) (?P<level>\w+) (?P<message>.*)'

Multiline Parsing:

pipeline_stages:
  - multiline:
      firstline: '^\d{4}-\d{2}-\d{2}'
      max_wait_time: 3s

LogQL Queries

LogQL is Loki’s query language, inspired by PromQL.

Basic Queries

# All logs in a namespace
{namespace="production"}

# Filter by label
{namespace="production", pod="my-app-123"}

# Text search
{namespace="production"} |= "error"

# Multiple filters
{namespace="production"} |= "error" |~ "timeout|connection"

Log Filters

Line filters:

  • |= "text" - Contains text
  • != "text" - Doesn’t contain text
  • |~ "regex" - Matches regex
  • !~ "regex" - Doesn’t match regex

Label filters:

  • {label="value"} - Exact match
  • {label!="value"} - Not equal
  • {label=~"regex"} - Matches regex
  • {label!~"regex"} - Doesn’t match regex
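
Line filters and label matchers can be combined in a single query. The example below also parses JSON at query time and assumes the logs contain a status field:

{namespace="production", container=~"api|web"}
  |= "error"
  != "readiness probe"
  | json
  | status >= 500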

Aggregations

# Count logs
count_over_time({namespace="production"}[5m])

# Rate of logs
rate({namespace="production"}[5m])

# Top pods by log volume
topk(10, sum by (pod) (count_over_time({namespace="production"}[5m])))

Range Queries

# Count over time range
count_over_time({namespace="production"}[1h])

# Bytes over time
bytes_over_time({namespace="production"}[1h])

# Rate calculation
rate({namespace="production"} |= "error" [5m])

Examples

Error logs:

{namespace="production"} |= "error"

Error rate:

sum(rate({namespace="production"} |= "error" [5m]))

Top error pods:

topk(10, sum by (pod) (count_over_time({namespace="production"} |= "error" [5m])))

Logs by level (requires a level label, e.g. added by a pipeline stage):

{namespace="production", level="ERROR"}

Grafana Integration

Adding Loki as Data Source

  1. Go to Configuration > Data Sources
  2. Click Add data source
  3. Select Loki
  4. Configure:
    • URL: http://loki.logging.svc.cluster.local:3100
    • Access: Server (default)
  5. Click Save & Test
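
The same data source can be provisioned declaratively instead of through the UI. A minimal sketch of a Grafana provisioning file (how the file is mounted depends on your Grafana deployment):

apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki.logging.svc.cluster.local:3100
    isDefault: false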

Exploring Logs

  1. Go to Explore
  2. Select Loki data source
  3. Enter LogQL query
  4. View logs in table or live view

Creating Log Panels

  1. Create new dashboard
  2. Add panel
  3. Select Loki data source
  4. Enter LogQL query
  5. Choose visualization (Logs, Table, etc.)

Correlating with Metrics

Link logs with Prometheus metrics:

  • Use same label names in both systems
  • Create dashboards with both metrics and logs
  • Use Grafana’s correlation features
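
For example, a dashboard row might pair a Prometheus panel and a Loki panel that share the same label selectors (the metric name here is an assumption):

# Prometheus panel: request rate for the app
sum(rate(http_requests_total{namespace="production", pod=~"my-app-.*"}[5m]))

# Loki panel: error logs from the same pods
{namespace="production", pod=~"my-app-.*"} |= "error"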

Storage Configuration

Filesystem Storage (Development)

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  filesystem:
    directory: /loki/chunks

Object Storage (Production)

S3:

storage_config:
  aws:
    s3: s3://my-bucket/loki
    region: us-east-1
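
When chunks move to S3, the schema entry should reference the same object store. A minimal sketch, keeping the schema used earlier (object_store accepts s3 or aws depending on version):

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h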

GCS:

storage_config:
  gcs:
    bucket_name: my-loki-bucket

Azure:

storage_config:
  azure:
    account_name: myaccount
    account_key: mykey
    container_name: loki

Best Practices

1. Label Design

Good labels:

  • namespace - Kubernetes namespace
  • pod - Pod name
  • container - Container name
  • service - Service name
  • level - Log level (if parsed)

Avoid high-cardinality labels:

  • Timestamps
  • Request IDs (unless needed)
  • User IDs (unless needed)
  • Random values

2. Retention Policies

Configure retention:

limits_config:
  retention_period: 720h  # 30 days

Or use compactor:

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150

3. Resource Limits

Set appropriate limits:

resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "2Gi"
    cpu: "1000m"

4. High Availability

Run multiple Loki instances:

  • Multiple ingesters
  • Multiple queriers
  • Query frontend for load balancing
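
On the Loki configuration side, the pieces that matter for HA are the replication factor and a shared ring. A minimal sketch; the memberlist service name is an assumption for illustration:

common:
  replication_factor: 3
  ring:
    kvstore:
      store: memberlist

memberlist:
  join_members:
    - loki-memberlist.logging.svc.cluster.local:7946  # assumed headless service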

5. Performance

  • Use object storage for production
  • Enable query caching
  • Use query frontend for optimization
  • Monitor chunk size

6. Security

  • Enable multi-tenant authentication (auth_enabled: true) behind an authenticating reverse proxy
  • Use TLS for transport
  • Configure RBAC in Grafana
  • Secure Promtail communication

Troubleshooting

Check Loki Status

# Check Loki pods
kubectl get pods -n logging -l app=loki

# Check Loki logs
kubectl logs -n logging -l app=loki

# Test Loki API
kubectl port-forward -n logging svc/loki 3100:3100
curl http://localhost:3100/ready

Check Promtail Status

# Check Promtail pods
kubectl get pods -n logging -l name=promtail

# Check Promtail logs
kubectl logs -n logging -l name=promtail

# Check Promtail targets
kubectl port-forward -n logging <promtail-pod> 3101:3101
curl http://localhost:3101/targets

No Logs Appearing

  1. Verify Promtail is running
  2. Check Promtail configuration
  3. Verify Loki is accessible from Promtail
  4. Check labels match in queries
  5. Verify log files exist on nodes

Query Performance

  • Use label filters before text search
  • Limit time ranges
  • Use query frontend for optimization
  • Check chunk sizes
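
A quick illustration of the first point; both queries search for the same text, but the second forces Loki to scan far more streams:

# Good: narrow the stream selector with labels first
{namespace="production", container="api"} |= "timeout"

# Slow: overly broad selector matched against every stream
{namespace=~".+"} |= "timeout"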

See Also