
Loki Log Aggregation

Grafana Loki 3.6.7 is the central log store for the CWIQ observability stack. All 22 servers forward logs to Loki via Alloy agents, and logs are stored in S3 with 30-day retention.


Overview

| Property | Value |
|---|---|
| Version | 3.6.7 |
| Host | loki-shared-cwiq-io (VPC 10.0.15.157) |
| Container | observability-loki |
| Port | 3100 (HTTP, all interfaces) |
| Storage | S3 bucket cwiq-shared-loki-data (us-west-2) |
| Mode | Monolithic (target: all) |
| Retention | 720h (30 days) |
| Playbook | loki/deploy-loki.yml |

Loki has no standalone UI. Access logs through Grafana Explore at https://grafana.shared.cwiq.io.


How Logs Arrive

Alloy agent (every host)
    | push via HTTP
    v
loki-shared-cwiq-io:3100  <-- LogQL queries via VPC private IP --  grafana-shared-cwiq-io (Loki datasource)
    | index + chunks
    v
S3: cwiq-shared-loki-data (us-west-2)

Authentication for S3 is handled by the EC2 IAM instance role on loki-shared-cwiq-io. No credentials are stored in the configuration.
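To verify the push path without involving Alloy, one test line can be sent by hand to Loki's HTTP push endpoint, /loki/api/v1/push, which accepts JSON streams of [nanosecond timestamp, line] pairs. This is a minimal sketch, assuming curl is installed and the host is on the tailnet. The function names and the job="manual-test" label are hypothetical, and the payload builder does no JSON escaping, so keep test lines free of quotes and backslashes.

```shell
# Hypothetical smoke test for the Loki push path (not part of the playbooks).

# Build a one-stream, one-entry push payload.
# Assumes job and line need no JSON escaping (no quotes, backslashes, control chars).
loki_push_payload() {
  local job="$1" line="$2" ts_ns="$3"
  printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}' \
    "$job" "$ts_ns" "$line"
}

# POST one test line and print the HTTP status code returned by Loki.
loki_push_test_line() {
  local base_url="${1:-http://loki-shared-cwiq-io:3100}"
  local ts_ns
  # Loki expects nanoseconds: append nine zeros to epoch seconds.
  ts_ns="$(date +%s)000000000"
  loki_push_payload "manual-test" "smoke test from $(hostname)" "$ts_ns" |
    curl -sS -o /dev/null -w '%{http_code}\n' \
      -H 'Content-Type: application/json' \
      --data-binary @- "$base_url/loki/api/v1/push"
}
```

Loki answers HTTP 204 on a successful push. The line should then appear in Grafana Explore under {job="manual-test"}.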


Network Access

Cross-VPC: DEV servers must use the Tailscale hostname

Alloy agents on DEV servers must push to loki-shared-cwiq-io:3100 (the Tailscale hostname). The FQDN loki.shared.cwiq.io resolves to a private IP in the Shared VPC, which is not routable from the DEV VPC.

| Source | Endpoint | Addressed via |
|---|---|---|
| Alloy agents (Shared VPC) | loki.shared.cwiq.io:3100 | Route53 private DNS |
| Alloy agents (DEV VPC) | loki-shared-cwiq-io:3100 | Tailscale hostname |
| Grafana datasource | http://10.0.15.157:3100 | VPC private IP (cross-host) |
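The endpoint rule can be encoded in a provisioning script so DEV hosts never receive the FQDN. This is a sketch only: the function names and the dev/shared argument values are hypothetical, and only the two endpoints come from the table above.

```shell
# Hypothetical helper: pick the Loki push endpoint for a host's VPC.
loki_endpoint_for_vpc() {
  case "$1" in
    shared) echo "http://loki.shared.cwiq.io:3100" ;;  # Route53 private DNS
    dev)    echo "http://loki-shared-cwiq-io:3100" ;;  # Tailscale hostname
    *)      echo "unknown VPC: $1" >&2; return 1 ;;
  esac
}

# Check that the chosen endpoint answers on Loki's /ready endpoint.
loki_check_reachable() {
  local url
  url="$(loki_endpoint_for_vpc "$1")" || return 1
  curl -sS --max-time 5 "$url/ready"
}
```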

Accessing Logs

  1. Open https://grafana.shared.cwiq.io
  2. Navigate to Explore (compass icon)
  3. Select Loki from the datasource dropdown
  4. Enter a LogQL query

LogQL Reference

Stream Selectors

Every LogQL query starts with a stream selector, a set of label matchers inside {}, optionally followed by line filters:

# All logs from a host
{host="orchestrator-dev-cwiq-io"}

# Docker container logs
{job="docker", container="orchestrator-server"}

# Systemd journal for a unit (journal unit names include the .service suffix)
{job="journal", unit="alloy.service"}

# Lines containing "error" across all containers (case-sensitive;
# use |~ "(?i)error" to also match ERROR and Error)
{job="docker"} |= "error"

# By environment
{environment="development"}

# Specific compose project
{job="docker", compose_project="orchestrator"}
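Scripts that assemble stream selectors from variables need to quote label values correctly. A minimal POSIX shell sketch: logql_selector is a hypothetical helper that takes label=value arguments and escapes backslashes and double quotes in each value.

```shell
# Hypothetical helper: build a LogQL stream selector from label=value arguments.
# Backslashes and double quotes in values are escaped for LogQL string literals.
logql_selector() {
  local out="" pair key val
  for pair in "$@"; do
    key="${pair%%=*}"   # text before the first '='
    val="${pair#*=}"    # text after the first '='
    val="$(printf '%s' "$val" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
    out="${out}${out:+, }${key}=\"${val}\""
  done
  printf '{%s}\n' "$out"
}
```

For example, `logql_selector job=docker compose_project=orchestrator` prints the last selector in the list above.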

Filter Operators

| Operator | Description |
|---|---|
| `\|= "text"` | Line contains text |
| `!= "text"` | Line does not contain text |
| `\|~ "regex"` | Line matches regex |
| `!~ "regex"` | Line does not match regex |
| `\| json` | Parse JSON log lines and expose fields as labels |
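Line filters chain left to right after the selector, each one narrowing the stream before the next is applied. The same pipelines that work in Explore can also be sent from a script to Loki's range-query endpoint, GET /loki/api/v1/query_range. A sketch, assuming curl and tailnet access: loki_query_range is a hypothetical helper, and the one-hour window and limit=100 are arbitrary choices.

```shell
# Hypothetical helper: run a LogQL query over the last hour via
# GET /loki/api/v1/query_range and print the raw JSON response.
loki_query_range() {
  local query="$1" base_url="${2:-http://loki-shared-cwiq-io:3100}"
  local now
  now="$(date +%s)"
  # start/end are Unix epoch nanoseconds; limit caps the returned log lines.
  curl -sS -G "$base_url/loki/api/v1/query_range" \
    --data-urlencode "query=$query" \
    --data-urlencode "start=$((now - 3600))000000000" \
    --data-urlencode "end=${now}000000000" \
    --data-urlencode "limit=100"
}

# Example (illustrative filter values): errors from one container,
# excluding lines mentioning "healthcheck", then parsed as JSON.
# loki_query_range '{job="docker", container="orchestrator-server"} |= "error" != "healthcheck" | json'
```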

Metric Queries

# Lines per second containing "error", per container, over 5m
sum by (container) (rate({job="docker"} |= "error" [5m]))

# Docker log volume per host (lines per second)
sum by (host) (rate({job="docker"}[5m]))
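Both metric queries share one shape: select a stream, optionally filter lines, take rate() over a window, and sum by one label. A hypothetical builder for that shape is sketched below. It only produces the query string, which can be pasted into Explore or sent to Loki's query API.

```shell
# Hypothetical builder for "lines per second, summed by one label" queries.
# Args: group-by label, stream selector, optional line-filter text, optional window.
logql_rate_by() {
  local by="$1" selector="$2" contains="${3:-}" window="${4:-5m}"
  local expr="$selector"
  if [ -n "$contains" ]; then
    expr="$expr |= \"$contains\""
  fi
  printf 'sum by (%s) (rate(%s [%s]))\n' "$by" "$expr" "$window"
}
```

`logql_rate_by container '{job="docker"}' error` reproduces the first query above.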

Available Labels

| Label | Values | Set by |
|---|---|---|
| host | Tailscale hostname of the server | Alloy agent |
| environment | development, demo, shared | alloy_environment variable |
| job | docker, journal | Alloy pipeline |
| container | Docker container name | Alloy Docker log collection |
| service | Docker Compose service name | Alloy Docker log collection |
| compose_project | Docker Compose project name | Alloy Docker log collection |
| unit | systemd unit name | Alloy journal log collection |
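To see which values a label currently has (for example, every host that has shipped logs recently), Loki exposes GET /loki/api/v1/label/<name>/values. A sketch, assuming curl and tailnet access; the helper name is hypothetical, and it prints the raw JSON response.

```shell
# Hypothetical helper: list the values Loki knows for one label.
loki_label_values() {
  local label="$1" base_url="${2:-http://loki-shared-cwiq-io:3100}"
  # Reject empty names and names with characters outside [A-Za-z0-9_].
  case "$label" in
    ''|*[!A-Za-z0-9_]*) echo "invalid label name: $label" >&2; return 2 ;;
  esac
  curl -sS "$base_url/loki/api/v1/label/$label/values"
}
```

With jq installed, `loki_label_values host | jq -r '.data[]'` prints one value per line.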

Configuration Variables

| Variable | Default | Description |
|---|---|---|
| loki_version | 3.6.7 | Docker image tag |
| loki_data_dir | /data/loki | Host path for config and data |
| loki_port | 3100 | HTTP port |
| loki_s3_bucket | cwiq-shared-loki-data | S3 bucket name |
| loki_s3_region | us-west-2 | AWS region |
| loki_retention_period | 720h | Log retention (30 days) |
| loki_ingestion_rate_mb | 16 | Max ingestion rate per tenant (MB/s) |
| loki_ingestion_burst_size_mb | 32 | Burst size for ingestion (MB) |
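For orientation, the retention and ingestion variables most likely map onto Loki's limits_config and compactor settings. The sketch below renders such a fragment from the same variable names. It is an assumption about what the playbook's template contains, not a copy of it, and it is a fragment rather than a complete Loki config. In Loki, retention_period is only enforced when the compactor has retention enabled.

```shell
# Hypothetical sketch: render the Loki config fragment that the retention and
# ingestion variables plausibly control. Not the playbook's actual template,
# and not a complete Loki config (server, storage, schema sections omitted).
render_loki_limits() {
  local retention="${loki_retention_period:-720h}"
  local rate_mb="${loki_ingestion_rate_mb:-16}"
  local burst_mb="${loki_ingestion_burst_size_mb:-32}"
  cat <<EOF
limits_config:
  retention_period: ${retention}
  ingestion_rate_mb: ${rate_mb}
  ingestion_burst_size_mb: ${burst_mb}
compactor:
  retention_enabled: true     # without this, retention_period is not enforced
  delete_request_store: s3    # Loki 3.x requires a delete request store when retention is on
EOF
}
```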

Deployment

ssh ansible@ansible-shared-cwiq-io
ansible-helper
cd loki
# First deployment only: copying again overwrites local edits to all.yml
cp group_vars/all.yml.template group_vars/all.yml
ansible-playbook -i inventory/shared.yml deploy-loki.yml
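Copying the template on every run replaces any edits made to group_vars/all.yml. A hypothetical wrapper, run from the loki/ directory after ansible-helper, copies the template only when the file is missing and passes extra arguments through to ansible-playbook. Setting DRY_RUN=1 prints the command instead of running it.

```shell
# Hypothetical wrapper for the deployment steps above.
# Creates group_vars/all.yml from the template only if it does not exist yet.
deploy_loki() {
  if [ ! -f group_vars/all.yml ]; then
    cp group_vars/all.yml.template group_vars/all.yml || return 1
    echo "Created group_vars/all.yml from template; review it before deploying." >&2
  fi
  # With DRY_RUN set, "echo" is prepended and the command is only printed.
  ${DRY_RUN:+echo} ansible-playbook -i inventory/shared.yml deploy-loki.yml "$@"
}
```

For example, `deploy_loki -e loki_version=3.6.7` pins the image tag for a single run through Ansible extra vars.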

Health Check Commands

# Readiness check
curl http://loki-shared-cwiq-io:3100/ready
# Returns "ready" when all components are healthy

# Container status
ssh ec2-user@loki-shared-cwiq-io \
  "docker ps --filter 'name=observability-loki' \
  --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'"

# Live logs
ssh ec2-user@loki-shared-cwiq-io \
  "docker logs observability-loki --tail 50 --follow"

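After a deploy or restart, the container can be running before Loki reports ready. A polling loop around /ready avoids guessing. This is a sketch with hypothetical defaults of 30 attempts 5 seconds apart, assuming curl and tailnet access.

```shell
# Poll Loki's /ready endpoint until it answers "ready" or attempts run out.
# Defaults (30 attempts, 5 s apart) are arbitrary; adjust as needed.
wait_for_loki_ready() {
  local url="${1:-http://loki-shared-cwiq-io:3100}/ready"
  local attempts="${2:-30}" interval="${3:-5}" i=1 body
  while [ "$i" -le "$attempts" ]; do
    body="$(curl -sS --max-time 5 "$url" 2>/dev/null)"
    if [ "$body" = "ready" ]; then
      echo "Loki ready after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "Loki not ready after $attempts attempts: $url" >&2
  return 1
}
```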
Operational Playbooks

| Playbook | Purpose |
|---|---|
| deploy-loki.yml | Full deployment (config + image pull + start) |
| healthcheck.yml | Check /ready and /metrics endpoints |
| restart.yml | Restart the observability-loki container |
| stop.yml | Stop the container |
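The operational playbooks can be wrapped in one entry point. A sketch, assuming each playbook lives next to deploy-loki.yml and uses the same inventory/shared.yml inventory; loki_op and the short action names are hypothetical. DRY_RUN=1 prints the ansible-playbook command instead of running it.

```shell
# Hypothetical wrapper around the operational playbooks listed above.
# Extra arguments after the action are passed through to ansible-playbook.
loki_op() {
  local playbook
  case "$1" in
    deploy)  playbook=deploy-loki.yml ;;
    health)  playbook=healthcheck.yml ;;
    restart) playbook=restart.yml ;;
    stop)    playbook=stop.yml ;;
    *) echo "usage: loki_op {deploy|health|restart|stop} [ansible args...]" >&2; return 2 ;;
  esac
  shift
  ${DRY_RUN:+echo} ansible-playbook -i inventory/shared.yml "$playbook" "$@"
}
```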