Loki Log Aggregation
Grafana Loki 3.6.7 is the central log store for the CWIQ observability stack. All 22 servers forward logs to Loki via Alloy agents, and logs are stored in S3 with 30-day retention.
Overview
| Property | Value |
|---|---|
| Version | 3.6.7 |
| Host | loki-shared-cwiq-io (VPC 10.0.15.157) |
| Container | observability-loki |
| Port | 3100 (HTTP, all interfaces) |
| Storage | S3 bucket cwiq-shared-loki-data (us-west-2) |
| Mode | Monolithic (target: all) |
| Retention | 720h (30 days) |
| Playbook | loki/deploy-loki.yml |
Loki has no standalone UI. Access logs through Grafana Explore at https://grafana.shared.cwiq.io.
How Logs Arrive
```text
Alloy agent (every host)
        |  push via HTTP
        v
loki-shared-cwiq-io:3100
        |  index + chunks
        v
S3: cwiq-shared-loki-data (us-west-2)
        |
        v
grafana-shared-cwiq-io (Loki datasource queries via VPC private IP)
```
Authentication for S3 is handled by the EC2 IAM instance role on loki-shared-cwiq-io. No credentials are stored in the configuration.
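To make the "push via HTTP" hop concrete, the sketch below builds the JSON body accepted by Loki's push endpoint (`/loki/api/v1/push`) and shows, commented out, how it could be sent with curl. It is a manual smoke test under stated assumptions, not the way Alloy is configured here: the function name and the `job="manual-test"` label are hypothetical, and it assumes multi-tenancy is disabled (otherwise an `X-Scope-OrgID` header would be required).

```shell
# Build a minimal Loki push payload: one stream carrying one log line.
# $1 = host label value
# $2 = log line (must not contain double quotes or backslashes)
# $3 = timestamp in nanoseconds since the Unix epoch
loki_push_payload() {
  printf '{"streams":[{"stream":{"job":"manual-test","host":"%s"},"values":[["%s","%s"]]}]}' \
    "$1" "$3" "$2"
}

# Sending it (left commented out so the sketch has no side effects):
# curl -s -X POST -H 'Content-Type: application/json' \
#   --data "$(loki_push_payload "$(hostname)" 'push smoke test' "$(date +%s)000000000")" \
#   http://loki-shared-cwiq-io:3100/loki/api/v1/push
```

A line pushed this way should then be searchable in Grafana Explore with `{job="manual-test"}`.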
Network Access
Cross-VPC: DEV servers must use the Tailscale hostname
Alloy agents on DEV servers must send logs to loki-shared-cwiq-io:3100 (the Tailscale hostname). The FQDN (loki.shared.cwiq.io) resolves to a Shared VPC private IP that is not routable from the DEV VPC.
| Source | Endpoint | Address type |
|---|---|---|
| Alloy agents (Shared VPC) | loki.shared.cwiq.io:3100 | Route53 private DNS |
| Alloy agents (DEV VPC) | loki-shared-cwiq-io:3100 | Tailscale hostname |
| Grafana datasource | http://10.0.15.157:3100 | VPC private IP (cross-host) |
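The routing rule in this section can be written down as a small lookup. This is a minimal sketch: the function name and the `shared` / `dev` arguments are hypothetical labels for the two VPCs, while the endpoints are the ones listed in the table above.

```shell
# Print the Loki base URL an Alloy agent should push to, given the VPC it runs in.
# "shared" and "dev" are illustrative names, not identifiers used by the playbooks.
loki_endpoint_for() {
  case "$1" in
    shared) echo "http://loki.shared.cwiq.io:3100" ;;   # Route53 private DNS
    dev)    echo "http://loki-shared-cwiq-io:3100" ;;   # Tailscale hostname
    *)      echo "unknown VPC: $1" >&2; return 1 ;;
  esac
}

# Example: loki_endpoint_for dev
```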
Accessing Logs
- Open https://grafana.shared.cwiq.io
- Navigate to Explore (compass icon)
- Select Loki from the datasource dropdown
- Enter a LogQL query
LogQL Reference
Stream Selectors
All queries start with a label selector in {}:
```logql
# All logs from a host
{host="orchestrator-dev-cwiq-io"}

# Docker container logs
{job="docker", container="orchestrator-server"}

# Systemd journal lines containing "alloy"
{job="journal"} |= "alloy"

# All lines containing "error" across all containers (case-sensitive)
{job="docker"} |= "error"

# By environment
{environment="development"}

# Specific compose project
{job="docker", compose_project="orchestrator"}
```
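The same selectors can also be sent to Loki's HTTP API directly, which is handy for scripting outside Grafana. The sketch below assembles a stream selector from `label=value` arguments. The helper name is hypothetical; `/loki/api/v1/query_range` is Loki's standard range-query endpoint, and the curl call is left commented out because it needs network access to the Loki host.

```shell
# Build a LogQL stream selector from label=value arguments.
# Values must not contain double quotes.
logql_selector() {
  sel=""
  for pair in "$@"; do
    key=${pair%%=*}   # text before the first "="
    val=${pair#*=}    # text after the first "="
    sel="${sel:+$sel, }${key}=\"${val}\""
  done
  printf '{%s}\n' "$sel"
}

# Fetch up to 20 recent lines for one container (time range defaults to the last hour):
# curl -s -G http://loki-shared-cwiq-io:3100/loki/api/v1/query_range \
#   --data-urlencode "query=$(logql_selector job=docker container=orchestrator-server)" \
#   --data-urlencode 'limit=20'
```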
Filter Operators
| Operator | Description |
|---|---|
| `\|= "text"` | Line contains text |
| `!= "text"` | Line does not contain text |
| `\|~ "regex"` | Line matches regex |
| `!~ "regex"` | Line does not match regex |
| `\| json` | Parse JSON log lines and expose fields as labels |
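Line filters can be chained, and each one narrows the output of the previous one. As a rough local analogy (not how Loki evaluates queries), the sketch below reproduces the chain `|= "error" != "healthcheck" |~ "status=5[0-9]{2}"` with grep on made-up sample lines. Loki uses RE2 regular expressions while grep uses POSIX ERE, so the two only agree on simple patterns like this one.

```shell
# Hypothetical sample log lines, for illustration only.
sample_logs() {
  printf '%s\n' \
    'level=info msg="request ok" status=200' \
    'level=error msg="upstream timeout" status=504' \
    'level=error msg="healthcheck failed" status=500' \
    'level=warn msg="slow request" status=200'
}

# Local stand-in for: |= "error" != "healthcheck" |~ "status=5[0-9]{2}"
filter_chain() {
  grep -F 'error' |          # |=  line contains text
    grep -vF 'healthcheck' | # !=  line does not contain text
    grep -E 'status=5[0-9]{2}' # |~ line matches regex
}

sample_logs | filter_chain
```

Only the "upstream timeout" line survives all three filters.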
Metric Queries
```logql
# Error rate per container over 5m
sum by (container) (rate({job="docker"} |= "error" [5m]))

# Log volume per host
sum by (host) (rate({job="docker"}[5m]))
```
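`rate()` over a log range returns entries per second, averaged over the window, so the numbers are often smaller than expected. A small worked example of the arithmetic, with made-up counts: 150 matching lines in a 5-minute (300-second) window is a rate of 0.5. The helper name is hypothetical.

```shell
# Per-second rate for a given number of log lines in a window.
# $1 = number of matching lines, $2 = window length in seconds
lines_per_second() {
  awk -v n="$1" -v w="$2" 'BEGIN { printf "%.2f\n", n / w }'
}

lines_per_second 150 300   # 150 lines over [5m]

# The same metric query can be run as an instant query over HTTP:
# curl -s -G http://loki-shared-cwiq-io:3100/loki/api/v1/query \
#   --data-urlencode 'query=sum by (container) (rate({job="docker"} |= "error" [5m]))'
```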
Available Labels
| Label | Values | Set By |
|---|---|---|
| `host` | Tailscale hostname of the server | Alloy agent |
| `environment` | `development`, `demo`, `shared` | `alloy_environment` variable |
| `job` | `docker`, `journal` | Alloy pipeline |
| `container` | Docker container name | Alloy Docker log collection |
| `service` | Docker Compose service name | Alloy Docker log collection |
| `compose_project` | Docker Compose project name | Alloy Docker log collection |
| `unit` | systemd unit name | Alloy journal log collection |
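To see which values a label currently has (for example, which hosts are reporting), Loki exposes label endpoints over HTTP. The sketch below only builds the URLs. The helper names and the default base URL are assumptions for illustration, while `/loki/api/v1/labels` and `/loki/api/v1/label/<name>/values` are Loki's standard label endpoints.

```shell
# URL listing all label names. $1 = optional Loki base URL.
labels_url() {
  printf '%s/loki/api/v1/labels\n' "${1:-http://loki-shared-cwiq-io:3100}"
}

# URL listing the known values of one label. $1 = label name, $2 = optional base URL.
label_values_url() {
  printf '%s/loki/api/v1/label/%s/values\n' "${2:-http://loki-shared-cwiq-io:3100}" "$1"
}

# Example (requires network access to the Loki host):
# curl -s "$(label_values_url host)"
```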
Configuration Variables
| Variable | Default | Description |
|---|---|---|
| `loki_version` | `3.6.7` | Docker image tag |
| `loki_data_dir` | `/data/loki` | Host path for config and data |
| `loki_port` | `3100` | HTTP port |
| `loki_s3_bucket` | `cwiq-shared-loki-data` | S3 bucket name |
| `loki_s3_region` | `us-west-2` | AWS region |
| `loki_retention_period` | `720h` | Log retention (30 days) |
| `loki_ingestion_rate_mb` | `16` | Max ingestion rate per tenant (MB/s) |
| `loki_ingestion_burst_size_mb` | `32` | Burst size for ingestion (MB) |
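The retention value is expressed in hours, so a change stated in days has to be converted first. The sketch below shows that arithmetic. The helper name is hypothetical, and the commented override assumes the playbook accepts `loki_retention_period` as an Ansible extra var, which this page does not confirm.

```shell
# Convert a retention period in days to the hour-based string
# used by loki_retention_period (30 days -> 720h).
retention_from_days() {
  echo "$(( $1 * 24 ))h"
}

retention_from_days 30

# Hypothetical override at deploy time (an assumption, not a documented workflow):
# ansible-playbook -i inventory/shared.yml deploy-loki.yml \
#   -e "loki_retention_period=$(retention_from_days 60)"
```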
Deployment
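```shell
# Log in to the Ansible control host
ssh ansible@ansible-shared-cwiq-io
ansible-helper
cd loki
# Create the variables file from the template
cp group_vars/all.yml.template group_vars/all.yml
# Run the deployment against the Shared inventory
ansible-playbook -i inventory/shared.yml deploy-loki.yml
```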
Health Check Commands
```shell
# Readiness check
curl http://loki-shared-cwiq-io:3100/ready
# Returns "ready" when all components are healthy

# Container status
ssh ec2-user@loki-shared-cwiq-io \
  "docker ps --filter 'name=observability-loki' \
  --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'"

# Live logs
ssh ec2-user@loki-shared-cwiq-io \
  "docker logs observability-loki --tail 50 --follow"
```
Operational Playbooks
| Playbook | Purpose |
|---|---|
| `deploy-loki.yml` | Full deployment (config + image pull + start) |
| `healthcheck.yml` | Check /ready and /metrics endpoints |
| `restart.yml` | Restart the observability-loki container |
| `stop.yml` | Stop the container |
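After a deploy or restart, Loki typically needs a short time before `/ready` succeeds, so a single check can fail even when nothing is wrong. The sketch below is a generic polling helper under that assumption. The function name, attempt count, and delay are illustrative and not part of the playbooks.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# $1 = max attempts, $2 = seconds to wait between attempts, remaining args = command
wait_until_ok() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ok after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt will follow
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "not ready after $attempts attempt(s)" >&2
  return 1
}

# Example: wait up to about a minute for Loki after restart.yml
# (curl -f turns the HTTP 503 returned while Loki is starting into a failure)
# wait_until_ok 30 2 curl -sf http://loki-shared-cwiq-io:3100/ready
```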
Related Documentation
- Monitoring Overview
- Alloy Log & Metric Collection
- Grafana Dashboards
- Adding Monitoring for New Infrastructure
- Source: ansible-playbooks/loki/docs/README.md
- Source: ansible-playbooks/loki/docs/OPERATIONS.md