Grafana Dashboards

Grafana 12.4.0 is the primary UI for the CWIQ observability stack. It provides log exploration through Loki and metrics dashboards through Prometheus, sits behind Nginx (which terminates SSL), and can optionally use Authentik for SSO.


Overview

| Property | Value |
|---|---|
| Version | 12.4.0 |
| Host | grafana-shared-cwiq-io (VPC IP 10.0.15.178) |
| External URL | https://grafana.shared.cwiq.io |
| Grafana container | observability-grafana (port 3000, localhost only) |
| Nginx container | observability-nginx (ports 80/443) |
| Datasources | Loki (default) and Prometheus, both pre-provisioned |
| SSO | Authentik OIDC (optional, configured by a separate playbook) |
| Playbook | grafana/deploy-grafana.yml |

Pre-Provisioned Datasources

Both datasources are provisioned at startup. No manual configuration is needed.

| Name | Type | URL | Default |
|---|---|---|---|
| Loki | Loki | http://10.0.15.157:3100 | Yes |
| Prometheus | Prometheus | http://10.0.15.9:9090 | No |

The URLs use VPC private IPs because Loki and Prometheus run on different hosts from Grafana, and Docker's container-name DNS only resolves names within a Docker network on the same host.
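Grafana reads datasources from provisioning files when it starts. The following is a minimal sketch of what such a file could contain for the two datasources above, rendered from Python; the exact field set is an assumption, and the playbook's real template may differ. With `access: proxy`, the Grafana server rather than the user's browser calls each URL, which is why private VPC addresses still work for users outside the VPC.

```python
# Sketch: render a Grafana datasource provisioning file (apiVersion 1 format).
# ASSUMPTION: the real deploy-grafana.yml template may set additional fields.

DATASOURCES = [
    {"name": "Loki", "type": "loki", "url": "http://10.0.15.157:3100", "is_default": True},
    {"name": "Prometheus", "type": "prometheus", "url": "http://10.0.15.9:9090", "is_default": False},
]


def render_provisioning(datasources: list[dict]) -> str:
    """Return provisioning YAML text for the given datasources."""
    defaults = [ds["name"] for ds in datasources if ds["is_default"]]
    # Grafana rejects provisioning that marks more than one datasource as default.
    if len(defaults) > 1:
        raise ValueError(f"only one default datasource allowed, got {defaults}")
    lines = ["apiVersion: 1", "datasources:"]
    for ds in datasources:
        lines += [
            f"  - name: {ds['name']}",
            f"    type: {ds['type']}",
            "    access: proxy",  # the Grafana backend makes the request, not the browser
            f"    url: {ds['url']}",
            f"    isDefault: {'true' if ds['is_default'] else 'false'}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_provisioning(DATASOURCES), end="")
```

Validating the single-default rule before writing the file surfaces the mistake at render time instead of as a Grafana startup error.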


Using Grafana

Log Exploration (Loki)

  1. Open https://grafana.shared.cwiq.io
  2. Click Explore (compass icon) in the left sidebar
  3. Select Loki from the datasource dropdown
  4. Enter a LogQL query (see the Loki page for the full LogQL reference)

Common queries:

# All logs from a specific server
{host="orchestrator-dev-cwiq-io"}

# Lines containing "error" (case-sensitive) from all orchestrator containers
# (use |~ "(?i)error" for a case-insensitive match)
{job="docker", compose_project="orchestrator"} |= "error"

# Systemd journal lines that mention "alloy"
{job="journal"} |= "alloy"
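The same LogQL queries can be sent to Loki's HTTP API directly, for example from a script. This is a minimal sketch using only the Python standard library, assuming it runs on a host inside the VPC that can reach Loki at 10.0.15.157:3100; the helper names are hypothetical.

```python
# Sketch: query Loki's HTTP API (/loki/api/v1/query_range) with a LogQL log query.
# ASSUMPTION: run from a host inside the VPC that can reach Loki at 10.0.15.157:3100.
import json
import time
import urllib.parse
import urllib.request

LOKI_URL = "http://10.0.15.157:3100"


def build_query_range_url(base: str, logql: str, minutes: int = 15, limit: int = 100) -> str:
    """Build a query_range URL covering the last `minutes` minutes."""
    end_ns = time.time_ns()
    start_ns = end_ns - minutes * 60 * 1_000_000_000
    params = {
        "query": logql,
        "start": str(start_ns),  # Unix epoch in nanoseconds
        "end": str(end_ns),
        "limit": str(limit),
        "direction": "backward",  # newest lines first
    }
    return f"{base}/loki/api/v1/query_range?{urllib.parse.urlencode(params)}"


def extract_lines(body: dict) -> list[str]:
    """Flatten a log-query ("streams") response into raw log lines."""
    if body["data"]["resultType"] != "streams":
        raise ValueError("not a log query; metric queries return a matrix")
    return [line for stream in body["data"]["result"] for _ts, line in stream["values"]]


def fetch_lines(url: str) -> list[str]:
    """Run the query and return its log lines."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return extract_lines(json.load(resp))
```

From a VPC host, `fetch_lines(build_query_range_url(LOKI_URL, '{job="journal"} |= "alloy"'))` returns up to 100 matching lines from the last 15 minutes.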

Metrics Exploration (Prometheus)

  1. Click Explore
  2. Select Prometheus from the datasource dropdown
  3. Enter a PromQL query

Common queries:

# CPU usage per host
100 - (avg by (host) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory in use (bytes), one series per host
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes

# All currently firing alerts
ALERTS{alertstate="firing"}
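PromQL expressions can likewise be evaluated through Prometheus's HTTP API. A minimal sketch, assuming it runs on a VPC host that can reach Prometheus at 10.0.15.9:9090; `instant_query` and `vector_by_label` are hypothetical helper names.

```python
# Sketch: evaluate a PromQL expression through Prometheus's /api/v1/query endpoint.
# ASSUMPTION: run from a host inside the VPC that can reach Prometheus at 10.0.15.9:9090.
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://10.0.15.9:9090"
CPU_BY_HOST = '100 - (avg by (host) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'


def instant_query(base: str, promql: str) -> dict:
    """GET /api/v1/query and return the decoded JSON body."""
    url = f"{base}/api/v1/query?{urllib.parse.urlencode({'query': promql})}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def vector_by_label(body: dict, label: str) -> dict[str, float]:
    """Map one label (e.g. host) to its sample value in an instant-vector result."""
    if body.get("status") != "success" or body["data"]["resultType"] != "vector":
        raise ValueError("expected a successful instant-vector response")
    # Each sample's "value" is [unix_timestamp, "<value as a string>"].
    return {s["metric"].get(label, ""): float(s["value"][1]) for s in body["data"]["result"]}
```

For example, `vector_by_label(instant_query(PROMETHEUS_URL, CPU_BY_HOST), "host")` maps each host to its CPU usage percentage.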

Available Dashboards

Three Node Exporter dashboards are pre-provisioned:

| Dashboard | Shows |
|---|---|
| Node Exporter — System Overview | CPU, memory, swap, load across all hosts |
| Node Exporter — Disk & Filesystem | Disk usage and IOPS per host per mount |
| Node Exporter — Network | Network throughput, errors, drops per interface |

All dashboards include a Host dropdown populated from the host label on incoming metrics. Select a host to filter all panels.
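A dropdown like this is normally a dashboard template variable. The sketch below builds the relevant fragment of a Grafana dashboard JSON model as Python dicts; the source metric (`node_uname_info`), the option values, and the `PROMETHEUS_UID` placeholder are assumptions for illustration, and the provisioned dashboards may define the variable differently.

```python
# Sketch: Grafana dashboard JSON behind a "Host" dropdown, built as Python dicts.
# ASSUMPTIONS: node_uname_info as the source metric, the option values, and the
# PROMETHEUS_UID placeholder are illustrative; the provisioned dashboards may differ.
import json

host_variable = {
    "name": "host",   # panels reference it as $host
    "label": "Host",  # text shown on the dropdown
    "type": "query",
    "datasource": {"type": "prometheus", "uid": "PROMETHEUS_UID"},  # hypothetical uid
    # label_values(metric, label) lists the distinct values of a label.
    "query": "label_values(node_uname_info, host)",
    "refresh": 1,  # re-run the query each time the dashboard loads
    "multi": False,
    "includeAll": False,
}

# A panel query filtered by the selected host; =~ keeps it valid if the
# variable is later made multi-select (Grafana then substitutes a regex).
cpu_panel_expr = (
    '100 - (avg by (host) (rate(node_cpu_seconds_total{mode="idle", host=~"$host"}[5m])) * 100)'
)

dashboard_fragment = {"templating": {"list": [host_variable]}}

if __name__ == "__main__":
    print(json.dumps(dashboard_fragment, indent=2))
```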


Alert Management

The Grafana Alerting UI connects to AlertManager at http://observability-alertmanager:9093 via the Docker network on prometheus-shared-cwiq-io. Use Grafana to:

  • View currently firing alerts
  • Create and manage silences
  • Inspect the alert routing tree

AlertManager's API is not directly accessible from browsers. Access it through Grafana or via SSH on prometheus-shared-cwiq-io:

ssh ec2-user@prometheus-shared-cwiq-io "curl -s http://localhost:9093/api/v2/alerts | python3 -m json.tool"
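The raw JSON from that command is verbose. A minimal sketch that condenses it to one line per alert; `summarize_alerts.py` is a hypothetical file name, and the `host` label is shown only when an alert carries one.

```python
# Sketch: summarize the JSON from AlertManager's /api/v2/alerts endpoint.
# summarize_alerts.py is a hypothetical file name; the "host" label is read
# only if the alert carries one.
import json
import sys


def summarize(alerts: list[dict]) -> list[str]:
    """One sorted line per alert: name, state, host and start time."""
    rows = []
    for alert in alerts:
        labels = alert.get("labels", {})
        # status.state is one of "active", "suppressed" or "unprocessed".
        state = alert.get("status", {}).get("state", "unknown")
        rows.append(
            f"{labels.get('alertname', '?'):<30} {state:<11} "
            f"{labels.get('host', '-'):<28} since {alert.get('startsAt', '?')}"
        )
    return sorted(rows)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        for row in summarize(json.load(fh)):
            print(row)
```

Usage: save the API output with `ssh ec2-user@prometheus-shared-cwiq-io "curl -s http://localhost:9093/api/v2/alerts" > alerts.json`, then run `python3 summarize_alerts.py alerts.json`. A `suppressed` state means the alert is currently silenced or inhibited.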

Configuration Variables

| Variable | Default | Description |
|---|---|---|
| grafana_version | 12.4.0 | Docker image tag |
| grafana_admin_password | (required) | Admin password; read it from Vault at secret/grafana/admin |
| grafana_data_dir | /data/grafana | Host path for data, plugins, dashboards |
| grafana_domain | grafana.shared.cwiq.io | External hostname |
| grafana_loki_url | http://10.0.15.157:3100 | Loki datasource URL |
| grafana_prometheus_url | http://10.0.15.9:9090 | Prometheus datasource URL |

Deployment

ssh ansible@ansible-shared-cwiq-io
ansible-helper
cd grafana
cp group_vars/all.yml.template group_vars/all.yml
# Set grafana_admin_password from Vault
vi group_vars/all.yml

ansible-playbook -i inventory/shared.yml deploy-grafana.yml

The playbook creates config and data directories, writes grafana.ini and nginx config from templates, and starts both containers. It waits up to 300 seconds for /api/health to return HTTP 200.
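The wait step amounts to polling with a deadline. The playbook implements it in Ansible; the following is only an equivalent illustration in Python, and the 5-second polling interval is an assumption. The status, sleep and clock functions are injectable so the loop can be exercised without a network.

```python
# Sketch: poll an endpoint until it returns HTTP 200 or a timeout expires.
# Illustration only; the 5-second interval is an assumption.
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered with 4xx/5xx
    except OSError:
        return 0  # connection refused, DNS failure, timeout, ...


def wait_for_health(url: str, timeout: float = 300, interval: float = 5,
                    get_status: Callable[[str], int] = http_status,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> bool:
    """Return True once get_status(url) is 200, or False after `timeout` seconds."""
    deadline = clock() + timeout
    while True:
        if get_status(url) == 200:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

On the Grafana host, `wait_for_health("http://localhost:3000/api/health")` blocks until Grafana is up or five minutes have passed.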


Health Check Commands

# External health via Nginx
curl https://grafana.shared.cwiq.io/api/health

# Internal Grafana health (from host)
ssh ec2-user@grafana-shared-cwiq-io "curl http://localhost:3000/api/health"

# Container status
ssh ec2-user@grafana-shared-cwiq-io "docker ps \
  --filter 'name=observability-grafana' \
  --filter 'name=observability-nginx' \
  --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'"

# Ansible healthcheck
cd grafana
ansible-playbook -i inventory/shared.yml healthcheck.yml
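The /api/health body reports the state of Grafana's database and, unless the server is configured to hide it from unauthenticated requests, the running version. A minimal sketch that turns that body into a pass/fail check; comparing against grafana_version is an extra check assumed here, not something the healthcheck playbook is documented to do.

```python
# Sketch: check a Grafana /api/health response body.
# ASSUMPTION: the version comparison is an extra check, not part of healthcheck.yml.
import json

EXPECTED_VERSION = "12.4.0"  # keep in sync with grafana_version


def check_health(body: str, expected_version: str = EXPECTED_VERSION) -> list[str]:
    """Return a list of problems; an empty list means healthy."""
    data = json.loads(body)
    problems = []
    if data.get("database") != "ok":
        problems.append(f"database={data.get('database')!r}")
    if data.get("version") != expected_version:
        problems.append(f"version={data.get('version')!r}, expected {expected_version!r}")
    return problems
```

For example, `check_health(urllib.request.urlopen("https://grafana.shared.cwiq.io/api/health").read().decode())` returns an empty list when the database is reachable and the expected version is running.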

Operational Playbooks

| Playbook | Purpose |
|---|---|
| deploy-grafana.yml | Full deployment (config + image pull + start) |
| healthcheck.yml | Check /api/health |
| restart.yml | Restart Grafana and Nginx containers |
| stop.yml | Stop both containers |