Grafana Dashboards
Grafana 12.4.0 is the primary UI for the CWIQ observability stack, providing log exploration via Loki and metrics dashboards via Prometheus, behind Nginx SSL termination with optional Authentik SSO.
Overview
| Property | Value |
|---|---|
| Version | 12.4.0 |
| Host | grafana-shared-cwiq-io (VPC 10.0.15.178) |
| External URL | https://grafana.shared.cwiq.io |
| Grafana container | observability-grafana (port 3000 on localhost) |
| Nginx container | observability-nginx (ports 80/443) |
| Datasources | Loki (default), Prometheus (pre-provisioned) |
| SSO | Authentik OIDC (optional, separate playbook) |
| Playbook | grafana/deploy-grafana.yml |
Pre-Provisioned Datasources
Both datasources are provisioned at startup. No manual configuration is needed.
| Name | Type | URL | Default |
|---|---|---|---|
| Loki | Loki | http://10.0.15.157:3100 | Yes |
| Prometheus | Prometheus | http://10.0.15.9:9090 | No |
URLs use VPC private IPs because Loki and Prometheus run on different hosts. Docker container DNS only resolves within the same host's Docker network.
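Because both datasources live on other hosts, a reachability check from the Grafana host can rule out network problems before debugging Grafana itself. The sketch below assumes the stock readiness endpoints (`/ready` on Loki, `/-/ready` on Prometheus); `ready_url` and `check_ready` are hypothetical helper names, not part of the playbook.

```shell
# Map a datasource type to its readiness URL.
# Assumes the stock endpoints: Loki serves /ready, Prometheus serves /-/ready.
# $1 = datasource type (loki|prometheus), $2 = base URL (trailing slash tolerated)
ready_url() {
  case "$1" in
    loki)       printf '%s/ready\n' "${2%/}" ;;
    prometheus) printf '%s/-/ready\n' "${2%/}" ;;
    *)          echo "unknown datasource type: $1" >&2; return 1 ;;
  esac
}

# Print the HTTP status code returned by the readiness endpoint,
# giving up after 5 seconds.
check_ready() {
  url=$(ready_url "$1" "$2") || return 1
  curl -s -o /dev/null -m 5 -w '%{http_code}\n' "$url"
}
```

Run on grafana-shared-cwiq-io, `check_ready loki http://10.0.15.157:3100` and `check_ready prometheus http://10.0.15.9:9090` should each print `200` when the backend is ready and reachable.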
Using Grafana
Log Exploration (Loki)
- Open https://grafana.shared.cwiq.io
- Click Explore (compass icon) in the left sidebar
- Select Loki from the datasource dropdown
- Enter a LogQL query (see Loki for the full LogQL reference)
Common queries:
# All logs from a specific server
{host="orchestrator-dev-cwiq-io"}
# Error logs from all orchestrator containers
{job="docker", compose_project="orchestrator"} |= "error"
# Systemd journal for the alloy service
{job="journal"} |= "alloy"
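The same LogQL queries can also be run without the UI through Loki's HTTP API (`/loki/api/v1/query_range`). This is a minimal sketch, assuming the caller can reach the Loki address from the datasource table; the helper names are hypothetical.

```shell
# Build the query_range endpoint for a Loki base URL (trailing slash tolerated).
loki_query_range_url() {
  printf '%s/loki/api/v1/query_range\n' "${1%/}"
}

# Run a LogQL query and print the raw JSON response.
# $1 = Loki base URL, $2 = LogQL query, $3 = max entries (default 20)
# curl -G sends the url-encoded fields as query-string parameters.
loki_query() {
  curl -sG "$(loki_query_range_url "$1")" \
    --data-urlencode "query=$2" \
    --data-urlencode "limit=${3:-20}"
}
```

For example, `loki_query http://10.0.15.157:3100 '{host="orchestrator-dev-cwiq-io"} |= "error"' | python3 -m json.tool`. With no `start` or `end` parameter, Loki searches the last hour.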
Metrics Exploration (Prometheus)
- Click Explore
- Select Prometheus from the datasource dropdown
- Enter a PromQL query
Common queries:
# CPU usage per host
100 - (avg by (host) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory used
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
# All currently firing alerts
ALERTS{alertstate="firing"}
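The CPU query works by inversion: `rate(node_cpu_seconds_total{mode="idle"}[5m])` yields the fraction of each second a core spent idle, `avg by (host)` averages that across cores, and subtracting the result (as a percentage) from 100 gives the busy percentage. The arithmetic can be reproduced with made-up per-core values; `cpu_usage_pct` is a hypothetical helper for illustration only.

```shell
# Reproduce the arithmetic of the CPU query for one host.
# Arguments are per-core idle fractions (illustrative values, not real data).
cpu_usage_pct() {
  printf '%s\n' "$@" |
    awk '{ sum += $1 } END { printf "%.1f\n", 100 - (sum / NR) * 100 }'
}

# Four cores idle 90%, 80%, 70% and 60% of the time: average idle is 0.75.
cpu_usage_pct 0.90 0.80 0.70 0.60   # prints 25.0
```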
Available Dashboards
Three Node Exporter dashboards are pre-provisioned:
| Dashboard | Shows |
|---|---|
| Node Exporter — System Overview | CPU, memory, swap, load across all hosts |
| Node Exporter — Disk & Filesystem | Disk usage and IOPS per host per mount |
| Node Exporter — Network | Network throughput, errors, drops per interface |
All dashboards include a Host dropdown populated from the host label on incoming metrics. Select a host to filter all panels.
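If a host is missing from the dropdown, it helps to check which `host` label values Prometheus actually holds. The sketch below uses the Prometheus label-values API (`/api/v1/label/<name>/values`); how the dashboards query the label internally is not shown here, and the helper names are hypothetical.

```shell
# Build the Prometheus label-values endpoint for a given label.
# $1 = Prometheus base URL (trailing slash tolerated), $2 = label name
prom_label_values_url() {
  printf '%s/api/v1/label/%s/values\n' "${1%/}" "$2"
}

# Print the JSON list of every value of the host label known to Prometheus.
# $1 = Prometheus base URL
list_hosts() {
  curl -s "$(prom_label_values_url "$1" host)"
}
```

For example, `list_hosts http://10.0.15.9:9090 | python3 -m json.tool`.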
Alert Management
The Grafana Alerting UI connects to AlertManager at http://observability-alertmanager:9093 via the Docker network on prometheus-shared-cwiq-io. Use Grafana to:
- View currently firing alerts
- Create and manage silences
- Inspect the alert routing tree
AlertManager's API is not directly accessible from browsers. Access it through Grafana or via SSH on prometheus-shared-cwiq-io:
ssh ec2-user@prometheus-shared-cwiq-io "curl -s http://localhost:9093/api/v2/alerts | python3 -m json.tool"
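The raw alert list includes silenced and inhibited alerts. The AlertManager v2 API accepts `active`, `silenced`, and `inhibited` query parameters, so the SSH command can be narrowed to alerts that are actually notifying. A sketch, with hypothetical helper names:

```shell
# Build an AlertManager v2 alerts URL that hides silenced and inhibited alerts.
# $1 = AlertManager base URL (trailing slash tolerated)
am_active_alerts_url() {
  printf '%s/api/v2/alerts?active=true&silenced=false&inhibited=false\n' "${1%/}"
}

# Fetch the filtered list over SSH and pretty-print it.
# The URL is expanded locally; the single quotes protect the "&" on the remote shell.
am_active_alerts() {
  ssh ec2-user@prometheus-shared-cwiq-io \
    "curl -s '$(am_active_alerts_url http://localhost:9093)' | python3 -m json.tool"
}
```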
Configuration Variables
| Variable | Default | Description |
|---|---|---|
| grafana_version | 12.4.0 | Docker image tag |
| grafana_admin_password | (required) | Admin password; get from Vault secret/grafana/admin |
| grafana_data_dir | /data/grafana | Host path for data, plugins, dashboards |
| grafana_domain | grafana.shared.cwiq.io | External hostname |
| grafana_loki_url | http://10.0.15.157:3100 | Loki datasource URL |
| grafana_prometheus_url | http://10.0.15.9:9090 | Prometheus datasource URL |
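Since `grafana_admin_password` is the only variable without a default, a pre-flight check before running the playbook can catch a vars file that was copied from the template but never filled in. This sketch assumes a flat `key: value` YAML layout in the vars file; `has_admin_password` is a hypothetical helper, and it only detects a missing or empty value.

```shell
# Succeed only if the vars file sets a non-empty grafana_admin_password.
# Assumes the key sits at the start of a line in "key: value" form.
# $1 = path to the vars file
has_admin_password() {
  # Print whatever follows the key, then strip quotes, spaces and tabs.
  value=$(sed -n 's/^grafana_admin_password:[[:space:]]*//p' "$1" | tr -d "\"' \t")
  [ -n "$value" ]
}
```

For example, `has_admin_password group_vars/all.yml || echo "set grafana_admin_password first" >&2`.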
Deployment
ssh ansible@ansible-shared-cwiq-io
ansible-helper
cd grafana
cp group_vars/all.yml.template group_vars/all.yml
# Set grafana_admin_password from Vault
vi group_vars/all.yml
ansible-playbook -i inventory/shared.yml deploy-grafana.yml
The playbook creates config and data directories, writes grafana.ini and nginx config from templates, and starts both containers. It waits up to 300 seconds for /api/health to return HTTP 200.
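The wait step can be reproduced by hand when troubleshooting a slow start. The following is a sketch of the same behaviour (poll `/api/health` until it returns HTTP 200 or 300 seconds pass), not the playbook's actual task; the helper names and the 5-second polling interval are assumptions.

```shell
# Retry a command until it succeeds or the timeout elapses.
# $1 = timeout in seconds, $2 = delay between attempts, remaining args = command
# Only time spent sleeping is counted, not the run time of the command itself.
wait_until() {
  timeout_s=$1; delay_s=$2; shift 2
  elapsed=0
  until "$@"; do
    [ "$elapsed" -ge "$timeout_s" ] && return 1
    sleep "$delay_s"
    elapsed=$((elapsed + delay_s))
  done
}

# Succeed only when /api/health answers with HTTP 200.
# $1 = Grafana base URL
grafana_healthy() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' "$1/api/health")" = "200" ]
}
```

For example, `wait_until 300 5 grafana_healthy https://grafana.shared.cwiq.io && echo "Grafana is up"`.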
Health Check Commands
# External health via Nginx
curl https://grafana.shared.cwiq.io/api/health
# Internal Grafana health (from host)
ssh ec2-user@grafana-shared-cwiq-io "curl http://localhost:3000/api/health"
# Container status
ssh ec2-user@grafana-shared-cwiq-io "docker ps \
--filter 'name=observability-grafana' \
--filter 'name=observability-nginx' \
--format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'"
# Ansible healthcheck
cd grafana
ansible-playbook -i inventory/shared.yml healthcheck.yml
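For scripting, the health response can be reduced to an exit code. Grafana's `/api/health` returns a small JSON object with `commit`, `database`, and `version` fields; the sketch below checks only the `database` field, and `health_db_ok` is a hypothetical helper name.

```shell
# Read a /api/health response on stdin.
# Succeed only if the "database" field has the value "ok".
health_db_ok() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}
```

For example, `curl -s https://grafana.shared.cwiq.io/api/health | health_db_ok && echo healthy`.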
Operational Playbooks
| Playbook | Purpose |
|---|---|
| deploy-grafana.yml | Full deployment (config + image pull + start) |
| healthcheck.yml | Check /api/health |
| restart.yml | Restart Grafana and Nginx containers |
| stop.yml | Stop both containers |
Related Documentation
- Monitoring Overview
- Loki Log Aggregation
- Prometheus & AlertManager
- Source: ansible-playbooks/grafana/docs/README.md