# Slack Alerting
All infrastructure alerts route to two Slack channels based on environment. AlertManager handles Prometheus metric alerts; Icinga2 handles infrastructure health check alerts. Both systems send to the same channels.
## Channel Strategy
| Channel | Environments | Sources |
|---|---|---|
| #cwiq-shared-infra-alerts | Shared Services (14 hosts) | AlertManager + Icinga2 master |
| #cwiq-dev-infra-alerts | DEV + Demo (7 hosts) | AlertManager + Icinga2 satellite |
All notifications (warning, critical, and resolved) post to the same channel, color-coded:
| Condition | Color |
|---|---|
| Critical alert firing | Red (danger) |
| Warning alert firing | Orange (warning) |
| Alert resolved | Green (good) |
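The mapping above can be expressed as a small helper; this is a sketch only (the function name and signature are illustrative, not part of the deployed templates). `good`, `warning`, and `danger` are Slack's legacy attachment color names for green, orange, and red.

```python
def slack_color(status: str, severity: str) -> str:
    """Map alert status/severity to a Slack attachment color name.

    Sketch only; the real mapping lives in the AlertManager Slack
    receiver template. Resolved alerts are always green, regardless
    of the original severity.
    """
    if status == "resolved":
        return "good"      # green
    if severity == "critical":
        return "danger"    # red
    return "warning"       # orange
```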
## Alert Routing

### Prometheus AlertManager
Environment routing is based on the environment label attached by Alloy agents:
```
Prometheus fires alert
        |
AlertManager routing tree
  ├── environment=shared → #cwiq-shared-infra-alerts
  └── environment=development|demo → #cwiq-dev-infra-alerts
        ├── severity=critical → repeat every 1h
        └── severity=warning → repeat every 4h
```
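The tree above corresponds roughly to the following `alertmanager.yml` route block. This is a sketch: the receiver names (`slack-shared`, `slack-dev`) are illustrative, the severity-based repeat intervals are assumed to apply within each environment branch, and the deployed source of truth is the `alertmanager.yml.j2` template.

```yaml
# Sketch of the AlertManager routing tree (receiver names illustrative)
route:
  receiver: slack-shared                  # fallback receiver
  routes:
    - matchers: ['environment = "shared"']
      receiver: slack-shared
      routes:
        - matchers: ['severity = "critical"']
          repeat_interval: 1h
        - matchers: ['severity = "warning"']
          repeat_interval: 4h
    - matchers: ['environment =~ "development|demo"']
      receiver: slack-dev
      routes:
        - matchers: ['severity = "critical"']
          repeat_interval: 1h
        - matchers: ['severity = "warning"']
          repeat_interval: 4h
```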
### Icinga2
Icinga2 routes based on the host's `vars.environment`:

| `vars.environment` | Slack Channel |
|---|---|
| `shared` | #cwiq-shared-infra-alerts |
| `dev` | #cwiq-dev-infra-alerts |
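This routing can be sketched as a single Icinga2 apply rule. The command and user names here are hypothetical; the real definitions live in `icinga/conf.d/notifications.conf.j2`.

```
/* Sketch only: one Slack notification user per environment,
   selected from the host's vars.environment value.
   "slack-host-notification", "slack-shared", and "slack-dev"
   are illustrative names. */
apply Notification "slack-alerts" to Host {
  command = "slack-host-notification"
  users = [ "slack-" + host.vars.environment ]
  assign where host.vars.environment in [ "shared", "dev" ]
}
```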
## Domain Separation
The two alerting systems cover complementary concerns:
| Domain | Tool | Examples |
|---|---|---|
| Infrastructure health ("is it alive?") | Icinga2 | SSH connectivity, TCP port open, SSL certificate expiry, Docker container state |
| Metric thresholds ("is it working correctly?") | AlertManager | CPU > 80%, disk filling, HTTP 5xx rate, memory exhaustion |
Do not duplicate alerts across both systems.
## AlertManager Message Format
Each Prometheus alert message includes:
| Field | Example |
|---|---|
| Alert name (links to Grafana) | HighDiskUsage |
| Host | prometheus-shared-cwiq-io |
| Mount | /data (disk alerts) or n/a |
| Severity | warning or critical |
| Environment | shared or development |
| Description | Root filesystem at 83% on prometheus-shared-cwiq-io |
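Combined with the color coding above, the resulting Slack webhook payload might look like the following sketch. The field layout uses Slack's legacy attachment format; the exact structure is defined by the AlertManager Slack receiver template, and the Grafana link is truncated here.

```json
{
  "attachments": [
    {
      "color": "warning",
      "title": "HighDiskUsage",
      "title_link": "https://grafana.shared.cwiq.io/...",
      "text": "Root filesystem at 83% on prometheus-shared-cwiq-io",
      "fields": [
        { "title": "Host", "value": "prometheus-shared-cwiq-io", "short": true },
        { "title": "Mount", "value": "/data", "short": true },
        { "title": "Severity", "value": "warning", "short": true },
        { "title": "Environment", "value": "shared", "short": true }
      ]
    }
  ]
}
```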
## Webhook Management
Slack webhook URLs are stored in Vault:
```shell
vault kv put secret/slack/webhooks \
  shared="https://hooks.slack.com/services/..." \
  dev="https://hooks.slack.com/services/..."
```
Set in `group_vars/all.yml` on the Ansible server:

```yaml
alertmanager_slack_webhook_shared: "https://hooks.slack.com/services/..."
alertmanager_slack_webhook_dev: "https://hooks.slack.com/services/..."
```
After updating webhooks, redeploy Prometheus so AlertManager picks up the new URLs.
## Testing Alerting

### Test AlertManager → Slack
```shell
# Fire a test alert (run on prometheus-shared-cwiq-io)
curl -X POST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{
    "labels": {
      "alertname": "TestAlert",
      "severity": "warning",
      "environment": "shared"
    },
    "annotations": {
      "summary": "Test alert",
      "description": "Slack integration test — safe to ignore"
    }
  }]'
```
The alert appears in #cwiq-shared-infra-alerts within 30 seconds. Use "environment": "development" to test #cwiq-dev-infra-alerts.
### Test Icinga → Slack
Run a forced check on a known host from the IcingaWeb2 UI (https://icinga.shared.cwiq.io) and verify the notification appears in the channel.
## Adding Alerting for a New Environment
- Create Slack channel `#cwiq-{env}-infra-alerts` and configure an Incoming Webhook.
- Store the webhook URL in Vault: `vault kv patch secret/slack/webhooks {env}="https://..."`
- Add an AlertManager receiver in `prometheus/roles/deploy_prometheus/templates/alertmanager.yml.j2`.
- Add an Icinga notification user in `icinga/conf.d/notifications.conf.j2`.
- Create an Alloy inventory file at `alloy/inventory/{env}.yml`.
- Update documentation: `SLACK_ALERTING.md`, `prometheus/docs/ALERTING.md`, `icinga/README.md`, `alloy/docs/README.md`.
## Silencing Alerts
To silence a flapping or known-maintenance alert:
- Open https://grafana.shared.cwiq.io → Alerting → Silences
- Create a silence matching `{alertname="...", host="..."}` with a start/end time
- Alternatively, use the AlertManager API:
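A silence can be created directly against AlertManager's v2 silences endpoint (run on the AlertManager host). The matcher values, timestamps, and author below are illustrative only; adjust them to the alert you want to mute.

```
# Create a silence via the AlertManager v2 API (times are UTC, RFC 3339)
curl -X POST http://localhost:9093/api/v2/silences \
  -H 'Content-Type: application/json' \
  -d '{
    "matchers": [
      {"name": "alertname", "value": "TestAlert", "isRegex": false},
      {"name": "host", "value": "prometheus-shared-cwiq-io", "isRegex": false}
    ],
    "startsAt": "2025-01-01T00:00:00Z",
    "endsAt": "2025-01-01T02:00:00Z",
    "createdBy": "ops",
    "comment": "Known maintenance window"
  }'
```

The response contains the new silence ID, which can later be expired early with `DELETE /api/v2/silence/{id}`.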
## Related Documentation
- Monitoring Overview
- Prometheus & AlertManager
- Icinga Health Checks
- Adding Monitoring for New Infrastructure
- Source: `ansible-playbooks/docs/SLACK_ALERTING.md`