# Slack Alerting
All infrastructure alerts route to two Slack channels based on environment. AlertManager handles Prometheus metric alerts; Icinga2 handles infrastructure health check alerts. Both systems send to the same channels.
## Channel Strategy
| Channel | Environments | Sources |
|---|---|---|
| #cwiq-shared-infra-alerts | Shared Services (14 hosts) | AlertManager + Icinga2 master |
| #cwiq-dev-infra-alerts | DEV + Demo (7 hosts) | AlertManager + Icinga2 satellite |
All notifications (warning, critical, and resolved) post to the same channel, color-coded:
| Condition | Color |
|---|---|
| Critical alert firing | Red (danger) |
| Warning alert firing | Orange (warning) |
| Alert resolved | Green (good) |
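The mapping above can be expressed as a small helper; this is a sketch only (the function name and signature are illustrative, not part of the deployed templates). `good`, `warning`, and `danger` are Slack's legacy attachment color names for green, orange, and red.

```python
def slack_color(status: str, severity: str) -> str:
    """Map alert status/severity to a Slack attachment color name.

    Sketch only; the real mapping lives in the AlertManager Slack
    receiver template. Resolved alerts are always green, regardless
    of the original severity.
    """
    if status == "resolved":
        return "good"      # green
    if severity == "critical":
        return "danger"    # red
    return "warning"       # orange
```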
## Alert Routing

### Prometheus AlertManager
Environment routing is based on the environment label attached by Alloy agents:
```
Prometheus fires alert
        |
AlertManager routing tree
  ├── environment=shared → #cwiq-shared-infra-alerts
  └── environment=development|demo → #cwiq-dev-infra-alerts
        ├── severity=critical → repeat every 1h
        └── severity=warning → repeat every 4h
```
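The tree above corresponds roughly to the following `alertmanager.yml` route block. This is a sketch: the receiver names (`slack-shared`, `slack-dev`) are illustrative, the severity-based repeat intervals are assumed to apply within each environment branch, and the deployed source of truth is the `alertmanager.yml.j2` template.

```yaml
# Sketch of the AlertManager routing tree (receiver names illustrative)
route:
  receiver: slack-shared                  # fallback receiver
  routes:
    - matchers: ['environment = "shared"']
      receiver: slack-shared
      routes:
        - matchers: ['severity = "critical"']
          repeat_interval: 1h
        - matchers: ['severity = "warning"']
          repeat_interval: 4h
    - matchers: ['environment =~ "development|demo"']
      receiver: slack-dev
      routes:
        - matchers: ['severity = "critical"']
          repeat_interval: 1h
        - matchers: ['severity = "warning"']
          repeat_interval: 4h
```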
### Icinga2
Icinga2 routes based on the host's `vars.environment`:

| `vars.environment` | Slack Channel |
|---|---|
| `shared` | #cwiq-shared-infra-alerts |
| `dev` | #cwiq-dev-infra-alerts |
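This routing can be sketched as a single Icinga2 apply rule. The command and user names here are hypothetical; the real definitions live in `icinga/conf.d/notifications.conf.j2`.

```
/* Sketch only: one Slack notification user per environment,
   selected from the host's vars.environment value.
   "slack-host-notification", "slack-shared", and "slack-dev"
   are illustrative names. */
apply Notification "slack-alerts" to Host {
  command = "slack-host-notification"
  users = [ "slack-" + host.vars.environment ]
  assign where host.vars.environment in [ "shared", "dev" ]
}
```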
## Domain Separation
The two alerting systems cover complementary concerns:
| Domain | Tool | Examples |
|---|---|---|
| Infrastructure health ("is it alive?") | Icinga2 | SSH connectivity, TCP port open, SSL certificate expiry, Docker container state |
| Metric thresholds ("is it working correctly?") | AlertManager | CPU > 80%, disk filling, HTTP 5xx rate, memory exhaustion |
Do not duplicate alerts across both systems.
## AlertManager Message Format
Each Prometheus alert message includes:
| Field | Example |
|---|---|
| Alert name (links to Grafana) | HighDiskUsage |
| Host | prometheus-shared-cwiq-io |
| Mount | /data (disk alerts) or n/a |
| Severity | warning or critical |
| Environment | shared or development |
| Description | Root filesystem at 83% on prometheus-shared-cwiq-io |
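Combined with the color coding above, the resulting Slack webhook payload might look like the following sketch. The field layout uses Slack's legacy attachment format; the exact structure is defined by the AlertManager Slack receiver template, and the Grafana link is truncated here.

```json
{
  "attachments": [
    {
      "color": "warning",
      "title": "HighDiskUsage",
      "title_link": "https://grafana.shared.cwiq.io/...",
      "text": "Root filesystem at 83% on prometheus-shared-cwiq-io",
      "fields": [
        { "title": "Host", "value": "prometheus-shared-cwiq-io", "short": true },
        { "title": "Mount", "value": "/data", "short": true },
        { "title": "Severity", "value": "warning", "short": true },
        { "title": "Environment", "value": "shared", "short": true }
      ]
    }
  ]
}
```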
## Webhook Management
Slack webhook URLs are stored in Vault:
```shell
vault kv put secret/slack/webhooks \
  shared="https://hooks.slack.com/services/..." \
  dev="https://hooks.slack.com/services/..."
```
Set in `group_vars/all.yml` on the Ansible server:

```yaml
alertmanager_slack_webhook_shared: "https://hooks.slack.com/services/..."
alertmanager_slack_webhook_dev: "https://hooks.slack.com/services/..."
```
After updating webhooks, redeploy Prometheus so AlertManager picks up the new URLs.
## Testing Alerting

### Test AlertManager → Slack
```shell
# Fire a test alert (run on prometheus-shared-cwiq-io)
curl -X POST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{
    "labels": {
      "alertname": "TestAlert",
      "severity": "warning",
      "environment": "shared"
    },
    "annotations": {
      "summary": "Test alert",
      "description": "Slack integration test — safe to ignore"
    }
  }]'
```
The alert appears in #cwiq-shared-infra-alerts within 30 seconds. Use "environment": "development" to test #cwiq-dev-infra-alerts.
### Test Icinga → Slack
Run a forced check on a known host from the IcingaWeb2 UI (https://icinga.shared.cwiq.io) and verify the notification appears in the channel.
## Adding Alerting for a New Environment
- Create Slack channel `#cwiq-{env}-infra-alerts` and configure an Incoming Webhook.
- Store the webhook URL in Vault: `vault kv patch secret/slack/webhooks {env}="https://..."`
- Add an AlertManager receiver in `prometheus/roles/deploy_prometheus/templates/alertmanager.yml.j2`.
- Add an Icinga notification user in `icinga/conf.d/notifications.conf.j2`.
- Create an Alloy inventory file at `alloy/inventory/{env}.yml`.
- Update documentation: `SLACK_ALERTING.md`, `prometheus/docs/ALERTING.md`, `icinga/README.md`, `alloy/docs/README.md`.
## Silencing Alerts
To silence a flapping or known-maintenance alert:
- Open https://grafana.shared.cwiq.io → Alerting → Silences
- Create a silence matching `{alertname="...", host="..."}` with a start/end time
- Alternatively, use the AlertManager API:
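A silence can be created directly against AlertManager's v2 silences endpoint (run on the AlertManager host). The matcher values, timestamps, and author below are illustrative only; adjust them to the alert you want to mute.

```
# Create a silence via the AlertManager v2 API (times are UTC, RFC 3339)
curl -X POST http://localhost:9093/api/v2/silences \
  -H 'Content-Type: application/json' \
  -d '{
    "matchers": [
      {"name": "alertname", "value": "TestAlert", "isRegex": false},
      {"name": "host", "value": "prometheus-shared-cwiq-io", "isRegex": false}
    ],
    "startsAt": "2025-01-01T00:00:00Z",
    "endsAt": "2025-01-01T02:00:00Z",
    "createdBy": "ops",
    "comment": "Known maintenance window"
  }'
```

The response contains the new silence ID, which can later be expired early with `DELETE /api/v2/silence/{id}`.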
## Related Documentation
- Monitoring Overview
- Prometheus & AlertManager
- Icinga Health Checks
- Adding Monitoring for New Infrastructure
- Source: `ansible-playbooks/docs/SLACK_ALERTING.md`