Operations & Emergency
This page covers day-to-day Vault operations (health checks, audit logs, secret rotation, RDS backups) and emergency procedures for a sealed Vault, lost or compromised tokens, and compromised secrets.
Health Checks
# Full status check
curl -s https://vault.shared.cwiq.io/v1/sys/health | jq
# HTTP response codes:
# 200 — Initialized, unsealed, active (normal)
# 429 — Unsealed, standby
# 501 — Not initialized
# 503 — Sealed
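The response codes above can be turned into a scriptable check. The following is a minimal sketch; the function name `vault_health_state` is hypothetical, and it only covers the four codes listed on this page.

```shell
# Map a /v1/sys/health HTTP status code to a short state name.
# Only the codes documented above are handled; anything else is "unknown".
vault_health_state() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown" ;;
  esac
}

# Usage (assumes curl is installed):
#   code=$(curl -s -o /dev/null -w '%{http_code}' https://vault.shared.cwiq.io/v1/sys/health)
#   vault_health_state "$code"
```

Because `curl -s` alone does not print the status code, the usage line asks curl for it explicitly with `-w '%{http_code}'`.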
# Container status on vault-shared-cwiq-io
ssh ec2-user@vault-shared-cwiq-io \
"sudo -u vault docker compose -f /data/vault/docker-compose.yml ps"
# Live container logs
ssh ec2-user@vault-shared-cwiq-io \
"sudo -u vault docker compose -f /data/vault/docker-compose.yml logs -f"
Audit Logs
All Vault operations are recorded in /data/vault/logs/vault-audit.log, one JSON object per line.
ssh ec2-user@vault-shared-cwiq-io
# Live log stream
sudo tail -f /data/vault/logs/vault-audit.log | jq
# Search for access to a specific path
sudo grep "secret/data/cwiq" /data/vault/logs/vault-audit.log | jq
# View last 100 entries
sudo tail -100 /data/vault/logs/vault-audit.log | jq
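Each entry includes the timestamp, the requesting identity, the operation (read, write, or delete), and the secret path. Secret values are never written to the audit log in plaintext; by default Vault records only an HMAC-SHA256 hash of sensitive strings.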
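To pull only one kind of operation out of the log, a jq filter along these lines may help. This is a sketch, not part of the playbooks: the function name `audit_by_operation` is hypothetical, and it assumes the standard Vault audit entry fields (`type`, `time`, `auth.display_name`, `request.operation`, `request.path`). Note that Vault records writes as `create` or `update`.

```shell
# Print time, identity, and path for audit entries matching one operation.
# Reads audit log lines on stdin; assumes jq is installed.
# Only "request" entries are selected so each call is listed once.
audit_by_operation() {
  jq -r --arg op "$1" '
    select(.type == "request" and .request.operation == $op)
    | [.time, (.auth.display_name // "-"), .request.path]
    | @tsv'
}

# Usage on vault-shared-cwiq-io:
#   sudo cat /data/vault/logs/vault-audit.log | audit_by_operation delete
```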
Secret Rotation
Rotate a Secret
# Create a new version (KV v2 — previous version is retained)
vault kv put secret/cwiq/shared/<app>/database \
pg_password="<new-password>" \
pg_host="<same-host>" \
pg_user="<same-user>"
# Verify new version
vault kv metadata get secret/cwiq/shared/<app>/database
Rotation Workflow
- Update the secret in Vault (creates a new version).
- Vault Agent auto-refreshes within the vault_agent_render_interval (default: 5 minutes).
- If the application does not pick up new env vars at runtime, restart the application container.
- Verify the application is using the new credential.
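Before waiting on Vault Agent, it can be worth confirming that the first step really created a new version. A minimal sketch, assuming jq is installed and the Vault CLI is already authenticated; the helper name `kv_current_version` is hypothetical.

```shell
# Print the current version number of a KV v2 secret.
# Uses the JSON output of `vault kv metadata get`.
kv_current_version() {
  vault kv metadata get -format=json "$1" | jq -r '.data.current_version'
}

# Usage:
#   before=$(kv_current_version "secret/cwiq/shared/<app>/database")
#   ... run the `vault kv put` from the first step ...
#   after=$(kv_current_version "secret/cwiq/shared/<app>/database")
#   [ "$after" -gt "$before" ] && echo "rotated to version $after"
```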
RDS Backups
Vault's storage is RDS PostgreSQL (vault-shared-storage). Automated backups are enabled.
# List RDS snapshots
aws rds describe-db-snapshots \
--profile shared-services \
--db-instance-identifier vault-shared-storage \
--query 'DBSnapshots[*].[DBSnapshotIdentifier,SnapshotCreateTime]' \
--output table
# Create a manual snapshot before any risky operation
aws rds create-db-snapshot \
--profile shared-services \
--db-instance-identifier vault-shared-storage \
--db-snapshot-identifier vault-manual-$(date +%Y%m%d)
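The manual snapshot command returns before the snapshot is usable, and a date-only identifier collides if a second snapshot is taken on the same day. The sketch below adds a time suffix and blocks until the snapshot is available; the function name `snapshot_and_wait` and the timestamp format are assumptions, not part of the playbooks.

```shell
# Create a manual snapshot with a unique identifier and wait until it is available.
# Prints the snapshot identifier on success.
snapshot_and_wait() {
  snap_id="vault-manual-$(date +%Y%m%d-%H%M%S)"
  aws rds create-db-snapshot \
    --profile shared-services \
    --db-instance-identifier vault-shared-storage \
    --db-snapshot-identifier "$snap_id" >/dev/null || return 1
  # Polls RDS until the snapshot reaches the "available" state.
  aws rds wait db-snapshot-available \
    --profile shared-services \
    --db-snapshot-identifier "$snap_id" || return 1
  echo "$snap_id"
}

# Usage: snapshot_and_wait && echo "safe to proceed"
```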
Emergency: Vault is Sealed
With AWS KMS auto-unseal, Vault unseals automatically on restart. If Vault stays sealed, work through the following steps:
Step 1: Check the seal status and KMS access
If KMS errors appear, the EC2 IAM role may have lost kms:Decrypt permission, or the KMS key has been disabled.
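One way to run this check, sketched with the host and compose path used elsewhere on this page. The function names are hypothetical, jq is assumed to be installed, and the 200-line log window is an arbitrary choice.

```shell
# Prints "true" if Vault reports itself sealed, "false" otherwise.
# /v1/sys/seal-status does not require a token.
vault_is_sealed() {
  curl -s https://vault.shared.cwiq.io/v1/sys/seal-status | jq -r '.sealed'
}

# Show recent Vault container log lines that mention KMS.
vault_kms_log_lines() {
  ssh ec2-user@vault-shared-cwiq-io \
    "sudo -u vault docker compose -f /data/vault/docker-compose.yml logs --tail 200 2>&1" |
    grep -i 'kms'
}

# Usage:
#   vault_is_sealed
#   vault_kms_log_lines
```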
Step 2: Verify KMS connectivity
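A possible check, run on the Vault host so that it uses the EC2 instance role. This is a sketch under assumptions: `<kms-key-id>` is a placeholder for the key configured in Vault's awskms seal, the helper name `kms_key_state` is hypothetical, and the role must be allowed to call kms:DescribeKey. It confirms that KMS is reachable and the key is enabled; it does not by itself prove kms:Decrypt is granted.

```shell
# Print the state of a KMS key, e.g. "Enabled" or "Disabled".
kms_key_state() {
  aws kms describe-key \
    --key-id "$1" \
    --query 'KeyMetadata.KeyState' \
    --output text
}

# Usage (on vault-shared-cwiq-io):
#   kms_key_state "<kms-key-id>"
```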
Step 3: Restart the Vault container
Vault should unseal within 30 seconds after restart if KMS is reachable.
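The restart and the 30-second wait can be sketched as follows, reusing the compose file path from the health check section. The function names and the 2-second polling interval are assumptions.

```shell
# Restart the Vault container via docker compose on the Vault host.
restart_vault() {
  ssh ec2-user@vault-shared-cwiq-io \
    "sudo -u vault docker compose -f /data/vault/docker-compose.yml restart"
}

# Poll the health endpoint for about 30 seconds.
# Succeeds on 200 (active) or 429 (standby), both of which mean unsealed.
wait_for_unseal() {
  attempts=0
  while [ "$attempts" -lt 15 ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' https://vault.shared.cwiq.io/v1/sys/health)
    case "$code" in
      200|429) echo "unsealed (HTTP $code)"; return 0 ;;
    esac
    attempts=$((attempts + 1))
    sleep 2
  done
  echo "still sealed or unreachable (last HTTP $code)" >&2
  return 1
}

# Usage: restart_vault && wait_for_unseal
```

If `wait_for_unseal` fails, return to Step 1 and re-check the container logs for KMS errors.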
Emergency: Revoke a Compromised Token
# Revoke a specific token
vault token revoke <token>
# Revoke by accessor (if token value is unknown)
vault token revoke -accessor <accessor>
# List all active token accessors (requires root)
vault list auth/token/accessors
# Revoke all AppRole tokens (emergency — breaks all sidecars)
vault token revoke -mode=path auth/approle/
After revoking AppRole tokens, the affected application sidecars will stop working. Redeploy the service after rotating the secret-id:
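Generating a fresh secret-id might look like the sketch below. `<role-name>` is a placeholder and the helper name `new_secret_id` is hypothetical; how the new secret-id reaches the sidecar depends on the deployment tooling and is not shown here.

```shell
# Generate a new secret-id for an AppRole role and print only its value.
new_secret_id() {
  vault write -f -field=secret_id "auth/approle/role/$1/secret-id"
}

# Usage:
#   new_secret_id "<role-name>"
```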
Emergency: Regenerate Root Token
If the root token is lost or compromised, have the recovery key holders generate a new one:
# Start the recovery process
vault operator generate-root -init
# Returns a Nonce and OTP — save both
# Each recovery key holder runs (requires recovery key threshold)
vault operator generate-root -nonce=<nonce>
# Enter recovery key when prompted
# After the threshold is reached, decode the encoded token:
vault operator generate-root -decode=<encoded-token> -otp=<otp>
Revoke the root token after use
Once the root token has been used to complete the emergency task, revoke it immediately:
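A minimal sketch of the revocation, assuming the shell is still authenticated with the root token (for example via VAULT_TOKEN). The function name is hypothetical; the follow-up lookup is an extra check that the token no longer works.

```shell
# Revoke the token the CLI is currently using, then confirm it is no longer valid.
revoke_root_and_verify() {
  vault token revoke -self || return 1
  # A successful lookup here would mean the token is still usable.
  if vault token lookup >/dev/null 2>&1; then
    echo "token still valid" >&2
    return 1
  fi
  echo "root token revoked"
}

# Usage: revoke_root_and_verify
```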
Emergency: Compromised Secret
If a secret (password, API token, etc.) may have been exposed:
- Immediately rotate the secret in the upstream system (e.g., reset the database password, revoke the API token).
- Update Vault with the new value:
- Restart affected application containers to force Vault Agent to re-render.
- Check the audit log to determine when and by whom the secret was accessed:
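Updating Vault uses the same `vault kv put` command shown under Rotate a Secret. For the audit log step, a filter like the one below can list when and by whom a path was accessed. It is a sketch: the function name `secret_access_history` is hypothetical, jq is assumed to be installed, and the field names follow the standard Vault audit entry layout.

```shell
# List time, identity, operation, and source address for every access to one path.
# Reads audit log lines on stdin; "response" entries are used so each call appears once.
secret_access_history() {
  jq -r --arg path "$1" '
    select(.type == "response" and .request.path == $path)
    | [.time, (.auth.display_name // "-"), .request.operation, (.request.remote_address // "-")]
    | @tsv'
}

# Usage on vault-shared-cwiq-io (KV v2 paths contain "data/"):
#   sudo cat /data/vault/logs/vault-audit.log \
#     | secret_access_history "secret/data/cwiq/shared/<app>/database"
```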
RDS Point-in-Time Recovery
If Vault's storage backend needs to be restored:
- Go to RDS → Databases → vault-shared-storage in the AWS Console (shared-services account).
- Actions → Restore to point in time.
- Select the recovery time point.
- Launch the recovery instance.
- Update Vault's configuration to point to the new RDS endpoint:
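The new endpoint can be looked up from the CLI before editing Vault's configuration. In this sketch, `<restored-instance-id>` is a placeholder for whatever identifier was chosen during the restore, and the helper name `rds_endpoint` is hypothetical. The returned hostname would replace the host in the `connection_url` of Vault's PostgreSQL storage stanza; where that file lives in this deployment is not covered here.

```shell
# Print the endpoint hostname of an RDS instance.
rds_endpoint() {
  aws rds describe-db-instances \
    --profile shared-services \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].Endpoint.Address' \
    --output text
}

# Usage:
#   rds_endpoint "<restored-instance-id>"
```

After changing the configuration, restart the Vault container so it reconnects to the restored database.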
Prometheus Metrics
Vault exposes metrics at https://vault.shared.cwiq.io/v1/sys/metrics?format=prometheus.
| Metric | Description |
|---|---|
| vault_core_unsealed | 1 if unsealed, 0 if sealed |
| vault_token_count | Number of active tokens |
| vault_secret_kv_count | Number of secrets in KV store |
| vault_runtime_alloc_bytes | Memory allocation |
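For a quick manual look at one of these metrics, the endpoint can be queried with curl. This is a sketch: the function name `vault_metric` is hypothetical, and it assumes VAULT_TOKEN holds a token allowed to read sys/metrics (the header can be dropped if unauthenticated metrics access is enabled on the listener).

```shell
# Print the sample lines for one metric from the Prometheus endpoint.
# HELP and TYPE comment lines start with "#" and are therefore not matched.
vault_metric() {
  curl -s -H "X-Vault-Token: $VAULT_TOKEN" \
    "https://vault.shared.cwiq.io/v1/sys/metrics?format=prometheus" |
    grep "^$1"
}

# Usage:
#   vault_metric vault_core_unsealed
```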
Related Documentation
- Vault Architecture
- Secret Paths Reference
- AppRole & JWT Auth
- Vault Agent Sidecar
- Source: ansible-playbooks/vault-server/docs/04-maintenance.md
- Source: ansible-playbooks/vault-server/docs/05-emergency-procedures.md
- Source: ansible-playbooks/vault-server/docs/06-monitoring.md