Adding a New Server¶
End-to-end checklist for provisioning a new EC2 instance: Terraform, Ansible, SSL, observability, and the mandatory documentation co-change. Every step is required — partial deployments leave the server unmonitored or unlisted.
Overview¶
Adding a server involves six systems that must all be updated in the same commit or MR:
- Terraform — EC2 instance, security groups, Route53 DNS
- Ansible — Service configuration and application deployment
- SSL — Let's Encrypt certificate via cert-server
- Alloy — Log and metric collection
- Icinga — Health checks
- Documentation — Monitoring docs co-change (mandatory)
Step 1: Create the Terraform EC2 Module¶
Run the script from the terraform-plan/ root:
cd terraform-plan
./scripts/new-app.sh <app-name> [environment] [description]
# Examples:
./scripts/new-app.sh langfuse dev "LangFuse LLM Observability"
./scripts/new-app.sh sonarqube shared-services "SonarQube Code Quality"
The script creates organization/environments/{env}/ec2-instances/{app-name}/ with template files. Then customise:
Edit variables.tf¶
| Variable | Required Value |
|---|---|
| `ami_id` | `ami-02169c46e1cfcd5e7` (AlmaLinux 9.7 — default for all new instances) |
| `instance_type` | Size to workload. See the Terraform Patterns cost table |
| `data_volume_size` | GiB for `/data` (application data) |
| `create_data_volume2` | `true` if a separate `/var/lib/containerd` volume is needed |
Default OS: AlmaLinux 9
Copy the `ami_id` from an existing module rather than looking it up in the console.
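Since the AMI ID already appears in every existing module, a quick grep from the terraform-plan/ root finds it (the glob below assumes the directory layout created by `new-app.sh`):

```shell
# List the AMI IDs already in use across existing modules
grep -rh 'ami-' organization/environments/*/ec2-instances/*/variables.tf | sort -u
```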
Add route53.tf (if the server needs DNS)¶
Create route53.tf with records for both the public zone and the private/internal zone:
resource "aws_route53_record" "app_public" {
  zone_id = var.route53_zone_id
  name    = "<app-name>.dev.cwiq.io"
  type    = "A"
  ttl     = 300
  records = [module.<app_name>_dev.instance_private_ip]
}
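The private/internal zone record follows the same pattern. A sketch (the variable name `route53_private_zone_id` and the internal domain are assumptions — mirror whatever an existing module's `route53.tf` actually uses):

```
resource "aws_route53_record" "app_private" {
  zone_id = var.route53_private_zone_id   # hypothetical variable name
  name    = "<app-name>.internal.cwiq.io" # hypothetical internal domain
  type    = "A"
  ttl     = 300
  records = [module.<app_name>_dev.instance_private_ip]
}
```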
Add security.tf (only if cross-service communication is required)¶
Security groups are only needed when the app must receive inbound traffic from other AWS resources (not Tailscale). For most apps, the shared Tailscale + NAT security groups are sufficient.
Mandatory Cost Projection¶
Before applying, calculate and document the monthly cost in the PR description. See the Terraform Patterns cost table for On-Demand rates. Apply the ~47% Compute Savings Plan discount if applicable.
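A minimal sketch of the cost math, assuming a t3.large at an On-Demand rate of $0.0832/hr and a 30 GiB gp3 volume at $0.08/GiB-month — these rates are illustrative; take the real numbers from the Terraform Patterns cost table:

```shell
# Monthly cost estimate: compute (~730 hours/month) plus EBS,
# with and without the ~47% Compute Savings Plan discount.
awk 'BEGIN {
  compute    = 0.0832 * 730   # On-Demand compute per month
  compute_sp = compute * 0.53 # after ~47% Savings Plan discount
  ebs        = 30 * 0.08      # gp3 data volume
  printf "On-Demand: $%.2f/mo, with Savings Plan: $%.2f/mo\n", compute + ebs, compute_sp + ebs
}'
```

Paste the resulting figure into the PR description.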
Deploy the Instance¶
cd organization/environments/dev/ec2-instances/<app-name>
terraform init
terraform plan \
  -var="tailscale_auth_key=$TAILSCALE_AUTH_KEY" \
  -var="created_by=$CREATED_BY"
terraform apply \
  -var="tailscale_auth_key=$TAILSCALE_AUTH_KEY" \
  -var="created_by=$CREATED_BY"
CRITICAL: Verify the account before applying. Confirm that your active AWS profile points at the intended account (for example with `aws sts get-caller-identity`) before running `terraform apply`.
Record the outputs: instance ID, VPC private IP, Tailscale hostname.
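These can be read straight from Terraform after the apply (the output names below are assumptions — use whatever the module actually exports):

```shell
terraform output                    # list all outputs from this module
terraform output -raw instance_id   # single value, e.g. for scripting
```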
Step 2: Create the Ansible Playbook Directory¶
In ansible-playbooks/, create a directory for the new service following the standard structure:
<app-name>/
├── inventory/
│ └── dev/
│ └── hosts.yml # Created by operator (gitignored)
├── group_vars/
│ ├── all.yml.template # Shared defaults (tracked)
│ └── all-dev.yml.template # DEV-specific template (tracked)
├── roles/
│ └── <app-name>/
│ ├── defaults/main.yml
│ ├── tasks/main.yml
│ ├── handlers/main.yml
│ └── templates/
├── setup.yml
└── deploy-<app-name>.yml
Templates only — never create actual group_vars
Only create *.yml.template files in the repo. The actual inventory/{env}/group_vars/all.yml files are created by operators on the ansible server from the templates. See Ansible Conventions.
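What the operator does on the ansible server can be sketched as follows (the edit step is manual; exact merge of `all.yml` and `all-dev.yml` values follows Ansible Conventions):

```shell
# On the ansible server only -- these copies are gitignored, never committed
mkdir -p <app-name>/inventory/dev/group_vars
cp <app-name>/group_vars/all.yml.template     <app-name>/inventory/dev/group_vars/all.yml
cp <app-name>/group_vars/all-dev.yml.template <app-name>/inventory/dev/group_vars/all-dev.yml
# then edit the copies to fill in real values (hosts, secrets, ports)
```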
Deploy the Service¶
ssh ansible@ansible-shared-cwiq-io
ansible-helper
git pull origin main
cd <app-name>
ansible-playbook -i inventory/dev/ setup.yml
Step 3: Add the SSL Certificate¶
Add the new host to the cert-server inventory. All paths are relative to ansible-playbooks/cert-server/.
Update inventory.yml¶
Add a host entry following this pattern:
<app-name>-dev-cwiq-io:
  ansible_host: <app-name>-dev-cwiq-io   # Tailscale hostname
  cert_domain: <app-name>.dev.cwiq.io
  cert_key_type: ecdsa                   # Use rsa only if SAML is required
  ssl_owner: nginx                       # or the user running the TLS-terminating process
  ssl_group: nginx
  reload_command: "systemctl reload nginx"
Update ssl-deploy-all.yml¶
Ensure the new host's group is included in the deploy-all playbook.
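A sketch of what inclusion typically looks like (the group and role names here are assumptions — match the groups actually defined in `inventory.yml`):

```yaml
# ssl-deploy-all.yml (sketch)
- hosts: dev_hosts:shared_hosts   # the new host's group must be covered here
  roles:
    - cert-deploy                 # hypothetical role name
```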
Update cert-server/README.md¶
Add a row to the Environment Groups table for the new host.
Deploy the Certificate¶
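A hedged sketch of the deploy step (the playbook name and `--limit` target are assumptions — check cert-server/README.md for the actual invocation):

```shell
cd cert-server
ansible-playbook -i inventory.yml ssl-deploy.yml --limit <app-name>-dev-cwiq-io
```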
Step 4: Add Alloy Log and Metric Collection¶
In ansible-playbooks/alloy/, update the inventory for the new server's environment.
Update inventory/dev.yml (or shared.yml)¶
<app-name>-dev-cwiq-io:
  ansible_host: <app-name>-dev-cwiq-io
  ansible_user: ec2-user
  alloy_environment: development
  alloy_scrape_app_metrics: false  # Set true if the service exposes /metrics
  alloy_app_metrics_targets: []
For services that expose a Prometheus /metrics endpoint:
<app-name>-dev-cwiq-io:
  ansible_host: <app-name>-dev-cwiq-io
  ansible_user: ec2-user
  alloy_environment: development
  alloy_scrape_app_metrics: true
  alloy_app_metrics_targets:
    - { name: <app-name>, address: "localhost:8080", metrics_path: "/metrics" }
Update alloy/docs/README.md¶
Add a row to the Monitored Servers table.
Deploy Alloy to the New Host¶
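A sketch of the deploy command (the playbook name `setup.yml` is an assumption — use whatever the alloy playbook directory actually contains):

```shell
cd alloy
ansible-playbook -i inventory/dev.yml setup.yml --limit <app-name>-dev-cwiq-io
```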
Verify Collection¶
# Check service status on the host
ssh ec2-user@<app-name>-dev-cwiq-io "sudo systemctl status alloy"
# Query Grafana to verify logs are arriving
# In Grafana Explore → Loki datasource:
# {host="<app-name>-dev-cwiq-io"}
Step 5: Add Icinga Health Checks¶
In ansible-playbooks/icinga/, create a host configuration file.
Create conf.d/hosts/dev/<app-name>-dev.conf¶
object Host "<app-name>-dev-cwiq-io" {
  import "cwiq-dev-host"

  address = "<app-name>-dev-cwiq-io"   # Tailscale hostname
  display_name = "<App Name> DEV"
  vars.environment = "dev"
  vars.os = "AlmaLinux"

  vars.http_vhosts["HTTPS"] = {
    http_address = "<app-name>.dev.cwiq.io"
    http_ssl = true
    http_vhost = "<app-name>.dev.cwiq.io"
    http_uri = "/health"
    http_port = 443
  }
}
For Shared Services hosts, create the file under conf.d/hosts/shared/ and import cwiq-shared-host instead.
Update icinga/README.md¶
Add a row to the Monitored Hosts table with the new host's zone, checks, and environment.
Deploy the Icinga Configuration¶
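A sketch of the deploy step (playbook and inventory names are assumptions — follow the icinga playbook's README). Validating the config with `icinga2 daemon -C` before reload is standard Icinga practice:

```shell
cd icinga
ansible-playbook -i inventory.yml deploy.yml
# On the Icinga host, validate before the service picks up the change:
#   sudo icinga2 daemon -C
```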
Mandatory Documentation Co-Change Checklist¶
CRITICAL: All five files must be updated in the same MR
MRs that add infrastructure without updating monitoring documentation will be rejected.
| # | File | Action |
|---|---|---|
| 1 | `alloy/inventory/{env}.yml` | Add host entry |
| 2 | `alloy/docs/README.md` | Add row to Monitored Servers table |
| 3 | `icinga/conf.d/hosts/{env}/<hostname>.conf` | Create host config |
| 4 | `icinga/README.md` | Add row to Monitored Hosts table |
| 5 | `docs/SLACK_ALERTING.md` | Add to Coverage Map and Icinga Checks tables |
Additionally update:
| File | Trigger |
|---|---|
| `cert-server/inventory.yml` | New server needs SSL |
| `cert-server/README.md` | New server needs SSL |
| `vault-server/docs/02-cli-operations.md` | New service stores credentials in Vault |
Step 6: Push to Vault¶
If the service has admin credentials or CI/CD service tokens, push them to Vault before the MR:
vault kv put secret/<app-name>/admin username=admin password=<generated>
vault kv put secret/<app-name>/svc-orchestrator token=<token> url=https://<app-name>.dev.cwiq.io
Report the admin username, password, and URL to the team after storing in Vault.
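To confirm the secrets landed, they can be read back before sending the report (`secret/` here matches the mount used in the put commands above):

```shell
vault kv get secret/<app-name>/admin
vault kv get -field=url secret/<app-name>/svc-orchestrator
```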