Adding a New Server

End-to-end checklist for provisioning a new EC2 instance: Terraform, Ansible, SSL, observability, and the mandatory documentation co-change. Every step is required — partial deployments leave the server unmonitored or unlisted.


Overview

Adding a server involves six systems that must all be updated in the same commit or MR:

  1. Terraform — EC2 instance, security groups, Route53 DNS
  2. Ansible — Service configuration and application deployment
  3. SSL — Let's Encrypt certificate via cert-server
  4. Alloy — Log and metric collection
  5. Icinga — Health checks
  6. Documentation — Monitoring docs co-change (mandatory)

Step 1: Create the Terraform EC2 Module

Run the script from the terraform-plan/ root:

cd terraform-plan
./scripts/new-app.sh <app-name> [environment] [description]

# Examples:
./scripts/new-app.sh langfuse dev "LangFuse LLM Observability"
./scripts/new-app.sh sonarqube shared-services "SonarQube Code Quality"

The script creates organization/environments/{env}/ec2-instances/{app-name}/ with template files. Then customise:

cd organization/environments/dev/ec2-instances/<app-name>

Edit variables.tf

| Variable | Required Value |
| --- | --- |
| ami_id | ami-02169c46e1cfcd5e7 (AlmaLinux 9.7, the default for all new instances) |
| instance_type | Size to workload. See the Terraform Patterns cost table |
| data_volume_size | GiB for /data (application data) |
| create_data_volume2 | true if a separate /var/lib/containerd volume is needed |
Default OS: AlmaLinux 9

Copy the ami_id from an existing module rather than looking it up in the console:

grep ami_id terraform-plan/organization/environments/shared-services/ec2-instances/gitlab/variables.tf

Add route53.tf (if the server needs DNS)

Create route53.tf with records for both the public zone and the private/internal zone:

resource "aws_route53_record" "app_public" {
  zone_id = var.route53_zone_id
  name    = "<app-name>.dev.cwiq.io"
  type    = "A"
  ttl     = 300
  records = [module.<app_name>_dev.instance_private_ip]
}
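Only the public-zone record is shown above; a record in the private/internal zone follows the same shape. In this sketch the internal zone variable name is an assumption; check an existing module for the real one:

```hcl
resource "aws_route53_record" "app_private" {
  zone_id = var.route53_private_zone_id   # assumed variable name for the internal zone
  name    = "<app-name>.dev.cwiq.io"
  type    = "A"
  ttl     = 300
  records = [module.<app_name>_dev.instance_private_ip]
}
```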

Add security.tf (only if cross-service communication is required)

Security groups are only needed when the app must receive inbound traffic from other AWS resources (not Tailscale). For most apps, the shared Tailscale + NAT security groups are sufficient.
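When cross-service traffic is genuinely needed, a minimal security.tf might look like the following sketch. The service port and the variable carrying the caller's security group ID are hypothetical:

```hcl
resource "aws_security_group" "app_internal" {
  name   = "<app-name>-dev-internal"
  vpc_id = var.vpc_id

  ingress {
    description     = "Allow the orchestrator to reach the app API"
    from_port       = 8080                       # hypothetical service port
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [var.orchestrator_sg_id]   # assumed variable for the caller's SG
  }
}
```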

Mandatory Cost Projection

Before applying, calculate and document the monthly cost in the PR description. See the Terraform Patterns cost table for On-Demand rates. Apply the ~47% Compute Savings Plan discount if applicable.
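As a worked example, assuming a hypothetical t3.medium at $0.0416/hour On-Demand: monthly cost is the hourly rate times 730 hours, then times 0.53 if the ~47% Compute Savings Plan discount applies:

```shell
# Hypothetical rate; substitute the On-Demand rate from the cost table.
awk 'BEGIN {
  rate = 0.0416                      # USD/hour (example only)
  ondemand = rate * 730              # hours in an average month
  discounted = ondemand * 0.53       # with the ~47% Savings Plan discount
  printf "On-Demand: $%.2f/mo  With Savings Plan: $%.2f/mo\n", ondemand, discounted
}'
```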

Deploy the Instance

cd organization/environments/dev/ec2-instances/<app-name>
terraform init
terraform plan \
  -var="tailscale_auth_key=$env:TAILSCALE_AUTH_KEY" \
  -var="created_by=$env:CREATED_BY"
terraform apply \
  -var="tailscale_auth_key=$env:TAILSCALE_AUTH_KEY" \
  -var="created_by=$env:CREATED_BY"

CRITICAL: Verify account before applying

aws sts get-caller-identity --profile dev
# Expected: "Account": "686123185567"

Record the outputs: instance ID, VPC private IP, Tailscale hostname.


Step 2: Create the Ansible Playbook Directory

In ansible-playbooks/, create a directory for the new service following the standard structure:

<app-name>/
├── inventory/
│   └── dev/
│       └── hosts.yml          # Created by operator (gitignored)
├── group_vars/
│   ├── all.yml.template       # Shared defaults (tracked)
│   └── all-dev.yml.template   # DEV-specific template (tracked)
├── roles/
│   └── <app-name>/
│       ├── defaults/main.yml
│       ├── tasks/main.yml
│       ├── handlers/main.yml
│       └── templates/
├── setup.yml
└── deploy-<app-name>.yml

Templates only — never create actual group_vars

Only create *.yml.template files in the repo. The actual inventory/{env}/group_vars/all.yml files are created by operators on the ansible server from the templates. See Ansible Conventions.
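For orientation, a minimal all.yml.template might look like this sketch; the keys here are hypothetical, and the real ones come from the role's defaults:

```yaml
# group_vars/all.yml.template -- tracked in the repo. The operator copies this
# to inventory/{env}/group_vars/all.yml on the ansible server and fills in
# real values; the filled-in copy is never committed.
app_port: 8080                  # hypothetical
app_data_dir: /data/<app-name>
admin_password: "CHANGE_ME"     # set on the ansible server, never in git
```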

Deploy the Service

ssh ansible@ansible-shared-cwiq-io
ansible-helper
git pull origin main

cd <app-name>
ansible-playbook -i inventory/dev/ setup.yml

Step 3: Add the SSL Certificate

Add the new host to the cert-server inventory. All paths are relative to ansible-playbooks/cert-server/.

Update inventory.yml

Add a host entry following this pattern:

<app-name>-dev-cwiq-io:
  ansible_host: <app-name>-dev-cwiq-io   # Tailscale hostname
  cert_domain: <app-name>.dev.cwiq.io
  cert_key_type: ecdsa                   # Use rsa only if SAML is required
  ssl_owner: nginx                        # or the user running the TLS-terminating process
  ssl_group: nginx
  reload_command: "systemctl reload nginx"

Update ssl-deploy-all.yml

Ensure the new host's group is included in the deploy-all playbook.

Update cert-server/README.md

Add a row to the Environment Groups table for the new host.

Deploy the Certificate

cd cert-server
ansible-playbook -i inventory.yml ssl-deploy-<app-name>.yml

Step 4: Add Alloy Log and Metric Collection

In ansible-playbooks/alloy/, update the inventory for the new server's environment.

Update inventory/dev.yml (or shared.yml)

<app-name>-dev-cwiq-io:
  ansible_host: <app-name>-dev-cwiq-io
  ansible_user: ec2-user
  alloy_environment: development
  alloy_scrape_app_metrics: false        # Set true if the service exposes /metrics
  alloy_app_metrics_targets: []

For services that expose a Prometheus /metrics endpoint:

<app-name>-dev-cwiq-io:
  ansible_host: <app-name>-dev-cwiq-io
  ansible_user: ec2-user
  alloy_environment: development
  alloy_scrape_app_metrics: true
  alloy_app_metrics_targets:
    - { name: <app-name>, address: "localhost:8080", metrics_path: "/metrics" }

Update alloy/docs/README.md

Add a row to the Monitored Servers table.

Deploy Alloy to the New Host

cd alloy
ansible-playbook -i inventory/dev.yml deploy-alloy.yml \
  --limit <app-name>-dev-cwiq-io

Verify Collection

# Check service status on the host
ssh ec2-user@<app-name>-dev-cwiq-io "sudo systemctl status alloy"

# Query Grafana to verify logs are arriving
# In Grafana Explore → Loki datasource:
# {host="<app-name>-dev-cwiq-io"}

Step 5: Add Icinga Health Checks

In ansible-playbooks/icinga/, create a host configuration file.

Create conf.d/hosts/dev/<app-name>-dev.conf

object Host "<app-name>-dev-cwiq-io" {
  import "cwiq-dev-host"

  address = "<app-name>-dev-cwiq-io"    # Tailscale hostname
  display_name = "<App Name> DEV"

  vars.environment = "dev"
  vars.os = "AlmaLinux"

  vars.http_vhosts["HTTPS"] = {
    http_address = "<app-name>.dev.cwiq.io"
    http_ssl = true
    http_vhost = "<app-name>.dev.cwiq.io"
    http_uri = "/health"
    http_port = 443
  }
}

For Shared Services hosts, create the file under conf.d/hosts/shared/ and import cwiq-shared-host instead.

Update icinga/README.md

Add a row to the Monitored Hosts table with the new host's zone, checks, and environment.

Deploy the Icinga Configuration

cd icinga
ansible-playbook -i inventory/shared.yml deploy-config.yml --tags dev

Mandatory Documentation Co-Change Checklist

CRITICAL: All five files must be updated in the same MR

MRs that add infrastructure without updating monitoring documentation will be rejected.

| # | File | Action |
| --- | --- | --- |
| 1 | alloy/inventory/{env}.yml | Add host entry |
| 2 | alloy/docs/README.md | Add row to Monitored Servers table |
| 3 | icinga/conf.d/hosts/{env}/<hostname>.conf | Create host config |
| 4 | icinga/README.md | Add row to Monitored Hosts table |
| 5 | docs/SLACK_ALERTING.md | Add to Coverage Map and Icinga Checks tables |

Additionally update:

| File | Trigger |
| --- | --- |
| cert-server/inventory.yml | New server needs SSL |
| cert-server/README.md | New server needs SSL |
| vault-server/docs/02-cli-operations.md | New service stores credentials in Vault |

Step 6: Push to Vault

If the service has admin credentials or CI/CD service tokens, push them to Vault before the MR:

vault kv put secret/<app-name>/admin username=admin password=<generated>
vault kv put secret/<app-name>/svc-orchestrator token=<token> url=https://<app-name>.dev.cwiq.io

Report the admin username, password, and URL to the team after storing in Vault.