SSL: Architecture

All CWIQ SSL certificates are issued and managed centrally by a single cert-server (ansible-shared-cwiq-io). The cert-server uses Let's Encrypt DNS-01 validation via Route53 and distributes certificates to 23 hosts across three environments.

How It Works

┌──────────────────────────────────────────────────────────────────┐
│  Cert-Server (ansible-shared-cwiq-io)                            │
│                                                                  │
│  certbot + python3-certbot-dns-route53                           │
│  /etc/letsencrypt/live/<domain>/                                 │
│                                                                  │
│  systemd timer: ssl-renew-deploy.timer                           │
│  Schedule: twice daily (00:00, 12:00 UTC) + up to 1h jitter     │
└──────────────────────────────────────────────────────────────────┘
        ┌───────────┴───────────┐
        │                       │
        ▼                       ▼
┌───────────────────┐   ┌─────────────────────────┐
│ Deploy to EC2     │   │ Import to AWS ACM        │
│ ssl-deploy-*.yml  │   │ acm-import.yml           │
│                   │   │                          │
│ Certificate files │   │ sso.shared.cwiq.io       │
│ copied to host:   │   │ gitlab.shared.cwiq.io    │
│ /data/ssl/<domain>│   └─────────────────────────┘
│ Mounted into      │               │
│ containers        │               ▼
└───────────────────┘   ┌─────────────────────────┐
                        │ Application Load         │
                        │ Balancers (ALB)          │
                        │                          │
                        │ SSL terminated at ALB,   │
                        │ HTTP forwarded to EC2    │
                        └─────────────────────────┘

DNS-01 Validation

All certificates use DNS-01 challenge validation. The cert-server EC2 instance has an IAM role that allows it to create and delete Route53 TXT records in the shared.cwiq.io and dev.cwiq.io zones. This means certificate issuance never requires a publicly reachable HTTP server — it works even for Tailscale-only services.
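As a rough illustration of what a DNS-01 issuance looks like on the cert-server, the sketch below assembles a certbot command that uses the Route53 plugin. It only prints the command and does not run it. The function name and the contact address are placeholders rather than anything taken from the CWIQ playbooks, which may pass different options.

```shell
# Sketch only: print the certbot command for a DNS-01 issuance via Route53.
# build_issue_cmd and the default e-mail address are hypothetical.
build_issue_cmd() {
  local domain="$1"
  local key_type="${2:-ecdsa}"           # ecdsa by default, rsa where required
  local email="${3:-admin@example.com}"  # placeholder contact address

  case "$key_type" in
    rsa|ecdsa) ;;
    *) echo "unsupported key type: $key_type" >&2; return 1 ;;
  esac

  # --dns-route53 selects the Route53 DNS plugin, so no HTTP listener is needed.
  printf 'certbot certonly --dns-route53 --non-interactive --agree-tos -m %s --key-type %s -d %s\n' \
    "$email" "$key_type" "$domain"
}

build_issue_cmd vault.shared.cwiq.io
build_issue_cmd sso.shared.cwiq.io rsa
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; on the cert-server the printed line could be reviewed and then executed by hand.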

Renewal Cycle

Let's Encrypt certificates expire after 90 days. Certbot renews any certificate expiring within 30 days. The ssl-renew-deploy.yml playbook runs the full pipeline:

  1. certbot renew --quiet — renew any certificates due within 30 days
  2. ssl-deploy-all.yml — deploy renewed certs to all 23 EC2 hosts
  3. acm-import.yml — re-import to ACM for sso.shared.cwiq.io
  4. acm-import.yml — re-import to ACM for gitlab.shared.cwiq.io
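The four steps above can be sketched as a small shell wrapper. This is an approximation of the sequence, not the contents of ssl-renew-deploy.yml: the function name and the RUN dry-run switch are hypothetical, and the -i and -e arguments are borrowed from the commands shown under Adding a New Service.

```shell
# Sketch only: the renewal pipeline as a plain command sequence.
# Dry-run by default: each step is printed, not executed.
# Export RUN="" before sourcing to execute the commands for real.
RUN="${RUN-echo}"

renew_and_deploy() {
  # Step 1: renew anything due within certbot's renewal window.
  $RUN certbot renew --quiet || return 1

  # Step 2: push renewed certificates to the EC2 hosts.
  $RUN ansible-playbook -i inventory.yml ssl-deploy-all.yml || return 1

  # Steps 3 and 4: re-import the two ALB-backed domains into ACM.
  for domain in sso.shared.cwiq.io gitlab.shared.cwiq.io; do
    $RUN ansible-playbook -i inventory.yml acm-import.yml -e "cert_domain=$domain" || return 1
  done
}

renew_and_deploy
```

Each step returns early on failure, so a failed renewal never triggers a deploy or an ACM import.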

Two Deployment Patterns

CWIQ uses two distinct SSL deployment patterns depending on whether a service is directly accessible or sits behind an ALB.

| Pattern | Used For | SSL Termination Point | Certificate Storage |
| --- | --- | --- | --- |
| Direct EC2 | All Tailscale-only and most services | Application container (nginx, GitLab nginx) | /data/ssl/<domain>/ on the EC2 host |
| ALB + ACM | Authentik HA, GitLab Shared (public internet) | AWS ALB | AWS Certificate Manager |

Direct EC2 Pattern

The cert-server copies fullchain.pem and privkey.pem from /etc/letsencrypt/live/<domain>/ to /data/ssl/<domain>/ on the target host. The files are owned by the service user (e.g., authentik:authentik, vault:vault). Service containers mount /data/ssl/<domain>/ as a volume.

After deployment, the playbook runs the reload_command for that host — typically a docker restart <nginx-container> or docker exec <app> <app>-ctl hup nginx.
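To make the file layout concrete, here is a local approximation of the copy step. The real deployment runs over SSH from the cert-server through Ansible, so this is only a sketch. The function name and the file modes (0644 and 0600) are assumptions, and the optional ownership change needs root.

```shell
# Sketch only: copy the two deployed files and lock down the private key.
# deploy_cert_files and the chosen file modes are hypothetical.
deploy_cert_files() {
  local src_dir="$1"    # e.g. /etc/letsencrypt/live/<domain>
  local dest_dir="$2"   # e.g. /data/ssl/<domain>
  local owner="${3:-}"  # optional "user:group", e.g. vault:vault (needs root)

  mkdir -p "$dest_dir" || return 1
  for f in fullchain.pem privkey.pem; do
    # -L dereferences the symlinks certbot keeps under live/.
    cp -L "$src_dir/$f" "$dest_dir/$f" || return 1
  done
  chmod 0644 "$dest_dir/fullchain.pem" || return 1
  chmod 0600 "$dest_dir/privkey.pem" || return 1

  if [ -n "$owner" ]; then
    chown -R "$owner" "$dest_dir" || return 1
  fi
}
```

Only fullchain.pem and privkey.pem are copied, which matches what the target servers are described as holding; cert.pem and chain.pem stay on the cert-server.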

ALB + ACM Pattern

For services reached from the public internet through an ALB, TLS terminates at the load balancer, so the certificate must live in ACM rather than in a file mounted into a container. acm-import.yml reads the cert from /etc/letsencrypt/live/<domain>/ on the cert-server and calls the AWS ACM ImportCertificate API. The ALB listener is pre-configured to use that ACM certificate ARN, and each renewal is re-imported into the same ARN, so the ALB serves the updated certificate without any reconfiguration.
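For orientation, an equivalent import with the AWS CLI might look like the sketch below. How acm-import.yml actually calls the API is not shown in this document, so treat this as an illustration only; the function name and the example ARN are hypothetical. Passing --certificate-arn is what turns the call into a re-import of an existing certificate.

```shell
# Sketch only: print an AWS CLI command that re-imports a renewed certificate.
# build_acm_import_cmd is hypothetical; the ARN is supplied by the caller.
build_acm_import_cmd() {
  local domain="$1"
  local cert_arn="$2"
  local live="/etc/letsencrypt/live/$domain"

  # fileb:// makes the CLI read each PEM file as raw bytes.
  printf 'aws acm import-certificate --certificate-arn %s --certificate fileb://%s/cert.pem --private-key fileb://%s/privkey.pem --certificate-chain fileb://%s/chain.pem\n' \
    "$cert_arn" "$live" "$live" "$live"
}

# Placeholder ARN for illustration only.
build_acm_import_cmd sso.shared.cwiq.io \
  arn:aws:acm:us-east-1:123456789012:certificate/example-id
```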

Key Type: RSA vs ECDSA

sso.shared.cwiq.io must use RSA

sso.shared.cwiq.io is issued as an RSA certificate; the other 22 domains use ECDSA. RSA is required here because AWS Identity Center uses the Authentik SSO certificate for SAML signing, and Identity Center does not currently support ECDSA for SAML federation keys.

| Domain | Key Type | Reason |
| --- | --- | --- |
| sso.shared.cwiq.io | RSA | AWS Identity Center SAML signing requirement |
| All other 22 domains | ECDSA | Smaller keys, better performance |
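One way to confirm which key type a certificate actually carries is to inspect its text dump with openssl. The helper names below are hypothetical; the parsing relies on the "Public Key Algorithm" line that openssl x509 -text prints.

```shell
# Sketch only: classify a certificate's key type from `openssl x509 -text` output.
# key_type_from_text and cert_key_type are hypothetical helper names.
key_type_from_text() {
  # Reads the text dump on stdin and inspects the first matching line.
  case "$(grep 'Public Key Algorithm' | head -n 1)" in
    *rsaEncryption*)  echo rsa ;;
    *id-ecPublicKey*) echo ecdsa ;;
    *)                echo unknown ;;
  esac
}

cert_key_type() {
  # $1 is a PEM certificate, e.g. /etc/letsencrypt/live/<domain>/cert.pem
  openssl x509 -in "$1" -noout -text | key_type_from_text
}
```

Given the table above, running cert_key_type against the sso.shared.cwiq.io certificate should report rsa, and any other domain should report ecdsa.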

Environment Groups

| Environment | Host Count | Access Method |
| --- | --- | --- |
| Dev | 7 hosts | Tailscale-only |
| Shared-Services | 15 hosts (including 2x Authentik HA) | Mix: Tailscale + public ALB |
| Demo | 1 host | Tailscale-only |

See SSL: Inventory for the complete 23-host list.

Certificate Paths

On the Cert-Server

All issued certificates are stored under /etc/letsencrypt/live/ on ansible-shared-cwiq-io:

/etc/letsencrypt/live/
├── gitlab.dev.cwiq.io/
├── taiga.dev.cwiq.io/
├── icinga.dev.cwiq.io/
├── support.dev.cwiq.io/
├── nexus.dev.cwiq.io/
├── orchestrator.dev.cwiq.io/
├── open-project.dev.cwiq.io/
├── sso.shared.cwiq.io/          ← RSA
├── vault.shared.cwiq.io/
├── gitlab.shared.cwiq.io/
├── nexus.shared.cwiq.io/
├── semaphore.shared.cwiq.io/
├── grafana.shared.cwiq.io/
├── prometheus.shared.cwiq.io/
├── sonarqube.shared.cwiq.io/
├── icinga.shared.cwiq.io/
├── defectdojo.shared.cwiq.io/
├── reportportal.shared.cwiq.io/
├── openldap.shared.cwiq.io/
├── langfuse.dev.cwiq.io/
└── orchestrator.demo.cwiq.io/

Each directory contains fullchain.pem, privkey.pem, cert.pem, and chain.pem.

On Target Servers

Certificates are deployed to /data/ssl/<domain>/ on each application server. Each directory contains only fullchain.pem and privkey.pem.
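A quick way to check whether a deployed certificate is inside the 30-day renewal window is openssl's -checkend option. The sketch below is an illustrative helper, not part of the CWIQ tooling; the function names and return-code convention are assumptions.

```shell
# Sketch only: report whether a certificate expires within N days (default 30).
# days_to_seconds and expires_within are hypothetical helper names.
days_to_seconds() {
  echo $(( $1 * 86400 ))
}

expires_within() {
  local cert="$1"        # e.g. /data/ssl/<domain>/fullchain.pem
  local days="${2:-30}"

  [ -r "$cert" ] || return 2   # unreadable or missing file

  # -checkend exits 0 when the certificate is still valid N seconds from now.
  if openssl x509 -in "$cert" -noout -checkend "$(days_to_seconds "$days")" >/dev/null; then
    return 1   # not expiring within the window
  fi
  return 0     # expires within the window, or could not be parsed
}
```

A typical call would be `expires_within /data/ssl/<domain>/fullchain.pem 30 && echo "renewal due"`, with the domain filled in for the host being checked.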

Infrastructure Requirements

Cert-Server IAM Role

The cert-server needs two sets of IAM permissions:

| Permission Set | Actions | Purpose |
| --- | --- | --- |
| Route53 | route53:ChangeResourceRecordSets, route53:ListHostedZones, route53:GetChange | DNS-01 challenge TXT records |
| ACM | acm:ImportCertificate, acm:ListCertificates, acm:DescribeCertificate, acm:AddTagsToCertificate | Importing certs for ALB-backed services |
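The actions in the table could be expressed as an IAM policy document along the lines of the sketch below, printed from a shell function so it can be redirected to a file. The statement IDs, the hosted zone ID placeholders, and the "*" resources are assumptions; the real role may be scoped differently.

```shell
# Sketch only: print an IAM policy covering the actions listed in the table.
# Zone IDs are placeholders; "*" resources should be narrowed where possible.
print_cert_server_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Route53Lookup",
      "Effect": "Allow",
      "Action": ["route53:ListHostedZones", "route53:GetChange"],
      "Resource": "*"
    },
    {
      "Sid": "Route53ChallengeRecords",
      "Effect": "Allow",
      "Action": ["route53:ChangeResourceRecordSets"],
      "Resource": [
        "arn:aws:route53:::hostedzone/SHARED_ZONE_ID_PLACEHOLDER",
        "arn:aws:route53:::hostedzone/DEV_ZONE_ID_PLACEHOLDER"
      ]
    },
    {
      "Sid": "AcmImport",
      "Effect": "Allow",
      "Action": [
        "acm:ImportCertificate",
        "acm:ListCertificates",
        "acm:DescribeCertificate",
        "acm:AddTagsToCertificate"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

print_cert_server_policy
```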

SSH Connectivity

The cert-server deploys certs via SSH over Tailscale. The ansible user on the cert-server authenticates to all target hosts with the ~/.ssh/cwiq-ansible key.

Nexus Shared hostname workaround

nexus-shared-cwiq-io is accessed via its Tailscale IP (100.67.249.34) rather than its MagicDNS hostname because the hostname does not resolve from the cert-server. This is noted in inventory.yml.

Adding a New Service

  1. Register the domain in cert-server/group_vars/all.yml under cert_domains with key_type: ecdsa (RSA only for services that require it)
  2. Add the host to cert-server/inventory.yml with cert_domain, ssl_owner, ssl_group, and reload_command
  3. Issue the certificate: ansible-playbook ssl-issue-all.yml
  4. Deploy it: ansible-playbook -i inventory.yml ssl-deploy-all.yml --limit <hostname>
  5. If ALB-backed: ansible-playbook -i inventory.yml acm-import.yml -e "cert_domain=<domain>"
  6. Update this documentation and SSL: Inventory
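Steps 3 to 5 above can be chained in a small wrapper once the configuration changes from steps 1 and 2 are in place. The function name, the RUN dry-run switch, and the example host and domain are hypothetical; the playbook invocations are copied from the list.

```shell
# Sketch only: issue, deploy, and optionally ACM-import a new service's cert.
# Dry-run by default: commands are printed. Export RUN="" to execute them.
RUN="${RUN-echo}"

onboard_service() {
  local host="$1"       # inventory hostname (hypothetical example below)
  local domain="$2"     # certificate domain
  local alb="${3:-no}"  # pass "yes" for ALB-backed services

  $RUN ansible-playbook ssl-issue-all.yml || return 1
  $RUN ansible-playbook -i inventory.yml ssl-deploy-all.yml --limit "$host" || return 1

  if [ "$alb" = "yes" ]; then
    $RUN ansible-playbook -i inventory.yml acm-import.yml -e "cert_domain=$domain" || return 1
  fi
}

# Hypothetical service used purely for illustration.
onboard_service wiki-shared-cwiq-io wiki.shared.cwiq.io
```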