Deploy Patterns¶

How CWIQ CI/CD pipelines deploy services to the DEV environment: the SSH-based deployment flow, rollback mechanism, Temporal worker restart, and the three ways to trigger a deployment.

Scope of CI/CD Deployments¶

CI/CD deploys to DEV only

GitLab CI/CD pipelines deploy exclusively to the DEV environment. All other environments — Demo, Staging, and Production — are deployed through Ansible playbooks run from the ansible server (ansible-shared-cwiq-io). Never attempt to configure a pipeline to deploy directly to Demo, Staging, or Production.

| Environment | Deployment Method | Trigger |
|---|---|---|
| DEV | GitLab CI/CD (`deploy-dev` job) | Automatic on `main` push |
| Demo | Ansible (`deploy-orchestrator.yml`) | Manual (ansible server) |
| Staging | Ansible (`deploy-orchestrator.yml`) | Manual (ansible server) |
| Production | Ansible (`deploy-orchestrator.yml`) | Manual (ansible server) |

DEV Deployment Flow¶

The deploy-dev job executes the following steps in order:

flowchart TD
    A[1. Read IMAGE_TAG from build.env] --> B[2. SSH to DEV server<br/>using SSH_PRIVATE_KEY + SSH_USER]
    B --> C[3. Pull new Docker image<br/>from Nexus port 8444]
    C --> D[4. Save current IMAGE_TAG<br/>to rollback file]
    D --> E[5. Write .env file<br/>with environment variables]
    E --> F[6. docker compose up -d<br/>with new image]
    F --> G[7. Health check<br/>30 retries x 5s delay]
    G --> H{HTTP 200?}
    H -- Yes --> I[Deploy succeeded]
    H -- No --> J[8. Rollback to previous image<br/>docker compose up -d with saved tag]
    J --> K[Pipeline fails]

Step-by-Step¶

1. Read IMAGE_TAG — The build.env dotenv artifact from the build stage is injected automatically. $IMAGE_TAG is available as an environment variable (e.g., main-a1b2c3d).
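Because the variable arrives through the dotenv report, the deploy job does not need to parse build.env itself. A minimal sketch of a guard that fails fast when the variable is missing is shown here; the helper name `require_image_tag`, the sample file contents, and the allowed character set are assumptions for illustration, not part of the documented pipeline.

```shell
# Hypothetical contents of build.env, as written by the build stage:
#   IMAGE_TAG=main-a1b2c3d

# require_image_tag: stop early if IMAGE_TAG was not injected or looks malformed.
# The helper name and the character check are assumptions of this sketch.
require_image_tag() {
    if [ -z "${IMAGE_TAG:-}" ]; then
        echo "IMAGE_TAG is empty; was the build.env artifact produced?" >&2
        return 1
    fi
    case "$IMAGE_TAG" in
        # Reject anything outside the characters Docker allows in a tag
        *[!A-Za-z0-9_.-]*)
            echo "IMAGE_TAG contains unexpected characters: $IMAGE_TAG" >&2
            return 1
            ;;
    esac
    echo "Deploying image tag: $IMAGE_TAG"
}
```

Calling `require_image_tag` as the first line of the job script would make a missing artifact fail before any SSH connection is opened.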

2. SSH to DEV server — The runner pod SSHs to $DEV_SERVER_IP (the VPC private IP from the group-level CI/CD variable) as $SSH_USER using the $SSH_PRIVATE_KEY secret variable. Tailscale is not used because runner pods do not have Tailscale access.

3. Pull new image — On the DEV server, the job runs:

docker pull nexus.shared.cwiq.io:8444/orchestrator-{service}:${IMAGE_TAG}

Images are pulled from port 8444 (the Nexus Docker group repository, which includes the proxy cache).

4. Save rollback state — The tag of the currently running image is saved to ~/.last-deploy-{service} before the new image is started. The rollback step (step 8) reads this file to restore the previous version.

5. Write .env file — The job writes the service's environment file to its expected path on the DEV server (typically /data/cwiq/{service}/.env). The variable values are fetched from Vault at this step, using Vault's JWT authentication with the job's CI_JOB_JWT_V2 token.

6. docker compose up -d — The compose file is updated with the new IMAGE_TAG and the service is restarted. Docker Compose stops the old container and recreates it from the new image; for a single container this is a brief restart, not a zero-downtime rolling update.

7. Health check — The job polls the service's health endpoint at http://localhost:{port}/api/health with up to 30 retries at 5-second intervals (150 seconds total). A 200 response marks the deployment as successful.
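The polling loop could look roughly like the following sketch. The function name `wait_for_health` and the per-request `--max-time` limit are assumptions added for illustration; the retry count and delay default to the values described in this step.

```shell
# wait_for_health URL [RETRIES] [DELAY_SECONDS]
# Polls URL until it returns HTTP 200 or the retries are exhausted.
# Defaults follow the documented values: 30 retries, 5 seconds apart.
wait_for_health() {
    health_url=$1
    health_retries=${2:-30}
    health_delay=${3:-5}
    health_attempt=1
    while [ "$health_attempt" -le "$health_retries" ]; do
        # -s: quiet, -o /dev/null: discard the body, -w: print only the status code.
        # --max-time is an assumption so a hung request cannot stall the loop.
        health_code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$health_url" || true)
        if [ "$health_code" = "200" ]; then
            echo "healthy after $health_attempt attempt(s)"
            return 0
        fi
        echo "attempt $health_attempt/$health_retries: got '$health_code', retrying in ${health_delay}s" >&2
        health_attempt=$((health_attempt + 1))
        sleep "$health_delay"
    done
    return 1
}
```

On the DEV server this would be invoked as `wait_for_health "http://localhost:8000/api/health"`, with the port chosen per service.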

8. Rollback (on failure) — If the health check fails after all retries, the job reads the saved rollback tag from ~/.last-deploy-{service} and redeploys the previous image. The pipeline job exits with a non-zero code, marking the pipeline as failed and preventing the migrate and verify stages from running.
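Steps 4 and 8 together form a small save-and-restore mechanism, sketched below under stated assumptions. The function names, the `ROLLBACK_DIR` override, and the idea that the compose file reads `${IMAGE_TAG}` from the environment are all hypothetical; the source only specifies the ~/.last-deploy-{service} file and the `docker compose up -d` restart.

```shell
# rollback_file SERVICE: path of the state file, following the
# ~/.last-deploy-{service} pattern. ROLLBACK_DIR is a hypothetical
# override (useful for testing); it defaults to $HOME.
rollback_file() {
    echo "${ROLLBACK_DIR:-$HOME}/.last-deploy-$1"
}

# save_rollback_tag SERVICE CURRENT_TAG: record the tag that is running now.
save_rollback_tag() {
    printf '%s\n' "$2" > "$(rollback_file "$1")"
}

# start_service TAG: hypothetical wrapper; assumes the compose file
# references ${IMAGE_TAG} for the image tag.
start_service() {
    IMAGE_TAG=$1 docker compose up -d
}

# rollback SERVICE: restart the previously saved tag. It always returns
# non-zero so the pipeline job is still marked as failed.
rollback() {
    rb_file=$(rollback_file "$1")
    if [ ! -s "$rb_file" ]; then
        echo "no rollback tag recorded for $1" >&2
        return 1
    fi
    rb_tag=$(cat "$rb_file")
    echo "rolling back $1 to $rb_tag" >&2
    start_service "$rb_tag"
    return 1
}

# Usage sketch (CURRENT_TAG is whatever tag the job finds running):
#   save_rollback_tag server "$CURRENT_TAG"
#   start_service "$IMAGE_TAG"
#   <health check command> || rollback server
```

Returning non-zero even after a successful rollback is deliberate in this sketch: the service is healthy again, but the deployment itself failed, so later stages should not run.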

Temporal Worker Restart¶

Some services — notably the main server — run a Temporal worker as a host systemd service (cwiq-agent-orchestrator.service) in addition to their Docker container. The deploy job restarts this service after the container is up:

sudo systemctl restart cwiq-agent-orchestrator.service

The Temporal worker runs as the cwiq-agent-runner user (group cwiq-agents) and uses the virtual environment at /opt/cwiq/orchestrator-venv. Its environment is loaded from /etc/cwiq/cwiq-agent-orchestrator.env.

Services that do not have a Temporal worker (most microservices) skip this step.
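One way a shared deploy script might handle both cases is to restart the unit only when it is installed on the host. The sketch below assumes detection via `systemctl cat` and a follow-up `is-active` check; how the actual job decides to skip the step is not stated here, so treat the function as illustrative.

```shell
WORKER_UNIT="cwiq-agent-orchestrator.service"

# restart_worker_if_present: restart the Temporal worker unit when it is
# installed on the host, and skip otherwise.
# Detecting the unit with `systemctl cat` is an assumption of this sketch.
restart_worker_if_present() {
    if ! systemctl cat "$WORKER_UNIT" >/dev/null 2>&1; then
        echo "no $WORKER_UNIT on this host, skipping worker restart"
        return 0
    fi
    sudo systemctl restart "$WORKER_UNIT"
    # Confirm the unit is running again after the restart
    if systemctl is-active --quiet "$WORKER_UNIT"; then
        echo "$WORKER_UNIT restarted"
    else
        echo "$WORKER_UNIT failed to start" >&2
        return 1
    fi
}
```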

Verification Stage¶

After deploy-dev and migrate-dev complete, the verify-dev job performs a final external health check from inside the runner pod:

curl --fail --retry 10 --retry-delay 3 \
  "https://orchestrator.dev.cwiq.io/api/health"

This differs from the in-deployment health check (which uses localhost) by going through the full external path: Nginx reverse proxy, TLS termination, and the application layer. Because the localhost check has already passed by this point, a failure here usually points to the proxy configuration, certificate, or network routing rather than the application itself.

Pipeline Trigger Patterns¶

Normal Push to `main` (Full Pipeline)¶

The default behaviour. Every push to main runs the complete pipeline automatically:

validate → test → build → push → scan → deploy-dev → migrate → verify

Deploy-Only (`DEPLOY_ONLY=true`)¶

Skips all stages except deploy-dev and verify. Uses the most recent image tag from the last successful build. Use this after a config change, secret rotation, or infrastructure update when no code has changed.

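TOKEN="your-personal-access-token"
# The pipeline-creation endpoint expects variables as key/value pairs
curl -s -X POST "https://gitlab.shared.cwiq.io/api/v4/projects/5/pipeline" \
  --header "PRIVATE-TOKEN: $TOKEN" \
  -F "ref=main" \
  -F "variables[][key]=DEPLOY_ONLY" \
  -F "variables[][value]=true"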

Manual Deploy (`MANUAL_DEPLOY=true`)¶

Runs the full pipeline but makes the deploy-dev job manual (requires a click in the GitLab UI to proceed). Useful when you want to review scan results before committing to a deployment.

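# The pipeline-creation endpoint expects variables as key/value pairs
curl -s -X POST "https://gitlab.shared.cwiq.io/api/v4/projects/5/pipeline" \
  --header "PRIVATE-TOKEN: $TOKEN" \
  -F "ref=main" \
  -F "variables[][key]=MANUAL_DEPLOY" \
  -F "variables[][value]=true"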

Database Migrations (migrate-dev)¶

Services with a PostgreSQL database run alembic upgrade head over SSH after the container deployment succeeds. The migration job:

1. SSHs to the DEV server
2. Executes alembic upgrade head inside the running container via docker exec
3. Checks the exit code; a non-zero exit fails the pipeline

Migrations run after the new container is up but before the verify stage. If a migration fails, the verify stage does not run and the team is alerted to investigate before the broken migration state causes problems.

Services without a database (workers, agent, CLI) do not define a migrate-dev job and skip this stage automatically.
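The migration steps described in this section can be sketched as a single function. The container name argument is hypothetical, and the sketch assumes the SSH key from $SSH_PRIVATE_KEY has already been loaded (for example into ssh-agent) by the job; `ssh` passes the remote command's exit code back to the caller, which is what lets the job fail on a broken migration.

```shell
# run_migration CONTAINER: run Alembic inside the running container on the
# DEV server. SSH_USER and DEV_SERVER_IP are the CI/CD variables used by the
# deploy job; the container name is a hypothetical argument.
run_migration() {
    migrate_container=$1
    # BatchMode=yes makes ssh fail instead of prompting for a password
    if ssh -o BatchMode=yes "${SSH_USER}@${DEV_SERVER_IP}" \
        "docker exec ${migrate_container} alembic upgrade head"; then
        echo "migration succeeded"
    else
        # $? here is the exit status of the ssh command above
        migrate_status=$?
        echo "migration failed with exit code ${migrate_status}" >&2
        return "$migrate_status"
    fi
}
```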

Service Health Endpoints¶

Every CWIQ service exposes a /api/health endpoint. The deploy and verify jobs use these endpoints to confirm a successful deployment.

| Service | Internal Health Check | External Health Check |
|---|---|---|
| server | `http://localhost:8000/api/health` | `https://orchestrator.dev.cwiq.io/api/health` |
| iam-api | `http://localhost:8004/api/health` | `https://orchestrator.dev.cwiq.io/iam/health` |
| audit-api | `http://localhost:8007/api/health` | `https://orchestrator.dev.cwiq.io/audit/health` |
| ai-catalogue-api | `http://localhost:8006/api/health` | `https://orchestrator.dev.cwiq.io/catalogue/health` |
| monitoring-api | `http://localhost:8008/api/health` | `https://orchestrator.dev.cwiq.io/monitoring/health` |
| notification-api | `http://localhost:8009/api/health` | `https://orchestrator.dev.cwiq.io/notification/health` |
| runner-api | `http://localhost:8003/api/health` | `https://orchestrator.dev.cwiq.io/runner/health` |

The server service also exposes an aggregated health endpoint that fans out to all microservices:

curl -s "https://orchestrator.dev.cwiq.io/api/health?detailed=true" | python3 -m json.tool
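When the aggregated response is not enough, each external endpoint from the table can also be probed individually. The following is a minimal sketch; the function name `check_external_health` and the 10-second timeout are assumptions, while the URLs are copied from the table in this section.

```shell
# External health URLs copied from the Service Health Endpoints table.
HEALTH_URLS="
https://orchestrator.dev.cwiq.io/api/health
https://orchestrator.dev.cwiq.io/iam/health
https://orchestrator.dev.cwiq.io/audit/health
https://orchestrator.dev.cwiq.io/catalogue/health
https://orchestrator.dev.cwiq.io/monitoring/health
https://orchestrator.dev.cwiq.io/notification/health
https://orchestrator.dev.cwiq.io/runner/health
"

# check_external_health: probe each URL once, print OK or FAIL per URL,
# and return non-zero if any endpoint failed.
check_external_health() {
    health_failed=0
    for health_url in $HEALTH_URLS; do
        # -f makes curl exit non-zero on HTTP errors (4xx/5xx)
        if curl -s -f -o /dev/null --max-time 10 "$health_url"; then
            echo "OK   $health_url"
        else
            echo "FAIL $health_url"
            health_failed=$((health_failed + 1))
        fi
    done
    [ "$health_failed" -eq 0 ]
}
```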

Branch Rules & Workflow — When deploy jobs run and how to trigger them manually
Pipeline Stages — The full stage sequence including migrate and verify
Kaniko Docker Builds — How images are built and IMAGE_TAG is produced
Runners (EKS K8s) — Why VPC private IPs are required for SSH from runner pods