Deploy Patterns

How CWIQ CI/CD pipelines deploy services to the DEV environment: the SSH-based deployment flow, rollback mechanism, Temporal worker restart, and the three ways to trigger a deployment.


Scope of CI/CD Deployments

CI/CD deploys to DEV only

GitLab CI/CD pipelines deploy exclusively to the DEV environment. All other environments — Demo, Staging, and Production — are deployed through Ansible playbooks run from the ansible server (ansible-shared-cwiq-io). Never attempt to configure a pipeline to deploy directly to Demo, Staging, or Production.

| Environment | Deployment Method | Trigger |
| --- | --- | --- |
| DEV | GitLab CI/CD (deploy-dev job) | Automatic on main push |
| Demo | Ansible (deploy-orchestrator.yml) | Manual (ansible server) |
| Staging | Ansible (deploy-orchestrator.yml) | Manual (ansible server) |
| Production | Ansible (deploy-orchestrator.yml) | Manual (ansible server) |

DEV Deployment Flow

The deploy-dev job executes the following steps in order:

flowchart TD
    A[1. Read IMAGE_TAG from build.env] --> B[2. SSH to DEV server<br/>using SSH_PRIVATE_KEY + SSH_USER]
    B --> C[3. Pull new Docker image<br/>from Nexus port 8444]
    C --> D[4. Save current IMAGE_TAG<br/>to rollback file]
    D --> E[5. Write .env file<br/>with environment variables]
    E --> F[6. docker compose up -d<br/>with new image]
    F --> G[7. Health check<br/>30 retries x 5s delay]
    G --> H{HTTP 200?}
    H -- Yes --> I[Deploy succeeded]
    H -- No --> J[8. Rollback to previous image<br/>docker compose up -d with saved tag]
    J --> K[Pipeline fails]

Step-by-Step

1. Read IMAGE_TAG: The build stage publishes build.env as a dotenv artifact, which GitLab injects into the deploy job automatically, so $IMAGE_TAG is available as an environment variable (e.g., main-a1b2c3d).

2. SSH to DEV server: The runner pod connects to $DEV_SERVER_IP (the VPC private IP, set as a group-level CI/CD variable) as $SSH_USER, authenticating with the $SSH_PRIVATE_KEY secret variable. Tailscale is not used because runner pods do not have Tailscale access.

3. Pull new image: On the DEV server, the job runs:

docker pull nexus.shared.cwiq.io:8444/orchestrator-{service}:${IMAGE_TAG}

Images are pulled from port 8444 (the Nexus Docker group repository, which includes the proxy cache).

4. Save rollback state: Before the new image is started, the tag of the currently running image is saved to ~/.last-deploy-{service}. The rollback step reads this file.

5. Write .env file: The job writes the service's environment file to the expected path on the DEV server (typically /data/cwiq/{service}/.env). Variable values are fetched from Vault at this step, authenticating with the job's CI_JOB_JWT_V2 token.

6. docker compose up -d: The compose file is updated with the new IMAGE_TAG and the service is restarted. Docker Compose recreates the container from the new image; it stops the old container first, so this is not a zero-downtime rolling update.

7. Health check: The job polls http://localhost:{port}/api/health up to 30 times at 5-second intervals (about 150 seconds in total). An HTTP 200 response marks the deployment as successful.

8. Rollback (on failure): If no attempt returns HTTP 200, the job reads the saved tag from ~/.last-deploy-{service} and redeploys the previous image. The job then exits with a non-zero code, which fails the pipeline and prevents the migrate and verify stages from running.
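
The save, health-check, and rollback steps above can be sketched as three shell functions run on the DEV server. This is a minimal illustration, not the actual deploy-dev script: the function names, the ROLLBACK_DIR, HEALTH_RETRIES, and HEALTH_DELAY variables, and the assumption that the compose file reads its tag from ${IMAGE_TAG} are all hypothetical. Only the retry count, the delay, the health URL shape, and the ~/.last-deploy-{service} path come from the steps above.

```shell
# Hypothetical helpers for steps 4, 7 and 8; all function and variable
# names here are illustrative assumptions, not the real job script.

# Step 4: record the tag that is running now.
# $1 = service name, $2 = currently deployed tag
save_rollback_tag() {
    printf '%s\n' "$2" > "${ROLLBACK_DIR:-$HOME}/.last-deploy-$1"
}

# Step 7: poll the local health endpoint until it answers HTTP 200.
# $1 = service port. Returns 0 on success, 1 once every retry has failed.
wait_for_health() {
    local port="$1" attempt=1 code
    while [ "$attempt" -le "${HEALTH_RETRIES:-30}" ]; do
        code=$(curl -s -o /dev/null -w '%{http_code}' \
            "http://localhost:${port}/api/health")
        [ "$code" = "200" ] && return 0
        sleep "${HEALTH_DELAY:-5}"
        attempt=$((attempt + 1))
    done
    return 1
}

# Step 8: bring the previously saved tag back up.
# $1 = service name. Assumes the compose file reads the tag from ${IMAGE_TAG}.
rollback() {
    local previous
    previous=$(cat "${ROLLBACK_DIR:-$HOME}/.last-deploy-$1") || return 1
    IMAGE_TAG="$previous" docker compose up -d
}
```

In a job script these would chain as: wait_for_health 8000 || { rollback server; exit 1; }, so a failed check both restores the previous image and fails the job.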


Temporal Worker Restart

Some services — notably the main server — run a Temporal worker as a host systemd service (cwiq-agent-orchestrator.service) in addition to their Docker container. The deploy job restarts this service after the container is up:

sudo systemctl restart cwiq-agent-orchestrator.service

The Temporal worker runs as the cwiq-agent-runner user (group cwiq-agents) and uses the virtual environment at /opt/cwiq/orchestrator-venv. Its environment is loaded from /etc/cwiq/cwiq-agent-orchestrator.env.

Services that do not have a Temporal worker (most microservices) skip this step.
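
One way to make the restart step safe on hosts without a Temporal worker is to check for the unit before restarting it. The sketch below is an assumption about how such a guard could look, not the job's actual code: the function name is hypothetical, and only the unit name comes from this page. It relies on systemctl cat exiting non-zero for an unknown unit and on systemctl is-active --quiet exiting 0 only for an active unit.

```shell
# Restart the Temporal worker only where the unit is installed.
# $1 = unit name (defaults to the unit named on this page).
# The function name is an illustrative assumption.
restart_temporal_worker() {
    local unit="${1:-cwiq-agent-orchestrator.service}"
    # systemctl cat fails when no unit file with this name exists
    if ! systemctl cat "$unit" >/dev/null 2>&1; then
        echo "no $unit on this host, skipping"
        return 0
    fi
    sudo systemctl restart "$unit" || return 1
    # Exit status is 0 only if the unit reports "active" after the restart
    systemctl is-active --quiet "$unit"
}
```

Returning the is-active status lets the deploy job fail when the worker does not come back up, instead of silently continuing.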


Verification Stage

After deploy-dev and migrate-dev complete, the verify-dev job performs a final external health check from inside the runner pod:

curl --fail --retry 10 --retry-delay 3 \
  "https://orchestrator.dev.cwiq.io/api/health"

Unlike the in-deployment health check, which targets localhost, this request travels the full external path: Nginx reverse proxy, TLS termination, and the application layer. Because the localhost check has already passed by this point, a failure here usually points to the proxy configuration, certificate, or network routing rather than the application itself.
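
The reasoning behind the two checks can be written down as a small decision function. This is only an illustrative sketch: the function name and the three labels it prints are hypothetical, and the inputs are the HTTP status codes returned by the localhost check and the external check.

```shell
# Map the two health-check results to the layer most likely at fault.
# $1 = HTTP status from the localhost check
# $2 = HTTP status from the external check
# Function name and output labels are illustrative assumptions.
classify_health() {
    if [ "$1" != "200" ]; then
        echo "application"        # the service itself is not healthy
    elif [ "$2" != "200" ]; then
        echo "proxy-or-network"   # Nginx, TLS, or routing in front of it
    else
        echo "healthy"
    fi
}
```

For example, classify_health 200 502 prints proxy-or-network, matching the case where the container is healthy but the external path is not.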


Pipeline Trigger Patterns

Normal Push to main (Full Pipeline)

The default behaviour. Every push to main runs the complete pipeline automatically:

validate → test → build → push → scan → deploy-dev → migrate → verify

Deploy-Only (DEPLOY_ONLY=true)

Skips all stages except deploy-dev and verify. Uses the most recent image tag from the last successful build. Use this after a config change, secret rotation, or infrastructure update when no code has changed.

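TOKEN="your-personal-access-token"
# The create-pipeline endpoint takes variables as an array of key/value pairs.
curl -s -X POST "https://gitlab.shared.cwiq.io/api/v4/projects/5/pipeline" \
  --header "PRIVATE-TOKEN: $TOKEN" \
  -F "ref=main" \
  -F "variables[][key]=DEPLOY_ONLY" \
  -F "variables[][value]=true"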

Manual Deploy (MANUAL_DEPLOY=true)

Runs the full pipeline but makes the deploy-dev job manual (requires a click in the GitLab UI to proceed). Useful when you want to review scan results before committing to a deployment.

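# TOKEN must hold a personal access token, as in the deploy-only example.
curl -s -X POST "https://gitlab.shared.cwiq.io/api/v4/projects/5/pipeline" \
  --header "PRIVATE-TOKEN: $TOKEN" \
  -F "ref=main" \
  -F "variables[][key]=MANUAL_DEPLOY" \
  -F "variables[][value]=true"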

Database Migrations (migrate-dev)

Services with a PostgreSQL database run alembic upgrade head over SSH after the container deployment succeeds. The migration job:

  1. SSHs to the DEV server
  2. Executes alembic upgrade head inside the running container via docker exec
  3. Checks the exit code — a non-zero exit fails the pipeline

Migrations run after the new container is up but before the verify stage. If a migration fails, the pipeline fails and the verify stage does not run, which alerts the team to investigate before the broken migration state causes further problems.

Services without a database (workers, agent, CLI) do not define a migrate-dev job and skip this stage automatically.
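
The three migration steps listed above could be sketched as a single function. This is an assumption-laden illustration rather than the real migrate-dev script: the function name and the container name orchestrator-{service} are hypothetical, while SSH_USER and DEV_SERVER_IP are the CI/CD variables used by the deploy job. It relies on ssh returning the exit status of the remote command.

```shell
# Run alembic inside the service container on the DEV server.
# $1 = service name. The container name "orchestrator-$1" is an
# illustrative assumption; adjust it to the real compose service name.
run_migration() {
    local container="orchestrator-$1" rc
    # ssh exits with the status of the remote command, so a failed
    # alembic run surfaces here as a non-zero code
    ssh "${SSH_USER}@${DEV_SERVER_IP}" \
        "docker exec ${container} alembic upgrade head"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "migration failed for $1 (exit $rc)" >&2
    fi
    return "$rc"
}
```

Because the function returns alembic's exit code unchanged, calling it as the last command of the job is enough to fail the pipeline on a broken migration.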


Service Health Endpoints

Every CWIQ service exposes a /api/health endpoint. The deploy and verify jobs use these endpoints to confirm a successful deployment.

| Service | Internal Health Check | External Health Check |
| --- | --- | --- |
| server | http://localhost:8000/api/health | https://orchestrator.dev.cwiq.io/api/health |
| iam-api | http://localhost:8004/api/health | https://orchestrator.dev.cwiq.io/iam/health |
| audit-api | http://localhost:8007/api/health | https://orchestrator.dev.cwiq.io/audit/health |
| ai-catalogue-api | http://localhost:8006/api/health | https://orchestrator.dev.cwiq.io/catalogue/health |
| monitoring-api | http://localhost:8008/api/health | https://orchestrator.dev.cwiq.io/monitoring/health |
| notification-api | http://localhost:8009/api/health | https://orchestrator.dev.cwiq.io/notification/health |
| runner-api | http://localhost:8003/api/health | https://orchestrator.dev.cwiq.io/runner/health |

The server service also exposes an aggregated health endpoint that fans out to all microservices:

curl -s "https://orchestrator.dev.cwiq.io/api/health?detailed=true" | python3 -m json.tool
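
When the aggregated endpoint is itself unavailable, the external endpoints from the table above can be probed one by one. The following is a minimal sketch under stated assumptions: the function name and the HEALTH_PATHS variable are hypothetical, and the paths are copied from the External Health Check column.

```shell
# External health paths from the table above, relative to the base URL.
HEALTH_PATHS="api/health iam/health audit/health catalogue/health monitoring/health notification/health runner/health"

# Probe every external endpoint and print one "path status" line each.
# $1 = base URL (defaults to the DEV hostname used on this page).
# Returns 0 only if every endpoint answered HTTP 200.
check_all_external() {
    local base="${1:-https://orchestrator.dev.cwiq.io}" failed=0 endpoint code
    for endpoint in $HEALTH_PATHS; do
        code=$(curl -s -o /dev/null -w '%{http_code}' "$base/$endpoint")
        printf '%-22s %s\n' "$endpoint" "$code"
        [ "$code" = "200" ] || failed=1
    done
    return "$failed"
}
```

Every endpoint is checked even after a failure, so a single run shows which services are unhealthy.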