Deploy Patterns¶
How CWIQ CI/CD pipelines deploy services to the DEV environment: the SSH-based deployment flow, rollback mechanism, Temporal worker restart, and the three ways to trigger a deployment.
Scope of CI/CD Deployments¶
CI/CD deploys to DEV only
GitLab CI/CD pipelines deploy exclusively to the DEV environment. All other environments — Demo, Staging, and Production — are deployed through Ansible playbooks run from the ansible server (ansible-shared-cwiq-io). Never attempt to configure a pipeline to deploy directly to Demo, Staging, or Production.
| Environment | Deployment Method | Trigger |
|---|---|---|
| DEV | GitLab CI/CD (deploy-dev job) | Automatic on main push |
| Demo | Ansible (deploy-orchestrator.yml) | Manual (ansible server) |
| Staging | Ansible (deploy-orchestrator.yml) | Manual (ansible server) |
| Production | Ansible (deploy-orchestrator.yml) | Manual (ansible server) |
DEV Deployment Flow¶
The deploy-dev job executes the following steps in order:
flowchart TD
A[1. Read IMAGE_TAG from build.env] --> B[2. SSH to DEV server<br/>using SSH_PRIVATE_KEY + SSH_USER]
B --> C[3. Pull new Docker image<br/>from Nexus port 8444]
C --> D[4. Save current IMAGE_TAG<br/>to rollback file]
D --> E[5. Write .env file<br/>with environment variables]
E --> F[6. docker compose up -d<br/>with new image]
F --> G[7. Health check<br/>30 retries x 5s delay]
G --> H{HTTP 200?}
H -- Yes --> I[Deploy succeeded]
H -- No --> J[8. Rollback to previous image<br/>docker compose up -d with saved tag]
J --> K[Pipeline fails]
Step-by-Step¶
1. Read IMAGE_TAG — The build.env dotenv artifact from the build stage is injected automatically. $IMAGE_TAG is available as an environment variable (e.g., main-a1b2c3d).
2. SSH to DEV server — The runner pod SSHs to $DEV_SERVER_IP (the VPC private IP from the group-level CI/CD variable) as $SSH_USER using the $SSH_PRIVATE_KEY secret variable. Tailscale is not used because runner pods do not have Tailscale access.
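As a rough illustration of step 2, the sketch below wraps the SSH call in a helper. It assumes that $SSH_PRIVATE_KEY holds the key material itself rather than a file path; the helper name run_on_dev and the chosen ssh options are hypothetical and not taken from the actual job definition.

```shell
# Hypothetical helper: run a command on the DEV server over SSH.
# Assumes SSH_PRIVATE_KEY contains the private key text, and that
# SSH_USER and DEV_SERVER_IP are provided as CI/CD variables.
run_on_dev() {
  local keyfile status
  keyfile="$(mktemp)" || return 1
  chmod 600 "$keyfile"
  printf '%s\n' "$SSH_PRIVATE_KEY" > "$keyfile"

  # BatchMode prevents ssh from waiting on a password prompt in CI.
  # accept-new (an assumption) trusts the host key on first contact only.
  ssh -i "$keyfile" \
      -o BatchMode=yes \
      -o StrictHostKeyChecking=accept-new \
      "${SSH_USER}@${DEV_SERVER_IP}" "$@"
  status=$?

  # Remove the key from the runner's filesystem regardless of the outcome.
  rm -f "$keyfile"
  return "$status"
}
```

A call such as `run_on_dev docker ps` would then execute `docker ps` on the DEV server and return its exit code to the job.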
3. Pull new image — On the DEV server, the job runs:
Images are pulled from port 8444 (the Nexus Docker group repository, which includes the proxy cache).
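A pull of this kind might look roughly like the sketch below. Only the port (8444) and the $IMAGE_TAG variable come from this page; the registry hostname, the `registry/service:tag` naming scheme, and the function names are placeholders chosen for illustration.

```shell
# Hypothetical registry host; only the port 8444 is taken from the text.
NEXUS_REGISTRY="${NEXUS_REGISTRY:-nexus.example.internal:8444}"

# Build the full image reference for a service and tag.
# The registry/service:tag layout is an assumption.
image_ref() {
  local service="$1" tag="$2"
  printf '%s/%s:%s\n' "$NEXUS_REGISTRY" "$service" "$tag"
}

# Pull the image for the tag produced by the build stage.
pull_image() {
  local service="$1"
  if [ -z "${IMAGE_TAG:-}" ]; then
    echo "IMAGE_TAG is not set" >&2
    return 1
  fi
  docker pull "$(image_ref "$service" "$IMAGE_TAG")"
}
```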
4. Save rollback state — The current running image tag is saved to ~/.last-deploy-{service} before the new image is started. This is used by the rollback step.
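The rollback file can be pictured as a one-line text file holding the previous tag. The sketch below is an assumption about how it could be written and read back: the function names and the ROLLBACK_DIR override are hypothetical, and the way the current tag is discovered is only suggested in a comment.

```shell
# Directory holding the rollback files. Defaults to the deploy user's home,
# matching the ~/.last-deploy-{service} path described above.
ROLLBACK_DIR="${ROLLBACK_DIR:-$HOME}"

# Record the tag that is currently running so a failed deploy can restore it.
# The caller supplies the tag; one possible way to obtain it (an assumption):
#   docker inspect --format '{{.Config.Image}}' "$container" | sed 's/.*://'
save_rollback_tag() {
  local service="$1" current_tag="$2"
  printf '%s\n' "$current_tag" > "${ROLLBACK_DIR}/.last-deploy-${service}"
}

# Print the saved tag, or return non-zero if nothing was recorded.
read_rollback_tag() {
  local file="${ROLLBACK_DIR}/.last-deploy-$1"
  [ -s "$file" ] && cat "$file"
}
```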
5. Write .env file: The job writes the service's environment file to its expected path on the DEV server (typically /data/cwiq/{service}/.env). At this step, the environment variable values are fetched from Vault, authenticating through Vault's JWT auth with the job's CI_JOB_JWT_V2 token.
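How the file itself is written is not shown here. One cautious approach, sketched below under that assumption, is to write to a temporary file with owner-only permissions and then move it into place, so the service never reads a half-written file. The function name write_env_file is hypothetical, and fetching the values from Vault is left out of the sketch.

```shell
# Write KEY=VALUE lines from stdin to an env file readable only by its owner.
# The content goes to a temporary file first and is then moved into place.
write_env_file() {
  local target="$1" tmp
  tmp="$(mktemp "${target}.XXXXXX")" || return 1
  chmod 600 "$tmp"
  if cat > "$tmp"; then
    mv "$tmp" "$target"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Example with placeholder values (real values would come from Vault):
#   printf 'LOG_LEVEL=info\n' | write_env_file /data/cwiq/server/.env
```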
6. docker compose up -d: The compose file is updated with the new IMAGE_TAG and the service is restarted. Docker Compose stops the old container and recreates it from the new image, so expect a brief interruption rather than a rolling update.
7. Health check — The job polls the service's health endpoint at http://localhost:{port}/api/health with up to 30 retries at 5-second intervals (150 seconds total). A 200 response marks the deployment as successful.
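The polling logic in step 7 can be sketched as a small retry loop. The defaults below mirror the numbers in the text (30 attempts, 5 seconds apart); the function name and the exact curl flags are assumptions rather than a copy of the job script.

```shell
# Poll a health endpoint until it returns HTTP 200 or the retries run out.
# Usage: wait_for_health URL [RETRIES] [DELAY_SECONDS]
wait_for_health() {
  local url="$1" retries="${2:-30}" delay="${3:-5}"
  local attempt=1 code=""
  while [ "$attempt" -le "$retries" ]; do
    # Print only the status code; treat a failed connection as 000.
    code="$(curl --silent --output /dev/null \
                 --write-out '%{http_code}' "$url")" || code="000"
    if [ "$code" = "200" ]; then
      echo "healthy after ${attempt} attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "health check failed after ${retries} attempts (last status: ${code})" >&2
  return 1
}
```

For the main server this would be called as `wait_for_health http://localhost:8000/api/health`, and a non-zero return would hand control to the rollback step.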
8. Rollback (on failure) — If the health check fails after all retries, the job reads the saved rollback tag from ~/.last-deploy-{service} and redeploys the previous image. The pipeline job exits with a non-zero code, marking the pipeline as failed and preventing the migrate and verify stages from running.
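A rollback along these lines could be sketched as follows. It assumes the compose file references the image tag through an ${IMAGE_TAG} variable, so that setting IMAGE_TAG selects the version; that detail, the compose directory argument, the ROLLBACK_DIR override, and the function name are all illustrative assumptions.

```shell
# Redeploy the previously recorded image tag for a service.
# Usage: rollback_service SERVICE COMPOSE_DIR
rollback_service() {
  local service="$1" compose_dir="$2"
  # ROLLBACK_DIR is a hypothetical override; the default matches
  # the ~/.last-deploy-{service} path described above.
  local file="${ROLLBACK_DIR:-$HOME}/.last-deploy-${service}"
  local previous_tag

  if [ ! -s "$file" ]; then
    echo "no rollback tag recorded for ${service}" >&2
    return 1
  fi
  previous_tag="$(cat "$file")"

  echo "rolling back ${service} to ${previous_tag}" >&2
  # Run in a subshell so the directory change does not leak to the caller.
  ( cd "$compose_dir" && IMAGE_TAG="$previous_tag" docker compose up -d )
}

# The job still has to fail after a rollback, for example:
#   if ! curl --silent --fail --output /dev/null "$HEALTH_URL"; then
#     rollback_service server /data/cwiq/server
#     exit 1
#   fi
```

Exiting non-zero after the rollback is what keeps the migrate and verify stages from running against the restored image.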
Temporal Worker Restart¶
Some services — notably the main server — run a Temporal worker as a host systemd service (cwiq-agent-orchestrator.service) in addition to their Docker container. The deploy job restarts this service after the container is up:
The Temporal worker runs as the cwiq-agent-runner user (group cwiq-agents) and uses the virtual environment at /opt/cwiq/orchestrator-venv. Its environment is loaded from /etc/cwiq/cwiq-agent-orchestrator.env.
Services that do not have a Temporal worker (most microservices) skip this step.
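For services that do have a worker, the restart might be sketched as below. The unit name comes from this page; the use of sudo, the short settle delay, and the function name are assumptions about the host setup.

```shell
# Unit name as given above.
WORKER_UNIT="cwiq-agent-orchestrator.service"

# Restart the Temporal worker and confirm that systemd reports it active.
# Using sudo is an assumption about how the deploy user gains privileges.
restart_temporal_worker() {
  sudo systemctl restart "$WORKER_UNIT" || return 1
  # Give the unit a moment to start before checking its state.
  sleep "${WORKER_SETTLE_SECONDS:-3}"
  if ! systemctl is-active --quiet "$WORKER_UNIT"; then
    echo "${WORKER_UNIT} is not active after restart" >&2
    return 1
  fi
}
```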
Verification Stage¶
After deploy-dev and migrate-dev complete, the verify-dev job performs a final external health check from inside the runner pod:
This differs from the in-deployment health check (which uses localhost) by going through the full external path — Nginx reverse proxy, TLS termination, and the application layer. A failure here indicates a problem with the proxy configuration, certificate, or network routing rather than the application itself.
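An external check of this kind could be as small as the sketch below, which relies on curl's --fail flag to turn HTTP error statuses into a non-zero exit code. The function name and timeout are assumptions; the URL in the usage note is the server's external endpoint from the health endpoint table.

```shell
# External check through the proxy and TLS.
# --fail makes curl exit non-zero on HTTP error statuses (400 and above).
verify_external() {
  local url="$1"
  if curl --silent --show-error --fail --max-time 10 \
          --output /dev/null "$url"; then
    echo "verify ok: ${url}"
  else
    echo "verify failed: ${url}" >&2
    return 1
  fi
}
```

For example, `verify_external https://orchestrator.dev.cwiq.io/api/health` would exercise the full external path for the main server.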
Pipeline Trigger Patterns¶
Normal Push to main (Full Pipeline)¶
The default behaviour. Every push to main runs the complete pipeline automatically, from the build stage through deploy-dev, migrate-dev, and verify-dev.
Deploy-Only (DEPLOY_ONLY=true)¶
Skips all stages except deploy-dev and verify. Uses the most recent image tag from the last successful build. Use this after a config change, secret rotation, or infrastructure update when no code has changed.
TOKEN="your-personal-access-token"
curl -s -X POST "https://gitlab.shared.cwiq.io/api/v4/projects/5/pipeline" \
  --header "PRIVATE-TOKEN: $TOKEN" \
  -F "ref=main" \
  -F "variables[][key]=DEPLOY_ONLY" \
  -F "variables[][value]=true"
Manual Deploy (MANUAL_DEPLOY=true)¶
Runs the full pipeline but makes the deploy-dev job manual (requires a click in the GitLab UI to proceed). Useful when you want to review scan results before committing to a deployment.
curl -s -X POST "https://gitlab.shared.cwiq.io/api/v4/projects/5/pipeline" \
  --header "PRIVATE-TOKEN: $TOKEN" \
  -F "ref=main" \
  -F "variables[][key]=MANUAL_DEPLOY" \
  -F "variables[][value]=true"
Database Migrations (migrate-dev)¶
Services with a PostgreSQL database run alembic upgrade head over SSH after the container deployment succeeds. The migration job:
- SSHs to the DEV server
- Executes alembic upgrade head inside the running container via docker exec
- Checks the exit code; a non-zero exit fails the pipeline
Migrations run after the new container is up but before the verify stage. If a migration fails, the verify stage does not run and the team is alerted to investigate before the broken migration state causes problems.
Services without a database (workers, agent, CLI) do not define a migrate-dev job and skip this stage automatically.
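For services that do run migrations, the server-side part of the job can be sketched as below. The container name is passed in as a parameter because the actual naming is not stated here, and the function name is hypothetical.

```shell
# Run Alembic migrations inside a running container and propagate the result.
# Usage: run_migrations CONTAINER_NAME
run_migrations() {
  local container="$1"
  if docker exec "$container" alembic upgrade head; then
    echo "migrations applied in ${container}"
  else
    # Capture the exit code of docker exec so the job fails with it.
    local status=$?
    echo "alembic upgrade head failed in ${container} (exit ${status})" >&2
    return "$status"
  fi
}
```

Returning the original exit code is what lets the pipeline fail the migrate-dev job and skip the verify stage.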
Service Health Endpoints¶
Every CWIQ service exposes a /api/health endpoint. The deploy and verify jobs use these endpoints to confirm a successful deployment.
| Service | Internal Health Check | External Health Check |
|---|---|---|
| server | http://localhost:8000/api/health | https://orchestrator.dev.cwiq.io/api/health |
| iam-api | http://localhost:8004/api/health | https://orchestrator.dev.cwiq.io/iam/health |
| audit-api | http://localhost:8007/api/health | https://orchestrator.dev.cwiq.io/audit/health |
| ai-catalogue-api | http://localhost:8006/api/health | https://orchestrator.dev.cwiq.io/catalogue/health |
| monitoring-api | http://localhost:8008/api/health | https://orchestrator.dev.cwiq.io/monitoring/health |
| notification-api | http://localhost:8009/api/health | https://orchestrator.dev.cwiq.io/notification/health |
| runner-api | http://localhost:8003/api/health | https://orchestrator.dev.cwiq.io/runner/health |
The server service also exposes an aggregated health endpoint that fans out to all microservices:
Related Documentation¶
- Branch Rules & Workflow: When deploy jobs run and how to trigger them manually
- Pipeline Stages: The full stage sequence including migrate and verify
- Kaniko Docker Builds: How images are built and IMAGE_TAG is produced
- Runners (EKS K8s): Why VPC private IPs are required for SSH from runner pods