
# EKS Cluster

`cwiq-dev-eks-cluster` is the Kubernetes cluster hosting all GitLab CI/CD runners. It replaced the legacy Fleeting/EC2 Autoscaling runners on 2026-02-26 and is now the sole active runner system.


## Cluster Reference

| Attribute | Value |
|---|---|
| Cluster Name | `cwiq-dev-eks-cluster` |
| Account | dev (686123185567) |
| Kubernetes Version | 1.31 |
| Node Autoscaler | Karpenter v1.1.1 |
| Region | us-west-2 |
| kubectl Access | Ansible server only (`ansible-shared-cwiq-io`) |

## Node Groups

### System Node Group (`cwiq-dev-eks-system`)

Fixed node group for the Karpenter controller, CoreDNS, and VPC CNI pods:

| Attribute | Value |
|---|---|
| Instance Type | t3.medium |
| Desired/Min/Max | 1 / 1 / 2 |
| EC2 Name Tag | cwiq-dev-eks-system |

> **Single system node behavior**
>
> With `desired_size = 1`, one Karpenter controller replica will always show `Pending` (the controller deployment runs two replicas with pod anti-affinity, so the second replica has nowhere to schedule); this is expected. Scale to 2 when the team grows and runner concurrency increases.
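If the node group is managed in the `eks-cluster/` Terraform module, scaling it up is a one-value change. A sketch using the standard `aws_eks_node_group` `scaling_config` block; the actual variable layout in the module may differ:

```hcl
# Sketch: raise the desired capacity of cwiq-dev-eks-system from 1 to 2.
# A second system node lets the Pending Karpenter replica schedule.
scaling_config {
  desired_size = 2  # was 1
  min_size     = 1
  max_size     = 2
}
```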

### Karpenter-Managed Runner Nodes

Karpenter dynamically provisions EC2 nodes for pipeline jobs. Instance type selection is defined in the `NodePool` resource.
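A minimal sketch of a Karpenter v1 `NodePool` restricting runner nodes to the instance types used by the runner tiers. The name, limits, and `EC2NodeClass` reference are illustrative; the authoritative definition lives in `eks-karpenter/`:

```yaml
# Illustrative NodePool sketch -- not the deployed configuration.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gitlab-runners   # hypothetical name
spec:
  template:
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["t3.medium", "t3.large", "t3.xlarge"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default    # hypothetical EC2NodeClass name
  limits:
    cpu: "32"            # illustrative cap on total provisioned CPU
```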

> **Karpenter IAM constraint**
>
> Karpenter's `ec2:RunInstances` IAM permission cannot use `aws:RequestTag` conditions. The IAM policy must use resource-level conditions instead. This is a known Karpenter limitation.


## GitLab Runner Tiers

Three runners are registered, selected by pipeline job tags:

| Runner | GitLab ID | Tag | Typical Use | Karpenter Instance |
|---|---|---|---|---|
| k8s-small | 19 | `small` | Most jobs (validate, test, deploy-dev) | t3.medium |
| k8s-medium | 20 | `medium` | UI Kaniko builds (8 GB RAM required) | t3.large |
| k8s-large | 21 | `large` | Executor Nuitka + rpmbuild | t3.xlarge |

> **UI builds require the medium runner**
>
> Kaniko image builds for the React UI require the `medium` tag (t3.large, 8 GB RAM). Using `small` (t3.medium, 4 GB) will OOM and fail the build.
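A tier is selected per job with the `tags:` keyword in `.gitlab-ci.yml`. The job names and scripts below are illustrative, not the project's actual pipeline:

```yaml
# Illustrative jobs -- the tag is what routes the job to a runner tier.
build-ui:
  stage: build
  tags: [medium]   # Kaniko UI build needs t3.large (8 GB RAM)
  script:
    - /kaniko/executor --context . --dockerfile Dockerfile

unit-tests:
  stage: test
  tags: [small]    # default tier for validate/test jobs
  script:
    - npm test
```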


## Pod Networking

> **No Tailscale in EKS pods**
>
> GitLab runner pods use the VPC CNI plugin and receive real VPC IP addresses in subnets 10.1.34.0/24 and 10.1.35.0/24. Pods have no Tailscale connectivity.

Any job that must reach a server (e.g., the deploy-dev SSH to the orchestrator) must use the VPC private IP, not the Tailscale IP or Tailscale hostname.

```yaml
# Correct: deploy-dev job configuration
variables:
  # Set at GitLab group level (group 9); DO NOT define per-project
  DEV_SERVER_IP: "10.1.35.46"  # VPC private IP of orchestrator-dev-cwiq-io
```
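A job then consumes the group-level variable directly. This sketch is hypothetical (job name, user, and remote command are assumptions); only `DEV_SERVER_IP` comes from the configuration above:

```yaml
# Illustrative deploy job -- reaches the orchestrator over its VPC IP.
deploy-dev:
  stage: deploy
  tags: [small]
  script:
    - ssh -o StrictHostKeyChecking=no deploy@"$DEV_SERVER_IP" "echo connected"
```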

Pods can reach shared-services resources over VPC peering:

- **Nexus:** `nexus.shared.cwiq.io` → resolves to a VPC private IP via the private Route53 zone
- **SonarQube:** `sonarqube.shared.cwiq.io` → resolves to 10.0.10.8
- **Vault:** `vault.shared.cwiq.io` → Vault API on port 8200
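When debugging peering or DNS from inside a pod, a throwaway job can exercise the endpoints above. This job is hypothetical, and `getent`/`curl` must exist in the job image:

```yaml
# Hypothetical connectivity-check job for shared-services reachability.
check-shared-services:
  tags: [small]
  script:
    - getent hosts nexus.shared.cwiq.io sonarqube.shared.cwiq.io
    - curl -fsS --max-time 5 https://vault.shared.cwiq.io:8200/v1/sys/health
```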


## Runner Security Context

> **runAsNonRoot default**
>
> GitLab Runner on Kubernetes defaults to `runAsNonRoot: true`. This causes most container jobs to fail unless the runner TOML config overrides it:

```toml
[runners.kubernetes]
  [runners.kubernetes.pod_security_context]
    run_as_non_root = false
```

This is already configured on all three CWIQ runners.


## Legacy Runners (PAUSED)

The legacy Fleeting/EC2 Autoscaling runners are paused, and the Runner Manager EC2 is stopped:

| Runner | ID | Status |
|---|---|---|
| dev-small | 16 | Paused (2026-02-27) |
| dev-medium | 17 | Paused (2026-02-27) |
| dev-large | 18 | Paused (2026-02-27) |
| Runner Manager EC2 | i-0af3f2d4bf8a4f1d2 | Stopped (can restart if needed) |

Remaining cleanup: terminate the Runner Manager EC2 and delete runners 16-18 from GitLab.


## kubectl Access

`kubectl` is configured only on the Ansible server. Do not attempt to use `kubectl` from the dev server or a local workstation without first configuring kubeconfig.

```shell
ssh ansible@ansible-shared-cwiq-io
ansible-helper
kubectl get nodes
kubectl get pods -n gitlab-runner
```

## Terraform Location

```
terraform-plan/organization/environments/dev/
├── eks-cluster/          ← Cluster, node group, Karpenter
├── eks-karpenter/        ← Karpenter NodePool and NodeClass
└── gitlab-runners-k8s/   ← Runner Helm chart deployment
```