
# EKS Cluster

`cwiq-dev-eks-cluster` is the Kubernetes cluster hosting all GitLab CI/CD runners. It replaced the legacy Fleeting/EC2 Autoscaling runners on 2026-02-26 and is now the sole active runner system.


## Cluster Reference

| Attribute | Value |
|---|---|
| Cluster Name | `cwiq-dev-eks-cluster` |
| Account | dev (686123185567) |
| Kubernetes Version | 1.31 |
| Node Autoscaler | Karpenter v1.1.1 |
| Region | us-west-2 |
| kubectl Access | Ansible server only (`ansible-shared-cwiq-io`) |

## Node Groups

### System Node Group (`cwiq-dev-eks-system`)

Fixed node group for the Karpenter controller, CoreDNS, and VPC CNI pods:

| Attribute | Value |
|---|---|
| Instance Type | t3.medium |
| Desired/Min/Max | 1 / 1 / 2 |
| EC2 Name Tag | cwiq-dev-eks-system |

> **Single system node behavior**
>
> With `desired_size = 1`, one Karpenter controller replica will always show `Pending` (the controller deployment runs two replicas with pod anti-affinity, so the second replica has nowhere to schedule); this is expected. Scale to 2 when the team grows and runner concurrency increases.
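If the node group is managed in the `eks-cluster/` Terraform module, scaling it up is a one-value change. A sketch using the standard `aws_eks_node_group` `scaling_config` block; the actual variable layout in the module may differ:

```hcl
# Sketch: raise the desired capacity of cwiq-dev-eks-system from 1 to 2.
# A second system node lets the Pending Karpenter replica schedule.
scaling_config {
  desired_size = 2  # was 1
  min_size     = 1
  max_size     = 2
}
```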

### Karpenter-Managed Runner Nodes

Karpenter dynamically provisions EC2 nodes for pipeline jobs. Instance type selection is defined in the `NodePool` resource.
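A minimal sketch of a Karpenter v1 `NodePool` restricting runner nodes to the instance types used by the runner tiers. The name, limits, and `EC2NodeClass` reference are illustrative; the authoritative definition lives in `eks-karpenter/`:

```yaml
# Illustrative NodePool sketch -- not the deployed configuration.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gitlab-runners   # hypothetical name
spec:
  template:
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["t3.medium", "t3.large", "t3.xlarge"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default    # hypothetical EC2NodeClass name
  limits:
    cpu: "32"            # illustrative cap on total provisioned CPU
```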

> **Karpenter IAM constraint**
>
> Karpenter's `ec2:RunInstances` IAM permission cannot use `aws:RequestTag` conditions. The IAM policy must use resource-level conditions instead. This is a known Karpenter limitation.


## GitLab Runner Tiers

Three runners are registered, selected by pipeline job tags:

| Runner | GitLab ID | Tag | Typical Use | Karpenter Instance |
|---|---|---|---|---|
| k8s-small | 19 | `small` | Most jobs (validate, test, deploy-dev) | t3.medium |
| k8s-medium | 20 | `medium` | UI Kaniko builds (8 GB RAM required) | t3.large |
| k8s-large | 21 | `large` | Executor Nuitka + rpmbuild | t3.xlarge |

> **UI builds require the medium runner**
>
> Kaniko image builds for the React UI require the `medium` tag (t3.large, 8 GB RAM). Using `small` (t3.medium, 4 GB) will OOM and fail the build.
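A tier is selected per job with the `tags:` keyword in `.gitlab-ci.yml`. The job names and scripts below are illustrative, not the project's actual pipeline:

```yaml
# Illustrative jobs -- the tag is what routes the job to a runner tier.
build-ui:
  stage: build
  tags: [medium]   # Kaniko UI build needs t3.large (8 GB RAM)
  script:
    - /kaniko/executor --context . --dockerfile Dockerfile

unit-tests:
  stage: test
  tags: [small]    # default tier for validate/test jobs
  script:
    - npm test
```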


## Pod Networking

> **No Tailscale in EKS pods**
>
> GitLab runner pods use the VPC CNI plugin and receive real VPC IP addresses in subnets 10.1.34.0/24 and 10.1.35.0/24. Pods have no Tailscale connectivity.

Any job that must reach a server (e.g., the deploy-dev SSH to the orchestrator) must use the VPC private IP, not the Tailscale IP or Tailscale hostname.

```yaml
# Correct: deploy-dev job configuration
variables:
  # Set at GitLab group level (group 9); DO NOT define per-project
  DEV_SERVER_IP: "10.1.35.46"  # VPC private IP of orchestrator-dev-cwiq-io
```
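A job then consumes the group-level variable directly. This sketch is hypothetical (job name, user, and remote command are assumptions); only `DEV_SERVER_IP` comes from the configuration above:

```yaml
# Illustrative deploy job -- reaches the orchestrator over its VPC IP.
deploy-dev:
  stage: deploy
  tags: [small]
  script:
    - ssh -o StrictHostKeyChecking=no deploy@"$DEV_SERVER_IP" "echo connected"
```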

Pods can reach shared-services resources over VPC peering:

- **Nexus:** `nexus.shared.cwiq.io` → resolves to a VPC private IP via the private Route53 zone
- **SonarQube:** `sonarqube.shared.cwiq.io` → resolves to 10.0.10.8
- **Vault:** `vault.shared.cwiq.io` → Vault API on port 8200
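When debugging peering or DNS from inside a pod, a throwaway job can exercise the endpoints above. This job is hypothetical, and `getent`/`curl` must exist in the job image:

```yaml
# Hypothetical connectivity-check job for shared-services reachability.
check-shared-services:
  tags: [small]
  script:
    - getent hosts nexus.shared.cwiq.io sonarqube.shared.cwiq.io
    - curl -fsS --max-time 5 https://vault.shared.cwiq.io:8200/v1/sys/health
```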


## Runner Security Context

> **runAsNonRoot default**
>
> GitLab Runner on Kubernetes defaults to `runAsNonRoot: true`. This causes most container jobs to fail unless the runner TOML config overrides it:

```toml
[runners.kubernetes]
  [runners.kubernetes.pod_security_context]
    run_as_non_root = false
```

This is already configured on all three CWIQ runners.


## Legacy Runners (PAUSED)

The legacy Fleeting/EC2 Autoscaling runners are paused, and the Runner Manager EC2 is stopped:

| Runner | ID | Status |
|---|---|---|
| dev-small | 16 | Paused (2026-02-27) |
| dev-medium | 17 | Paused (2026-02-27) |
| dev-large | 18 | Paused (2026-02-27) |
| Runner Manager EC2 | i-0af3f2d4bf8a4f1d2 | Stopped (can restart if needed) |

Remaining cleanup: terminate the Runner Manager EC2 and delete runners 16-18 from GitLab.


## kubectl Access

`kubectl` is configured only on the Ansible server. Do not attempt to use `kubectl` from the dev server or a local workstation without first configuring kubeconfig.

```shell
ssh ansible@ansible-shared-cwiq-io
ansible-helper
kubectl get nodes
kubectl get pods -n gitlab-runner
```

## Terraform Location

```
terraform-plan/organization/environments/dev/
├── eks-cluster/          ← Cluster, node group, Karpenter
├── eks-karpenter/        ← Karpenter NodePool and NodeClass
└── gitlab-runners-k8s/   ← Runner Helm chart deployment
```