GKE GitLab Runner Platform
This section explains the DevOps team's Google Kubernetes Engine (GKE) GitLab Runner platform configuration. The gitlab-runner-infrastructure repository contains the Terraform configuration for this deployment.
Why host GitLab runners on GKE?
The DevOps team has a pool of shared runner VMs which run CI/CD jobs for projects within the uis/devops GitLab group by default. So why deploy more runners on GKE? One reason is authentication. The shared VM runners are well suited to jobs which do not need to authenticate to other services, for example building container images or running unit tests. However, once a job needs to authenticate to a third-party service (such as Google Cloud resources), you can end up storing credentials or access tokens for powerful service accounts in plain-text CI/CD variables and relying on GitLab permissions to restrict who can see those variables. This is a potential security issue, and managing credentials and tokens in this way can also be very cumbersome.
By deploying per-product runners to a Workload Identity-enabled GKE cluster, it's possible to give each product's runner a dedicated Kubernetes service account identity. This identity can then be granted IAM permissions to access Google Cloud resources via Workload Identity. Then, when CI jobs run for a particular product, the pod that the job is executed in authenticates as the configured service account by default, removing the need to provide service account keys or API access tokens via CI/CD variables. The Authentication and impersonation section below explains this in more detail.
Authentication and impersonation
How it works
Each runner is deployed to a dedicated Kubernetes namespace with a dedicated Kubernetes service account. The namespace name is usually the product's slug (e.g. gaobase) and the service account is always named gke-ci-run.
When a product runner executes a CI job, it does so in a pod running in the product's namespace and in the context of the product's gke-ci-run service account. Google IAM roles can therefore be assigned to the service account via Workload Identity, allowing CI jobs to access Google services which require authentication.
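As a quick way to see this in practice, the following sketch shows a CI job that prints the identity which gcloud picks up by default inside the job pod. The job name and the google/cloud-sdk:slim image are assumptions chosen for illustration, not part of the platform configuration, and the job is assumed to be picked up by the product's GKE-hosted runner.

```yaml
# Hypothetical job: prints the credentials that gcloud discovers by default.
# When run by a GKE-hosted product runner, the active account is expected to
# reflect the identity provided to the pod via Workload Identity.
show-identity:
  image: google/cloud-sdk:slim
  script:
    - gcloud auth list
```

No service account key or token is passed to the job; the identity comes from the pod the job runs in.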
Service account impersonation
While it's possible to assign roles directly to the gke-ci-run service account using Workload Identity, this is not recommended. It often leads to a single service account becoming very powerful, accumulating multiple IAM roles to cater for the various CI jobs that a product might need to run.
Instead, the recommendation is to create individual Google IAM service accounts for each task or job as required. Each of these service accounts is assigned only the roles needed to perform its specific task. The gke-ci-run Kubernetes service account is then granted permission to impersonate the relevant Google IAM service accounts in the CI jobs which need them.
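The impersonation pattern described above might look like the following in a CI job. This is a minimal sketch under stated assumptions: the job name, the service account email, the bucket name and the google/cloud-sdk:slim image are all hypothetical placeholders. It also assumes that the gke-ci-run identity has already been granted permission to impersonate the target service account (typically via the Service Account Token Creator role on that account).

```yaml
# Hypothetical job: lists a bucket while impersonating a narrowly scoped
# Google IAM service account. All names below are placeholders.
list-bucket:
  image: google/cloud-sdk:slim
  variables:
    # gcloud reads this environment variable as its
    # auth/impersonate_service_account property, so every gcloud command in
    # the job runs as the impersonated service account.
    CLOUDSDK_AUTH_IMPERSONATE_SERVICE_ACCOUNT: "bucket-reader@example-project.iam.gserviceaccount.com"
  script:
    - gcloud storage ls gs://example-bucket
```

Because the target service account only holds the roles needed for this one task, a mistake or compromise in the job is limited to what that account can do.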
Using docker in CI jobs
The GKE-hosted runners are configured in privileged mode, which allows the use of docker-in-docker (dind) as a service in CI jobs. The dind service provides access to the docker command in CI jobs, which is useful for tasks such as building container images and running docker compose commands.
To enable the dind service for a particular CI job you must define the image, services and variables as follows.
test-job:
  image: docker:24-git
  services:
    - docker:24-dind
  variables:
    DOCKER_HOST: tcp://docker:2376
    DOCKER_TLS_CERTDIR: "/certs"
    DOCKER_TLS_VERIFY: 1
    DOCKER_CERT_PATH: "/certs/client"
  before_script:
    - until docker info > /dev/null 2>&1; do sleep 1; done
The before_script is optional but recommended for most use cases. The docker service container takes some time to prepare its certificates, and waiting until docker info succeeds avoids the known issue with docker initialization.
Note
These variables are specific to the Kubernetes GitLab runner executor and may not be required for other runner types.
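If several jobs need docker, the dind settings shown earlier can be collected once and reused. The sketch below assumes a hidden job named .dind and a job named build-image, both hypothetical, and uses GitLab's extends keyword to share the configuration. The image tag example-image is also a placeholder.

```yaml
# Hypothetical hidden job holding the dind settings so that several jobs can
# share them via "extends".
.dind:
  image: docker:24-git
  services:
    - docker:24-dind
  variables:
    DOCKER_HOST: tcp://docker:2376
    DOCKER_TLS_CERTDIR: "/certs"
    DOCKER_TLS_VERIFY: 1
    DOCKER_CERT_PATH: "/certs/client"
  before_script:
    # Wait for the dind service to finish preparing its TLS certificates.
    - until docker info > /dev/null 2>&1; do sleep 1; done

# Hypothetical job: builds an image from the Dockerfile in the repository
# root and tags it with the short commit SHA provided by GitLab.
build-image:
  extends: .dind
  script:
    - docker build -t example-image:$CI_COMMIT_SHORT_SHA .
```

Keeping the settings in one hidden job means a change to the docker version or TLS variables only has to be made in one place.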