GKE GitLab Runner Platform

This section explains the DevOps team's Google Kubernetes Engine (GKE) GitLab Runner platform configuration. The gitlab-runner-infrastructure repository contains the Terraform configuration for this deployment.

Why host GitLab runners on GKE?

The DevOps team has a pool of shared runner VMs that, by default, run CI/CD jobs for projects within the uis/devops GitLab group. So why deploy more runners on GKE? One reason is authentication. The shared VM runners work well for jobs that do not need to authenticate to other services, for example building container images or running unit tests. However, once a job needs to authenticate to a third-party service (such as Google Cloud), you can end up storing credentials or access tokens for powerful service accounts in plain-text CI/CD variables and relying on GitLab permissions to restrict who can see those variables. This is a potential security issue, and managing credentials and tokens this way is also cumbersome.

By deploying per-product runners to a Workload Identity-enabled GKE cluster, it's possible to give each product's runner a dedicated Kubernetes service account identity. This identity can then be granted IAM permissions to access Google Cloud resources via Workload Identity. Then, when CI jobs run for a particular product, the pod that the job is executed in authenticates as the configured service account by default, removing the need to provide service account keys or API access tokens via CI/CD variables. The Authentication and impersonation section below explains this in more detail.

Authentication and impersonation

How it works

Each runner is deployed to a dedicated Kubernetes namespace with a dedicated Kubernetes service account. The namespace name is usually the product's slug (e.g. gaobase) and the service account is always named gke-ci-run.

When a product runner executes a CI job, it does so in a pod running in the product's namespace and in the context of the product's gke-ci-run service account. Google IAM roles can therefore be assigned to the service account via Workload Identity, giving CI jobs access to Google services that require authentication.
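As a rough illustration of what this means for a job definition, the sketch below shows a job that calls Google Cloud without any credentials in CI/CD variables. It assumes the job is picked up by the product's GKE-hosted runner; the job name, the google/cloud-sdk image choice and the bucket name example-bucket are placeholders rather than part of the platform configuration.

```yaml
# Hypothetical job: the name, image and bucket are illustrative only.
# It must run on the product's GKE-hosted runner, not on a shared VM runner.
show-identity:
  image: google/cloud-sdk:slim
  script:
    # Should list the identity provided to the pod via Workload Identity
    - gcloud auth list
    # No key file or token is supplied; gcloud uses the pod's ambient credentials
    - gcloud storage ls gs://example-bucket
```

The second command only succeeds if the identity has been granted a suitable role on the bucket.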

Service account impersonation

While it's possible to assign roles directly to the gke-ci-run service account using Workload Identity, this is not recommended. It often leads to a single service account becoming very powerful, because it accumulates every IAM role needed by the various CI jobs that a product might run.

Instead, the recommendation is to create individual Google IAM service accounts for each task/job as required. These service accounts are assigned only the roles required to perform their specific tasks. The gke-ci-run Kubernetes service account is then granted permission to impersonate the required Google IAM service accounts to perform the tasks required in the CI jobs.
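A minimal sketch of a job that uses impersonation is shown below. It assumes a task-specific Google IAM service account already exists and that the runner's identity is allowed to impersonate it (typically via the Service Account Token Creator role on that account). The job name, the variable DEPLOY_SA, the service account email and the bucket name are all hypothetical.

```yaml
# Hypothetical job: DEPLOY_SA and example-bucket are placeholders.
upload-artifact:
  image: google/cloud-sdk:slim
  variables:
    # Task-specific service account holding only the roles this job needs
    DEPLOY_SA: "artifact-uploader@example-project.iam.gserviceaccount.com"
  script:
    # gcloud requests short-lived credentials for DEPLOY_SA using the runner's own identity
    - gcloud storage ls gs://example-bucket --impersonate-service-account="$DEPLOY_SA"
```

Because the email address is an identifier rather than a secret, it can safely live in the job definition.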

Using Docker in CI jobs

The GKE-hosted runners are configured in privileged mode, which allows the use of Docker-in-Docker (dind) as a service in CI jobs. The dind service provides access to the docker command in CI jobs, which is useful for tasks such as building container images and running docker compose commands.

To enable the dind service for a particular CI job, you must define the image, services and variables as follows.

.gitlab-ci.yml

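test-job:
  image: docker:24-git
  services:
    # Docker daemon that the docker CLI in the job container connects to
    - docker:24-dind
  variables:
    # The dind service is reachable as "docker"; 2376 is the TLS port
    DOCKER_HOST: tcp://docker:2376
    # Directory where the dind service generates its TLS certificates
    DOCKER_TLS_CERTDIR: "/certs"
    DOCKER_TLS_VERIFY: 1
    DOCKER_CERT_PATH: "/certs/client"
  before_script:
    # Wait until the Docker daemon is ready to accept connections
    - until docker info > /dev/null 2>&1; do sleep 1; done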

The before_script is optional, but recommended for most use cases because the dind service container takes some time to generate its certificates and start the daemon. Waiting until docker info succeeds avoids the known issue where docker commands run before the daemon has finished initializing.

Note

These variables are specific to the GitLab Runner Kubernetes executor and may not be required for other executor types.
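If several jobs need Docker, one option is to keep the dind settings in a hidden job and reuse them with extends. The sketch below assumes this layout; the names .dind and build-image are hypothetical, and CI_REGISTRY_IMAGE is only populated when the project's container registry is enabled.

```yaml
# Hidden template job holding the dind settings shown above
.dind:
  image: docker:24-git
  services:
    - docker:24-dind
  variables:
    DOCKER_HOST: tcp://docker:2376
    DOCKER_TLS_CERTDIR: "/certs"
    DOCKER_TLS_VERIFY: 1
    DOCKER_CERT_PATH: "/certs/client"
  before_script:
    # Wait until the Docker daemon is ready to accept connections
    - until docker info > /dev/null 2>&1; do sleep 1; done

# Hypothetical job that builds (but does not push) an image
build-image:
  extends: .dind
  script:
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
```

Note that a job which defines its own before_script replaces the one inherited from .dind, so the wait loop would need to be repeated there.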