WIP: Do better at cleaning up Docker images

Daniel Stone requested to merge wip-docker-image-clean into master

This script is a WIP attempt to create something smarter than docker-gc.

Currently, docker-gc runs on a cron, and deletes every Docker volume it sees (not an issue for ccache as it is mounted into the host filesystem), every stopped container it sees, and every image older than 12-24h (depending on runner disk size and usage).

This is too blunt a hammer. We constantly have to re-download common images after they get deleted, and we still have to clean up disks manually when jobs fail, because big jobs can fill up the disk. At the same time, we delete images when we don't need to, because there's plenty of space.

The script in this commit takes a smarter approach, by:

  • monitoring the free disk space every five minutes
  • if the free space falls below an acceptable threshold, pruning the least-recently-used images until the space comes back above the threshold, ignoring Mesa/GStreamer base images as they will very likely be immediately re-pulled
  • only as an emergency measure, deleting the Mesa/GStreamer base images if required, still on an LRU basis
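
The selection logic in the list above can be sketched roughly as follows. This is an illustration rather than the script from this commit: the `Image` record, the `plan_evictions` name and the prefix-based base-image match are all hypothetical, and it assumes that deleting an image frees its full reported size, which is optimistic when layers are shared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    name: str         # image reference (hypothetical field)
    last_used: float  # Unix timestamp of last use (hypothetical field)
    size: int         # bytes assumed to be reclaimed when the image is deleted


def plan_evictions(images, free_bytes, threshold_bytes, protected_prefixes):
    """Return image names to delete, in order, until free space reaches the threshold."""

    def is_protected(image):
        # Assumption: base images are recognised by a name prefix.
        return any(image.name.startswith(p) for p in protected_prefixes)

    # Ordinary images come first and base images last (emergency only);
    # within each group the least-recently-used image goes first.
    ordered = sorted(images, key=lambda i: (is_protected(i), i.last_used))

    victims = []
    for image in ordered:
        if free_bytes >= threshold_bytes:
            break
        victims.append(image.name)
        free_bytes += image.size
    return victims
```

A monitor would call this every five minutes, taking `free_bytes` from something like `shutil.disk_usage(path).free` on the Docker data directory, and then remove the returned images in order.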

However, this script is also completely useless.

Every gitlab-runner container starts the gitlab-runner-helper image, which pulls the Git repository, parses the CI YAML definitions, sets the container up to run in the defined context, then executes the actual work. The images defined in the CI job are pulled locally, but as they are not referred to by any containers, they are all equal on the LRU front with a usage time of zero.

After examining the gitlab-runner code, I tried monitoring the containerd events to see when it started each job and to get details of the image. However, containerd was silent on this, and the gRPC and containerd APIs are really painful to use.

It would probably work to go forward with this script (or something like docuum) while taking the image-pull time as the LRU timestamp and continuing to whitelist the base images. To get actual LRU, we would have to get sideband information about which images were in use at which times, either from gitlab-runner or from the GitLab daemon itself.
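
As a rough sketch of the pull-time fallback, the function below keeps a map of image reference to most recent pull time, fed one JSON event per line. It assumes the events come from `docker events --filter type=image --filter event=pull --format '{{json .}}'` and that each event carries `Type`, `Action`, `Actor.ID` and `time` fields; those field names are my understanding of that output and should be checked against a real runner. `record_pull` is a hypothetical name.

```python
import json


def record_pull(last_used, event_line):
    """Update last_used (image reference -> Unix time) from one JSON event line."""
    try:
        event = json.loads(event_line)
    except json.JSONDecodeError:
        return last_used  # ignore partial or non-JSON lines
    if not isinstance(event, dict):
        return last_used

    # Only image pull events count as a "use" in this approximation.
    if event.get("Type") == "image" and event.get("Action") == "pull":
        ref = (event.get("Actor") or {}).get("ID")
        when = event.get("time")
        if ref and when is not None:
            # Keep the newest pull time seen for this reference.
            last_used[ref] = max(when, last_used.get(ref, 0))
    return last_used
```

This only sees pulls, so an image that is reused without being re-pulled keeps its old timestamp; that is the gap the sideband information would close.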

/cc @anholt @bentiss