Some context
Recently we have been working on migrating our CI infrastructure from CircleCI to Buildkite. As a result, we are manipulating and creating a lot of YAML config files across our various apps, libraries, and tools.
Given that we have many similar and common values and nodes across pipelines, jobs and steps in each config file, we want to avoid repeating ourselves. CI providers offer a couple of techniques to share common actions or setups (like Orbs for CircleCI or Plugins for Buildkite), but there are also cases where we want the DRYing to be more local.
We also aim to nicely isolate some constants in our YAMLs for when we need to use the same value for a parameter (like the Xcode version, Simulator version, Android SDK, Docker plugin, etc) for multiple individual jobs and steps (like test, lint and build steps).
This is where YAML anchors come to the rescue.
Introducing YAML anchors
Anchors are a built-in YAML feature which allows us to give a node an internal name. We can then reference that internal name later, in any place where a node is expected, to get the same outcome as if we copy/pasted that node at that position.
The internal name we use for that anchor is arbitrary, and has no impact on the document’s final content; we can think of it as similar to a variable or a pointer in other languages like Swift or C.
We define an anchor using the &some_name syntax immediately before the YAML node we want that anchor to point to. We can then use the *some_name syntax (called an alias) later in the YAML to reference that anchor as many times as we want.
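As a minimal sketch of that syntax (the keys default_timeout, build_timeout and test_timeout are hypothetical names, chosen only for illustration), the anchor is declared once and the alias can then be repeated wherever a node is expected:

```yaml
# Hypothetical keys, for illustration only.
default_timeout: &timeout 30   # "&timeout" names the scalar node 30
build_timeout: *timeout        # parsed as if we had written: build_timeout: 30
test_timeout: *timeout         # an anchor can be referenced any number of times
```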
Let’s look at a snippet, first without using anchors:
steps:
  - label: "lint"
    plugins:
      - docker#v3.8.0:
          image: "androidsdk/android-30"
    command: "./gradlew lintWordpressVanillaRelease"
  - label: "Test WordPress"
    plugins:
      - docker#v3.8.0:
          image: "androidsdk/android-30"
    command: "./gradlew testWordpressVanillaRelease"
Here we can see that the name of the Docker image, from the image: key, is the same for both steps. We can introduce an anchor on that string node to avoid that repetition:
steps:
  - label: "lint"
    plugins:
      - docker#v3.8.0:
          image: &android-image "androidsdk/android-30"
    command: "./gradlew lintWordpressVanillaRelease"
  - label: "Test WordPress"
    plugins:
      - docker#v3.8.0:
          image: *android-image
    command: "./gradlew testWordpressVanillaRelease"
With this simple trick we have avoided repeating the string by giving the node an internal name (android-image) and reusing that reference later.
You might have already noticed, though, that not only the string but in fact the whole node representing the docker#v3.8.0 plugin is identical in both steps. So why not make the anchor point to the whole node declaring that Docker plugin?
steps:
  - label: "lint"
    plugins:
      - &android-docker-image
        docker#v3.8.0:
          image: "androidsdk/android-30"
    command: "./gradlew lintWordpressVanillaRelease"
  - label: "Test WordPress"
    plugins:
      - *android-docker-image
    command: "./gradlew testWordpressVanillaRelease"
Here, not only have we declared the anchor on the whole docker#v3.8.0: … map node, but we also inserted a line break between the &android-docker-image anchor and the content of the node. This is totally valid, as a newline in that particular position has no effect on how the YAML tree is interpreted; in this case it results in a welcome readability improvement.
Note that this node is a map (aka dictionary) with a single key, docker#v3.8.0, whose associated value is another (sub-)dictionary with just the image: key. An anchor always points to the whole node that follows it, which means that when we use *android-docker-image later in the YAML, this references (and “pastes”) the whole original dictionary node, including all its children.
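To make that concrete, here is a sketch of what a YAML parser should hand over to the CI once the alias in the snippet above is resolved. No anchor or alias remains, and the second step carries a full copy of the plugin dictionary, children included. In other words, it is the same tree as our very first snippet, the one written without anchors:

```yaml
# Expanded equivalent of the snippet above, after alias resolution.
steps:
  - label: "lint"
    plugins:
      - docker#v3.8.0:
          image: "androidsdk/android-30"
    command: "./gradlew lintWordpressVanillaRelease"
  - label: "Test WordPress"
    plugins:
      - docker#v3.8.0:
          image: "androidsdk/android-30"
    command: "./gradlew testWordpressVanillaRelease"
```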
Grouping the constants
To help clarity and maintenance, we like to move all our “node constants” to the top of our file, so that they are easier to find and give us a single place to look at when we consider updating the plugin versions, images, etc.
The trick to do that is to take advantage of the fact that most CIs (at least CircleCI and Buildkite) ignore any top-level keys that they don’t recognize and that don’t have any special meaning to them. We can leverage this to move all our constant, reusable nodes under such an arbitrary key at the top, and organize those constants any way we want there.
# arbitrary top-level key we use to "namespace" our anchors
reusable-nodes:
  - &android-docker-image
    docker#v3.8.0:
      image: "androidsdk/android-30"

# official "steps" key, used by Buildkite to describe our pipeline
steps:
  - label: "lint"
    plugins:
      - *android-docker-image
    command: "./gradlew lintWordpressVanillaRelease"
  - label: "Test WordPress"
    plugins:
      - *android-docker-image
    command: "./gradlew testWordpressVanillaRelease"
As we can see above, this trick also helps us get rid of the slight asymmetry that we previously had in our YAML, where the first node needing that plugin (the one where we declared our anchor) looked a bit different and unbalanced compared to the other places needing that same plugin but only using the *android-docker-image alias.
Better conventions
This non-official, top-level key we have been using so far is merely a way for us to isolate our anchor definitions and avoid the risk of them colliding with any official top-level keys used by Buildkite. In practice, we can use whatever name and substructures we want to organize those anchors there. After all, only the node that follows an &anchor declaration is part of that anchor definition, so the parent nodes in which the anchors are defined have no impact on the anchors themselves.
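For instance, the two layouts below should be interchangeable as far as the anchor is concerned. Both are only sketches: the names my-constants, shared, docker and android are hypothetical, and the two layouts are written as separate YAML documents (split by ---). In both cases &android-docker-image covers only the docker#v3.8.0 dictionary that follows it.

```yaml
# Layout 1: anchor listed in an array under a hypothetical top-level key.
my-constants:
  - &android-docker-image
    docker#v3.8.0:
      image: "androidsdk/android-30"
---
# Layout 2: the same anchor nested deeper, under hypothetical map keys.
shared:
  docker:
    android: &android-docker-image
      docker#v3.8.0:
        image: "androidsdk/android-30"
```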
Having said that, it is always good to have some conventions to keep things nice, consistent and tidy. Here are the ones we have started to adopt in our own pipeline.yml files:
- Declare a single, top-level key named common_params at the top of the file.
- As the value for that key, use a sub-dictionary to organize our anchors into “namespaces”.
  - By convention we make each key of that sub-dictionary match the name of the key where the anchors are meant to be used later in the steps. This helps us remember the intended context those anchors are supposed to be used in.
  - If we need to define multiple anchors in that namespace, use an array as the value for that sub-key/namespace, each item of the array declaring an anchor and its associated constant node.
- To help readability, insert a newline between the &anchor_name and the node it defines.
This makes our typical pipeline files look like this – with an iOS example this time, to shake things up:
# Nodes with values to reuse in the pipeline.
common_params:
  plugins:
    - &bash_plugin
      automattic/bash-cache#v1.5.0
    - &s3cache_plugin
      automattic/git-s3-cache#v1.1.0:
        bucket: "a8c-repo-mirrors"
        repo: "wordpress-mobile/wordpress-ios/"
    - &docker-gplint
      docker#v3.8.0:
        image: "public.ecr.aws/automattic/glotpress-validator:1.0.0"
    - &common_plugins
      [*bash_plugin, *s3cache_plugin]
  env: &common_env
    IMAGE_ID: xcode-12.5.1

# This is the default pipeline – it will build and test the app
steps:
  - label: "Build"
    key: "build"
    command: ".buildkite/commands/build-for-testing.sh"
    env: *common_env
    plugins: *common_plugins
  - label: "Unit Tests"
    command: ".buildkite/commands/run-unit-tests.sh"
    depends_on: "build"
    env: *common_env
    plugins: *common_plugins
  - label: "Lint Translations"
    command: "gplint /workdir/WordPress/Resources/AppStoreStrings.po"
    plugins:
      - *docker-gplint
    agents:
      queue: "default"
  - label: "UI Tests (iPhone)"
    command: .buildkite/commands/run-ui-tests.sh WordPressUITests 'iPhone 11' 14.1
    depends_on: "build"
    env: *common_env
    plugins: *common_plugins
    artifact_paths:
      - "build/results/"
  - label: "UI Tests (iPad)"
    command: .buildkite/commands/run-ui-tests.sh WordPressUITests "iPad Air (4th generation)" 14.1
    depends_on: "build"
    env: *common_env
    plugins: *common_plugins
    artifact_paths:
      - "build/results/"
Notice how we use those anchors quite liberally here:
- We grouped all the anchors declaring Buildkite plugins under the plugins: sub-key, and the anchor related to the list of environment variables under the env: key.
  - As per our conventions, those key names were picked according to which key in the steps those anchors are meant to be used with.
  - The plugins: key we use as namespace is an array, as we have different plugin constants to declare; but the env: key directly contains the one and only anchor we need to reuse in all steps.
- We also declared an anchor called &common_plugins that is a bit different from the others:
  - It declares an array using the [x,y,z] flow notation of YAML, which is similar to JSON. It is an alternate, equivalent (but more compact) notation to the dashed list syntax you might be more familiar with.
  - This array itself contains references to the two anchors *bash_plugin and *s3cache_plugin we previously defined a couple of lines above.
  - This is pushing the concept a bit to the extreme, as we could have just used *bash_plugin and *s3cache_plugin in the plugins: keys of each of our steps instead. But this is still useful given that the majority of our steps end up using both of those plugins. So this is still a nice DRY, and shows how far we can go when playing with anchors 😉.
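As a sketch of that equivalence, the two keys at the bottom of the snippet below (flow_style and block_style, hypothetical names used only here) should parse to the same two-item array. Each alias becomes exactly one item of the array it is placed in, so an alias to an array used inside another array would yield a nested array rather than a concatenation.

```yaml
# Self-contained sketch; flow_style and block_style are hypothetical keys.
common_params:
  plugins:
    - &bash_plugin
      automattic/bash-cache#v1.5.0
    - &s3cache_plugin
      automattic/git-s3-cache#v1.1.0:
        bucket: "a8c-repo-mirrors"
        repo: "wordpress-mobile/wordpress-ios/"

# Compact, JSON-like flow notation.
flow_style: [*bash_plugin, *s3cache_plugin]

# Dashed block notation; same resulting array.
block_style:
  - *bash_plugin
  - *s3cache_plugin
```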
With this setup, it’s easy for instance to update the version of the bash plugin we want to use in that pipeline, without having to replace it in multiple places (single source of truth) or having to search deep in the pipeline for where it was defined: it’s right there at the top with all our other constants!
Conclusion
Anchors are a lesser-known YAML feature, yet they can be very useful to DRY our files, make them more readable, define reusable constants, and help maintainability.
There are many other useful YAML features, like the merge operator <<:, the ability to represent base64 values, and ways to avoid ambiguities for nodes that could equally be interpreted as strings, booleans, or numbers. If you are interested in learning more about them, you can follow along in this article series that goes into more detail.