Clean config files with YAML anchors

Some context

Recently we have been working on migrating our CI infrastructure from CircleCI to Buildkite. As a result, we are manipulating and creating a lot of YAML config files across our various apps, libraries, and tools.

Given that many values and nodes are shared across the pipelines, jobs, and steps in each config file, we want to avoid repeating ourselves. CI providers offer a few ways to share common actions or setups (like Orbs for CircleCI or Plugins for Buildkite), but there are also cases where we want the DRYing to be more local.

We also aim to isolate some constants in our YAML files, for when the same parameter value (like the Xcode version, Simulator version, Android SDK, Docker plugin, etc.) is needed by multiple individual jobs and steps (like test, lint, and build steps).

This is where YAML anchors come to the rescue.

Introducing YAML anchors

Anchors are a built-in YAML feature that lets us give a node an internal name. We can then reference that name later, anywhere a node is expected, and get the same result as if we had copied and pasted the node at that position.

The internal name we use for that anchor is arbitrary, and has no impact on the document’s final content; we can think of it as similar to a variable or a pointer in other languages like Swift or C.

We define an anchor using the &some_name syntax immediately before the YAML node we want that anchor to point to. We can then use the *some_name syntax later in the YAML to reference that anchor as many times as we want.

Let’s look at a snippet, first without using anchors:

steps:
  - label: "lint"
    plugins:
    - docker#v3.8.0:
        image: "androidsdk/android-30"
    command: "./gradlew lintWordpressVanillaRelease"
  - label: "Test WordPress"
    plugins:
    - docker#v3.8.0:
        image: "androidsdk/android-30"
    command: "./gradlew testWordpressVanillaRelease"

Here we can see that the name of the Docker image, from the image: key, is the same for both steps. We can introduce an anchor on that string node, to avoid that repetition:

steps:
  - label: "lint"
    plugins:
    - docker#v3.8.0:
        image: &android-image "androidsdk/android-30"
    command: "./gradlew lintWordpressVanillaRelease"
  - label: "Test WordPress"
    plugins:
    - docker#v3.8.0:
        image: *android-image
    command: "./gradlew testWordpressVanillaRelease"

With this simple trick we have avoided repeating the string by giving the node an internal name – aka android-image – and reusing that reference later.

You might have already noticed though, that not only the string, but in fact the whole node representing the docker#v3.8.0 plugin is identical in both steps. So why not make the anchor point to the whole node declaring that docker plugin?

steps:
  - label: "lint"
    plugins:
    - &android-docker-image
      docker#v3.8.0:
        image: "androidsdk/android-30"
    command: "./gradlew lintWordpressVanillaRelease"
  - label: "Test WordPress"
    plugins:
    - *android-docker-image
    command: "./gradlew testWordpressVanillaRelease"

Here, not only have we declared the anchor on the whole docker#v3.8.0: … map node, but we have also added a line break between the &android-docker-image anchor and the content of the node. This is perfectly valid, as a newline in that position has no effect on how the YAML tree is interpreted, and in this case it makes for a welcome readability improvement.

Note that this node is a map (aka dictionary) with a single key, docker#v3.8.0, whose value is another (sub-)dictionary with just the image: key. An anchor always points to the whole node that follows it, which means that when we use *android-docker-image later in the YAML, it references (and “pastes”) the whole original dictionary node, including all its children.

Grouping the constants

To help clarity and maintenance, we like to move all our “node constants” to the top of our file, so that they are easier to find and give us a single place to look at when we consider updating the plugin versions, images, etc.

The trick to do that is to take advantage of the fact that most CIs (at least CircleCI and Buildkite) ignore any top-level keys that they don’t recognize and that don’t have any special meaning to them. We can leverage this to move all our constant, reusable nodes under such an arbitrary key at the top, and organize those constants any way we want there.

# arbitrary top-level key we use to "namespace" our anchors
reusable-nodes:
  - &android-docker-image
    docker#v3.8.0:
      image: "androidsdk/android-30"

# official "steps" key, used by Buildkite to describe our pipeline
steps:
  - label: "lint"
    plugins:
    - *android-docker-image
    command: "./gradlew lintWordpressVanillaRelease"
  - label: "Test WordPress"
    plugins:
    - *android-docker-image
    command: "./gradlew testWordpressVanillaRelease"

As we can see above, this trick also helps us get rid of the slight asymmetry that we previously had in our YAML, where the first node needing that plugin – being the one where we declare our anchor – looked a bit different and unbalanced compared to the other places needing that same plugin but only using the *android-docker-image alias.
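
One detail worth keeping in mind with this layout: an alias can only refer to an anchor that appears earlier in the same document, which is why the block of constants sits above steps rather than below it. A minimal sketch of the ordering, reusing the same arbitrary reusable-nodes key:

```yaml
# Works: the anchor is declared before any alias that refers to it.
reusable-nodes:
  - &android-docker-image
    docker#v3.8.0:
      image: "androidsdk/android-30"

steps:
  - label: "lint"
    plugins:
      - *android-docker-image
    command: "./gradlew lintWordpressVanillaRelease"

# Would NOT work: if reusable-nodes were moved below steps, the parser
# would reach *android-docker-image before the anchor is defined and
# report an undefined alias.
```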

Better conventions

This non-official, top-level key we have been using so far is merely a way for us to isolate our anchor definitions, and avoid the risk of them colliding with any official top-level keys used by Buildkite. In practice, we can use whatever name and substructures we want to organize those anchors there. After all, only the node that is after an &anchor declaration will be part of that anchor definition, so the parent nodes those are defined in won’t have any impact on the anchors themselves.
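
As a hedged illustration of that last point, the sketch below tucks an anchor under a deliberately odd structure. The anything/goes/here keys and the xcode_image anchor name are made up for this example and are not part of our actual pipelines. The alias still resolves to just the anchored string, whatever its parents are called:

```yaml
# Hypothetical key names: only the node right after &xcode_image is anchored,
# so the surrounding structure has no effect on what the alias resolves to.
anything:
  goes:
    here:
      - &xcode_image "xcode-12.5.1"

steps:
  - label: "Build"
    command: ".buildkite/commands/build-for-testing.sh"
    env:
      IMAGE_ID: *xcode_image   # resolves to "xcode-12.5.1"
```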

Having said that, it is always good to have some conventions to keep things nice, consistent and tidy. Here are the ones we have started to adopt in our own pipeline.yml files:

- Declare a single, top-level key named common_params at the top of the file.
- As the value for that key, use a sub-dictionary to organize our anchors into “namespaces”.
  - By convention, we make each key of that sub-dictionary match the name of the key where the anchors are intended to be used later in the steps. This helps us remember the context those anchors are meant for.
  - If we need to define multiple anchors in that namespace, use an array as the value for that sub-key/namespace, each item of the array declaring an anchor and its associated constant node.
- To help readability, insert a newline between the &anchor_name and the node it defines.

This makes our typical pipeline files look like this – with an iOS example this time, to shake things up:

# Nodes with values to reuse in the pipeline.
common_params:
  plugins:
    - &bash_plugin
      automattic/bash-cache#v1.5.0
    - &s3cache_plugin
      automattic/git-s3-cache#v1.1.0:
        bucket: "a8c-repo-mirrors"
        repo: "wordpress-mobile/wordpress-ios/"
    - &docker-gplint
      docker#v3.8.0:
        image: "public.ecr.aws/automattic/glotpress-validator:1.0.0"
    - &common_plugins
      [*bash_plugin, *s3cache_plugin]
  env: &common_env
    IMAGE_ID: xcode-12.5.1

# This is the default pipeline – it will build and test the app
steps:
  - label: "Build"
    key: "build"
    command: ".buildkite/commands/build-for-testing.sh"
    env: *common_env
    plugins: *common_plugins

  - label: "Unit Tests"
    command: ".buildkite/commands/run-unit-tests.sh"
    depends_on: "build"
    env: *common_env
    plugins: *common_plugins

  - label: "Lint Translations"
    command: "gplint /workdir/WordPress/Resources/AppStoreStrings.po"
    plugins:
      - *docker-gplint
    agents:
      queue: "default"

  - label: "UI Tests (iPhone)"
    command: .buildkite/commands/run-ui-tests.sh WordPressUITests 'iPhone 11' 14.1
    depends_on: "build"
    env: *common_env
    plugins: *common_plugins
    artifact_paths:
      - "build/results/"

  - label: "UI Tests (iPad)"
    command: .buildkite/commands/run-ui-tests.sh WordPressUITests "iPad Air (4th generation)" 14.1
    depends_on: "build"
    env: *common_env
    plugins: *common_plugins
    artifact_paths:
      - "build/results/"

Notice how we use those anchors quite liberally here:

- We grouped all the anchors declaring Buildkite plugins under the plugins: sub-key, and the anchor related to the list of environment variables under the env: key.
  - As per our conventions, those key names were picked according to which key in the steps those anchors are meant to be used with.
  - The plugins: key we use as a namespace is an array, as we have several plugin constants to declare; but the env: key directly contains the one and only anchor we need to reuse in all steps.
- We also declared an anchor called &common_plugins that is a bit different from the others:
  - It declares an array using YAML’s [x, y, z] flow notation, which is similar to JSON and is an equivalent, more compact alternative to the dashed list syntax you might be more familiar with.
  - This array itself contains references to the two anchors, *bash_plugin and *s3cache_plugin, that we defined a couple of lines above.
  - This is pushing the concept a bit to the extreme, as we could have just listed *bash_plugin and *s3cache_plugin in the plugins: key of each of our steps instead. But given that the majority of our steps end up using both of those plugins, this is still a nice DRY, and it shows how far we can go when playing with anchors 😉.
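
To make the nested aliases concrete, here is a sketch of what a parser should end up with for the “Build” step once *common_env and *common_plugins are resolved. It is written out by hand from the definitions above rather than produced by a tool:

```yaml
# Hand-expanded equivalent of the "Build" step, with every alias
# replaced by the node its anchor points to.
steps:
  - label: "Build"
    key: "build"
    command: ".buildkite/commands/build-for-testing.sh"
    env:
      IMAGE_ID: xcode-12.5.1
    plugins:
      - automattic/bash-cache#v1.5.0
      - automattic/git-s3-cache#v1.1.0:
          bucket: "a8c-repo-mirrors"
          repo: "wordpress-mobile/wordpress-ios/"
```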

With this setup, it’s easy, for instance, to update the version of the bash plugin we want to use in that pipeline, without having to replace it in multiple places (single source of truth) or search deep in the pipeline for where it was defined: it’s right there at the top with all our other constants!

Conclusion

Anchors are a lesser-known YAML feature, yet they can be very useful to DRY our files, make them more readable, define reusable constants, and help maintainability.

There are many other useful YAML features, like the merge key <<:, the ability to represent base64 values, and ways to avoid ambiguities for nodes that could equally be interpreted as strings, booleans, or numbers. If you are interested in learning more about them, you can follow along in this article series, which goes into more detail.
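
As a small taste of the <<: feature mentioned above, here is a minimal sketch that combines it with the common_env anchor from earlier. Two assumptions to flag: the DEVICE variable is hypothetical, and <<: is an optional extension that many, but not all, YAML parsers support, so it is worth checking that your CI’s parser handles it before relying on it:

```yaml
common_params:
  env: &common_env
    IMAGE_ID: xcode-12.5.1

steps:
  - label: "UI Tests (iPhone)"
    command: .buildkite/commands/run-ui-tests.sh WordPressUITests 'iPhone 11' 14.1
    env:
      <<: *common_env        # pulls in IMAGE_ID from the anchored map
      DEVICE: "iPhone 11"    # hypothetical extra variable, specific to this step
```

Unlike a plain alias, which replaces the whole node, this merges the anchored map’s keys into the surrounding map, so a step can reuse the shared values and still add its own.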

Published by Olivier Halligon
