GitHub Actions Dynamic Matrix

GitHub Actions is a powerful CI/CD tool that allows you to automate your software development workflow. It provides a wide range of features and capabilities.

One of the features that I found very useful is the ability to define a matrix strategy for your jobs. This allows you to run the same job with different parameters, such as different versions of a programming language.

However, there are times when you need to define the matrix dynamically based on the output of a previous job. For example, you may want to run a job for each directory if and only if the directory contains a specific file or has changed since the last commit.

In this post, I will show you how to define a dynamic strategy matrix in GitHub Actions using a real-world example.

First, a Static Matrix

Let's start with a simple example. Let's suppose we want to build our Rust application for different platforms.

To get started, we'll create the project as below.

Bash
cargo new hello-world

This gives us the following directory structure.

.
└── hello-world
    ├── Cargo.toml
    └── src
        └── main.rs

Now, let's create a GitHub Actions workflow file.

.github/workflows/ci.yml
name: ci

concurrency:
  group: ci-${{ github.ref }}-${{ github.event_name }}
  cancel-in-progress: ${{ ! startsWith(github.ref, 'refs/tags/') }}

on:
  push:
    tags:
      - "*"

env:
  RUST_VERSION: nightly
  BINARY_NAME: hello-world

permissions:
  contents: write

jobs:
  build:
    runs-on: ${{ matrix.image }}
    strategy:
      matrix:
        include:
          - image: ubuntu-latest
            target: x86_64-unknown-linux-gnu
          - image: ubuntu-latest
            target: x86_64-unknown-linux-musl
      fail-fast: false
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Cache cargo
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
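      - name: Install Rust ${{ env.RUST_VERSION }}
        run: |
          rustup toolchain install ${{ env.RUST_VERSION }}
          # Make the installed toolchain the default so the target and the build both use it
          rustup default ${{ env.RUST_VERSION }}
          rustup target add ${{ matrix.target }}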
      - name: Build
        run: |
          cargo build --release --target ${{ matrix.target }}
      - name: Rename binary
        run: |
          cp target/${{ matrix.target }}/release/${{ env.BINARY_NAME }} target/${{ matrix.target }}/release/${{ env.BINARY_NAME }}-${{ matrix.target }}
      - name: Checksum
        run: |
          cd target/${{ matrix.target }}/release
          sha256sum ${{ env.BINARY_NAME }}-${{ matrix.target }} > ${{ env.BINARY_NAME }}-${{ matrix.target }}.sha256
      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: ${{ env.BINARY_NAME }}-${{ matrix.target }}
          path: |
            target/${{ matrix.target }}/release/${{ env.BINARY_NAME }}-${{ matrix.target }}
            target/${{ matrix.target }}/release/${{ env.BINARY_NAME }}-${{ matrix.target }}.sha256
          if-no-files-found: error

The strategy.matrix block, with its include list, is the focus of this post. We will expand on it as we go along.
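To make the static matrix concrete, here is a small Python sketch of what the two include entries amount to: one job per entry, with matrix.image and matrix.target substituted into the steps. The function name describe_jobs and the dictionary keys it returns are made up for illustration; GitHub performs this substitution itself.

```python
# Mirrors the `include` list and BINARY_NAME from the workflow above.
INCLUDE = [
    {"image": "ubuntu-latest", "target": "x86_64-unknown-linux-gnu"},
    {"image": "ubuntu-latest", "target": "x86_64-unknown-linux-musl"},
]
BINARY_NAME = "hello-world"


def describe_jobs(include: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return one illustrative job description per `include` entry."""
    return [
        {
            "runs_on": entry["image"],
            "build": f"cargo build --release --target {entry['target']}",
            "artifact": f"{BINARY_NAME}-{entry['target']}",
        }
        for entry in include
    ]


if __name__ == "__main__":
    for job in describe_jobs(INCLUDE):
        print(job)
```

Two entries in, two jobs out. The list is fixed in the workflow file, which is exactly what the dynamic matrix will change.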

Dynamic Matrix

Now, the CI workflow above is great, and it works perfectly fine. Here's proof of the successful run and its uploaded artifacts.

Figure: Static matrix result (successful run)

However, there are some cases where you might benefit from having the matrix defined in a dynamic way. That way, you have more control and flexibility over which of those matrix items should be included in the build.

For example, let's say you have a monorepo with multiple services, and you want to build a Docker image if and only if the service has changed since the last build.

Let's see how we can achieve this.

Step 0: Separating the jobs

You might have noticed that in the first example, we explicitly listed the matrix entries we wanted to run. That approach is static, yet simple and straightforward.

To get a dynamic matrix, we need to split the workflow into at least two jobs. This way, the first job can prepare the list of parallel jobs, and the second job can receive that list as its matrix.

Step 1: Fetching a list of changed files

Since we aim to build the Docker image only if the service has changed, we need a way to detect that change.

There are different ways to achieve this. The one we use in this post is a community GitHub Action.

Let's see how.

Fetch changed files
name: ci

on:
  push:
    branches:
      - main

permissions:
  contents: read

jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.prepare.outputs.matrix }}
      length: ${{ steps.prepare.outputs.length }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
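      - name: Get changed files
        id: changed-files
        uses: tj-actions/changed-files@v42
        with:
          since_last_remote_commit: true
          # Emit an unescaped JSON list so that a later step can consume it directly
          json: true
          escape_json: false
          files: |
            **/Dockerfile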

The key here is to fetch the full repository history, hence the fetch-depth: 0. This ensures that since_last_remote_commit: true has the history it needs to compute an accurate diff.

The rationale is that we might push one or more commits, none of which touch any of the services whose changes should trigger a Docker image build.
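To get a feel for what the files filter does, here is a rough Python approximation. It assumes that **/Dockerfile keeps any path whose last component is exactly Dockerfile; the action uses its own glob matching, so treat this as a sketch rather than a description of its internals. The function name dockerfiles_only is made up for this example.

```python
from pathlib import PurePosixPath


def dockerfiles_only(changed: list[str]) -> list[str]:
    """Keep the paths whose final component is exactly `Dockerfile`."""
    return [p for p in changed if PurePosixPath(p).name == "Dockerfile"]


if __name__ == "__main__":
    changed = [
        "service1/Dockerfile",
        "service1/main.py",
        "service2/docker/Dockerfile",
        "README.md",
    ]
    # Only the two Dockerfile paths survive the filter.
    print(dockerfiles_only(changed))
```

Note that with this filter, a commit that only edits service1/main.py produces an empty list, so no build is triggered for it.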

Step 2: Did any of the services change?

The next step is to determine whether any of the changed files from the previous step belong to one of the services we're interested in.

Filter services only
# ...truncated...
      - name: Prepare
        id: prepare
        shell: python
        run: |
          import json
          import os
          from collections.abc import Iterator
          from pathlib import Path


          def discover() -> Iterator[str]:
              # GitHub renders the expression below before the script runs.
              # It must yield a JSON list, which is also a valid Python list literal.
              for changed in ${{ steps.changed-files.outputs.all_changed_files || '[]' }}:
                  path = Path(changed)
                  # Keep only files that live under a top-level directory (a service).
                  if Path(path.parts[0]).is_dir():
                      yield path.parts[0]


          def jsonify(item_lists: dict[str, list[str]]) -> str:
              return json.dumps(item_lists, separators=(",", ":"))


          def main():
              # De-duplicate and sort so that the matrix order is stable.
              item_lists = sorted(set(discover()))
              length = len(item_lists)
              json_modules = jsonify({"service": item_lists})

              github_output = f"matrix={json_modules}\n"

              with open(os.environ["GITHUB_OUTPUT"], "a") as f:
                  f.write(github_output)
                  f.write(f"length={length}\n")

          if __name__ == "__main__":
              main()

Wait a minute! There's a lot going on here. Let's break it down.

Changed files or an empty list

In this loop, we iterate over the list of changed files from the previous step in the happy path, or fall back to an empty list if the output of that step is empty. The ${{ ... }} syntax is a GitHub Actions expression; you can read more about it in the GitHub documentation.

              for changed in ${{ steps.changed-files.outputs.all_changed_files || '[]' }}:

Filter only the top-level directories

Since we're working with a monorepo, every service lives in its own top-level directory. This lets us discard all the files that are not related to a service.

                  if Path(path.parts[0]).is_dir():

It's important to mention that the actions/checkout step must run before this one, so that the repository structure is available on disk for the is_dir() check.
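The same filtering idea can be tried locally without a checkout. The sketch below swaps the is_dir() check for a simpler assumption, namely that any path with more than one component lives inside a service directory, so it approximates the workflow step rather than copying it. The function name top_level_dirs is made up for this example.

```python
from pathlib import PurePosixPath


def top_level_dirs(changed: list[str]) -> list[str]:
    """Return the sorted, de-duplicated first component of every nested path."""
    services = set()
    for item in changed:
        parts = PurePosixPath(item).parts
        # Assumption: a single-component path is a top-level file, not a service.
        if len(parts) > 1:
            services.add(parts[0])
    return sorted(services)


if __name__ == "__main__":
    changed = [
        "service1/Dockerfile",
        "service1/docker/Dockerfile",
        "service2/Dockerfile",
        "Dockerfile",
    ]
    print(top_level_dirs(changed))  # ['service1', 'service2']
```

Two changed Dockerfiles under service1 collapse into a single entry, which is what keeps the matrix from building the same service twice.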

Minify the JSON output

In our Python code, we remove the spaces after the , and : separators in json.dumps to avoid running into issues in later steps when decoding the JSON.

              return json.dumps(item_lists, separators=(",", ":"))
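A quick way to see the difference is to compare the default output of json.dumps with the compact one. The service names below are placeholders.

```python
import json

matrix = {"service": ["service1", "service2"]}

# Default separators are (", ", ": "), which adds a space after each one.
default = json.dumps(matrix)
# Compact separators drop those spaces.
compact = json.dumps(matrix, separators=(",", ":"))

print(default)  # {"service": ["service1", "service2"]}
print(compact)  # {"service":["service1","service2"]}
```

Both strings decode to the same object; the compact form is simply a single token with no whitespace in it.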

Prepare the matrix

Finally, we prepare the matrix for the next step. This is a list of all the directories that have changed since the last commit.

We will also proactively set the length of the list as an output so that the next GitHub job can use it as a condition for whether or not to run. It may happen that your changes haven't affected any of the services, and in that case we don't want to run the Docker image build job.

              json_modules = jsonify({"service": item_lists})
                  f.write(f"length={length}\n")
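If you want to try the output mechanism outside of a runner, the sketch below writes the same two name=value lines to a temporary file standing in for the file that GITHUB_OUTPUT points to, then reads them back. The helpers write_outputs and read_outputs are hypothetical, and the reader only handles single-line values, which is all this workflow produces.

```python
import json
import os
import tempfile


def write_outputs(path: str, services: list[str]) -> None:
    """Append the matrix and its length as single-line name=value pairs."""
    matrix = json.dumps({"service": services}, separators=(",", ":"))
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"matrix={matrix}\n")
        f.write(f"length={len(services)}\n")


def read_outputs(path: str) -> dict[str, str]:
    """Parse single-line name=value pairs (simplified: no multiline values)."""
    outputs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            name, _, value = line.rstrip("\n").partition("=")
            outputs[name] = value
    return outputs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "github_output")
        write_outputs(path, ["service1", "service2"])
        print(read_outputs(path))
```

Every output value is a string, including length. That is why the matrix needs to be decoded from JSON again in the next job.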

Step 3: Build the Docker image(s)

The idea is that this job will only execute if the previous job has determined that there are changes in the services. We provide the length as a hint so that no unnecessary job runs and so that we don't hit an error caused by an empty list in the matrix input.

Build the Image
  build:
    needs: prepare
    runs-on: ubuntu-latest
    if: ${{ needs.prepare.outputs.length > 0 }}
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.prepare.outputs.matrix) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          password: ${{ secrets.GITHUB_TOKEN }}
          registry: ghcr.io
          username: ${{ github.actor }}
      - id: meta
        name: Docker metadata
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}/${{ matrix.service }}
      - id: build-push
        name: Build and push
        uses: docker/build-push-action@v3
        with:
          cache-from: type=gha
          cache-to: type=gha,mode=max
          context: ${{ matrix.service }}
          labels: ${{ steps.meta.outputs.labels }}
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ${{ steps.meta.outputs.tags }}

As you see in the conditional, this job will only run if the length of the list is greater than zero, i.e., there are changes in the services.

The matrix value has taken a new form in this case compared to our initial example. In this case, we're asking GitHub to parse the JSON string and pass the value to the matrix input.

If two of our services have changed, the parsed matrix is equivalent to the following static definition.

  build:
    strategy:
      fail-fast: false
      # What fromJson(needs.prepare.outputs.matrix) evaluates to:
      matrix:
        service:
          - service1
          - service2

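Lastly, the matrix value passed from the first job is accessed as ${{ matrix.service }} in lines 25 and 32, i.e., in the images input of the metadata step and the context input of the build step. The key service is one we defined ourselves in the earlier job; it is neither a reserved keyword in GitHub Actions nor a special key in the matrix.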
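To build some intuition for what happens after fromJson, here is a simplified Python model of the expansion: decode the JSON object, then produce one combination per value. It ignores include and exclude entirely, and the function name expand_matrix is made up, so treat it as a mental model and not as GitHub's implementation.

```python
import itertools
import json


def expand_matrix(matrix_json: str) -> list[dict[str, str]]:
    """Expand a JSON object of lists into one dict per combination."""
    matrix = json.loads(matrix_json)
    keys = list(matrix)
    value_lists = (matrix[key] for key in keys)
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


if __name__ == "__main__":
    # Two changed services yield two jobs, each with its own matrix.service.
    print(expand_matrix('{"service":["service1","service2"]}'))
    # No changed services yield no combinations at all.
    print(expand_matrix('{"service":[]}'))
```

The second call returns an empty list, which is the situation the length check on the build job guards against.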

What does it look like?

Now that you've seen the definitions, let's see what it looks like in action.

  • Step 0: Preparing the dynamic matrix
  • Step 1: Running the dynamic matrix
  • Step 2: Expanding the two jobs
  • Step 3: Successful run of all jobs

And in case you push a commit that hasn't changed any service, the second job will be skipped, as expected.

Figure: Skipped build

The full definition of the CI workflow

.github/workflows/ci.yml
name: ci

concurrency:
  group: ci-${{ github.ref_name }}-${{ github.event_name }}
  cancel-in-progress: ${{ ! startsWith(github.ref, 'refs/tags/') }}

on:
  push:
    branches:
      - main

permissions:
  contents: read
  packages: write

jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.prepare.outputs.matrix }}
      length: ${{ steps.prepare.outputs.length }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
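      - name: Get changed files
        id: changed-files
        uses: tj-actions/changed-files@v42
        with:
          since_last_remote_commit: true
          # Emit an unescaped JSON list so that the next step can consume it directly
          json: true
          escape_json: false
          files: |
            **/Dockerfile
      - name: Prepare
        id: prepare
        shell: python
        run: |
          import json
          import os
          from collections.abc import Iterator
          from pathlib import Path


          def discover() -> Iterator[str]:
              # GitHub renders the expression below before the script runs.
              # It must yield a JSON list, which is also a valid Python list literal.
              for changed in ${{ steps.changed-files.outputs.all_changed_files || '[]' }}:
                  path = Path(changed)
                  # Keep only files that live under a top-level directory (a service).
                  if Path(path.parts[0]).is_dir():
                      yield path.parts[0]


          def jsonify(item_lists: dict[str, list[str]]) -> str:
              return json.dumps(item_lists, separators=(",", ":"))


          def main():
              # De-duplicate and sort so that the matrix order is stable.
              item_lists = sorted(set(discover()))
              length = len(item_lists)
              json_modules = jsonify({"service": item_lists})

              github_output = f"matrix={json_modules}\n"

              with open(os.environ["GITHUB_OUTPUT"], "a") as f:
                  f.write(github_output)
                  f.write(f"length={length}\n")

          if __name__ == "__main__":
              main()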

  build:
    needs: prepare
    runs-on: ubuntu-latest
    if: ${{ needs.prepare.outputs.length > 0 }}
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.prepare.outputs.matrix) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          password: ${{ secrets.GITHUB_TOKEN }}
          registry: ghcr.io
          username: ${{ github.actor }}
      - id: meta
        name: Docker metadata
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}/${{ matrix.service }}
      - id: build-push
        name: Build and push
        uses: docker/build-push-action@v3
        with:
          cache-from: type=gha
          cache-to: type=gha,mode=max
          context: ${{ matrix.service }}
          labels: ${{ steps.meta.outputs.labels }}
          platforms: linux/amd64,linux/arm64
          push: true
          tags: |
            ${{ steps.meta.outputs.tags }}

Conclusion

That's the whole story. We started with a simple static matrix and then moved to a dynamic matrix that is more flexible and gives us more control over which jobs we want to run.

Since CI/CD costs real money 🤑, it's important to optimize your workloads and run only the necessary jobs. This will improve your cost efficiency and reduce your bill at the end of the month.

I hope you found this post useful. If you have any questions or comments, feel free to reach out to me.

Until next time, ciao, and happy hacking!

Source Code

To access the source code for this post, head over to the corresponding GitHub repository.