GitHub Actions Dynamic Matrix
GitHub Actions is a powerful CI/CD tool that allows you to automate your software development workflow. It provides a wide range of features and capabilities.
One of the features that I found very useful is the ability to define a matrix strategy for your jobs. This allows you to run the same job with different parameters, such as different versions of a programming language.
However, there are times when you need to define the matrix dynamically based on the output of a previous job. For example, you may want to run a job for each directory if and only if the directory contains a specific file or has changed since the last commit.
In this post, I will show you how to define a dynamic strategy matrix in GitHub Actions using a real-world example.
GitHub Actions Matrix
Let's start with a simple example. Let's suppose we want to build our Rust application for different platforms.
To get started, we'll create the project as below.
This will give me the following directory structure.
Now, let's create a GitHub Actions workflow file.
name: ci

concurrency:
  group: ci-${{ github.ref }}-${{ github.event_name }}
  cancel-in-progress: ${{ ! startsWith(github.ref, 'refs/tags/') }}

on:
  push:
    tags:
      - "*"

env:
  RUST_VERSION: nightly
  BINARY_NAME: hello-world

permissions:
  contents: write

jobs:
  build:
    runs-on: ${{ matrix.image }}
    strategy:
      matrix:
        include:
          - image: ubuntu-latest
            target: x86_64-unknown-linux-gnu
          - image: ubuntu-latest
            target: x86_64-unknown-linux-musl
      fail-fast: false
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Cache cargo
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
      - name: Install Rust ${{ env.RUST_VERSION }}
        run: |
          rustup toolchain install ${{ env.RUST_VERSION }}
          rustup target add ${{ matrix.target }}
      - name: Build
        run: |
          cargo build --release --target ${{ matrix.target }}
      - name: Rename binary
        run: |
          cp target/${{ matrix.target }}/release/${{ env.BINARY_NAME }} target/${{ matrix.target }}/release/${{ env.BINARY_NAME }}-${{ matrix.target }}
      - name: Checksum
        run: |
          cd target/${{ matrix.target }}/release
          sha256sum ${{ env.BINARY_NAME }}-${{ matrix.target }} > ${{ env.BINARY_NAME }}-${{ matrix.target }}.sha256
      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: ${{ env.BINARY_NAME }}-${{ matrix.target }}
          path: |
            target/${{ matrix.target }}/release/${{ env.BINARY_NAME }}-${{ matrix.target }}
            target/${{ matrix.target }}/release/${{ env.BINARY_NAME }}-${{ matrix.target }}.sha256
          if-no-files-found: error
The strategy.matrix block is the focus of this post. We will expand on it as we go along.
Dynamic Matrix
Now, the CI workflow above is great, and it works perfectly fine. Here's proof of the successful run and its uploaded artifacts.
However, there are some cases where you might benefit from having the matrix defined in a dynamic way. That way, you have more control and flexibility over which of those matrix items should be included in the build.
For example, let's say you have a monorepo with multiple services, and you want to build a Docker image if and only if the service has changed since the last build.
Let's see how we can achieve this.
Step 0: Separating the jobs
You might have noticed that in the first example, we explicitly listed the matrix entries we wanted to run. That is static, yet simple and straightforward.
In our mission to have a dynamic matrix, we need to split the work into at least two jobs. This way, the first job prepares the list of parallel builds, and the second job receives that list as its matrix.
Step 1: Fetching a list of changed files
Since we aim to build the Docker image only if the service has changed, we need a way to determine whether it has.
There are different ways we can achieve this. One way we're employing in this post is to use a community GitHub Action.
Let's see how.
name: ci

on:
  push:
    branches:
      - main

permissions:
  contents: read

jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.prepare.outputs.matrix }}
      length: ${{ steps.prepare.outputs.length }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Get changed files
        id: changed-files
        uses: tj-actions/changed-files@v42
        with:
          since_last_remote_commit: true
          files: |
            **/Dockerfile
The key here is to fetch the full history of the repository, hence fetch-depth: 0. This ensures that the since_last_remote_commit: true input is evaluated accurately.
The rationale is that there might be cases where we push one or more commits to the repository, and none of them change any of the services that aim to trigger the Docker image build.
Step 2: Did any of the services change?
The next step is to determine whether any of the changed files from the previous step belong to a service we're interested in.
      # ...truncated...
      - name: Prepare
        id: prepare
        shell: python
        run: |
          import json
          import os
          from collections.abc import Iterator
          from pathlib import Path

          def discover() -> Iterator[str]:
              # Yield the top-level directory of every changed file.
              for changed in ${{ steps.changed-files.outputs.all_changed_files || '[]' }}:
                  path = Path(changed)
                  if Path(path.parts[0]).is_dir():
                      yield path.parts[0]

          def jsonify(payload: dict[str, list[str]]) -> str:
              # Compact separators keep the JSON free of whitespace.
              return json.dumps(payload, separators=(",", ":"))

          def main():
              item_lists = list(set(discover()))
              length = len(item_lists)
              json_modules = jsonify({"service": item_lists})
              github_output = f"matrix={json_modules}\n"
              with open(os.environ["GITHUB_OUTPUT"], "a") as f:
                  f.write(github_output)
                  f.write(f"length={length}\n")

          if __name__ == "__main__":
              main()
Wait a minute! There's a lot going on here. Let's break it down.
Changed files or an empty list
In this loop, we either get the list of changed files from the previous step in the happy path, or fall back to an empty list if the output of that step is empty. The || operator used for the fallback is GitHub Expressions syntax. You can read more about it in the GitHub documentation [1].
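To make the fallback concrete, here is a minimal sketch of what happens before Python ever runs. GitHub substitutes the expression into the script text, so the interpreter only sees the result. The `render_loop_header` helper and the sample value are hypothetical, and the sketch assumes the action's output is already a JSON-style list literal.

```python
def render_loop_header(all_changed_files: str) -> str:
    """Mimic `${{ output || '[]' }}`: an empty string is falsy, so '[]' is used."""
    return f"for changed in {all_changed_files or '[]'}:"


# Happy path: the (hypothetical) output is pasted into the script verbatim.
print(render_loop_header('["api/Dockerfile"]'))
# Fallback: an empty output still renders a valid loop over an empty list.
print(render_loop_header(""))
```

Without the fallback, an empty output would leave `for changed in :` in the script, which is a syntax error.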
Filter only the top-level directories
Since we're working in a monorepo, all the services live in top-level directories. This lets us discard every changed file that does not belong to a service.
It's important to mention here that actions/checkout is necessary before this step so that the repository structure is available on disk.
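The filter can be tried locally with a throwaway directory tree. This is only a sketch: the `top_level_dirs` helper, the `api` and `worker` service names, and the explicit `root` argument are assumptions for illustration (the workflow itself relies on the current working directory).

```python
import tempfile
from pathlib import Path


def top_level_dirs(changed_files: list[str], root: Path) -> set[str]:
    """Return the first path component of each file when it is a directory under root."""
    found = set()
    for changed in changed_files:
        first = Path(changed).parts[0]
        if (root / first).is_dir():
            found.add(first)
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical monorepo layout: two services plus a root-level Dockerfile.
    (root / "api").mkdir()
    (root / "worker" / "docker").mkdir(parents=True)
    (root / "Dockerfile").touch()
    changed = ["api/Dockerfile", "worker/docker/Dockerfile", "Dockerfile"]
    services = top_level_dirs(changed, root)

print(sorted(services))
```

The root-level `Dockerfile` is dropped because its first path component is a file rather than a directory, and the nested `worker/docker/Dockerfile` still maps to `worker`.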
Minify the JSON output
In our Python code, we pass separators to json.dumps to remove any spaces after the , and : characters, which avoids running into issues in later steps when decoding the JSON.
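A quick comparison shows the difference between the default and the compact form. The service names are hypothetical.

```python
import json

matrix = {"service": ["api", "worker"]}  # hypothetical service names

default = json.dumps(matrix)
compact = json.dumps(matrix, separators=(",", ":"))

print(default)  # {"service": ["api", "worker"]}
print(compact)  # {"service":["api","worker"]}
```

Both strings decode to the same object. The compact one simply contains no whitespace.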
Prepare the matrix
Finally, we prepare the matrix for the next step. This is a list of all the directories that have changed since the last commit.
We will also proactively set the length of the list as an output so that the next GitHub job can use it as a conditional on whether or not to run. It may happen that your changes haven't affected any of the services, and in that case we don't want to run the Docker image build job.
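To see exactly what ends up in the output file, the same writes can be exercised outside of a runner by pointing GITHUB_OUTPUT at a scratch file. This is a sketch under that assumption. The `write_outputs` helper and the service names are hypothetical.

```python
import json
import os
import tempfile


def write_outputs(services: list[str]) -> None:
    """Append `matrix` and `length` as name=value lines to $GITHUB_OUTPUT."""
    matrix = json.dumps({"service": services}, separators=(",", ":"))
    with open(os.environ["GITHUB_OUTPUT"], "a") as f:
        f.write(f"matrix={matrix}\n")
        f.write(f"length={len(services)}\n")


with tempfile.TemporaryDirectory() as tmp:
    # On a real runner GitHub sets this variable; here we fake it.
    output_file = os.path.join(tmp, "github_output.txt")
    os.environ["GITHUB_OUTPUT"] = output_file
    write_outputs(["api", "worker"])
    with open(output_file) as f:
        written = f.read()

print(written)
```

Each line becomes one step output, which the job then re-exports through its `outputs` mapping.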
Step 3: Build the Docker image(s)
The idea is that this step will only execute if the previous step has determined that there are changes in the services. We have provided the length as a hint for this next step to ensure no unnecessary job runs, nor do we hit any error due to an empty list in the matrix input.
  build:
    needs: prepare
    runs-on: ubuntu-latest
    if: ${{ needs.prepare.outputs.length > 0 }}
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.prepare.outputs.matrix) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          password: ${{ secrets.GITHUB_TOKEN }}
          registry: ghcr.io
          username: ${{ github.actor }}
      - id: meta
        name: Docker metadata
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}/${{ matrix.service }}
      - id: build-push
        name: Build and push
        uses: docker/build-push-action@v3
        with:
          cache-from: type=gha
          cache-to: type=gha,mode=max
          context: ${{ matrix.service }}
          labels: ${{ steps.meta.outputs.labels }}
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ${{ steps.meta.outputs.tags }}
As you see in the conditional, this job will only run if the length of the list is greater than zero, i.e., there are changes in the services.
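One detail worth keeping in mind is that job outputs are always strings, so the comparison relies on GitHub coercing the value to a number. A rough Python approximation of that check, with a hypothetical `should_run_build` helper, might look like this:

```python
def should_run_build(length_output: str) -> bool:
    """Approximate `needs.prepare.outputs.length > 0` for the values the prepare job writes."""
    # GitHub coerces the string to a number before comparing; an empty
    # string coerces to 0, which `or "0"` imitates here.
    return float(length_output or "0") > 0


print(should_run_build("2"))
print(should_run_build("0"))
```

This only covers numeric strings and the empty string, which are the values the prepare job can produce.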
The matrix value has taken a new form in this case compared to our initial example. In this case, we're asking GitHub to parse the JSON string and pass the value to the matrix input.
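If two of our services have changed, the matrix will be a JSON object with a single service key listing both directories, for example {"service":["api","worker"]} for two hypothetically named services.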
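As an illustration, the snippet below imitates what happens to that string on the receiving side. It is a sketch only: `json.loads` stands in for `fromJson`, and the service names `api` and `worker` are hypothetical.

```python
import json

# What the prepare job would emit for two changed (hypothetical) services.
matrix_output = '{"service":["api","worker"]}'

# fromJson turns the string back into an object; json.loads plays that role here.
matrix = json.loads(matrix_output)

# One build job is scheduled per entry, each seeing its own matrix.service value.
jobs = [{"service": name} for name in matrix["service"]]
for job in jobs:
    print(job)
```

With a single key in the matrix, the number of build jobs equals the number of changed services.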
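Lastly, the matrix passed from the first job is accessed as ${{ matrix.service }} in two places: the images input of the Docker metadata step and the context input of the build and push step. The key service is explicitly defined in the earlier job and is not a reserved keyword in GitHub Actions, nor a special keyword in the matrix.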
What does it look like?
Now that you've seen the definitions, let's see what it looks like in action.
And in case you push a commit that hasn't changed any service, the second job will be skipped, as expected.
The full definition of the CI workflow
name: ci

concurrency:
  group: ci-${{ github.ref_name }}-${{ github.event_name }}
  cancel-in-progress: ${{ ! startsWith(github.ref, 'refs/tags/') }}

on:
  push:
    branches:
      - main

permissions:
  contents: read
  packages: write

jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.prepare.outputs.matrix }}
      length: ${{ steps.prepare.outputs.length }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Get changed files
        id: changed-files
        uses: tj-actions/changed-files@v42
        with:
          since_last_remote_commit: true
          files: |
            **/Dockerfile
      - name: Prepare
        id: prepare
        shell: python
        run: |
          import json
          import os
          from collections.abc import Iterator
          from pathlib import Path

          def discover() -> Iterator[str]:
              # Yield the top-level directory of every changed file.
              for changed in ${{ steps.changed-files.outputs.all_changed_files || '[]' }}:
                  path = Path(changed)
                  if Path(path.parts[0]).is_dir():
                      yield path.parts[0]

          def jsonify(payload: dict[str, list[str]]) -> str:
              # Compact separators keep the JSON free of whitespace.
              return json.dumps(payload, separators=(",", ":"))

          def main():
              item_lists = list(set(discover()))
              length = len(item_lists)
              json_modules = jsonify({"service": item_lists})
              github_output = f"matrix={json_modules}\n"
              with open(os.environ["GITHUB_OUTPUT"], "a") as f:
                  f.write(github_output)
                  f.write(f"length={length}\n")

          if __name__ == "__main__":
              main()

  build:
    needs: prepare
    runs-on: ubuntu-latest
    if: ${{ needs.prepare.outputs.length > 0 }}
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.prepare.outputs.matrix) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          password: ${{ secrets.GITHUB_TOKEN }}
          registry: ghcr.io
          username: ${{ github.actor }}
      - id: meta
        name: Docker metadata
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}/${{ matrix.service }}
      - id: build-push
        name: Build and push
        uses: docker/build-push-action@v3
        with:
          cache-from: type=gha
          cache-to: type=gha,mode=max
          context: ${{ matrix.service }}
          labels: ${{ steps.meta.outputs.labels }}
          platforms: linux/amd64,linux/arm64
          push: true
          tags: |
            ${{ steps.meta.outputs.tags }}
Conclusion
That's the whole story. We started with a simple static matrix and then moved to a dynamic matrix that is more flexible and gives us more control over which jobs we want to run.
Since CI/CD costs real money, it's important to optimize your workloads and only run the necessary jobs. This will improve your cost efficiency and reduce your bill at the end of the month.
I hope you found this post useful. If you have any questions or comments, feel free to reach out to me.
Until next time, ciao, and happy hacking!
Source Code
To access the source code for this post, head over to the corresponding GitHub repository [2].