Grant Kubernetes Pods Access to AWS Services Using OpenID Connect
Learn how to establish a trust relationship between a Kubernetes cluster and AWS IAM to grant cluster-generated Service Account tokens access to AWS services using OIDC and without storing long-lived credentials.
In our previous post, we discussed what OpenID Connect (OIDC) is and how to use it to authenticate identities from one system to another.
We covered why it is crucial to avoid storing long-lived credentials and the benefits of employing OIDC for the task of authentication.
If you haven't read that one already, here's a recap:
- OIDC is an authentication protocol that allows the identities in one system to authenticate to another system.
- It is based on OAuth 2.0 and JSON Web Tokens (JWT).
- Storing long-lived credentials is risky and should be avoided whenever possible.
- OIDC provides a secure way to authenticate identities without storing long-lived credentials.
- It is widely used in modern applications and systems.
- The hard requirement is that both the Service Provider and the Identity Provider must be OIDC compliant.
- With OIDC you will only keep the identities and their credentials in one system and authenticate them to another system without storing any long-lived credentials. The former is called the Identity Provider and the latter is called the Service Provider.
We also covered a practical example of authenticating GitHub runners to AWS IAM by establishing a trust relationship between GitHub and AWS using OIDC.
In this post, we will take it one step further and provide a way for the pods of our Kubernetes cluster to authenticate to AWS services using OIDC.
This post will provide a walkthrough of granting such access to a bare-metal Kubernetes cluster (k3s) using nothing but the OpenID Connect protocol. In a later post, we'll show you how easy it is to achieve the same with a managed Kubernetes cluster like Azure Kubernetes Service (AKS). But first, let's understand the fundamentals by trying it on a bare-metal cluster.
We will not store any credentials in our pods and as such, won't ever have to worry about other security concerns such as secret rotations!
With that intro out of the way, let's dive in!
Make sure you have the following prerequisites in place before proceeding:
- A Kubernetes cluster that can be exposed to the internet. A local Kubernetes cluster will do; however, you will need to expose the required endpoints to the internet, which can be done using a service like ngrok. That is not the topic of today's post!
- An AWS account to create an OIDC provider and IAM roles.
- A verified root domain name that you own. Skip this if you're using a managed Kubernetes cluster.
- OpenTofu v1.6
- Ansible v2.16
Let's see what we are trying to achieve in this guide.
Our end goal is to create an Identity Provider (IdP) in AWS. After doing so, we will be able to create an IAM Role with a trust relationship to the IdP.
Ultimately, the pods in our Kubernetes cluster that use the designated Service Account(s) will be able to talk to AWS services.
To achieve this, and as per the OIDC specification, the following endpoints must be exposed over HTTPS with a verified TLS certificate:
- /.well-known/openid-configuration: This is a MUST for OIDC compliance.
- /openid/v1/jwks: This is configurable through the first endpoint, as you'll see later.
These endpoints provide the configuration of the OIDC provider and the public keys used to sign the JWTs, respectively. The former is used by the Service Provider to discover and validate the OIDC provider, and the latter is used to validate the JWT access tokens presented by the entities that want to talk to the Service Provider.
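To make the role of the JWKs document concrete, here is a minimal sketch (Python, standard library only) of the first thing a Service Provider does with an incoming token: read the `kid` from the JWT header and pick the matching public key from the JWKs document. Service Account tokens issued by Kubernetes carry a `kid` in their header. The signature check itself is deliberately left out, and the sample key set and token are made up for illustration.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url JWT segment, restoring the padding JWTs strip off."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(data: bytes) -> str:
    """Encode bytes as unpadded base64url, as used in compact JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def jwt_header(token: str) -> dict:
    """Return the JOSE header of a compact JWT (no signature check)."""
    return json.loads(b64url_decode(token.split(".")[0]))


def select_signing_key(jwks: dict, token: str) -> dict:
    """Pick the JWK whose `kid` matches the `kid` in the token header."""
    kid = jwt_header(token).get("kid")
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    # An unknown kid usually means the issuer rotated its keys and the
    # cached JWKs document is stale.
    raise KeyError(f"no JWK with kid={kid!r}")


if __name__ == "__main__":
    # Made-up JWKs document and token, for illustration only.
    jwks = {"keys": [{"kid": "old-key", "kty": "RSA"}, {"kid": "new-key", "kty": "RSA"}]}
    header = b64url_encode(json.dumps({"alg": "RS256", "kid": "new-key"}).encode())
    token = f"{header}.e30.not-a-real-signature"  # "e30" is base64url for "{}"
    print(select_signing_key(jwks, token)["kid"])  # prints: new-key
```

This is also why the JWKs endpoint must stay up to date: a token signed with a rotated key can only be validated once the new key is published.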
Service Provider
Service Provider refers to the host that provides the service. In our example, AWS is the service provider.
Exposing such endpoints makes our OIDC provider compliant with the OIDC specification. As a result, any OIDC-compliant Service Provider will be able to trust our OIDC provider.
OIDC Compliant
For an OIDC provider and a Service Provider to trust each other, they must both be OIDC compliant. This means that the OIDC provider must expose certain endpoints and the Service Provider must be able to validate the OIDC provider through those endpoints.
In practice, the two endpoints above need to be publicly accessible over the internet as absolute HTTPS URLs on our domain, with a TLS certificate signed by a trusted Certificate Authority (CA).
Again, and just to reiterate: as per the OIDC specification, HTTPS is a must and the TLS certificate has to be signed by a trusted CA.
When all this is set up, we will be able to add that domain to AWS as an OIDC provider.
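Before registering the provider, it can be worth confirming that both URLs really are reachable with a CA-trusted certificate. A minimal sketch using only Python's standard library: `oidc_urls` builds the two absolute URLs from the domain, and `fetch_json` retrieves one of them with `ssl.create_default_context()`, which verifies the certificate chain against the system's trusted CAs and checks the hostname. The script name and the domain you pass on the command line are placeholders.

```python
from __future__ import annotations

import json
import ssl
import sys
import urllib.request


def oidc_urls(domain: str) -> tuple[str, str]:
    """Return (discovery document URL, JWKs URL) for an issuer domain."""
    base = f"https://{domain}"
    return f"{base}/.well-known/openid-configuration", f"{base}/openid/v1/jwks"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET a JSON document over HTTPS; fails on an untrusted certificate."""
    # The default context verifies the chain against system CAs and checks
    # that the certificate matches the hostname.
    context = ssl.create_default_context()
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        return json.load(resp)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 check_oidc.py <your-oidc-domain>
    for url in oidc_urls(sys.argv[1]):
        try:
            document = fetch_json(url)
            print(f"OK   {url} ({len(document)} top-level keys)")
        except Exception as exc:  # ssl.SSLError, urllib.error.URLError, ValueError, ...
            print(f"FAIL {url}: {exc}")
```

A self-signed certificate makes `fetch_json` raise an SSL verification error, which is roughly the situation a Service Provider would refuse to trust.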
Step 0: Directory Structure
There is a lot of code to cover in this post, so it helps to know what to expect. Here's the layout of the directories we will be working with:
- Ansible role to test out the setup at the end.
- TF files that will create the OIDC provider in AWS after the Kubernetes provisioning stack is applied.
- Inventory files for Ansible to use. The TF files of the provisioning stack will create the inventory files.
- Ansible role to bootstrap the Kubernetes cluster, including the Cilium CNI installation, TLS certificate fetching and the static web server setup.
- Our main playbook and the Ansible entrypoint for all the tasks we run against the target machine.
- The starting point for this guide, where we provision a server in Hetzner Cloud and spin up a lightweight Kubernetes cluster using k3s.
- The Ansible collection requirements file.
- The host-specific variables that will be used in the Ansible tasks.
Step 1: Dedicated Domain Name
As mentioned, we need to assign a dedicated domain name to the OIDC provider. This will be the address we add to AWS IAM as an Identity Provider.
Any DNS provider will do, but for our example, we're using Cloudflare.
```hcl
variable "hetzner_api_token" {
  type      = string
  nullable  = false
  sensitive = true
}

variable "cloudflare_api_token" {
  type      = string
  nullable  = false
  sensitive = true
}

variable "stack_name" {
  type    = string
  default = "k3s-cluster"
}

variable "primary_ip_datacenter" {
  type    = string
  default = "nbg1-dc3"
}

variable "root_domain" {
  type    = string
  default = ""
}

terraform {
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = "~> 1.46"
    }
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 4.30"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.6"
    }
  }
}

provider "hcloud" {
  token = var.hetzner_api_token
}

provider "cloudflare" {
  api_token = var.cloudflare_api_token
}

resource "hcloud_primary_ip" "this" {
  for_each = toset(["ipv4", "ipv6"])

  name          = "${var.stack_name}-${each.key}"
  datacenter    = var.primary_ip_datacenter
  type          = each.key
  assignee_type = "server"
  auto_delete   = false
}

data "cloudflare_zone" "this" {
  name = var.root_domain
}

resource "random_uuid" "this" {}

resource "cloudflare_record" "this" {
  zone_id =
  name    = "${}.${var.root_domain}"
  proxied = false
  ttl     = 60
  type    = "A"
  value   = hcloud_primary_ip.this["ipv4"].ip_address
}

resource "cloudflare_record" "this_v6" {
  zone_id =
  name    = "${}.${var.root_domain}"
  proxied = false
  ttl     = 60
  type    = "AAAA"
  value   = hcloud_primary_ip.this["ipv6"].ip_address
}

output "public_ip" {
  value = hcloud_primary_ip.this["ipv4"].ip_address
}

output "public_ipv6" {
  value = hcloud_primary_ip.this["ipv6"].ip_address
}
```
You will need the required access tokens, which you can get from the respective account settings: a Cloudflare API token and a Hetzner API token.
```shell
export TF_VAR_cloudflare_api_token="PLACEHOLDER"
export TF_VAR_hetzner_api_token="PLACEHOLDER"

tofu init
tofu plan -out tfplan
tofu apply tfplan
```
Step 2: A Live Kubernetes Cluster
Next, we need a live Kubernetes cluster. We've already covered how to set up a lightweight Kubernetes cluster on an Ubuntu 22.04 machine in a previous post, so we won't go too deep into that.
But for the sake of completeness, we'll resurface the code one more time, with some minor tweaks here and there.
variable "hetzner_api_token" {
type = string
nullable = false
sensitive = true
variable "cloudflare_api_token" {
type = string
nullable = false
sensitive = true
variable "stack_name" {
type = string
default = "k3s-cluster"
variable "primary_ip_datacenter" {
type = string
default = "nbg1-dc3"
variable "root_domain" {
type = string
default = ""
variable "server_datacenter" {
type = string
default = "nbg1"
variable "username" {
type = string
default = "k8s"
terraform {
required_providers {
hcloud = {
source = "hetznercloud/hcloud"
version = "~> 1.46"
cloudflare = {
source = "cloudflare/cloudflare"
version = "~> 4.30"
random = {
source = "hashicorp/random"
version = "~> 3.6"
http = {
source = "hashicorp/http"
version = "~> 3.4"
tls = {
source = "hashicorp/tls"
version = "~> 4.0"
provider "hcloud" {
token = var.hetzner_api_token
provider "cloudflare" {
api_token = var.cloudflare_api_token
resource "tls_private_key" "this" {
algorithm = "ECDSA"
ecdsa_curve = "P384"
resource "hcloud_ssh_key" "this" {
name = var.stack_name
public_key = tls_private_key.this.public_key_openssh
resource "hcloud_server" "this" {
name = var.stack_name
server_type = "cax11"
image = "ubuntu-22.04"
location = "nbg1"
ssh_keys = [,
public_net {
ipv4 = hcloud_primary_ip.this["ipv4"].id
ipv6 = hcloud_primary_ip.this["ipv6"].id
user_data = <<-EOF
- name: ${var.username}
groups: users, admin, adm
shell: /bin/bash
- ${tls_private_key.this.public_key_openssh}
- certbot
package_update: true
package_upgrade: true
- sed -i -e '/^\(#\|\)PermitRootLogin/s/^.*$/PermitRootLogin no/' /etc/ssh/sshd_config
- sed -i -e '/^\(#\|\)PasswordAuthentication/s/^.*$/PasswordAuthentication no/' /etc/ssh/sshd_config
- sed -i '$a AllowUsers ${var.username}' /etc/ssh/sshd_config
- |
curl | \
INSTALL_K3S_VERSION="v1.29.3+k3s1" \
INSTALL_K3S_EXEC="--disable traefik
--flannel-backend none
--write-kubeconfig /home/${var.username}/.kube/config
--secrets-encryption" \
sh -
- chown -R ${var.username}:${var.username} /home/${var.username}/.kube/
- |
curl -L --fail --remote-name-all$CILIUM_CLI_VERSION/cilium-linux-$CLI_ARCH.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-$CLI_ARCH.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-$CLI_ARCH.tar.gz /usr/local/bin
- kubectl completion bash | tee /etc/bash_completion.d/kubectl
- k3s completion bash | tee /etc/bash_completion.d/k3s
- |
cat << 'EOF2' >> /home/${var.username}/.bashrc
alias k=kubectl
complete -F __start_kubectl k
- reboot
data "http" "this" {
url = ""
resource "hcloud_firewall" "this" {
name = var.stack_name
rule {
direction = "in"
protocol = "tcp"
port = 22
source_ips = [format("%s/32", trimspace(data.http.this.response_body))]
description = "Allow SSH access from my IP"
rule {
direction = "in"
protocol = "tcp"
port = 80
source_ips = [
description = "Allow HTTP access from everywhere"
rule {
direction = "in"
protocol = "tcp"
port = 443
source_ips = [
description = "Allow HTTPS access from everywhere"
depends_on = [
resource "hcloud_firewall_attachment" "this" {
firewall_id =
server_ids = []
output "public_ip" {
value = hcloud_primary_ip.this["ipv4"].ip_address
output "public_ipv6" {
value = hcloud_primary_ip.this["ipv6"].ip_address
output "ssh_private_key" {
value = tls_private_key.this.private_key_pem
sensitive = true
output "ansible_inventory_yaml" {
value = <<-EOF
ansible_host: ${hcloud_server.this.ipv4_address}
ansible_user: ${var.username}
ansible_ssh_private_key_file: ~/.ssh/k3s-cluster
ansible_ssh_common_args: '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o PasswordAuthentication=no'
output "ansible_vars" {
value = <<-EOF
domain_name: ${}
output "oidc_provider_url" {
value =
Notice the lines where we set the OIDC issuer URL and JWKs URL of the Kubernetes API server to a publicly accessible address and pass them as arguments to k3s.
Without them, the rest of this guide won't work unless additional configuration is done. In short, these are the URLs the Service Provider uses to verify the OIDC provider and the access tokens of the Service Accounts.
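For reference, a minimal sketch of what such arguments can look like, assuming k3s's `--kube-apiserver-arg` pass-through and the upstream kube-apiserver flags `--service-account-issuer` and `--service-account-jwks-uri`. The `k3s_oidc_args` helper and the `oidc.example.com` domain are hypothetical; in this guide the real values come from the OpenTofu stack.

```python
from __future__ import annotations

import shlex


def k3s_oidc_args(domain: str) -> list[str]:
    """Hypothetical helper: k3s flags that publish a public OIDC issuer.

    service-account-issuer becomes the `iss` claim of Service Account tokens
    and the base URL of the discovery document; service-account-jwks-uri is
    the `jwks_uri` advertised inside that document.
    """
    issuer = f"https://{domain}"
    return [
        f"--kube-apiserver-arg=service-account-issuer={issuer}",
        f"--kube-apiserver-arg=service-account-jwks-uri={issuer}/openid/v1/jwks",
    ]


if __name__ == "__main__":
    # Some of the flags from the cloud-init snippet, plus the two OIDC ones.
    base = ["--disable", "traefik", "--flannel-backend", "none", "--secrets-encryption"]
    exec_args = base + k3s_oidc_args("oidc.example.com")  # placeholder domain
    print("INSTALL_K3S_EXEC=" + shlex.quote(" ".join(exec_args)))
```

Keeping the issuer and the JWKs URL on the same HTTPS host means a single certificate and a single static web server can serve both documents.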
Business as usual, we apply the stack as below.
And for connecting to the machine:
```shell
tofu output -raw ssh_private_key > ~/.ssh/k3s-cluster
chmod 600 ~/.ssh/k3s-cluster
IP_ADDRESS=$(tofu output -raw public_ip)
ssh -i ~/.ssh/k3s-cluster k8s@$IP_ADDRESS
```
To use the Ansible playbook in the next steps, we write the inventory files where Ansible expects them.
become = false
cache_timeout = 3600
fact_caching = ansible.builtin.jsonfile
fact_caching_connection = /tmp/ansible_facts
gather_facts = false
interpreter_python = auto_silent
inventory = ./inventory
log_path = /tmp/ansible.log
roles_path = ~/.ansible/roles:./roles
ssh_common_args = -o ConnectTimeout=5
verbosity = 2
cache = true
cache_connection = /tmp/ansible_inventory
enable_plugins = 'host_list', 'script', 'auto', 'yaml', 'ini', 'toml', 'azure_rm', 'aws_ec2'
mkdir -p ../inventory/group_vars
tofu output -raw ansible_inventory_yaml > ../inventory/k3s-cluster.yml
tofu output -raw ansible_vars > ../inventory/group_vars/all.yml
ansible-inventory --list
"_meta": {
"hostvars": {
"k3s-cluster": {
"ansible_host": "XX.XX.XX.XX",
"ansible_ssh_common_args": "-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o PasswordAuthentication=no",
"ansible_ssh_private_key_file": "~/.ssh/k3s-cluster",
"ansible_user": "k8s",
"discovered_interpreter_python": {
"__ansible_unsafe": "/usr/bin/python3"
"all": {
"children": [
"k8s": {
"hosts": [
At this stage we're ready to move on to the next step.
Step 3: Bootstrap the Cluster
At this point, we have installed the Cilium CLI on our host machine, yet we haven't installed the Cilium CNI plugin in our Kubernetes cluster.
Let's create an Ansible role and a playbook to take care of all the Day 1 operations.
The first step is to install the Cilium CNI.
- name: Install cilium
cmd: cilium install --set kubeProxyReplacement=true --wait --version {{ cilium_version }}
register: cilium_install
changed_when: false
ignore_errors: true
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
- name: Bootstrap k8s node
hosts: k3s-cluster
gather_facts: false
become: true
- k8s
To run the playbook:
Step 4: Fetch the TLS Certificate
At this point, we need a CA-signed TLS certificate for the domain name we created in the first step.
We will carry out our tasks with Ansible throughout all Day 1 to Day N operations.
Description=Wellknown Server
ExecStartPre=/bin/mkdir -p {{ acme_home }}/.well-known/acme-challenge
ExecStart=/usr/bin/python3 -m http.server -d {{ acme_home }} 80
WorkingDirectory={{ acme_home }}
- name: Restart wellknown-server
name: wellknown-server
state: restarted
daemon_reload: true
- name: Create acme group
name: acme
state: present
system: true
- name: Create acme user
name: acme
state: present
group: acme
shell: /bin/false
system: true
create_home: false
- name: Create working dir for acme user
path: "{{ acme_home }}"
state: directory
owner: acme
group: acme
mode: "0755"
- name: Create a standalone server to respond to challenges
src: wellknown-server.service.j2
dest: /etc/systemd/system/wellknown-server.service
owner: root
group: root
mode: "0644"
notify: Restart wellknown-server
- name: Start the wellknown-server
name: wellknown-server
state: started
enabled: true
daemon_reload: true
- name: Use certbot to fetch TLS certificate for {{ domain_name }}
cmd: >-
certbot certonly
--webroot
-w {{ acme_home }}
--non-interactive
--agree-tos
--email {{ domain_email }}
--domains {{ domain_name }}
creates: /etc/letsencrypt/live/{{ domain_name }}/fullchain.pem
- name: Cilium
ansible.builtin.import_tasks: cilium.yml
- cilium
- name: Certbot
ansible.builtin.import_tasks: certbot.yml
- certbot
- name: Bootstrap k8s node
hosts: k3s-cluster
gather_facts: false
become: true
- k8s
Certificate Renewal
Although not required, one of the benefits of using certbot for TLS certificates is the ease of renewal. With the Ubuntu certbot package installed, you will find the following two systemd files on your system.
ExecStart=/usr/bin/certbot -q renew
Description=Run certbot twice daily
OnCalendar=*-*-* 00,12:00:00
On the same host, you will also find a crontab entry for certbot, as you see below:
# /etc/cron.d/certbot: crontab entries for the certbot package
# Upstream recommends attempting renewal twice a day
# Eventually, this will be an opportunity to validate certificates
# haven't been revoked, etc. Renewal will only occur if expiration
# is within 30 days.
# Important Note! This cronjob will NOT be executed if you are
# running systemd as your init system. If you are running systemd,
# the cronjob.timer function takes precedence over this cronjob. For
# more details, see the systemd.timer manpage, or use systemctl show
# certbot.timer.
0 */12 * * * root test -x /usr/bin/certbot -a \! -d /run/systemd/system && perl -e 'sleep int(rand(43200))' && certbot -q renew
All of these files are installed by the certbot package, not written by your first certbot run. You are free to modify and customize them, although it's unlikely that you will need to.
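The cron comment above states the rule the scheduled run applies: renewal only happens when expiration is within 30 days. A minimal standard-library sketch of that check, reading the certificate's notAfter date in the text format `ssl.cert_time_to_seconds` accepts (the same format `openssl x509 -enddate -noout` prints). The 30-day window mirrors the comment; the script name is a placeholder.

```python
from __future__ import annotations

import ssl
import time

RENEWAL_WINDOW_DAYS = 30  # the "within 30 days" rule quoted in /etc/cron.d/certbot


def needs_renewal(not_after: str, now: float | None = None,
                  window_days: int = RENEWAL_WINDOW_DAYS) -> bool:
    """Return True if the certificate expires within the renewal window.

    `not_after` uses OpenSSL's text format, e.g. "Jun  1 12:00:00 2024 GMT",
    which is what `openssl x509 -enddate -noout` prints after "notAfter=".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return expires_at - current <= window_days * 86400


if __name__ == "__main__":
    import subprocess
    import sys

    # Usage: python3 renewal_check.py /etc/letsencrypt/live/<domain>/fullchain.pem
    if len(sys.argv) == 2:
        out = subprocess.run(
            ["openssl", "x509", "-enddate", "-noout", "-in", sys.argv[1]],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        print(needs_renewal(out.split("=", 1)[1]))
```

Because the timer fires twice a day but renewal only happens inside the window, most runs are no-ops, which is why the schedule can be that frequent.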
After adding another task to our Ansible role, we can run the new tasks with the following command:
Step 5: Expose OIDC Configuration to the Internet
Everything we've done so far has been in preparation for this step.
Here, we fetch the OIDC configuration from the Kubernetes API server and expose it to the internet over HTTPS, using the newly acquired TLS certificate and the help of Static Web Server.
cilium_version: 1.15.4
acme_home: /var/www/html
static_web_server_home: /var/www/static-web-server
kubeconfig: /etc/rancher/k3s/k3s.yaml
- name: Restart wellknown-server
name: wellknown-server
state: restarted
daemon_reload: true
- name: Restart static-web-server-prepare
name: static-web-server-prepare
state: restarted
daemon_reload: true
- name: Restart static-web-server
name: static-web-server
state: restarted
daemon_reload: true
Description=Static Web Server
ExecStartPre=/usr/bin/test -s cert.pem
ExecStartPre=/usr/bin/test -s key.pem
ExecStartPre=/usr/bin/test -s .well-known/openid-configuration
ExecStartPre=/usr/bin/test -s openid/v1/jwks
ExecStart=/usr/local/bin/static-web-server \
--host \
--port 443 \
--root . \
--log-level info \
--http2 \
--http2-tls-cert cert.pem \
--http2-tls-key key.pem \
--compression \
--health \
WorkingDirectory={{ static_web_server_home }}
- name: Create static-web-server group
name: static-web-server
state: present
system: true
- name: Create static-web-server user
name: static-web-server
state: present
group: static-web-server
shell: /bin/false
system: true
create_home: false
- name: Create working dir for static-web-server user
path: "{{ static_web_server_home }}"
state: directory
owner: static-web-server
group: static-web-server
mode: "0755"
- name: Download static web server binary
url: "{{ static_web_server_download_url }}"
dest: "/tmp/{{ static_web_server_download_url | basename }}"
checksum: "sha256:{{ static_web_server_checksum }}"
owner: root
group: root
mode: "0644"
register: download_static_web_server
- name: Extract static web server binary
src: "{{ download_static_web_server.dest }}"
dest: /usr/local/bin/
owner: root
group: root
mode: "0755"
remote_src: true
- --strip-components=1
- --wildcards
- "**/static-web-server"
notify: Restart static-web-server
- name: Create static-web-server-prepare script
dest: /usr/local/bin/static-web-server-prepare
owner: root
group: root
mode: "0755"
notify: Restart static-web-server-prepare
- name: Create static-web-server-prepare service
src: static-web-server-prepare.service.j2
dest: /etc/systemd/system/static-web-server-prepare.service
owner: root
group: root
mode: "0644"
notify: Restart static-web-server-prepare
- name: Create static-web-server-prepare timer
src: static-web-server-prepare.timer.j2
dest: /etc/systemd/system/static-web-server-prepare.timer
owner: root
group: root
mode: "0644"
notify: Restart static-web-server-prepare
- name: Start static-web-server-prepare
name: static-web-server-prepare.timer
state: started
enabled: true
daemon_reload: true
- name: Create static-web-server service
src: static-web-server.service.j2
dest: /etc/systemd/system/static-web-server.service
owner: root
group: root
mode: "0644"
notify: Restart static-web-server
- name: Start static-web-server service
name: static-web-server
state: started
enabled: true
daemon_reload: true
- name: Cilium
ansible.builtin.import_tasks: cilium.yml
- cilium
- name: Certbot
ansible.builtin.import_tasks: certbot.yml
- certbot
- name: Static web server
ansible.builtin.import_tasks: static-server.yml
- static-web-server
static_web_server_checksum: 492dda3749af5083e5387d47573b43278083ce62de09b2699902e1ba40bf1e45
- name: Bootstrap k8s node
hosts: k3s-cluster
gather_facts: true
become: true
- vars/{{ ansible_architecture }}.yml
- k8s
- provision
Running this will be as follows:
Notice that we have turned on fact gathering in this step. This is because we want to include host-specific variables through vars_files, whose path depends on the gathered ansible_architecture fact.
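The per-architecture vars file pins static_web_server_checksum, and the download task rejects any file whose SHA-256 digest does not match it. A minimal standard-library equivalent of that check, handy when picking a new release and its checksum by hand; the function names are illustrative and not part of the Ansible role.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: Path, expected_sha256: str) -> None:
    """Raise ValueError unless the file matches the pinned SHA-256 checksum."""
    actual = sha256_of(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}, want {expected_sha256}")
```

Calling verify_checksum on the downloaded archive with the value of static_web_server_checksum returns None on a match and raises ValueError otherwise. Since the archive differs per CPU architecture, so does the checksum, which is why it lives in the architecture-specific vars file.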
The static web server tasks reference a couple of important files. One of them is static-web-server-prepare, which has both a service file and a timer file. This gives us the flexibility to define oneshot services that only run to completion on every tick of the timer. Effectively, we separate the executable task from its scheduling.
The definitions for those files are as follows:
```sh
#!/usr/bin/env sh
# This script will run as root to prepare the files for the static web server
set -eu

mkdir -p {{ static_web_server_home }}/.well-known \
  {{ static_web_server_home }}/openid/v1

kubectl get --raw /.well-known/openid-configuration > \
  {{ static_web_server_home }}/.well-known/openid-configuration
kubectl get --raw /openid/v1/jwks > \
  {{ static_web_server_home }}/openid/v1/jwks

cp /etc/letsencrypt/live/{{ domain_name }}/fullchain.pem \
  {{ static_web_server_home }}/cert.pem
cp /etc/letsencrypt/live/{{ domain_name }}/privkey.pem \
  {{ static_web_server_home }}/key.pem

chown -R static-web-server:static-web-server {{ static_web_server_home }}
```
Notice how we fetch the OIDC configuration from Kubernetes, as well as the TLS certificate, on every run instead of copying them once. This is because any of these files can change over time:
- Firstly, the Kubernetes API server might rotate its Service Account issuer key pair, and with that, the JWKs endpoint will return different output.
- Secondly, the TLS certificate will be renewed periodically in the background, and we have to keep up with that.
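The script above overwrites the served files in place, so a request arriving mid-copy could, in principle, see a partially written file. One possible refinement, sketched here in Python rather than taken from this setup, is to write each file to a temporary name in the same directory, `os.replace()` it into position (an atomic rename on POSIX filesystems), and skip the write entirely when the content is unchanged. The `refresh_file` helper is hypothetical.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def refresh_file(dest: Path, content: bytes, mode: int = 0o644) -> bool:
    """Atomically replace `dest` with `content`; return True if it changed."""
    if dest.exists() and dest.read_bytes() == content:
        return False  # unchanged: leave the served file alone
    dest.parent.mkdir(parents=True, exist_ok=True)
    # The temporary file must be in the same directory (same filesystem)
    # so that os.replace() is a single atomic rename.
    fd, tmp_name = tempfile.mkstemp(dir=str(dest.parent), prefix=f".{dest.name}.")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(content)
            fh.flush()
            os.fsync(fh.fileno())
        os.chmod(tmp_name, mode)
        os.replace(tmp_name, dest)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
    return True
```

For example, the stdout of `kubectl get --raw /openid/v1/jwks` (captured with `subprocess.run(..., capture_output=True, check=True)`) could be passed as the content for openid/v1/jwks, and key.pem written with `mode=0o600`. Ownership (the script's chown) is left out of the sketch.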
Now, let's take a look at our preparation service and timer definition.
Description=Update TLS & K8s OIDC Config
Environment=KUBECONFIG={{ kubeconfig }}
ExecStartPre=/bin/mkdir -p .well-known openid/v1
WorkingDirectory={{ static_web_server_home }}
Description=Update TLS & K8s OIDC Config Every Minute
OnCalendar=*-*-* *:*:00
Notice that the service file specifies the working directory for the script, which means the static-web-server-prepare shell script is executed in that directory.
Also, watch out for the oneshot systemd service type. These services are not long-running processes in an infinite loop. Instead, they run to completion, and systemd will not report them as active the way it would for a simple service.
Step 6: Add the OIDC Provider to AWS
That's it. We have done all the hard work. Anything after this will be a breeze compared to what we've done so far as you shall see shortly.
Now, we have a domain name that is publishing its OIDC configuration and JWKs over the HTTPS endpoint and is ready to be used as a trusted OIDC provider.
All we need now is a couple of TF resources in the AWS account. After that, we can test the setup using a sample Job that references a Service Account in its definition and uses its token to talk to AWS.
Note that we're starting a new TF module below.
terraform {
required_providers {
tls = {
source = "hashicorp/tls"
version = "~> 4.0"
aws = {
source = "hashicorp/aws"
version = "~> 5.46"
data "terraform_remote_state" "k8s" {
backend = "local"
config = {
path = "../provision-k8s/terraform.tfstate"
data "tls_certificate" "this" {
url = "https://${data.terraform_remote_state.k8s.outputs.oidc_provider_url}"
resource "aws_iam_openid_connect_provider" "this" {
url = "https://${data.terraform_remote_state.k8s.outputs.oidc_provider_url}"
# JWT token audience (aud)
client_id_list = [
thumbprint_list = [
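thumbprint_list expects hex-encoded SHA-1 fingerprints: the SHA-1 digest of a certificate's DER encoding, written as 40 hex characters. The tls_certificate data source declared above can compute these; the sketch below shows the same computation with Python's standard library so you can cross-check a value by hand. Which certificate of the chain AWS expects is described in AWS's IAM guide on obtaining an OIDC provider's thumbprint (the top intermediate CA); this sketch simply prints one thumbprint per certificate it finds.

```python
from __future__ import annotations

import hashlib
import ssl

BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def split_pem_chain(bundle: str) -> list[str]:
    """Split a PEM bundle (e.g. fullchain.pem) into individual certificates."""
    certs = []
    for part in bundle.split(END):
        if BEGIN in part:
            certs.append(part[part.index(BEGIN):].strip() + "\n" + END + "\n")
    return certs


def sha1_thumbprint(pem_cert: str) -> str:
    """Hex SHA-1 of the certificate's DER bytes: 40 lowercase characters."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)  # strips the PEM armor, base64-decodes
    return hashlib.sha1(der).hexdigest()


if __name__ == "__main__":
    import sys

    # Usage: python3 thumbprints.py chain.pem   (e.g. saved from openssl s_client -showcerts)
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as fh:
            for index, cert in enumerate(split_pem_chain(fh.read())):
                print(index, sha1_thumbprint(cert))
```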
Let's apply this stack:
Believe it or not, after all these efforts, we are finally done.
Now it is time for the test.
In order to be able to assume a role from inside a pod of our cluster, we will create a sample IAM Role with a trust relationship to the OIDC provider we just created.
data "aws_iam_policy_document" "this" {
statement {
actions = [
effect = "Allow"
principals {
type = "Federated"
identifiers = [
condition {
test = "StringEquals"
variable = "${aws_iam_openid_connect_provider.this.url}:aud"
values = [
condition {
test = "StringEquals"
variable = "${aws_iam_openid_connect_provider.this.url}:sub"
values = [
resource "aws_iam_role" "this" {
name = "k3s-demo-app"
assume_role_policy = data.aws_iam_policy_document.this.json
managed_policy_arns = [
output "iam_role_arn" {
value = aws_iam_role.this.arn
output "service_account_namespace" {
value = var.service_account_namespace
output "service_account_name" {
value = var.service_account_name
The AWS IAM Role trust relationship will look something like this:
"Statement": [
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"": "",
"": "system:serviceaccount:default:demo-service-account"
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::XXXXXXXXXXXX:oidc-provider/"
"Version": "2012-10-17"
This, of course, shouldn't come as a surprise. We have already seen this in the TF definition above.
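To see how the pieces of the trust policy fit together, here is a sketch that assembles the same kind of document as a Python dict. The account ID, provider host and audience are placeholders; in particular, sts.amazonaws.com is only an example audience, not necessarily what client_id_list contains in this stack. IAM refers to the provider by its URL without the https:// scheme, both in the Federated principal ARN and in the :aud and :sub condition keys.

```python
from __future__ import annotations

import json


def web_identity_trust_policy(account_id: str, provider_host: str,
                              namespace: str, service_account: str,
                              audience: str) -> dict:
    """Build an IAM trust policy that lets one Kubernetes Service Account
    assume a role via sts:AssumeRoleWithWebIdentity.

    provider_host is the issuer URL without "https://", e.g. "oidc.example.com".
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider_host}"
                },
                "Condition": {
                    "StringEquals": {
                        f"{provider_host}:aud": audience,
                        # Kubernetes sets `sub` to system:serviceaccount:<namespace>:<name>
                        f"{provider_host}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = web_identity_trust_policy(
        "123456789012", "oidc.example.com", "default", "demo-service-account", "sts.amazonaws.com"
    )
    print(json.dumps(policy, indent=2))
```

Pinning both aud and sub is what scopes the role to a single Service Account; dropping the sub condition would let any Service Account in the cluster with the right audience assume it.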
Step 7: Test the Setup
We have created the IAM Role with a trust relationship to the OIDC provider of the cluster. With the conditions in the IAM Role trust policy from the previous step, only Service Accounts with the specified audience, in the default namespace and named demo-service-account, will be able to assume the role.
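To see what those conditions are matched against, here is a sketch that decodes the payload of a Service Account token and performs the same two StringEquals checks locally. It does not verify the signature (that is AWS's job, using the published JWKs), so treat it only as a debugging aid; the script name and arguments are placeholders.

```python
import base64
import json
import sys


def jwt_claims(token: str) -> dict:
    """Decode the payload of a compact JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def matches_trust_conditions(claims: dict, audience: str,
                             namespace: str, service_account: str) -> bool:
    """Mimic the trust policy's StringEquals checks on `aud` and `sub`."""
    aud = claims.get("aud")
    # Kubernetes typically emits `aud` as a list; accept a plain string too.
    audiences = aud if isinstance(aud, list) else [aud]
    expected_sub = f"system:serviceaccount:{namespace}:{service_account}"
    return audience in audiences and claims.get("sub") == expected_sub


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 check_token.py <token-file> <audience>
    with open(sys.argv[1]) as fh:
        claims = jwt_claims(fh.read().strip())
    print(json.dumps({k: claims.get(k) for k in ("iss", "aud", "sub")}, indent=2))
    print(matches_trust_conditions(claims, sys.argv[2], "default", "demo-service-account"))
```

The iss claim printed here should equal the public issuer URL configured on the API server; if it does not, AWS cannot tie the token to the registered provider.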
With that in place, let's create another Ansible role to deploy a Kubernetes Job.
We will need the Kubernetes core Ansible collection, so let's install that.
apiVersion: v1
kind: ServiceAccount
name: demo-service-account
namespace: default
apiVersion: batch/v1
kind: Job
name: demo-app
namespace: default
job-name: demo-app
job-name: demo-app
restartPolicy: Never
- image: amazon/aws-cli:2.15.40
name: demo-app
- sh
- -c
- |
aws sts get-caller-identity
aws ssm get-parameters-by-path \
--path / --recursive \
--with-decryption \
--query "Parameters[*].[Name]" \
--output text
- name: AWS_REGION
value: "{{ aws_region }}"
- name: AWS_ROLE_ARN
value: "{{ role_arn }}"
value: "{{ role_session_name }}"
value: /var/run/secrets/tokens/token
readOnlyRootFilesystem: true
- name: token
mountPath: /var/run/secrets/tokens
readOnly: true
- name: aws-config
mountPath: /root/.aws
serviceAccountName: demo-service-account
- name: token
- serviceAccountToken:
path: token
- name: aws-config
emptyDir: {}
- name: Apply the app job
template: manifest.yml
state: present
force: true
wait: true
- name: Bootstrap k8s node
hosts: k3s-cluster
gather_facts: true
become: true
- vars/{{ ansible_architecture }}.yml
- k8s
- provision
- name: Test the AWS Access
hosts: k3s-cluster
gather_facts: false
become: true
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
- name: Install pip3
name: python3-pip
state: present
- name: Install kubernetes library
name: kubernetes<30
state: present
- name: Read Tofu output from ./configure-oidc
cmd: tofu output -raw iam_role_arn
chdir: "{{ playbook_dir }}/configure-oidc"
delegate_to: localhost
become: false
changed_when: false
register: configure_oidc
- name: Set the AWS role arn
role_arn: "{{ configure_oidc.stdout }}"
- app
- test
- never
A few important notes are worth mentioning here:
- The second playbook is tagged with never (alongside test), so it does not run by default. That is because it depends on the second TF module, which we have to apply manually before running this playbook. Once that dependency is resolved, we can run the second playbook with the --tags test flag.
- There is a fact-gathering step at the start of the second playbook. That is, again, because of the dependency on the TF module: we grab the output of the TF module and pass it to the next role. If you notice, the role_arn variable used in the Jinja template is initialized by this step.
- In the fact-gathering step, the task is delegated to localhost. This ensures that it runs on our own machine and not on the target machine, because the TF module and its state file live on our local machine. We also do not need privilege escalation there, so become is turned off.
- The Job manifest uses the AWS CLI Docker image. By setting some of the environment variables the AWS CLI expects, we can use it without running aws configure manually.
This playbook can be run after the second TF module with the following command:
When checking the logs of the deployed Kubernetes Job, we can see that it has been successful.
```shell
kubectl logs job/demo-app
```
There are no SSM parameters in the target AWS account, so the AWS CLI does not return an empty list; it returns nothing at all!
Lastly, to confirm that the Service Account and the IAM Role trust policy actually matter here, we can remove the serviceAccountName from the Job, so that the pod runs as the namespace's default Service Account, and recreate the Job.
The output is as expected:
An error occurred (AccessDenied) when calling the AssumeRoleWithWebIdentity operation: Not authorized to perform sts:AssumeRoleWithWebIdentity
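The error names the STS AssumeRoleWithWebIdentity operation, which is what the AWS CLI calls under the hood; the CLI and SDKs pick up the role and token through the standard AWS_ROLE_ARN, AWS_WEB_IDENTITY_TOKEN_FILE and optional AWS_ROLE_SESSION_NAME variables. This call is not signed with AWS credentials: the token itself is the proof of identity. A sketch of the raw Query API request, assuming the regional STS endpoint and API version 2011-06-15; it only sends the request when run as a script where the token variables are set.

```python
import os
import urllib.parse
import urllib.request


def build_assume_role_request(region: str, role_arn: str, session_name: str,
                              token: str) -> urllib.request.Request:
    """Build an STS Query API AssumeRoleWithWebIdentity request.

    No AWS signature is needed: the web identity token authenticates the call.
    The token goes in a POST body rather than the URL so it is not logged.
    """
    body = urllib.parse.urlencode({
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": token,
    }).encode()
    return urllib.request.Request(
        f"https://sts.{region}.amazonaws.com/",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__" and "AWS_WEB_IDENTITY_TOKEN_FILE" in os.environ:
    with open(os.environ["AWS_WEB_IDENTITY_TOKEN_FILE"]) as fh:
        request = build_assume_role_request(
            os.environ["AWS_REGION"],
            os.environ["AWS_ROLE_ARN"],
            os.environ.get("AWS_ROLE_SESSION_NAME", "demo-session"),
            fh.read().strip(),
        )
    # Prints an XML response with temporary credentials on success; a trust
    # policy mismatch raises urllib.error.HTTPError (the AccessDenied case above).
    with urllib.request.urlopen(request, timeout=10) as resp:
        print(resp.read().decode())
```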
That's all folks! We can now wrap this up.
Bonus: JWKs URL
Remember at the beginning of this guide when we mentioned that the JWKs URL is configurable through the OIDC configuration endpoint?
Let's see it in action.
```shell
DOMAIN=$(grep domain_name inventory/group_vars/all.yml | awk '{print $2}')
curl https://$DOMAIN/.well-known/openid-configuration | jq -r .jwks_uri
```
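The same lookup can be sketched in Python. Besides reading jwks_uri, it checks that the document's issuer matches the issuer URL it was fetched from, which the OpenID Connect Discovery specification requires, and it rejects a non-HTTPS jwks_uri in line with this guide's HTTPS requirement. The function names are illustrative.

```python
import json
import urllib.request


def jwks_uri_from_discovery(issuer: str, document: dict) -> str:
    """Validate a discovery document against its issuer and return jwks_uri."""
    if document.get("issuer") != issuer:
        raise ValueError(f"issuer mismatch: {document.get('issuer')!r} != {issuer!r}")
    jwks_uri = document["jwks_uri"]
    if not jwks_uri.startswith("https://"):
        raise ValueError("jwks_uri must be served over HTTPS")
    return jwks_uri


def fetch_discovery(issuer: str) -> dict:
    """Fetch <issuer>/.well-known/openid-configuration (TLS is verified by default)."""
    url = issuer.rstrip("/") + "/.well-known/openid-configuration"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Usage: `jwks_uri_from_discovery(issuer, fetch_discovery(issuer))` with your own issuer URL returns the same value as the jq command, and fails loudly if the document was served for a different issuer.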
This means you can host your JWKs on a different server from the OIDC server, although I don't recommend it because of the extra maintenance overhead.
That said, if your JWKs are on a different server or endpoint, all you have to do is pass that URL to the kube-apiserver, as you see below:
OpenID Connect is one of the most powerful protocols powering the internet. Yet it is underestimated and easily overlooked. If you look closely at the systems around you, you will see many practical applications of OIDC.
One cue that OIDC applies is whenever you need to authenticate an identity from one system to another. You will almost never need to create another identity in the target system, nor pass any credentials around. All that's needed is to establish a trust relationship between the two systems and you're good to go.
This gives you a lot of flexibility and enhances your security posture. You will also remove the overhead of secret rotations from your workload.
In this post, we have seen how to establish a trust relationship between a bare-metal Kubernetes cluster and AWS IAM to grant cluster-generated Service Account tokens access to AWS services using OIDC.
Having this foundation in place, it's easy to extend this pattern to managed Kubernetes clusters such as Azure Kubernetes Service (AKS) or Google Kubernetes Engine (GKE). All you need from the managed Kubernetes cluster is the OIDC configuration endpoint, which in turn exposes the JWKs URL. With that, you can create the trust relationship in AWS or any other Service Provider and grant the relevant access to your services as needed.
I hope you've enjoyed reading this post as much as I've enjoyed writing it, and that you have learned something new and useful from it.
Until next time, ciao & happy coding!