How to Access AWS From Azure VM Using OpenID Connect

In the day-to-day operations of a software application, it is common to have to manage multiple cloud providers at the same time.

This can be due to business requirements or because of technological constraints on the company's primary cloud provider.

In this blog post we will see how to grant an Azure Virtual Machine access to AWS services, without storing any long-lived credentials and with the power of OpenID Connect.

If you've worked with either cloud, or want inspiration on how to apply this technique to your setup, then this blog post is for you.

Introduction

The idea of OpenID Connect fascinates me, especially knowing how useful it is in modern software development and how it comes with security best practices as batteries included. Yet it's really surprising that it is not as widely adopted or known as it should be.

It's really one of the most underrated technologies of our time, yet it is remarkably effective at what it promises.

I really wish more people could see the potential of using OIDC in their applications and services, and would integrate their workflows with it.

The truth is, those in the know are already benefiting from it at scale, in ways that are not visible to the naked eye unless you look closely enough. 🧐

If you have never used OIDC before, or if you're still doubtful of its potential, then this blog post is for you. We have a full archive of posts discussing various implementations and integration guides for OpenID Connect, should you choose to study this topic further.

Why Should You Care?

The main objective is simple and very practical. We want to grant an Azure VM access to AWS services, e.g., to list AWS S3 buckets and/or their objects.

Give this task to an uninformed operations engineer, and you'd likely see them passing AWS credentials into the VM. That isn't even the worst problem: if all the other measures are in place, the VM is only accessible to a set of trusted parties, e.g., by restricting network access using security groups, i.e., firewalls.

Matters get worse real quick when you realize that those secrets need to be passed to the VM somehow, and one of the ugliest ways to do that is to hard-code them in a private repository.

That is not the worst thing either: even if your Git service provider is never compromised (which is very unlikely in the absolute sense of the word), at the very least you still have to worry about the rotation of your secrets!

This is crucial, since there should be a clear and concise plan for rotating the secrets of your platform, ideally through automation and without the need for manual intervention.

I hope I was successful in painting what it's like to operate in such environments. Honestly, it's not pretty, and you have to seriously start planning proactively to address such shortcomings and maintain the excellence of your operations.

What's the Alternative, Then?

Well, OpenID Connect to the rescue. In a nutshell, OIDC ensures that you don't pass secrets around where you don't have to. The identity of a user, a machine, or a service is the responsibility of an Identity Provider, and you establish a trust relationship with a third-party service in such a way that the identities of one provider are authenticated by the other.

If all this sounds like gibberish, let's provide a visual diagram to illustrate the concept.

sequenceDiagram
    participant vm as Azure VM
    participant idp as Azure AD
    participant aws as AWS
    idp-->aws: Trust relationship established by administrator
    vm->>idp: Give me an access token, I am VM no. 1234
    idp->>vm: Here's your access token
    vm->>aws: I was sent by Azure AD, here's my access token, list the S3 buckets
    aws->>vm: You're authenticated, here's the list of S3 buckets

As you see in the diagram, the whole idea is that AWS no longer keeps the identities on its side; instead, the trust relationship from Azure to AWS allows the identities of Azure to be authenticated by AWS.

If you think about it, AWS doesn't even need to keep the identity information of such a VM, because it is not a resource managed by AWS after all.

That's the whole idea of OpenID Connect, and in this post, we will provide the Infrastructure as Code to implement such a trust relationship between Azure and AWS.

Directory Structure

Before we start, let's give you an idea of what to expect from a directory point of view.

.
├── ansible.cfg
├── azure-vm/
├── playbook.yml
├── trust-relationship/
└── vm-identity/

Establishing the Trust Relationship

As per the diagram above, we'll establish that crucial trust relationship we've talked about. This is the core of our setup; the rest of this guide is useless if it is not done correctly.

In setting up the trust relationship, you will need to query your Azure AD tenant for its OIDC configuration endpoint [1]. That is the endpoint that publishes all the key components of an OIDC-compliant provider, e.g., the jwks_uri points to the public keys matching the private keys Azure AD uses to sign the JWT tokens.

In turn, AWS will use those keys to verify the integrity and validity of the provided JWT tokens. Think of it as Azure signing tokens with its private keys and publishing its public keys to the world, so that anyone can verify whether a given token was signed by Azure or not.
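
To make the signing idea concrete, here is a minimal Python sketch of RS256 verification using only the standard library. It assumes a JWKS entry with base64url-encoded n and e members, as served from the jwks_uri; the helper names public_numbers and verify_rs256 are made up for this illustration, and in real code you would rather rely on a vetted JWT library than on hand-rolled checks.

```python
import base64
import hashlib

# ASN.1 DigestInfo prefix for SHA-256, as used by PKCS #1 v1.5 signatures.
SHA256_DIGEST_INFO = bytes.fromhex("3031300d060960864801650304020105000420")


def b64url_decode(value: str) -> bytes:
    """Decode base64url text that may lack its trailing '=' padding."""
    return base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))


def public_numbers(jwk: dict) -> tuple:
    """Turn the 'n' and 'e' members of an RSA JWKS entry into integers."""
    n = int.from_bytes(b64url_decode(jwk["n"]), "big")
    e = int.from_bytes(b64url_decode(jwk["e"]), "big")
    return n, e


def verify_rs256(token: str, n: int, e: int) -> bool:
    """Check the RS256 signature of a compact JWT against one RSA public key."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    signed_part = f"{header_b64}.{payload_b64}".encode("ascii")
    signature = int.from_bytes(b64url_decode(signature_b64), "big")
    if signature >= n:
        return False

    # Rebuild the padded SHA-256 digest the signer must have produced.
    key_len = (n.bit_length() + 7) // 8
    digest = hashlib.sha256(signed_part).digest()
    padding = b"\xff" * (key_len - len(SHA256_DIGEST_INFO) - len(digest) - 3)
    expected = b"\x00\x01" + padding + b"\x00" + SHA256_DIGEST_INFO + digest

    # RSA verification: signature^e mod n must reproduce that padded digest.
    recovered = pow(signature, e, n).to_bytes(key_len, "big")
    return recovered == expected
```

Only the public numbers are needed to run this check, which is why AWS never has to hold a secret issued by Azure.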

Let's now get hands-on and create a trust relationship from Azure AD to AWS.

trust-relationship/versions.tf
terraform {
  required_providers {
    azuread = {
      source  = "hashicorp/azuread"
      version = "~> 2.50"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.50"
    }
    tls = {
      source  = "hashicorp/tls"
      version = "~> 4.0"
    }
  }
  required_version = "< 2"
}

# This stack only talks to Azure AD through the azuread provider,
# which needs no `features` block.
provider "azuread" {}

trust-relationship/variables.tf
variable "tenant_id" {
  type    = string
  default = null
}

trust-relationship/main.tf
locals {
  tenant_id  = coalesce(var.tenant_id, data.azuread_client_config.current.tenant_id)
  tenant_url = format("https://sts.windows.net/%s/", local.tenant_id)
}

data "azuread_client_config" "current" {}

data "tls_certificate" "this" {
  url = local.tenant_url
}

resource "aws_iam_openid_connect_provider" "this" {
  url = local.tenant_url

  # aka: `aud` claim
  client_id_list = ["https://management.core.windows.net/"]

  thumbprint_list = [
    data.tls_certificate.this.certificates[0].sha1_fingerprint
  ]
}

trust-relationship/outputs.tf
output "oidc_arn" {
  value = aws_iam_openid_connect_provider.this.arn
}

output "oidc_url" {
  value = aws_iam_openid_connect_provider.this.url
}

OAuth2 Provider Trust Relationship

Bear in mind that the trust relationship from one service provider to the next is a one-way street. That is, if AWS trusts the identities of Azure AD, that by no means implies that Azure AD trusts the identities of AWS in return.

That is, unless the returning trust relationship is also established on its own.

OIDC trust relationship does not imply a two-way trust relationship.

Now, let's explain the above TF code for further clarity.

OpenID Connect Audience

Notice that the client_id_list must include a value. Based on my findings so far, I couldn't find a way to customize this audience field anywhere in Azure. I'd be more than happy to be proven wrong by a diligent reader. 🤗

But, until then, it's safe to assume that the audience of the JWT token is what you see in this TF code. 🤷

However, when we get to the AWS side, we normally want to rest assured that not every identity of the given Identity Provider is able to assume our role. That's where we place a condition on the sub claim of the JWT token, as you will see shortly.
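
As a rough mental model of these two checks, consider the Python sketch below. It is not AWS's actual implementation, which is internal to AWS; the function name and its arguments are hypothetical and only mirror the aud and sub claims discussed here.

```python
def may_assume_role(claims: dict, allowed_audiences: list, expected_subject: str) -> bool:
    """Model the two gates: the provider's audience list and the role's sub condition."""
    # Gate 1: the token's audience must be registered on the OIDC provider
    # (the client_id_list in the TF code).
    audience_ok = claims.get("aud") in allowed_audiences
    # Gate 2: the role's trust policy only lets one specific subject through.
    subject_ok = claims.get("sub") == expected_subject
    return audience_ok and subject_ok
```

Any other identity in the same tenant would pass the first gate but fail the second one.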

OIDC URL

Additionally, pay close attention to the URL of our OpenID Connect provider. It is specific to Azure AD, and its format is just as you see in the code above, with sts.windows.net as the hostname and the tenant ID in the URL path.

trust-relationship/main.tf
  tenant_url = format("https://sts.windows.net/%s/", local.tenant_id)

As per the OIDC discovery specification [2], one can fetch the OIDC configuration from this URL by issuing the following HTTP request:

TENANT_ID="00000000-0000-0000-0000-000000000000"
curl "https://sts.windows.net/$TENANT_ID/.well-known/openid-configuration"

And the response, of course, lists all the required and extended functionalities of Azure AD as far as OIDC is concerned.

{
  "authorization_endpoint": "https://login.windows.net/00000000-0000-0000-0000-000000000000/oauth2/authorize",
  "check_session_iframe": "https://login.windows.net/00000000-0000-0000-0000-000000000000/oauth2/checksession",
  "claims_supported": [
    "sub",
    "iss",
    "cloud_instance_name",
    "cloud_instance_host_name",
    "cloud_graph_host_name",
    "msgraph_host",
    "aud",
    "exp",
    "iat",
    "auth_time",
    "acr",
    "amr",
    "nonce",
    "email",
    "given_name",
    "family_name",
    "nickname"
  ],
  "cloud_graph_host_name": "graph.windows.net",
  "cloud_instance_name": "microsoftonline.com",
  "device_authorization_endpoint": "https://login.windows.net/00000000-0000-0000-0000-000000000000/oauth2/devicecode",
  "end_session_endpoint": "https://login.windows.net/00000000-0000-0000-0000-000000000000/oauth2/logout",
  "frontchannel_logout_supported": true,
  "http_logout_supported": true,
  "id_token_signing_alg_values_supported": [
    "RS256"
  ],
  "issuer": "https://sts.windows.net/00000000-0000-0000-0000-000000000000/",
  "jwks_uri": "https://login.windows.net/common/discovery/keys",
  "kerberos_endpoint": "https://login.windows.net/00000000-0000-0000-0000-000000000000/kerberos",
  "microsoft_multi_refresh_token": true,
  "msgraph_host": "graph.microsoft.com",
  "rbac_url": "https://pas.windows.net",
  "response_modes_supported": [
    "query",
    "fragment",
    "form_post"
  ],
  "response_types_supported": [
    "code",
    "id_token",
    "code id_token",
    "token id_token",
    "token"
  ],
  "scopes_supported": [
    "openid"
  ],
  "subject_types_supported": [
    "pairwise"
  ],
  "tenant_region_scope": "EU",
  "token_endpoint": "https://login.windows.net/00000000-0000-0000-0000-000000000000/oauth2/token",
  "token_endpoint_auth_methods_supported": [
    "client_secret_post",
    "private_key_jwt",
    "client_secret_basic"
  ],
  "userinfo_endpoint": "https://login.windows.net/00000000-0000-0000-0000-000000000000/openid/userinfo"
}
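
For readers who prefer to script this check, here is a small Python sketch that builds the same discovery URL and pulls out the fields that matter for the trust relationship. The function names are hypothetical, and fetch_discovery performs a live HTTP request, so it is defined here but only meant to be called when you have network access.

```python
import json
import urllib.request


def discovery_url(tenant_id: str) -> str:
    """Build the OIDC discovery URL for an Azure AD tenant."""
    return f"https://sts.windows.net/{tenant_id}/.well-known/openid-configuration"


def fetch_discovery(tenant_id: str) -> dict:
    """Download and parse the discovery document (requires network access)."""
    with urllib.request.urlopen(discovery_url(tenant_id), timeout=10) as response:
        return json.load(response)


def summarize(document: dict, tenant_id: str) -> dict:
    """Keep only the parts relevant to the AWS trust relationship."""
    expected_issuer = f"https://sts.windows.net/{tenant_id}/"
    return {
        # The issuer must match the URL registered on the AWS side.
        "issuer_matches": document.get("issuer") == expected_issuer,
        # Where the public signing keys are published.
        "jwks_uri": document.get("jwks_uri"),
        "signing_algorithms": document.get("id_token_signing_alg_values_supported", []),
    }
```

Calling summarize(fetch_discovery(tenant_id), tenant_id) should report a matching issuer for the tenant used in the TF code.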

Applying the Stack

Apply this stack and we will have our trust relationship set up and ready for the next steps, where we will leverage this trust to create an IAM role.

export AWS_PROFILE="PLACEHOLDER"
export ARM_CLIENT_ID="PLACEHOLDER"
export ARM_TENANT_ID="PLACEHOLDER"

tofu init
tofu plan -out tfplan
tofu apply tfplan

AWS IAM Role

At this point, we should head over to AWS to create a new IAM Role with the proper conditions and trust relationship to Azure AD.

The idea is that, using the OpenID Connect provider created in the last step, we can now instruct AWS IAM to grant access to any identity that comes from this provider and has a specific subject claim in its JWT token.

If this all sounds a bit too vague, let's see some code to make it clearer. The TF code below will create an Azure user-assigned identity [3] as well as an AWS IAM Role that trusts this identity.

vm-identity/versions.tf
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.104"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.50"
    }
  }
  required_version = "< 2"
}

provider "azurerm" {
  features {}
}

vm-identity/variables.tf
variable "role_name" {
  type    = string
  default = "azure-vm"
}

vm-identity/main.tf
locals {
  oidc_arn = data.terraform_remote_state.trust_relationship.outputs.oidc_arn
  oidc_url = data.terraform_remote_state.trust_relationship.outputs.oidc_url
}

data "terraform_remote_state" "trust_relationship" {
  backend = "local"

  config = {
    path = "../trust-relationship/terraform.tfstate"
  }
}

resource "azurerm_resource_group" "this" {
  name     = "aws-oidc-rg"
  location = "Germany West Central"
}

resource "azurerm_user_assigned_identity" "this" {
  location            = azurerm_resource_group.this.location
  name                = "aws-oidc-identity"
  resource_group_name = azurerm_resource_group.this.name
}

data "aws_iam_policy_document" "trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"

    principals {
      type        = "Federated"
      identifiers = [local.oidc_arn]
    }

    condition {
      test     = "StringEquals"
      variable = "${local.oidc_url}:sub"
      values   = [azurerm_user_assigned_identity.this.principal_id]
    }
  }
}

resource "aws_iam_role" "this" {
  name               = var.role_name
  assume_role_policy = data.aws_iam_policy_document.trust.json
  managed_policy_arns = [
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
  ]
}

# Enable later ansible-playbook to fetch this value from remote API call
resource "aws_ssm_parameter" "role_arn" {
  name  = "/azure/oidc/role-arn"
  type  = "String"
  value = aws_iam_role.this.arn
}

vm-identity/outputs.tf
output "identity_id" {
  value = azurerm_user_assigned_identity.this.id
}

output "resource_group_name" {
  value = azurerm_resource_group.this.name
}

output "location" {
  value = azurerm_resource_group.this.location
}

output "role_arn" {
  value = aws_iam_role.this.arn
}

This stack contains two main components, which we'll explain below.

1. User Assigned Identity

The first one is an Azure User Assigned Identity. This will be the identity of our Virtual Machine in the next step. It is basically like a username assigned to the VM of our choice, and it is guaranteed to be unique and persistent; that is why we can rely on its ID when placing the condition on our IAM Role.

vm-identity/main.tf
resource "azurerm_user_assigned_identity" "this" {
  location            = azurerm_resource_group.this.location
  name                = "aws-oidc-identity"
  resource_group_name = azurerm_resource_group.this.name
}

If you're not an Azure user, one thing to keep in mind is that in Azure every single resource has to be placed in a resource group [4]. That makes grouping easier on the organizational as well as the billing side of things.

2. IAM Role and Trust Relationship

The second component is the IAM Role itself. It is the role that will be assumed by the VM in the next step. There is only one identity in the whole world that can assume this role, because of the condition we placed on the sub claim of the JWT token presented to the AWS STS service, as you see below:

vm-identity/main.tf
    condition {
      test     = "StringEquals"
      variable = "${local.oidc_url}:sub"
      values   = [azurerm_user_assigned_identity.this.principal_id]
    }

This principal ID is also interchangeably called the object ID; as if working in an Azure environment wasn't confusing enough already! 😖

In the end, once this stack is also deployed just as the one before, we will have an IAM Role similar to what you see below:

{
  "Role": {
    "Arn": "arn:aws:iam::XXXXXXXXXXXX:role/azure-vm",
    "AssumeRolePolicyDocument": {
      "Statement": [
        {
          "Action": "sts:AssumeRoleWithWebIdentity",
          "Condition": {
            "StringEquals": {
              "sts.windows.net/00000000-0000-0000-0000-000000000000/:sub": "828b741f-e7af-4737-9490-770b12926479"
            }
          },
          "Effect": "Allow",
          "Principal": {
            "Federated": "arn:aws:iam::XXXXXXXXXXXX:oidc-provider/sts.windows.net/00000000-0000-0000-0000-000000000000/"
          }
        }
      ],
      "Version": "2012-10-17"
    },
    "CreateDate": "2024-05-24T02:37:24+00:00",
    "MaxSessionDuration": 3600,
    "Path": "/",
    "RoleId": "AROA6AMOBUU5EVQ54KDCE",
    "RoleLastUsed": {
      "LastUsedDate": "2024-05-24T04:10:44+00:00",
      "Region": "eu-central-1"
    },
    "RoleName": "azure-vm"
  }
}
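
For illustration, the same trust policy document can be assembled in a few lines of Python. This is only a sketch with a hypothetical helper name; it assumes, as the output above suggests, that the condition key is the provider URL without its https:// scheme, followed by :sub.

```python
def trust_policy(provider_arn: str, provider_url: str, subject: str) -> dict:
    """Build an assume-role policy that trusts exactly one federated subject."""
    # IAM condition keys for OIDC providers drop the URL scheme.
    condition_key = provider_url.removeprefix("https://") + ":sub"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Principal": {"Federated": provider_arn},
                "Condition": {"StringEquals": {condition_key: subject}},
            }
        ],
    }
```

Passing the output through json.dumps gives a document you can compare with the AssumeRolePolicyDocument of the deployed role.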

Azure Virtual Machine

At this point, all is ready from an administrative and managerial point of view. We only need to create the VM, let it know which IAM Role it should assume, and make a test API call to AWS to list the S3 buckets.

If that works, all this has been successful.

Therefore, we have two main objectives:

  1. Create the Azure VM using TF code for the provisioning stage.
  2. Wait a bit for the VM to be ready and then run an Ansible playbook to take care of the rest.

In Azure, any VM with an identity attached can fetch an access token [5]. You can grant such an identity permissions inside and outside the Azure cloud. For us, this is going to be AWS.

The identity of the VM is the most critical part of this next step; without it, the whole operation would be meaningless. The identity of a VM in Azure is as if the VM had been given username and password credentials during the provisioning stage, with which it can fetch a short-lived access token from Azure AD [6].

Let's stop talking and actually create the VM. Bear in mind, though, that this stack is quite heavy and needs careful attention to detail.
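
As far as I can tell, such a token ultimately comes from the Azure Instance Metadata Service (IMDS), which is only reachable from inside the VM. The Python sketch below shows what a direct request might look like; it assumes the 2018-02-01 API version and the audience used earlier in this post, the helper names are hypothetical, and the Azure CLI route used later in this guide remains the method this post actually relies on.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Link-local address of the metadata service, reachable only from within the VM.
IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_token_request(resource: str, client_id: Optional[str] = None) -> urllib.request.Request:
    """Prepare the IMDS request for a managed identity access token."""
    query = {"api-version": "2018-02-01", "resource": resource}
    if client_id:
        # Selects one user-assigned identity when several are attached.
        query["client_id"] = client_id
    url = f"{IMDS_TOKEN_ENDPOINT}?{urllib.parse.urlencode(query)}"
    # IMDS rejects requests that do not carry this header.
    return urllib.request.Request(url, headers={"Metadata": "true"})


def fetch_access_token(resource: str, client_id: Optional[str] = None) -> str:
    """Call IMDS and return the raw JWT (works only on an Azure VM)."""
    request = build_token_request(resource, client_id)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)["access_token"]
```

No password or key appears anywhere in the request: being the VM is the credential.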

azure-vm/versions.tf
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.104"
    }
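    http = {
      source  = "hashicorp/http"
      version = "~> 3.4"
    }
    # Required by the tls_private_key resource in compute.tf
    tls = {
      source  = "hashicorp/tls"
      version = "~> 4.0"
    }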
  }
  required_version = "< 2"
}

provider "azurerm" {
  features {}
}

Notice that we're, again, using the outputs from our earlier TF stack by querying the TF state file in this code. 👇

azure-vm/main.tf
data "terraform_remote_state" "vm_identity" {
  backend = "local"

  config = {
    path = "../vm-identity/terraform.tfstate"
  }
}

locals {
  identity_id         = data.terraform_remote_state.vm_identity.outputs.identity_id
  resource_group_name = data.terraform_remote_state.vm_identity.outputs.resource_group_name
  location            = data.terraform_remote_state.vm_identity.outputs.location
  role_arn            = data.terraform_remote_state.vm_identity.outputs.role_arn
}

azure-vm/network.tf
data "http" "admin_public_ip" {
  url = "https://checkip.amazonaws.com"
}

resource "azurerm_virtual_network" "this" {
  name                = "oidc-vnet"
  address_space       = ["100.0.0.0/16"]
  location            = local.location
  resource_group_name = local.resource_group_name
}

resource "azurerm_subnet" "this" {
  name                 = "oidc-subnet"
  resource_group_name  = local.resource_group_name
  virtual_network_name = azurerm_virtual_network.this.name
  address_prefixes     = ["100.0.2.0/24"]
}

resource "azurerm_public_ip" "this" {
  name                = "oidc-pip"
  resource_group_name = local.resource_group_name
  location            = local.location
  allocation_method   = "Static"
  ip_version          = "IPv4"
}

resource "azurerm_network_interface" "this" {
  name                = "oidc-nic"
  location            = local.location
  resource_group_name = local.resource_group_name

  ip_configuration {
    name                          = "ipv4"
    subnet_id                     = azurerm_subnet.this.id
    private_ip_address_allocation = "Dynamic"
    public_ip_address_id          = azurerm_public_ip.this.id
  }
}

resource "azurerm_network_security_group" "this" {
  name                = "oidc-nsg"
  location            = local.location
  resource_group_name = local.resource_group_name

  security_rule {
    name                       = "admin"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "*"
    source_address_prefixes    = [
      trimspace(data.http.admin_public_ip.response_body),
    ]
    destination_address_prefix = "*"
  }
}

resource "azurerm_network_interface_security_group_association" "this" {
  network_interface_id      = azurerm_network_interface.this.id
  network_security_group_id = azurerm_network_security_group.this.id
}

azure-vm/compute.tf
resource "tls_private_key" "this" {
  algorithm = "RSA"
}

resource "azurerm_ssh_public_key" "this" {
  name                = "oidc-ssh-key"
  resource_group_name = local.resource_group_name
  location            = local.location
  public_key          = tls_private_key.this.public_key_openssh
}

resource "azurerm_linux_virtual_machine" "this" {
  name                = "oidc-vm"
  resource_group_name = local.resource_group_name
  location            = local.location
  size                = "Standard_B2pts_v2"
  admin_username      = "adminuser"
  network_interface_ids = [
    azurerm_network_interface.this.id,
  ]

  admin_ssh_key {
    username   = "adminuser"
    public_key = azurerm_ssh_public_key.this.public_key
  }

  os_disk {
    caching              = "None"
    storage_account_type = "Standard_LRS"
  }

  # https://learn.microsoft.com/en-us/azure/virtual-machines/linux/cli-ps-findimage
  source_image_reference {
    publisher = "Debian"
    offer     = "debian-13-daily"
    sku       = "13-arm64"
    version   = "latest"
  }

  user_data = base64encode(<<-EOF
    #cloud-config
    package_update: true
    package_upgrade: true
    packages:
      - jq
      - awscli
      - python3.12
    runcmd:
      - curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
  EOF
  )

  identity {
    type = "UserAssigned"
    identity_ids = [
      local.identity_id,
    ]
  }
}

azure-vm/outputs.tf
output "vm_public_ip" {
  value = azurerm_public_ip.this.ip_address
}

output "ssh_private_key" {
  value     = tls_private_key.this.private_key_pem
  sensitive = true
}

output "ansible_inventory_yml" {
  value = <<-EOF
    oidc:
      hosts:
        ${azurerm_public_ip.this.ip_address}:
          ansible_host: ${azurerm_public_ip.this.ip_address}
          ansible_user: adminuser
          ansible_ssh_private_key_file: /tmp/oidc-vm.pem
          ansible_ssh_common_args: "-o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o PasswordAuthentication=no"
  EOF
}

This stack deserves a good amount of explanation. Let's break down the components and provide proper details.

Networking

The networking part is as hefty as it is in AWS in terms of resources and their relationships. Keeping the networking resources in a separate file allows for better logical grouping and readability.

The Network Security Group (NSG) below only opens the ports to the admin public IP address, which is the IP address of the control machine applying this TF stack.

azure-vm/network.tf
data "http" "admin_public_ip" {
  url = "https://checkip.amazonaws.com"
}
  security_rule {
    name                       = "admin"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "*"
    source_address_prefixes    = [
      trimspace(data.http.admin_public_ip.response_body),
    ]
    destination_address_prefix = "*"
  }
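
The trimspace call matters because the IP lookup service returns the address followed by a newline. As a small Python sketch of the same clean-up, with a validation step added on top (the function name is hypothetical, and the IPv4-only check is an assumption based on the IPv4 setup in this stack):

```python
import ipaddress


def admin_source_prefix(response_body: str) -> str:
    """Turn the raw body of an IP lookup response into a clean IPv4 address."""
    # Same effect as trimspace() in the TF code: drop the trailing newline.
    address = ipaddress.ip_address(response_body.strip())
    if address.version != 4:
        raise ValueError(f"expected an IPv4 address, got {address}")
    return str(address)
```

Anything that is not a valid address raises a ValueError instead of silently ending up in a firewall rule.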

SSH Keys

Sadly enough, when specifying admin_ssh_key, Azure VMs do not accept SSH keys of types other than RSA [7]. Otherwise, the author's preference is ED25519. 🛡

Target Image

As for source_image_reference, be very careful when trying to reference an image in Azure. They do not make it easy to guess an image name or find your preferred image. It took me some time and struggle to come up with the following Debian image that has ARM64 support [8].

azure-vm/compute.tf
  source_image_reference {
    publisher = "Debian"
    offer     = "debian-13-daily"
    sku       = "13-arm64"
    version   = "latest"
  }

User Data

For the VM user data, we're leveraging cloud-init [9]. Do check it out if you haven't already, but know that I personally find it very limited in terms of functionality. In more complex cases, I'd rather run Ansible playbooks and save the golden image for further use.

In a nutshell, in the following config we're installing Azure CLI, AWS CLI, jq, and lastly Python 3.12 for our next Ansible playbook.

azure-vm/compute.tf
  user_data = base64encode(<<-EOF
    #cloud-config
    package_update: true
    package_upgrade: true
    packages:
      - jq
      - awscli
      - python3.12
    runcmd:
      - curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
  EOF
  )

Applying the Stack

Once you have the TF code ready, you can apply the stack as follows:

tofu init
tofu plan -out tfplan
tofu apply tfplan

# needed for the next step
tofu output -raw ssh_private_key > /tmp/oidc-vm.pem
chmod 600 /tmp/oidc-vm.pem
tofu output -raw ansible_inventory_yml > ../inventory.yml

In the end, this is what it will look like in the Azure Portal [10].

Azure Resource Group

Ansible Playbook

It's now time to test this whole setup with a call to AWS. If that call is successful, everything we've been working on so far has been fruitful. 🍇

The idea is to:

  1. Log in to Azure AD from within the VM using the user-assigned identity attached during the provisioning stage.
  2. Fetch an access token from Azure AD using that logged-in identity.
  3. Instruct the AWS CLI to use that access token for calls to AWS services.

The Ansible playbook looks like the one below.

playbook.yml
---
- name: Configure Azure VM Identity
  hosts: oidc
  become: false
  gather_facts: false
  vars:
    aws_profile: default
  tasks:
    - name: Login with identity
      ansible.builtin.command:
        cmd: az login -i --allow-no-subscriptions
        creates: ~/.azure/azureProfile.json
    - name: Fetch Azure access token
      ansible.builtin.command:
        cmd: az account get-access-token
      no_log: true
      changed_when: false
      register: azure_access_token
    - name: Place the VM identity token in ~/.azure/vm-identity-token
      ansible.builtin.copy:
        content: "{{ azure_access_token.stdout | from_json | json_query('accessToken') }}"
        dest: "~/.azure/vm-identity-token"
        mode: "0400"
    - name: Fetch role-arn from AWS SSM if not set from CLI
      ansible.builtin.set_fact:
        role_arn: "{{ lookup('amazon.aws.ssm_parameter', '/azure/oidc/role-arn', profile=aws_profile) }}"
      when: role_arn is not defined
    - name: Ensure ~/.aws/ dir exists
      ansible.builtin.file:
        path: "~/.aws"
        state: directory
        mode: "0700"
    - name: Create the AWS config
      ansible.builtin.copy:
        content: |
          [default]
          region = eu-central-1
          output = json
          web_identity_token_file = /home/{{ ansible_user }}/.azure/vm-identity-token
          role_arn = {{ role_arn }}
          role_session_name = azure-oidc-vm
        dest: "~/.aws/config"
        mode: "0400"
    - name: List AWS S3 buckets
      ansible.builtin.command:
        cmd: aws s3 ls
      changed_when: false

Let's explain this playbook a little bit.

Azure AD Login

We first need to log in to Azure AD. That is only possible because we have an identity attached to the VM.

You can see the screenshot below for the user assigned identity in Azure Portal.

Azure VM User Assigned Identity

And this is the task that leverages it.

playbook.yml
      ansible.builtin.command:
        cmd: az login -i --allow-no-subscriptions

Fetch Access Token

At this stage, we can use the newly logged-in identity to grab an access token to be used later on by the AWS CLI.

playbook.yml
    - name: Fetch Azure access token
      ansible.builtin.command:
        cmd: az account get-access-token
      no_log: true
      changed_when: false
      register: azure_access_token
    - name: Place the VM identity token in ~/.azure/vm-identity-token
      ansible.builtin.copy:
        content: "{{ azure_access_token.stdout | from_json | json_query('accessToken') }}"
        dest: "~/.azure/vm-identity-token"
        mode: "0400"
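
For comparison, the same two tasks could be sketched in plain Python. The function below is hypothetical; it assumes the JSON shape printed by az account get-access-token (with an accessToken member, as the playbook's json_query relies on) and writes the file with the same 0400 permissions.

```python
import json
import os
import tempfile


def store_access_token(cli_output: str, path: str) -> None:
    """Extract accessToken from the CLI's JSON output and save it read-only."""
    token = json.loads(cli_output)["accessToken"]
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, mode=0o700, exist_ok=True)

    # Write to a temporary file first: an existing token file is read-only
    # (0400), so it is swapped atomically instead of being opened for writing.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(token)
    os.chmod(tmp_path, 0o400)
    os.replace(tmp_path, path)
```

Swapping the file atomically also means the AWS CLI never reads a half-written token during a renewal.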

Access Token Expiry

By default, Azure AD access tokens are valid for 1 day and 5 minutes [11]. If you have a task that requires a valid token on every access, you can renew the token before it expires using a cron job or similar. ⏰
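
A renewal job needs to know how much lifetime is left. Here is a minimal Python sketch of that decision, based on the exp claim of an already decoded token; the function names and the five-minute safety margin are assumptions made for this example.

```python
import time
from typing import Optional


def seconds_until_expiry(claims: dict, now: Optional[float] = None) -> float:
    """Return how many seconds remain before the token's exp timestamp."""
    current = time.time() if now is None else now
    return claims["exp"] - current


def needs_renewal(claims: dict, margin_seconds: int = 300, now: Optional[float] = None) -> bool:
    """Renew once the remaining lifetime drops to the safety margin or below."""
    return seconds_until_expiry(claims, now) <= margin_seconds
```

With the iat and exp values of the sample token at the end of this post, the lifetime works out to 86700 seconds, which is exactly 1 day and 5 minutes.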

AWS Role ARN

Remember how, earlier in our TF code, we saved the role ARN so that Ansible could use it later? This is where it comes in.

vm-identity/main.tf
resource "aws_ssm_parameter" "role_arn" {
  name  = "/azure/oidc/role-arn"
  type  = "String"
  value = aws_iam_role.this.arn
}

playbook.yml
    - name: Fetch role-arn from AWS SSM if not set from CLI
      ansible.builtin.set_fact:
        role_arn: "{{ lookup('amazon.aws.ssm_parameter', '/azure/oidc/role-arn', profile=aws_profile) }}"
      when: role_arn is not defined

AWS Configuration

All is ready for the AWS CLI to grab the token, use it to authenticate to AWS IAM, and make an AWS call to list the S3 buckets. We just need to tell it where to pick up the token from [12].

playbook.yml
      ansible.builtin.copy:
        content: |
          [default]
          region = eu-central-1
          output = json
          web_identity_token_file = /home/{{ ansible_user }}/.azure/vm-identity-token
          role_arn = {{ role_arn }}
          role_session_name = azure-oidc-vm
        dest: "~/.aws/config"

No surprise here really; we are using the same ~/.azure/vm-identity-token path we populated earlier by fetching an access token from Azure AD.
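
With this configuration, the AWS CLI exchanges the token for temporary credentials by calling the STS AssumeRoleWithWebIdentity action on our behalf. The Python sketch below only builds the equivalent request URL to show which parameters are involved; the function name is hypothetical, and the regional endpoint is an assumption based on the region used in this post.

```python
import urllib.parse


def assume_role_url(role_arn: str, session_name: str, token: str,
                    region: str = "eu-central-1") -> str:
    """Build the STS query URL that trades a web identity token for credentials."""
    query = urllib.parse.urlencode({
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        # The JWT issued by Azure AD; no AWS credentials sign this request.
        "WebIdentityToken": token,
    })
    return f"https://sts.{region}.amazonaws.com/?{query}"
```

Notice that the three inputs map one-to-one to role_arn, role_session_name, and the contents of web_identity_token_file in the config above.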

Running the Playbook

Of course the playbook runs and successfully lists the S3 buckets as we expected.

ansible-playbook playbook.yml -e aws_profile=$AWS_PROFILE

Bonus: JWT Claims

If you decode the access token given to the VM by Azure AD (~/.azure/vm-identity-token), you will see the following claims in the JWT token [13].

{
  "aio": "E2NgYMg8Gbb806e+uKXTa+/9sfu7GwA=",
  "appid": "00000000-0000-0000-0000-000000000000",
  "appidacr": "2",
  "aud": "https://management.core.windows.net/",
  "exp": 1716610771,
  "iat": 1716524071,
  "idp": "https://sts.windows.net/00000000-0000-0000-0000-000000000000/",
  "idtyp": "app",
  "iss": "https://sts.windows.net/00000000-0000-0000-0000-000000000000/",
  "nbf": 1716524071,
  "oid": "828b741f-e7af-4737-9490-770b12926479",
  "rh": "0.AXkAC4levVMMqkiN4sz38azlP0ZIf3kAutdPukPawfj2MBOUAAA.",
  "sub": "828b741f-e7af-4737-9490-770b12926479",
  "tid": "00000000-0000-0000-0000-000000000000",
  "uti": "2BN2WjhkWE2mjPzyHC8hAA",
  "ver": "1.0",
  "xms_az_rid": "/subscriptions/00000000-0000-0000-0000-000000000000/resourcegroups/aws-oidc-rg/providers/Microsoft.Compute/virtualMachines/oidc-vm",
  "xms_mirid": "/subscriptions/00000000-0000-0000-0000-000000000000/resourcegroups/aws-oidc-rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/aws-oidc-identity",
  "xms_tcdt": "1650467252"
}
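
If you'd like to decode the token yourself, a minimal Python sketch using only the standard library could look like this. Note that it only reads the claims and does not verify the signature; decode_claims is a name made up for this example.

```python
import base64
import json


def decode_claims(token: str) -> dict:
    """Return the claims of a compact JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated parts")
    payload = parts[1]
    # JWTs strip base64 padding, so restore it before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

On the VM, the input would be the content of ~/.azure/vm-identity-token, and the sub claim in the result is the value the IAM Role condition matches against.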

Conclusion

This blog post should wash away all doubts you might have had about the potential of OpenID Connect and how it can improve the security posture of your platform.

This is not the only use-case of OIDC, yet you can see the huge gain we've achieved by not passing around secrets where we didn't need to!

OIDC is also used whenever you, as a user, try to log in to a third-party website and choose "Login with Example".

There are many benefits to integrating such a powerful technology into your platform, some of the most important of which you have already witnessed with your own eyes in this blog post.

After writing so much about OIDC, I feel happy and peaceful, as if I were drugged; yet I still feel like I've only scratched the surface, and there are many more implementations of OIDC yet to be explored between many of our online services.

If this blog post has piqued your interest in OIDC, stay tuned for more as I am as fascinated as you are and I will be writing more about it in the future.

Until then, happy OIDC-ing! 👋
