
Azure Shared Image Gallery

In recent years, Azure has made it possible to share VM images across regions, letting you build a golden image once and share it, either publicly with the community or privately within your organization.

However, neither the AzureRM provider documentation for OpenTofu nor the Azure documentation offers a clear, working example you can refer to. That's why I'm sharing my struggle, so that you don't have to go through the same.

Creating the Linux VM

First things first, we need to create the virtual machine. I'm creating the Linux VM using the example from the azurerm provider documentation in the registry.

compute-v1.tf
# ref: https://registry.terraform.io/providers/hashicorp/azurerm/3.91.0/docs/resources/linux_virtual_machine

resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"
}

resource "azurerm_virtual_network" "example" {
  name                = "example-network"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
}

resource "azurerm_subnet" "example" {
  name                 = "internal"
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.0.2.0/24"]
}

resource "azurerm_network_interface" "example" {
  name                = "example-nic"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name

  ip_configuration {
    name                          = "internal"
    subnet_id                     = azurerm_subnet.example.id
    private_ip_address_allocation = "Dynamic"
  }
}

resource "azurerm_linux_virtual_machine" "example" {
  name                = "example-machine"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  size                = "Standard_F2"
  admin_username      = "adminuser"
  network_interface_ids = [
    azurerm_network_interface.example.id,
  ]

  admin_ssh_key {
    username   = "adminuser"
    public_key = file("~/.ssh/id_rsa.pub")
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts"
    version   = "latest"
  }
}

This setup works fine, except that the VM has no public IP address, so I won't be able to SSH into the machine at all.

Public access also requires a proper firewall rule, which on Azure means a network security group (NSG) that allows SSH, restricted here to my own IP address.

On top of that, SSH needs a key pair for authentication. Here I let OpenTofu generate one with the tls provider, so the private key can be reused for Ansible later on.

The modified version looks like the following.

compute-v2.tf
resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"
}

resource "azurerm_virtual_network" "example" {
  name                = "example-network"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
}

resource "azurerm_subnet" "example" {
  name                 = "internal"
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.0.2.0/24"]
}

resource "azurerm_public_ip" "example" {
  name                = "example-public-ip"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  sku                 = "Standard"
  ip_version          = "IPv4"
  allocation_method   = "Static"
}

resource "azurerm_network_interface" "example" {
  name                = "example-nic"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name

  ip_configuration {
    name                          = "internal"
    subnet_id                     = azurerm_subnet.example.id
    private_ip_address_allocation = "Dynamic"

    public_ip_address_id = azurerm_public_ip.example.id
  }
}

resource "tls_private_key" "example" {
  algorithm = "RSA"
  rsa_bits  = 3072
}

resource "azurerm_ssh_public_key" "example" {
  name                = "example-ssh-public-key"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  public_key          = tls_private_key.example.public_key_openssh
}


resource "azurerm_linux_virtual_machine" "example" {
  name                = "example-machine"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  size                = "Standard_F2"
  admin_username      = "adminuser"
  network_interface_ids = [
    azurerm_network_interface.example.id,
  ]

  admin_ssh_key {
    username   = "adminuser"
    public_key = azurerm_ssh_public_key.example.public_key
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts"
    version   = "latest"
  }
}

data "http" "my_ip" {
  url    = "https://ifconfig.me"
  method = "GET"
}

resource "azurerm_network_security_group" "example" {
  name                = "example-nsg"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name

  security_rule {
    name                       = "SSH"
    priority                   = 1000
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "22"
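    # http provider 3.x: "body" is deprecated in favour of "response_body";
    # chomp() strips any trailing newline from the response.
    source_address_prefix      = chomp(data.http.my_ip.response_body)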
    destination_address_prefix = "*"
  }
}

resource "azurerm_network_interface_security_group_association" "example" {
  network_interface_id      = azurerm_network_interface.example.id
  network_security_group_id = azurerm_network_security_group.example.id
}

Perfect! Now I have a VM in my Azure account that I can SSH into for further customization before creating the image.

Customize the VM

To keep things simple, let's just install MongoDB Community Edition on it and get on with it.

I'm using Ansible here, but you're free to SSH directly into the machine and run the commands by hand.

Before I can run Ansible against the target machine, I need to create my inventory.

inventory.tf
locals {
  cwd = path.cwd
  key_filepath = "${path.cwd}/azure_vm.key"
}

resource "local_sensitive_file" "ssh_private_key" {
  content         = tls_private_key.example.private_key_pem
  filename        = local.key_filepath
  file_permission = "0400"
}


resource "local_file" "inventory" {
  content = <<-EOT
    azure:
      hosts:
        azure-vm0:
          ansible_host: ${azurerm_public_ip.example.ip_address}
          ansible_user: adminuser
          ansible_ssh_private_key_file: ${local.key_filepath}
          ansible_ssh_common_args: '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
  EOT

  filename        = "${local.cwd}/inventory.yml"
  file_permission = "0640"
}
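If you take the manual SSH route instead of Ansible, a small output can print the SSH command for you. This is a sketch: the output name ssh_command is my own (hypothetical), and it is only useful before the bootstrap step in playbook.tf runs, because that step deprovisions the admin user and generalizes the VM.

```hcl
# Hypothetical convenience output: prints the SSH command for the VM,
# reusing the key path from inventory.tf and the public IP from compute-v2.tf.
# Only useful before playbook.tf deprovisions and generalizes the VM.
output "ssh_command" {
  value = "ssh -i ${local.key_filepath} adminuser@${azurerm_public_ip.example.ip_address}"
}
```

After an apply, tofu output -raw ssh_command prints the command without quotes.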

Now I can either run Ansible from a null_resource, or run ansible-playbook from the CLI myself. I prefer the former, since it is repeatable across runs.

playbook.tf
resource "null_resource" "bootstrap" {
  connection {
    type        = "ssh"
    host        = azurerm_public_ip.example.ip_address
    user        = "adminuser"
    private_key = tls_private_key.example.private_key_pem
  }

  provisioner "local-exec" {
    # To account for cloud-init operations in the newly created VM
    command = "sleep 120"
  }

  provisioner "local-exec" {
    # Point Ansible at the inventory generated by inventory.tf
    command = "cd ${local.cwd} && ansible-playbook -i inventory.yml bootstrap.yml"
  }

  provisioner "remote-exec" {
    inline = [
      "sudo waagent -deprovision+user -force",
    ]
  }

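  provisioner "local-exec" {
    command = "az vm deallocate --resource-group ${azurerm_resource_group.example.name} --name ${azurerm_linux_virtual_machine.example.name}"
  }

  provisioner "local-exec" {
    command = "az vm generalize --resource-group ${azurerm_resource_group.example.name} --name ${azurerm_linux_virtual_machine.example.name}"
  }

  triggers = {
    vm_id = azurerm_linux_virtual_machine.example.id
  }

  depends_on = [
    local_file.inventory,
    # Ansible reads the private key from disk, so it must be written first.
    local_sensitive_file.ssh_private_key,
    # SSH only works once the NSG allowing port 22 is attached to the NIC.
    azurerm_network_interface_security_group_association.example,
    azurerm_linux_virtual_machine.example,
    azurerm_public_ip.example,
  ]
}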

Installing MongoDB

One last piece of customizing the VM is installing the dependencies we need. Here's the playbook I'm using.

bootstrap.yml
- name: Install MongoDB on the Azure VM
  # "azure" is the host group defined in the generated inventory.yml
  hosts: azure
  # apt, dpkg and writing under /etc require root privileges
  become: true
  tasks:
    - name: Install curl & gnupg
      ansible.builtin.apt:
        name:
          - curl
          - gnupg
        state: present
        update_cache: true
    - name: Install Mongo dependencies
      block:
        - name: Add jammy-security repository to sources.list.d
          ansible.builtin.lineinfile:
            path: /etc/apt/sources.list.d/jammy-security.list
            line: "deb http://security.ubuntu.com/ubuntu jammy-security main"
            create: true
            state: present
            mode: "0644"
        - name: Install MongoDB GPG key
          ansible.builtin.get_url:
            url: https://pgp.mongodb.com/server-6.0.asc
            dest: /usr/share/keyrings/mongodb-server-6.0.asc
            mode: "0644"
    - name: Add MongoDB repository to sources.list.d
      ansible.builtin.lineinfile:
        path: /etc/apt/sources.list.d/mongodb-org-6.0.list
        line: "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-6.0.asc ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/6.0 multiverse"
        create: true
        state: present
        mode: "0644"
    - name: Install MongoDB community version
      ansible.builtin.apt:
        name: mongodb-org
        state: present
        update_cache: true
    - name: Hold MongoDB packages
      ansible.builtin.dpkg_selections:
        name: "{{ item }}"
        selection: hold
      loop:
        - mongodb-org
        - mongodb-org-database
        - mongodb-org-server
        - mongodb-mongosh
        - mongodb-org-mongos
        - mongodb-org-tools
    - name: Set ulimit
      # limits.d files use "<domain> <type> <item> <value>"; type "-" sets both soft and hard.
      ansible.builtin.copy:
        dest: /etc/security/limits.d/99-mongodb-nproc.conf
        content: |
          # file size
          mongodb - fsize unlimited
          # cpu time
          mongodb - cpu unlimited
          # virtual memory size
          mongodb - as unlimited
          # locked-in-memory size
          mongodb - memlock unlimited
          # open files
          mongodb - nofile 64000
          # processes/threads
          mongodb - nproc 64000
        mode: "0644"
    - name: Set configuration
      ansible.builtin.copy:
        content: |
          storage:
            dbPath: "/var/lib/mongodb"
            directoryPerDB: true
          systemLog:
            destination: file
            path: "/var/log/mongodb/mongod.log"
            logAppend: true
          processManagement:
            fork: true
          net:
            bindIp: 127.0.0.1
            port: 27017
          setParameter:
            enableLocalhostAuthBypass: true
          security:
            authorization: enabled
        dest: /etc/mongod.conf
        mode: "0644"
        owner: mongodb
        group: mongodb
    - name: Start service
      ansible.builtin.systemd:
        name: mongod
        state: started
        enabled: true
        daemon_reload: true

That's it. After applying this stack with tofu apply, I have a generalized VM ready to capture an image from.

Whether to generalize is something you should decide for yourself, as there are pros and cons to both generalized and specialized images. For the purpose of this article, I'm using a generalized image because there is nothing special about my VM, and nothing in my setup rules out generalization.
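For comparison, if you did need a specialized image, the difference starts at the image definition, which carries a specialized flag. Below is a hedged sketch, assuming the gallery declared in the image files below; the resource name, image name and SKU are hypothetical. With a specialized image you would also skip the waagent deprovision and az vm generalize steps, and, as far as I know, a managed image can only be captured from a generalized VM, so the image version would have to come from another source such as an OS disk snapshot.

```hcl
# Hypothetical image definition for a *specialized* image, for comparison only.
# The gallery is the one declared in the image-v*.tf files.
resource "azurerm_shared_image" "specialized" {
  name                = "my-specialized-image"
  gallery_name        = azurerm_shared_image_gallery.example.name
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  os_type             = "Linux"

  # Marks the OS disk as specialized rather than generalized (defaults to false).
  specialized = true

  # A distinct (hypothetical) SKU keeps it apart from the generalized definition.
  identifier {
    publisher = "PublisherName"
    offer     = "OfferName"
    sku       = "ExampleSpecializedSku"
  }
}
```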

Create the Image

Running the stack so far creates a generalized VM with my dependencies installed. Now I'm ready to create an image from it.

One requirement here is that I want to be able to use this image in other Azure regions. At the time of writing, Azure offers the Azure Compute Gallery (formerly known as the Shared Image Gallery, hence the azurerm_shared_image_* resource names), which can replicate the same image across regions.

The alternative is to create the same image separately in each region, which is an obvious waste of resources and money.
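Replication is configured on the image version resource (the one in image-v3.tf further down) through its repeatable target_region block, one block per region. A hedged sketch of a variant of that resource, driven by a hypothetical image_regions variable; North Europe is only an example second region, and I keep the gallery's own region in the list. Use it instead of the version resource in image-v3.tf, not alongside it.

```hcl
# Hypothetical variable: every region the image version should be replicated to.
variable "image_regions" {
  type    = list(string)
  default = ["West Europe", "North Europe"]
}

# Variant of azurerm_shared_image_version from image-v3.tf (use one or the other).
resource "azurerm_shared_image_version" "example" {
  name                = "0.0.1"
  gallery_name        = azurerm_shared_image.example.gallery_name
  image_name          = azurerm_shared_image.example.name
  resource_group_name = azurerm_shared_image.example.resource_group_name
  location            = azurerm_shared_image.example.location
  managed_image_id    = azurerm_image.example.id

  # Generates one target_region block per entry in var.image_regions.
  dynamic "target_region" {
    for_each = var.image_regions
    content {
      name                   = target_region.value
      regional_replica_count = 1
      storage_account_type   = "Standard_LRS"
    }
  }
}
```

Adding a region to the list and re-applying should add a replica of the same version there, instead of building the image again in that region.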

Let's create the image with the following resources.

image-v1.tf
# ref: https://registry.terraform.io/providers/hashicorp/azurerm/3.91.0/docs/resources/shared_image

# azurerm_resource_group.example is already declared in compute-v2.tf;
# declaring it again in the same module would be a duplicate resource.

resource "azurerm_shared_image_gallery" "example" {
  name                = "example_image_gallery"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  description         = "Shared images and things."

  tags = {
    Hello = "There"
    World = "Example"
  }
}

resource "azurerm_shared_image" "example" {
  name                = "my-image"
  gallery_name        = azurerm_shared_image_gallery.example.name
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  os_type             = "Linux"

  identifier {
    publisher = "PublisherName"
    offer     = "OfferName"
    sku       = "ExampleSku"
  }
}

Now this is where it gets tricky, because so far this only creates the gallery and an image definition. It doesn't give you an actual image, nor does it let you create VM instances from it later on.

For that, you need an image version.

image-v2.tf
# azurerm_resource_group.example is already declared in compute-v2.tf;
# declaring it again in the same module would be a duplicate resource.

resource "azurerm_shared_image_gallery" "example" {
  name                = "example_image_gallery"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  description         = "Shared images and things."

  tags = {
    Hello = "There"
    World = "Example"
  }
}

resource "azurerm_shared_image" "example" {
  name                = "my-image"
  gallery_name        = azurerm_shared_image_gallery.example.name
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  os_type             = "Linux"

  identifier {
    publisher = "PublisherName"
    offer     = "OfferName"
    sku       = "ExampleSku"
  }
}

resource "azurerm_shared_image_version" "example" {
  name                = "0.0.1"
  gallery_name        = azurerm_shared_image.example.gallery_name
  image_name          = azurerm_shared_image.example.name
  resource_group_name = azurerm_shared_image.example.resource_group_name
  location            = azurerm_shared_image.example.location

  target_region {
    name                   = azurerm_shared_image.example.location
    regional_replica_count = 1
    storage_account_type   = "Standard_LRS"
  }
}

At this point you might be happy with it and call it a day. But applying this fails with the following error.

Bash
 "managed_image_id": one of `blob_uri,managed_image_id,os_disk_snapshot_id`
 must be specified

Troubleshooting

So what does this mean in plain English?

In simple terms, the "version" you are trying to create is little more than a tag: it has to point at an actual image source. Think of Docker tags if that analogy helps. As the error says, that source can be a managed image (managed_image_id), an OS disk snapshot (os_disk_snapshot_id) or a VHD blob (blob_uri).

The whole point of this article is that, with the VM-based workflow above, you will not get through without creating an actual azurerm_image resource. That is the real image created underneath, and without it you cannot have an image version.

Again, to stretch the analogy, imagine trying to create a Docker tag without having the image in the first place.

That's what this whole thing is about. To get around it, you need to create the managed image as well, as you can see below.

image-v3.tf
# azurerm_resource_group.example is already declared in compute-v2.tf;
# declaring it again in the same module would be a duplicate resource.

resource "azurerm_shared_image_gallery" "example" {
  name                = "example_image_gallery"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  description         = "Shared images and things."

  tags = {
    Hello = "There"
    World = "Example"
  }
}

resource "azurerm_shared_image" "example" {
  name                = "my-image"
  gallery_name        = azurerm_shared_image_gallery.example.name
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  os_type             = "Linux"

  identifier {
    publisher = "PublisherName"
    offer     = "OfferName"
    sku       = "ExampleSku"
  }
}

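resource "azurerm_image" "example" {
  name                      = "exampleimage"
  location                  = azurerm_linux_virtual_machine.example.location
  resource_group_name       = azurerm_linux_virtual_machine.example.resource_group_name
  source_virtual_machine_id = azurerm_linux_virtual_machine.example.id

  # The VM must be deprovisioned, deallocated and generalized (playbook.tf)
  # before it can be captured; nothing else in this block implies that order.
  depends_on = [null_resource.bootstrap]
}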

resource "azurerm_shared_image_version" "example" {
  name                = "0.0.1"
  gallery_name        = azurerm_shared_image.example.gallery_name
  image_name          = azurerm_shared_image.example.name
  resource_group_name = azurerm_shared_image.example.resource_group_name
  location            = azurerm_shared_image.example.location
  managed_image_id    = azurerm_image.example.id

  target_region {
    name                   = azurerm_shared_image.example.location
    regional_replica_count = 5
    storage_account_type   = "Standard_LRS"
  }
}
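With the version in place, the image can finally be used to create new VMs. A sketch, assuming the same resource group, subnet and generated SSH key as in compute-v2.tf; the from_image resource names are hypothetical. It needs its own NIC, because a network interface can only be attached to one VM at a time.

```hcl
# Hypothetical NIC for the new VM, in the subnet from compute-v2.tf.
resource "azurerm_network_interface" "from_image" {
  name                = "from-image-nic"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name

  ip_configuration {
    name                          = "internal"
    subnet_id                     = azurerm_subnet.example.id
    private_ip_address_allocation = "Dynamic"
  }
}

# Hypothetical VM created from the gallery image version instead of a marketplace image.
resource "azurerm_linux_virtual_machine" "from_image" {
  name                = "from-image-machine"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  size                = "Standard_F2"
  admin_username      = "adminuser"
  network_interface_ids = [
    azurerm_network_interface.from_image.id,
  ]

  # The image is generalized, so the new VM gets its own admin user and key.
  admin_ssh_key {
    username   = "adminuser"
    public_key = azurerm_ssh_public_key.example.public_key
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  # Replaces the source_image_reference block used for the marketplace image.
  source_image_id = azurerm_shared_image_version.example.id
}
```

source_image_id and source_image_reference are mutually exclusive, which is why the marketplace block is gone here. Pointing source_image_id at the image definition (azurerm_shared_image.example.id) instead should make Azure pick the latest version.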

Versions

To help with reproducibility, I will include the versions of the providers in this post.

versions.tf
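terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.92"
    }
    tls = {
      source  = "hashicorp/tls"
      version = "~> 4.0"
    }
    local = {
      source  = "hashicorp/local"
      version = "~> 2.4"
    }
    null = {
      source  = "hashicorp/null"
      version = "~> 3.2"
    }
    # Used by data.http.my_ip in compute-v2.tf (response_body needs 3.x).
    http = {
      source  = "hashicorp/http"
      version = "~> 3.4"
    }
  }

  required_version = "< 2"
}

# The azurerm provider refuses to initialize without a features block.
provider "azurerm" {
  features {}
}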

Source Code

The code for this post is available from the following link.

Source code

Conclusion

That pretty much solves everything. I couldn't have imagined it had to be done this way. But hey, this is Azure we're talking about.

The things I've seen in Azure are the kind I haven't seen anywhere else.

In no particular order, and in a non-exhaustive list, here are some horror stories:

  • Creating a parent and a child resource, then updating the parent in a way that forces a replacement, only for the provider to complain that it cannot delete the parent because the child is still referencing it. I mean, isn't the whole point of IaC to create, update and delete resources while the underlying provider takes care of the ugly work for you!?
  • The Azure Kubernetes module creates a child resource group for you, and for any other node pool you want to add to the cluster, you can't create a separate resource group; instead, you have to reference the same resource group to create the new node pool. 🤯

Some of these would have been fine if we hadn't been promised that IaC tools such as OpenTofu would spare you from going into the Azure portal and doing the manual chores yourself, the same chores the provider should have done for you.

But that's the whole point. We were promised that all of it would be the responsibility of the underlying provider. That promise doesn't hold, at least not in the case of Azure. 😥