How to Create Your Ansible Dynamic Inventory for AWS Cloud

Most modern software deployments these days benefit from containerization, with Kubernetes as the de-facto orchestration platform.

However, occasionally, I find myself in need of some Ansible provisioning and configuration management.

In this blog post, I will share how to create Ansible dynamic inventory in a way that avoids the need to write hard-coded IP addresses of the target hosts.

Introduction

Dynamic inventory is a technique that uses the cloud provider's API to fetch the IP addresses, and some initial metadata, of the remote host(s) before sending any request to the target(s).

This allows us to fetch dynamically allocated private or public IP addresses and use them as ansible_host in the inventory.

In a traditional Ansible setup, you would typically see a hosts file like the one below:

hosts.ini
[aws]
1.2.3.4
5.6.7.8

[azure]
9.10.11.12
13.14.15.16

With dynamic inventory, not only do we no longer need to memorize and/or hardcode those IP addresses, we also gain the flexibility of keeping our Infrastructure as Code (IaC) agnostic and portable, to a certain extent!
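Conceptually, a dynamic inventory boils down to "call the cloud API, group the results by tag". The Python sketch below illustrates that idea with a made-up API response; the field names are illustrative only, not the real shape boto3 returns:

```python
import json

# Hypothetical API response, standing in for what a cloud SDK
# (e.g. boto3's describe_instances) might return for running instances.
api_response = [
    {"private_ip": "10.0.1.52", "tags": {"inventory": "worker", "cloud": "aws"}},
    {"private_ip": "10.0.2.166", "tags": {"inventory": "worker", "cloud": "aws"}},
]

def build_inventory(instances):
    """Group instances into Ansible-style groups by their 'inventory' tag."""
    inventory = {}
    for inst in instances:
        group = "aws_" + inst["tags"]["inventory"]
        inventory.setdefault(group, []).append(inst["private_ip"])
    return inventory

print(json.dumps(build_inventory(api_response), indent=2))
```

The real plugin does this for us, so we never maintain those IP lists by hand.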

Prerequisites

  • I use Ansible v2 in these examples; ansible-core v2.18 to be explicit, as of writing.
  • You can simply follow along, or, if you want to create the resources yourself, you will need an AWS account.
  • Although provisioning the remote hosts is not the main aim of this article, I use OpenTofu v1.8 to create those instances.
  • Lastly, I prefer to use Terragrunt v0.x as a nice wrapper around TF. It gives me the flexibility to define dependencies and use outputs from other stacks.

The directory structure for this mini-project looks like the following:

.
├── ansible
│   ├── ansible.cfg
│   ├── inventory
│   │   ├── cloud.aws_ec2.yml
│   │   └── group_vars
│   │       ├── all.yml
│   │       ├── aws_bastion.yml
│   │       ├── aws_worker.yml
│   │       └── provider_aws.yml
│   └── requirements.txt
├── asg
│   ├── cloud-init.yml
│   ├── main.tf
│   ├── net.tf
│   ├── outputs.tf
│   ├── terragrunt.hcl
│   ├── variables.tf
│   └── versions.tf
└── bastion
    ├── cloud-init.yml
    ├── main.tf
    ├── net.tf
    ├── outputs.tf
    ├── terragrunt.hcl
    ├── variables.tf
    └── versions.tf

AWS AutoScaling Group (ASG)

At this initial step, I will create an autoscaling group with a pre-defined and minimal launch template, using a cloud-init YAML file.

This includes updating and upgrading the host on first boot, and installing the latest available python3 package (as required by Ansible).

Although not required, I will also create a custom AWS VPC.

Additionally, I will configure the AWS Security Group to allow SSH access only from hosts within the VPC, giving me the peace of mind that access is gated behind the private network.

For an additional layer of security, one might want to consider deploying AWS VPN!

With that said, let's roll up our sleeves & get our hands dirty. 🤓

asg/versions.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "< 6"
    }
  }
  required_version = "< 2"
}
asg/variables.tf
variable "tags" {
  type = map(string)
  default = {
    Name        = "worker"
    provisioner = "tofu"
    inventory   = "worker"
    cloud       = "aws"
  }
}
asg/net.tf
data "aws_availability_zones" "available" {
  state = "available"
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.17.0"

  name = "ansible"
  cidr = "10.0.0.0/16"

  azs             = data.aws_availability_zones.available.names
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  # Single NAT Gateway for all Availability Zones
  enable_nat_gateway     = true
  single_nat_gateway     = true
  one_nat_gateway_per_az = false

  tags = var.tags
}

resource "aws_security_group" "this" {
  name   = "worker"
  vpc_id = module.vpc.vpc_id
  tags   = var.tags
}

resource "aws_security_group" "bastion" {
  name   = "trusted-bastion"
  vpc_id = module.vpc.vpc_id
  tags = merge(
    var.tags,
    {
      Name = "trusted-bastion"
    }
  )
}

resource "aws_vpc_security_group_egress_rule" "this" {
  security_group_id = aws_security_group.this.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "-1" # all protocols
  tags              = var.tags
}

resource "aws_vpc_security_group_ingress_rule" "ssh" {
  security_group_id            = aws_security_group.this.id
  from_port                    = 22
  to_port                      = 22
  ip_protocol                  = "tcp"
  referenced_security_group_id = aws_security_group.bastion.id
}
asg/cloud-init.yml
#cloud-config
package_update: true
package_upgrade: true
packages:
  - python3
  - python3-pip
power_state:
  delay: 1
  mode: reboot
  message: Rebooting machine
asg/main.tf
data "aws_ami" "this" {
  most_recent = true

  owners = ["amazon"]

  filter {
    name   = "architecture"
    values = ["arm64"]
  }

  filter {
    name   = "name"
    values = ["al2023-ami-2023*"]
  }
}

locals {
  tls_public_key = file(pathexpand("~/.ssh/ansible-dynamic.pub"))
}

resource "aws_key_pair" "this" {
  key_name   = "tofu"
  public_key = local.tls_public_key
  tags       = var.tags
}

resource "aws_launch_template" "this" {
  name_prefix   = "ansible"
  image_id      = data.aws_ami.this.id
  instance_type = "t4g.nano"
  key_name      = aws_key_pair.this.key_name

  instance_market_options {
    market_type = "spot"
    spot_options {
      max_price = "0.01"
    }
  }

  vpc_security_group_ids = [
    aws_security_group.this.id,
  ]


  user_data = base64encode(file("${path.module}/cloud-init.yml"))

  tags = var.tags
}

resource "aws_autoscaling_group" "this" {
  name_prefix = "ansible"

  capacity_rebalance  = true
  desired_capacity    = 3
  max_size            = 3
  min_size            = 1
  vpc_zone_identifier = module.vpc.private_subnets
  launch_template {
    id      = aws_launch_template.this.id
    version = "$Latest"
  }

  dynamic "tag" {
    for_each = var.tags
    content {
      key                 = tag.key
      value               = tag.value
      propagate_at_launch = true
    }

  }

  timeouts {
    delete = "5m"
  }
}
Generate SSH key pair

The most straightforward way is to use the ssh-keygen command:

ssh-keygen -t rsa -N '' -C 'Ansible Dynamic Inventory' -f ~/.ssh/ansible-dynamic
asg/outputs.tf
output "aws_key_pair_name" {
  value = aws_key_pair.this.key_name
}

output "vpc_id" {
  value = module.vpc.vpc_id
}

output "public_subnets" {
  value = module.vpc.public_subnets
}

output "bastion_nsg_id" {
  value = aws_security_group.bastion.id
}
asg/terragrunt.hcl
inputs = {
}

We create and apply this stack with the following command sequence:

Bash
export AWS_PROFILE="<your-profile>"
terragrunt init -upgrade
terragrunt plan -out tfplan
terragrunt apply tfplan

Self-Managed Bastion Host

At this step, we will opt for a simple and minimal single-instance AWS EC2.

This will be enough for our demo purposes but is surely not a good candidate for production use.

bastion/versions.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "< 6"
    }
  }
  required_version = "< 2"
}
bastion/variables.tf
variable "tags" {
  type = map(string)
  default = {
    Name        = "bastion"
    provisioner = "tofu"
    inventory   = "bastion"
    cloud       = "aws"
  }
}

variable "vpc_id" {
  type     = string
  nullable = false
}

variable "key_pair_name" {
  type     = string
  nullable = false
}

variable "public_subnets" {
  type     = list(string)
  nullable = false
}

variable "bastion_nsg_id" {
  type     = string
  nullable = false
}

The variables above will be populated using the Terragrunt dependency block, as you will see shortly.

bastion/net.tf
resource "aws_security_group" "this" {
  name   = "bastion"
  vpc_id = var.vpc_id
  tags   = var.tags
}

resource "aws_vpc_security_group_egress_rule" "this" {
  security_group_id = aws_security_group.this.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "-1"
  tags              = var.tags
}

resource "aws_vpc_security_group_ingress_rule" "ssh" {
  from_port         = 22
  to_port           = 22
  ip_protocol       = "tcp"
  cidr_ipv4         = "0.0.0.0/0"
  security_group_id = aws_security_group.this.id
}

resource "aws_eip" "this" {
  instance = aws_instance.this.id
  tags     = var.tags
}
bastion/cloud-init.yml
#cloud-config
package_update: true
package_upgrade: true
packages:
  - python3
  - python3-pip
power_state:
  delay: 1
  mode: reboot
  message: Rebooting machine
bastion/main.tf
data "aws_ami" "this" {
  most_recent = true

  owners = ["amazon"]

  filter {
    name   = "architecture"
    values = ["arm64"]
  }

  filter {
    name   = "name"
    values = ["al2023-ami-2023*"]
  }
}

resource "aws_instance" "this" {
  ami = data.aws_ami.this.id

  instance_market_options {
    market_type = "spot"
    spot_options {
      max_price = "0.01"
    }
  }

  instance_type = "t4g.nano"

  key_name = var.key_pair_name

  subnet_id = var.public_subnets[0]


  user_data = file("${path.module}/cloud-init.yml")

  vpc_security_group_ids = [
    aws_security_group.this.id,
    var.bastion_nsg_id,
  ]

  tags = var.tags
}
bastion/outputs.tf
output "bastion_public_ip" {
  value = aws_eip.this.public_ip
}
bastion/terragrunt.hcl
inputs = {
  bastion_nsg_id = dependency.worker.outputs.bastion_nsg_id
  key_pair_name  = dependency.worker.outputs.aws_key_pair_name
  public_subnets = dependency.worker.outputs.public_subnets
  vpc_id         = dependency.worker.outputs.vpc_id
}

dependency "worker" {
  config_path = "../asg"
}

We apply this just as we did for the ASG stack (no need to repeat ourselves).

Ansible Dynamic Inventory

Now the fun part begins. We have the instances ready, and can now create our inventory files and send requests to the remote hosts.

First things first, we'll create the ansible.cfg file in the ansible directory:

ansible/ansible.cfg
[defaults]
inventory = ./inventory
interpreter_python = auto_silent
fact_caching = ansible.builtin.jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400

Awesome! 🥳

We now need to create our AWS EC2 dynamic inventory file.

ansible/inventory/cloud.aws_ec2.yml
plugin: amazon.aws.aws_ec2

keyed_groups:
  - key: tags.inventory
    prefix: aws
  - key: tags.cloud
    prefix: provider

compose:
  # literal value, as opposed to the otherwise jinja variable
  ansible_user: "'ec2-user'"

Note that the file name must end with .aws_ec2.yml, e.g. example.aws_ec2.yml. Additionally, specifying the plugin attribute is crucial for reproducible and consistent behavior.

Pay close attention to the keyed_groups section. We'll use those when targeting instances in our Ansible playbooks as well as ad-hoc commands.
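The grouping rule itself is simple enough to sketch in a few lines of Python. This is a rough illustration of the naming scheme (prefix, separator, tag value), not the plugin's actual implementation:

```python
def keyed_group_name(prefix, value, separator="_"):
    # The amazon.aws.aws_ec2 plugin joins the prefix and the tag value
    # with the separator, so tags.inventory == "worker" with prefix
    # "aws" yields the group "aws_worker".
    return f"{prefix}{separator}{value}"

# Tags as defined in our Terraform variables for the bastion stack.
tags = {"inventory": "bastion", "cloud": "aws"}

groups = [
    keyed_group_name("aws", tags["inventory"]),     # -> "aws_bastion"
    keyed_group_name("provider", tags["cloud"]),    # -> "provider_aws"
]
```

This is why the tag values we chose in Terraform show up later as the aws_bastion, aws_worker, and provider_aws groups.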

As a required step at this point, we need to install some Python libraries.

ansible/requirements.txt
boto3<2
botocore<2

Bash
pip install -U pip -r ansible/requirements.txt

Let's go ahead and create a couple of Ansible group_vars files:

ansible/inventory/group_vars/all.yml
ansible_ssh_extra_args: -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=ERROR

The all.yml is a special name that refers to all hosts; the variables inside will be available as Ansible facts.

ansible/inventory/group_vars/provider_aws.yml
ansible_ssh_private_key_file: ~/.ssh/ansible-dynamic
bastion_host: "{{ hostvars[groups.aws_bastion | random] | to_nice_json | from_json }}"

The bastion_host variable is critical: it randomly picks one of the possibly many bastion hosts and uses its available facts to connect to the other remote hosts in the private network (as you will see shortly).

Ansible Groups

Let's explain it step by step:

  1. First, groups.aws_bastion resolves to all the remote hosts in the group aws_bastion. This group comes from our earlier keyed_groups configuration, where the aws prefix is joined with the value of each instance's inventory tag.

    ansible/inventory/cloud.aws_ec2.yml
    keyed_groups:
      - key: tags.inventory
        prefix: aws
    
    bastion/variables.tf
    variable "tags" {
      type = map(string)
      default = {
        Name        = "bastion"
        provisioner = "tofu"
        inventory   = "bastion"
        cloud       = "aws"
      }
    }
    

    The result will be something like the following. Notice the groupings that took place because of how we set the keyed_groups configuration.

    $ ansible-inventory --graph
    @all:
      |--@ungrouped:
      |--@aws_ec2:
      |  |--ip-10-0-2-166.eu-central-1.compute.internal
      |  |--ip-10-0-3-239.eu-central-1.compute.internal
      |  |--ec2-3-69-93-166.eu-central-1.compute.amazonaws.com
      |  |--ip-10-0-1-52.eu-central-1.compute.internal
      |--@aws_worker:
      |  |--ip-10-0-2-166.eu-central-1.compute.internal
      |  |--ip-10-0-3-239.eu-central-1.compute.internal
      |  |--ip-10-0-1-52.eu-central-1.compute.internal
      |--@provider_aws:
      |  |--ip-10-0-2-166.eu-central-1.compute.internal
      |  |--ip-10-0-3-239.eu-central-1.compute.internal
      |  |--ec2-3-69-93-166.eu-central-1.compute.amazonaws.com
      |  |--ip-10-0-1-52.eu-central-1.compute.internal
      |--@aws_bastion:
      |  |--ec2-3-69-93-166.eu-central-1.compute.amazonaws.com
    

    Fun fact: I didn't trim the output of this command. Ansible doesn't close the vertical lines on the left as the tree command does! 😁

  2. The groups.aws_bastion list is then piped to the random filter and one host gets selected: groups.aws_bastion | random. Looking that host up in hostvars gives us its Ansible host vars.

    ansible/inventory/group_vars/provider_aws.yml
    bastion_host: "{{ hostvars[groups.aws_bastion | random] | to_nice_json | from_json }}"
    
  3. We do some unavoidable juggling (to_nice_json | from_json) to produce a dot-accessible Ansible variable from that output. The result allows us to reference the facts as, e.g., bastion_host.ansible_host. You will see this shortly.
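The whole expression can be mirrored in plain Python to see what each filter contributes. The inventory data below is hypothetical (one bastion, facts trimmed down to two keys):

```python
import json
import random

# Hypothetical stand-ins for Ansible's groups and hostvars structures.
groups = {"aws_bastion": ["ec2-3-69-93-166.eu-central-1.compute.amazonaws.com"]}
hostvars = {
    "ec2-3-69-93-166.eu-central-1.compute.amazonaws.com": {
        "ansible_host": "3.69.93.166",
        "ansible_user": "ec2-user",
    }
}

# Step by step, mirroring the Jinja filter chain:
chosen = random.choice(groups["aws_bastion"])            # groups.aws_bastion | random
bastion_host = json.loads(json.dumps(hostvars[chosen]))  # | to_nice_json | from_json

# In Jinja this is then dot-accessible: bastion_host.ansible_host
print(bastion_host["ansible_host"])
```

The JSON round-trip is what flattens the host vars object into a plain dictionary we can navigate with dots.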

Bastion Proxy Jump

In this final step of the preparation, we set the connection address of the bastion to the public IP address attached to the host (the AWS Elastic IP), as opposed to the other remote hosts in the VPC, where we will use the private IP addresses.

ansible/inventory/group_vars/aws_bastion.yml
ansible_host: "{{ public_ip_address }}"

Notice the value of the ansible_host variable. We will ensure that all the connections to the bastion host are using that public IP address.

It's now time to configure all the other remote hosts in our VPC; this time, we'll use their private IP addresses for the connection.

However, we can't connect to those private addresses directly, and that's where the bastion host comes in between, acting as a proxy jump, an extra hop if you will.

Notice the double-quoting of ProxyCommand in the following group vars file.

ansible/inventory/group_vars/aws_worker.yml
ansible_host: "{{ private_ip_address }}"

ansible_ssh_common_args: >-
  -o ProxyCommand="ssh
  -o StrictHostKeyChecking=no
  -o UserKnownHostsFile=/dev/null
  -o LogLevel=ERROR
  -i {{ bastion_host.ansible_ssh_private_key_file }}
  -W %h:%p
  -q {{ bastion_host.ansible_user }}@{{ bastion_host.ansible_host }}"
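Rendered against a hypothetical set of bastion facts, that template collapses into a single ProxyCommand string. The Python sketch below mimics the substitution (it is not Ansible's actual templating engine):

```python
# Hypothetical bastion facts; in reality these come from the dynamic
# inventory and the provider_aws group vars.
bastion_host = {
    "ansible_user": "ec2-user",
    "ansible_host": "3.69.93.166",
    "ansible_ssh_private_key_file": "~/.ssh/ansible-dynamic",
}

# Build the inner ssh command that tunnels stdio to the worker (-W %h:%p).
proxy = (
    "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null "
    "-o LogLevel=ERROR -i {key} -W %h:%p -q {user}@{host}"
).format(
    key=bastion_host["ansible_ssh_private_key_file"],
    user=bastion_host["ansible_user"],
    host=bastion_host["ansible_host"],
)

# The double quotes keep the inner command as one ProxyCommand argument.
ssh_common_args = f'-o ProxyCommand="{proxy}"'
print(ssh_common_args)
```

The %h and %p tokens are expanded by SSH itself at connect time with the target worker's host and port.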

Take a close look at how we use bastion_host.FACT to access all the facts available from the bastion remote host.

These facts are all fetched from the AWS API before we send a single request to any of the target hosts.

To see that for yourself, run ansible-inventory --list in the ansible/ directory: a JSON-formatted output will display all the available facts about the remote hosts.
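The shape of that JSON can be sketched as follows. The sample below is a trimmed, hypothetical fragment, showing where the per-host facts live (under _meta.hostvars), which comes in handy when scripting against the inventory:

```python
import json

# Trimmed, hypothetical shape of `ansible-inventory --list` output.
sample = json.loads("""
{
  "_meta": {
    "hostvars": {
      "ip-10-0-1-52.eu-central-1.compute.internal": {
        "private_ip_address": "10.0.1.52",
        "ansible_user": "ec2-user"
      }
    }
  },
  "aws_worker": {"hosts": ["ip-10-0-1-52.eu-central-1.compute.internal"]}
}
""")

# All per-host facts are keyed by inventory hostname under _meta.hostvars.
hosts = sorted(sample["_meta"]["hostvars"])
print(hosts)
```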

Verify the Setup

Let's run a sample ad-hoc command:

$ ansible -m ping all
ec2-3-69-93-166.eu-central-1.compute.amazonaws.com | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
ip-10-0-2-166.eu-central-1.compute.internal | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
ip-10-0-1-52.eu-central-1.compute.internal | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3.9"
    },
    "changed": false,
    "ping": "pong"
}
ip-10-0-3-239.eu-central-1.compute.internal | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3.9"
    },
    "changed": false,
    "ping": "pong"
}

And that sums it all up.

We wanted to create a dynamic inventory for our AWS cloud, and we did it. 👏
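As a follow-up, the keyed groups plug straight into playbooks: any play can target aws_worker or aws_bastion as if they were hand-written groups. The playbook below is a hypothetical sketch (its name and task are mine, not part of the setup above):

```yaml
- name: Ping all worker nodes behind the bastion
  hosts: aws_worker
  gather_facts: false
  tasks:
    - name: Check connectivity through the proxy jump
      ansible.builtin.ping:
```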

Conclusion

Although the use of Ansible is not as prevalent as it used to be, you may still need to do some configuration management on your target hosts.

Instead of manually adding hard-coded IP addresses to your inventory, Ansible dynamic inventory allows you to use API calls to your cloud provider to fetch metadata and variables about the target hosts.

The end result is more flexible and portable IaC, which keeps working even if a remote host has been re-imaged or replaced and comes back with a new set of variables and facts.

I can definitely see myself coming back to this article in the future. 😉

Until next time, ciao 🤠 & happy coding! 🐧 🦀
