How to Create Your Ansible Dynamic Inventory for AWS Cloud
Most modern software deployments these days benefit from containerization, with Kubernetes as the de facto orchestration platform.
However, occasionally, I find myself in need of some Ansible provisioning and configuration management.
In this blog post, I will share how to create an Ansible dynamic inventory in a way that avoids hard-coding the IP addresses of the target hosts.
Introduction
Dynamic inventory is a technique that uses the cloud provider's API to fetch the IP addresses and some initial metadata about the remote hosts before sending any request to the targets.
This allows us to fetch dynamically allocated private or public IP addresses and use them as ansible_host in the inventory.
In a traditional Ansible setup, you would typically see a hosts file with the IP address of every target written out by hand.
With the technique of dynamic inventory, not only do we no longer need to memorize or hardcode those IP addresses, we also gain the flexibility of keeping our Infrastructure as Code (IaC) agnostic and portable, to a certain extent!
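To make the idea a bit more concrete, here is a minimal Python sketch of what "ask the cloud API, then fill in ansible_host" boils down to. The dictionary mimics the shape of an EC2 describe_instances response; the sample values and the helper name hosts_from_describe_instances are hypothetical, and the real amazon.aws.aws_ec2 plugin used later in this post does far more than this.

```python
from __future__ import annotations

from typing import Any


def hosts_from_describe_instances(response: dict[str, Any]) -> dict[str, dict[str, str]]:
    """Map every running instance to the variables an inventory could expose."""
    hostvars: dict[str, dict[str, str]] = {}
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance.get("State", {}).get("Name") != "running":
                continue  # skip stopped or terminating instances
            # Prefer the public address when one exists, else the private one.
            address = instance.get("PublicIpAddress") or instance["PrivateIpAddress"]
            hostvars[instance["PrivateDnsName"]] = {"ansible_host": address}
    return hostvars


# Hypothetical, heavily trimmed sample of an API response.
sample_response = {
    "Reservations": [
        {
            "Instances": [
                {
                    "PrivateDnsName": "ip-10-0-1-52.eu-central-1.compute.internal",
                    "PrivateIpAddress": "10.0.1.52",
                    "State": {"Name": "running"},
                },
                {
                    "PrivateDnsName": "ip-10-0-9-9.eu-central-1.compute.internal",
                    "PrivateIpAddress": "10.0.9.9",
                    "State": {"Name": "stopped"},
                },
            ]
        }
    ]
}

inventory = hosts_from_describe_instances(sample_response)
print(inventory)
```

The point is only that no address is ever typed by hand: whatever the API reports at run time becomes the connection address.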
Prerequisites
- I use Ansible v2 [1] in these examples; ansible-core v2.18 to be explicit, as of writing.
- You can either just follow along or, if you want to create the resources, you will need an AWS account.
- Although provisioning the remote hosts is not the main aim of this article, I use OpenTofu v1.8 [2] to create those instances.
- Lastly, I prefer to use Terragrunt v0.x [3] as a nice wrapper around TF. This gives me the flexibility to define dependencies and use outputs from other stacks.
The directory structure for this mini-project looks like the following:
.
├── ansible
│   ├── ansible.cfg
│   ├── inventory
│   │   ├── cloud.aws_ec2.yml
│   │   └── group_vars
│   │       ├── all.yml
│   │       ├── aws_bastion.yml
│   │       ├── aws_worker.yml
│   │       └── provider_aws.yml
│   └── requirements.txt
├── asg
│   ├── cloud-init.yml
│   ├── main.tf
│   ├── net.tf
│   ├── outputs.tf
│   ├── terragrunt.hcl
│   ├── variables.tf
│   └── versions.tf
└── bastion
    ├── cloud-init.yml
    ├── main.tf
    ├── net.tf
    ├── outputs.tf
    ├── terragrunt.hcl
    ├── variables.tf
    └── versions.tf
AWS AutoScaling Group (ASG)
At this initial step, I will create an autoscaling group [4] with a pre-defined and minimal launch template [5] using a cloud-init [6] YAML file.
This will include updating and upgrading the host on the first boot, and installing the latest available python3 package (as required by Ansible).
Although not required, I will also create a custom AWS VPC [7].
Additionally, I will configure the AWS Security Group [8] to allow SSH access only from hosts within the VPC (specifically, those attached to the trusted-bastion security group), giving me the peace of mind that access is gated behind the private network.
For an additional layer of security, one might want to consider deploying AWS VPN [9]!
With that said, let's roll up our sleeves & get our hands dirty.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "< 6"
    }
  }

  required_version = "< 2"
}

variable "tags" {
  type = map(string)
  default = {
    Name        = "worker"
    provisioner = "tofu"
    inventory   = "worker"
    cloud       = "aws"
  }
}

data "aws_availability_zones" "available" {
  state = "available"
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.17.0"

  name = "ansible"
  cidr = "10.0.0.0/16"

  azs             = data.aws_availability_zones.available.names
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  # Single NAT Gateway for all Availability Zones
  enable_nat_gateway     = true
  single_nat_gateway     = true
  one_nat_gateway_per_az = false

  tags = var.tags
}

resource "aws_security_group" "this" {
  name   = "worker"
  vpc_id = module.vpc.vpc_id

  tags = var.tags
}

resource "aws_security_group" "bastion" {
  name   = "trusted-bastion"
  vpc_id = module.vpc.vpc_id

  tags = merge(
    var.tags,
    {
      Name = "trusted-bastion"
    }
  )
}

resource "aws_vpc_security_group_egress_rule" "this" {
  security_group_id = aws_security_group.this.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "-1" # all protocols

  tags = var.tags
}

# Workers accept SSH only from instances attached to the trusted-bastion group
resource "aws_vpc_security_group_ingress_rule" "ssh" {
  security_group_id            = aws_security_group.this.id
  from_port                    = 22
  to_port                      = 22
  ip_protocol                  = "tcp"
  referenced_security_group_id = aws_security_group.bastion.id
}
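
The subnet CIDRs handed to the vpc module above have to sit inside the VPC CIDR and must not overlap each other. As a quick sanity check before running a plan, here is a small Python sketch using only the standard library; the helper name check_subnets is my own and not part of any tool used in this post.

```python
from __future__ import annotations

import ipaddress
from itertools import combinations


def check_subnets(vpc_cidr: str, subnet_cidrs: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the layout is sane."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    # Every subnet must be contained in the VPC range.
    problems = [f"{s} is outside {vpc}" for s in subnets if not s.subnet_of(vpc)]
    # No two subnets may share addresses.
    problems += [
        f"{a} overlaps {b}" for a, b in combinations(subnets, 2) if a.overlaps(b)
    ]
    return problems


# Values copied from the vpc module block.
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

problems = check_subnets("10.0.0.0/16", private_subnets + public_subnets)
print(problems or "subnet layout looks fine")
```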

#cloud-config
package_update: true
package_upgrade: true
packages:
  - python3
  - python3-pip
power_state:
  delay: 1
  mode: reboot
  message: Rebooting machine

data "aws_ami" "this" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "architecture"
    values = ["arm64"]
  }

  filter {
    name   = "name"
    values = ["al2023-ami-2023*"]
  }
}

locals {
  tls_public_key = file(pathexpand("~/.ssh/ansible-dynamic.pub"))
}

resource "aws_key_pair" "this" {
  key_name   = "tofu"
  public_key = local.tls_public_key

  tags = var.tags
}

resource "aws_launch_template" "this" {
  name_prefix   = "ansible"
  image_id      = data.aws_ami.this.id
  instance_type = "t4g.nano"
  key_name      = aws_key_pair.this.key_name

  instance_market_options {
    market_type = "spot"
    spot_options {
      max_price = "0.01"
    }
  }

  vpc_security_group_ids = [
    aws_security_group.this.id,
  ]

  # Launch templates expect base64-encoded user data
  user_data = base64encode(file("${path.module}/cloud-init.yml"))

  tags = var.tags
}

resource "aws_autoscaling_group" "this" {
  name_prefix         = "ansible"
  capacity_rebalance  = true
  desired_capacity    = 3
  max_size            = 3
  min_size            = 1
  vpc_zone_identifier = module.vpc.private_subnets

  launch_template {
    id      = aws_launch_template.this.id
    version = "$Latest"
  }

  # Propagate the tags to the instances so the inventory plugin can group on them
  dynamic "tag" {
    for_each = var.tags
    content {
      key                 = tag.key
      value               = tag.value
      propagate_at_launch = true
    }
  }

  timeouts {
    delete = "5m"
  }
}
Generate SSH key pair
The most straightforward way is to use the ssh-keygen command to generate the key pair whose public half is read from ~/.ssh/ansible-dynamic.pub above.

output "aws_key_pair_name" {
  value = aws_key_pair.this.key_name
}

output "vpc_id" {
  value = module.vpc.vpc_id
}

output "public_subnets" {
  value = module.vpc.public_subnets
}

output "bastion_nsg_id" {
  value = aws_security_group.bastion.id
}
We create and apply this stack with the following command sequence [10]:
export AWS_PROFILE="<your-profile>"
terragrunt init -upgrade
terragrunt plan -out tfplan
terragrunt apply tfplan
Self-Managed Bastion Host
At this step, we will opt for a simple and minimal single AWS EC2 [11] instance.
This will be enough for our demo purposes but is surely not a good candidate for production use.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "< 6"
    }
  }

  required_version = "< 2"
}

variable "tags" {
  type = map(string)
  default = {
    Name        = "bastion"
    provisioner = "tofu"
    inventory   = "bastion"
    cloud       = "aws"
  }
}

variable "vpc_id" {
  type     = string
  nullable = false
}

variable "key_pair_name" {
  type     = string
  nullable = false
}

variable "public_subnets" {
  type     = list(string)
  nullable = false
}

variable "bastion_nsg_id" {
  type     = string
  nullable = false
}
The variables above will be specified using the Terragrunt dependency [12] block as you will see shortly.
resource "aws_security_group" "this" {
  name   = "bastion"
  vpc_id = var.vpc_id

  tags = var.tags
}

resource "aws_vpc_security_group_egress_rule" "this" {
  security_group_id = aws_security_group.this.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "-1"

  tags = var.tags
}

# The bastion is the only host reachable over SSH from the internet
resource "aws_vpc_security_group_ingress_rule" "ssh" {
  from_port         = 22
  to_port           = 22
  ip_protocol       = "tcp"
  cidr_ipv4         = "0.0.0.0/0"
  security_group_id = aws_security_group.this.id
}

resource "aws_eip" "this" {
  instance = aws_instance.this.id

  tags = var.tags
}

#cloud-config
package_update: true
package_upgrade: true
packages:
  - python3
  - python3-pip
power_state:
  delay: 1
  mode: reboot
  message: Rebooting machine

data "aws_ami" "this" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "architecture"
    values = ["arm64"]
  }

  filter {
    name   = "name"
    values = ["al2023-ami-2023*"]
  }
}

resource "aws_instance" "this" {
  ami = data.aws_ami.this.id

  instance_market_options {
    market_type = "spot"
    spot_options {
      max_price = "0.01"
    }
  }

  instance_type = "t4g.nano"
  key_name      = var.key_pair_name
  subnet_id     = var.public_subnets[0]
  user_data     = file("${path.module}/cloud-init.yml")

  # Own group plus the trusted-bastion group that the workers allow SSH from
  vpc_security_group_ids = [
    aws_security_group.this.id,
    var.bastion_nsg_id,
  ]

  tags = var.tags
}

inputs = {
  bastion_nsg_id = dependency.worker.outputs.bastion_nsg_id
  key_pair_name  = dependency.worker.outputs.aws_key_pair_name
  public_subnets = dependency.worker.outputs.public_subnets
  vpc_id         = dependency.worker.outputs.vpc_id
}

dependency "worker" {
  # Relative path to the ASG stack; it must match the directory layout shown earlier
  config_path = "../asg"
}
We apply this just as we did for the ASG stack (no need to repeat ourselves).
Ansible Dynamic Inventory
Now the fun part begins. We have the instances ready, and can now create our inventory files and send requests to the remote hosts.
First things first, we'll create the ansible.cfg file in the ansible directory:
[defaults]
inventory = ./inventory
interpreter_python = auto_silent
fact_caching = ansible.builtin.jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
Awesome!
We now need to create our AWS EC2 dynamic inventory file [13].
plugin: amazon.aws.aws_ec2
keyed_groups:
  - key: tags.inventory
    prefix: aws
  - key: tags.cloud
    prefix: provider
compose:
  # literal value, as opposed to the otherwise jinja variable
  ansible_user: "'ec2-user'"
Note that the file name should end with .aws_ec2.yml, e.g. example.aws_ec2.yml. Additionally, specifying the plugin attribute is crucial for reproducible and consistent behavior.
Pay close attention to the keyed_groups section. We'll use those when targeting instances in our Ansible playbooks as well as ad-hoc commands.
As a required step at this point, we need to install some Python libraries (listed in ansible/requirements.txt); the amazon.aws.aws_ec2 plugin relies on boto3 and botocore.
Let's go ahead and create a couple of Ansible group_vars files:
ansible_ssh_extra_args: -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=ERROR
The all.yml file name is special: it refers to all hosts, and the variables inside are available to every host alongside their Ansible facts [14].
ansible_ssh_private_key_file: ~/.ssh/ansible-dynamic
bastion_host: "{{ hostvars[groups.aws_bastion | random] | to_nice_json | from_json }}"
The bastion_host variable is critical: it randomly picks one of the possibly many bastion hosts and uses its available facts to connect to the other remote hosts in the private network (as you will see shortly).
Ansible Groups
Let's explain it step by step:

- First, groups.aws_bastion resolves to all the remote hosts in the group aws_bastion. This group comes from our earlier keyed_groups, where we prefixed aws to the value of every tag named inventory.

  bastion/variables.tf

      variable "tags" {
        type = map(string)
        default = {
          Name        = "bastion"
          provisioner = "tofu"
          inventory   = "bastion"
          cloud       = "aws"
        }
      }

  The result will be something like the following. Notice the groupings that took place because of how we set the keyed_groups configuration.

      $ ansible-inventory --graph
      @all:
        |--@ungrouped:
        |--@aws_ec2:
        |  |--ip-10-0-2-166.eu-central-1.compute.internal
        |  |--ip-10-0-3-239.eu-central-1.compute.internal
        |  |--ec2-3-69-93-166.eu-central-1.compute.amazonaws.com
        |  |--ip-10-0-1-52.eu-central-1.compute.internal
        |--@aws_worker:
        |  |--ip-10-0-2-166.eu-central-1.compute.internal
        |  |--ip-10-0-3-239.eu-central-1.compute.internal
        |  |--ip-10-0-1-52.eu-central-1.compute.internal
        |--@provider_aws:
        |  |--ip-10-0-2-166.eu-central-1.compute.internal
        |  |--ip-10-0-3-239.eu-central-1.compute.internal
        |  |--ec2-3-69-93-166.eu-central-1.compute.amazonaws.com
        |  |--ip-10-0-1-52.eu-central-1.compute.internal
        |--@aws_bastion:
        |  |--ec2-3-69-93-166.eu-central-1.compute.amazonaws.com

  Fun fact: I didn't trim the output of this command. Ansible doesn't close the vertical lines on the left as the tree command does!

- Next, groups.aws_bastion gets piped to the random filter and one host name gets selected: groups.aws_bastion | random. Looking that name up in hostvars gives us that host's Ansible host vars [15].

- Finally, we do some unavoidable juggling (to_nice_json | from_json) to produce a dot-accessible Ansible variable from that output. The result will allow us to reference the Ansible facts [14], e.g. bastion_host.ansible_host. You will see this shortly.
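
For readers who think better in plain Python than in Jinja filters, the selection above is roughly equivalent to the sketch below. The groups and hostvars dictionaries are hypothetical stand-ins for Ansible's magic variables of the same name, and the address is simply the one encoded in the bastion's DNS name from the graph output.

```python
from __future__ import annotations

import random

# Hypothetical stand-ins for Ansible's `groups` and `hostvars` magic variables.
groups = {
    "aws_bastion": ["ec2-3-69-93-166.eu-central-1.compute.amazonaws.com"],
    "aws_worker": [
        "ip-10-0-2-166.eu-central-1.compute.internal",
        "ip-10-0-3-239.eu-central-1.compute.internal",
        "ip-10-0-1-52.eu-central-1.compute.internal",
    ],
}
hostvars = {
    "ec2-3-69-93-166.eu-central-1.compute.amazonaws.com": {
        "ansible_host": "3.69.93.166",
        "ansible_user": "ec2-user",
    },
}


def pick_bastion(groups: dict[str, list[str]], hostvars: dict[str, dict]) -> dict:
    """Equivalent of hostvars[groups.aws_bastion | random]."""
    name = random.choice(groups["aws_bastion"])  # groups.aws_bastion | random
    return hostvars[name]  # hostvars[<that name>]


bastion_host = pick_bastion(groups, hostvars)
print(bastion_host["ansible_host"])
```

With a single bastion the random choice is of course deterministic; it only starts to matter once there are several.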
Bastion Proxy Jump
In this final step of the preparation, we set the connect address of the bastion to be the public IP address attached to the host (the AWS ElasticIP [16]), as opposed to the other remote hosts in the VPC where we will use the private IP addresses.
Notice the value of the ansible_host variable in the aws_bastion group vars file (the aws_ec2 plugin exposes the public address as public_ip_address, mirroring the private_ip_address used for the workers). We will ensure that all the connections to the bastion host are using that public IP address.
It's now time to configure all the other remote hosts in our VPC. This time, we'll use the private IP address for the connection.
However, we can't directly connect to their private IP addresses, and that's where the bastion host comes in, acting as a proxy jump, an extra hop if you will.
Notice the double-quotation of ProxyCommand in the following group vars file [17].
ansible_host: "{{ private_ip_address }}"
ansible_ssh_common_args: >-
  -o ProxyCommand="ssh
  -o StrictHostKeyChecking=no
  -o UserKnownHostsFile=/dev/null
  -o LogLevel=ERROR
  -i {{ bastion_host.ansible_ssh_private_key_file }}
  -W %h:%p
  -q {{ bastion_host.ansible_user }}@{{ bastion_host.ansible_host }}"
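
Why the double quotes matter becomes obvious once the string is split the way a shell would split it. The Python sketch below builds the same argument string for hypothetical bastion values and runs it through shlex.split; as far as I know, Ansible's ssh connection plugin splits these extra arguments in a similar shell-like fashion, but treat that as an assumption rather than a guarantee.

```python
from __future__ import annotations

import shlex


def proxy_jump_args(user: str, host: str, key_file: str) -> str:
    """Build the ssh option that tunnels the connection through the bastion."""
    inner = (
        "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null "
        f"-o LogLevel=ERROR -i {key_file} -W %h:%p -q {user}@{host}"
    )
    # The quotes keep the whole inner ssh command attached to ProxyCommand.
    return f'-o ProxyCommand="{inner}"'


# Hypothetical values standing in for the bastion_host.* variables.
args = proxy_jump_args("ec2-user", "3.69.93.166", "~/.ssh/ansible-dynamic")

quoted = shlex.split(args)
unquoted = shlex.split(args.replace('"', ""))

print(len(quoted))    # the option flag plus one ProxyCommand=... argument
print(len(unquoted))  # many separate tokens, so ProxyCommand would only be "ssh"
```

Without the quotes, the outer ssh would see ProxyCommand=ssh and then treat the remaining options as its own.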
Take a close look at how we are using bastion_host.FACT to access all the facts available to us from the bastion remote host.
These facts (strictly speaking, inventory host variables) all come from the AWS API before we send a single request to any of the target hosts.
To see that for yourself, run ansible-inventory --list in the ansible/ directory.
The output is JSON-formatted and shows everything the inventory knows about the remote hosts.
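
If the raw JSON is too noisy, a few lines of Python can boil it down to the addresses we care about. This is a sketch under assumptions: the sample document is hypothetical and heavily trimmed, and it relies on the private_ip_address and public_ip_address host variables provided by the aws_ec2 plugin.

```python
from __future__ import annotations

import json


def addresses(inventory_json: str) -> dict[str, dict[str, str | None]]:
    """Extract private/public addresses per host from `ansible-inventory --list` output."""
    data = json.loads(inventory_json)
    hostvars = data.get("_meta", {}).get("hostvars", {})
    return {
        name: {
            "private": host_vars.get("private_ip_address"),
            "public": host_vars.get("public_ip_address"),
        }
        for name, host_vars in hostvars.items()
    }


# Hypothetical, trimmed sample of the command's JSON output.
sample = json.dumps(
    {
        "_meta": {
            "hostvars": {
                "ip-10-0-1-52.eu-central-1.compute.internal": {
                    "private_ip_address": "10.0.1.52",
                },
                "ec2-3-69-93-166.eu-central-1.compute.amazonaws.com": {
                    "private_ip_address": "10.0.101.10",
                    "public_ip_address": "3.69.93.166",
                },
            }
        },
        "aws_bastion": {"hosts": ["ec2-3-69-93-166.eu-central-1.compute.amazonaws.com"]},
    }
)

result = addresses(sample)
for host, found in result.items():
    print(host, found)
```

To run it against the real inventory, feed it the standard output of ansible-inventory --list (for example via subprocess.run with capture_output=True and text=True) instead of the sample string.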
Verify the Setup
Let us do a sample ad-hoc command:
$ ansible -m ping all
ec2-3-69-93-166.eu-central-1.compute.amazonaws.com | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
ip-10-0-2-166.eu-central-1.compute.internal | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
ip-10-0-1-52.eu-central-1.compute.internal | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3.9"
    },
    "changed": false,
    "ping": "pong"
}
ip-10-0-3-239.eu-central-1.compute.internal | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3.9"
    },
    "changed": false,
    "ping": "pong"
}
And that sums it all up.
We wanted to create a dynamic inventory for our AWS cloud, and we did it.
Conclusion
Although the use of Ansible is not as prevalent as it used to be, it may still be crucial to do some configuration management on your target hosts.
Instead of manually adding hard-coded IP addresses to your inventory, Ansible dynamic inventory allows you to use API calls to your cloud provider to fetch metadata and variables about the target hosts.
The end result will be a more flexible and portable IaC, which can be used even if the remote host has been re-imaged or replaced with a new set of variables and facts.
I can definitely see myself coming back to this article in the future.
Until next time, ciao & happy coding!
[1] https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html
[4] https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-groups.html
[5] https://docs.aws.amazon.com/autoscaling/ec2/userguide/launch-templates.html
[7] https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html
[8] https://docs.aws.amazon.com/vpc/latest/userguide/vpc-security-groups.html
[9] https://docs.aws.amazon.com/vpc/latest/userguide/vpn-connections.html
[11] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html
[12] https://terragrunt.gruntwork.io/docs/reference/config-blocks-and-attributes/#dependency
[13] https://docs.ansible.com/ansible/latest/collections/amazon/aws/docsite/aws_ec2_guide.html
[14] https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_vars_facts.html
[15] https://docs.ansible.com/ansible/latest/inventory_guide/intro_inventory.html
[16] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html
[17] https://www.adainese.it/blog/2022/10/30/ansible-with-bastion-host/