24 February 2026
Roles All the Way Down
The roles declared in Terraform were designed to feed Ansible. Ansible was built to stand up k3s. This is how a three-node cluster assembles itself from a single playbook run.
This is part 2 in a three-part series. The last post covered how I brought Terraform into my self-hosted production environment. VMs went from click-ops to declared infrastructure. Cloned from a template, assigned a static IP, given a role name, done.
But the role name was still just a name. k3s_control in a Terraform module told anyone reading the code what the machine was supposed to be. It didn't do anything about it.
This post is about closing that gap: introducing a roles structure in Terraform that Ansible could consume, so that machines could be configured for what they were supposed to do. Specifically, to stand up a Kubernetes cluster.
The Design Decision
The previous post ended with a roadmap. Phase 3 was Ansible. Phase 4 was Kubernetes. But phases on paper don't tell you how the layers connect.
The connection needed to be designed, not improvised. Specifically: Ansible needs to know which machines exist and what groups they belong to so it can run the configuration in a deliberate fashion. If that information lives in one place in Terraform and a separate hosts.yml in Ansible, you will eventually have two sources of truth. They will drift apart, and drift is the enemy of platforms.
So the Terraform VM definitions got two new fields:
k3s-cp-01 = {
  vm_id      = 210
  ip_address = "192.168.2.50/24"
  tier       = "k3s"
  role       = "control"
  cores      = 4
  memory     = 8192
  disk_size  = 20
}

k3s-wk-01 = {
  vm_id      = 211
  ip_address = "192.168.2.61/24"
  tier       = "k3s"
  role       = "workers"
}

k3s-wk-02 = {
  vm_id      = 212
  ip_address = "192.168.2.62/24"
  tier       = "k3s"
  role       = "workers"
}
So we have tier and role. These aren't documentation, and they aren't naming conventions. They're structured fields with a specific purpose: to be consumed by the next layer, Ansible.
The Interface Between Layers
A Terraform template renders those fields directly into an Ansible inventory:
all:
  children:
%{ for tier in distinct([for h in hosts : h.tier]) ~}
    ${tier}_hosts:
      children:
%{ for role in distinct([for h in hosts : h.role if h.tier == tier]) ~}
        ${tier}_${role}:
          hosts:
%{ for name, host in hosts ~}
%{ if host.tier == tier && host.role == role ~}
            ${name}:
              ansible_host: ${replace(host.ip_address, "/24", "")}
              ansible_user: ubuntu
              ansible_python_interpreter: /usr/bin/python3
%{ endif ~}
%{ endfor ~}
%{ endfor ~}
%{ endfor ~}
tier = "k3s" and role = "control" produce the k3s_control Ansible group. tier = "k3s" and role = "workers" produce k3s_workers. The inventory is generated at terraform apply time. It doesn't exist independently. It's derived.
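Rendered against the three hosts above, the template produces an inventory along these lines (a sketch of the output, mentally tracing the template):

```yaml
all:
  children:
    k3s_hosts:
      children:
        k3s_control:
          hosts:
            k3s-cp-01:
              ansible_host: 192.168.2.50
              ansible_user: ubuntu
              ansible_python_interpreter: /usr/bin/python3
        k3s_workers:
          hosts:
            k3s-wk-01:
              ansible_host: 192.168.2.61
              ansible_user: ubuntu
              ansible_python_interpreter: /usr/bin/python3
            k3s-wk-02:
              ansible_host: 192.168.2.62
              ansible_user: ubuntu
              ansible_python_interpreter: /usr/bin/python3
```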
There is one source of truth for what machines exist and what they are. Ansible reads from it; it doesn't maintain a parallel copy. Adding a new VM to locals.tf automatically puts it in the right Ansible group.
This is the interface between layers, and it was the part that needed to exist before Ansible could do anything useful.
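One way to wire the template to a file on disk is a templatefile call feeding a local_file resource. This is a sketch; the paths and resource name here are assumptions, not the actual repo layout:

```hcl
# Render the inventory template against the VM definitions and write it
# where Ansible will read it. Paths and names are illustrative.
resource "local_file" "ansible_inventory" {
  content = templatefile("${path.module}/templates/inventory.yml.tpl", {
    hosts = local.vms
  })
  filename = "${path.module}/../ansible/inventory/hosts.yml"
}
```

Because the file is a resource, every terraform apply regenerates it, which is exactly what keeps it from becoming a second source of truth.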
What Ansible Does With It
Once the inventory was sorted, Ansible could be structured around it. The playbook runs in sequence:
- name: Apply base configuration to all servers
  hosts: all
  become: true
  roles:
    - base_linux

- name: Install K3s control plane
  hosts: k3s_control
  become: true
  roles:
    - k3s_control_plane

- name: Join K3s workers
  hosts: k3s_workers
  become: true
  roles:
    - k3s_workers
Every node gets base_linux first. The platform baseline. Then the cluster roles run against their respective groups. The same groups that came from tier and role.
base_linux is the contract every machine must satisfy: a platform user created with a known SSH key and passwordless sudo, base utilities installed, SSH password auth disabled, hostname set. Whatever else a machine will become, it starts here.
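A sketch of what tasks satisfying that contract might look like. The user name, key path, and package list are assumptions for illustration, not the actual role:

```yaml
- name: Create the platform user
  user:
    name: platform
    shell: /bin/bash

- name: Grant passwordless sudo
  copy:
    dest: /etc/sudoers.d/platform
    content: "platform ALL=(ALL) NOPASSWD:ALL\n"
    mode: "0440"

- name: Authorize the known SSH key
  authorized_key:
    user: platform
    key: "{{ lookup('file', 'files/platform.pub') }}"

- name: Disable SSH password authentication
  lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^#?PasswordAuthentication'
    line: PasswordAuthentication no
  notify: restart sshd

- name: Install base utilities
  apt:
    name: [curl, vim, htop]
    state: present
    update_cache: true
```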
k3s_control_plane is specific: disable swap (Kubernetes requires it to be off), install k3s with the bundled Traefik disabled (there's already a Traefik instance at the network level, so no need to duplicate ingress inside the cluster), wait for the API to become ready, read the node token and set it as a host fact.
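The first three of those steps (swap, install, readiness wait) might be sketched as tasks like these. INSTALL_K3S_EXEC and the /readyz endpoint are real k3s/Kubernetes conventions; the retry counts and exact commands are assumptions:

```yaml
- name: Disable swap
  command: swapoff -a

- name: Install k3s with bundled Traefik disabled
  shell: |
    curl -sfL https://get.k3s.io | \
      INSTALL_K3S_EXEC="--disable traefik" sh -
  args:
    creates: /usr/local/bin/k3s

- name: Wait for the API to become ready
  command: k3s kubectl get --raw /readyz
  register: api_ready
  until: api_ready.rc == 0
  retries: 30
  delay: 5
```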
That last step matters. The workers need the token to join the cluster:
- name: Read the node token
  slurp:
    src: /var/lib/rancher/k3s/server/node-token
  register: node_token

- name: Set join token fact
  set_fact:
    k3s_join_token: "{{ node_token.content | b64decode | trim }}"

- name: Install K3s worker node
  shell: |
    curl -sfL https://get.k3s.io | \
      K3S_URL=https://{{ hostvars[groups['k3s_control'][0]]['ansible_host'] }}:6443 \
      K3S_TOKEN={{ hostvars[groups['k3s_control'][0]]['k3s_join_token'] }} \
      sh -
The worker role looks up the token from the control plane's host variables and uses it to join. No manual token copying. No shared secrets file. The playbook runs sequentially (control plane first, workers after) and the cluster forms.
Cluster formation is declared. You run the playbook; the cluster assembles itself.
Why the Roles Structure Matters
The original Terraform setup could provision a VM named k3s-control. It couldn't make that VM a control plane. Naming something and making it what it's named are different operations. The roles structure closes that gap.
The fields in locals.tf aren't just metadata. They're the vocabulary that Ansible operates on. When you add a new worker to the Terraform definition, it appears in the k3s_workers Ansible group automatically, and the next playbook run will configure it, join it to the cluster, and it will be available for workloads. The work is in declaring the machine. The configuration follows.
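Concretely, a third worker is one more entry in locals.tf (the VM ID and IP here are illustrative):

```hcl
k3s-wk-03 = {
  vm_id      = 213
  ip_address = "192.168.2.63/24"
  tier       = "k3s"
  role       = "workers"
}
```

After terraform apply regenerates the inventory, the next playbook run picks the node up in k3s_workers with no further input.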
This is what it means for infrastructure to be composable: each layer produces outputs the next layer can consume, and the contract between them is explicit and machine-readable. Terraform's job is to make VMs exist. Ansible's job is to make them what they're supposed to be.
The roles structure was the thing that made one layer's output useful to the next.
Where Things Stand
- Phase 1 ✓ Declarative VM provisioning with Terraform
- Phase 2 ✓ Role abstraction and resource profiles
- Phase 3 ✓ Ansible configuration management: baseline and cluster roles
- Phase 4 ✓ Kubernetes via k3s: control plane and workers operational
- Phase 5: Observability: Prometheus, Grafana, alerting
Observability is next. A cluster that can't be inspected has the same problem as infrastructure that can't be reproduced: the knowledge is implicit and the failure modes are invisible until they aren't.
GitHub
- terraform-proxmox — VM provisioning, role and tier definitions, and Ansible inventory generation
- ansible-proxmox — Baseline configuration, k3s control plane, and worker node roles