19 February 2026
From Click-Ops to Roles: Terraforming My Proxmox Platform
I automated forgetting how to create VMs. A shift from manual provisioning to declarative, role-based platform thinking on Proxmox.
This is part 1 of a 3-part series.
Not every homelab is a sandpit.
Mine sits closer to a small data centre than an experiment bench. It runs things I depend on and things others depend on.
Which changes what good infrastructure management looks like.
This Host Runs Real Services
GitLab. Containers. Monitoring. My portfolio. Community infrastructure. Internal tooling. Backups.
When I rebuild something, it matters.
For a long time, provisioning a new VM meant:
- Clone template
- Set CPU and RAM
- Attach storage
- Configure networking
- Inject SSH key
- Start
- Hope I did not miss a setting
It worked. It was not broken.
But it was fragile.
And fragile systems do not scale cognitively.
The Constraint Set
This is not a cloud environment.
- Single Proxmox node
- Thin LVM storage
- pfSense upstream
- Flat network with segmented services
- Linked clones for efficiency
No managed control plane. No autoscaling groups. No cloud safety net.
Just deliberate engineering.
Those constraints forced clarity. Every design choice has weight when there is no abstraction layer hiding it.
The Problem Was Not Speed
I can provision a VM manually in five minutes.
The issue was repetition.
The real problem was this:
Provisioning lived in my head.
The template ID. The correct bridge. The RAM profile for Docker hosts vs GitLab. Which disk size I normally assign. Every one of those decisions was correct, but none of them were written down.
That knowledge was implicit.
Implicit knowledge is operational risk.
The Mental Shift: Machines vs Roles
The turning point was subtle.
I stopped thinking in terms of VMs and started thinking in roles.
Not:
VM 200, 2 cores, 4GB RAM
But:
docker_host, gitlab, k3s_control, monitoring
Each role carries intent:
- Resource profile
- Storage expectations
- Network assumptions
- Future configuration layering
I was no longer provisioning machines.
I was declaring responsibilities.
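One way to make that shift concrete is a role-profile map in Terraform locals. The names and values below are illustrative rather than my exact profiles:

```hcl
locals {
  # Hypothetical role profiles: each role carries its resource intent,
  # so module calls reference a role instead of raw numbers.
  role_profiles = {
    docker_host = { cores = 2, memory = 4096, disk_size = 40 }
    gitlab      = { cores = 4, memory = 8192, disk_size = 80 }
    monitoring  = { cores = 2, memory = 4096, disk_size = 40 }
  }
}
```

A module call can then read `local.role_profiles["gitlab"].cores` rather than hard-coding values, which keeps the intent in one place.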
Encoding Intent with Terraform
I refactored into a reusable module and began expressing infrastructure declaratively.
Instead of manually creating a Docker host, I now declare it:
module "docker_host" {
  source         = "./modules/vm"

  vm_id          = 200
  name           = "docker-host"
  template_id    = local.template_id
  cores          = 2
  memory         = 4096
  disk_size      = 40
  bridge         = "vmbr0"
  ip_address     = "192.168.2.50/24"
  dns_servers    = ["192.168.2.1"]
  ssh_public_key = local.ssh_key
}
module "gitlab" {
  source         = "./modules/vm"

  vm_id          = 201
  name           = "gitlab"
  template_id    = local.template_id
  cores          = 4
  memory         = 8192
  disk_size      = 80
  bridge         = "vmbr0"
  ip_address     = "192.168.2.51/24"
  dns_servers    = ["192.168.2.1"]
  ssh_public_key = local.ssh_key
}
Two roles. Same module. Different intent. The shape of the infrastructure is visible in the code itself.
Underneath, the module handles:
- Linked clone from cloud-init template
- Storage mapping
- Network attachment
- Deterministic configuration
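For context, here is a trimmed sketch of what a module like this might wrap, using the BPG provider's `proxmox_virtual_environment_vm` resource. This illustrates the shape, not my exact module; the attribute names follow the BPG provider, but the variables are placeholders:

```hcl
resource "proxmox_virtual_environment_vm" "this" {
  vm_id     = var.vm_id
  name      = var.name
  node_name = var.node_name # assumed variable: the target Proxmox node

  clone {
    vm_id = var.template_id
    full  = false # linked clone from the cloud-init template
  }

  cpu {
    cores = var.cores
  }

  memory {
    dedicated = var.memory
  }

  disk {
    datastore_id = var.datastore_id # e.g. the thin LVM pool
    interface    = "scsi0"
    size         = var.disk_size
  }

  network_device {
    bridge = var.bridge
  }

  initialization {
    ip_config {
      ipv4 {
        address = var.ip_address
      }
    }
    dns {
      servers = var.dns_servers
    }
    user_account {
      keys = [var.ssh_public_key]
    }
  }
}
```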
If I destroy and reapply, the outcome is predictable.
That predictability is the real feature.
The full module and configuration are available on GitHub.
What Changed
Provisioning time dropped, but that was not the main win.
The real gains were:
- Rebuild confidence
- Reduced cognitive load
- Clear separation between chassis and role
- Faster experimentation
I can now:
- Destroy a service VM without hesitation
- Iterate on templates safely
- Standardise resource profiles
- Spin up test roles in seconds
The infrastructure is no longer memory dependent.
It is declared.
What Went Wrong
This was not frictionless.
Along the way I hit:
- Provider quirks between Telmate and BPG
- Cloud-init disk attachment issues
- Permission errors on thin LVM volumes
- Template lifecycle confusion
The provider choice alone cost me hours. I started with the Telmate provider because most tutorials reference it, but hit issues with cloud-init disk handling and inconsistent plan diffs. Switching to the BPG provider resolved most of those problems, but meant rewriting every resource block. If you are starting fresh, save yourself the detour and go straight to BPG.
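If it helps, pinning the BPG provider looks like this; the version constraint is illustrative, not a recommendation:

```hcl
terraform {
  required_providers {
    proxmox = {
      source = "bpg/proxmox"
      # Illustrative constraint; pin to whatever release you validate against.
      version = ">= 0.50.0"
    }
  }
}
```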
Each issue forced me deeper into how Proxmox actually works:
- How disks are mapped at the block layer
- What happens during clone operations
- Why cloud-init ISO generation can fail
- Where Terraform state becomes authoritative
I now understand my hypervisor better than I did when everything was manual.
Automation did not abstract the system away.
It made me confront it properly.
The Platform Perspective
This is still a single node.
There is no HA. No distributed storage. No control plane redundancy.
But the thinking has shifted.
Infrastructure is now:
- Declarative
- Role-based
- Rebuildable
- Evolvable
The next phase is layering configuration management and eventually Kubernetes patterns on top.
But this step mattered.
Because platform engineering is not about scale first.
It is about repeatability first.
Why This Matters Beyond My Basement
Declarative infrastructure is not exclusive to AWS or Azure.
It is a pattern of thinking.
When infrastructure intent is encoded:
- Failure becomes recoverable
- Change becomes deliberate
- Systems become testable
I did not automate VM creation.
I automated forgetting how to create VMs.
And that is the real shift.
The Roadmap
- Phase 1: Declarative VM provisioning ✓
- Phase 2: Role abstraction and resource profiling ✓
Still to come:
- Phase 3: Configuration layering with Ansible: roles declared in Terraform drive Ansible inventory, turning declared intent into applied configuration
- Phase 4: Kubernetes integration: a k3s cluster bootstrapped from Ansible, control plane and workers resolved from the same role structure defined in Terraform
- Phase 5: Observability and policy enforcement: Prometheus/Grafana stack, resource budgets, drift detection
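As a rough sketch of the Phase 3 handoff, Terraform can expose its role-to-host mapping as an output for Ansible to consume, for example via `terraform output -json`. The output name and structure here are hypothetical:

```hcl
# Hypothetical output: group hosts by role so an Ansible dynamic
# inventory can read them with `terraform output -json ansible_inventory`.
output "ansible_inventory" {
  value = {
    docker_hosts = { hosts = ["192.168.2.50"] }
    gitlab       = { hosts = ["192.168.2.51"] }
  }
}
```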
The foundation is solid.
The work continues.
The moment you can destroy and recreate without hesitation, your relationship with your platform changes.
Mine did.