After three years of manually provisioning VMs over SSH and hand-editing docker-compose files on the hosts, I finally committed to Infrastructure as Code. April 1, 2024: first commit to the infrastructure repository.

The Problem

Since my first homelab post in November 2020, I’d accumulated a collection of snowflake servers—each one unique, manually configured, and completely undocumented.

The cost:

  • ❌ 4-hour recovery times for failed VMs
  • ❌ Deployment anxiety (one wrong click = broken service)
  • ❌ Zero reproducibility
  • ❌ Tribal knowledge locked in my head

The wake-up call: A hard drive failure on my GitLab VM. I had no backup of the VM configuration. Was it 8GB or 16GB of RAM? What VLAN was it on? Where were the mount points?

The Solution

Every component of my infrastructure is now defined in code. No more clicking through UIs. No more manual SSH sessions.

Two-Layer Architecture

Terraform: Manages external/immutable infrastructure.

  • Cloudflare DNS records and Tunnel

Ansible: Manages mutable state and VM lifecycle.

  • Proxmox VMs (provisioning from cloud-init templates)
  • Docker containers
  • File system configurations (NFS mounts)
  • Service orchestration

Why both? Terraform's declarative state model suits external resources that rarely change, while Ansible's task-based approach fits VM lifecycle and in-guest configuration that evolves constantly.
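
In practice this split shows up in the Ansible inventory: Proxmox nodes are one group (targets for provisioning tasks), the guests they create are another. A minimal sketch of such a layout (group names and the pve5 address are illustrative, not my actual inventory; ruby's IP matches its spec below):

```yaml
# inventory/hosts.yml -- hypothetical layout
all:
  children:
    hypervisors:          # Proxmox nodes; run the VM provisioning tasks
      hosts:
        pve5:
          ansible_host: 192.168.50.5   # illustrative address
    vms:                  # guests cloned from cloud-init templates
      hosts:
        ruby:
          ansible_host: 192.168.50.20
```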

Implementation

VM Provisioning with Ansible

Before, I clicked through the Proxmox UI and hoped I had written down what I did.

Now, VM specs are defined in YAML:

# vars/vm_specs.yml
vms:
  - name: ruby
    vm_id: 920
    target_node: pve5
    cores: 8
    memory: 16384
    disk:
      scsi0: 'nvme-thin:200'
    net:
      net0: 'virtio,bridge=vmbr0,tag=50'
    ipconfig:
      ipconfig0: 'ip=192.168.50.20/24,gw=192.168.50.1'
    tags: ['ansible', 'gitlab-runner']
    clone: 'ubuntu-24.04-server-cloudinit-template'

# roles/hypervisor/tasks/provision_vm.yml
- name: Provision VM from cloud-init template
  community.general.proxmox_kvm:
    api_host: "{{ ansible_host }}"
    # API credentials (e.g. from vaulted group_vars); the module
    # cannot authenticate without them
    api_user: "{{ proxmox_api_user }}"
    api_token_id: "{{ proxmox_api_token_id }}"
    api_token_secret: "{{ proxmox_api_token_secret }}"
    node: "{{ inventory_hostname }}"
    name: "{{ vm_name }}"
    newid: "{{ vm_id }}"  # when cloning, newid sets the new VM's ID; vmid would name the source
    clone: "{{ vm_clone }}"
    cores: "{{ vm_cores }}"
    memory: "{{ vm_memory }}"
    net: "{{ vm_net }}"
    state: "{{ vm_state }}"
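
The per-VM variables (`vm_name`, `vm_id`, and so on) have to come from somewhere. One plausible way to wire `vars/vm_specs.yml` into the provisioning task is a loop that maps each spec entry onto those variables; this glue is my reconstruction, not code from the repo, and it assumes snake_case keys in the spec file:

```yaml
# roles/hypervisor/tasks/main.yml -- hypothetical glue between
# vm_specs.yml and provision_vm.yml
- name: Provision all VMs defined in vm_specs.yml
  ansible.builtin.include_tasks: provision_vm.yml
  loop: "{{ vms }}"
  loop_control:
    loop_var: vm
    label: "{{ vm.name }}"
  vars:
    vm_name: "{{ vm.name }}"
    vm_id: "{{ vm.vm_id }}"        # key name assumed; match vm_specs.yml
    vm_clone: "{{ vm.clone }}"
    vm_cores: "{{ vm.cores }}"
    vm_memory: "{{ vm.memory }}"
    vm_net: "{{ vm.net }}"
    vm_state: "{{ vm.state | default('started') }}"
```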

Benefits:

  • ✅ Reproducible (spin up identical VMs from templates)
  • ✅ Version controlled (VM specs in Git)
  • ✅ Self-documenting (vm_specs.yml IS the documentation)
  • ✅ Idempotent (run multiple times safely)

Application Deployment

Before: SSH in, manually install packages, hope nothing breaks.

After:

# roles/platform_one/tasks/gitlab.yml
- name: Ensure GitLab directory structure
  ansible.builtin.file:
    path: "{{ container_data }}/ruby/gitlab"
    state: directory

- name: Template GitLab docker-compose
  ansible.builtin.template:
    src: gitlab-docker-compose.yml.j2
    dest: "{{ container_data }}/ruby/gitlab/docker-compose.yml"
  notify: restart gitlab

- name: Deploy GitLab container
  community.docker.docker_compose_v2:
    project_src: "{{ container_data }}/ruby/gitlab"
    state: present
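
The `notify: restart gitlab` only fires if a handler with that name exists. A minimal version (assuming the role's standard handlers file) would be:

```yaml
# roles/platform_one/handlers/main.yml -- hypothetical handler
- name: restart gitlab
  community.docker.docker_compose_v2:
    project_src: "{{ container_data }}/ruby/gitlab"
    state: restarted
```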

Run once: ansible-playbook main_playbook.yml --tags gitlab --limit ruby

Terraform → Ansible Integration

Terraform outputs (service endpoints, bucket names, auth paths) flow into Ansible via a generated vars file:

# terraform/outputs.tf
resource "local_file" "ansible_vars" {
  filename = "${path.root}/../../vars/tf_ansible_vars_file.yml"
  content  = yamlencode({
    minio_endpoint = minio_s3_bucket.backups.endpoint
    vault_oidc_path = vault_auth_backend.oidc.path
  })
}

Ansible consumes this automatically:

# playbook.yml
- hosts: all
  vars_files:
    - vars/tf_ansible_vars_file.yml
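
From there the Terraform-sourced values are ordinary Ansible variables that any task or template can reference. A quick sanity check (this debug task is a hypothetical example, not from the repo):

```yaml
# Confirm the Terraform outputs arrived before backup jobs use them
- name: Show the MinIO endpoint imported from Terraform
  ansible.builtin.debug:
    msg: "Backups will be shipped to {{ minio_endpoint }}"
```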

The Results

Full environment rebuild:

$ cd terraform/1-infrastructure && terraform apply
$ ansible-playbook main_playbook.yml

That’s it. Every VM, every service, every configuration restored from code.

This was the foundation. The Terraform layer came later, in December 2024; in those first months Ansible handled everything: VM provisioning, Docker deployments, configuration management. What followed was rapid iteration.