Trying to automate Talos Linux deployment with Terraform
So I heard about Talos Linux from the interwebs and figured I’d give it a shot in my homelab. Something about an immutable OS just for Kubernetes? Sure, why not.
My goal: get a Talos cluster running on Proxmox using only Terraform. No manual clicking around, I’m too lazy. Should be simple enough, just like my neovim config. (which I use btw)
What Even Is Talos Linux?
Talos Linux is basically a stripped-down Linux that only runs Kubernetes. The weird parts:
- No SSH access (takes some getting used to in a homelab where you want to control everything)
- Read-only filesystem
- Everything happens through APIs
- More secure because there’s less to attack
The basic idea: less stuff = less problems. Makes sense, I guess. (Unless it’s Neovim)
My Setup
Running this on a Proxmox server in my basement. Nothing fancy:
- 1 control plane node (4 CPU, 8GB RAM)
- 2 worker nodes (4 CPU, 12GB RAM each)
- Static IPs because DHCP in my homelab is… flaky
Terraform for everything because clicking through UIs gets old.
The Terraform Approach I Settled On
After trying a bunch of different approaches (and failing), here’s what worked:
- Proxmox provider for creating VMs
- Talos provider for configuring the cluster
- OnePassword provider for secrets (I use 1Password for everything)
- Local provider for saving kubeconfig files
Not saying this is the best way. Just what finally worked.
The Version Gotcha That Cost Me Hours
This one burned me: Talos changed their image types between versions. Version 1.7.x has “nocloud” images that work with cloud-init. Version 1.8.0+? They switched to “metal” images that don’t.
Why does this matter? If you want static IPs through Terraform, you need cloud-init support. Without it, you’re stuck doing manual network config. So much for automation.
Sticking with 1.7.6:
variable "talos_version" {
description = "Talos Linux version to use"
type = string
default = "1.7.6" # Latest 1.7.x for nocloud support
}
There’s probably a better way with newer versions. Haven’t found it.
The Template Approach
I split my Terraform into modules. The template module creates a reusable VM template:
module "talos_template" {
source = "../../modules/talos-template"
template_vm_id = 9200
talos_version = "1.7.6"
proxmox_node = "proxmox"
vm_storage_pool = "local-lvm" # Adjust for your Proxmox storage
}
The node module clones from the template:
module "control_plane" {
source = "../../modules/talos-node"
template_vm_id = module.talos_template.template_id
node_name = "homelab-cp"
node_role = "controlplane"
vm_id = 200
# Static IP configuration
ip_address = "10.0.20.10"
subnet_mask = 24
gateway = "10.0.20.1"
# Required Talos configuration
cluster_name = var.cluster_name
talos_client_config = talos_machine_secrets.cluster.client_configuration
machine_config = data.talos_machine_configuration.control_plane.machine_configuration
}
Works reliably enough.
The File Upload Problem
The Talos images are 1.2GB compressed. Uploading through the Proxmox API? Timeouts. Failures. Pain.
I ended up using SSH and rsync:
# Download and decompress locally
curl -fsSL "https://github.com/siderolabs/talos/releases/download/v${var.talos_version}/nocloud-amd64.raw.xz" \
-o "/tmp/talos-${var.talos_version}-nocloud-amd64.raw.xz"
xz -d "/tmp/talos-${var.talos_version}-nocloud-amd64.raw.xz"
# Upload via rsync with progress
rsync -avz --progress "/tmp/talos-${var.talos_version}-nocloud-amd64.raw" \
root@${var.proxmox_node}:/tmp/
Not pretty but it works.
Getting kubectl and talosctl Working
Got the Terraform to automatically configure kubectl and talosctl. It merges the new cluster config with my existing kubeconfig:
resource "null_resource" "kubeconfig_merge" {
provisioner "local-exec" {
command = <<-EOT
# Backup existing config (learned this one the hard way)
cp ~/.kube/config ~/.kube/config.backup.$(date +%Y%m%d_%H%M%S)
# Merge configs
KUBECONFIG="$HOME/.kube/config:${path.root}/kubeconfig" \
kubectl config view --flatten > /tmp/kubeconfig.merged
# Validate merge worked, then replace
if kubectl --kubeconfig=/tmp/kubeconfig.merged config get-contexts | grep -q "${var.cluster_name}"; then
mv /tmp/kubeconfig.merged ~/.kube/config
fi
EOT
}
}
View full kubeconfig merge logic
That backup step? Added after I nuked my kubeconfig. Don’t skip it.
What’s Running on It Now
Got a pretty full stack running now:
- MetalLB for load balancer services (bare metal needs help with LoadBalancer types)
- Prometheus and Grafana for monitoring (because you can’t manage what you can’t see)
- cert-manager for SSL certificates (Let’s Encrypt automation)
- Istio service mesh (probably overkill but it’s cool)
- CloudNativePG for PostgreSQL (way better than managing databases manually)
- Local Path Provisioner for storage (just local volumes but it works)
Turns out once you get the base cluster working, adding stuff with Helm is pretty straightforward.
Performance Tweaks I Found
Made a few tweaks based on random blog posts. Not sure if they help, but they don’t hurt:
# CPU type for better container performance (supposedly)
cpu_type = "x86-64-v2-AES"
# Disable memory ballooning
balloon = 0
The memory ballooning thing: Proxmox tries to be clever about memory but it can mess with Kubernetes.
Things That Take Getting Used To
Talos does things differently than traditional Linux:
- No SSH access feels wrong at first. Yeah, security, but debugging requires learning new approaches
- Certificate stuff just… happens automatically. Nice but takes trust
- Immutable filesystem is cool in theory but changes how you think about troubleshooting
What I’d Do Differently
Next time:
- Start with one node to understand Talos first
- Actually read the docs before diving in
- Maybe try k3s first since it’s simpler
- Set up monitoring from the start
Is This Overkill for a Homelab?
Absolutely. Could do most of this with k3s or Docker Compose. But where’s the fun in that?
Wrapping Up
Getting Talos running with Terraform was more work than expected. Version issues, file upload problems, and Talos complexity made this a multi-weekend project.
But it works! Can tear down and rebuild with terraform apply
. Not perfect, probably better ways to do it, but it’s mine.
If you try something similar:
- Start simple
- Talos 1.7.x vs 1.8.x matters for cloud-init
- Back up your kubeconfig
- Expect frustration
Helpful Resources
Docs that actually helped:
- Talos Linux Documentation - Pretty good once you get the basics
- Proxmox Provider Docs - Essential
- Talos Provider Docs - Sparse but necessary
- Complete Terraform Code - The full working implementation
If you build something similar, let me know what worked. Still figuring this out.
Comments