The Proxmox Utility Toolkit: Stop Cloning That VM by Hand

If you’ve been running Proxmox for more than six months, you’ve typed some version of qm clone followed by a bunch of flags you half-remember from the last time, missed a step, and spent twenty minutes wondering why cloud-init isn’t picking up your IP. You’re not alone. I’ve done it enough times that I considered making it a cardio routine.

After enough repetitions, the sensible answer isn’t “memorize the flags better.” It’s to build a toolkit, document it properly, and stop trusting your past self’s memory. So that’s what I did — and now it lives at github.com/dereklarmstrong/proxmox.

🎯 Key Takeaways

Clone a cloud-init VM (Ubuntu 24.04 or Oracle Linux 9) with a single command, with the static IP and SSH key pre-configured and no copy-paste required.
Full backup coverage out of the box: per-VM, bulk, GFS retention policies, and a status report so you actually know what’s protected.
GPU passthrough documentation for NVIDIA CUDA and AMD ROCm in one place — AI/ML and gaming paths covered.
Structured learning paths from beginner to expert so the toolkit grows with you rather than overwhelming you on day one.
This is a learning playground, not a production blueprint. Complexity is a liability. The scripts exist to build skills, not to run your business.

🗂️ What the Repository Actually Is

The Proxmox Utility Toolkit is a collection of shell scripts and documentation targeting Proxmox VE 8.x on Debian 12. It isn’t an abstraction layer, and it isn’t trying to replace the Proxmox web UI. It’s a set of opinionated scripts that handle the tedious parts — templating, cloning, backups, network reporting — so you can focus on what you’re actually trying to learn or build.

The structure follows a clear division: scripts/ is where the automation lives, and the other top-level directories (backup/, gpu-passthrough/, networking/, security/, automation/, learning-paths/) are documentation bundles that explain the why behind the how, with automation/ also carrying the Ansible and Terraform examples.

proxmox/
├── scripts/
│   ├── vm/          # Cloud-init templates, clone, snapshot, destroy, console
│   ├── containers/  # LXC creation and management
│   ├── backup/      # Single-VM, bulk, pruning, status reports
│   ├── k8s/         # Oracle Linux Kubernetes cluster deployment
│   ├── api/         # API token creation, curl wrappers
│   ├── network/     # Network config reporting across VMs/containers
│   └── storage/     # Disk usage and cleanup
├── backup/          # Strategy docs: 3-2-1, GFS retention, PBS setup, verification
├── gpu-passthrough/ # NVIDIA CUDA, AMD ROCm, gaming VMs, troubleshooting
├── networking/      # VLAN, firewall rules, SDN configuration guides
├── automation/      # Ansible playbooks, Terraform configs, learning path
├── security/        # Zero trust, CIS benchmarks, auditing, incident response
└── learning-paths/  # Skill progression from "create a VM" to "zero trust"

One config file drives everything. Copy config.example.sh to config.sh, fill in your storage pool, bridge interface, SSH key path, template IDs, and backup retention settings. Set it once, use it across every script.
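
As a rough illustration of the "set it once" idea, here is what a config.sh and an early sanity check could look like. Every variable name below is an assumption made up for this sketch, not the repository's actual schema; config.example.sh is the authority on the real keys. The values local-lvm and vmbr0 are simply common Proxmox defaults.

```shell
# Hypothetical config.sh contents. All variable names here are assumptions
# for illustration; check config.example.sh for the real ones.
STORAGE_POOL="local-lvm"
BRIDGE="vmbr0"
SSH_KEY_PATH="${HOME}/.ssh/id_ed25519.pub"
TEMPLATE_ID_UBUNTU=9000
TEMPLATE_ID_OL9=9100

# In a script that sources the config: abort with a readable message
# if a required setting is unset or empty.
: "${STORAGE_POOL:?set STORAGE_POOL in config.sh}"
: "${BRIDGE:?set BRIDGE in config.sh}"
: "${SSH_KEY_PATH:?set SSH_KEY_PATH in config.sh}"

echo "using storage=$STORAGE_POOL bridge=$BRIDGE"
```

The `${VAR:?message}` expansion is plain POSIX shell, so the check costs one line per setting and fails before any VM gets touched.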

🖥️ VM Templates and Cloning

This is where most people spend the most time repeating themselves. Two cloud-init flavors are supported out of the box:

# Ubuntu 24.04 — pull the SHA256 from the Ubuntu cloud images page
bash scripts/vm/create_cloud_init_template.sh -i 9000 --sha256 <ubuntu_sha>

# Oracle Linux 9 — checksum is bundled; this one just works
bash scripts/vm/create_cloud_init_template.sh -i 9100 --os ol9
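
If you are curious what the --sha256 value protects against, the check itself is small. The sketch below is not the template script's code; verify_sha256 is a hypothetical helper that compares a downloaded file against an expected digest using GNU sha256sum.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Returns 0 when FILE's SHA-256 digest matches EXPECTED_HEX, non-zero otherwise.
# (Hypothetical helper for illustration; requires GNU coreutils sha256sum.)
verify_sha256() {
  local file="$1" expected="$2"
  # In check mode sha256sum reads "<digest>  <filename>" lines; "-" means stdin.
  # --status suppresses output so only the exit code reports the result.
  printf '%s  %s\n' "$expected" "$file" | sha256sum --check --status -
}
```

A call such as `verify_sha256 noble-server-cloudimg-amd64.img "$ubuntu_sha" || exit 1` (file name assumed) stops the run before a corrupted or tampered image ever becomes a template.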

Once you have a template, cloning is one line:

bash scripts/vm/clone_vm.sh -s 9000 -d 150 -n web01 -i 192.168.1.60/24 -g 192.168.1.1

The flags map to source template (-s), destination VM ID (-d), hostname (-n), IP address with mask (-i), and gateway (-g). Your static IP and SSH key are baked into the VM before it boots. This is the “set it up correctly once and never think about it again” approach to ops. The rest of the VM scripts handle the supporting cast: snapshot.sh for point-in-time recovery, destroy_vm.sh when you’re done, console.sh for quick access, and find_ip.sh for when you can’t remember which IP you assigned to what.
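
For a sense of what a one-line clone replaces, here is a dry-run sketch that prints the kind of qm commands a cloud-init clone usually boils down to. This is an assumption about the general technique, not a copy of clone_vm.sh; print_clone_plan is a hypothetical name and the SSH key path is a placeholder. The qm subcommands and options shown (clone, set --ipconfig0, set --sshkeys, start) are standard Proxmox CLI.

```shell
# print_clone_plan SRC DEST NAME IP_CIDR GATEWAY SSH_PUBKEY_FILE
# Prints (does not run) the qm commands for a full clone with cloud-init
# network and SSH key settings. Hypothetical helper for illustration.
print_clone_plan() {
  local src="$1" dest="$2" name="$3" ip="$4" gw="$5" key="$6"
  echo "qm clone $src $dest --name $name --full 1"
  echo "qm set $dest --ipconfig0 ip=$ip,gw=$gw"
  echo "qm set $dest --sshkeys $key"
  echo "qm start $dest"
}

print_clone_plan 9000 150 web01 192.168.1.60/24 192.168.1.1 /root/.ssh/id_ed25519.pub
```

Four commands is not many, but it is four chances to mistype an ID or forget the gateway, which is the whole argument for wrapping them.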

Aside: Cloud-init is one of those technologies that makes complete sense once you understand it and is maddening until you do. The most common trap is expecting cloud-init to re-run after the VM has already booted once; by default it treats its per-instance setup as done and won’t repeat it unless told to. If your config isn’t applying, running cloud-init clean inside the guest and then rebooting will save you a significant amount of frustrated tab-completion.
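
The reset described in that aside can be sketched as a tiny guest-side helper. It is meant to run inside the VM, not on the Proxmox host, and it defaults to a dry run so nothing reboots by accident. The names run and rerun_cloud_init are hypothetical; cloud-init clean --logs is the real cloud-init subcommand.

```shell
# Run inside the guest VM, not on the Proxmox host.
# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: make the next boot behave like a first boot.
rerun_cloud_init() {
  run sudo cloud-init clean --logs   # drop cached instance state and old logs
  run sudo reboot                    # cloud-init re-applies config on the next boot
}

rerun_cloud_init
```

After the reboot, `cloud-init status --long` inside the guest is a quick way to see whether the run finished or errored.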

There’s also check_template.sh to validate your template before you clone twenty VMs from it. Speaking from personal experience: validate the template.

💾 Backups You’ll Actually Verify

Backups are one of those things everyone says they have until the moment they actually need one. The toolkit handles the whole lifecycle:

# Single VM or container
bash scripts/backup/backup_vm.sh --vmid 150

# Everything on the node
bash scripts/backup/backup_all.sh

# Prune per your retention policy
bash scripts/backup/prune_backups.sh

# Report on what's actually protected
bash scripts/backup/report.sh

The backup strategy documentation covers the 3-2-1 approach (three copies of your data, on two different kinds of media, with one copy offsite, ideally offline or air-gapped) and GFS (Grandfather-Father-Son) retention, which defines exactly how long daily, weekly, monthly, and yearly backups survive before pruning. It’s the structure commercial backup products charge extra to explain.
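
To make GFS concrete: Proxmox expresses retention as a set of keep-* counts, so a GFS policy is just four numbers. The sketch below builds that string; gfs_prune_spec is a hypothetical helper and the counts (7 daily, 4 weekly, 12 monthly, 1 yearly) are example values, not the toolkit's defaults. The keep-daily, keep-weekly, keep-monthly, and keep-yearly keys are the ones vzdump accepts via --prune-backups.

```shell
# gfs_prune_spec DAILY WEEKLY MONTHLY YEARLY
# Builds a retention string in the form vzdump's --prune-backups expects.
# Hypothetical helper; the numbers passed in are up to you.
gfs_prune_spec() {
  local daily="$1" weekly="$2" monthly="$3" yearly="$4"
  printf 'keep-daily=%s,keep-weekly=%s,keep-monthly=%s,keep-yearly=%s\n' \
    "$daily" "$weekly" "$monthly" "$yearly"
}

# Print the resulting command instead of running it.
echo "vzdump 150 --prune-backups $(gfs_prune_spec 7 4 12 1)"
```

Read the example as: keep a week of dailies, a month of weeklies, a year of monthlies, and one yearly. Anything not selected by one of those rules is eligible for pruning.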

The piece I’d read first, though, is backup/verification.md. It covers how to confirm a backup is actually usable before you’re under pressure to find out. “I think it ran” is not a backup strategy. Running a restore drill in a test environment once and confirming it succeeds is.
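
A restore drill starts with finding the archive you would actually restore. Here is a minimal sketch, assuming backups sit in the default local dump directory (/var/lib/vz/dump) and use zstd compression, so files look like vzdump-qemu-150-2024_06_01-02_00_03.vma.zst. latest_backup is a hypothetical helper, and 9999 is an arbitrary scratch VM ID chosen for the example.

```shell
# latest_backup DIR VMID
# Prints the newest vzdump archive for a VM, or returns 1 if none exists.
# Assumes zstd-compressed QEMU archives (*.vma.zst).
latest_backup() {
  local dir="$1" vmid="$2" file latest=""
  # vzdump embeds a sortable timestamp in the file name and globs expand in
  # sorted order, so the last match is the newest archive.
  for file in "$dir"/vzdump-qemu-"$vmid"-*.vma.zst; do
    [ -e "$file" ] || continue
    latest="$file"
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Print the restore command for a drill into a scratch VM ID.
if archive=$(latest_backup /var/lib/vz/dump 150); then
  echo "qmrestore $archive 9999 --storage local-lvm"
else
  echo "no backup found for VM 150" >&2
fi
```

Restoring into a throwaway ID keeps the drill away from the live VM; boot it, check that it comes up, then destroy it.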

🎮 GPU Passthrough Without the Three-Hour Research Session

GPU passthrough in Proxmox has a well-earned reputation for being complicated. The gap between “works in theory” and “works for my specific GPU and motherboard combo” is where most people give up and just run the GPU on bare metal. The documentation in the toolkit covers both major paths:

NVIDIA + CUDA — for AI/ML workloads on a dedicated GPU inside a VM, with the driver configuration that actually sticks
AMD + ROCm — the open-source path for inference and compute work, including ROCm setup inside the guest
Gaming VMs — single GPU passthrough for a Windows gaming VM is a completely reasonable use of hardware you own, and there’s a dedicated guide for it

There’s a troubleshooting guide for when IOMMU groupings don’t cooperate, which on consumer hardware is more of a “when” than an “if.” ACS override options, VFIO binding order, and the usual suspects are all covered.
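
Before reaching for ACS override, it helps to see how the kernel has actually grouped your devices. This is a common sysfs walk rather than anything specific to the toolkit; list_iommu_groups is a hypothetical name, and the optional directory argument exists only so the function can be pointed at a test tree.

```shell
# list_iommu_groups [ROOT]
# Prints one "group N: PCI_ADDRESS" line per device found under ROOT
# (default: /sys/kernel/iommu_groups). Prints nothing if IOMMU is disabled.
list_iommu_groups() {
  local root="${1:-/sys/kernel/iommu_groups}" dev group
  for dev in "$root"/*/devices/*; do
    [ -e "$dev" ] || continue
    group="${dev#"$root"/}"   # strip the leading root path
    group="${group%%/*}"      # keep only the group number
    printf 'group %s: %s\n' "$group" "${dev##*/}"
  done
}

list_iommu_groups
```

If the GPU shares a group with devices you do not want to pass through, that is the situation the troubleshooting guide is for. Feeding an address to `lspci -nns <address>` turns it into a readable device name.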

☸️ Kubernetes on Top of Proxmox

One script deploys a full Oracle Linux Kubernetes cluster:

bash scripts/k8s/deploy_ol_k8s_cluster.sh --config config.k8s.sh

It targets the 192.168.1.50-99 homelab IP range by default, handles VM provisioning, and walks through the Kubernetes bootstrapping sequence. This is explicitly a learning-path feature — it’s not production-hardened, and it doesn’t pretend to be. What it is good for is understanding how Kubernetes actually comes together piece by piece, without a managed service abstracting away the interesting parts.
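
If you plan your own node layout inside that range, a small guard keeps addresses from wandering outside it. This is an illustrative sketch only: node_ip is a hypothetical helper, and handing out addresses sequentially from .50 is an assumption, not a description of how the deploy script assigns them.

```shell
# node_ip INDEX
# Prints the address for the Nth node (0-based) within 192.168.1.50-99,
# or returns 1 when the index would leave the range.
# Sequential assignment is an assumption made for this sketch.
node_ip() {
  local index="$1" first=50 last=99 host
  host=$((first + index))
  if [ "$host" -gt "$last" ]; then
    echo "node index $index falls outside 192.168.1.$first-$last" >&2
    return 1
  fi
  printf '192.168.1.%s\n' "$host"
}

# Example: one control-plane node followed by two workers.
for i in 0 1 2; do
  node_ip "$i"
done
```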

If your goal is to understand what kubeadm init is doing and why, this is a faster path to that knowledge than starting from scratch.

🔒 Security Worth Taking Seriously

Security docs in most homelab repos are an afterthought dropped in because someone mentioned it in a pull request. This one treats security as first-class content:

| Document | Coverage |
| --- | --- |
| `security/zero-trust.md` | Network segmentation, identity verification, least-privilege access |
| `security/cis-benchmarks.md` | Host hardening guidelines for the Proxmox node itself |
| `security/auditing.md` | What to check, how often, and what to do with the results |
| `security/incident-response.md` | What to do when something actually goes sideways |

The honest caveat: these are guides, not automation. The expectation is that you read them, understand the reasoning, and implement with intention — not apply them blindly and assume you’re done. Security posture is a continuous practice, not a one-time configuration.

🤖 Automation: Ansible, Terraform, and API Access

Once you’re past one-off scripts and want repeatable, idempotent infrastructure, the toolkit has an on-ramp:

Ansible playbooks — configuration management for Proxmox nodes and the VMs running on them
Terraform + HCL — infrastructure-as-code for provisioning, with a main.tf and example patterns to build from
API helpers — create_api_token.sh and a minimal curl wrapper for the Proxmox API, useful when you’re scripting against the REST interface and don’t want to build that boilerplate yourself
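
For the API helpers, the boilerplate in question is mostly the token header. Here is a minimal sketch of a curl wrapper, assuming an API token already exists; pve_auth_header, pve_get, and the PVE_* variable names are hypothetical and not the toolkit's own. The header format (PVEAPIToken=USER@REALM!TOKENID=SECRET) and the /api2/json base path on port 8006 are the standard Proxmox API conventions.

```shell
# pve_auth_header USER@REALM TOKEN_ID SECRET
# Prints the Authorization header Proxmox expects for API tokens.
pve_auth_header() {
  printf 'Authorization: PVEAPIToken=%s!%s=%s\n' "$1" "$2" "$3"
}

# pve_get HOST API_PATH
# GETs an API path using PVE_USER, PVE_TOKEN_ID, and PVE_TOKEN_SECRET from
# the environment (variable names are assumptions for this sketch).
pve_get() {
  curl --silent --show-error --fail \
    --header "$(pve_auth_header "$PVE_USER" "$PVE_TOKEN_ID" "$PVE_TOKEN_SECRET")" \
    "https://$1:8006/api2/json$2"
}
```

Something like `pve_get pve.example.lan /nodes` (host name assumed) would then list the cluster's nodes. On a node with the default self-signed certificate, curl also needs `--cacert` pointing at the node's CA, or `--insecure` if you accept the risk on a lab network.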

The automation/learning-path.md lays out the progression: scripts first, Ansible when you need repeatability across multiple nodes, Terraform when you’re ready to think declaratively about what infrastructure should exist. That order matters — jumping straight to Terraform before you understand what it’s abstracting will bite you.

🎓 Learning Paths: There’s an On-Ramp for Everyone

This is the part that makes the toolkit useful across experience levels rather than just to people who already know what they’re doing:

| Path | What You’re Building |
| --- | --- |
| 🌱 Beginner | VM and container basics, simple backups, basic networking |
| 🌿 Intermediate | VLANs, firewall rules, automated backup retention, Ansible |
| 🌳 Advanced | Proxmox Backup Server, GPU passthrough, Terraform IaC |
| 🌲 Expert | Zero trust architecture, CIS benchmark compliance, automated incident response |

Most resources assume you’re either totally new or already building production clusters. The learning paths here acknowledge that the interesting ground is in between — where you understand enough to ask the right questions but haven’t yet built the muscle memory for the complex stuff.

🚀 Getting Started

git clone https://github.com/dereklarmstrong/proxmox.git
cd proxmox
cp config.example.sh config.sh   # Edit with your storage pool, bridge, SSH key path
./scripts/test.sh                # Run the test suite — worth doing before you rely on any of this

The logical starting point from there is enabling the community repository:

bash scripts/setup/pve_community_repo.sh

Then run create_cloud_init_template.sh to build your first template. Everything else in the toolkit follows from having a good, verified base template to clone from.
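
One habit worth adopting before the first template: confirm you are on a machine that actually has the Proxmox tooling. The sketch below is not part of the toolkit; missing_tools is a hypothetical helper, and the tool list is simply the set of commands this post mentions.

```shell
# missing_tools TOOL...
# Prints the name of each listed command that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools qm pvesm vzdump curl)
if [ -n "$missing" ]; then
  echo "not found on PATH:" $missing >&2
fi
```

An empty result means the basics are in place; anything listed is a hint that you are on your laptop rather than the node.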

If this saves you some time, pass it on. And if you find something that doesn’t work or could be smarter, the repo is open — github.com/dereklarmstrong/proxmox.