Infrastructure as Code for AI: automating GPU clusters

Fifth post in the series. In the previous one, we dove inside the GPU. Now let’s automate everything around it. Understanding GPUs is only half the battle; provisioning them consistently and at scale is where infrastructure engineering actually meets AI.

The $4,000 typo

I started the week with a win. I manually provisioned a GPU cluster in East US 2 for an ML experiment: AKS with a Standard_NC6s_v3 node pool, accelerated networking, NVIDIA drivers, and the correct taints. It took almost a full day, but it worked. ...

May 26, 2026 · 7 min · Ricardo Martins