This is the first post in a series where I’ll translate the world of AI into the language that infrastructure engineers already speak. If you’re the kind of professional who configures VMs, builds CI/CD pipelines, and gets woken up at 2 AM when Nagios fires, this content is for you.

The series is based on my open-source book AI for Infrastructure Professionals, adapted and expanded here on the blog.

The Monday morning message

It’s 8:47 AM on a Monday. You’re halfway through your coffee, reviewing a Terraform plan for a network redesign, when a Slack message lights up your screen. It’s from the data science team lead:

“Hey — we need 8 GPU VMs provisioned by Wednesday for a fine-tuning job. We also need a private endpoint for the model’s inference API, and can you set up TPM monitoring? Thanks!”

You read it twice. GPU VMs? Fine-tuning? You know what a private endpoint is — you’ve configured hundreds. Monitoring? That’s your bread and butter. But what the hell is “TPM” in this context? It’s not Trusted Platform Module. It’s Tokens Per Minute, a throughput metric for language models. You don’t know that yet, but that’s fine.

Notice something: everything else in that request is pure infrastructure.

Provisioning compute. Configuring network security. Setting up observability. You’ve been doing this for years. The only difference is the type of workload.
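And that mysterious TPM? It is just capacity planning, the same arithmetic you would do for requests per second. A back-of-the-envelope sketch, with made-up traffic numbers:

```shell
# Tokens Per Minute (TPM) sizing, back of the envelope.
# All numbers below are illustrative assumptions, not real quotas.
requests_per_minute=50
avg_tokens_per_request=800   # prompt + response tokens combined

tpm=$((requests_per_minute * avg_tokens_per_request))
echo "Required capacity: ${tpm} TPM"
```

A workload like this needs 40,000 TPM of headroom. It is a saturation metric: you monitor it and alert on it exactly as you would connection counts or IOPS.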

At its core, AI is just another workload

Let me be direct. Strip away the buzzwords and AI is a workload. It consumes compute, storage, and networking, just like every other workload you’ve ever managed. The difference is in the shape of that consumption: more parallel compute, larger datasets, different performance metrics.

The AI stack runs on three layers you already know:

| AI Layer | What it does | Your infra equivalent |
| --- | --- | --- |
| Data | Feeds the model with examples | Storage: Blob, Data Lake, NFS, databases |
| Model | Learns patterns and makes predictions | The application — your compiled binary running on compute |
| Infrastructure | Holds everything up underneath | Your domain: compute, networking, security, observability |

The model is the application. The data is what it consumes and produces. The infrastructure is everything that makes it run reliably, securely, and at scale. That last part? That’s you.

Translating AI into infrastructure language

Back in 2014, when I started writing about Docker on this blog, the first thing I did was translate the concepts into something sysadmins already understood. I’m doing the same thing now with AI.

When someone from the AI team throws jargon you don’t recognize, map it back to what you already know:

| AI Concept | Infrastructure Equivalent | Why it works |
| --- | --- | --- |
| Trained model | Compiled binary | A static artifact produced by a build process, deployed to serve requests |
| Training a model | Batch job | Long-running, compute-intensive process that reads data and produces an output artifact |
| Inference | An API call | Request comes in, the model processes it, response goes out. Just like any microservice |
| Fine-tuning | Patching a binary | You take an existing artifact and customize it for your environment |
| Dataset | Database / Data Lake | Structured input that the workload depends on |
| Training pipeline | CI/CD pipeline | Automated workflow: ingest → process → build → validate → deploy |
| Model registry | Artifact repository | Versioned storage for deployable artifacts (like ACR, but for models) |
| GPU cluster | High-performance compute | Specialized hardware allocated for heavy workloads |

💡 Meeting tip: When the data science team starts talking about “epochs”, “hyperparameters”, and “loss functions”, don’t panic. Those are their tuning knobs — the equivalent of your connection pool sizes, cache TTLs, and autoscale thresholds. You don’t need to master their knobs. You need to understand what those knobs demand from your infrastructure.

What changes and what stays the same

The good news: AI infrastructure isn’t a different planet. It’s more like a new neighborhood in a city you already know. The streets follow the same grid, the utilities work the same way, but the buildings look different and the residents have unusual needs.

What changes

| Dimension | Traditional Infra | AI Infra |
| --- | --- | --- |
| Compute | CPUs, general-purpose VMs | GPUs (NVIDIA T4, A100, H100), multi-GPU nodes |
| Storage | SSD/HDD, managed disks | Data Lakes, high-throughput Blob, local NVMe for scratch |
| Networking | 1–25 GbE Ethernet | InfiniBand (up to 400 Gb/s), RDMA, GPU-to-GPU communication |
| Deployment | VMs, App Services, containers | Inference endpoints, model-as-a-service, GPU-enabled containers |
| Observability | CPU %, memory, disk I/O | GPU utilization, VRAM, tokens/second, time-to-first-token |
| Cost | $/hour per VM | $/hour per GPU (10–30× CPU cost), PTUs for managed services |

What doesn’t change

And this is equally important. Maybe more. These fundamentals don’t change just because the workload runs on GPUs:

  • Security: Network segmentation, private endpoints, identity management, encryption. A GPU VM still needs an NSG. An inference API still needs authentication.
  • Networking: VNets, subnets, DNS, load balancing. Packets still flow the same way.
  • Infrastructure as Code: Bicep, Terraform, ARM templates. GPU VMs are still Azure resources with properties and parameters.
  • Monitoring: You’ll still set thresholds, build dashboards, and respond to incidents. The metrics just have different names.
  • Cost management: Budgets, tagging, right-sizing. If anything, cost governance is more critical with AI workloads.

⚠️ Production alert: The most common failures in production AI systems aren’t model accuracy problems. They’re the same old villains: disk full, network timeout, expired certificate, missing RBAC permission. Your instincts are right.

Why AI needs you (not the other way around)

The AI industry has a people problem, and it’s not what you’d expect. Data scientists who can build models in Jupyter notebooks are plentiful. What’s actually scarce are engineers who can take those models and run them reliably in production.

In my experience working with startups and enterprises at Microsoft, I see this pattern constantly:

Uncontrolled GPU sprawl. A data scientist requests 4 Standard_NC24ads_A100_v4 VMs for a training experiment. No resource locks, no budget alerts, no tagging. Three weeks later, the VMs are still running. Nobody remembers who provisioned them or whether the experiment finished. Monthly cost: $35,000+.

Exposed inference endpoints. The ML team deploys a model to a managed endpoint with a public IP. No private endpoint, no WAF, no API management. The model serves responses that include proprietary business logic.

Blind observability. The team monitors model accuracy but not infrastructure health. When inference latency jumps from 200ms to 8 seconds, nobody can tell whether it’s the model, the compute, the network, or a noisy neighbor.

⚠️ The $50K GPU weekend: A team provisioned 8 Standard_ND96asr_v4 VMs (A100 GPUs) on a Friday afternoon for a training run that was supposed to finish Saturday morning. The job crashed at 3 AM due to a checkpoint storage misconfiguration, but the VMs kept running. Nobody had set up auto-shutdown or budget alerts. Monday surprise: $53,000 in compute for 60 hours of idle GPU. An infrastructure engineer would have configured auto-shutdown, set a budget alert at $5,000, and stored checkpoints in Blob with lifecycle policies. Fifteen minutes of infra work would have saved $48,000.
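The arithmetic from that weekend is worth spelling out. The hourly rate below is an assumption back-calculated from the story's totals, not a quoted price — always check the Azure pricing page for current ND-series rates:

```shell
# Idle-GPU weekend bill, back of the envelope.
# $110/VM-hour is an assumed rate for illustration only — verify
# against current Azure pricing before quoting these numbers.
vm_count=8
idle_hours=60
rate_per_vm_hour=110

total=$((vm_count * idle_hours * rate_per_vm_hour))
echo "Weekend bill: \$${total}"
```

That comes out to $52,800 of compute producing nothing. The same three variables are what an auto-shutdown policy attacks: it drives `idle_hours` toward zero.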

Hands-on: your first AI reconnaissance

You don’t need to train a model or write Python. You need to find out what GPU compute is available to you and what your subscription’s limits are. This is reconnaissance — the same first step you’d take before architecting any new workload.

Discover GPU VMs in your region

```shell
az vm list-skus --location eastus2 --size Standard_N --output table
```

This filters the Standard_N family, which includes all GPU-accelerated VMs in Azure. Pay attention to three prefixes:

  • NC: Compute-optimized GPUs for training and inference (NVIDIA T4, A100)
  • ND: High-end GPUs for distributed deep learning with InfiniBand (A100, H100)
  • NV: GPUs for visualization and lightweight inference (AMD Radeon, NVIDIA A10)
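If you want to sort a list of size names into those three buckets in a script, a shell `case` statement is enough. The size names below are sample input; the pattern match is the point:

```shell
# Classify Azure GPU VM sizes by family prefix (NC / ND / NV).
# The sizes listed are sample input for illustration.
for size in Standard_NC24ads_A100_v4 Standard_ND96asr_v4 Standard_NV36ads_A10_v5; do
  case "$size" in
    Standard_NC*) family="NC (training and inference)" ;;
    Standard_ND*) family="ND (distributed deep learning)" ;;
    Standard_NV*) family="NV (visualization)" ;;
    *)            family="not a GPU family" ;;
  esac
  printf '%s -> %s\n' "$size" "$family"
done
```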

Check your GPU quota

```shell
az vm list-usage --location eastus2 --output table | grep -E "NC|ND|NV"
```

On Windows/PowerShell, replace grep -E "NC|ND|NV" with Select-String -Pattern "NC|ND|NV".

If your quota is zero across the board, you’ll need to request an increase before any provisioning. That’s exactly the kind of infra work that the data science team doesn’t know (and doesn’t want to know) how to do.
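If you’d rather script the check than eyeball the table, flag any family whose limit (the last column) is zero. The heredoc below is a captured sample of `az vm list-usage -o table` output so the parsing runs anywhere; in your subscription, pipe the live command instead:

```shell
# Flag GPU families with zero quota. The heredoc stands in for
# `az vm list-usage --location eastus2 --output table` sample output.
sample_usage() {
  cat <<'EOF'
Standard NCASv3_T4 Family vCPUs      0    0
Standard NDASv4_A100 Family vCPUs    0    0
Standard NVADSA10v5 Family vCPUs     8    36
EOF
}

# Last field of the table is the limit; zero means "request an increase".
sample_usage | awk '$NF == 0 { print "No quota:", $1, $2 }'
```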

Next up

I’ll talk about data and storage for AI workloads — the piece everyone ignores but that ends up being the hidden performance bottleneck in virtually every AI project I’ve seen.

The full book is available for free at ai4infra.com.


This post is part of the AI for Infrastructure Engineers series, based on the book AI for Infrastructure Professionals. New posts every week.