Azure

From prompt engineering to frontier company: why the model is no longer the differentiator

Three years ago, the question I heard most was: “what’s the best prompt?” Two years ago, it shifted to: “how do I do RAG?” Last year: “how do I build an agent?” This year, the conversation is different. People are asking how to transform an entire organization to operate with agents. Not a chatbot on the website. Dozens of agents embedded in business processes, with governance, observability, granular permissions. That progression tells a story, and we often discuss each phase as if it appeared out of nowhere. ...

AI adoption framework: from enthusiasm to governance

Fourteenth post in the series. In the previous one, we used AI for our own infrastructure work. This time the scope is bigger: how to take an entire organization from “let’s use AI” to a governed platform that can survive contact with finance, security, and production support. tl;dr: AI adoption fails when teams skip readiness, guardrails, and cost controls. A workable path is assessment, enablement, platform prep, controlled experimentation, production governance, and continuous review. Treat AI as an operating capability with budgets, runbooks, and policies from day 1. Best intentions, worst outcomes: Your CTO walks into the all-hands and says: “We’re going all-in on AI.” The room buzzes. Teams start brainstorming use cases before the meeting ends. Within two weeks, Slack is full of threads about GPU availability. ...

MCP and AI Agents 101 for Infrastructure Engineers

Chapter 1: MCP and AI Agents 101. At some point in the last few months, someone on your team probably showed up talking about an “AI agent” or an “MCP server” and asked for cluster access, a deployment, or an explanation for the CISO. I wish I’d had a clean mental model before I touched any of this. That’s what this post is: no hype, and a real Azure example so this does not stay in slideware. ...

AI use cases for infra teams: AIOps and beyond

Thirteenth post in the series. In the previous one, we dealt with the incidents that wake you up at 2 AM. This time the angle flips: using AI to make the infrastructure work itself less miserable. tl;dr: AI helps with summarizing, drafting, and finding patterns across noisy data. Do not hand it deterministic enforcement, compliance evidence, or unattended production actions. Flipping the perspective: Over the past 12 posts, you’ve been building infra for AI: GPUs, clusters, pipelines, security, monitoring, cost management. You know how to keep the runway paved for data scientists. ...

Troubleshooting playbook: incidents that will wake you at 2AM

Twelfth post in the series. In the previous one, we ran Azure OpenAI with HA and sane retry patterns. This one is for when the nice diagram meets real life. This post is organized as real-world failure scenarios. Each follows: Symptoms → Diagnosis → Root Cause → Resolution → Prevention. Read it once for pattern recognition. Then bookmark it. You will need it again. tl;dr: Most late-night AI infra incidents come down to driver drift, memory pressure, scheduler mismatch, throttling, or cold starts. Start with the first check that rules out the biggest class of failure. Scenario 1: NVIDIA driver crash after kernel update. Symptoms: Monday morning. The ML team reports that all GPU workloads failed over the weekend. Nobody deployed anything. You SSH in: ...

Azure OpenAI in production: tokens, throughput, and high availability

Eleventh post in the series. In the previous one, we built the self-service AI platform with multi-tenancy and scheduling. This time it’s the service everybody wants to consume: Azure OpenAI, and how to run it without getting slapped by 429s. tl;dr: Azure OpenAI capacity is a token problem before it is a scaling problem. Design around TPM and RPM, back off on 429s, and route across deployments instead of betting everything on one endpoint. The 429 that changed everything: Your team launched an internal GPT-4o chatbot on Monday. Day 1 was demos for leadership and Slack praise. Day 3 brought “the bot is slow.” Day 5 brought HTTP 429 on 30% of requests. You open Azure Monitor and find the 80K TPM ceiling waiting for you. ...
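
The "back off on 429s" advice in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the post's own code: `call_with_backoff`, the `send` callable, and the `(status, retry_after, body)` response tuple are hypothetical names chosen for the example, and a real client would wrap whichever HTTP library or SDK it uses.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Hypothetical response shape for this sketch:
# (HTTP status code, Retry-After value in seconds or None, response body)
Response = Tuple[int, Optional[float], str]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry `send` on HTTP 429 using exponential backoff with full jitter."""
    response = send()
    for attempt in range(max_attempts - 1):
        status, retry_after, _ = response
        if status != 429:
            return response
        if retry_after is not None:
            # Prefer the server's hint when one is provided, capped at max_delay.
            delay = min(retry_after, max_delay)
        else:
            # Full jitter: random wait up to the capped exponential ceiling.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
        response = send()
    # Either a non-429 response or the last 429 after exhausting attempts.
    return response
```

The `sleep` parameter is injectable only so the behavior can be exercised without real waiting. Routing across deployments, which the excerpt also mentions, would sit one level above this function.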

Platform ops: building a self-service AI platform

Tenth post in the series. In the previous one, we controlled costs with Spot VMs, right-sizing, and FinOps. Now for the next problem: how to stop being a human help desk for GPU access. tl;dr: Self-service AI platforms need isolation, quotas, and scheduling together. The goal is fewer tickets, not faster manual provisioning. The Slack channel that ate your calendar: Six months ago, you provisioned a single GPU VM for the ML team. Configured drivers, mounted storage, closed the ticket. Felt like any other infrastructure request. ...

Cost engineering for AI: when idle GPUs cost more than your car

Ninth post in the series. In the previous one, we hardened the platform against prompt injection and data leakage. Now for the part Finance notices first: how not to go bankrupt in the process. tl;dr: AI cost control starts with lifecycle policy, model choice, and quota discipline. Shut down idle GPUs, use cheaper models where quality allows, and treat every exact cost number as time-sensitive. The $127,000 Monday: Monday morning. Coffee in hand, email from Finance with the subject line: “URGENT: Azure invoice $127,000, please explain.” Forecast was $42,000. Two ND96isr_H100_v5 VMs, provisioned three weeks ago for a “quick experiment,” never shut down. At about $98/hour each, running 24/7 for three weeks: roughly $99,000 in idle GPU compute. Nobody was using them. Nobody remembered they existed. ...
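
The idle-GPU figure in this excerpt can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above (two VMs, about $98/hour each, 24/7 for three weeks); the function name `idle_cost` is made up for illustration, and the hourly rate is the excerpt's approximation, not a current price.

```python
def idle_cost(vm_count: int, hourly_rate: float, days: int) -> float:
    """Cost of VMs left running around the clock for the given number of days."""
    return vm_count * hourly_rate * 24 * days


# Two VMs at roughly $98/hour each, running for three weeks (21 days).
cost = idle_cost(vm_count=2, hourly_rate=98.0, days=21)
print(f"${cost:,.0f}")  # prints "$98,784", i.e. roughly $99,000
```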

Security for AI: threats your firewall won't catch

Eighth post in the series. In the previous one, we learned that a green dashboard doesn’t guarantee a healthy model. Now: the threats your WAF won’t catch. tl;dr: Most AI security failures happen at identity, data access, and prompt boundaries, not at the firewall. Use managed identity, RBAC, Key Vault, private connectivity, and content filtering together. The chatbot that knew too much: Your organization deploys an internal chatbot with Azure OpenAI, connected to a knowledge base of policies, documentation, and FAQs. The rollout goes smoothly, adoption takes off, and leadership is already planning a customer-facing version. ...

Monitoring and observability for AI: when the green dashboard lies

Seventh post in the series. In the previous one, we put models into production with CI/CD pipelines. Now: how do you know they’re actually healthy? tl;dr: Infra health is not model health. Track GPU, token, application, and answer-quality signals together or you will miss regressions while every dashboard stays green. The silent failure: Your Azure OpenAI endpoint returns 200 OK on every request. Latency is normal, P95 under 800ms. CPU and memory within thresholds. Kubernetes shows healthy pods, no restarts. By every infra metric you trust, the system is perfect. ...