Azure OpenAI in production: tokens, throughput, and high availability

Eleventh post in the series. In the previous one, we built the self-service AI platform with multi-tenancy and scheduling. Now: the service everyone wants to consume, Azure OpenAI, and how to operate it without getting 429’d in the face.

The 429 that changed everything

Your team launched an internal GPT-4o chatbot on Monday. Day 1: smooth sailing, demos for leadership, Slack full of praise. Day 3: “the bot is slow.” Day 5: 30% of requests return HTTP 429. You open Azure Monitor and discover you’re hitting the 80K TPM ceiling. ...

June 19, 2026 · 5 min · Ricardo Martins