Cost engineering for AI: when idle GPUs cost more than your car

Ninth post in the series. In the previous one, we hardened the platform against prompt injection and data leakage. Now: how not to go bankrupt in the process. The $127,000 Monday. Monday morning, coffee in hand, and an email from Finance with the subject line “URGENT: Azure invoice $127,000, please explain.” The forecast was $42,000. Two ND96isr_H100_v5 VMs, provisioned three weeks ago for a “quick experiment,” were never shut down. At ~$98/hour each, running 24/7, they burned about $33,000 per week in idle GPU compute. Nobody was using them. Nobody remembered they existed. ...

June 11, 2026 · 6 min · Ricardo Martins
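A quick worked check of the idle-GPU arithmetic in the $127,000 Monday excerpt, using only the figures it quotes: two ND96isr_H100_v5 VMs at roughly $98/hour each, billed 24/7. The ~$98 rate is the post's approximation, not a live Azure price, and the helper name `idle_cost` is hypothetical.

```python
# Worked check of the idle-GPU cost figures quoted in the excerpt.
HOURLY_RATE_USD = 98.0   # approximate per-VM rate quoted in the post (not a live price)
VM_COUNT = 2             # two ND96isr_H100_v5 VMs left running
HOURS_PER_WEEK = 24 * 7  # billed around the clock


def idle_cost(rate_per_hour: float, vm_count: int, hours: float) -> float:
    """Cost of vm_count VMs billed at rate_per_hour for the given wall-clock hours."""
    return rate_per_hour * vm_count * hours


per_week = idle_cost(HOURLY_RATE_USD, VM_COUNT, HOURS_PER_WEEK)
three_weeks = idle_cost(HOURLY_RATE_USD, VM_COUNT, 3 * HOURS_PER_WEEK)

print(f"per week:    ${per_week:,.0f}")
print(f"three weeks: ${three_weeks:,.0f}")
```

At the quoted rate, two forgotten VMs cost about $33,000 for each week they sit idle, so three weeks of idle time adds up to roughly $99,000.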

Monitoring and observability for AI: when the green dashboard lies

Seventh post in the series. In the previous one, we put models into production with CI/CD pipelines. Now: how do you know they’re actually healthy? The silent failure. Your Azure OpenAI endpoint returns 200 OK on every request. Latency is normal, with P95 under 800 ms. CPU and memory are within thresholds. Kubernetes shows healthy pods and no restarts. By every infrastructure metric you trust, the system is perfect. But the support tickets keep coming. Users report that the chatbot “gives worse answers”: responses that are fluent but factually incorrect. Hallucinations are up, summaries miss key points, and code suggestions introduce subtle bugs. ...

June 3, 2026 · 5 min · Ricardo Martins
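A minimal sketch of the gap the monitoring excerpt describes: a check built only on infrastructure signals (HTTP status, P95 latency) passes, while a check on a quality signal fails for the same traffic. The record fields, the `quality_score` metric, and the 0.8 threshold are hypothetical; the excerpt does not say which quality metric the post uses.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestRecord:
    status: int           # HTTP status returned by the endpoint
    latency_ms: float     # end-to-end latency in milliseconds
    quality_score: float  # hypothetical 0-1 score from an evaluator or user feedback


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def infra_healthy(records: list[RequestRecord], p95_budget_ms: float = 800.0) -> bool:
    """The 'green dashboard' view: every call returned 200 and P95 latency is in budget."""
    return all(r.status == 200 for r in records) and p95(
        [r.latency_ms for r in records]
    ) < p95_budget_ms


def quality_healthy(records: list[RequestRecord], min_mean_score: float = 0.8) -> bool:
    """Hypothetical quality gate: mean answer quality must stay above a threshold."""
    return mean(r.quality_score for r in records) >= min_mean_score


# Synthetic traffic: fast, successful responses that are nonetheless poor answers.
records = [RequestRecord(200, 400 + i * 10, 0.6) for i in range(20)]
print("infra healthy:  ", infra_healthy(records))
print("quality healthy:", quality_healthy(records))
```

Both checks see identical requests; only the one that measures answer quality notices that anything is wrong, which is the point the post makes about trusting infrastructure metrics alone.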

Introduction to AI and Comparing OpenAI with Azure OpenAI

As I embark on my journey of learning about artificial intelligence (AI), I am discovering the fascinating world of large language models (LLMs) and their applications in various technologies. In this article, I aim to share my newfound knowledge and insights with others who are also beginning their journey in AI. We will explore OpenAI, one of the leading organizations in AI research and development, and compare its offerings with Microsoft’s Azure OpenAI service. ...

May 10, 2024 · 3 min · Ricardo Martins