Monitoring and observability for AI: when the green dashboard lies

Seventh post in the series. In the previous one, we put models into production with CI/CD pipelines. Now: how do you know they’re actually healthy?

The silent failure

Your Azure OpenAI endpoint returns 200 OK on every request. Latency is normal, P95 under 800ms. CPU and memory within thresholds. Kubernetes shows healthy pods, no restarts. By every infra metric you trust, the system is perfect. But the support tickets keep coming. Users report the chatbot “gives worse answers.” Fluent but factually incorrect responses. Hallucinations are up, summarizations miss key points, code suggestions introduce subtle bugs. ...

June 3, 2026 · 5 min · Ricardo Martins

Introduction to AI and Comparing OpenAI with Azure OpenAI

As I begin learning about artificial intelligence (AI), I am discovering large language models (LLMs) and the ways they are applied across different technologies. In this article, I share what I have learned so far with others who are also starting out in AI. We will look at OpenAI, one of the leading organizations in AI research and development, and compare its offerings with Microsoft’s Azure OpenAI service. ...

May 10, 2024 · 3 min · Ricardo Martins