Monitoring and observability for AI: when the green dashboard lies

Seventh post in the series. In the previous one, we put models into production with CI/CD pipelines. Now: how do you know they’re actually healthy?

The silent failure

Your Azure OpenAI endpoint returns 200 OK on every request. Latency is normal, P95 under 800 ms. CPU and memory within thresholds. Kubernetes shows healthy pods, no restarts. By every infra metric you trust, the system is perfect.

But the support tickets keep coming. Users report the chatbot “gives worse answers.” Fluent but factually incorrect responses. Hallucinations are up, summarizations miss key points, code suggestions introduce subtle bugs. ...

June 3, 2026 · 5 min · Ricardo Martins