Cicd | Ricardo Martins — Cloud Architecture, Azure, Kubernetes & AI

Sixth post in the series. In the previous one, we automated GPU cluster provisioning. Next comes what happens after the hardware is ready: how a model goes from “works on my notebook” to “running in production with an SLA.”

tl;dr

Models need the same artifact, promotion, and rollback discipline as application builds. Use a real registry with metadata and controlled deployments. Prefer MLflow aliases over deprecated stages when describing promotions.

The model with no birth certificate

A data scientist drops a message in the team channel with a link to a shared drive: “Here’s the model. It’s a 15 GB PyTorch checkpoint. We need it in production by Friday.” ...
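
To make the tl;dr's point about promotion and rollback concrete, here is a minimal sketch of alias-based promotion. It is a toy in-memory registry, not MLflow's API: ToyRegistry, the "champion" alias, and the metadata keys (trained_by, git_commit) are all hypothetical names chosen for illustration. In MLflow itself, the comparable move would be reassigning an alias on a registered model with MlflowClient.set_registered_model_alias.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelVersion:
    version: int
    sha256: str      # content hash of the artifact: its "birth certificate"
    metadata: dict   # free-form provenance (hypothetical keys below)


@dataclass
class ToyRegistry:
    """Hypothetical in-memory registry; illustrates the idea, not a real product."""
    versions: list = field(default_factory=list)
    aliases: dict = field(default_factory=dict)   # alias -> version number
    history: dict = field(default_factory=dict)   # alias -> earlier version numbers

    def register(self, artifact: bytes, **metadata) -> ModelVersion:
        # Versions are immutable and numbered from 1 in registration order.
        digest = hashlib.sha256(artifact).hexdigest()
        mv = ModelVersion(len(self.versions) + 1, digest, metadata)
        self.versions.append(mv)
        return mv

    def promote(self, alias: str, version: int) -> None:
        # Promotion only moves a pointer; the artifact itself never changes.
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown version {version}")
        if alias in self.aliases:
            self.history.setdefault(alias, []).append(self.aliases[alias])
        self.aliases[alias] = version

    def rollback(self, alias: str) -> int:
        # Rollback points the alias back at the version it held before.
        previous = self.history.get(alias, [])
        if not previous:
            raise RuntimeError(f"no earlier version recorded for alias {alias!r}")
        self.aliases[alias] = previous.pop()
        return self.aliases[alias]

    def resolve(self, alias: str) -> ModelVersion:
        return self.versions[self.aliases[alias] - 1]


registry = ToyRegistry()
v1 = registry.register(b"weights-v1", trained_by="alice", git_commit="abc123")
v2 = registry.register(b"weights-v2", trained_by="alice", git_commit="def456")

registry.promote("champion", v1.version)   # first production release
registry.promote("champion", v2.version)   # promote the newer version
registry.rollback("champion")              # v2 misbehaves: point back at v1
```

The design choice worth noticing is that serving code would only ever ask for the alias, so promotion and rollback become pointer moves over immutable, hashed versions rather than file copies from a shared drive.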