Platform ops: building a self-service AI platform

Tenth post in the series. In the previous one, we controlled costs with Spot VMs, right-sizing, and FinOps. Now: how to stop being a human help desk for GPU requests.

The Slack channel that ate your calendar

Six months ago, you provisioned a single GPU VM for the ML team. Configured drivers, mounted storage, closed the ticket. Felt like any other infrastructure request. Today, you have four teams, three AKS clusters, dozens of GPU node pools, and a growing collection of Azure OpenAI endpoints. Each team wants its own resources, its own quotas, and its own SLAs. Your DMs have turned into a help desk: “Can we get more GPUs?” “Why is my training job Pending?” “Who’s using all the A100s?” ...

June 15, 2026 · 7 min · Ricardo Martins

Deploying an Application on OpenShift Local: A Beginner's Guide

Introduction

OpenShift, developed by Red Hat, extends Kubernetes into a complete application platform for deploying and managing containerized applications. It integrates the core features of Kubernetes with additional tools and services that enhance developer productivity and operational efficiency. This guide introduces beginners to deploying applications on OpenShift Local, a streamlined way to run an OpenShift cluster locally for development and testing. Using a local OpenShift environment offers several benefits, especially for developers who are new to OpenShift or Kubernetes: ...

December 8, 2023 · 4 min · Ricardo Martins

DevSecOps Workshop

Just sharing an awesome learning resource I found recently. It introduces the application development cycle using OpenShift’s tooling and features, with a special focus on securing your environment with Advanced Cluster Security for Kubernetes (ACS). You will also get a brief introduction to several OpenShift features, such as OpenShift Pipelines, OpenShift GitOps, and OpenShift Dev Spaces. Check it out at https://devsecops-workshop.github.io/

December 7, 2023 · 1 min · Ricardo Martins