Three years ago, the question I heard most was: “what’s the best prompt?”
Two years ago, it shifted to: “how do I do RAG?”
Last year: “how do I build an agent?”
This year, the conversation is different. People are asking how to transform an entire organization to operate with agents. Not a chatbot on the website. Dozens of agents embedded in business processes, with governance, observability, granular permissions.
This evolution tells a story. And I think few people have stopped to connect the dots.
The timeline
Each step solved a real problem the previous one didn’t cover. Prompt engineering taught us how to talk to the model. Context engineering taught us how to feed it the right information. RAG and tool calling gave it memory and the ability to act. Agents gave it autonomy. And then we realized that autonomy without control is chaos in production.
The model became a commodity
I’ll be direct: the model is no longer the competitive differentiator.
In 2023, having access to GPT-4 was a real advantage. Today there are GPT-5, Claude, Gemini, Llama, DeepSeek, Mistral, and Qwen. All excellent. All capable of writing code, interpreting images, calling tools, solving complex problems.
Are there still differences between them? Yes. But the gap between the best and the fifth best has shrunk so much that it rarely determines a project’s success.
Think of it this way: two companies using the exact same model. The first connects that model to their CRM, ERP, monitoring, internal docs, pipelines, security policies, and business workflows. The second opens a chat window.
Same model. Completely different results.
The value was never just in the brain. It was always in the system around it.
Harness engineering: the name of the game now
A formula that’s been showing up a lot this year captures it well:
Agent = Model + Harness
The model is the “brain.” The harness is everything that turns that brain into an agent that works in production. An analogy I like: think of an F1 driver. The driver is the LLM. The car, radio, telemetry, pit crew, tire strategy, and race regulations are the harness. Put the best driver in a bad car and he loses the race.
In practice, the harness of a corporate agent includes:
| Component | What it does | Azure service |
|---|---|---|
| System prompts | Define personality and constraints | Azure AI Foundry |
| Tools (MCP/APIs) | Give the agent the ability to act | Azure Functions, Logic Apps |
| RAG | Retrieves relevant knowledge | Azure AI Search |
| Memory | Maintains state across sessions | Cosmos DB |
| Permissions | Control who accesses what | Microsoft Entra ID |
| Human approval | Critical decisions go through people | Logic Apps, Service Bus |
| Evaluation | Measures response quality | Azure AI Foundry evals |
| Observability | Logs, traces, metrics | Azure Monitor, App Insights |
| Guardrails | Prevent unwanted behavior | Azure AI Content Safety |
| Orchestration | Coordinates multiple agents | Azure Container Apps, AKS |
Notice the pattern? Almost nothing on this list is AI. It’s software engineering. Infrastructure. The kind of thing we’ve been doing for years, applied to a new context.
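To make "Agent = Model + Harness" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Harness` class, the `stub_model` function standing in for an LLM call, and the `lookup_order` tool are hypothetical names, not part of any Azure SDK. The point is only that the permission check, the trace log, and the tool dispatch are ordinary code.

```python
from dataclasses import dataclass, field
from typing import Callable

# A "model" here is any callable that maps a question to a tool request.
# In a real system this would be an LLM call; the stub below is a stand-in.
Model = Callable[[str], dict]
Tool = Callable[..., str]


@dataclass
class Harness:
    """Everything around the model: tools, permissions, and a trace log."""
    tools: dict[str, Tool]
    allowed_tools: dict[str, set[str]]  # user -> tool names they may trigger
    trace: list[str] = field(default_factory=list)

    def run(self, model: Model, user: str, question: str) -> str:
        request = model(question)  # the model decides what to do
        name, args = request["tool"], request.get("args", {})
        self.trace.append(f"{user} requested {name}")  # observability
        if name not in self.tools:  # guardrail: unknown tool
            return f"refused: unknown tool {name}"
        if name not in self.allowed_tools.get(user, set()):  # permissions
            self.trace.append(f"denied {name} for {user}")
            return f"refused: {user} may not call {name}"
        return self.tools[name](**args)  # the harness acts


def stub_model(question: str) -> dict:
    """Hypothetical stand-in for an LLM that always picks the same tool."""
    return {"tool": "lookup_order", "args": {"order_id": "A-17"}}


harness = Harness(
    tools={"lookup_order": lambda order_id: f"order {order_id}: shipped"},
    allowed_tools={"alice": {"lookup_order"}},
)
print(harness.run(stub_model, "alice", "Where is my order?"))
print(harness.run(stub_model, "bob", "Where is my order?"))
```

Swapping `stub_model` for a different model changes nothing else in the sketch, which is the argument of this section in miniature.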
A concrete example
Imagine an agent that answers questions about your Azure infrastructure. Someone asks:
“Which VM has been consuming the most CPU in the last 24 hours?”
The model alone doesn’t know the answer. It has no access to your environment.
With a well-built harness, the actual flow is:
- The agent identifies the intent (VM metrics query)
- Validates whether the user has permission to access this data (Entra ID)
- Queries Azure Resource Graph to list VMs
- Calls Azure Monitor to pull CPU metrics
- Compares results and identifies the worst case
- Builds the response with real data
- Optionally suggests action (resize, alert, runbook)
This entire flow has nothing to do with “which model to use.” It’s system design. Integrations. Permissions. Observability.
In code, the agent’s call to Azure Resource Graph would look something like:
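```shell
# `az graph` ships in the resource-graph extension:
#   az extension add --name resource-graph
# `id` is projected because the metrics call needs the full resource ID.
az graph query -q "
Resources
| where type == 'microsoft.compute/virtualmachines'
| project id, name, resourceGroup, location, vmSize = properties.hardwareProfile.vmSize
" --output table
```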
And the CPU metrics would come from Monitor:
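```shell
# {sub}, {rg}, and {vm} are placeholders; substitute values from the query above.
# The time window below is an example 24-hour range.
az monitor metrics list \
  --resource "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Compute/virtualMachines/{vm}" \
  --metric "Percentage CPU" \
  --aggregation Average \
  --interval PT1H \
  --start-time 2026-07-02T00:00:00Z \
  --end-time 2026-07-03T00:00:00Z \
  --output table
```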
The model only comes in to decide the sequence of calls and format the final response. The harness does the heavy lifting.
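The flow described above can be sketched end to end in Python. This is a simplified illustration under stated assumptions, not a production implementation: the three callables (`can_read_metrics`, `list_vm_ids`, `avg_cpu_last_24h`) are hypothetical hooks where the Entra ID check, the Resource Graph query, and the Monitor call would plug in, and the data below is made up.

```python
from typing import Callable


def busiest_vm(
    user: str,
    can_read_metrics: Callable[[str], bool],     # hypothetical permission hook
    list_vm_ids: Callable[[], list[str]],        # hypothetical inventory hook
    avg_cpu_last_24h: Callable[[str], float],    # hypothetical metrics hook
) -> str:
    """Answer 'which VM used the most CPU?' using injected integrations."""
    # Check permissions before touching any data.
    if not can_read_metrics(user):
        return "Sorry, you do not have access to VM metrics."

    # List the VMs in scope.
    vm_ids = list_vm_ids()
    if not vm_ids:
        return "No virtual machines found."

    # Pull CPU metrics for each VM.
    cpu_by_vm = {vm_id: avg_cpu_last_24h(vm_id) for vm_id in vm_ids}

    # Compare results and pick the worst case.
    worst = max(cpu_by_vm, key=cpu_by_vm.get)

    # Build the response from real (here: stubbed) data.
    return f"{worst} averaged {cpu_by_vm[worst]:.1f}% CPU over the last 24 hours."


# Made-up data standing in for Resource Graph and Monitor results.
fake_cpu = {"vm-web-01": 41.5, "vm-db-01": 87.3, "vm-batch-01": 12.0}

answer = busiest_vm(
    user="alice",
    can_read_metrics=lambda u: u == "alice",
    list_vm_ids=lambda: list(fake_cpu),
    avg_cpu_last_24h=fake_cpu.__getitem__,
)
print(answer)
```

Passing the integrations in as functions keeps the decision logic testable without an Azure subscription, and it makes the permission check impossible to skip by accident.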
Context engineering: the agent’s operating system
Context is not just “dump documents into the prompt.”
Context engineering is deciding:
- What the agent can see (and what it can’t)
- When it can see it (temporal context)
- How long it retains information (short-term vs long-term memory)
- In what format it receives data (structured vs unstructured)
- How much space each piece takes in the token budget
I wrote about this in detail in the context engineering post. The short version, as a rule of thumb from my own projects rather than a measured figure: well-crafted context is about 80% of the result. A mediocre model with excellent context beats a top model with poor context.
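The token budget decision in particular is easy to show in code. The sketch below is one possible approach, assuming a simple priority order and a crude word-count estimate in place of a real tokenizer; `ContextItem`, `estimate_tokens`, and `build_context` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    text: str
    priority: int  # lower number = more important


def estimate_tokens(text: str) -> int:
    """Crude approximation: one token per whitespace-separated word.

    Real tokenizers count differently; this is an assumption for the sketch.
    """
    return len(text.split())


def build_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Greedily keep the most important items that still fit in the budget."""
    chosen: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            chosen.append(item)
            used += cost
    return chosen


items = [
    ContextItem("system_rules", "Answer only about Azure infrastructure.", 0),
    ContextItem("retrieved_doc", "word " * 50, 2),
    ContextItem("recent_chat", "User asked about VM CPU yesterday.", 1),
]
selected = build_context(items, budget=20)
print([item.name for item in selected])
```

With a budget of 20, the rules and the recent chat fit and the 50-word document is left out. A real harness would more likely truncate or summarize the document than drop it, but the trade-off is the same: every item competes for the same space.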
MCP: the USB-C of agents
Another harness component that exploded in 2026: the Model Context Protocol. I’ve written about it in earlier posts, but in the context of this post the point is simple.
Before MCP, every tool needed a custom integration. Now there’s a standardized protocol that connects any model to any system. That’s what allowed harness engineering to scale. Instead of building one-off integrations, you expose your systems as MCP servers and any agent can consume them.
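To show what "standardized" means here: MCP messages are JSON-RPC 2.0, and tools are discovered and invoked through the `tools/list` and `tools/call` methods. The sketch below handles those two message shapes in-process, in Python. It is not a real MCP server: there is no transport, no initialization handshake, and no error handling for unknown tools, and the `get_vm_cpu` tool and its stubbed value are hypothetical.

```python
import json

# One hypothetical tool, described the way tools/list reports it.
TOOLS = {
    "get_vm_cpu": {
        "description": "Return average CPU percentage for a VM (hypothetical tool).",
        "inputSchema": {
            "type": "object",
            "properties": {"vm": {"type": "string"}},
            "required": ["vm"],
        },
        "handler": lambda args: f"{args['vm']}: 42% CPU (stubbed value)",
    }
}


def handle(message: str) -> str:
    """Answer a single JSON-RPC request for tools/list or tools/call."""
    req = json.loads(message)
    if req["method"] == "tools/list":
        result = {
            "tools": [
                {"name": name, "description": tool["description"],
                 "inputSchema": tool["inputSchema"]}
                for name, tool in TOOLS.items()
            ]
        }
    elif req["method"] == "tools/call":
        tool = TOOLS[req["params"]["name"]]
        text = tool["handler"](req["params"].get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        # -32601 is the JSON-RPC 2.0 code for "Method not found".
        return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                           "error": {"code": -32601, "message": "Method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
        "params": {"name": "get_vm_cpu", "arguments": {"vm": "vm-db-01"}}}
reply = json.loads(handle(json.dumps(call)))
print(reply["result"]["content"][0]["text"])
```

Because the request shape is the same for every tool, an agent that can send `tools/call` to one server can send it to any other, which is what removes the one-off integration work.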
Frontier company: when the harness becomes the business
Microsoft started using the term “Frontier Firm” (I’ll say “frontier company” here) to describe something beyond “company that uses AI.”
A frontier company is not a company where some employees use Copilot. It’s a company where:
- Agents are part of business processes
- Humans supervise and decide, agents research and execute
- There’s real governance over what agents can do
- Productivity is measured at the organizational level
- Data and systems are connected in a way agents can navigate
The difference seems subtle. But it changes the entire business architecture.
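"Humans supervise and decide, agents research and execute" can be expressed as a small policy check. The Python sketch below is an illustration under assumptions: `Policy`, `Decision`, `decide`, and the action names are all hypothetical, and a real deployment would back the approval step with a workflow system rather than a function argument.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    EXECUTE = "execute"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class Policy:
    allowed: frozenset      # actions the agent may perform at all
    needs_human: frozenset  # subset that requires a named human approver


def decide(policy: Policy, action: str, approved_by: Optional[str] = None) -> Decision:
    """Return what the harness should do with an action the agent proposes."""
    if action not in policy.allowed:
        return Decision.DENY
    if action in policy.needs_human and approved_by is None:
        return Decision.NEEDS_APPROVAL
    return Decision.EXECUTE


policy = Policy(
    allowed=frozenset({"read_metrics", "resize_vm"}),
    needs_human=frozenset({"resize_vm"}),
)

print(decide(policy, "read_metrics"))
print(decide(policy, "resize_vm"))
print(decide(policy, "resize_vm", approved_by="ops-lead"))
print(decide(policy, "delete_vm"))
```

Reading metrics runs immediately, resizing a VM waits for a person, and deleting one is never on the table. Governance in this sense is a data structure the organization owns, not a property of the model.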
The parallel with what we’ve already lived through
In the 2000s, we learned to build for physical servers. Then virtualization. Then cloud. Then containers. Then Kubernetes.
Each of those changes seemed to be “just a new technology.” In reality, each one completely changed how we think about software. Cloud wasn’t “run a VM somewhere else.” It was a new mental model for how to build, operate, and scale systems.
I have the feeling we’re living through the same thing again. Applications whose behavior is defined by agents already exist in production. And they require different practices, different architectures, different ways of testing and monitoring.
If the previous era was “cloud native,” maybe this one is “agent native.”
The real differentiator
A few years from now, I doubt anyone will ask which model your company uses. The same way nobody today asks which hypervisor runs your environment or which web server delivers your site.
The differentiator will be elsewhere:
- In the quality of context your agents receive
- In the engineering of the harness (security, observability, governance)
- In the integration between agents and people
- In the ability to turn artificial intelligence into organizational intelligence
Because companies don’t compete on models. They compete on the ability to use knowledge to make better decisions, faster.
And the system that enables that has a name: harness.
This post is also available in Portuguese. If you want to dive deeper into the technical concepts mentioned here, check out the posts on context engineering, RAG, and MCP.