Three years ago, the question I heard most was: “what’s the best prompt?”

Two years ago, it shifted to: “how do I do RAG?”

Last year: “how do I build an agent?”

This year, the conversation is different. People are asking how to transform an entire organization to operate with agents. Not a chatbot on the website. Dozens of agents embedded in business processes, with governance, observability, and granular permissions.

This evolution tells a story. And I think few people have stopped to connect the dots.

The timeline

  • 2023: Prompt Engineering (how to talk to the model)
  • 2024: Context Engineering (what the model can see)
  • 2024: RAG + Tool Calling (give it memory and action)
  • 2025: Agent Engineering (decision autonomy)
  • 2026: Harness Engineering (production reliability)
  • 2026: Multi-Agent Systems (agent coordination)
  • 2026+: Frontier Company (entire org operates with agents)

Each step solved a real problem the previous one didn’t cover. Prompt engineering taught us how to talk to the model. Context engineering taught us how to feed it the right information. RAG and tool calling gave it memory and the ability to act. Agents gave it autonomy. And then we realized that autonomy without control is chaos in production.

The model became a commodity

I’ll be direct: the model is no longer the competitive differentiator.

In 2023, having access to GPT-4 was a real advantage. Today there’s GPT-5, Claude, Gemini, Llama, DeepSeek, Mistral, Qwen. All excellent. All capable of writing code, interpreting images, calling tools, solving complex problems.

Are there still differences between them? Yes. But the gap between the best and the fifth best has shrunk so much that it rarely determines a project’s success.

Think of it this way: two companies using the exact same model. The first connects that model to their CRM, ERP, monitoring, internal docs, pipelines, security policies, and business workflows. The second opens a chat window.

Same model. Completely different results.

The value was never just in the brain. It was always in the system around it.

Harness engineering: the name of the game now

A formula that’s been showing up a lot this year captures it well:

Agent = Model + Harness

The model is the “brain.” The harness is everything that turns that brain into an agent that works in production. An analogy I like: think of an F1 driver. The driver is the LLM. The car, radio, telemetry, pit crew, tire strategy, and race regulations are the harness. Put the best driver in a bad car and he loses the race.

In practice, the harness of a corporate agent includes:

| Component | What it does | Azure service |
|---|---|---|
| System prompts | Defines personality and constraints | Azure AI Foundry |
| Tools (MCP/APIs) | Gives the ability to act | Azure Functions, Logic Apps |
| RAG | Retrieves relevant knowledge | Azure AI Search |
| Memory | Maintains state across sessions | Cosmos DB |
| Permissions | Controls who accesses what | Microsoft Entra ID |
| Human approval | Critical decisions go through people | Logic Apps, Service Bus |
| Evaluation | Measures response quality | Azure AI Foundry evals |
| Observability | Logs, traces, metrics | Azure Monitor, App Insights |
| Guardrails | Prevents unwanted behavior | Azure AI Content Safety |
| Orchestration | Coordinates multiple agents | Azure Container Apps, AKS |

Notice the pattern? Almost nothing on this list is AI. It’s software engineering. Infrastructure. The kind of thing we’ve been doing for years, applied to a new context.
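To make "Agent = Model + Harness" a bit more tangible, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Harness` class, the `fake_model` stub, the `vm_count` tool, and the in-memory stand-ins for permissions, guardrails, and tracing are invented names, not part of any Azure SDK. The point is only that most of the lines are plain software engineering wrapped around a single model call.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types for illustration; not taken from any SDK.
Model = Callable[[str], str]  # prompt in, text out
Tool = Callable[[str], str]   # argument in, result out


@dataclass
class Harness:
    system_prompt: str
    tools: dict[str, Tool]
    allowed_users: set[str]               # stand-in for an identity provider
    blocked_terms: tuple[str, ...] = ()   # stand-in for a content-safety service
    trace: list[str] = field(default_factory=list)  # stand-in for observability

    def run(self, model: Model, user: str, question: str, tool_name: str) -> str:
        # Permissions: refuse before any model or tool call happens.
        if user not in self.allowed_users:
            self.trace.append(f"denied user={user}")
            return "Access denied."
        # Tools: fetch data the model cannot know on its own.
        tool_output = self.tools[tool_name](question)
        self.trace.append(f"tool={tool_name}")
        # Model: one call, fed the system prompt plus the tool result.
        prompt = f"{self.system_prompt}\nData: {tool_output}\nQuestion: {question}"
        answer = model(prompt)
        self.trace.append("model_called")
        # Guardrails: check the output before it reaches the user.
        if any(term in answer.lower() for term in self.blocked_terms):
            self.trace.append("blocked_by_guardrail")
            return "Response withheld by policy."
        return answer


def fake_model(prompt: str) -> str:
    # Stub standing in for a real LLM call: echoes the data line it was given.
    data_line = next(line for line in prompt.splitlines() if line.startswith("Data: "))
    return f"Based on the data: {data_line.removeprefix('Data: ')}"


harness = Harness(
    system_prompt="You answer infrastructure questions.",
    tools={"vm_count": lambda _question: "12 VMs running"},
    allowed_users={"alice"},
    blocked_terms=("password",),
)

print(harness.run(fake_model, "alice", "How many VMs are running?", "vm_count"))
print(harness.run(fake_model, "mallory", "How many VMs are running?", "vm_count"))
```

Swapping `fake_model` for a different model changes one argument. Swapping the harness changes everything else.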

A concrete example

Imagine an agent that answers questions about your Azure infrastructure. Someone asks:

“Which VM has been consuming the most CPU in the last 24 hours?”

The model alone doesn’t know the answer. It has no access to your environment.

With a well-built harness, the actual flow is:

  1. The agent identifies the intent (VM metrics query)
  2. Validates whether the user has permission to access this data (Entra ID)
  3. Queries Azure Resource Graph to list VMs
  4. Calls Azure Monitor to pull CPU metrics
  5. Compares results and identifies the worst case
  6. Builds the response with real data
  7. Optionally suggests action (resize, alert, runbook)

This entire flow has nothing to do with “which model to use.” It’s system design. Integrations. Permissions. Observability.

In code, the agent’s call to Azure Resource Graph would look something like:

# Requires the Resource Graph extension: az extension add --name resource-graph
az graph query -q "
  Resources
  | where type =~ 'microsoft.compute/virtualmachines'
  | project name, resourceGroup, location, vmSize = properties.hardwareProfile.vmSize
" --output table

And the CPU metrics would come from Monitor:

# Hourly average CPU for one VM over a 24-hour window; {sub}, {rg}, and {vm} are placeholders
az monitor metrics list \
  --resource "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Compute/virtualMachines/{vm}" \
  --metric "Percentage CPU" \
  --aggregation Average \
  --interval PT1H \
  --start-time 2026-07-02T00:00:00Z \
  --end-time 2026-07-03T00:00:00Z \
  --output table

The model only comes in to interpret the question, decide the sequence of calls, and format the final response. The harness does the heavy lifting.
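The seven-step flow above can also be sketched end to end in Python. This is a rough illustration under stated assumptions, not how a production agent would be wired: `list_vms` and `avg_cpu_last_24h` are hypothetical stubs returning canned data in place of the Resource Graph and Monitor calls, the `metrics-reader` role is invented, and the 80% threshold is arbitrary.

```python
from typing import Callable


# Hypothetical stubs: they stand in for the Resource Graph and Monitor
# calls and return canned data so the sketch runs offline.
def list_vms() -> list[str]:
    return ["vm-web-01", "vm-db-01", "vm-batch-01"]


def avg_cpu_last_24h(vm: str) -> float:
    canned = {"vm-web-01": 41.5, "vm-db-01": 87.2, "vm-batch-01": 63.0}
    return canned[vm]


def busiest_vm(
    user_roles: set[str],
    list_fn: Callable[[], list[str]] = list_vms,
    cpu_fn: Callable[[str], float] = avg_cpu_last_24h,
) -> str:
    # Step 2: permission check (role name invented for this sketch).
    if "metrics-reader" not in user_roles:
        raise PermissionError("user may not read VM metrics")
    # Steps 3-4: list VMs, then pull one CPU figure per VM.
    cpu_by_vm = {vm: cpu_fn(vm) for vm in list_fn()}
    if not cpu_by_vm:
        return "No VMs found."
    # Step 5: compare results and pick the worst case.
    worst = max(cpu_by_vm, key=cpu_by_vm.get)
    # Step 6: build the answer from real numbers; a model would only rephrase it.
    answer = f"{worst} averaged {cpu_by_vm[worst]:.1f}% CPU over the last 24 hours."
    # Step 7: optional suggestion above an arbitrary threshold.
    if cpu_by_vm[worst] > 80:
        answer += " Consider resizing or adding an alert."
    return answer


print(busiest_vm({"metrics-reader"}))
```

Step 1 (intent detection) is the only part left out, because it is the only part that genuinely needs the model.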

Context engineering: the agent’s operating system

Context is not just “dump documents into the prompt.”

Context engineering is deciding:

  • What the agent can see (and what it can’t)
  • When it can see it (temporal context)
  • How long it retains information (short-term vs long-term memory)
  • In what format it receives data (structured vs unstructured)
  • How much space each piece takes in the token budget

I wrote about this in detail in the context engineering post. The short version: by my rough estimate, well-crafted context is 80% of the result. A mediocre model with excellent context often beats a top model with poor context.
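The token budget decision is the easiest of these to show in code. The sketch below is one possible approach, not a standard algorithm: it greedily keeps the highest-priority items that fit. The `ContextItem` type, the priorities, and the "about 4 characters per token" estimate are all assumptions for illustration; a real tokenizer will give different counts.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    label: str
    text: str
    priority: int  # lower number = more important


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); a real tokenizer will differ.
    return max(1, len(text) // 4)


def pack_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Greedily keep the highest-priority items that fit in the token budget."""
    kept: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            kept.append(item)
            used += cost
    return kept


items = [
    ContextItem("system_prompt", "You answer infrastructure questions.", 0),
    ContextItem("old_chat_history", "x" * 4000, 3),
    ContextItem("retrieved_runbook", "y" * 800, 1),
    ContextItem("user_profile", "z" * 200, 2),
]
selected = pack_context(items, budget=300)
print([item.label for item in selected])
# → ['system_prompt', 'retrieved_runbook', 'user_profile']
```

The long chat history is dropped because it costs more than the whole budget, which is exactly the kind of decision that should be explicit rather than left to truncation.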

MCP: the USB-C of agents

Another harness component that exploded in 2026: the Model Context Protocol. I’ve written about it in earlier posts, but in the context of this post the point is simple.

Before MCP, every tool needed a custom integration. Now there’s a standardized protocol that lets any compatible agent talk to any system exposed through it. That’s what allowed harness engineering to scale. Instead of building one-off integrations, you expose your systems as MCP servers and any MCP-capable agent can consume them.
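To show what "standardized" means here, this sketch mimics the shape of two MCP messages, `tools/list` and `tools/call`, which travel as JSON-RPC 2.0 requests. It is deliberately simplified and is not a compliant server: it runs in-process, skips the initialization handshake, transport, and input validation, and the `get_vm_cpu` tool with its canned answer is hypothetical. For real work you would use an MCP SDK.

```python
import json
from typing import Any

# Hypothetical in-process registry: one tool with a JSON Schema for its input.
TOOLS: dict[str, dict[str, Any]] = {
    "get_vm_cpu": {
        "description": "Return average CPU percentage for a VM (canned data).",
        "inputSchema": {
            "type": "object",
            "properties": {"vm": {"type": "string"}},
            "required": ["vm"],
        },
        "handler": lambda args: f"{args['vm']}: 87.2% average CPU",
    }
}


def handle(request: dict[str, Any]) -> dict[str, Any]:
    """Answer a JSON-RPC 2.0 request for tools/list or tools/call."""
    method, req_id = request.get("method"), request.get("id")
    if method == "tools/list":
        # Discovery: the client learns tool names and input schemas.
        tools = [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]
        return {"jsonrpc": "2.0", "id": req_id, "result": {"tools": tools}}
    if method == "tools/call":
        # Invocation: look up the tool by name and pass it the arguments.
        params = request.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return {"jsonrpc": "2.0", "id": req_id,
                    "error": {"code": -32602, "message": "Unknown tool"}}
        text = tool["handler"](params.get("arguments", {}))
        return {"jsonrpc": "2.0", "id": req_id,
                "result": {"content": [{"type": "text", "text": text}], "isError": False}}
    return {"jsonrpc": "2.0", "id": req_id,
            "error": {"code": -32601, "message": "Method not found"}}


call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
        "params": {"name": "get_vm_cpu", "arguments": {"vm": "vm-db-01"}}}
print(json.dumps(handle(call), indent=2))
```

Because discovery and invocation have the same shape for every tool, the agent side needs no tool-specific integration code.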

Frontier company: when the harness becomes the business

Microsoft started using the term “frontier company” to describe something beyond “company that uses AI.”

A frontier company is not a company where some employees use Copilot. It’s a company where:

  • Agents are part of business processes
  • Humans supervise and decide; agents research and execute
  • There’s real governance over what agents can do
  • Productivity is measured at the organizational level
  • Data and systems are connected in a way agents can navigate

The difference seems subtle. But it changes the entire business architecture.
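One way to picture "humans supervise and decide" with "real governance" is an allow-list per agent plus an approval gate for risky actions. The sketch below is only an illustration under invented assumptions: the `POLICY` table, the agent names, the two risk levels, and the `approve` callback (which in practice might be a ticket, a chat prompt, or a workflow step) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Risk(Enum):
    LOW = 1
    HIGH = 2


@dataclass(frozen=True)
class Action:
    name: str
    risk: Risk


# Hypothetical policy: which agent may perform which actions.
POLICY: dict[str, set[str]] = {
    "infra-agent": {"read_metrics", "resize_vm"},
    "support-agent": {"read_metrics"},
}


def execute(agent: str, action: Action, approve: Callable[[str, Action], bool]) -> str:
    # Governance: the action must be on the agent's allow-list.
    if action.name not in POLICY.get(agent, set()):
        return "rejected: not permitted"
    # Human supervision: high-risk actions wait for a person's decision.
    if action.risk is Risk.HIGH and not approve(agent, action):
        return "rejected: approval denied"
    return f"executed: {action.name}"


def always_deny(agent: str, action: Action) -> bool:
    # Stand-in for a human reviewer who says no.
    return False


print(execute("infra-agent", Action("read_metrics", Risk.LOW), always_deny))
print(execute("infra-agent", Action("resize_vm", Risk.HIGH), always_deny))
print(execute("support-agent", Action("resize_vm", Risk.HIGH), always_deny))
```

Low-risk reads go straight through, while the resize is stopped either by the reviewer or by the policy, depending on which agent asks.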

The parallel with what we’ve already lived through

In the 2000s, we learned to build for physical servers. Then virtualization. Then cloud. Then containers. Then Kubernetes.

Each of those changes seemed to be “just a new technology.” In reality, each one completely changed how we think about software. Cloud wasn’t “run a VM somewhere else.” It was a new mental model for how to build, operate, and scale systems.

I have the feeling we’re living through the same thing again. Applications whose behavior is defined by agents already exist in production. And they require different practices, different architectures, different ways of testing and monitoring.

If the previous era was “cloud native,” maybe this one is “agent native.”

The real differentiator

A few years from now, I doubt anyone will ask which model your company uses. The same way nobody today asks which hypervisor runs your environment or which web server delivers your site.

The differentiator will be elsewhere:

  • In the quality of context your agents receive
  • In the engineering of the harness (security, observability, governance)
  • In the integration between agents and people
  • In the ability to turn artificial intelligence into organizational intelligence

Because companies don’t compete on models. They compete on the ability to use knowledge to make better decisions, faster.

And the system that enables that has a name: harness.


This post is also available in Portuguese. If you want to dive deeper into the technical concepts mentioned here, check out the posts on context engineering, RAG, and MCP.