How to build an agentification factory that actually turns a profit

27–40 minute read

An intro to Dutch tokenomics

If you are responsible for running an agentification program, you will have learned the hard way how fragile your ROI is. Even though token prices dropped 99.7% over the last few years, the bills that the frontier AI labs send to our companies have tripled, because agents are a totally different beast from a chatbot.

I want you to sit with that for a moment before we discuss anything else. Think of the different types of cost you’ve run into at one point or another, and check later on whether I have missed anything. I value your experience; it will help me not to fail.


We are all working at the frontier of unsexy AI, where ROI has to be treated as a demi-god. I think the future belongs to controlled, cost-aware, governed agentic systems, and not to enterprises that blindly rent intelligence from the number one frontier lab at luxury-printer-ink prices.

Although the price per token has collapsed to nearly nothing over the years, the total invoice went up anyway, because no one was counting the right things. The CloudZero State of AI† research reported that nearly all agentic AI pilots generated no meaningful return. It found that average monthly enterprise AI spend jumped 36% between 2024 and 2025 alone, and that the share of companies spending over a hundred thousand dollars per month more than doubled in that same window.

Then there’s the EY Responsible AI Survey, conducted with 975 senior executives across 21 countries who were running agentic AI programs at companies with more than $1 billion in revenue. It paints a far more nuanced picture of enterprise AI. Most of the companies experienced some form of financial loss associated with AI deployment, linked to governance failures, flawed outputs, compliance issues, bias, and operational disruptions.

The study also stated that even though AI frequently improves productivity and efficiency, organizations struggle to convert those gains into measurable financial value. In many cases, efficiency improvements simply created capacity that was eaten up by additional post-factum governance work, rather than reducing costs or increasing revenue. The result is a growing gap between AI activity and AI value realization.

My conclusion is that the technology keeps getting cheaper, but the programs keep getting more expensive. At some point you have to conclude that the problem is not the price of tokens. That price is a huge factor contributing to the cost, but you have to manage your ROI in an integrated way.


Now that’s what I call Dutch Tokenomics.

Yes, I am Dutch.

And as you know, the Dutch love their chocolate sprinkle “hagelslag” sandwiches with lots of butter, we drink milk by the liter, therefore we’re all as tall as a tree, and I have a bicycle which I locked to three separate fixed objects because I also know what my countrymen do when they are drunk and the taxi fare feels unreasonable.

We also have a reputation.

The English language preserved it in amber. Going Dutch means that everyone pays their own share, precisely calculated, with no gesture of generosity that would require reciprocation. A Dutch treat is therefore not a treat. Dutch courage is borrowed confidence that dissolves in daylight, and a Dutch uncle is the relative who tells you the thing you do not want to hear with the warmth of a quarterly financial review. The national character, as perceived from the outside, is a people who looked at wealth and abundance and decided the correct response was portion control.


Yes, we’re mostly known for our frugal, measured lifestyle.

When a Dutch family invites you to dinner, you tell them in advance exactly how many people you are bringing. It isn’t because we are inhospitable; it’s simply because our portions are calculated. Even our bread arrives pre-sliced by machine, every slice identical and accounted for, because variation is waste, and waste is a moral failing that has been ingrained in our DNA for centuries. The leftovers, if any exist, are already someone’s lunch tomorrow.

And this is exactly the way I run our agentification factory.‡

I see the frontier model as the taxi. It is comfortable and fast, and it will absolutely get you there, but at scale it will bankrupt you. An open-weight model like QWEN 3.7 Max, which can run for 35 hours straight, running locally on AMD hardware, is my bicycle. It is slightly less comfortable, and it runs three months behind the frontier in intelligence terms, but for the Zone II processes that constitute the bulk of a profitable agentification program, a three-month intelligence gap is operationally invisible. The agent is routing invoices and extracting structured data; it does not have to win an Olympic gold medal. The bicycle arrives at the same destination, and even better, it costs a fraction of the fare. And you own the infrastructure, which means the bill is fixed and the data never leaves the building.

Triple whammy!

So, no, Dutch Tokenomics is not a poverty mindset.

In our program we apply an engineering-meets-accounting attitude to an entrenched cost structure that the industry has collectively decided to ignore, because the vendors benefit from our ignorance and the consultants bill by the complexity.

It’s all about being frugal where frugal is correct: reserving the expensive intelligence for the decisions that genuinely require it, measuring value at the trajectory level before you scale, calibrating governance to the actual risk rather than the imagined risk (don’t treat all agents the same), and running local inference where the volume justifies it. And the volume justifies it sooner than you think.

The Dutch did not build the coolest water management system in human history by spending freely and hoping for the best. They built it by understanding exactly what the water cost, exactly where it needed to go, and how much engineering was required to keep it from getting our feet wet. And if you look closely, there’s not one meter of dike more than necessary.

So, treat your agentic program like water management. The money flows in the wrong direction until someone builds the right dike to redirect it.

I wrote this piece to help you think like a Dutchie.

This article is about the six cost categories that kill agentic programs, and about what a factory that actually makes money looks like in practice. Some of it will be uncomfortable, but I will try to be funny about it, because if I am not, I will simply be angry.

Now go read and build that dike.

Links in the comments.

‡ This blog describes how to set up an agentification factory:


The real factory model no one puts on the slide

As you know, I run an agentification factory. I’ve deployed plenty of agentic systems across multiple processes in a few sectors, and I founded Eigenvector Research specifically because I got tired of watching enterprises burn money on AI programs that were structurally designed to fail before a single agent went live. I decided to take up the gauntlet because I’ve sat in the meetings, seen the business cases, and watched the governance overhead eat our savings alive, all while the program manager updated the RAG status to amber and secretly hoped no one would look too closely at the denominator.

An agentification factory, properly built, runs a couple of streams in parallel.


The Analysis and Improvement team takes business processes, dissects them, runs them through a triage methodology, and delivers them to the build team as clean BPMN packages with defined exception paths, bounded context windows, and measurable success criteria attached. The Hyperautomation team takes those packages and builds agents against them, using RPA where appropriate, agent frameworks where necessary, and a pattern library of proven agentic design patterns, and it maintains a harness architecture that has been stress-tested before the business ever sees it. These two streams are what most people think of when they think of an AI factory, and they are both necessary.

The third workstream is the one that determines whether the factory makes money. I call it the Value and Compliance stream, and it builds what I call the Evidence Factory.‡

The reason those two things live in the same room is the core insight of this entire article. Agents generate savings, that is the benefit side of the ledger, and governance, inference, integration, people, evaluation, and time sit on the cost side. Most factory designs treat these as separate concerns managed by separate teams, but that is a mistake with measurable financial consequences. The Roundtrip Value Governance research* I published earlier this year found that hidden costs erode between 55 and 65 percent of claimed FTE savings before they reach the accounts, through intervention cost, hidden work, governance overhead, and value degradation over time. More than half of what the business case promised is consumed by cost categories that were never mentioned in the business case at all. The Value and Compliance stream exists to make those costs visible, measurable, and controllable in real time rather than embarrassing at the quarterly review, and the Evidence Factory puts them in a layered architecture including tools. I’ve written about these streams at length, and if you want to know how to build them, read the blogs or download the white papers*.
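To see what that erosion does to a business case, here is a back-of-the-envelope sketch in Python. The 10 FTEs and the 80,000 per FTE per year are hypothetical inputs, not figures from the Roundtrip Value Governance research; only the 55 to 65 percent erosion range comes from the text above.

```python
def realized_savings(claimed_fte: float, cost_per_fte: float, erosion: float) -> float:
    """Annual savings that survive hidden costs (intervention cost, hidden work,
    governance overhead, value degradation).

    erosion is the fraction of the claimed savings those costs consume.
    """
    if not 0.0 <= erosion <= 1.0:
        raise ValueError("erosion must be a fraction between 0 and 1")
    claimed = claimed_fte * cost_per_fte
    return claimed * (1.0 - erosion)


# Hypothetical business case: 10 FTEs at 80,000 per FTE per year.
claimed_total = 10 * 80_000
for erosion in (0.55, 0.65):
    kept = realized_savings(10, 80_000, erosion)
    print(f"erosion {erosion:.0%}: {kept:,.0f} of {claimed_total:,.0f} reaches the accounts")
```

With these made-up inputs, an 800,000 business case lands somewhere between 280,000 and 360,000 in the accounts.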

With that architecture in mind, let me walk through the six ways factories bleed money. I have ranked them roughly by financial impact, and I have saved the most important one for last because that is how articles work and also because it is the one the vendors will never tell you about.

† Read:

‡ Read:

* Read:


More rants after the messages


Failure mode one | Picking the wrong processes

The fastest way to destroy an agentic program’s economics is to automate processes that should never have been automated in the first place. I see this constantly and it takes two distinct forms that are almost opposites of each other.


The easiest way to burn money is picking Zone I processes. In the PASF framework, which my students and I developed by following 177 agentic deployments* for nearly a year, Zone I covers processes that are fully deterministic, rule-based, and structurally simple. These are the processes that RPA has been automating for fifteen years, and a chatbot built with Copilot Studio handles them for a fraction of what a tool-calling agent costs. An agent with tool access, reflection loops, and a verification pass is orders of magnitude more expensive per interaction than a well-designed conversational flow.

So if the process needs a simple answer to a simple question, deploy RPA, or a chatbot built in Copilot Studio if you want to get sassy. When you decide to deploy an autonomous agent† there instead, you are not saving money on your process; you are funding it.

The second form is picking Zone III processes. These are the long-horizon, multi-step, complex judgment workflows that every ambitious CTO wants to automate first because they would be genuinely transformative if the technology could handle them reliably, and most of the execs have built their agentic business cases around them. Wake up, and smell the coffee. They cannot.


I asked the AI on ATLAS‡ to give me the summary of the research on this subject.

The research my ATLAS agent found is unambiguous on this point. Agents fail structurally beyond 4 to 5 sequential steps, with compound success probability collapsing to approximately 36% at 20 steps: 0.95 to the power of 20, assuming a generous 95% per-step success rate. In practice, per-step success rates in production are often lower, which makes the compound failure rate worse. At that 36% success rate, a 20-step workflow requires an average of 2.8 attempts (1 / 0.36) to complete successfully. Each failed attempt costs money. Each retry costs money. Each human escalation costs money. The “Long-Horizon Task Mirage” paper on arXiv (April 2026) confirms that as the task horizon grows, planning-related and memory-related failures become dominant. Those are architectural problems, not model intelligence problems, meaning you cannot fix them by upgrading to a more expensive model. I will return to that topic, by the way.
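The compounding arithmetic is easy to check yourself. This sketch assumes every step succeeds independently with the same probability, which is a simplification (real failures tend to cluster), but the shape of the curve is the point. The expected number of attempts follows from the geometric distribution: one divided by the success rate.

```python
def compound_success(per_step: float, steps: int) -> float:
    """Probability that every step of a sequential workflow succeeds,
    assuming independent steps with the same per-step success rate."""
    return per_step ** steps


def expected_attempts(success_rate: float) -> float:
    """Mean number of full attempts until the first success (geometric distribution)."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


for steps in (5, 12, 20):
    p = compound_success(0.95, steps)
    print(f"{steps:>2} steps: success {p:.1%}, expected attempts {expected_attempts(p):.2f}")
```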

Zone II is where factories actually make money. Processes in that category already have a human in the loop by design, a bounded scope, defined exception paths, and enough structure that the agent knows what done looks like. The HITL design is not a limitation, by the way; it is what makes the governance burden manageable and the reliability profile acceptable at the current maturity of agent technology. A clean Zone II candidate arrives at the hyperautomation shop as a BPMN document with annotated decision points, a defined context boundary, a list of tools the agent is permitted to invoke, and a success metric the finance team can verify.

From production experience across those 177 deployments, a well-selected Zone II candidate saves between one and two FTEs per process. The catch is that such candidates are not abundant: the triage filters hard and most candidates do not survive it. But each one that does delivers measurable savings, manageable governance overhead, and something more valuable than either: a track record, which creates the trust you need to venture into Zone III. A Zone II process that runs cleanly for three months is the organizational proof-of-concept you won’t see in a vendor demo or pilot. This is how you earn the internal capital to have the Zone III conversation without getting laughed out of the steering committee.


* Read the blogs I mentioned earlier if you want to know more about the 177 deployments.

There are droves of people with Dunning-Kruger characteristics who have looked at the brochures of Microsoft and other big AI-tech companies and are under the impression that a chatbot is an agent. When I talk about agentic AI, I mean the autonomous kind: the stuff without a UI. And yes, if you want to defend big tech, chatbots do share many characteristics of an agent, but the main difference is that a chatbot waits for you to command it, and an autonomous agent does not.

ATLAS is a research website dedicated to long-horizon agents, including 700+ papers, 300+ success patterns, a book (automatically updated each month), a desktop app (beta, good luck), an AI (you have to pay for the tokens yourself; I am Dutch, remember), benchmarks, Git repos, and much, much more. Link in the comments.


Failure mode two | Governance that costs more than it saves

Alas, my smart friend, governance is not optional for enterprise agentic systems. I am going to say that clearly before I spend several paragraphs explaining how governance kills agentic programs, because the people who need to hear this part often use the first part as an excuse to stop listening.


After reviewing the research on this subject and comparing it with our own experience, I must conclude that the numbers are quite alarming. Through ATLAS, I found 283 research sources on agentic governance, and after reading them it became clear that governance and compliance consume between 35 and 50 percent of total AI program cost in regulated enterprises, rising to between 50 and 70 percent in critical applications. A cost analysis of our own program found that 60 to 80 percent of the AI budget sits entirely outside inference. The EU AI Act compliance cost for a single high-risk AI system runs between €29,000 and €400,000, and those are pre-implementation estimates from 2021, made before anyone had actually tried to implement it. In the research my students and I did across those 177 deployments, post-factum governance overhead exceeded the projected savings in 30 percent of assessed cases.

That is a structural failure repeated across nearly a third of the program portfolio!

But the problem isn’t so much the governance itself. The real problem is that people apply uniform governance, without discrimination, across agents supporting processes with fundamentally different risk profiles. Zone I and Zone II processes do not require the same governance architecture as Zone III. Zone II has HITL baked in as its defining characteristic. A human operator is already in the loop, reviewing agent outputs, approving consequential decisions, catching errors before they propagate. Layering a full post-factum audit function on top of a process that already has mandatory human checkpoints is paying twice for the same oversight.

If people ask why you wouldn’t want to run the same governance policies across the board, my normal response is: “Have you ever worn both a belt and suspenders, and then hired a guy to stand behind you holding your trousers up?”

Yeah, thought so.

The answer is, of course, discriminated governance, calibrated to zone risk rather than to the uniform risk appetite of the legal department. The deeper answer is the OCG‡, the Ontological Compliance Gateway architecture we developed for Zone III deployments. The idea behind the OCG is that governance should live inside the agent’s decision layer rather than only around it as a post-factum audit. An agent that cannot make a non-compliant decision, because compliance is built into its reasoning, is fundamentally cheaper to govern than an agent that makes whatever decision it calculates is optimal and then gets audited afterward. The audit costs money and remediation costs even more, but building the fence before the cliff costs engineering time once. The math is not complicated, but most people prefer listening to vendors over participating in community practitioner initiatives such as ATLAS.

Arrogance and complacency simply cost money.

This is why the Value and Compliance stream combines these two concerns. Because governance that is too expensive destroys ROI just as effectively as agents that do not work. They are the same problem viewed from different sides of the ledger.
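A minimal sketch of the two governance ideas side by side, under loud assumptions: the zone-to-controls table is a hypothetical illustration of discriminated governance, and the `ComplianceGate` class only shows the general principle of filtering actions before execution. It is not the OCG itself, which is an ontology-based architecture; the rules, the payment limit, and the placeholder country code are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical controls per PASF zone (illustrative names, not a specification).
GOVERNANCE_BY_ZONE = {
    "I": ["rule_checks"],
    "II": ["rule_checks", "hitl_approval"],  # the human in the loop is the oversight
    "III": ["rule_checks", "hitl_approval", "inline_compliance_gate", "audit_log"],
}


@dataclass(frozen=True)
class Action:
    name: str
    amount: float
    counterparty_country: str


@dataclass
class ComplianceGate:
    """Removes non-compliant options before the agent chooses, instead of auditing afterwards."""
    rules: list[tuple[str, Callable[[Action], bool]]] = field(default_factory=list)

    def violations(self, action: Action) -> list[str]:
        return [name for name, is_ok in self.rules if not is_ok(action)]

    def allowed(self, candidates: list[Action]) -> list[Action]:
        return [a for a in candidates if not self.violations(a)]


gate = ComplianceGate(rules=[
    ("payment_limit", lambda a: a.amount <= 10_000),                # invented limit
    ("blocked_country", lambda a: a.counterparty_country != "XX"),  # placeholder code
])
candidates = [
    Action("pay_invoice", 4_200.0, "NL"),
    Action("pay_invoice", 25_000.0, "NL"),
    Action("pay_invoice", 900.0, "XX"),
]
print(GOVERNANCE_BY_ZONE["II"])
print([(a.amount, a.counterparty_country) for a in gate.allowed(candidates)])
```

The design choice is that the agent never gets to pick a non-compliant option, so there is nothing to remediate afterwards; the cost sits in writing and maintaining the rules once.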


† For much more on agentic governance, read:

‡ Read:


Failure mode three | Token gluttony

The Dutch have a cultural allergy to excess. It runs deeper than frugality, tracing back to Calvinist roots that treated conspicuous consumption as a moral failure rather than a social signal, and it is typically not considered appropriate to display wealth. Excess without purpose is not just eating more than your portion; it is taking someone else’s share, and in a country where the land itself was rationed from the sea, that distinction was always made clear.


And this brings us to the agentic cost multiplier, which you won’t find in the vendor demo. A single chat call to a frontier model is expensive, but it’s bounded. An agentic workflow involving 15 model calls, tool results passed back as context, a reflection loop, and a verification pass, on the other hand, can multiply the effective cost of that workflow by 10 to 50 times before governance and failure costs are added. Reasoning models make this worse: they generate internal thinking tokens that are not visible in the output but are absolutely visible on the invoice, making them 3 to 10 times more expensive than their headline price suggests. An agent running a reasoning model across a 20-step workflow can incur costs 30 to 100 times the per-call headline number. That is the number that should appear in the business case, but it almost never does, because the people writing the business cases haven’t spoken with the practitioners.

We found that our large agent workflows require three times more model calls than simple prompts, and that teams systematically underestimate the cost of retry and reflection loops. This tracks with production experience. The retry loop is the cost that surprised us in particular. A process with a 30 percent failure rate effectively multiplies your costs by 1.43 times before you have even thought about governance overhead. Stack that on top of the agentic cost multiplier and you are looking at costs that bear no relationship to the per-token price.
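Here is a rough cost model that stacks the agentic multiplier, hidden reasoning tokens, and the retry multiplier. All prices and token counts are hypothetical placeholders (3 per million input tokens, 15 per million output tokens, 6,000 input tokens per agentic call as context accumulates); plug in your own invoice data.

```python
def attempt_cost(calls: int, in_tokens: int, out_tokens: int,
                 price_in: float, price_out: float,
                 reasoning_factor: float = 1.0) -> float:
    """Cost of one workflow attempt; prices are per million tokens.

    reasoning_factor scales the billed output tokens to cover hidden
    thinking tokens (the text cites 3x to 10x for reasoning models).
    """
    billed_out = out_tokens * reasoning_factor
    per_call = (in_tokens * price_in + billed_out * price_out) / 1_000_000
    return calls * per_call


def cost_per_success(cost_of_attempt: float, failure_rate: float) -> float:
    """Failed attempts are paid for too, so divide by the success rate.
    A 30 percent failure rate gives the 1 / 0.7 = 1.43x retry multiplier."""
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return cost_of_attempt / (1.0 - failure_rate)


# One chat call versus a 15-call agentic workflow on a reasoning model (hypothetical numbers).
chat = attempt_cost(calls=1, in_tokens=2_000, out_tokens=500, price_in=3.0, price_out=15.0)
agent = attempt_cost(calls=15, in_tokens=6_000, out_tokens=500, price_in=3.0,
                     price_out=15.0, reasoning_factor=3.0)
per_success = cost_per_success(agent, failure_rate=0.30)
print(f"chat call: {chat:.4f}, agent attempt: {agent:.4f}, "
      f"agent per success: {per_success:.4f} ({per_success / chat:.0f}x the chat call)")
```

With these made-up inputs, one successful agentic run costs about 64 times a single chat call, which sits inside the 30 to 100 times range quoted above.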

And then there’s this thing called context rot which is a related problem that the research community started documenting seriously around 2025. When the agent workflow encounters context rot, you see the workflow’s performance degrade sharply at around 60 percent context fill – this is consistent across multiple sources – and most production agent failures trace to context mismanagement rather than to model intelligence gaps.

Context rot or context degradation can be overcome if you architect your memory in different layers.


Let me give you a concrete example of what context rot actually looks like in production, because the research description “performance degrades sharply at 60 percent context fill” is accurate but so abstract that it is easy to dismiss as someone else’s problem.

I run Hermes as my own agent framework, and the first version had the memory architecture of a goldfish. Out of the box, Hermes remembered little between sessions, which meant I spent half my time repeating myself to a system I’d built specifically to stop repeating myself. My first fix was a Karpathy-style LLM Wiki, a structured fact store the agent could query at the start of each session. It worked, briefly, the way a Post-it note on a monitor works.

Facts, yes, but connections between facts, um, no.

So I connected Obsidian, which is my personal knowledge graph, and it has thousands of notes, papers, project histories. This second brain is the accumulated intellectual residue of several years of research, and for a while Hermes was genuinely impressive. It could locate relevant context, connect ideas across documents, and operate with something resembling continuity.

Then the opposite problem appeared.

You have to know that Obsidian is built for humans. Humans navigate it with intuition: they skip irrelevant material, weight recent decisions over ancient ones, and know instinctively that a note about a project that no longer exists is archaeology, not an instruction. Hermes did not know that. It retrieved everything with equal enthusiasm, clogging up its entire working memory, so active tasks sat alongside abandoned experiments. As the context window filled, Hermes began losing the thread. It would retrieve irrelevant information, forget what it was supposed to be doing halfway through doing it, and sometimes abandon a task entirely, simply because the task had become statistically insignificant against the volume of everything else it was holding in memory. When your current objective represents 0.3 percent of your total context, you end up with an agent that may have an opinion about your goal, surrounded by a thousand louder opinions about everything else.

The fix wasn’t adding more memory, but rearchitecting it.

Hermes now runs multiple separate layers:

  1. Dragonfly holds the active agent state, acting as a real-time context engine and working memory.
  2. PostgreSQL manages episodic memory.
  3. Obsidian was retained but demoted to semantic memory only.
  4. Qdrant handles retrieval.
  5. Apache Jena carries ontological knowledge.

Each layer has a distinct cognitive function, goals live in one place, actions in another, knowledge in a third, and history in a fourth.
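A minimal sketch of the routing idea: every record type is owned by exactly one layer, so nothing ends up in a single undifferentiated pile. The in-memory lists stand in for Dragonfly, PostgreSQL, Obsidian, and Apache Jena; no real client libraries are called, the Qdrant retrieval index is left out, and the record kinds and routing table are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    WORKING = "working"    # active goal and task state (Dragonfly in Hermes)
    EPISODIC = "episodic"  # what happened, in order (PostgreSQL)
    SEMANTIC = "semantic"  # curated notes (Obsidian)
    ONTOLOGY = "ontology"  # concepts and relations (Apache Jena)


@dataclass(frozen=True)
class Record:
    kind: str
    text: str


# Hypothetical routing table: one cognitive function per layer.
ROUTES = {
    "goal": Layer.WORKING,
    "task_state": Layer.WORKING,
    "action": Layer.EPISODIC,
    "note": Layer.SEMANTIC,
    "relation": Layer.ONTOLOGY,
}


class LayeredMemory:
    """In-memory stand-in; a real system would call each store's own client."""

    def __init__(self) -> None:
        self.layers: dict[Layer, list[Record]] = defaultdict(list)

    def write(self, record: Record) -> Layer:
        layer = ROUTES.get(record.kind)
        if layer is None:
            raise ValueError(f"no layer owns records of kind {record.kind!r}")
        self.layers[layer].append(record)
        return layer

    def read(self, layer: Layer) -> list[Record]:
        return list(self.layers[layer])


memory = LayeredMemory()
memory.write(Record("goal", "route invoice batch"))
memory.write(Record("action", "looked up vendor in ERP"))
print({layer.value: len(memory.read(layer)) for layer in Layer})
```

Refusing unknown record kinds is deliberate: a record without an owner is exactly the kind of clutter that turns into context rot later.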

I must admit that this separation sounds bureaucratic, but when you consider the alternative (context degradation, a single undifferentiated mass where everything looks equally important and therefore nothing is truly relevant), it’s an investment that pays you back. Literally. The lever that changes the cost curve here is a context engine: a proper architectural layer that manages what enters and exits the agent’s context window at each step, enforces relevance boundaries, and handles memory externally rather than through accumulation.
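The context engine idea can be sketched as a budgeted packing step that runs before every model call. This is a simplified illustration, not how Hermes implements it: the 60 percent ceiling comes from the degradation threshold discussed above, while the `Item` fields, the relevance scores, and the greedy selection are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    text: str
    tokens: int
    relevance: float      # assumed retriever score in [0, 1]
    pinned: bool = False  # current goal and task state always go in


def assemble_context(items: list[Item], window: int, max_fill: float = 0.6) -> list[Item]:
    """Select what enters the context window for the next step.

    Pinned items go in first; the rest are added greedily by relevance
    until max_fill of the window is used. Everything else stays in
    external memory instead of accumulating in the prompt.
    """
    budget = int(window * max_fill)
    pinned = [i for i in items if i.pinned]
    if sum(i.tokens for i in pinned) > budget:
        raise ValueError("pinned state alone exceeds the context budget")
    others = sorted((i for i in items if not i.pinned),
                    key=lambda i: i.relevance, reverse=True)
    chosen: list[Item] = []
    used = 0
    for item in pinned + others:
        if used + item.tokens <= budget:
            chosen.append(item)
            used += item.tokens
    return chosen


items = [
    Item("goal: route invoice 4711", 500, 1.0, pinned=True),
    Item("vendor master record", 3_000, 0.9),
    Item("abandoned 2023 experiment notes", 4_000, 0.2),
    Item("routing policy", 2_000, 0.7),
]
print([i.text for i in assemble_context(items, window=10_000)])
```

The goal is pinned so it can never become the 0.3 percent that drowns; the abandoned experiment simply does not make the cut for this step.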

But context degradation is just one of the factors contributing to a loss of ROI. When you decide to dig through ATLAS and have a coin or two for the AI (I am not making any money on it), you will find lots of anti-patterns that will end up costing you money. Things like context pollution, bloated tool sets, poor memory handling, vague system prompts. The context window fills up with noise, the agent’s decision quality drops, errors compound, retries accumulate, the invoice goes up, and the program manager moves the status to red and starts looking at consulting firms to replace you, my smart curious friend.

Now, Tokenomics† addresses some of this, and Patternomics† reduces failure rates and therefore retry costs. Both are useful, but both are “penny-wise”. The lever that actually changes the cost curve is local inference.

More on local inference later.


† Eigenvector/research:

  • Tokenomics for agentic AI and quality-per-token-metrics (Eigenvector/research)
  • Patternomics: A formal theory of execution pattern optimization in enterprise agentic AI systems

Failure mode four | The cost categories that were always in your program but never in the business case

There are four of them and they are all real and documented and almost universally absent from the financial models that got approved by your steering committee.

Let’s kick things off with Value Degradation Over Time (VDOT), because why not an acronym. According to the Roundtrip Value Governance research you can find on ATLAS, about 5 to 15 percent of value erodes annually as your agents encounter environmental drift and novel cases they were not designed for. An agent that performed well in a controlled pilot will gradually perform worse as the world around it changes. When you do the expensive and time-consuming post-mortem, you’ll find that the model didn’t update and the process hadn’t changed, but when you dig deeper, you’ll find that the exceptions accumulated. Then the intervention rate of your agent governance team (your agentic process operators) creeps up. The FTE savings you proudly communicated in the steerco have simply evaporated, and yet the program status dashboard still shows green, because not a soul is measuring value at the trajectory level, only at the quarterly review level.
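Here is what 5 to 15 percent annual erosion does over a few years, assuming (our assumption, not a finding from the research) that it compounds and nobody retrains or redesigns the agent. The 160,000 starting value is hypothetical, roughly two FTEs of savings.

```python
def value_after(years: int, initial_value: float, annual_erosion: float) -> float:
    """Annual value remaining after `years` of compounding erosion, with no redesign."""
    return initial_value * (1.0 - annual_erosion) ** years


for erosion in (0.05, 0.15):
    remaining = [round(value_after(y, 160_000, erosion)) for y in range(4)]
    print(f"{erosion:.0%} erosion, years 0-3: {remaining}")
```

At the high end, the third year leaves roughly 98,000 of the original 160,000, while a dashboard that only repeats the business case still reports the original number.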


And that is why you need an evidence factory. Y’all feel me?

Adoption and change management is another nail in your coffin. I’ve put this number in the Value Office whitepaper you can download, and it flags 22 percent adoption as the point where a program is already failing, meaning fewer than one in four people is actually using the agent you built. The rest simply found workarounds, or they override the agent on instinct or escalate every borderline case to a human because that is easier than trusting a system they had no hand in designing and no training to operate. You built the agent, but you forgot to bring the people.

Evaluation infrastructure is another one, and the ATLAS research is unambiguous about this.

Every major agent benchmark, such as SWE-bench, WebArena, OSWorld, and GAIA (see ATLAS), has been shown to be exploitable. The gap between benchmark performance and production reliability is real and underestimated. You cannot know whether your agent is working by looking at its leaderboard score; you need evaluation infrastructure built against your own tasks, your own data, and your own success criteria. That costs engineering time to build, but not building it costs considerably more when the agent degrades for three months before anyone notices.
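A minimal evaluation harness against your own tasks might look like this. The golden cases, the toy agent, the exact-match check, and the 2-point regression tolerance are hypothetical; the point is to compare every model or harness change against a baseline measured on your own data, which also catches the silent model updates described in the next paragraphs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    task_id: str
    prompt: str
    expected: str


def pass_rate(agent: Callable[[str], str], cases: list[Case],
              check: Callable[[str, str], bool]) -> float:
    """Fraction of golden cases where the agent's output passes the check."""
    if not cases:
        raise ValueError("need at least one golden case")
    passed = sum(1 for case in cases if check(agent(case.prompt), case.expected))
    return passed / len(cases)


def regressed(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    """True when the pass rate dropped by more than `tolerance` versus the baseline."""
    return current < baseline - tolerance


# Hypothetical golden set for an invoice-routing agent.
golden = [
    Case("inv-1", "invoice from ACME", "finance"),
    Case("inv-2", "invoice from Globex", "finance"),
    Case("ctr-1", "contract renewal request", "legal"),
]


def toy_agent(text: str) -> str:
    # Stand-in for a real agent call.
    return "finance" if "invoice" in text else "procurement"


rate = pass_rate(toy_agent, golden, lambda got, want: got == want)
print(f"pass rate {rate:.2f}, regressed vs 0.90 baseline: {regressed(rate, 0.90)}")
```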

Model update risk is the last one for now.

Frontier API models change without warning, and a harness tuned against one model version can degrade measurably after an update your provider shipped without telling you. Legacy system integration, API-call costs per tool invocation, and vendor lock-in complete the picture. A GitClear analysis found that AI-generated code has a 41 percent higher churn rate than human-written code, and that code reviews for AI-heavy pull requests take 26 percent longer. One SaaS company documented gross margin compression from 78 to 52 percent in a single quarter after shipping AI features.


Failure mode five | Structural failure patterns

A bit about ATLAS.

I built this website as a free knowledge platform that automatically aggregates the latest research on long-horizon agentic AI, runs automated extraction of success and failure patterns from the literature, maintains a benchmark collection and a pattern library, integrates with GitHub, and publishes the ATLAS Book of Knowledge, a document that updates itself every month as new research is absorbed and new patterns are identified.

Oh yeah, there is now also a desktop app.


I built this thing initially for myself as a resource because I wanted the actual state of agentic AI rather than another annoying vendor whitepaper written by someone who has never deployed anything in production.

The pattern library in ATLAS shows a consistent distinction between tuning failures and structural failures. Tuning failures, such as wrong prompts, missing context, and poor tool design, are fixable with a bit of engineering elbow grease. But structural failures, such as context ceilings, planning breakdown, memory drift, compounding errors, and goal misalignment, are architectural problems that scaling to a better model does not solve.

Many teams assume that adding more agents automatically produces better results. The evidence suggests otherwise. Across multiple studies, multi-agent systems often delivered only small improvements in accuracy while introducing significantly more complexity, infrastructure, and operational cost. Some benchmark studies even found that multi-agent architectures substantially increased token consumption, coordination overhead, and execution time, while producing only marginal gains over well-designed single-agent systems.


The reason is simple. Every additional agent creates new work whereby those agents must exchange information, maintain shared context, coordinate decisions, resolve disagreements, and recover from communication failures. And as the number of agents grows, the cost of coordination grows as well and the additional intelligence gained from a second, third, or fourth agent is smaller than the additional cost required to manage them. This creates a point where adding more agents makes the system more expensive without making it meaningfully better.
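A toy model makes that break-even point visible. Every number here is invented: we assume each extra agent adds half the accuracy gain of the previous one, and that every pair of agents needs a communication channel with a fixed coordination cost. Only the shape matters: gains shrink geometrically while channels grow quadratically.

```python
def net_value(n_agents: int, first_gain: float = 10.0, decay: float = 0.5,
              cost_per_channel: float = 1.5) -> float:
    """Toy model of a multi-agent system (all parameters are invented).

    Accuracy gain: each extra agent adds `decay` times the previous agent's gain.
    Coordination cost: one channel per pair of agents, n * (n - 1) / 2 in total.
    """
    gain = sum(first_gain * decay ** k for k in range(n_agents))
    channels = n_agents * (n_agents - 1) // 2
    return gain - cost_per_channel * channels


for n in range(1, 7):
    print(n, net_value(n))
best = max(range(1, 7), key=net_value)
print("best team size under these assumptions:", best)
```

Under these invented numbers the second agent still pays for itself, the third already does not, and by the sixth the system is worth less than a single agent.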

That’s the reason I wrote the Patternomics paper. The economy of agentic patterns.

The most successful architectures found in the ATLAS research follow a different pattern. They operate within a clearly bounded scope that limits what each agent is allowed to do, they maintain explicit state at every step so the system always knows where it is in a workflow, they include verification loops that check whether outputs are correct before moving forward, and they are designed to degrade gracefully when problems occur. Most importantly, their failures are visible and recoverable: when something goes wrong, the system exposes the problem and allows it to be corrected instead of propagating the error through the workflow.
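Those ingredients map naturally onto a small state machine. This sketch is an illustration, not an ATLAS pattern verbatim: the invoice fields, the `extract` callable, and the retry limit are hypothetical. Scope is bounded to one extraction task, state is explicit, every output is verified before the workflow moves on, and when retries run out the failure is surfaced to a human instead of being passed downstream.

```python
from enum import Enum, auto
from typing import Callable

REQUIRED_FIELDS = ("vendor", "amount")  # hypothetical success criterion


class State(Enum):
    EXTRACT = auto()
    VERIFY = auto()
    DONE = auto()
    ESCALATED = auto()


def run_invoice_step(doc: dict, extract: Callable[[dict], dict],
                     max_retries: int = 2) -> tuple[State, dict, list[str]]:
    """Run one bounded task with explicit state, verification, and graceful degradation."""
    state, result, log = State.EXTRACT, {}, []
    attempts = 0
    while state not in (State.DONE, State.ESCALATED):
        if state is State.EXTRACT:
            result = extract(doc)
            attempts += 1
            log.append(f"extract attempt {attempts}")
            state = State.VERIFY
        else:  # State.VERIFY: check the output before moving forward
            missing = [f for f in REQUIRED_FIELDS if not result.get(f)]
            if not missing:
                state = State.DONE
            elif attempts <= max_retries:
                log.append(f"verify failed, missing {missing}; retrying")
                state = State.EXTRACT
            else:
                # Visible, recoverable failure: hand over instead of propagating.
                log.append(f"verify failed, missing {missing}; escalating to human")
                state = State.ESCALATED
    return state, result, log


state, result, log = run_invoice_step({"id": "4711"}, lambda doc: {"vendor": "ACME"})
print(state, log[-1])
```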

These engineering choices appear to contribute more to real-world reliability than simply increasing the number of agents involved.


Failure mode six | The frontier model addiction

This is the one that decides whether your program is in the red or in the black. I have saved it for last because it is also the most fixable, BUT the fix is uncomfortable enough that I want you to have absorbed the context before I make the argument.


The Epoch AI Capabilities Index, published in October 2025, reported that open-weight models lag behind frontier models by an average of only three months. Yes, only three months. In 2024, that lag was measured at between five and twenty-two months, and the gap has closed by more than a year in less than a year. The SWE-bench Verified leaderboard as of May 2026 shows DeepSeek V3.2 resolving 70 percent of real-world GitHub issues, Kimi K2.5 at 70.8 percent, and MiniMax M2.5 at 75.8 percent. These are open-weight models running against the most practically relevant agentic benchmark available, and they are competitive with, or superior to, many closed frontier models. Claude 4.5 Opus leads at 79.2 percent on the same benchmark, so yes, the best frontier model is still the best. But for your Zone III long-horizon moonshot that is already breaking structurally at step 12, a lead of 3.4 to 9.2 percentage points is cold comfort.

The price difference is not subtle either. GPT-5.5 costs about $30 per million output tokens. Groq (the chip, not Lone Skum’s plaything), running Llama 3.1 8B, costs $0.08 per million output tokens.

That is a 375x difference.

For a 20-step agentic workflow, comparing the cost per successful completion at frontier pricing with the same workflow on open-weight local inference makes the business case on its own.
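Here is a rough way to do that comparison, as a sketch under stated assumptions: only output tokens are counted, 2,000 tokens per step is a made-up figure, steps succeed independently with no recovery (so end-to-end completion is p to the power of the step count), and the open-weight model is deliberately handicapped with a lower per-step success rate (0.97 versus 0.99, both hypothetical). The two prices are the ones quoted above.

```python
def cost_per_attempt(steps: int, tokens_per_step: int, usd_per_million: float) -> float:
    """Token cost of running every step once (output tokens only, by assumption)."""
    return steps * tokens_per_step * usd_per_million / 1_000_000


def cost_per_success(steps: int, tokens_per_step: int, usd_per_million: float,
                     step_success: float) -> float:
    """Attempt cost divided by the end-to-end completion rate.

    Assumes independent steps and no recovery, so completion = p ** steps.
    """
    completion_rate = step_success ** steps
    return cost_per_attempt(steps, tokens_per_step, usd_per_million) / completion_rate


# Hypothetical inputs: 2,000 output tokens per step; the open-weight model is
# assumed to be less reliable per step (0.97 vs 0.99).
frontier = cost_per_success(20, 2_000, 30.00, step_success=0.99)
open_weight = cost_per_success(20, 2_000, 0.08, step_success=0.97)
print(f"frontier ${frontier:.4f}, open-weight ${open_weight:.4f}, "
      f"ratio {frontier / open_weight:.0f}x")
```

Even after penalizing the cheaper model for failing more often, the ratio under these assumptions lands around 250x. Input tokens, which often dominate in agentic loops, would need to be added for a real estimate.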

And it can get much, much cheaper. Take AMD: ROCm, AMD’s answer to NVIDIA’s CUDA, has matured substantially. Ollama and LM Studio (for home use) now support AMD GPUs out of the box; the installer detects the card and uses it, and generation speeds on a Radeon 7900 XTX running a quantized 35B model are genuinely competitive for production inference. The Radeon PRO W7900 offers 48GB of VRAM, enough to run large models entirely in GPU memory without the speed penalty of offloading layers to system RAM. And then there’s the AMD Ryzen AI Max+, a.k.a. Strix Halo, built specifically for local agentic AI rigs. The most interesting piece is not so much the CPU as its 128GB of unified memory: just as on Apple silicon, the CPU, the GPU, and the AI accelerator (an NPU) all share the same memory pool. This way the GPU can address giant models like Qwen3 235B without traditional VRAM limits.
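Whether a model "fits" is mostly arithmetic: parameters times bits per weight, divided by eight, plus headroom. A back-of-the-envelope sketch, assuming a flat 1.2x overhead factor for KV cache and runtime buffers (my assumption; real usage depends on context length, batch size, and the inference engine). The Radeon 7900 XTX has 24GB of VRAM and the W7900 has 48GB.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Approximate memory for the weights, in GB (1 GB = 1e9 bytes).

    The 1.2 overhead factor for KV cache and runtime buffers is an assumption;
    real usage depends on context length, batch size and the inference engine.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


def fits_in_memory(params_billion: float, bits_per_weight: float,
                   memory_gb: float, overhead: float = 1.2) -> bool:
    """True if the estimated footprint fits in the given memory budget."""
    return weight_memory_gb(params_billion, bits_per_weight, overhead) <= memory_gb


# 35B at 4-bit on a 24 GB Radeon 7900 XTX, 70B at 4-bit on a 48 GB W7900.
print(fits_in_memory(35, 4, 24), fits_in_memory(70, 4, 48))
```

By the same arithmetic, a 235B-parameter model needs roughly 3-bit quantization (about 106GB with the assumed overhead) to fit in a 128GB unified pool, and in practice not all of that pool is available to the GPU.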


OK, these are still consumer-grade machines, but to burst your bubble: for agentic AI you don’t need 400B+ models, let alone 1.6 trillion parameters! This is why many organizations building agents for process automation are switching to local AMD hardware, or even to clustered Apple M3 Ultras (running MLX models with exo), to run their agentic workflows.

At scale, the cost profile of on-premise AMD hardware versus frontier API pricing is night and day. I put the break-even for self-hosting a 70B model on H100 hardware at approximately 500 million to one billion tokens per day. On smaller AMD hardware running a smaller model appropriate for Zone II workflows, that threshold drops considerably.
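If you want to run that break-even for your own setup, the core formula is simple: daily fixed cost divided by the per-token saving. A minimal sketch; the function names are mine and every input in the usage lines is a placeholder, not a quote or a benchmark. It ignores utilization limits (the hardware must actually be able to serve the volume) and staff time beyond what you fold into daily opex.

```python
def daily_fixed_cost(hardware_usd: float, amortization_days: int,
                     daily_opex_usd: float) -> float:
    """Hardware amortized per day plus power, hosting and operations."""
    return hardware_usd / amortization_days + daily_opex_usd


def break_even_tokens_per_day(fixed_usd_per_day: float, api_usd_per_million: float,
                              local_usd_per_million: float = 0.0) -> float:
    """Daily token volume above which self-hosting beats paying per token."""
    saving_per_token = (api_usd_per_million - local_usd_per_million) / 1_000_000
    if saving_per_token <= 0:
        return float("inf")  # self-hosting never pays off at these prices
    return fixed_usd_per_day / saving_per_token


# Placeholder inputs: replace with your own hardware quote, opex and API prices.
fixed = daily_fixed_cost(hardware_usd=36_500, amortization_days=3 * 365, daily_opex_usd=20)
print(f"{break_even_tokens_per_day(fixed, api_usd_per_million=2.0):,.0f} tokens/day")
```

The comparison price matters as much as the hardware: breaking even against a cheap hosted open-weight API takes far more volume than breaking even against frontier pricing.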

For Zone II processes – bounded, HITL, structured, with exception paths defined in advance – the three-month intelligence gap between an open-weight model and the frontier is operationally irrelevant. The agent does not need to write poetry or solve novel reasoning problems. It only needs to route an invoice, flag a compliance exception, extract structured data from a document, or escalate the cases it cannot handle. A well-governed open-weight model running locally does all of that for a fraction of the frontier API cost, with no data leaving your infrastructure, and with fixed costs that scale predictably with hardware rather than unpredictably with usage volume.

This is Dutch Tokenomics as an operating principle.

My advice: be frugal, run locally where volume justifies it, reserve expensive frontier intelligence for the decisions that actually require it, and be honest with yourself about how few of your Zone II workflows actually need it. According to the deep research synthesis, model routing (directing each task to the cheapest model capable of handling it, and reserving frontier APIs for genuinely complex decisions) reduces total inference cost by 60 to 80 percent.
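The routing rule itself is almost trivial once each task carries a required capability level. A minimal sketch under stated assumptions: the model names, the integer "tier" scale, and the mid-tier price are hypothetical placeholders (the $0.08 and $30 figures echo the prices quoted earlier). The hard part in practice is deciding the required tier per task, which this sketch takes as given.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Model:
    name: str
    tier: int               # capability level, higher = more capable (hypothetical scale)
    usd_per_million: float  # price per million tokens


# Hypothetical catalogue; names, tiers and the mid-tier price are placeholders.
MODELS: List[Model] = [
    Model("local-open-weight-small", tier=1, usd_per_million=0.08),
    Model("local-open-weight-large", tier=2, usd_per_million=0.60),
    Model("frontier-api", tier=3, usd_per_million=30.0),
]


def route(required_tier: int, models: List[Model] = MODELS) -> Model:
    """Cheapest model whose tier meets the task's requirement."""
    capable = [m for m in models if m.tier >= required_tier]
    if not capable:
        raise ValueError(f"no model reaches tier {required_tier}")
    return min(capable, key=lambda m: m.usd_per_million)


def blended_cost(workload: Dict[int, float], models: List[Model] = MODELS) -> float:
    """USD cost of a workload given as {required tier: million tokens}."""
    return sum(route(tier, models).usd_per_million * mtok
               for tier, mtok in workload.items())


# Hypothetical mix: most volume is simple Zone II work, little needs the frontier.
mix = {1: 700, 2: 200, 3: 100}
print(blended_cost(mix), route(3).usd_per_million * sum(mix.values()))
```

The second printed number is what the same volume would cost if everything went to the frontier API; the savings you actually get depend entirely on your real task mix.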

That is the lever that turns a program in the red into a program in the black.


What a profitable factory actually looks like

The Analysis and Improvement stream triages every process candidate against the PASF zone model before anything gets built. Zone I goes to a chatbot, and you train end users to automate it themselves. Zone III goes on the waiting list until long-horizon agent reliability reaches production maturity, probably in eighteen months, possibly longer.


Zone II goes to the hyperautomation shop as a clean BPMN package with bounded context, defined exception paths, a success metric attached, a HITL at the end, and a governance specification calibrated to its actual risk level rather than to the legal department’s maximum anxiety.
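The hand-off can be enforced as a simple completeness check. This is a sketch of my own, not the PASF specification: it assumes the zone has already been assigned, and the field names and the risk-level scale are hypothetical. It simply refuses to pass a Zone II candidate to the shop until every element of the package listed above is present.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Zone(Enum):
    I = 1
    II = 2
    III = 3


DESTINATION = {
    Zone.I: "chatbot + end-user enablement",
    Zone.II: "hyperautomation shop",
    Zone.III: "waiting list",
}


@dataclass
class ZoneIIPackage:
    bpmn_model: str              # reference to the BPMN process model
    context_sources: List[str]   # bounded context: the only inputs allowed
    exception_paths: List[str]   # defined in advance
    success_metric: str
    hitl_checkpoint: str         # where the human signs off
    risk_level: str              # drives the governance spec (hypothetical scale)

    def missing_fields(self) -> List[str]:
        """Names of package elements that are empty."""
        return [name for name, value in vars(self).items() if not value]


def triage(zone: Zone, package: Optional[ZoneIIPackage] = None) -> str:
    """Route a candidate by zone; incomplete Zone II packages go back."""
    if zone is not Zone.II:
        return DESTINATION[zone]
    if package is None or package.missing_fields():
        return "back to Analysis and Improvement"
    return DESTINATION[Zone.II]


pkg = ZoneIIPackage("invoice-approval.bpmn", ["ERP invoice record"],
                    ["amount mismatch", "unknown supplier"],
                    "straight-through rate", "final approval", "medium")
print(triage(Zone.II, pkg), "|", triage(Zone.III))
```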

The Hyperautomation Shop builds against proven patterns from the ATLAS library, with a context engine managing what enters and exits the agent’s context window at each step, a model routing layer that sends Zone II work to locally hosted open-weight inference by default, and a harness architecture that has been stress-tested before the business ever sees it.
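What a context engine does at each step can be sketched as budgeted selection: pinned items (task instructions, current workflow state) always go in, everything else is admitted by priority until the token budget runs out, and whatever is not selected exits the window. This is a hypothetical illustration, not the shop’s actual engine; the names are mine and the four-characters-per-token estimate is a crude stand-in for the model’s real tokenizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContextItem:
    key: str
    text: str
    priority: int          # lower number = more important
    pinned: bool = False   # e.g. task instructions, current workflow state


def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 characters per token); use the model's tokenizer in practice.
    return max(1, len(text) // 4)


def build_context(items: List[ContextItem], budget_tokens: int) -> List[ContextItem]:
    """Pinned items first, then by priority, while the budget allows.

    Pinned items are always included, even if they alone exceed the budget.
    """
    chosen, used = [], 0
    for item in sorted(items, key=lambda i: (not i.pinned, i.priority)):
        cost = estimate_tokens(item.text)
        if item.pinned or used + cost <= budget_tokens:
            chosen.append(item)
            used += cost
    return chosen


items = [
    ContextItem("state", "s" * 40, priority=0, pinned=True),  # ~10 tokens
    ContextItem("policy", "p" * 80, priority=1),              # ~20 tokens
    ContextItem("history", "h" * 400, priority=2),            # ~100 tokens
    ContextItem("faq", "f" * 20, priority=3),                 # ~5 tokens
]
print([i.key for i in build_context(items, budget_tokens=40)])
```

Note the design choice: a lower-priority item that fits is still admitted after a larger one was skipped, so the long history is dropped while the short FAQ entry gets in.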

The Value and Compliance stream runs discriminated governance calibrated to zone, with OCG handling policy enforcement inside the agent’s decision layer and human process operators focused on genuine exceptions rather than reviewing every step of a workflow that was already designed with HITL checkpoints. The Value Office monitors Trajectory Efficiency† in real time. Above 1.0, the agent is generating more value than it costs to operate – scale it. Below 1.0, the agent is destroying value – redesign or decommission it, and make that decision based on data rather than on the program manager’s attachment to the use case they championed in last year’s roadmap presentation.
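The decision rule reads naturally as a ratio of value to cost. A minimal sketch, assuming Trajectory Efficiency is total attributed value divided by total operating cost across a set of runs (an assumption consistent with the 1.0 threshold above; the formal definition lives in the referenced paper). How value is attributed to a run, and which costs are counted, are the hard parts, and both are placeholders here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    value_usd: float  # business value attributed to this run (attribution is assumed)
    cost_usd: float   # tokens, infrastructure, human review time


def trajectory_efficiency(runs: List[Trajectory]) -> float:
    """Total value divided by total cost over a set of runs."""
    total_cost = sum(r.cost_usd for r in runs)
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return sum(r.value_usd for r in runs) / total_cost


def decide(te: float) -> str:
    """Apply the scale / redesign rule at the 1.0 threshold."""
    if te > 1.0:
        return "scale"
    if te < 1.0:
        return "redesign or decommission"
    return "break-even: keep monitoring"


# Hypothetical week: one run created $12 of value, one failed run created none.
week = [Trajectory(12.0, 4.0), Trajectory(0.0, 4.0)]
te = trajectory_efficiency(week)
print(te, decide(te))
```

Measuring over a window of runs rather than a single one keeps a few failed trajectories from triggering a decommission on their own, while still letting the data, not the roadmap, make the call.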


This model is frugal by design, governed by necessity, and measurably profitable at the trajectory level before enterprise-wide rollout is approved. It does not require the smartest model. It simply requires the right process, a solid agentic harness with the right discriminatory governance architecture, and the discipline to run local inference where the economics support it.

And yes, token prices will keep falling, but enterprise AI bills will keep rising for the organizations that are not counting the right things. The gap between those two curves is your design choice.

Signing off,

Marco

Read the Value Office whitepaper on Eigenvector/whitepaper, or the paper Roundtrip Value Governance on Eigenvector/research.


Eigenvector builds Agentification factories at scale, for production environments that actually have to pay off, and Eigenvector Research occasionally publishes papers about why this is harder than the demos suggest.

👉 Think a friend would enjoy this too? Share the newsletter and let them join the conversation. LinkedIn, Google, and the AI engines reward your likes by making my articles available to more readers.
