Why we’re open-sourcing operational expertise for AI agents

Frontier models can pass the bar exam, write production code, and summarize a contract in seconds. Put one behind a real operations desk — a freight brokerage at six in the morning, a plant scheduler reconciling a supplier who just slipped a delivery, a compliance team trying to close the week — and it stalls. Not because it cannot reason. Because it does not know how the work is actually done.
That knowledge has never lived in software. It lives in the operator who has run the desk for fifteen years and knows which carrier to call first when a load falls through, which exception is worth escalating and which one to quietly fix, and what “done” means on a Friday afternoon. None of it is written down. So when you hand a model the job, you are asking it to improvise a craft it has never been taught.
We think that gap is the single biggest reason AI has not yet shown up as real operational leverage in the industries that run the physical economy. So we are doing something about it, in the open.
The gap is not intelligence. It is expertise.
The last two years of model progress have been astonishing and slightly misleading. Benchmarks reward general reasoning — math, code, exams — and models have raced up them. But operations is not a reasoning problem in the abstract. It is a thousand specific judgments, each obvious to the person who makes it and invisible to everyone else.
A capable model dropped into a workflow it has never seen will be confidently wrong in ways that are expensive. It will resolve the ticket the policy says not to resolve. It will book the carrier that is cheaper today and unreliable next week. It will produce a document that looks right and fails audit. The model is not the problem. The missing layer is the codified expertise that tells a generalist how this particular kind of work is done well.
What we are releasing
We are open-sourcing a library of operational capabilities: structured, tested representations of how expert operators handle real workflows across logistics, industrials, energy, and adjacent sectors. Each capability captures the decisions, the order of operations, the edge cases, and the definition of a good outcome for a specific slice of work — exception handling on a transport desk, quality documentation on a line, compliance reporting across sites.
These are not prompts. They are the durable, reviewable units of know-how that turn a general model into something that can hold a desk. They are versioned, they are inspectable, and they are built from working with the people who do the job today.
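To make the idea of a versioned, inspectable unit concrete, here is a minimal sketch in Python. It is an illustration only, not the library's actual schema: the `Capability` class, its field names, and the example content are all hypothetical, chosen to mirror the elements listed above (decisions and order of operations, edge cases, and a definition of a good outcome).

```python
from dataclasses import dataclass, field


@dataclass
class Capability:
    """Hypothetical shape of one codified unit of operational know-how."""
    name: str
    version: str
    steps: list[str]                                            # order of operations
    edge_cases: dict[str, str] = field(default_factory=dict)    # condition -> expert handling
    done_when: str = ""                                         # definition of a good outcome

    def render(self) -> str:
        """Return a plain-text view that a reviewer can read and correct."""
        lines = [f"{self.name} (v{self.version})", "Steps:"]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.steps, start=1)]
        if self.edge_cases:
            lines.append("Edge cases:")
            lines += [f"  - {cond}: {action}" for cond, action in self.edge_cases.items()]
        lines.append(f"Done when: {self.done_when}")
        return "\n".join(lines)


# Illustrative content only; not taken from the released library.
example = Capability(
    name="transport-desk/load-fell-through",
    version="0.1.0",
    steps=[
        "Confirm the load is actually uncovered",
        "Call the preferred backup carrier",
        "Update the customer",
    ],
    edge_cases={"No backup carrier available": "Escalate to the desk lead"},
    done_when="Load is re-covered and the customer has a new ETA",
)

print(example.render())
```

Keeping the unit as structured data rather than free text is what would let it be versioned, diffed, and reviewed by the operators whose judgment it records.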
What it does to a base model
The point of codified expertise is not elegance. It is measured lift. On a benchmark of real operational scenarios, drawn from the actual decisions these desks make rather than synthetic tasks, adding our capability library raises a strong base model's score by 11.8 points.
That number matters because of where it lands. The difference between a model that is right most of the time and one that can be trusted with the work unsupervised is exactly the band that expertise closes. Nearly twelve points, in the right place, is the difference between a copilot someone has to check and an operator that gets the work done.
Why open source
Three reasons, in order of conviction.
First, expertise compounds when it is shared. The fastest way to make AI useful in operations is to stop every team from re-deriving the same hard-won knowledge in private. A common, open library raises the floor for everyone building in this space.
Second, operations expertise should be inspectable. If a system is going to act on a business — re-route freight, file a report, touch a system of record — the knowledge it runs on should be something a customer can read, question, and correct. Open is the honest default for software that does real work.
Third, it is the right way to earn trust in a category that has been oversold. We would rather show the expertise than describe it.
What this is not
This is not our model, and it is not the product. The Evos platform is how an operator is deployed, supervised, and graduated to autonomy on a customer’s own systems. The open library is the layer beneath it — the codified craft — and we think that layer belongs to the field, not to one company.
We will keep adding to it as we work more desks and learn more of what the job actually requires. If you run operations in a legacy industry and want to see your work represented well, we want to hear from you. The whole point is to capture the expertise accurately, from the people who have it.
The future of operations work is autonomous. It will not arrive because models got bigger. It will arrive because the knowledge that has lived in people’s heads for decades finally got written down — and put to work.
