When a Frontier AI Lab Builds Its Own Analytics Agent, It Rebuilds the Semantic Layer
June 4, 2026 · 7 min read
A practitioner's read on Anthropic's "How Anthropic enables self-service data analytics with Claude."
Anthropic just published one of the most useful pieces written this year on agentic analytics. It's worth reading in full. What's notable isn't only the quality of the engineering — it's who wrote it. This is the company building the frontier models. If anyone had standing to argue that you can simply point a capable model at a warehouse and let it run, it's them. They argue the opposite.
Their conclusion, after meeting with dozens of their heaviest internal Claude Code users: the accuracy of an analytics agent is a context and verification problem, not a code-generation problem. The SQL is the easy part. The hard part is mapping a person's question to the correct entity in the data model and knowing the right way to work with it. Get that wrong and the agent produces a confident, plausible, wrong answer — what they memorably call a "false sense of precision."
We've been making a version of this argument for a while, so we want to do something more useful than take a victory lap. We want to translate what a team with Anthropic's resources built internally into what the rest of us actually have to ship.
The three failure modes are the real story
Anthropic identifies three failure modes that account for the overwhelming majority of wrong answers. They're worth restating because every data leader will recognize them immediately:
- Concept-to-entity ambiguity. "Active users" maps to dozens of plausible tables and definitions. Which actions count as active? Do you include fraudulent accounts? What lookback window? The agent can't choose.
- Staleness. Models, definitions, and schemas change constantly. Assets and agent knowledge rot, and the agent starts returning answers that are subtly, and silently, wrong.
- Retrieval failure. The right answer is in the model, properly documented, and the agent still doesn't find it in a search space of millions of fields.
Here's the part that should stop you. Anthropic ran an ablation most vendors would never publish. They gave the agent direct access to their entire corpus of dashboard, transformation, and notebook SQL — thousands of files, every question already answered correctly — and verified the agent actually read it. Accuracy moved by less than a point. The answer was usually already in the corpus. The agent saw it. It still didn't use it.
Their takeaway: the bottleneck was never access to prior work. It was structure — the ability to map a question to the right governed entity. That single null result redirected months of their roadmap.
This is the entire thesis behind a semantic layer, demonstrated by a team that had no commercial reason to demonstrate it. More raw context does not fix ambiguity. Governed meaning does.
What sits at the top of their trust hierarchy
When Anthropic lays out the reference surfaces their agent consults, they rank them by trust. At the very top, ahead of lineage, ahead of the query corpus, ahead of everything: the semantic layer — the compiled metric and dimension definitions. Their agents are structurally required to consult it first. When a question maps to a defined metric, the agent calls a function and gets one number — the same number every other surface in the company produces.
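To make the "consult the semantic layer first" rule concrete, here is a minimal sketch in Python. It is not Anthropic's implementation: the `Metric`, `SemanticLayer`, and `answer` names, the alias list, and the placeholder value are all hypothetical, chosen only to show the shape of the idea. A phrase either resolves to exactly one governed definition or is reported as unresolved, so the agent never silently improvises a definition.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Metric:
    """A governed metric: one name, one owner, one way to compute it."""
    name: str
    owner: str
    aliases: tuple[str, ...]
    compute: Callable[[], float]  # stand-in for a compiled, governed query


class SemanticLayer:
    def __init__(self, metrics: list[Metric]) -> None:
        # Index every canonical name and alias to its single governed metric.
        self._index: dict[str, Metric] = {}
        for metric in metrics:
            for key in (metric.name, *metric.aliases):
                self._index[key.lower()] = metric

    def resolve(self, phrase: str) -> Optional[Metric]:
        """Return the governed metric for a phrase, or None if none is defined."""
        return self._index.get(phrase.strip().lower())


def answer(phrase: str, layer: SemanticLayer) -> dict:
    """Consult the semantic layer before anything else."""
    metric = layer.resolve(phrase)
    if metric is not None:
        return {"source": "semantic_layer", "metric": metric.name, "value": metric.compute()}
    # No governed definition exists: say so explicitly instead of guessing.
    return {"source": "unresolved", "metric": None, "value": None}


# Hypothetical example data; the number is a placeholder, not a real result.
layer = SemanticLayer([
    Metric(
        name="weekly_active_users",
        owner="growth-analytics",
        aliases=("active users", "wau"),
        compute=lambda: 1250.0,
    ),
])

resolved = answer("Active Users", layer)
unresolved = answer("revenue", layer)
```

The design choice worth noticing is the explicit `unresolved` branch: a lower-trust fallback (lineage, query corpus) can hang off it, but the caller always knows which tier produced the number.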
They're equally clear about what doesn't work. They tried bootstrapping the semantic layer by having an LLM auto-generate metric definitions from raw tables and query logs. It produced plausible-looking definitions that encoded the very ambiguities they were trying to remove, and it tested net-negative. Their rule: generate the documentation with AI, but a human owns the definition.
If you've heard us say "semantic layer tools define metrics; the harder job is operationalizing meaning across its whole lifecycle," this is what we meant, validated from the inside of a frontier lab.
The gap between what they built and what you have
Here's where the post needs a translation layer, and where most teams will quietly despair if they read it as a checklist.
Anthropic's stack works because of things most enterprises don't have. A single repository where modeling, semantic definitions, reference docs, and dashboard definitions are colocated, with CI that fails any model change that would break a downstream metric. Skills — curated, versioned procedural knowledge — that are maintained as a first-class engineering artifact, with a code-review hook that flags any reporting-model change that doesn't update its doc. A discipline where roughly 90% of data-model PRs ship a skill change in the same diff. Offline eval suites, ablation at PR granularity, adversarial review sub-agents, provenance footers, and an agent that harvests stakeholder corrections every few hours and opens PRs against the docs.
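Two of those guardrails are small enough to sketch. The Python below is an illustration under stated assumptions, not a description of Anthropic's CI: the `MetricDef` structure, the `models/reporting/` and `docs/reporting/` path layout, and both function names are hypothetical. The first check reports metrics that would lose a column under a proposed schema; the second flags reporting-model changes that ship without a doc change in the same diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDef:
    name: str
    model: str                # the data model (table) the metric reads from
    columns: frozenset[str]   # columns the metric's definition references


def broken_metrics(
    metrics: list[MetricDef],
    proposed_schema: dict[str, set[str]],
) -> dict[str, list[str]]:
    """Map each metric that would break to the columns it would lose."""
    failures: dict[str, list[str]] = {}
    for metric in metrics:
        available = proposed_schema.get(metric.model, set())
        missing = sorted(metric.columns - available)
        if missing:
            failures[metric.name] = missing
    return failures


def undocumented_changes(changed_files: list[str]) -> list[str]:
    """Return changed reporting models whose doc is not in the same diff.

    Assumes a hypothetical layout: models/reporting/<name>.sql is documented
    by docs/reporting/<name>.md.
    """
    changed = set(changed_files)
    flagged = []
    for path in changed_files:
        if path.startswith("models/reporting/") and path.endswith(".sql"):
            doc = "docs/" + path[len("models/"):-len(".sql")] + ".md"
            if doc not in changed:
                flagged.append(path)
    return flagged


# Hypothetical example: a PR drops `refund_amount` from the `orders` model.
metrics = [MetricDef("net_revenue", "orders", frozenset({"amount", "refund_amount"}))]
proposed = {"orders": {"order_id", "amount"}}
failures = broken_metrics(metrics, proposed)

flagged = undocumented_changes([
    "models/reporting/orders.sql",
    "models/reporting/users.sql",
    "docs/reporting/users.md",
])
```

In a real pipeline the CI step would exit non-zero whenever `failures` or `flagged` is non-empty, which is what turns a convention into an enforced rule.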
That is a remarkable operating model. It is also a full-time investment by one of the best data engineering teams in the world, built around their own tooling. The lesson is not "go build all of that." The lesson is that the semantic and governance layer is the part you cannot shortcut, and that it has to be operationalized — owned, enforced, kept fresh, and consistent across every surface where someone asks a question.
That last requirement is worth dwelling on. Anthropic insists the same skill must give the same answer in Slack, in the IDE, in a dashboard, and in a standalone agent session — and they achieve it by syncing one canonical source everywhere. Consistency across surfaces isn't a nice-to-have. It's the difference between a governed system and a pile of agents that each invent their own version of "revenue."
The questions Anthropic tells you to answer first
The most honest section of their post is the one that admits not every practice fits every team. Before you build anything, they suggest aligning on a few principles. Two of them deserve emphasis because they shape the entire architecture:
- What is your comfort around access controls and internal data privacy? Agents get more accurate with more context, but broad data access cuts against most companies' governance posture. Anthropic notes this determines whether you build one agent or many scoped ones. For regulated industries — financial services, insurance, healthcare, government — this isn't a tuning knob. It's the constraint that decides whether agentic analytics is viable at all. A semantic and governance layer that can run inside your perimeter, including on-prem and air-gapped environments, is what turns "we can't let an agent near that data" into "the agent only ever sees governed, consumption-ready definitions."
- How important is a correct answer today versus in the future? Models keep improving. Some scaffolding built for today's model limits will become moot. But governed meaning, ownership, and freshness are not model-generation problems. They're durable. They were necessary before LLMs and they'll be necessary after the next three model releases. Build the layer that doesn't expire.
Where this leaves us
It would be easy to read Anthropic's post as a Claude Code tutorial, and at one level it is. But strip the product names — which, notably, the post almost entirely avoids — and what remains is an architecture argument: collapse ambiguity into a single governed answer, make that answer discoverable, and flag when it's gone stale. Do those three things and, in their words, the execution and SQL become trivial.
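The third step, flagging staleness, can be as simple as comparing timestamps. The sketch below is one possible approach, not the method described in Anthropic's post: the `Definition` fields, the `staleness_reasons` function, the 90-day threshold, and all dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Definition:
    name: str
    reviewed_at: datetime          # when a human owner last confirmed the definition
    upstream_changed_at: datetime  # when the underlying model last changed


def staleness_reasons(d: Definition, now: datetime, max_age: timedelta) -> list[str]:
    """Return the reasons a definition should be flagged; empty means fresh."""
    reasons = []
    if d.upstream_changed_at > d.reviewed_at:
        reasons.append("upstream model changed after last review")
    if now - d.reviewed_at > max_age:
        reasons.append("review older than max_age")
    return reasons


# Hypothetical example data.
utc = timezone.utc
now = datetime(2026, 6, 1, tzinfo=utc)
max_age = timedelta(days=90)

fresh = Definition(
    "weekly_active_users",
    reviewed_at=datetime(2026, 5, 20, tzinfo=utc),
    upstream_changed_at=datetime(2026, 5, 1, tzinfo=utc),
)
stale = Definition(
    "net_revenue",
    reviewed_at=datetime(2026, 1, 10, tzinfo=utc),
    upstream_changed_at=datetime(2026, 3, 2, tzinfo=utc),
)

fresh_reasons = staleness_reasons(fresh, now, max_age)
stale_reasons = staleness_reasons(stale, now, max_age)
```

Returning reasons rather than a boolean lets the flag travel with the answer, so whoever reads the number also sees why it may no longer be trustworthy.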
That architecture is exactly what a Semantic Intelligence Platform is for. The semantic layer at the top of the trust hierarchy, the governed lifecycle that keeps it from rotting, the DataOps-native workflow that colocates definitions with the pipelines that produce them, the consistency across every surface, and the ability to do all of it inside your own security perimeter — these aren't features we added because agents got popular. They're the bet we made about where this was heading.
When the people building the models independently arrive at the same blueprint, it's a good sign you were reading the map correctly.
Anthropic's post endorses no vendor and, as noted above, almost entirely avoids product names; the alignment here is intellectual, not commercial. We'd encourage every data leader to read the original. It's the clearest articulation we've seen of why governed meaning, not raw model capability, is the bottleneck for trustworthy self-service analytics.
See Semantic Intelligence in Action
Coginiti operationalizes business meaning across your entire data estate.