AI agents are remarkably good at writing SQL. They've been trained on millions of examples; they understand joins, aggregations, and query structure better than most junior analysts. And yet handing an agent direct access to your warehouse remains one of the fastest ways to generate confidently wrong numbers at scale.
The problem isn't SQL fluency. It's that the agent is being asked to do two jobs at once: retrieve the data and define what the metric means. The second job is where it fails — silently, plausibly, and with full confidence.
The fix is to take the second job away. That's what semantic SQL does, and it's why Coginiti's semantic layer speaks it natively. This post covers what measure-aware SQL actually is, why a metric description in a prompt isn't a substitute, and how Coginiti's implementation works.
The problem semantic SQL solves
Metric definitions drift. Two analysts compute "quarterly revenue" with subtly different SQL — one includes refunds, one doesn't; one filters on invoice date, one on payment date — and both present their number with full confidence. The fix has been understood for thirty years: define the metric once, in one governed place, and make every consumer query against the definition rather than reinventing it.
AI agents made this problem urgent again, and worse. A human analyst who writes a wrong query can usually tell when the result looks off. An LLM can't. It produces a syntactically valid query that computes the wrong number and presents it with the same fluency as a right one. We call this the confidence-reality gap, and it's the single biggest blocker to trusting agents with analytical work.
The semantic layer closes that gap structurally — but only if the agent has a way to query it, not just read about it.
Why a description isn't enough
A tempting shortcut: skip the query interface and hand the agent a markdown file describing your metrics. The agent reads the definitions, writes SQL against the raw tables, done. Plenty of early text-to-SQL prototypes work exactly this way.
It fails for two reasons.
A description is informational, not enforceable. Nothing stops the agent from hallucinating a join, referencing a column the user can't see, or quietly substituting its own notion of "active customer." The metric definition in the markdown file and the SQL the agent actually runs are two separate artifacts with no mechanical connection between them. You're back to drift — now at machine speed.
Measures aren't columns. This is the deeper, more interesting problem. A measure like revenue_per_user or completed_percentage isn't a value sitting in a table. It's a computation whose correct result depends on the grouping context of the query that references it. Revenue per user by region and revenue per user by month expand to different SQL. Expose a ratio measure as a flat column and the agent is forced to pick one aggregation level — almost certainly the wrong one once subqueries or window functions enter the picture. And they always enter the picture: agents and BI tools routinely generate three to five levels of nesting.
SQL evaluates bottom-up — subqueries first, outer query last. Measures need top-down context — the outer query's GROUP BY determines how the inner expression should aggregate. Those two evaluation models are fundamentally incompatible, and no amount of prompt engineering fixes a semantics problem.
One function, different semantics
The resolution is a minimal extension to SQL: the MEASURE() function.
SELECT
product_name,
MEASURE(unit_quantity) AS total_units,
MEASURE(sales_amount) AS total_sales
FROM sales_detail
GROUP BY product_name
ORDER BY total_sales DESC
Syntactically, this is trivial. Any model trained on SQL — which is to say, every model — picks it up immediately. Semantically, it changes everything. MEASURE() is an instruction to the engine: don't compute this here; resolve it from the governed definition in the semantic model, at the correct aggregation boundary for this query. If unit_quantity is defined as a sum in the model, the engine expands it to SUM(unit_quantity). If it's a filtered ratio, the engine expands the full expression at the right GROUP BY level — even when the measure reference sits inside a CTE or derived table.
The agent never writes the aggregation logic. It can't get the definition wrong, because it never touches the definition. That's the difference between asking an agent to please respect your metrics and making it structurally impossible not to.
This is the pragmatic case for SQL as the semantic layer's query language, full stop. MDX is effectively dead outside legacy tools. DAX lives only in one ecosystem. Proprietary REST APIs force every consumer to learn a new interface. SQL plus one function is something every existing tool, driver, and model already speaks — and in our experience, agents that struggle to correctly assemble a ratio metric from raw columns are reliably correct when handed MEASURE(metric_name) against a semantic layer. The hard part was never teaching models SQL. It was removing the opportunity to improvise.
How Coginiti's implementation works
Coginiti's semantic layer is built on Apache DataFusion as the query engine. Queries written in semantic SQL are parsed and analyzed by DataFusion, measures are resolved against the semantic model, and the resulting plan is translated to the native SQL dialect of the target platform — PostgreSQL, BigQuery, Snowflake, Redshift, or Yellowbrick — for execution in your warehouse. Your data never moves; the semantic layer is a translation and governance boundary, not a copy of your data.
A few design decisions worth calling out:
Implicit joins from the semantic model. Relationships between entities are defined once, in the model. When a query references columns from multiple entities, the engine supplies the join conditions automatically:
SELECT product_name, MEASURE(unit_quantity) AS qty
FROM sales_detail_fact
JOIN dim_product
GROUP BY product_name
No ON clause. The agent doesn't need to know the join keys — and therefore can't hallucinate them. Explicit joins remain available when you need control, but the default path is the governed path.
Cross-platform validation at analysis time. Because translated queries must run correctly on every supported platform, the engine validates compatibility before execution and rejects constructs that can't reliably translate — rather than emitting SQL and hoping. The translation layer handles platform differences automatically: DATE_TRUNC argument order on BigQuery, RANGE frame conversion to ROWS on Snowflake and Redshift, identifier quoting conventions, IS TRUE rewrites, and so on. The query you write is the query that runs, everywhere.
Read-only by construction. The semantic layer accepts SELECT statements. No DDL, no DML, no session control. An agent querying through the semantic layer cannot mutate anything — not because we asked it nicely, but because the surface doesn't exist.
Full analytical SQL, not a toy subset. CTEs (including multiple, chained CTEs), derived tables, explicit joins of all types, window functions — ranking, navigation, running aggregates over measures — UNION ALL, HAVING on measures. This matters because real agent-generated queries are messy and multi-level. A semantic SQL implementation that only handles SELECT dim, MEASURE(m) FROM entity GROUP BY 1 collapses on contact with production traffic.
WITH product_sales AS (
SELECT product_name, MEASURE(unit_quantity) AS qty
FROM sales_detail
GROUP BY product_name
)
SELECT
product_name,
qty,
SUM(qty) OVER (ORDER BY qty DESC) AS running_total,
RANK() OVER (ORDER BY qty DESC) AS rank
FROM product_sales
ORDER BY rank
The measure resolves inside the CTE at the product grain; the window functions operate on the materialized result above it. Top-down measure semantics below the boundary, ordinary bottom-up SQL above it.
Semantic SQL is the interface, not the destination
Semantic SQL solves the access problem: it gives every SQL-speaking tool and agent a correct, governed way to query metrics. That's necessary. It isn't sufficient.
A metric definition tells an agent what revenue computes. It doesn't tell the agent that finance recognizes revenue on shipment while sales reports on booking, that the EMEA numbers before March 2024 are restated, or that "active customer" means something different in the churn model than in the board deck. Meaning lives in relationships, lineage, and context — not just in aggregation expressions.
That's why semantic SQL is one access path into Coginiti's semantic layer, alongside MCP for agent integration and the semantic graph that connects definitions to the pipelines, datasets, and documentation around them. Defining metrics is step one. Operationalizing meaning — making the full context queryable, governed, and available to both humans and agents at the moment of analysis — is the actual job.
Getting agents to write correct SQL turned out to be the easy part. The more interesting question is what an agent does with a correct number once it has one.
Coginiti's semantic SQL reference is available in our documentation. If you want to see agents querying a governed semantic layer against your own warehouse, get in touch.
See Semantic Intelligence in Action
Coginiti operationalizes business meaning across your entire data estate.