Why better metadata isn’t enough and the architectural shift that changes what agents produce

Nishant Sharma

Technical Director

15 min read | March 25, 2026

Part 2 in the series

Part 1 of this series examined why incrementality is harder than it looks: it is not one problem, but a family of problems, and tools built for time-grained analytics outputs don't map cleanly onto entity-grained activation use cases. This post builds on that foundation. Where Part 1 focused on execution, this one focuses on authoring: specifically, what AI agents should be asked to produce in the first place.

A new assumption is taking hold across the data stack: If AI agents are going to work with data reliably, they need better context. More metadata. Richer schemas. Cleaner business definitions. A stronger semantic layer.

That instinct is directionally right. Better context does help, but it does not solve the deeper architectural problem.

The problem is not only that agents lack context. The problem is that most architectures still expect agents to output SQL directly.

And SQL is the wrong target.

Better metadata improves what agents know. It does not change what they produce.

SQL already won the data war

Before arguing that SQL is the wrong target for agents, it is worth stating clearly: SQL absolutely won the data war.

Every major warehouse speaks SQL. Every modern analytics stack depends on it. Tools that originally positioned themselves as alternatives to SQL often ended up adopting SQL interfaces anyway. Even newer systems for streaming, lakehouse analytics, and distributed compute have converged on SQL-like abstractions because they are familiar, portable, and expressive.

| Technology | Started as | Ended up |
| --- | --- | --- |
| NoSQL databases | "No SQL" | Added SQL interfaces (CQL, N1QL, PartiQL) |
| Spark/Hadoop | MapReduce jobs | SparkSQL became the dominant interface |
| Streaming systems | Custom APIs | KSQL, Flink SQL everywhere |
| dbt | SQL-first from day one | Still SQL templates |

Even technologies designed to replace SQL ended up embracing it. That is settled.

This is not an argument against SQL itself. SQL is still the right execution layer for a huge portion of the data stack. It is still what warehouses optimize for. It is still what compilers should often emit.

But there is a difference between the right execution target and the right authoring surface. That distinction becomes much more important when the author is no longer only a human analyst or data engineer. It is also an AI system.

Why agents generating SQL directly breaks down

AI agents are becoming a fixture in enterprise software. Gartner expects 40% of enterprise apps to embed task-specific agents by the end of 2026. The natural assumption is that those agents will generate SQL: Give an LLM a schema and a question, get back a query.

That works surprisingly well for lightweight exploration and troubleshooting. But once you move beyond one-off querying and into real production workflows, the cracks show quickly. Three problems:

Problem 1: SQL is stateless

SQL starts fresh every time. That is fine for ad hoc querying. But it’s a serious liability for persistent systems that need to understand what has already been computed, what changed, what is stale, and what can be skipped.

This is the same family of problems examined in Part 1. Incrementality is hard precisely because efficient systems need memory. They need to know whether to recompute everything, process only new slices, or selectively refresh downstream dependencies.

A raw SQL-generating agent has no shared transformation state by default. It does not know what another agent already computed. It doesn’t know what was materialized yesterday. It does not know which parts of the graph are reusable and which are invalidated.

So it does what stateless systems do. It regenerates.

The stateless tax

A simple example makes this point. I used llm.clickhouse.com to plot PyPI download stats from a natural language prompt, and it produced a beautiful, complete visualization. Genuinely impressive.

Then I asked for an update with new data. The AI regenerated everything: re-fetched all the data, re-rendered the complete HTML and CSS. Roughly 1,800 tokens for what should have been a small delta. Ten updates later, that was 18,000 tokens for what a stateful system would have handled incrementally.
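The arithmetic of that "stateless tax" can be sketched directly. The 1,800-token figure comes from the example above; the 150-token delta is an illustrative assumption about what an incremental update might cost:

```python
# Illustrative token-cost comparison: full regeneration vs. incremental deltas.
# FULL_REGEN_TOKENS is from the example above; DELTA_TOKENS is an assumption.
FULL_REGEN_TOKENS = 1_800
DELTA_TOKENS = 150

def cost(per_update_tokens: int, updates: int) -> int:
    """Total tokens spent across a series of updates."""
    return per_update_tokens * updates

stateless = cost(FULL_REGEN_TOKENS, 10)  # regenerate everything, every time
stateful = cost(DELTA_TOKENS, 10)        # emit only what changed
print(stateless, stateful)  # 18000 1500
```

The gap widens linearly with every update; a stateful system pays the full cost once and deltas thereafter.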

This is not a criticism of llm.clickhouse.com. It is a phenomenal tool. But it reveals the architectural limitation clearly: An agent outputting SQL has no state. It cannot know what was computed yesterday. Every request is a fresh start.

That may be acceptable for a dashboard question, but it is much more dangerous for activation pipelines, feature pipelines, or entity-level models that need to behave predictably over time.

Problem 2: SQL is too low-level for a probabilistic author

If you would not let end users write arbitrary SQL directly against production systems, it is worth asking why you are comfortable letting agents do it.

That is not because the agent is bad at SQL. In many cases, it may be quite good. It is because SQL is too close to the implementation layer.

When an agent outputs SQL directly, there is no meaningful contract boundary between a probabilistic planner and deterministic execution. The output can vary from run to run. There is no stable intermediate representation to review, test, version, or reason about. There is only generated text that happens to be executable.

The problem is not just query correctness. It is architectural control.

Problem 3: SQL bypasses governance by default

A semantic layer can define metrics. A catalog can describe lineage. A policy system can label a field as sensitive.

But if the agent still outputs SQL, those constraints are often advisory until the moment the warehouse blocks something. Governance happens too late and too indirectly.

You can document that an email field is private. You can note that opt-out users should be excluded. You can annotate ownership and lineage. If the authoring interface still allows arbitrary query construction, governance is not built into authoring. It is bolted on around it.

That’s better than nothing, but it is not the same as enforcement by construction.

Why semantic layers help, but still fall short

In 2025, every major data platform shipped semantic layer features. dbt added a semantic layer with natural language query support. Snowflake shipped native semantic views and Cortex Analyst. Databricks added Unity Catalog Metric Views.

The thesis: If agents have a business dictionary, with metrics, dimensions, and relationships defined, they can finally navigate data safely. This is real progress. Semantic layers add meaning, standardize definitions, and give agents context beyond raw schemas.

But it is not enough.

Limit 1: Sub-optimal by position

The semantic layer arrives after the Transform step in ELT. It annotates outputs. It does not drive production.

That positional limitation matters because many of the hardest optimization, dependency, and governance decisions happen before the final semantic view even exists.

This means the semantic layer has no concept of what was computed before, what changed, what can be skipped. You cannot optimize what you cannot see.

A concrete example: You have an events view with 100 million rows, a union of every event type. Your semantic layer exposes it as "customer events."

Clean. Governed.

An agent needs only purchase events, which number about 1 million, buried in that 100-million-row view. The semantic layer may even know that purchase events originate from a smaller upstream source.

The optimization is obvious: query the 1-million-row source table, not the 100-million-row view. But to do that, the agent must bypass the semantic layer, touch pre-transformed data, and generate raw SQL. The moment it does, you have lost governance. You have lost incrementality.

You’re back to square one.

The trap: Metadata might tell you the optimization exists. But to exploit it, you abandon everything the semantic layer promised. This is not a hypothetical edge case. It’s a typical Tuesday.
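A compiler that sees lineage can make this routing decision without abandoning governance. Here is a minimal sketch, with table names, row counts, and the metadata shape all assumed for illustration:

```python
# Sketch: source selection by a lineage-aware compiler. If a smaller governed
# source covers the requested event types, route the query there instead of
# the full union view. All names and sizes below are illustrative assumptions.
sources = {
    "events_all": {"rows": 100_000_000, "event_types": {"page", "cart", "purchase"}},
    "purchases": {"rows": 1_000_000, "event_types": {"purchase"}},
}

def pick_source(needed_types: set) -> str:
    """Smallest registered source whose event types cover the request."""
    candidates = [(meta["rows"], name) for name, meta in sources.items()
                  if needed_types <= meta["event_types"]]
    return min(candidates)[1]

print(pick_source({"purchase"}))  # purchases
```

The agent never writes raw SQL against pre-transformed data; the compiler exploits the optimization on its behalf, inside the governed boundary.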

Limit 2: Direct data access to probabilistic agents

You would not let end users query your database with raw SQL. You give them an API instead.

So why let agents bypass what you would never allow humans to bypass?

The semantic layer provides context, not contracts. Agents still generate SQL directly, with different output each run, at scale. There is no checkpoint between the probabilistic agent and deterministic execution. An activation agent generating discounts for distressed customers will produce different SQL every time. There is nothing stable to version or audit.

Limit 3: Governance as documentation

The semantic layer describes what data means. It does not constrain what agents can do with it.

You can annotate a column as sensitive. The agent can still run SELECT email FROM users, bypassing the semantic layer entirely. Policies in semantic layers are suggestions. There is no compilation, no automatic propagation, no guarantee.
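Enforcement by construction looks different: the policy check runs inside compilation, so no SQL touching a sensitive field is ever emitted. A minimal sketch, with the policy list and intent shape assumed for illustration:

```python
# Sketch: governance compiled into the authoring path rather than documented
# beside it. The column policy and intent shape are illustrative assumptions.
SENSITIVE_COLUMNS = {"email", "phone"}

def compile_select(intent: dict) -> str:
    """Emit SQL only if the intent passes policy; otherwise refuse."""
    blocked = SENSITIVE_COLUMNS & set(intent["columns"])
    if blocked:
        # The agent never receives executable SQL over these fields.
        raise PermissionError(f"policy violation: {sorted(blocked)}")
    return f"SELECT {', '.join(intent['columns'])} FROM {intent['table']}"

print(compile_select({"table": "users", "columns": ["user_id", "days_active"]}))
```

With this boundary in place, `SELECT email FROM users` is not a query the agent can sneak past an annotation; it is an intent the compiler refuses to compile.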

Metadata to the rescue? Not quite

The semantic layer's weakness is position. It arrives after the data is already modeled. So flip it: what if semantics covered the entire data graph, not just the outputs?

This is what metadata platforms are building. DataHub, Atlan, and Metaphor annotate every artifact: sources, transformations, outputs, dashboards, models, schemas, lineage, policies, ownership. This is important work. Agents need context, and rich metadata provides it.

But notice what has not changed.

The agent reads better context. It still outputs SQL directly. Metadata improves generation context, not the execution contract.

The distinction is architectural:

| Approach | Agent outputs | Core limitation |
| --- | --- | --- |
| Metadata-as-context (DataHub, Atlan, Metaphor) | SQL directly | Better-informed SQL is still stateless, probabilistic, and ungoverned |
| Intent-as-contract (Semantic Intent Compiler) | Structured YAML | None of the above: state-aware, deterministic, governed by construction |

Both approaches start with rich metadata. The difference is what the agent targets.

Metadata-as-context improves agent inputs. Intent-as-contract changes the agent's output. That is not an incremental improvement. It is a different architecture.

The real shift: Intent as contract

The more durable pattern is not "give agents more context so they can write better SQL." Rather, it’s "change what agents author in the first place."

Instead of:

Natural language → Agent → SQL

The architecture becomes:

Natural language → Agent → Semantic intent → Compiler → SQL

That intermediate layer matters because it creates a boundary between probabilistic planning and deterministic execution. It gives the system something structured to validate, version, test, govern, and optimize before warehouse-specific SQL is ever produced.

This is the architectural shift that matters most. Not richer metadata. Not better prompt engineering. A new authoring surface.
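The boundary can be sketched in a few lines: the agent's output is a structured intent object, and a deterministic function validates it before any SQL exists. The intent schema and emitted SQL below are illustrative assumptions, not any product's actual contract:

```python
# Minimal sketch of the probabilistic/deterministic boundary: the agent emits
# structured intent; a deterministic compiler validates it and emits SQL.
ALLOWED_MODEL_TYPES = {"entity_feature"}

def validate(intent: dict) -> None:
    """Deterministic, reviewable checks on the agent's structured output."""
    if intent.get("model_type") not in ALLOWED_MODEL_TYPES:
        raise ValueError(f"unknown model_type: {intent.get('model_type')!r}")
    for key in ("name", "select", "from"):
        if key not in intent:
            raise ValueError(f"intent is missing required key: {key}")

def compile_intent(intent: dict) -> str:
    """Translate validated intent into warehouse SQL."""
    validate(intent)
    return (f"SELECT user_id, {intent['select']} AS {intent['name']} "
            f"FROM {intent['from']} GROUP BY user_id")

intent = {"model_type": "entity_feature", "name": "lifetime_value",
          "select": "sum(order_total)", "from": "orders"}
print(compile_intent(intent))
```

The intent object, not the SQL string, is the artifact you version, diff, test, and review.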

What semantic intent looks like

Semantic intent is a structured, higher-level way to describe what should be computed without forcing the author to specify every low-level implementation detail. Instead of hand-authoring joins, dependency logic, and incremental refresh behavior, the author expresses the desired outcome and lets the compiler handle the rest.

A cohort: A segment of users

Note: The YAML examples below show simplified semantic intent definitions using the RudderStack Profiles model syntax.

```yaml
# Who are my known users?
- name: known_users
  model_type: entity_cohort
  model_spec:
    extends: users/all
    filter_pipeline:
      - type: exclude
        value: "{{ users.email_count }} = 0"
```

An entity feature: A computed attribute per user

```yaml
# What's each user's lifetime value?
- entity_var:
    name: lifetime_value
    select: sum(order_total)
    from: inputs/orders
    description: "Total revenue from this user"
```

An events-driven funnel: A conversion sequence

```yaml
# Who engaged with cart but didn't complete purchase?
- name: cart_abandonment_funnel
  model_type: events_driven_funnel
  model_spec:
    entity_key: user
    events_spec:
      where:
        - occurred:
            name: e1
            type: cart_engaged
        - occurred:
            name: e2
            type: cart_abandoned
        - did_not_occur:
            name: e3
            type: cart_completed
```

What is notable here is not only that the syntax is cleaner. It is also that the author is no longer being asked to manage every implementation concern. There are no warehouse-specific table names in the business logic. No explicit join graph. No hand-authored incremental orchestration. No requirement to manually encode which downstream assets should be recomputed after a change.

That work moves into the compiler.

Why this architecture is better for AI systems

This is not only better for humans. It is better for AI systems, too. Here’s a breakdown of why:

It gives the model a constrained target

Natural language is too loose. SQL is too low-level.

Structured semantic intent sits in a more useful middle ground. It’s expressive enough to capture non-trivial business logic; constrained enough to validate; and stable enough to persist, diff, and edit incrementally over time. That makes it a better target for both human review and AI-assisted authoring.

It separates authoring from execution

An AI system can translate a request into semantic intent. A deterministic compiler can then resolve dependencies, enforce policies, plan execution order, determine what is stale, and emit SQL. That split matters because it allows AI to accelerate authoring without being trusted to directly control the execution layer.

It enables state-aware updates

This is where the connection back to Part 1 of this blog series becomes especially important.

In that post, we argued that incremental systems need the right primitives because different workloads require different ways of handling change. The same principle applies here.

If intent is compiled through a state-aware system, updates become selective and incremental rather than full regeneration. The architecture can reuse previous work, resolve invalidated dependencies, and avoid recomputing everything because one request changed.
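Selective invalidation is the core move: when a source changes, only its transitive downstream nodes are marked stale, and everything else is reused. A minimal sketch over an assumed dependency graph:

```python
# Sketch: selective invalidation in a state-aware compiler. When a source
# changes, only downstream nodes are recomputed. The graph is illustrative.
downstream = {
    "tracks": ["first_seen", "days_active"],
    "orders": ["lifetime_value"],
    "first_seen": ["user_lifespan"],
    "days_active": [],
    "lifetime_value": [],
    "user_lifespan": [],
}

def stale_after(changed: str) -> set:
    """Everything transitively downstream of a changed node."""
    out, frontier = set(), [changed]
    while frontier:
        node = frontier.pop()
        for child in downstream[node]:
            if child not in out:
                out.add(child)
                frontier.append(child)
    return out

print(sorted(stale_after("orders")))  # only lifetime_value needs recompute
```

New rows in `orders` touch one feature; new rows in `tracks` invalidate three. A stateless SQL generator recomputes all of them either way.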

A practical example: RudderStack Profiles

This is not purely theoretical. RudderStack Profiles already works in this direction, letting teams define semantic resources in YAML and compile them into execution plans.

```yaml
# profiles.yaml - entity features
- entity_var:
    name: first_seen
    select: min(cast(timestamp as date))
    from: inputs/tracks
- entity_var:
    name: days_active
    select: count(distinct cast(timestamp as date))
    from: inputs/tracks
    description: No. of days a user was active
- entity_var:
    name: total_sessions
    select: count(distinct session_id)
    from: inputs/tracks
```

Dependencies are automatically discovered. You do not declare them. RudderStack Profiles infers them from your semantic references.

You can also compose features from other semantic features:

```yaml
# A feature derived from other semantic features
- entity_var:
    name: user_lifespan
    select: "{{ user.last_seen }} - {{ user.first_seen }}"
    description: Days between first and last activity
```

The reference to {{ user.last_seen }} points to another semantic feature by name. The compiler resolves this dependency, handles incrementality, and generates the SQL. You declare what you want. The compiler handles the wiring.

That is a meaningful difference from asking an agent to invent warehouse SQL on the fly.
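The mechanics of that dependency discovery are simple to sketch: extract `{{ entity.feature }}` references from each definition, then order computation so dependencies come first. This is a simplified stand-in for what a real compiler does, with the reference syntax handling reduced to a regex:

```python
import re

# Sketch: discovering dependencies from {{ entity.feature }} references and
# ordering computation. A real compiler does much more; this shows the idea.
REF = re.compile(r"\{\{\s*\w+\.(\w+)\s*\}\}")

features = {
    "first_seen": "min(cast(timestamp as date))",
    "last_seen": "max(cast(timestamp as date))",
    "user_lifespan": "{{ user.last_seen }} - {{ user.first_seen }}",
}

def deps(expr: str) -> set:
    """Feature names referenced by an expression."""
    return set(REF.findall(expr))

def topo_order(feats: dict) -> list:
    """Depth-first ordering so every feature follows its dependencies."""
    order, seen = [], set()
    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for d in deps(feats[name]):
            visit(d)
        order.append(name)
    for name in feats:
        visit(name)
    return order

print(topo_order(features))  # user_lifespan comes after both dependencies
```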

Where this.DeRef() fits in

Part 1 introduced this.DeRef() as a primitive for state-aware dependency resolution in entity-level incremental systems. That concept becomes even more important in an intent-and-compiler architecture.

A semantic compiler is not only translating YAML into SQL mechanically. It also needs to understand dependency structure, shared state, freshness, and recomputation boundaries. That is exactly the territory where naive SQL generation breaks down.

The compiler has to know what was computed last time, what changed since then, which dependencies are stale, and what can safely be skipped. That is the difference between direct generation and true compilation. A better authoring surface without state awareness is incomplete. A state-aware compiler is what ties the model together.

Agent state vs. transformation state

A common objection: Agent frameworks like LangChain and AutoGen have memory. Why isn't that enough?

Agent state is individual. Each agent tracks its own history, including what queries it ran, what the user asked. Agents don't share state.

Transformation state is collective. It is the sum total of what all agents and processes have computed. When Agent A computes a feature, Agent B should know it exists. When a source table gets new rows, every downstream feature that depends on it needs refreshing.

The Semantic Intent Compiler is the coordination layer. It maintains a single source of truth about transformation state. All agents benefit from work any agent has done. That is not something a scratchpad can provide.
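The distinction can be made concrete with a minimal shared-state registry, here an in-memory stand-in for the compiler's state store (the interface is an illustrative assumption):

```python
# Sketch: shared transformation state, so work done by one agent is visible
# to every other. An in-memory stand-in for the compiler's state store.
class TransformationState:
    def __init__(self) -> None:
        self._computed = {}  # feature name -> version of its definition

    def record(self, name: str, version: str) -> None:
        """Register that a feature was materialized at this version."""
        self._computed[name] = version

    def needs_compute(self, name: str, version: str) -> bool:
        """True if the feature is missing or its definition changed."""
        return self._computed.get(name) != version

state = TransformationState()
state.record("lifetime_value", "v1")                  # Agent A computes it
print(state.needs_compute("lifetime_value", "v1"))    # Agent B reuses: False
print(state.needs_compute("lifetime_value", "v2"))    # definition changed: True
```

Per-agent scratchpads can't provide this: the registry is consulted by every agent, so no one regenerates what the system already knows.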

Why YAML is a reasonable target

A natural objection: why not stay with natural language on one end and SQL on the other? Why add another layer?

Because abstraction layers matter. Natural language is a great UX layer for asking questions and expressing rough intent. SQL is a great execution layer for warehouses and compute engines. But neither is ideal as the durable contract layer in between.

Think of it this way: C++ existed. It was powerful. Experts could do anything with it. We still invented Python, Ruby, and JavaScript. Why? Because abstraction level matters. C++ was too close to the machine. Semantic YAML finds a useful middle ground: expressive enough for complex logic, constrained enough for reliable execution.

| Layer | Abstraction level | Purpose |
| --- | --- | --- |
| Natural language | High (vague wishes) | Exploration, onboarding, rough intent |
| Semantic YAML | Medium (what to compute) | Structured intent, compilable contracts |
| SQL | Low (how to compute) | Execution, warehouse optimization |

YAML works well as the contract layer because it is human-readable, machine-validated, diffable and versionable, stable across edits, easy for AI systems to generate and modify, and strict enough to support policy enforcement and compilation.

The important idea is not the specific syntax. It is that agents should author intent, not executable warehouse logic.

"Won't models just get better at SQL?"

They probably will. But that does not remove the need for contracts.

A more capable model may generate more correct SQL more often. It may reason better about schemas. It may produce more sophisticated joins and filters. That still does not solve the core problem.

Production systems do not need "usually correct." They need deterministic behavior, reviewable contracts, governed access, and efficient reuse of prior computation. Even a very strong model can still output SQL that is non-composable, non-versionable, or non-compliant with downstream governance requirements. It can still force full regeneration when an incremental update would have been enough.

There is also a subtler concern. An AI optimizing for "generate SQL that answers the question" might generate SQL that technically answers the question while bypassing policies it was told to respect. The smarter AI gets, the more important it becomes to have contracts it cannot violate.

This is not a bet against AI getting better. It is a bet that as AI gets better, it will benefit even more from working at the semantic layer rather than being forced to author at the SQL layer.

The governance test

If your governance is documentation, it’s a suggestion. If your governance is compiled, it is a guarantee.

Can a junior engineer accidentally bypass your privacy policy by writing raw SQL? If yes, you have documentation. If no, you have enforcement. The same question applies to agents.

The bottom line

The industry is asking the right question: How do we give agents better context? But it is not the most important question.

The more important question is: What should agents output?

Semantic layers matter. Metadata matters. Rich context matters. But none of those, by themselves, solve the core problem of AI agents in production data systems. As long as agents generate SQL directly, you are still dealing with stateless authoring for stateful systems, probabilistic output at the execution boundary, and governance that is documented rather than enforced.

The more durable pattern is to move the authoring surface up one level. Let agents produce semantic intent. Let compilers enforce contracts, manage dependencies, preserve state, and generate execution plans.

SQL does not disappear. It becomes what the compiler produces, not what the agent writes. That shift is what makes AI-native data systems reliable rather than aspirational.

Part 3 of this series will take the next step: Decision traces are events, and the same infrastructure patterns used to build Customer 360 systems can also power AI agent context graphs. The compiler is not only for human-authored intent. It is what turns agent behavior into activation-ready data.

Stay tuned for more.

Explore the RudderStack Profiles documentation to see the Semantic Intent Compiler in action
