Data and context should move together, not separately

Everyone is talking about context, and for good reason.
AI agents cannot function without it. Raw data alone is insufficient. For an agent to reason, decide, or act, it needs meaning: what an object represents, how entities relate, what a metric actually signifies.
The real question, then, is: Where should the context layer live?
Today, many data catalog and governance tools, whether native to platforms like Snowflake and Databricks or offered by standalone vendors, attempt to own this layer. While valuable, this approach often creates duplicated logic, fragmented definitions, and heavy operational overhead.
Context becomes something that must be constantly documented, reconciled, and maintained, rather than something that naturally flows with the data itself. Teams end up with a context layer that's always slightly out of sync, requiring manual effort to keep current as schemas evolve, pipelines change, and new data sources come online.
A more durable architecture would treat context not as a separate overlay, but as something that moves with the data.
Context begins in the pipeline
Data pipelines do more than move data. They carry implicit understanding of source systems. When an ETL tool ingests data from Salesforce, it inherently understands what a Lead, Opportunity, or Contact represents.
For custom objects, teams explicitly configure ingestion, defining how those objects should move and what they mean. In doing so, they are already attaching contextual meaning during transport, not after the fact.
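As an illustration, a hypothetical ingestion config might declare that meaning alongside the transport rules. The object, field names, and structure below are assumptions for the sketch, not any specific ETL tool's configuration format:

```python
# Hypothetical ingestion config for a Salesforce custom object.
# The semantic descriptions ride alongside the mapping rules, so meaning
# is attached during transport rather than documented after the fact.
SUBSCRIPTION_INGESTION = {
    "source_object": "Subscription__c",
    "destination_table": "subscriptions",
    "description": "One row per customer subscription, active or churned",
    "fields": {
        "Id":        {"maps_to": "subscription_id", "meaning": "primary key"},
        "MRR__c":    {"maps_to": "mrr_usd",         "meaning": "monthly recurring revenue, USD"},
        "Status__c": {"maps_to": "status",          "meaning": "lifecycle state: trial, active, churned"},
    },
}
```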
Streaming infrastructure extends this further. Event pipelines like RudderStack allow teams to define schemas, enforce contracts, and standardize event meaning in motion. Context is captured at the moment data is produced, not reconstructed downstream.
This is a meaningful distinction. When context is enforced at the point of collection, it doesn't have to be inferred or reverse-engineered later. The meaning travels with the event from the start.
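A minimal sketch of contract enforcement at collection time, using plain JSON Schema validation rather than any particular vendor's SDK (the event name and fields are illustrative):

```python
from jsonschema import ValidationError, validate

# Contract for an "Order Completed" event, checked before it enters the pipeline.
ORDER_COMPLETED_SCHEMA = {
    "type": "object",
    "required": ["order_id", "user_id", "revenue_usd"],
    "properties": {
        "order_id": {"type": "string"},
        "user_id": {"type": "string"},
        "revenue_usd": {"type": "number", "minimum": 0},
    },
    "additionalProperties": False,
}

def track_order_completed(event: dict) -> dict:
    """Enforce the contract at the point of collection; reject violations up front."""
    try:
        validate(instance=event, schema=ORDER_COMPLETED_SCHEMA)
    except ValidationError as err:
        raise ValueError(f"Event rejected at collection: {err.message}") from err
    return event  # the meaning travels with the event from the start
```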
Transformation layers enrich context
This is where business definitions emerge. Transformation frameworks like dbt or RudderStack Profiles don't just reshape data; they create new meaning from it. Derived attributes, entity joins, feature engineering, and aggregations all introduce higher-order context. As that modeling happens, key metadata questions get answered:
- How is ARR defined?
- What constitutes an active user?
- How is churn calculated?
- Which attributes are canonical vs. derived?
These aren't abstract questions. They're the definitions that determine whether two teams are actually talking about the same thing when they discuss revenue, retention, or engagement.
Metric layers go a step further, establishing centralized, governed definitions for concepts like revenue or LTV. By this stage, context has been progressively enriched from source semantics to modeled business meaning.
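As a sketch of what "centralized and governed" can look like in practice, a metric definition can be expressed declaratively so every consumer resolves the same logic. The structure below is illustrative, not dbt's or any other tool's actual metric spec:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricDefinition:
    name: str
    description: str   # the business meaning, stated once
    expression: str    # the canonical calculation, owned in one place
    grain: str         # the entity the metric is measured against

ACTIVE_USERS = MetricDefinition(
    name="active_users",
    description="Users with at least one tracked session in the trailing 28 days",
    expression="COUNT(DISTINCT user_id) WHERE session_ts >= CURRENT_DATE - 28",
    grain="user",
)

ARR = MetricDefinition(
    name="arr",
    description="Annualized recurring revenue: live subscription MRR times 12",
    expression="SUM(mrr_usd) * 12 WHERE status = 'active'",
    grain="account",
)
```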
The missing abstraction layer
Despite context being generated at every stage, it remains fragmented. Each layer understands context locally. Pipelines understand source semantics. Streaming systems understand event structure. Transformations understand derived meaning. Metric layers understand business definitions. But none of these layers talk to each other in a way that preserves context end-to-end.
What's missing is a unifying abstraction that connects these signals across the full lifecycle, tracing meaning from source to activation, keeping definitions consistent as data moves, and exposing context to downstream consumers, including AI agents.
Instead of reconstructing context after data lands, the architecture would preserve and propagate it throughout. The practical implication is significant: AI agents would inherit consistent, trustworthy context rather than having to work around fragmented or contradictory definitions that were never designed to travel together.
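One way to picture that propagation is an envelope in which the record and the context accumulated at each stage travel together, so a downstream agent receives both at once. This is a conceptual sketch, not a proposed standard or an existing format:

```python
from dataclasses import dataclass, field

@dataclass
class ContextEnvelope:
    """Data and its accumulated meaning move through the stack as one unit."""
    payload: dict                                          # the record itself
    source_semantics: dict = field(default_factory=dict)   # attached at ingestion
    contract: dict = field(default_factory=dict)           # enforced at collection
    definitions: dict = field(default_factory=dict)        # added by transformation and metric layers

record = ContextEnvelope(
    payload={"account_id": "a_42", "mrr_usd": 99.0},
    source_semantics={"source_object": "Subscription__c", "system": "salesforce"},
    contract={"mrr_usd": "non-negative number, USD"},
    definitions={"arr": "SUM(mrr_usd) * 12 at account grain"},
)

# An agent consuming this record can reason from the definitions it was handed,
# instead of reverse-engineering them from a separate catalog.
print(record.definitions["arr"])
```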
This is also a governance problem. When context lives in a separate catalog rather than in the pipeline itself, governance becomes reactive. You document what happened rather than enforce what should happen. A pipeline-native context layer flips that: definitions are enforced at the point of production, and downstream consumers, human or AI, get context they can rely on.
Where this leads
If AI agents are the consumers of modern data stacks, context is their operating system. Treating it as static documentation or a warehouse-bound catalog limits its usefulness. The better architecture is one where context is portable, continuously enriched, and intrinsically tied to data movement itself.
Data pipelines shouldn't just move data. They should move understanding.
Published: February 20, 2026