AI will push data infrastructure toward Infrastructure as Code

For anyone in DevOps or site reliability engineering (SRE), infrastructure as code (IaC) is second nature. It’s hard to imagine setting up servers, load balancers, or networks today without Terraform, Pulumi, or CloudFormation. Declarative configuration, version control, and reproducible environments have become the foundation of modern software delivery.
But in the world of data, we’re still far behind.
The state of data infrastructure today
While the data ecosystem has exploded with tools for ingestion (e.g., Fivetran, RudderStack), transformation (dbt), cataloging (Alation, DataHub), and observability, much of it is still managed manually. Even in sophisticated data teams, configurations often live inside UIs instead of Git, and a pipeline failure is more likely to be diagnosed by clicking through dashboards than by inspecting code and configuration under version control.
Even when pipelines are written as Python or Airflow DAGs (Directed Acyclic Graphs), they rarely carry the infrastructure-as-code guarantees that DevOps teams enjoy:
- No native state management (what was deployed, when, and by whom)
- No rollbacks to a known-good configuration
- Limited automated testing or drift detection
- Minimal composability across environments (dev → staging → prod)
Before dbt came along, even basic version control for SQL data transformations was rare. The cultural and tooling divide between DevOps and DataOps has kept data teams from adopting the same rigor that transformed software delivery.
Why AI changes the game
AI will be the catalyst that forces—and enables—data infrastructure to become infrastructure-as-code.
AI systems can only manage, debug, or optimize complex environments when those environments are machine-readable. Configs and logs are far more interpretable to AI agents than opaque UI-based settings.
Imagine a future where you simply describe what you want in plain English:
“Ingest the Accounts table from Salesforce into this Snowflake schema every six hours and notify me if row count drops by more than 10%.”
An AI agent could:
- Generate the correct ETL pipeline
- Set up monitoring and alerts
- Write the deployment YAMLs
- Commit them to Git
- And even test the setup before deployment (see the sketch below)
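Here’s a minimal sketch of what that could look like in practice. Everything in it is hypothetical (the keys, the format, and the validation rules are illustrative, not any vendor’s actual schema); the point is that a plain-English request becomes structured data an agent can generate, check, and commit:

```python
# Hypothetical declarative spec an AI agent might emit for the request
# above. Every key and value is illustrative, not a real vendor schema.
pipeline_spec = {
    "source": {"type": "salesforce", "object": "Accounts"},
    "destination": {"type": "snowflake", "schema": "raw_salesforce"},
    "schedule": {"every": "6h"},
    "alerts": [
        {"metric": "row_count", "condition": "drops_by_pct", "threshold": 10},
    ],
}

def validate(spec: dict) -> list[str]:
    """Cheap structural checks an agent could run before committing the spec."""
    errors = []
    for key in ("source", "destination", "schedule"):
        if key not in spec:
            errors.append(f"missing required section: {key}")
    for alert in spec.get("alerts", []):
        if not 0 < alert.get("threshold", 0) <= 100:
            errors.append(f"alert threshold out of range: {alert}")
    return errors

assert validate(pipeline_spec) == [], "spec failed validation"
# A real agent would now serialize this to YAML, commit it to Git,
# and open a pull request for review.
```

From there, serializing the spec to YAML and committing it to Git gives you the audit trail and rollback story that UI-driven setups lack.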
That’s only possible when the data infrastructure is defined as code: declarative, versioned, observable, and automatable.
AI for DataOps: From reactive to autonomous
Once this foundation is in place, AI can move from reactive automation (“set this up”) to autonomous infrastructure management.
For example:
- “Why is my lead count low today?” → The AI inspects event logs, discovers that the Lead Created event from the website stopped firing, and suggests (or deploys) a fix.
- “Our daily sync from Shopify looks off.” → The AI detects schema drift, adjusts the transformation, and rolls forward safely. (A toy version of that drift check is sketched below.)
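As a toy illustration of that second scenario, assume the last known-good schema can be read from version control and the live schema from the source’s metadata, both as plain name-to-type mappings (how you’d fetch them is tool-specific and not shown):

```python
# Toy schema-drift detector: compare the last known-good schema (from
# version control) with the schema the source currently reports.
def detect_drift(known_good: dict[str, str], live: dict[str, str]) -> dict:
    return {
        "added": {c: t for c, t in live.items() if c not in known_good},
        "removed": {c: t for c, t in known_good.items() if c not in live},
        "retyped": {
            c: (known_good[c], live[c])
            for c in known_good.keys() & live.keys()
            if known_good[c] != live[c]
        },
    }

# Example: between syncs, Shopify added a column and widened a type.
known_good = {"order_id": "string", "total": "float"}
live = {"order_id": "string", "total": "decimal", "currency": "string"}
print(detect_drift(known_good, live))
# {'added': {'currency': 'string'}, 'removed': {}, 'retyped': {'total': ('float', 'decimal')}}
```

The diff itself is trivial; what matters is that, because the pipeline is defined as code, the agent can patch the transformation, run tests, and roll forward or back entirely through Git.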
AI agents will effectively become DataOps co-pilots, continuously observing your pipelines, suggesting fixes, and healing them automatically.
The precondition: Config-driven systems
However, this future hinges on a major prerequisite: config-driven data infrastructure.
Most of today’s commercial data tools are designed for humans, not for automation. Their primary interfaces are web dashboards, which are convenient for analysts, but opaque to code. While dbt and Terraform brought config-based approaches to transformation and cloud infra respectively, many critical data-layer tools (e.g., ingestion platforms, catalogs, or quality monitors) still lack robust IaC support.
That needs to change.
We’ll likely see a major industry push toward declarative data infrastructure, where every component—from pipelines to observability to metadata catalogs—can be expressed as structured configuration (YAML, JSON, or SQL-based DSLs).
This will:
- Make data stacks reproducible and auditable
- Allow version control and rollback
- Enable policy enforcement and CI/CD (sketched below)
- And unlock AI-native management
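Of these, policy enforcement is perhaps the easiest to make concrete. A minimal sketch, assuming (hypothetically) that pipeline configs live in Git as YAML files with top-level owner and alerts keys:

```python
# Minimal CI-style policy check over declarative pipeline configs.
# Assumes (hypothetically) that configs are YAML files under pipelines/
# with top-level "owner" and "alerts" keys; adapt to your real schema.
import pathlib
import sys

import yaml  # pip install pyyaml

REQUIRED_KEYS = ("owner", "alerts")

def violations(path: pathlib.Path) -> list[str]:
    spec = yaml.safe_load(path.read_text()) or {}
    return [f"{path}: missing '{k}'" for k in REQUIRED_KEYS if k not in spec]

problems = [
    v for p in sorted(pathlib.Path("pipelines").glob("*.yaml"))
    for v in violations(p)
]
if problems:
    print("\n".join(problems))
    sys.exit(1)  # fail the CI job so non-compliant configs cannot merge
```

Run as a CI step, a check like this blocks non-compliant configs at the pull request, which is exactly the kind of guardrail an AI agent needs before it can be trusted to commit changes on its own.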
From Infrastructure as Code → Infrastructure by AI
Once data infrastructure becomes fully declarative, the next leap is inevitable: Infrastructure by AI.
Declarative systems give AI a programmable substrate to operate on, so agents can:
- Understand dependencies across data components
- Generate valid configurations that respect schema and governance rules
- Observe telemetry and modify configs safely, as in the skeleton below
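A skeleton of that loop might look like the following. Every function below is an illustrative stub rather than a real agent or vendor API; the shape is what matters: observe, propose, validate, apply.

```python
# Skeleton of an observe -> propose -> validate -> apply loop. All of the
# logic below is an illustrative stub, not a real agent or vendor API.
def propose_change(config: dict, telemetry: dict) -> dict | None:
    # Stand-in for an LLM or rules engine producing a candidate config.
    if telemetry.get("row_count_drop_pct", 0) > 10:
        return {**config, "paused": True}  # e.g. pause a suspect sync
    return None

def passes_policy(config: dict) -> bool:
    # Stand-in for schema validation and governance checks.
    return "owner" in config

def reconcile(config: dict, telemetry: dict) -> dict:
    proposal = propose_change(config, telemetry)
    if proposal is None or not passes_policy(proposal):
        return config  # nothing to do, or the change needs human review
    # A real system would commit the proposal to Git and deploy it with
    # health checks and automatic rollback.
    return proposal

cfg = {"owner": "data-team", "paused": False}
print(reconcile(cfg, {"row_count_drop_pct": 40}))
# {'owner': 'data-team', 'paused': True}
```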
In short, AI will move from “chatting about your data problems” to solving them autonomously.
Closing thoughts
The last decade saw the DevOps revolution powered by infrastructure as code. The next will see a DataOps revolution powered by AI, and the bridge between the two will be data infrastructure as code.
The winners will be the teams—and the tools—that embrace this shift early, designing their systems to be declarative, observable, and AI-manageable from day one.