AI will push data infrastructure toward Infrastructure as Code

For anyone in DevOps or site reliability engineering (SRE), infrastructure as code (IaC) is second nature. It’s hard to imagine setting up servers, load balancers, or networks today without Terraform, Pulumi, or CloudFormation. Declarative configuration, version control, and reproducible environments have become the foundation of modern software delivery.
But in the world of data, we’re still far behind.
The state of data infrastructure today
While the data ecosystem has exploded with tools for ingestion (e.g., Fivetran, RudderStack), transformation (dbt), cataloging (Alation, DataHub), and observability, much of it is still managed manually. Even in sophisticated data teams, configurations often live inside UIs instead of Git, and a pipeline failure is more likely to be diagnosed by clicking through dashboards than by inspecting code and configuration under version control.
Even when pipelines are written as Python or Airflow DAGs (Directed Acyclic Graphs), they rarely carry the infrastructure-as-code guarantees that DevOps teams enjoy:
- No native state management (what was deployed, when, and by whom)
- No rollbacks to a known-good configuration
- Limited automated testing or drift detection
- Minimal composability across environments (dev → staging → prod)
Before dbt came along, even basic version control for SQL data transformations was rare. The cultural and tooling divide between DevOps and DataOps has kept data teams from adopting the same rigor that transformed software delivery.
Why AI changes the game
AI will be the catalyst that forces—and enables—data infrastructure to become infrastructure-as-code.
AI systems can only manage, debug, or optimize complex environments when those environments are machine-readable. Configs and logs are far more interpretable to AI agents than opaque UI-based settings.
Imagine a future where you simply describe what you want in plain English:
“Ingest the Accounts table from Salesforce into this Snowflake schema every six hours and notify me if row count drops by more than 10%.”
An AI agent could:
- Generate the correct ETL pipeline
- Set up monitoring and alerts
- Write the deployment YAMLs
- Commit them to Git
- And even test the setup before deployment (see the sketch below)
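Here’s a minimal sketch of what that could look like in practice. Everything in it is hypothetical (the keys, the format, and the validation rules are illustrative, not any vendor’s actual schema); the point is that a plain-English request becomes structured data an agent can generate, check, and commit:

```python
# Hypothetical declarative spec an AI agent might emit for the request
# above. Every key and value is illustrative, not a real vendor schema.
pipeline_spec = {
    "source": {"type": "salesforce", "object": "Accounts"},
    "destination": {"type": "snowflake", "schema": "raw_salesforce"},
    "schedule": {"every": "6h"},
    "alerts": [
        {"metric": "row_count", "condition": "drops_by_pct", "threshold": 10},
    ],
}

def validate(spec: dict) -> list[str]:
    """Cheap structural checks an agent could run before committing the spec."""
    errors = []
    for key in ("source", "destination", "schedule"):
        if key not in spec:
            errors.append(f"missing required section: {key}")
    for alert in spec.get("alerts", []):
        if not 0 < alert.get("threshold", 0) <= 100:
            errors.append(f"alert threshold out of range: {alert}")
    return errors

assert validate(pipeline_spec) == [], "spec failed validation"
# A real agent would now serialize this to YAML, commit it to Git,
# and open a pull request for review.
```

From there, serializing the spec to YAML and committing it to Git gives you the audit trail and rollback story that UI-driven setups lack.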
That’s only possible when the data infrastructure is defined as code: declarative, versioned, observable, and automatable.
AI for DataOps: From reactive to autonomous
Once this foundation is in place, AI can move from reactive automation (“set this up”) to autonomous infrastructure management.
For example:
- “Why is my lead count low today?” → The AI inspects event logs, discovers that the Lead Created event from the website stopped firing, and suggests (or deploys) a fix.
- “Our daily sync from Shopify looks off.” → The AI detects schema drift, adjusts the transformation, and rolls forward safely. (A toy version of that drift check is sketched below.)
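As a toy illustration of that second scenario, assume the last known-good schema can be read from version control and the live schema from the source’s metadata, both as plain name-to-type mappings (how you’d fetch them is tool-specific and not shown):

```python
# Toy schema-drift detector: compare the last known-good schema (from
# version control) with the schema the source currently reports.
def detect_drift(known_good: dict[str, str], live: dict[str, str]) -> dict:
    return {
        "added": {c: t for c, t in live.items() if c not in known_good},
        "removed": {c: t for c, t in known_good.items() if c not in live},
        "retyped": {
            c: (known_good[c], live[c])
            for c in known_good.keys() & live.keys()
            if known_good[c] != live[c]
        },
    }

# Example: between syncs, Shopify added a column and widened a type.
known_good = {"order_id": "string", "total": "float"}
live = {"order_id": "string", "total": "decimal", "currency": "string"}
print(detect_drift(known_good, live))
# {'added': {'currency': 'string'}, 'removed': {}, 'retyped': {'total': ('float', 'decimal')}}
```

The diff itself is trivial; what matters is that, because the pipeline is defined as code, the agent can patch the transformation, run tests, and roll forward or back entirely through Git.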
AI agents will effectively become DataOps co-pilots, continuously observing your pipelines, suggesting fixes, and healing them automatically.
The precondition: Config-driven systems
However, this future hinges on a major prerequisite: config-driven data infrastructure.
Most of today’s commercial data tools are designed for humans, not for automation. Their primary interfaces are web dashboards, which are convenient for analysts, but opaque to code. While dbt and Terraform brought config-based approaches to transformation and cloud infra respectively, many critical data-layer tools (e.g., ingestion platforms, catalogs, or quality monitors) still lack robust IaC support.
That needs to change.
We’ll likely see a major industry push toward declarative data infrastructure, where every component—from pipelines to observability to metadata catalogs—can be expressed as structured configuration (YAML, JSON, or SQL-based DSLs).
This will:
- Make data stacks reproducible and auditable
- Allow version control and rollback
- Enable policy enforcement and CI/CD (sketched below)
- And unlock AI-native management
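Of these, policy enforcement is perhaps the easiest to make concrete. A minimal sketch, assuming (hypothetically) that pipeline configs live in Git as YAML files with top-level owner and alerts keys:

```python
# Minimal CI-style policy check over declarative pipeline configs.
# Assumes (hypothetically) that configs are YAML files under pipelines/
# with top-level "owner" and "alerts" keys; adapt to your real schema.
import pathlib
import sys

import yaml  # pip install pyyaml

REQUIRED_KEYS = ("owner", "alerts")

def violations(path: pathlib.Path) -> list[str]:
    spec = yaml.safe_load(path.read_text()) or {}
    return [f"{path}: missing '{k}'" for k in REQUIRED_KEYS if k not in spec]

problems = [
    v for p in sorted(pathlib.Path("pipelines").glob("*.yaml"))
    for v in violations(p)
]
if problems:
    print("\n".join(problems))
    sys.exit(1)  # fail the CI job so non-compliant configs cannot merge
```

Run as a CI step, a check like this blocks non-compliant configs at the pull request, which is exactly the kind of guardrail an AI agent needs before it can be trusted to commit changes on its own.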
From Infrastructure as Code → Infrastructure by AI
Once data infrastructure becomes fully declarative, the next leap is inevitable: Infrastructure by AI.
Declarative systems give AI a programmable substrate to operate on, so agents can:
- Understand dependencies across data components
- Generate valid configurations that respect schema and governance rules
- Observe telemetry and modify configs safely, as in the skeleton below
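A skeleton of that loop might look like the following. Every function below is an illustrative stub rather than a real agent or vendor API; the shape is what matters: observe, propose, validate, apply.

```python
# Skeleton of an observe -> propose -> validate -> apply loop. All of the
# logic below is an illustrative stub, not a real agent or vendor API.
def propose_change(config: dict, telemetry: dict) -> dict | None:
    # Stand-in for an LLM or rules engine producing a candidate config.
    if telemetry.get("row_count_drop_pct", 0) > 10:
        return {**config, "paused": True}  # e.g. pause a suspect sync
    return None

def passes_policy(config: dict) -> bool:
    # Stand-in for schema validation and governance checks.
    return "owner" in config

def reconcile(config: dict, telemetry: dict) -> dict:
    proposal = propose_change(config, telemetry)
    if proposal is None or not passes_policy(proposal):
        return config  # nothing to do, or the change needs human review
    # A real system would commit the proposal to Git and deploy it with
    # health checks and automatic rollback.
    return proposal

cfg = {"owner": "data-team", "paused": False}
print(reconcile(cfg, {"row_count_drop_pct": 40}))
# {'owner': 'data-team', 'paused': True}
```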
In short, AI will move from “chatting about your data problems” to solving them autonomously.
Closing thoughts
The last decade saw the DevOps revolution powered by infrastructure as code. The next will see a DataOps revolution powered by AI, and the bridge between the two will be data infrastructure as code.
The winners will be the teams—and the tools—that embrace this shift early, designing their systems to be declarative, observable, and AI-manageable from day one.