One pipeline, every engine: Stream event data to Snowflake Iceberg tables

Drew Dodds
Product Manager
6 min read
March 19, 2026

Your analytics team queries event data in Snowflake. Your data science team needs it in S3 for Spark feature engineering. Your cost-conscious BI team wants to run ad-hoc queries in Athena.
So you build an export job. Then another. Then a sync process to keep them consistent. Before long, three teams are querying three copies of the same data, each slightly different, each requiring its own pipeline to maintain.
This is the multi-engine access problem. Your event data arrives in one place and gets locked there. Every additional engine that needs it means another pipeline, another copy, another source of drift.
RudderStack’s Snowflake Streaming destination now supports Snowflake-managed Apache Iceberg tables. Your event data streams into Snowflake in near real-time, just like before. The difference is where the data lives: open Parquet files on your own cloud storage (S3, GCS, or Azure), accessible by any engine that reads Iceberg.
One pipeline. One copy of data. Every engine.
Why streaming into Iceberg changes the multi-engine problem
The lakehouse movement is driven by a straightforward idea: data should live in open formats on storage you control, and any engine should be able to read it.
In practice, getting there has been messy. Most paths to Iceberg involve batch Spark jobs, external catalog management, and infrastructure that someone has to run and maintain. If you already stream event data into Snowflake in real-time, the last thing you want is to add a batch pipeline alongside it just to get data into an open format.
Snowflake-managed Iceberg tables change the equation. Snowflake handles the Iceberg metadata catalog, automatic compaction, and governance. The data lands as Parquet files on your own cloud storage. You keep the same real-time pipeline and query performance you rely on today. But the underlying data is now in a format every engine understands.
That is why we built Iceberg support directly into our existing Snowpipe Streaming integration. No new pipeline. No Spark clusters. No EMR. No external catalog setup. You enable a toggle, and your event data becomes portable.
For teams already evaluating multi-engine strategies, or those fielding requests from data science teams who need S3 access, this removes the architectural trade-off between real-time delivery and data openness.
How do you enable Iceberg on a Snowflake Streaming destination?
Enabling Iceberg on your Snowflake Streaming destination is a single configuration change: set enableIceberg = true and specify your external volume.
That’s it.
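As a sketch, the destination settings look something like the following. The exact field names (such as externalVolumeName) are illustrative here; check the RudderStack destination configuration docs for the precise keys:

```json
{
  "enableIceberg": true,
  "externalVolumeName": "rudder_iceberg_vol"
}
```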
RudderStack automatically creates Iceberg tables with the correct catalog, external volume, and base location. Events continue streaming in near real-time with approximately 30-second latency. Track, identify, page, screen, group, and alias events all flow into Iceberg tables.
What this replaces: manually building a separate export pipeline or batch ETL job to get event data into an open format.
We built this on Snowpipe Streaming because our customers already rely on it for real-time delivery. The question was: can we give them the same speed and the same Snowflake experience, but with data landing in an open format on their own storage? Snowflake-managed Iceberg tables made that possible without adding infrastructure. One toggle, and the data is portable. — Drew Dodds, Product Manager at RudderStack
One copy of data, readable by every engine
With Iceberg tables, your event data is stored as Parquet files on your own S3, GCS, or Azure storage. Snowflake reads it through its catalog. Spark, Trino, Databricks, ClickHouse, and Athena read the same Parquet files directly.
One write. Multiple readers. No sync jobs, no consistency issues, no duplicate storage costs.
For data science teams that need direct file access for training data, the data is already in S3 in Parquet. Point your ML pipeline at it. No extraction step needed.
What this replaces: maintaining separate copies of event data for each query engine, with sync jobs to keep them consistent.
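For illustration, once the table is registered with another engine's catalog (setup varies by engine), the same data is queryable there without any copy. The catalog, schema, and table names below are placeholders:

```sql
-- Trino or Athena, reading the same Parquet files through Iceberg metadata
SELECT event, COUNT(*) AS event_count
FROM iceberg.rudder_events.tracks
GROUP BY event
ORDER BY event_count DESC;
```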
How does RudderStack handle schema changes in Iceberg tables?
Event schemas change. New properties show up as your product evolves. With Iceberg table support, RudderStack detects new event properties and adds columns automatically. Your Iceberg tables stay in sync with your event schemas as they evolve, without manual DDL.
What this replaces: manually running ALTER TABLE statements or rebuilding tables when event schemas change.
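RudderStack's internal implementation isn't shown here, but the general pattern behind automatic schema evolution is easy to sketch: diff an incoming event's properties against the table's known columns and emit ADD COLUMN statements for anything new. The table name TRACKS and the type mapping below are illustrative assumptions, not RudderStack's actual behavior:

```python
def sql_type(value):
    """Map a JSON event value to an illustrative warehouse column type."""
    if isinstance(value, bool):  # check bool before int (bool subclasses int)
        return "BOOLEAN"
    if isinstance(value, int):
        return "NUMBER"
    if isinstance(value, float):
        return "DOUBLE"
    return "VARCHAR"

def schema_additions(existing_columns, event_properties):
    """Return ALTER TABLE statements for properties not yet present as columns."""
    known = {c.lower() for c in existing_columns}
    return [
        f"ALTER TABLE TRACKS ADD COLUMN {name.upper()} {sql_type(value)}"
        for name, value in event_properties.items()
        if name.lower() not in known
    ]
```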
No compaction scripts, no Spark clusters
Snowflake manages compaction of the underlying Parquet files, keeping query performance optimized without manual maintenance. Table creation happens automatically when RudderStack sees a new event type.
No EMR clusters. No Spark jobs. No compaction scripts.
What this replaces: self-managed Iceberg pipelines that require manual compaction scheduling and infrastructure to run it.
Built-in portability, without paying for it
The cost of streaming into Iceberg tables is nearly identical to standard Snowflake tables. You are not paying a premium for open format storage.
Think of it as built-in optionality. Today, you query with Snowflake. Tomorrow, if your team adopts Databricks, or if your ML engineers need Spark access, or if you want to run Athena for cheap ad-hoc queries, the data is already there. You don’t need to plan a migration. You don’t need to build a new pipeline. The data is already portable.
What are the current limitations of Snowflake Iceberg support?
We want to be upfront about where this feature has boundaries today:
- JSON fields stored as VARCHAR. Snowflake does not yet support the VARIANT data type for Iceberg tables. JSON data lands as VARCHAR strings. You can parse it with PARSE_JSON(), but native VARIANT operators are not available until Snowflake adds V3 support.
- No consolidated USERS table. Streaming is append-only and does not support MERGE operations. If you rely on a deduplicated user profile table, you will need to build that downstream or use the batch pipeline alongside streaming.
- Approximately 30-second minimum latency. This is a Snowflake-imposed constraint for Iceberg table writes, not a RudderStack limitation.
- Key-pair authentication required. Password-based authentication is not supported for Iceberg table destinations.
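The first two limitations have straightforward workarounds in Snowflake SQL. The table and column names below are hypothetical examples of an event table and a downstream deduplicated view:

```sql
-- Parse JSON that landed as VARCHAR (VARIANT not yet available for Iceberg)
SELECT PARSE_JSON(properties):plan::STRING AS plan
FROM tracks;

-- Build a deduplicated user table downstream of the append-only stream
CREATE OR REPLACE TABLE users_latest AS
SELECT *
FROM identifies
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY user_id ORDER BY received_at DESC
) = 1;
```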
How to get started in under five minutes
Setup is four steps:
- Create an external volume in Snowflake pointing to your cloud storage (S3, GCS, or Azure). This is a one-time setup that your Snowflake admin handles.
- Set up RSA key-pair authentication for your Snowflake destination in RudderStack. Iceberg requires key-pair auth.
- Enable Iceberg on your Snowflake Streaming destination. Set enableIceberg = true in the destination configuration and specify your external volume name.
- Start sending events. RudderStack handles table creation, schema evolution, and compaction automatically. Your data lands in Snowflake and on your storage simultaneously.
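Step 1 is standard Snowflake DDL. As a sketch for S3 (bucket, role ARN, and object names below are placeholders for your own values):

```sql
-- One-time setup: create an external volume pointing at your cloud storage
CREATE EXTERNAL VOLUME rudder_iceberg_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'events-s3'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://my-events-bucket/iceberg/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::111122223333:role/snowflake-iceberg-access'
    )
  );

-- For step 2, attach your RSA public key to the user RudderStack connects as
ALTER USER rudderstack_loader SET RSA_PUBLIC_KEY = 'MIIBIjANBgkq...';
```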
See the Snowflake Streaming to Iceberg tables documentation for detailed setup instructions.
Iceberg table support is available now in public beta, included with the Snowflake Streaming add-on at no additional cost. A free trial is available through June 8th.