How to Migrate a Project to Incremental
Beta
Step-by-step guide to migrate an entire Profiles project from non-incremental to incremental mode.
This tutorial walks you through migrating an entire Profiles project from non-incremental to incremental mode. It covers the systematic workflow: auditing your project, configuring inputs, converting features, eliminating input vars, and validating results.
For converting individual features, see How to Make Features Incremental.
Overview
An incremental Profiles project processes only new and changed data on each run, rather than reprocessing all historical data. This reduces compute costs, improves latency, and enables faster data freshness.
Migration follows a specific order of operations:
- Audit your project
- Verify and mark append-only inputs
- Handle ephemeral SQL models
- Convert entity vars (add merge clauses)
- Eliminate input vars
- Validate with --rebase_incremental
Prerequisites
Before starting the migration, make sure that:
1: Audit your project
Before making any changes, inventory your project to understand what needs to migrate.
Identify all components
Review your project files and list:
| Resource | File name | Description |
|---|---|---|
| Inputs | inputs.yaml | All input sources and their table types |
| SQL models | sql_models.yaml | All SQL template models, noting which are ephemeral |
| Entity vars | profiles.yaml | All entity vars, their aggregation functions, and from sources |
| Input vars | profiles.yaml | All input vars (must be eliminated or converted) |
For each input, determine whether it is append-only or mutable:
| Input type | Append-only | Examples |
|---|---|---|
| Event streams (facts) | Yes | Page views, purchases, clicks, logins |
| Slowly-changing dimensions | No | User profiles, account settings, subscription status |
| Snapshot tables | No | Daily balance snapshots, inventory levels |
Input vars scan all input data on every run and cannot be made incremental. If your project has input vars, you must convert them (see Step 5).
2: Verify and mark append-only inputs
For a model to participate in incremental processing, its upstream inputs must be append-only. This step ensures you correctly identify and mark them.
Verify append-only behavior
Use these three checks to verify an input is truly append-only:
1. Business semantics check
Ask: Is the underlying source a fact table (events) or a slowly-changing dimension?
- Fact tables (events, transactions, clicks) are typically append-only — rows represent immutable events
- Dimension tables (user profiles, account settings) are not append-only — rows are updated in place
2. Warehouse verification query
Run this query against your warehouse to check for duplicate or updated rows:
```sql
SELECT entity_id_column, COUNT(*)
FROM your_input_table
GROUP BY entity_id_column
HAVING COUNT(*) > 1;
```
If this returns rows, inspect whether duplicates represent distinct events (append-only) or updates to the same record (not append-only).
3. Downstream usage inspection
Check how the input is used in SQL models and entity vars. If a model selects the “latest” attribute value (such as current subscription status or country), that suggests the input receives updates and is NOT append-only.

When in doubt about whether an input is append-only, treat it as not append-only.
Incorrectly marking a mutable input as append-only leads to incorrect incremental results that are difficult to diagnose.
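One way to sanity-check the property offline is to compare two snapshots of the table taken at different times: in a truly append-only source, every row from the earlier snapshot is still present, unchanged, in the later one. The following Python sketch is purely illustrative (it is not part of the `pb` toolchain) and models rows as tuples:

```python
def is_append_only(earlier_rows, later_rows):
    """Return True if later_rows is earlier_rows plus zero or more new rows.

    Any row that was removed or modified between snapshots breaks the
    append-only property, because the row no longer appears verbatim.
    """
    return set(earlier_rows).issubset(set(later_rows))

# An event stream only gains rows between snapshots:
monday = [("u1", "page_view"), ("u2", "purchase")]
tuesday = monday + [("u1", "purchase")]
print(is_append_only(monday, tuesday))   # True

# A dimension table updated in place fails the check:
profile_monday = [("u1", "free_plan")]
profile_tuesday = [("u1", "paid_plan")]  # row was updated, not appended
print(is_append_only(profile_monday, profile_tuesday))  # False
```

In a real warehouse you would run the equivalent check in SQL against a retained snapshot, but the logic is the same: updates and deletes disqualify an input from the append-only contract.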
For each verified append-only input, add the contract:
```yaml
inputs:
  - name: events
    table: PROD.ANALYTICS.EVENTS
    contract:
      is_append_only: true
      is_event_stream: true
      occurred_at_col: event_timestamp
```
Some key fields in the contract are listed below:
| Field | Description |
|---|---|
| is_append_only | Enables delta model generation for incremental processing |
| is_event_stream | Indicates this is an event stream |
| occurred_at_col | The timestamp column used for time-range filtering. Required for incremental inputs. |
3: Handle ephemeral SQL models
Ephemeral SQL models defined on top of append-only inputs can also participate in incremental processing.
Identify eligible ephemeral models
Look for SQL models with output_type: ephemeral that:
- Read from append-only inputs
- Only filter, rename, or transform columns (no aggregations or joins with mutable sources)
- Produce output that is itself append-only
Mark ephemeral models as append-only
```yaml
models:
  - name: purchase_events
    model_type: sql_template
    model_spec:
      occurred_at_col: timestamp
      materialization:
        output_type: ephemeral
      single_sql: |
        {% with events = this.DeRef("inputs/events") %}
        SELECT * FROM {{ events }}
        WHERE event_name = 'purchase'
        {% endwith %}
    contract:
      is_append_only: true
      is_event_stream: true
```

Only mark an ephemeral model as is_append_only: true if all of its upstream inputs are also append-only and the model’s SQL only filters or transforms without introducing mutations (no joins with mutable dimension tables, no aggregations that change with updates).
4: Convert entity vars
Add merge clauses to entity vars to enable incremental computation. The merge clause defines how to combine previous feature values with new calculations.
Choose the right pattern
Use the Incremental Features decision guide to determine which pattern applies:
Quick reference: Common migrations
Simple aggregations: Add a matching merge:
```yaml
# Before
- entity_var:
    name: first_purchase_time
    select: min(timestamp)
    from: inputs/orders
```

```yaml
# After
- entity_var:
    name: first_purchase_time
    select: MIN({{orders.timestamp}})
    merge: MIN({{rowset.first_purchase_time}})
    from: inputs/orders
```
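The correctness requirement behind a merge clause can be demonstrated outside the warehouse: applying the select aggregation batch by batch and folding the partial results through the merge function must give the same answer as aggregating all rows at once. A minimal Python sketch for the MIN case, with made-up timestamp values:

```python
def select_min(batch):
    """Per-run aggregation: MIN over the new rows (the select clause)."""
    return min(batch)

def merge_min(previous, current):
    """Combine the stored feature value with the new partial result (the merge clause)."""
    return min(previous, current)

# Rows arriving across three incremental runs (illustrative values):
batches = [[50, 40, 70], [30, 90], [60]]

# Incremental path: fold each batch into the running feature value.
feature = select_min(batches[0])
for batch in batches[1:]:
    feature = merge_min(feature, select_min(batch))

# Full-refresh path: aggregate all rows at once.
all_rows = [row for batch in batches for row in batch]
print(feature == min(all_rows))  # True: MIN merged with MIN is exact
```

MIN, MAX, SUM, and COUNT (merged with SUM) all satisfy this property; AVG does not, which is why it must be decomposed into SUM and COUNT components.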
Window functions: Replace with MAX_BY/MIN_BY:
```yaml
# Before (non-incremental window function)
- entity_var:
    name: last_login_platform
    select: last_value(platform)
    from: inputs/logins
    window:
      order_by:
        - timestamp
```

```yaml
# After (incremental with MAX_BY + helper)
- entity_var:
    name: last_login_platform
    select: MAX_BY({{logins.platform}}, {{logins.timestamp}})
    merge: MAX_BY({{rowset.last_login_platform}}, {{rowset.last_login_platform_by_param}})
    from: inputs/logins
- entity_var:
    name: last_login_platform_by_param
    select: MAX({{logins.timestamp}})
    merge: MAX({{rowset.last_login_platform_by_param}})
    from: inputs/logins
    is_feature: false
```
See Compound Aggregations for the full MIN_BY/MAX_BY pattern and _by_param helper documentation.
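To see why the `_by_param` helper is needed, consider what state each run must carry forward: not just the winning value, but the ordering key it won with, so a later run can decide whether new rows beat the stored state. An illustrative Python sketch (not tool code) of the pattern:

```python
def max_by(pairs):
    """MAX_BY(value, ts) over one run's rows: the value attached to the
    largest timestamp, returned together with that timestamp."""
    ts, value = max(pairs)   # pairs are (timestamp, value); compare on timestamp
    return value, ts         # the feature plus its _by_param helper

def merge_max_by(state, new_state):
    """Keep whichever (value, timestamp) pair has the larger timestamp."""
    return state if state[1] >= new_state[1] else new_state

run1 = [(100, "ios"), (200, "web")]   # (timestamp, platform) rows in run 1
run2 = [(150, "android")]             # an older event arriving in a later run

state = max_by(run1)                          # ("web", 200)
state = merge_max_by(state, max_by(run2))     # run 2's max timestamp is 150
print(state[0])  # "web": the late-arriving older event does not overwrite it
```

Without the stored timestamp there would be nothing to compare the new run's candidate against, which is exactly the information the `_by_param` entity var preserves.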
When merge is not needed
Not every entity var requires a merge clause:
Derived-only vars that reference other entity vars (no from clause) are recomputed from their incrementally maintained dependencies:
```yaml
- entity_var:
    name: days_since_last_seen
    select: "DATEDIFF(day, {{user.last_seen}}, {{ end_time_sql }})"
    description: Days since the user was last seen
```
Vars from window-bounded incremental tables that read from an incremental SQL model with a sliding window. The model already maintains state incrementally, and the var recomputes from the bounded window each run:
```yaml
- entity_var:
    name: active_days_in_past_7_days
    select: count(distinct active_day)
    from: models/recent_active_days
    where: active_day >= DATEADD(day, -7, {{ end_time_sql }})
```
5: Eliminate input vars
Input vars scan all data on every run and block incremental processing. You must convert or replace every input var before the project can run incrementally.
Decision tree
For each input var, follow this decision tree:
```text
Input var
+-- Can this logic be expressed as an entity var?
    +-- Yes: Convert to entity_var with merge clause
    |        (common for window functions like LAST_VALUE, FIRST_VALUE)
    +-- No
        +-- Can the input var be replaced by a SQL model?
            +-- Yes: Create a SQL model (possibly incremental) and
            |        reference it from entity vars
            +-- No
                +-- Create an incremental SQL model with the full
                    DeRef pattern
```
Converting an input var to an entity var is the most common path. Window functions like LAST_VALUE and FIRST_VALUE can typically be replaced with MAX_BY and MIN_BY:
```yaml
# Before (input var with window function)
- input_var:
    name: last_known_city
    select: last_value(city)
    from: inputs/events
    window:
      partition_by:
        - user_id
      order_by:
        - timestamp
    where: city IS NOT NULL
```

```yaml
# After (entity var with MAX_BY)
- entity_var:
    name: last_known_city
    select: MAX_BY({{events.city}}, {{events.timestamp}})
    merge: MAX_BY({{rowset.last_known_city}}, {{rowset.last_known_city_by_param}})
    from: inputs/events
    where: "{{events.city}} IS NOT NULL"
- entity_var:
    name: last_known_city_by_param
    select: MAX({{events.timestamp}})
    merge: MAX({{rowset.last_known_city_by_param}})
    from: inputs/events
    where: "{{events.city}} IS NOT NULL"
    is_feature: false
```
When input var logic is too complex for an entity var (multi-step transformations, complex joins), create an incremental SQL model:
```yaml
# In sql_models.yaml
models:
  - name: recent_active_days
    model_type: sql_template
    model_spec:
      occurred_at_col: timestamp
      materialization:
        output_type: table
      single_sql: |
        {%- set lastThis = this.DeRef(pre_existing=true, dependency="optional", checkpoint_name="baseline") -%}
        {% set events_delta = this.DeRef("inputs/events/incr_delta_tbl", dependency="coercive", baseline_name="baseline", prereqs=[lastThis]) %}
        {% set events_full = this.DeRef("inputs/events", prereqs=[events_delta.Except()]) %}
        SELECT user_id, active_day
        FROM (
          {%- if lastThis.IsEnabled() %}
          SELECT user_id, active_day FROM {{ lastThis }}
          UNION ALL
          SELECT DISTINCT user_id, CAST(timestamp AS DATE) AS active_day
          FROM {{ events_delta }}
          {%- else %}
          SELECT DISTINCT user_id, CAST(timestamp AS DATE) AS active_day
          FROM {{ events_full }}
          {%- endif %}
        )
        WHERE active_day >= DATEADD('day', -90, {{ end_time_sql }})
      ids:
        - select: user_id
          type: user_id
          entity: user
    contract:
      is_append_only: false
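Conceptually, each run of this model unions the previous materialization with the new delta rows, then trims everything older than the 90-day window relative to `end_time`. The following Python sketch mirrors that logic with in-memory sets (illustrative only; it is not how the tool executes the SQL):

```python
from datetime import date, timedelta

WINDOW_DAYS = 90  # matches the DATEADD('day', -90, ...) filter in the model

def run_model(previous_state, delta_rows, end_time):
    """previous_state / delta_rows: sets of (user_id, active_day) tuples.

    Union the prior materialization with the new delta (UNION ALL plus
    DISTINCT, since sets deduplicate), then keep only rows inside the window.
    """
    cutoff = end_time - timedelta(days=WINDOW_DAYS)
    combined = previous_state | delta_rows
    return {row for row in combined if row[1] >= cutoff}

end_time = date(2024, 6, 30)
state = run_model(set(), {("u1", date(2024, 1, 1)), ("u1", date(2024, 6, 1))}, end_time)
# The January row falls outside the 90-day window and is trimmed:
print(("u1", date(2024, 1, 1)) in state, ("u1", date(2024, 6, 1)) in state)
```

Because old rows age out of the window on every run, the materialized state stays bounded even though the model never rescans the full event history.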
Then reference the model from an entity var:
```yaml
- entity_var:
    name: active_days_last_30
    select: COUNT(DISTINCT active_day)
    from: models/recent_active_days
    where: active_day >= DATEADD('day', -30, {{ end_time_sql }})
```
6: Validate with rebase_incremental
After completing the migration, validate that incremental results match full refresh results.
Validation workflow
1. Run a full refresh to establish a baseline. The first run after migration performs a full refresh:

   ```shell
   pb run
   ```

2. Run again to trigger incremental processing (if new data has arrived):

   ```shell
   pb run
   ```

3. Run with --rebase_incremental to force a full rebuild and compare:

   ```shell
   pb run --rebase_incremental
   ```

4. Compare results: verify that incremental and full refresh outputs match by comparing row counts, aggregated values, and spot-checking individual entities.
Checklist
- Row counts: Entity var tables should have the same number of rows
- Value consistency: Feature values should match between incremental and full refresh runs
- Performance: Incremental runs should be faster than the full refresh baseline (after the first run)
- No errors: Check logs for warnings or errors related to incremental processing
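The comparison itself can be scripted once both outputs are exported. The sketch below is a hypothetical helper (the dict-based representation and function name are not part of the `pb` CLI): it checks row counts and per-entity values between the incremental table and the rebased full-refresh table.

```python
def compare_feature_tables(incremental, rebased):
    """Compare two feature outputs, each a dict mapping entity id -> value.

    Returns a list of human-readable discrepancies; an empty list means
    the incremental run reproduced the full-refresh result exactly.
    """
    issues = []
    if len(incremental) != len(rebased):
        issues.append(f"row count mismatch: {len(incremental)} vs {len(rebased)}")
    for entity_id, expected in rebased.items():
        if incremental.get(entity_id) != expected:
            issues.append(f"value mismatch for entity {entity_id}")
    return issues

incremental = {"u1": 5, "u2": 3}
rebased = {"u1": 5, "u2": 3}
print(compare_feature_tables(incremental, rebased))  # []: outputs match
```

In practice you would load both tables from the warehouse (for example, into dicts keyed by the entity id column) before running a check like this.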

The first run after migration always performs a full refresh to establish the baseline checkpoint. Performance improvements are visible starting from the second run.
Migration checklist
This section contains a detailed migration checklist to convert your project to incremental mode.
Pre-migration
During migration
Post-migration
Common pitfalls
| Pitfall | Description | Solution |
|---|---|---|
| Missing merge clause | Entity vars without merge cannot run incrementally | Add an appropriate merge clause matching the select aggregation |
| Incorrect merge function | Using COUNT instead of SUM for count merges | For COUNT(...) in select, use SUM({{rowset.feature}}) in merge |
| Missing _by_param helper | Using MAX_BY/MIN_BY without the ordering companion var | Always define a _by_param helper var with is_feature: false |
| Mutable input marked append-only | Dimension table incorrectly marked as is_append_only: true | Verify append-only behavior before marking; dimension tables should NOT be append-only |
| Using CURRENT_DATE in filters | Non-deterministic results across re-runs | Use {{ end_time_sql }} instead |
| Input vars still present | Input vars block incremental processing | Convert to entity vars or incremental SQL models |
| Non-composable merge function | Using AVG directly in merge | Decompose into SUM/COUNT components via Compound Aggregations |
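The COUNT-versus-SUM pitfall from the table is easy to reproduce in a few lines of Python: merging per-run counts with COUNT counts the partial results themselves (one per run), while SUM correctly accumulates the underlying row counts.

```python
# Rows arriving across three incremental runs (illustrative data):
batches = [["a", "b", "c"], ["d"], ["e", "f"]]

# Per-run COUNT(*) partial results, one per batch: [3, 1, 2]
partial_counts = [len(batch) for batch in batches]

wrong = len(partial_counts)   # COUNT of the partials -> 3 (the number of runs!)
right = sum(partial_counts)   # SUM of the partials   -> 6 (the total row count)

total_rows = sum(len(b) for b in batches)
print(wrong == total_rows, right == total_rows)  # False True
```

The same reasoning explains the AVG row: an average of per-run averages weights each run equally regardless of how many rows it contained, so AVG must be carried as separate SUM and COUNT components and divided at read time.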