How to Migrate a Project to Incremental Beta

Step-by-step guide to migrating an entire Profiles project from non-incremental to incremental mode.

This tutorial covers the systematic workflow: auditing your project, configuring inputs, converting features, eliminating input vars, and validating the results.

For converting individual features, see How to Make Features Incremental.

Overview

An incremental Profiles project processes only new and changed data on each run, rather than reprocessing all historical data. This reduces compute cost, shortens run times, and keeps features fresher.

Migration follows a specific order of operations:

  1. Audit your project
  2. Verify and mark append-only inputs
  3. Handle ephemeral SQL models
  4. Convert entity vars (add merge clauses)
  5. Eliminate input vars
  6. Validate with --rebase_incremental

Prerequisites

Before starting migration, make sure that:

1: Audit your project

Before making any changes, inventory your project to understand what needs to migrate.

Identify all components

Review your project files and list:

| Resource | File name | Description |
| --- | --- | --- |
| Inputs | `inputs.yaml` | All input sources and their table types |
| SQL models | `sql_models.yaml` | All SQL template models, noting which are ephemeral |
| Entity vars | `profiles.yaml` | All entity vars, their aggregation functions, and `from` sources |
| Input vars | `profiles.yaml` | All input vars (must be eliminated or converted) |

Classify inputs

For each input, determine whether it is append-only or mutable:

| Input type | Append-only | Examples |
| --- | --- | --- |
| Event streams (facts) | Yes | Page views, purchases, clicks, logins |
| Slowly-changing dimensions | No | User profiles, account settings, subscription status |
| Snapshot tables | No | Daily balance snapshots, inventory levels |

Check for input vars

Input vars scan all input data on every run and cannot be made incremental. If your project has input vars, you must convert them (see Step 5).

2: Verify and mark append-only inputs

For a model to participate in incremental processing, its upstream inputs must be append-only. This step ensures you correctly identify and mark them.

Verify append-only behavior

Use these three checks to verify an input is truly append-only:

1. Business semantics check

Ask: Is the underlying source a fact table (events) or a slowly-changing dimension?

  • Fact tables (events, transactions, clicks) are typically append-only — rows represent immutable events
  • Dimension tables (user profiles, account settings) are not append-only — rows are updated in place

2. Warehouse verification query

Run this query against your warehouse to check for duplicate or updated rows:

SELECT entity_id_column, COUNT(*)
FROM your_input_table
GROUP BY entity_id_column
HAVING COUNT(*) > 1;

If this returns rows, inspect whether duplicates represent distinct events (append-only) or updates to the same record (not append-only).
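If the table carries audit columns, a complementary check is to look for rows modified after creation, which indicates in-place updates. This is a sketch assuming the table has `created_at` and `updated_at` columns (hypothetical names — substitute your own):

```sql
-- Rows whose updated_at differs from created_at were modified in place,
-- which means the table is NOT append-only.
-- Column names created_at / updated_at are hypothetical.
SELECT COUNT(*) AS updated_rows
FROM your_input_table
WHERE updated_at > created_at;
```

A non-zero `updated_rows` count means the source mutates existing records and should not be marked append-only.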

3. Downstream usage inspection

Check how the input is used in SQL models and entity vars. If a model selects the “latest” attribute value (such as current subscription status or country), that suggests the input receives updates and is NOT append-only.

warning

When in doubt about whether an input is append-only, treat it as not append-only.

Incorrectly marking a mutable input as append-only leads to incorrect incremental results that are difficult to diagnose.

Mark append-only inputs

For each verified append-only input, add the contract:

inputs:
  - name: events
    table: PROD.ANALYTICS.EVENTS
    contract:
      is_append_only: true
      is_event_stream: true
    occurred_at_col: event_timestamp

Some key fields in the contract are listed below:

| Field | Description |
| --- | --- |
| `is_append_only` | Enables delta model generation for incremental processing |
| `is_event_stream` | Indicates this is an event stream |
| `occurred_at_col` | The timestamp column used for time-range filtering. Required for incremental inputs. |

3: Handle ephemeral SQL models

Ephemeral SQL models defined on top of append-only inputs can also participate in incremental processing.

Identify eligible ephemeral models

Look for SQL models with output_type: ephemeral that:

  • Read from append-only inputs
  • Only filter, rename, or transform columns (no aggregations or joins with mutable sources)
  • Produce output that is itself append-only

Mark ephemeral models as append-only

models:
  - name: purchase_events
    model_type: sql_template
    model_spec:
      occurred_at_col: timestamp
      materialization:
        output_type: ephemeral
      single_sql: |
        {% with events = this.DeRef("inputs/events") %}
          SELECT * FROM {{ events }}
          WHERE event_name = 'purchase'
        {% endwith %}        
      contract:
        is_append_only: true
        is_event_stream: true
warning
Only mark an ephemeral model as is_append_only: true if all of its upstream inputs are also append-only and the model’s SQL only filters or transforms without introducing mutations (no joins with mutable dimension tables, no aggregations that change with updates).

4: Convert entity vars

Add merge clauses to entity vars to enable incremental computation. The merge clause defines how to combine previous feature values with new calculations.

Choose the right pattern

Use the Incremental Features decision guide to determine which pattern applies:

| Pattern | When to use | Reference |
| --- | --- | --- |
| Simple aggregation | SUM, MIN, MAX, COUNT | Simple Aggregations |
| Compound aggregation | AVG, WEIGHTED_AVG, MIN_BY/MAX_BY, arrays, booleans | Compound Aggregations |
| Incremental SQL model | Stateful processing, rolling windows, multi-step logic | Incremental SQL Models |

Quick reference: Common migrations

Simple aggregations: Add a matching merge:

# Before
- entity_var:
    name: first_purchase_time
    select: min(timestamp)
    from: inputs/orders

# After
- entity_var:
    name: first_purchase_time
    select: MIN({{orders.timestamp}})
    merge: MIN({{rowset.first_purchase_time}})
    from: inputs/orders
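Note that COUNT is the one simple aggregation whose merge is not the same function: counting the per-batch partial counts would be wrong, so the merge sums them instead. A minimal sketch, assuming a hypothetical orders input and feature name:

```yaml
# COUNT in select pairs with SUM in merge: the merge adds up
# per-batch partial counts rather than re-counting rows.
# total_orders and inputs/orders are hypothetical names.
- entity_var:
    name: total_orders
    select: COUNT(*)
    merge: SUM({{rowset.total_orders}})
    from: inputs/orders
```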

Window functions: Replace with MAX_BY/MIN_BY:

# Before (non-incremental window function)
- entity_var:
    name: last_login_platform
    select: last_value(platform)
    from: inputs/logins
    window:
      order_by:
        - timestamp

# After (incremental with MAX_BY + helper)
- entity_var:
    name: last_login_platform
    select: MAX_BY({{logins.platform}}, {{logins.timestamp}})
    merge: MAX_BY({{rowset.last_login_platform}}, {{rowset.last_login_platform_by_param}})
    from: inputs/logins

- entity_var:
    name: last_login_platform_by_param
    select: MAX({{logins.timestamp}})
    merge: MAX({{rowset.last_login_platform_by_param}})
    from: inputs/logins
    is_feature: false

See Compound Aggregations for the full MIN_BY/MAX_BY pattern and _by_param helper documentation.

When merge is not needed

Not every entity var requires a merge clause:

  • Derived-only vars that reference other entity vars (no from clause) are recomputed from their incrementally maintained dependencies:

    - entity_var:
        name: days_since_last_seen
        select: "DATEDIFF(day, {{user.last_seen}}, {{ end_time_sql }})"
        description: Days since the user was last seen
    
  • Vars from window-bounded incremental tables that read from an incremental SQL model with a sliding window. The model already maintains state incrementally, and the var recomputes from the bounded window each run:

    - entity_var:
        name: active_days_in_past_7_days
        select: count(distinct active_day)
        from: models/recent_active_days
        where: active_day >= DATEADD(day, -7, {{ end_time_sql }})
    

5: Eliminate input vars

Input vars scan all data on every run and block incremental processing. You must convert or replace every input var before the project can run incrementally.

Decision tree

For each input var, follow this decision tree:

Input Var
+-- Can this logic be expressed as an entity var?
|   +-- Yes: Convert to entity_var with merge clause
|   |   (common for window functions like LAST_VALUE, FIRST_VALUE)
|   +-- No
|       +-- Can the input var be replaced by a SQL model?
|       |   +-- Yes: Create a SQL model (possibly incremental) and
|       |   |   reference it from entity vars
|       |   +-- No
|       |       +-- Create an incremental SQL model with full
|       |           DeRef pattern

Example: Convert input var to entity var

This is the most common conversion. Window functions like LAST_VALUE and FIRST_VALUE can typically be replaced with MAX_BY and MIN_BY:

# Before (input var with window function)
- input_var:
    name: last_known_city
    select: last_value(city)
    from: inputs/events
    window:
      partition_by:
        - user_id
      order_by:
        - timestamp
    where: city IS NOT NULL

# After (entity var with MAX_BY)
- entity_var:
    name: last_known_city
    select: MAX_BY({{events.city}}, {{events.timestamp}})
    merge: MAX_BY({{rowset.last_known_city}}, {{rowset.last_known_city_by_param}})
    from: inputs/events
    where: {{events.city}} IS NOT NULL

- entity_var:
    name: last_known_city_by_param
    select: MAX({{events.timestamp}})
    merge: MAX({{rowset.last_known_city_by_param}})
    from: inputs/events
    where: {{events.city}} IS NOT NULL
    is_feature: false

Example: Replace input var with incremental SQL model

When input var logic is too complex for an entity var (multi-step transformations, complex joins), create an incremental SQL model:

# In sql_models.yaml
models:
  - name: recent_active_days
    model_type: sql_template
    model_spec:
      occurred_at_col: timestamp
      materialization:
        output_type: table
      single_sql: |
        {%- set lastThis = this.DeRef(pre_existing=true, dependency="optional", checkpoint_name="baseline") -%}
        {% set events_delta = this.DeRef("inputs/events/incr_delta_tbl", dependency="coercive", baseline_name="baseline", prereqs=[lastThis]) %}
        {% set events_full = this.DeRef("inputs/events", prereqs=[events_delta.Except()]) %}

        SELECT user_id, active_day
        FROM (
          {%- if lastThis.IsEnabled() %}
            SELECT user_id, active_day FROM {{ lastThis }}
            UNION ALL
            SELECT DISTINCT user_id, CAST(timestamp AS DATE) AS active_day
            FROM {{ events_delta }}
          {%- else %}
            SELECT DISTINCT user_id, CAST(timestamp AS DATE) AS active_day
            FROM {{ events_full }}
          {%- endif %}
        )
        WHERE active_day >= DATEADD('day', -90, {{ end_time_sql }})        
      ids:
        - select: user_id
          type: user_id
          entity: user
      contract:
        is_append_only: false

Then reference the model from an entity var:

- entity_var:
    name: active_days_last_30
    select: COUNT(DISTINCT active_day)
    from: models/recent_active_days
    where: active_day >= DATEADD('day', -30, {{ end_time_sql }})

6: Validate with --rebase_incremental

After completing the migration, validate that incremental results match full refresh results.

Validation workflow

  1. Run a full refresh to establish a baseline:

    pb run
    
  2. Run again to trigger incremental processing (if new data has arrived):

    pb run
    
  3. Run with --rebase_incremental to force a full rebuild and compare:

    pb run --rebase_incremental
    
  4. Compare results: Verify that incremental and full refresh outputs match by comparing row counts, aggregated values, and spot-checking individual entities.
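One way to compare the incremental output from step 2 against the rebased output from step 3 is a full outer join on the entity ID. The table and column names below are hypothetical placeholders for your actual output tables and features:

```sql
-- Any rows returned indicate a mismatch between the incremental and
-- full-refresh results for this feature.
-- incremental_output / rebased_output / first_purchase_time are placeholders.
SELECT COALESCE(i.user_id, r.user_id) AS user_id,
       i.first_purchase_time AS incremental_value,
       r.first_purchase_time AS rebased_value
FROM incremental_output i
FULL OUTER JOIN rebased_output r
  ON i.user_id = r.user_id
WHERE i.first_purchase_time IS DISTINCT FROM r.first_purchase_time;
```

`IS DISTINCT FROM` gives null-safe comparison on warehouses that support it (e.g. Snowflake, Postgres); otherwise compare with explicit `IS NULL` handling.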

Checklist

  • Row counts: Entity var tables should have the same number of rows
  • Value consistency: Feature values should match between incremental and full refresh runs
  • Performance: Incremental runs should be faster than the full refresh baseline (after the first run)
  • No errors: Check logs for warnings or errors related to incremental processing
warning
The first run after migration always performs a full refresh to establish the baseline checkpoint. Performance improvements are visible starting from the second run.

Migration checklist

This section contains a detailed migration checklist to convert your project to incremental mode.

Pre-migration

  • Inventory all inputs, SQL models, entity vars, and input vars
  • Classify each input as append-only or mutable
  • Verify append-only inputs with warehouse queries
  • Identify all input vars that need conversion

During migration

  • Mark all append-only inputs with is_append_only: true and occurred_at_col
  • Mark eligible ephemeral SQL models with is_append_only: true
  • Add merge clauses to all entity vars that read from append-only sources
  • Replace window functions with MAX_BY/MIN_BY patterns
  • Convert or replace all input vars
  • Create incremental SQL models for complex patterns
  • Use {{ end_time_sql }} instead of CURRENT_DATE or CURRENT_TIMESTAMP() in rolling-window filters
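The last item above can be illustrated with a minimal before/after sketch (the `where` filter and column name are hypothetical):

```yaml
# Before: non-deterministic, the window shifts with the wall clock
# on every run, so re-runs produce different results
where: active_day >= DATEADD(day, -7, CURRENT_DATE)

# After: anchored to the run's end time, so re-runs are reproducible
where: active_day >= DATEADD(day, -7, {{ end_time_sql }})
```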

Post-migration

  • Run pb run to establish baseline
  • Run pb run again to verify incremental processing
  • Run pb run --rebase_incremental and compare results
  • Verify performance improvement on subsequent runs
  • Remove any unused input vars or deprecated models

Common pitfalls

| Pitfall | Description | Solution |
| --- | --- | --- |
| Missing merge clause | Entity vars without `merge` cannot run incrementally | Add an appropriate `merge` clause matching the `select` aggregation |
| Incorrect merge function | Using COUNT instead of SUM for count merges | For `COUNT(...)` in `select`, use `SUM({{rowset.feature}})` in `merge` |
| Missing `_by_param` helper | Using MAX_BY/MIN_BY without the ordering companion var | Always define a `_by_param` helper var with `is_feature: false` |
| Mutable input marked append-only | Dimension table incorrectly marked as `is_append_only: true` | Verify append-only behavior before marking; dimension tables should NOT be append-only |
| Using CURRENT_DATE in filters | Non-deterministic results across re-runs | Use `{{ end_time_sql }}` instead |
| Input vars still present | Input vars block incremental processing | Convert to entity vars or incremental SQL models |
| Non-composable merge function | Using AVG directly in `merge` | Decompose into SUM/COUNT components via Compound Aggregations |
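As a sketch of the fix for a non-composable merge function, AVG can be decomposed into an incrementally maintained SUM and COUNT plus a derived-only ratio. All names below (`inputs/orders`, `amount`, the var names) are hypothetical:

```yaml
# Helper vars maintained incrementally (hypothetical names)
- entity_var:
    name: order_amount_sum
    select: SUM({{orders.amount}})
    merge: SUM({{rowset.order_amount_sum}})
    from: inputs/orders
    is_feature: false

- entity_var:
    name: order_count
    select: COUNT(*)
    merge: SUM({{rowset.order_count}})
    from: inputs/orders
    is_feature: false

# Derived-only var: recomputed from the helpers each run, no merge needed
- entity_var:
    name: avg_order_value
    select: "{{user.order_amount_sum}} / NULLIF({{user.order_count}}, 0)"
```

`NULLIF` guards against division by zero for entities with no orders in scope.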
