How to Make Features Incremental
Beta
Learn how to convert features to incremental mode in your Profiles project.
This guide walks you through converting features to incremental mode in Profiles.
Prerequisites
Before making a feature incremental, keep the following in mind:

Not all features can be incremental. Features that require scanning all historical data (such as median or percentile calculations) cannot use incremental computation. For such cases, consider using Timegrains and bundling instead.
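For instance, a median has no way to merge a previously computed value with new rows, so it must be recomputed over the full history on every run. A minimal sketch of such a non-incremental entity var (the names are illustrative, and MEDIAN assumes a warehouse that supports it):

```yaml
# Illustrative only: a median cannot be maintained incrementally, so this
# entity_var has no merge: clause and is recomputed from all historical
# rows on every run.
- entity_var:
    name: median_order_value
    select: MEDIAN({{orders.value}})
    from: inputs/orders
    description: Median order value (full scan on every run; not incremental)
```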
Step 1: Configure append-only sources
Before making features incremental, you must configure all input sources and SQL models they depend on with the is_append_only contract. This applies to both inputs and SQL models (SQL models can also be ephemeral).
Add the following to each input in your inputs.yaml:
```yaml
inputs:
  - name: orders
    table: PROD.ANALYTICS.ORDERS
    contract:
      is_append_only: true
      is_event_stream: true
    occurred_at_col: order_timestamp
```
SQL model example
You can also create append-only SQL models that filter or transform inputs. Add this to your sql_models.yaml:
```yaml
models:
  - name: purchase_events
    model_type: sql_template
    model_spec:
      occurred_at_col: timestamp
      materialization:
        output_type: table
      single_sql: |
        {% with events = this.DeRef("inputs/events") %}
        SELECT * FROM {{ events }}
        WHERE event_name = 'purchase'
        {% endwith %}
      contract:
        is_append_only: true
        is_event_stream: true
```
This SQL model filters events to only purchases and is marked as is_append_only: true, making it usable as an incremental source.
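For example, a downstream entity var can then aggregate over this model just like an append-only input. A minimal sketch, assuming the model exposes an amount column and is referenced with a models/ path (both are assumptions for illustration):

```yaml
# Sketch only: aggregate over the append-only purchase_events model defined above.
# The amount column and the models/purchase_events reference are assumptions.
- entity_var:
    name: total_purchase_amount
    select: SUM({{purchase_events.amount}})
    merge: SUM(rowset.total_purchase_amount)
    from: models/purchase_events
    description: Total purchase amount computed from the purchase_events model
```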

All inputs and SQL models used by incremental features must have contract: is_append_only: true. SQL models can also be ephemeral. Without this configuration, the project will throw an error when you try to run incremental features.
Complete example: Simple incremental feature
The following example shows a complete setup for a simple incremental feature that tracks total purchase value:
Input configuration (inputs.yaml):
```yaml
inputs:
  - name: orders
    table: PROD.ANALYTICS.ORDERS
    contract:
      is_append_only: true
      is_event_stream: true
    occurred_at_col: order_timestamp
```
Entity var (profiles.yaml):
```yaml
- entity_var:
    name: total_purchases
    select: SUM({{orders.value}})
    merge: SUM(rowset.total_purchases)
    from: inputs/orders
    description: Total purchase value across all orders
```
This example shows:
- An input configured with `contract: is_append_only: true`
- An entity var using `SUM` aggregation with `merge: SUM(rowset.total_purchases)`
- Profiles automatically handling the incremental computation
Step 2: Identify the feature pattern
Determine which incremental pattern your feature matches:
See Incremental Features for a pattern selection guide and detailed examples.
Step 3: Add the merge property
Add a merge: property to your entity_var definition in your profiles.yaml file. This property defines how to combine previous feature values with new data.
Simple aggregators
For MIN, MAX, SUM, COUNT, and similar functions:
```yaml
- entity_var:
    name: first_event_time
    select: MIN({{events.timestamp}})
    merge: MIN(rowset.first_event_time)
    from: inputs/events
    description: Earliest timestamp of any event
```
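COUNT follows the same shape, except that previously computed counts are merged by summing them rather than counting them again. A short sketch (the events input and feature name are illustrative):

```yaml
# Sketch only: partial counts from each incremental run are combined with SUM.
- entity_var:
    name: total_events
    select: COUNT(*)
    merge: SUM(rowset.total_events)
    from: inputs/events
    description: Total number of events across all runs
```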
Average using component features
For AVG, break it into sum and count components. Define three separate entity_var entries at the same level in your profiles.yaml:
```yaml
- entity_var:
    name: sum_of_scores
    select: SUM({{game_events.score}})
    merge: SUM(rowset.sum_of_scores)
    from: inputs/game_events
    is_feature: false
    description: Sum of scores (used for average calculation)
- entity_var:
    name: count_of_scores
    select: COUNT({{game_events.score}})
    merge: SUM(rowset.count_of_scores)
    from: inputs/game_events
    is_feature: false
    description: Count of scores (used for average calculation)
- entity_var:
    name: avg_score
    select: "CASE WHEN {{user.count_of_scores}} > 0 THEN {{user.sum_of_scores}} / {{user.count_of_scores}} ELSE 0 END"
    description: Average score across all games
```

Features referencing other features
When a feature references another feature (like {{user.sum_of_scores}}), it doesn’t need a merge: clause if it’s just performing a calculation on incrementally maintained features. The referenced features (sum_of_scores and count_of_scores) maintain their incremental state, and the calculation (avg_score) is computed on-the-fly from those values.
Use incremental sources
You can also make features incremental by pointing to an incremental source (such as a delta model or incremental input window). Even if the entity_var SQL runs a full aggregation, the scan volume stays small because the source is already delta-restricted:
```yaml
- entity_var:
    name: recent_purchases
    select: SUM({{recent_orders.value}})
    from: inputs/orders/incr_delta_tbl
    description: Purchases from recent data only
```
Step 4: Define merge logic
The merge: clause must match the aggregation function in select:. Common patterns:
| Select function | Merge function |
|---|---|
| `MIN(...)` | `MIN(rowset.feature_name)` |
| `MAX(...)` | `MAX(rowset.feature_name)` |
| `SUM(...)` | `SUM(rowset.feature_name)` |
| `COUNT(...)` | `SUM(rowset.feature_name)` |
| `ARRAY_AGG(...)` | `ARRAY_UNION_AGG(rowset.feature_name)` |
For complete merge syntax, see Simple Aggregations and Compound Aggregations.
Which merge function should I use?
The merge function must match your aggregation type. For COUNT, always use SUM in the merge property (summing counts, not counting counts). See the table above for the correct merge function for each aggregation type.
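For example, an array-valued feature merges the arrays collected in previous runs with newly collected ones. A sketch, assuming a warehouse (such as Snowflake) that supports ARRAY_AGG and ARRAY_UNION_AGG; the orders input and product_id column are illustrative:

```yaml
# Sketch only: arrays from previous and current runs are unioned (deduplicated).
- entity_var:
    name: products_purchased
    select: ARRAY_AGG(DISTINCT {{orders.product_id}})
    merge: ARRAY_UNION_AGG(rowset.products_purchased)
    from: inputs/orders
    description: Distinct products purchased across all orders
```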
Step 5: Use merge_where when needed
For complex merges, use merge_where: to restrict merging to only changed rows. Define separate entity_var entries at the same level in your profiles.yaml:
```yaml
- entity_var:
    name: timestamp
    select: MAX({{game_events.timestamp}})
    merge: MAX(rowset.timestamp)
    from: inputs/game_events
    where: "{{game_events.event_type}} = 'BREAK'"
    is_feature: false
    description: Timestamp of the most recent break event (used for score lookup)
- entity_var:
    name: score_at_last_break
    select: any_value({{game_events.score}})
    where: "{{game_events.timestamp}} = {{user.timestamp}}"
    merge: any_value(rowset.score_at_last_break)
    merge_where: rowset.timestamp = {{user.timestamp}}
    from: inputs/game_events
    description: Score at the most recent break event
```
The timestamp entity var is marked with is_feature: false since it’s an intermediate calculation used by score_at_last_break.

Tip:
Mark helper entity vars with is_feature: false when they’re intermediate calculations used by other features. This prevents them from appearing as features in your output while still allowing them to be referenced by other entity vars.
Step 6: Test your implementation
- Run `pb run` to verify the feature computes correctly
- Compare incremental results with a full refresh to ensure accuracy
- Monitor run times to verify improvement

The first run performs a full refresh to establish the baseline checkpoint. Subsequent runs use incremental computation.
Verifying incremental behavior
To verify your feature is running incrementally:
- Check run times: Incremental runs should be faster than full refresh runs (after the first run)
- Compare results: Run with incremental mode and compare results with a full refresh to ensure they match
- Monitor checkpoint usage: Verify that checkpoints are being created and used in subsequent runs
Troubleshooting
| Issue | Solution |
|---|---|
| Feature values are incorrect | Verify that your `merge:` clause correctly matches the aggregation logic. Ensure that your input sources are append-only or have reliable timestamps. Check that you are not missing edge cases in merge logic. Compare incremental results with a full refresh to identify discrepancies. |
| Performance not improving | Verify that your input sources are properly configured for incremental processing. Check that delta models are being used correctly. Ensure your data warehouse supports efficient timestamp filtering. Monitor checkpoint creation and usage. |
| Merge errors | Verify that all aggregators in `select:` have corresponding merge logic. Check that merge functions match the aggregation type. Ensure `merge_where:` conditions are correct if used. Review Simple Aggregations and Compound Aggregations for correct syntax. |
| The first run is slow | This is expected: the first run always performs a full refresh to establish the baseline checkpoint. Subsequent runs use incremental computation and should be faster. |