Incremental Features
Beta
This guide introduces you to the concept of incremental features in Profiles and explains how they improve performance when processing large datasets.
Overview
Incremental features update existing feature values with newly arrived event data instead of recalculating from the entire historical dataset.

Instead of scanning billions of rows on every run, incremental features reuse previously computed results and only process new data that has arrived since the last run.
How incremental features work
When you define an incremental feature, Profiles:
- stores previous results in a checkpoint,
- identifies new data that has arrived since the last checkpoint,
- merges previous values with the new calculations using your merge logic, and
- produces updated feature values.

For example, if a SUM feature has a checkpointed value of 1,000, a run that finds new rows summing to 50 only scans those rows and merges: 1,000 + 50 = 1,050.

Mandatory requirement:
Inputs used by incremental features must have contract.is_append_only: true configured. Without this setting, the project run may fail with an error.
See Make Features Incremental for configuration details.
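
For example, an input definition with the required flag might look like the following minimal sketch. The input name, table, and columns are placeholders for illustration; only the is_append_only flag is the setting this requirement refers to.

```yaml
inputs:
  - name: rsTracks                        # placeholder input name
    contract:
      is_append_only: true                # required for incremental features
      with_entity_ids:
        - user
      with_columns:
        - name: timestamp
        - name: user_id
    app_defaults:
      table: rudder_events.web.tracks     # placeholder warehouse table
      occurred_at_col: timestamp
      ids:
        - select: user_id
          type: user_id
          entity: user
```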

Features built incrementally may drift slightly with every run due to late-landing data.
When to use incremental features
Use incremental features when:
- You want to reduce warehouse costs when processing large historical datasets (millions or billions of rows)
- New data arrives regularly (daily, hourly, or more frequently)
- The volume of new data is significantly smaller than your total dataset
- Your features can be expressed using supported incremental patterns

Not all features can be computed incrementally.
Features that require scanning all historical data (such as median or percentile calculations) cannot use incremental computation. For such cases, consider using Timegrains and bundling.
Relationship to other Profiles features
Incremental features work alongside other Profiles capabilities:
- Incremental ID stitching: The identity graph can also run incrementally, processing only new identity relationships.
- Timegrains: For features that cannot be incremental, timegrains control how frequently they recompute.
- Entity Var Bundling: Groups the execution of entity_vars based on their definitions to improve efficiency. This optimization complements incremental features and can improve performance for features that cannot be incremental.

Incremental entity vars do not yet support entity var bundling. The bundling algorithm only bundles non-incremental entity vars; incremental entity vars run one at a time.

Tip:
Input vars are generally discouraged due to performance considerations. For incremental processing, consider using entity_vars with the merge property or incremental SQL models.
Contact RudderStack Support to discuss specific use cases that may require input vars.
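
As a rough sketch of the first approach, a simple composable aggregation might be declared with a merge property as shown below. The exact property name and accepted values are described in Make Features Incremental; the merge line here is a hypothetical placeholder, and rs_orders and amount are illustrative names.

```yaml
var_groups:
  - name: revenue_vars
    entity_key: user
    vars:
      - entity_var:
          name: total_revenue
          select: sum(amount)       # simple composable aggregation (SUM)
          from: inputs/rs_orders    # must be an append-only input
          merge: sum                # hypothetical merge property: combines the
                                    # checkpointed value with newly arrived data
```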
Understanding composable functions
Composable functions are functions that can be merged incrementally by combining results from different time periods:
- Simple composable: SUM, MIN, MAX, COUNT. These can be merged directly, for example, SUM(period1) + SUM(period2) = SUM(all periods).
- Compound composable: AVG, WEIGHTED_AVG. These require breaking into multiple simple composable functions, for example, AVG = SUM / COUNT.
Non-composable functions, such as median, percentiles, and DISTINCT COUNT (in most warehouses), cannot be computed incrementally and require full scans.

AVG is not directly composable: you cannot average two averages correctly without knowing the counts. For example, given an average of 10 from 5 values and an average of 20 from 3 values, averaging the averages gives (10 + 20) / 2 = 15, while the true average is (5×10 + 3×20) / (5 + 3) = 110 / 8 = 13.75. In general, you need (SUM1 + SUM2) / (COUNT1 + COUNT2).
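
In a Profiles project, this typically means modeling a compound function as several simple composable entity_vars plus a derived ratio. A sketch, assuming a placeholder rs_orders input with an amount column:

```yaml
var_groups:
  - name: order_metrics
    entity_key: user
    vars:
      - entity_var:
          name: total_spend         # simple composable: SUM
          select: sum(amount)
          from: inputs/rs_orders
      - entity_var:
          name: order_count         # simple composable: COUNT
          select: count(*)
          from: inputs/rs_orders
      - entity_var:
          name: avg_order_value     # compound: AVG = SUM / COUNT
          select: "{{user.Var(\"total_spend\")}} / nullif({{user.Var(\"order_count\")}}, 0)"
```

Because total_spend and order_count are each simple composable, they can be maintained incrementally; the average is then recomputed cheaply from the two merged values.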
Supported patterns
Profiles supports incremental computation through three main pattern categories:
| Pattern | Description |
|---|---|
| Simple aggregations | Direct composable aggregations using SUM, MIN, MAX, COUNT with the merge property. |
| Compound aggregations | Aggregations requiring multiple intermediate simple composable aggregations, such as AVG (= SUM / COUNT), WEIGHTED_AVG (= SUM_OF_WEIGHTED_VALUES / SUM_OF_WEIGHTS), LAST_KNOWN_LOCATION (= MAX_BY(location update, update timestamp)), and arrays (e.g., LAST_10_TOUCHPOINTS). |
| Patterns requiring input pre-processing before aggregation | An incremental SQL model maintains a running snapshot of the input dataset, and non-incremental entity vars are defined on top of this processed input. For example, window function cases can be made incremental using an intermediate incremental input model, such as user_touchpoints_in_last_30_days, over which is_daily_active_user and is_monthly_active_user are then defined. See the sketch after this table. |
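
For the third pattern, the entity vars themselves remain non-incremental and simply read from the incrementally maintained model. A sketch, assuming an incremental SQL model named user_touchpoints_in_last_30_days that exposes a touchpoint_date column (both names are illustrative; see Make Features Incremental for declaring the model itself):

```yaml
var_groups:
  - name: activity_flags
    entity_key: user
    vars:
      - entity_var:
          name: is_daily_active_user
          select: count(case when touchpoint_date = current_date then 1 end) > 0
          from: models/user_touchpoints_in_last_30_days
      - entity_var:
          name: is_monthly_active_user
          select: count(*) > 0
          from: models/user_touchpoints_in_last_30_days
```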
See also
Questions? Contact us by Email or on Slack.