Version:

Compound Aggregations Beta

Learn how to implement compound incremental aggregations like averages and arrays.

This guide explains compound aggregations and how to implement them as incremental features in Profiles.

Overview

Compound aggregations require intermediate calculations or multiple steps, each implemented as a simple composable aggregation. They break complex aggregations into component parts that can be maintained incrementally.

Motivation: When and why to use compound aggregations

Use compound aggregations when:

  • You need to compute averages (AVG, WEIGHTED_AVG), which cannot be computed directly incrementally
  • You need to aggregate arrays or lists that require union and sorting operations
  • Your aggregation requires multiple intermediate calculations

Compound aggregations use:

  • Component parts (like sum and count for averages)
  • Intermediate entity vars maintained incrementally
  • Final calculations that combine components
tip
Tip: If compound aggregations don’t fit your use case, consider Incremental SQL Models for stateful processing or complex transformations.

Real-world examples

Here are common use cases that map to compound aggregations:

Use casePatternExample
Average order valueAVG using sum/countAverage purchase amount across all orders
Average session durationAVG using sum/countAverage time spent per session
Weighted average ratingWEIGHTED_AVG using weighted sum/weight sumAverage product rating weighted by number of reviews
List of countries visitedArray aggregationSorted list of unique countries a user has visited
Array of product IDsArray aggregationList of all product IDs a user has purchased

Average (AVG) using component features

AVG cannot be computed directly incrementally. Express it using sum and count components:

  • Sum component: Maintain sum incrementally using SUM with SUM merge
  • Count component: Maintain count incrementally using COUNT with SUM merge (counts from previous checkpoints are summed together)
  • Average calculation: Divide sum by count in a final entity var with division by zero protection

Example: Average order value

- entity_var:
    name: sum_of_order_values
    select: SUM({{orders.amount}})
    merge: SUM(rowset.sum_of_order_values)
    from: inputs/orders
    is_feature: false
    description: Sum of order values (used for average calculation)

- entity_var:
    name: count_of_orders
    select: COUNT({{orders.order_id}})
    merge: SUM(rowset.count_of_orders)
    from: inputs/orders
    is_feature: false
    description: Count of orders (used for average calculation)

- entity_var:
    name: avg_order_value
    select: "CASE WHEN {{user.count_of_orders}} > 0 THEN {{user.sum_of_order_values}} / {{user.count_of_orders}} ELSE 0 END"
    description: Average order value across all orders

How it works:

  1. Sum component: Maintains total sum incrementally using SUM with SUM merge
  2. Count component: Maintains total count incrementally using COUNT with SUM merge (counts from previous checkpoints are summed together)
  3. Average calculation: Divides sum by count using {{user.var_name}} syntax, with division by zero protection
tip
Tip: Mark sum and count components with is_feature: false since they’re intermediate calculations. The average is computed on-the-fly from the incrementally maintained components.

Workflow

Input Data → [Sum Component (Incremental)] → Final Calculation
           → [Count Component (Incremental)] ↗
  1. Profiles creates intermediate tables for sum and count components
  2. Both components are maintained incrementally using simple aggregation patterns
  3. The average is computed by dividing sum by count (with division by zero protection)

Weighted Average (WEIGHTED_AVG) using component features

WEIGHTED_AVG cannot be computed directly incrementally. You can express it using weighted sum and sum of weights components:

  • Weighted sum component: Maintain sum of (value × weight) incrementally using SUM with SUM merge
  • Sum of weights component: Maintain sum of weights incrementally using SUM with SUM merge
  • Weighted average calculation: Divide weighted sum by sum of weights in a final entity var with division by zero protection

Example: Weighted average product rating

- entity_var:
    name: weighted_sum_of_ratings
    select: SUM({{reviews.rating}} * {{reviews.review_count}})
    merge: SUM(rowset.weighted_sum_of_ratings)
    from: inputs/reviews
    is_feature: false
    description: Sum of (rating × review_count) for weighted average calculation

- entity_var:
    name: sum_of_review_counts
    select: SUM({{reviews.review_count}})
    merge: SUM(rowset.sum_of_review_counts)
    from: inputs/reviews
    is_feature: false
    description: Sum of review counts (weights) for weighted average calculation

- entity_var:
    name: weighted_avg_rating
    select: "CASE WHEN {{user.sum_of_review_counts}} > 0 THEN {{user.weighted_sum_of_ratings}} / {{user.sum_of_review_counts}} ELSE 0 END"
    description: Weighted average product rating across all reviews

How it works:

  • Weighted sum component: Maintains sum of (value × weight) incrementally using SUM with SUM merge -Sum of weights component: Maintains total sum of weights incrementally using SUM with SUM merge
  • Weighted average calculation: Divides weighted sum by sum of weights using {{user.var_name}} syntax, with division by zero protection
tip
Tip: Mark weighted sum and sum of weights components with is_feature: false since they’re intermediate calculations. The weighted average is computed on-the-fly from the incrementally maintained components.

Workflow

Input Data → [Weighted Sum Component (Incremental)] → Final Calculation
           → [Sum of Weights Component (Incremental)] ↗
  1. Profiles creates intermediate tables for weighted sum and sum of weights components
  2. Both components are maintained incrementally using simple aggregation patterns
  3. The weighted average is computed by dividing weighted sum by sum of weights (with division by zero protection)

Array aggregations

For array aggregations, use ARRAY_UNION_AGG or ARRAY_SORT in the merge property to combine arrays from previous checkpoints with new arrays.

warning
Warehouse-specific syntax: Array aggregation syntax varies by warehouse. The examples in this section are specific to Snowflake. Adjust syntax for your warehouse.

Example: List of countries visited

- entity_var:
    name: countries_list
    select: ARRAY_AGG(DISTINCT {{tracks.country}}) WITHIN GROUP (ORDER BY {{tracks.country}})
    merge: ARRAY_SORT(ARRAY_UNION_AGG(rowset.countries_list))
    from: models/tracks
    where: country IS NOT NULL
    description: Sorted list of unique countries

How it works:

  1. Select clause: Creates an array of distinct countries from new data, sorted

  2. Merge property:

    • ARRAY_UNION_AGG(rowset.countries_list): Unions previous and new arrays, removing duplicates
    • ARRAY_SORT(...): Sorts the final array

Array merge functions

Select functionMerge functionNotes
ARRAY_AGG(...)ARRAY_UNION_AGG(rowset.feature_name)Unions arrays, removing duplicates
ARRAY_AGG(...)ARRAY_SORT(ARRAY_UNION_AGG(rowset.feature_name))Unions and sorts arrays

When compound aggregations don’t work

Compound aggregations may not be sufficient when:

  • You need activity windows: Tracking recent activity over time windows (like MAU/DAU) requires Incremental SQL Models with Incremental Input Windows.
  • You need per-object aggregations: Instead of one value per user, you need one value per (user_id, object_id) pair, where object_id could be product_id, location_id, or any other dimension. This requires Incremental SQL Models.
  • You need complex business logic: Multi-step transformations with conditional logic may require Incremental SQL Models.
warning

Input-vars cannot be built incrementally

An entity_var dependent on an input_var cannot be incremental. To enable incremental features, replace the input_var with an Incremental SQL Model and reference the model from the entity_var instead.

Best practices

  • Break AVG into sum and count components with separate merge properties
  • Break WEIGHTED_AVG into weighted sum (value × weight) and sum of weights components with separate merge properties
  • Mark intermediate entity vars with is_feature: false to avoid exposing them as features
  • Add division by zero protection when computing averages (use CASE WHEN count > 0 THEN sum/count ELSE 0 END for AVG, or CASE WHEN sum_of_weights > 0 THEN weighted_sum / sum_of_weights ELSE 0 END for WEIGHTED_AVG)
  • Use ARRAY_UNION_AGG for array aggregations to maintain deduplication
  • Be aware of warehouse-specific syntax differences for array operations
  • Verify incremental results match full refresh results

Troubleshooting

IssueSolution
Average calculation returns incorrect valuesVerify that sum and count components are correctly maintained incrementally and include division by zero protection
Weighted average calculation returns incorrect valuesVerify that weighted sum (value × weight) and sum of weights components are correctly maintained incrementally and include division by zero protection
Division by zero errorsAdd CASE WHEN count > 0 THEN sum/count ELSE 0 END for AVG, or CASE WHEN sum_of_weights > 0 THEN weighted_sum / sum_of_weights ELSE 0 END for WEIGHTED_AVG
Array aggregation syntax errorsCheck warehouse-specific syntax requirements (Snowflake, BigQuery, etc.)
Duplicate values in arraysEnsure ARRAY_UNION_AGG is used in merge property to remove duplicates
Intermediate features appearing in outputMark intermediate entity vars with is_feature: false

See more


Questions? Contact us by Email or on Slack