ProfilesML (Early Access)

Use Profiles’ predictive features to train machine learning models.


ProfilesML is part of our Early Access Program, where we work with early users and customers to test new features and get feedback before making them generally available. These features are functional but can change as we improve them. We recommend connecting with our team before running them in production.

Contact us to get access to this feature.

ProfilesML extends Profiles’ standard feature development functionality and lets you easily create predictive features in your warehouse. You can predict boolean features like:

  • Is a customer likely to churn in the next 30 days?
  • Will a user make a purchase in the next 7 days?
  • Is a lead going to convert?

You can then add the predicted feature to user profiles in your warehouse automatically and deliver ML-based segments and audiences to your marketing, product, and customer success teams.

Use case: Churn prediction

Predicting churn is a high-priority initiative for many businesses. Without a predicted churn score, you can only react after users have left; with a user trait like is_likely_to_churn, you can act proactively. Once you have such features, you can activate them in the appropriate outreach programs to prevent user churn.


Prerequisites

  • You must be using a Snowflake warehouse.
  • You must set up a standard Profiles project with a feature table model.
  • Optional: If your data set is very large, you might need to create a Snowpark-optimized warehouse.

Project setup

Setting up ProfilesML involves four steps:

  1. Set up a feature table with labels.
  2. Configure the training parameters.
  3. Configure the prediction parameters.
  4. Schedule periodic predictions to generate the predictive features.

1. Set up a feature_table_model

Follow the Feature table guide to start with a basic Profiles project. Then, mark one of the entity_vars as your label. Note that the label must be of Boolean type.

The following example computes the 30-day churn status of a user:

  name: churn_30_days
  select: case when days_since_last_seen >= 30 then 1 else 0 end
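
To make the label logic concrete, the following Python sketch mirrors the case expression above on in-memory data. The function name churn_label and the sample values are illustrative assumptions and not part of Profiles; in a real project the label is computed in the warehouse by the entity_var.

```python
def churn_label(days_since_last_seen: int, horizon_days: int = 30) -> int:
    """Return 1 if the user has been inactive for at least horizon_days, else 0."""
    return 1 if days_since_last_seen >= horizon_days else 0


# Hypothetical sample data: user id -> days since the user was last seen.
sample_users = {"u1": 3, "u2": 30, "u3": 95}

labels = {user: churn_label(days) for user, days in sample_users.items()}
print(labels)
```

Note that the boundary is inclusive: a user last seen exactly 30 days ago is labeled as churned, matching the >= comparison in the select expression.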

2. Training

RudderStack reduces your training configuration to a set of parameters. Start with a model of type python_model and specify the following parameters:

    file_extension: .json
    file_validity: 168h
      - packages/feature_table/models/shopify_user_features
      data: &model_data_input_configs
        package_name: feature_table
        label_column: is_churned_7_days
        label_value: 1
        prediction_horizon_days: 7
        features_profiles_model: 'shopify_user_features'
        output_profiles_ml_model: *model_name
        eligible_users: ''
        ignore_features: [name, gender, device_type]

The parameters are described below:

  • file_extension: The file extension. This is a static value and does not need to be modified.
  • file_validity: If the last trained model is older than this duration, the model is trained again.
  • Path entry (the list item beginning with - packages/): Path to the base features project.
  • package_name: Name of the package where the Profiles feature table is defined (declared in pb_project.yaml package).
  • label_column: Column for which you want the predictions.
  • prediction_horizon_days: Number of days in advance for which the prediction should be made. See Prediction horizon days for more information.
  • features_profiles_model: Name of the user feature model.
  • output_profiles_ml_model: Name of the model.
  • eligible_users: Definition of the feature, if it needs to be defined only for a segment of users. For example, country='US' and is_payer=true
  • ignore_features: List of columns from the feature table that the model ignores during training.
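
As a rough illustration of what eligible_users and ignore_features mean for the training data, the Python sketch below filters a few in-memory rows. The helper select_training_rows, the row layout, and the sample values are assumptions made for this example; ProfilesML applies the equivalent logic inside the warehouse, and its internal implementation is not shown here.

```python
from typing import Callable

Row = dict[str, object]


def select_training_rows(
    rows: list[Row],
    is_eligible: Callable[[Row], bool],
    ignore_features: list[str],
) -> list[Row]:
    """Keep only eligible users and drop the ignored feature columns."""
    ignored = set(ignore_features)
    return [
        {col: val for col, val in row.items() if col not in ignored}
        for row in rows
        if is_eligible(row)
    ]


# Hypothetical feature table rows.
rows = [
    {"user_main_id": "u1", "name": "Ada", "country": "US", "is_payer": True, "days_since_last_seen": 4},
    {"user_main_id": "u2", "name": "Bo", "country": "DE", "is_payer": True, "days_since_last_seen": 40},
]

training_rows = select_training_rows(
    rows,
    # Python equivalent of the condition country='US' and is_payer=true
    is_eligible=lambda r: r["country"] == "US" and r["is_payer"] is True,
    ignore_features=["name", "gender", "device_type"],
)
print(training_rows)
```

Columns listed in ignore_features that do not exist in a row (gender and device_type here) are simply skipped.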

3. Prediction

In your python_model, specify the following parameters:

      - packages/feature_table/models/shopify_user_features
      data: *model_data_input_configs
          percentile: &percentile_name percentile_churn_score_7_days
          score: churn_score_7_days
        feature_meta_data: &feature_meta_data
            - name: *percentile_name
              description: 'Percentile of churn score. Higher the percentile, higher the probability of churn'
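
The parameters are described below:

  • Path entry (the list item beginning with - packages/): Path to the base features project.
  • percentile: Column in the output table that holds the percentile score.
  • score: Column in the output table that holds the probabilistic score.
  • description: Custom description for the feature.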
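
The relationship between a probabilistic score and a percentile score can be sketched in Python as follows. This document does not state the exact percentile formula ProfilesML uses, so the definition here (the share of users whose score is less than or equal to a given score) is an assumption, as are the function name percentile_ranks and the sample scores.

```python
from bisect import bisect_right


def percentile_ranks(scores: list[float]) -> list[float]:
    """For each score, return the percentage of scores that are <= it."""
    ordered = sorted(scores)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, s) / n for s in scores]


# Hypothetical churn scores (probabilities) for four users.
churn_scores = [0.10, 0.80, 0.45, 0.95]
print(percentile_ranks(churn_scores))
```

Under this definition, the user with the highest churn score always lands at the 100th percentile, which is consistent with the description that a higher percentile means a higher probability of churn.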

4. Scheduling

  1. Upload your project to a GitHub repository.
  2. Create a Profiles project in the RudderStack dashboard. Use the GitHub repository to set up the project.
  3. Schedule your project with the required cadence. Note that this schedule is for prediction.

Training runs are scheduled according to the file_validity parameter in the training section of your project.
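
The retraining rule can be sketched in Python: a model is trained again only when the last trained model is older than file_validity. The helpers parse_hours and needs_retraining are hypothetical names for this illustration, and the parser assumes durations expressed in hours, such as 168h.

```python
from datetime import datetime, timedelta


def parse_hours(duration: str) -> timedelta:
    """Parse a duration such as '168h' into a timedelta (hours only)."""
    if not duration.endswith("h"):
        raise ValueError(f"unsupported duration: {duration!r}")
    return timedelta(hours=float(duration[:-1]))


def needs_retraining(last_trained_at: datetime, now: datetime, file_validity: str) -> bool:
    """Return True if the last trained model is older than file_validity."""
    return now - last_trained_at > parse_hours(file_validity)


now = datetime(2024, 1, 10, 12, 0)
stale = needs_retraining(datetime(2024, 1, 1, 12, 0), now, "168h")  # model is 9 days old
fresh = needs_retraining(datetime(2024, 1, 8, 12, 0), now, "168h")  # model is 2 days old
print(stale, fresh)
```

With a file_validity of 168h (7 days) and a daily prediction schedule, this rule means most runs reuse the existing model and only about one run per week retrains it.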


The final output, that is, the predicted features, is pushed to your customer360 table. Use the Explorer tab to check the predicted value for any given user, along with the historical values for up to the last 5 runs.


To check the predicted value for a given user:

  1. In the Preview section, go to the Predictive features tab.
  2. Check the user profile for which the predictive feature has a value.
  3. Search for the USER_MAIN_ID of the profile in the Profile viewer.

[Image: ProfilesML predictive feature value]

The value of the predictive feature is a probability. You can treat it as true or false by comparing it against a threshold of your choice.
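
A minimal Python sketch of turning the probability into a boolean trait is shown below. The function name is_likely_to_churn, the default threshold of 0.5, and the sample scores are assumptions for illustration; choose a threshold that fits your own tolerance for false positives and false negatives.

```python
def is_likely_to_churn(churn_score: float, threshold: float = 0.5) -> bool:
    """Convert a predicted probability into a boolean using a threshold."""
    if not 0.0 <= churn_score <= 1.0:
        raise ValueError("churn_score must be a probability between 0 and 1")
    return churn_score >= threshold


# Hypothetical predicted scores per user.
predicted = {"u1": 0.12, "u2": 0.50, "u3": 0.91}
flags = {user: is_likely_to_churn(score) for user, score in predicted.items()}
print(flags)
```

Raising the threshold, for example to 0.8, targets fewer users with higher confidence, while lowering it widens the audience at the cost of more false positives.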

All your predictive features are listed separately in the Overview tab of your Profiles project. You can check the logs of each run in the artifacts directory (available in the History tab of your Profiles project).

[Image: ProfilesML artifacts]


Is there a project to understand ProfilesML further?

You can check the Shopify churn model that builds a churn prediction score on top of the Shopify library project.

Contact us


Questions? Contact us by email or on Slack