ProfilesML (Early Access)

Use Profiles’ predictive features to train machine learning models.

warning

ProfilesML is part of our Early Access Program, where we work with early users and customers to test new features and get feedback before making them generally available. These features are functional but can change as we improve them. We recommend connecting with our team before running them in production.

Contact us to get access to this feature.

ProfilesML extends Profiles’ standard feature development functionality and lets you create predictive features directly in your warehouse. You can predict Boolean features such as:

  • Is a customer likely to churn in the next 30 days?
  • Will a user make a purchase in the next 7 days?
  • Is a lead going to convert?

You can then add the predicted features to the user profiles in your warehouse automatically and deliver ML-based segments and audiences to your marketing, product, and customer success teams.

Use case: Churn prediction

Predicting churn is a critical initiative for many businesses. Without a predicted churn score, you can only react after users churn; with a user trait like is_likely_to_churn, you can act proactively. Once you have such a feature, you can activate it through targeted outreach programs to prevent churn.

Prerequisites

  • You must be using a Snowflake warehouse.
  • You must set up a standard Profiles project with a feature table model.
  • Optional: If your dataset is very large, you might need to create a Snowpark-optimized warehouse.

Project setup

Setting up ProfilesML involves four steps:

  1. Set up a feature table with labels.
  2. Configure training parameters.
  3. Configure prediction parameters to generate the predictive features.
  4. Schedule periodic predictions.

1. Set up a feature_table_model

Follow the Feature table guide to create a basic Profiles project. Then, mark one of the entity_var definitions as your label. The label must be Boolean: it takes one of two values, such as the 1 and 0 produced in the example below.

The following example computes the 30-day churn status of a user:

```yaml
entity_var:
  name: churn_30_days
  select: case when days_since_last_seen >= 30 then 1 else 0 end
```
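The label's logic can be sanity-checked outside the warehouse. The following is a hypothetical Python mirror of the select expression above; the function name is illustrative only, and Profiles evaluates the SQL in your warehouse, not this code. It also shows how a missing days_since_last_seen value behaves: in SQL, a comparison with NULL is not true, so the CASE expression falls through to 0.

```python
from typing import Optional


def churn_30_days(days_since_last_seen: Optional[int]) -> int:
    """Hypothetical Python mirror of:
    case when days_since_last_seen >= 30 then 1 else 0 end
    """
    # SQL treats NULL >= 30 as unknown, so CASE falls through to ELSE 0.
    if days_since_last_seen is None:
        return 0
    # 1 marks a churned user (inactive for 30 or more days), 0 an active one.
    return 1 if days_since_last_seen >= 30 else 0


if __name__ == "__main__":
    for days in (None, 7, 29, 30, 90):
        print(days, churn_30_days(days))
```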

2. Training

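RudderStack reduces the training configuration to a small set of parameters. Define a model of type python_model and specify the following parameters under its train key. The label_column in this example, is_churned_7_days, is a 7-day churn label; set it to the name of the label entity_var you defined in step 1.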

```yaml
train:
    file_extension: .json
    file_validity: 168h  # retrain when the last trained model is older than 168 hours (7 days)
    inputs:
      - packages/feature_table/models/shopify_user_features
    config:
      data: &model_data_input_configs  # YAML anchor, reused by the predict section
        package_name: feature_table
        label_column: is_churned_7_days
        label_value: 1
        prediction_horizon_days: 7
        features_profiles_model: 'shopify_user_features'
        output_profiles_ml_model: *model_name  # alias of an &model_name anchor defined earlier in the file (not shown)
        eligible_users: ''
      preprocessing:
        ignore_features: [name, gender, device_type]
```
| Parameter | Required | Description |
| --- | --- | --- |
| file_extension | Yes | The file extension. This is a static value and does not need to be modified. |
| file_validity | Yes | If the last trained model is older than this duration, the model is trained again. |
| inputs | Yes | Path to the base features project. |
| package_name | Yes | Name of the package in which the Profiles feature table is defined (as declared in pb_project.yaml). |
| label_column | Yes | Label column whose value the model predicts. |
| prediction_horizon_days | Yes | Number of days in advance for which the prediction is made. See Prediction horizon days for more information. |
| features_profiles_model | Yes | Name of the user feature model. |
| output_profiles_ml_model | Yes | Name of the model. |
| eligible_users | No | Condition that restricts the feature to a segment of users, for example, country='US' and is_payer=true. |
| ignore_features | No | List of feature table columns that the model ignores during training. |
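prediction_horizon_days sets how far ahead the model looks. As an assumption for illustration only (this is not ProfilesML's internal training code), the sketch below shows one common way to build a horizon-based churn label: a user observed at a snapshot date is labeled 1 if they show no activity during the following horizon_days days. The function and argument names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable


def churn_label_at_horizon(
    snapshot: date, horizon_days: int, activity_dates: Iterable[date]
) -> int:
    """Hypothetical horizon label: 1 if the user has no activity in the
    window (snapshot, snapshot + horizon_days], otherwise 0."""
    window_end = snapshot + timedelta(days=horizon_days)
    # Only activity strictly after the snapshot and within the horizon counts.
    active_in_window = any(snapshot < d <= window_end for d in activity_dates)
    return 0 if active_in_window else 1


if __name__ == "__main__":
    snapshot = date(2024, 1, 1)
    # With a 7-day horizon, the window covers 2024-01-02 through 2024-01-08.
    print(churn_label_at_horizon(snapshot, 7, [date(2024, 1, 5)]))    # 0: active in window
    print(churn_label_at_horizon(snapshot, 7, [date(2024, 1, 9)]))    # 1: activity only after the window
    print(churn_label_at_horizon(snapshot, 7, [date(2023, 12, 30)]))  # 1: activity only before the snapshot
```

Because a horizon-based label depends on what happens after the snapshot, the most recent horizon_days of data cannot be labeled yet; this is a general property of such labels.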

3. Prediction

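In the same python_model, specify the following parameters under the predict key. The data key reuses the training data configuration through the *model_data_input_configs YAML alias.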

```yaml
predict:
    inputs:
      - packages/feature_table/models/shopify_user_features
    config:
      data: *model_data_input_configs
      outputs:
        column_names:
          percentile: &percentile_name percentile_churn_score_7_days
          score: churn_score_7_days
        feature_meta_data: &feature_meta_data
          features:
            - name: *percentile_name
              description: 'Percentile of churn score. Higher the percentile, higher the probability of churn'
```
| Parameter | Required | Description |
| --- | --- | --- |
| inputs | Yes | Path to the base features project. |
| percentile | Yes | Column in the output table that holds the percentile score. |
| score | Yes | Column in the output table that holds the probability score. |
| description | Yes | Custom description of the feature. |
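The score column holds a probability, while the percentile column ranks each user's score against everyone else's. The exact percentile definition ProfilesML uses is not stated here; the sketch below assumes one common definition, the percentage of users whose score is less than or equal to a given user's score. Under that definition a higher score never maps to a lower percentile, which matches the feature description in the example. The function name is hypothetical.

```python
from bisect import bisect_right
from typing import List


def percentile_ranks(scores: List[float]) -> List[float]:
    """Percentile (0-100] of each score: share of scores <= that score."""
    ordered = sorted(scores)
    n = len(ordered)
    # bisect_right counts how many sorted scores are <= s (ties included).
    return [100.0 * bisect_right(ordered, s) / n for s in scores]


if __name__ == "__main__":
    churn_scores = [0.10, 0.40, 0.40, 0.90]
    print(percentile_ranks(churn_scores))  # [25.0, 75.0, 75.0, 100.0]
```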

4. Scheduling

  1. Upload your project to a GitHub repository.
  2. Create a Profiles project in the RudderStack dashboard. Use the GitHub repository to set up the project.
  3. Schedule your project at the required cadence. This schedule controls prediction runs only.

Training runs are governed by the file_validity parameter in the training section of your project: the model is retrained when the last trained model is older than that duration.
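The retraining rule can be expressed as a small check. This is a hypothetical sketch, not RudderStack's scheduler: it assumes file_validity is written as an integer followed by s, m, or h (as in 168h), and it retrains only when the model's age strictly exceeds that duration.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical unit table; only s, m, and h are handled in this sketch.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_validity(value: str) -> timedelta:
    """Parse a duration such as '168h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def needs_retraining(last_trained_at: datetime, file_validity: str, now: datetime) -> bool:
    """True if the last trained model is older than file_validity."""
    return now - last_trained_at > parse_validity(file_validity)


if __name__ == "__main__":
    trained = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(needs_retraining(trained, "168h", datetime(2024, 1, 8, tzinfo=timezone.utc)))     # False: exactly 7 days old
    print(needs_retraining(trained, "168h", datetime(2024, 1, 8, 1, tzinfo=timezone.utc)))  # True: 169 hours old
```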

Results

The final output, the predicted features, is pushed to your customer360 table. Use the Explorer tab to check the predicted value for any given user, along with historical values from up to the last 5 runs.

info

To check the predicted value for a given user:

  1. In the Preview section, go to the Predictive features tab.
  2. Check the user profile for which the predictive feature has a value.
  3. Search for the USER_MAIN_ID of the profile in the Profile viewer.

The predictive feature value is a probability. To use it as a true/false trait, compare it against a threshold that suits your use case.
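Turning the probability into a true/false trait is a threshold comparison. The sketch below assumes scores keyed by a user identifier; the function name, the identifiers, and the 0.6 threshold are hypothetical choices for illustration.

```python
from typing import Dict


def to_boolean_trait(scores: Dict[str, float], threshold: float) -> Dict[str, bool]:
    """Map each user's churn probability to True when it meets the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return {user_id: score >= threshold for user_id, score in scores.items()}


if __name__ == "__main__":
    scores = {"user_a": 0.82, "user_b": 0.35, "user_c": 0.60}
    print(to_boolean_trait(scores, threshold=0.6))
    # {'user_a': True, 'user_b': False, 'user_c': True}
```

A lower threshold flags more users as likely to churn, catching more true churners at the cost of more false positives; a higher threshold does the opposite. Choose the value based on the cost of the outreach you plan to trigger.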

All your predictive features are listed separately in the Overview tab of your Profiles project. You can check the logs of each run in the artifacts directory (available in the History tab of your Profiles project).


FAQ

Is there a sample project that helps me understand ProfilesML further?

You can check the Shopify churn model that builds a churn prediction score on top of the Shopify library project.

Contact us

ProfilesML is a part of RudderStack’s Early Access Program. Contact us to get access to this feature.


