Python Models

Profiles model for generating predictive features.

Predictive features are generated using a new type of Profiles model called python_model.

Using a Python model

Using a Python model involves two key steps: train and predict.

To use a Python model, you need to modify the train and predict blocks in your profiles.yaml file. The following snippet highlights these blocks:

# This is a sample file, for detailed reference see: https://rudderlabs.github.io/pywht/
  - name: shopify_churn
    model_type: python_model
      occurred_at_col: insert_ts
      entity_key: user
      validity_time: 24h # 1 day
      py_repo_url: https://github.com/rudderlabs/rudderstack-profiles-classifier.git
          - packages/feature_table/models/shopify_user_features
              percentile: &percentile_name percentile_churn_score_7_days
              score: churn_score_7_days
            feature_meta_data: &feature_meta_data
                - name: *percentile_name
                  description: 'Percentile of churn score. Higher the percentile, higher the probability of churn'
        file_extension: .json
        file_validity: 60m
          - packages/feature_table/models/shopify_user_features
            label_column: is_churned_7_days
            label_value: 1
            prediction_horizon_days: 7
            model_name: 'shopify_user_features'
      <<: *feature_meta_data

In a Python model, the actual logic resides in a remote location defined by the py_repo_url key. You do not need to modify this key to set up a predictive feature.

In the train block, you define the label column by pointing to the entity_var defined in the feature table model. You also need to define the following:

  • Expected label value (label_value) for users who performed the event.
  • Horizon days (prediction_horizon_days), that is, the number of days in advance for which the predictions need to be made.
  • Feature table model name (model_name) defined for your predictive features project. See Set up a feature table model for more information.
  • Criteria for eligible users, so that the model need not be used to predict for all users. You can set these criteria by defining a SQL expression that refers to the different entity_vars. For example:
eligible_users: lower(country) = 'us' and amount_spent_overall > 0

The above example ensures that the model is trained only on paying users from the US and makes predictions only for that same set of users.


Note that:

  • Add the eligible_users key as one more parameter in the data configuration.
  • To build a model based on all available users, leave the eligible_users parameter blank.
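
As a rough sketch, the data configuration parameters described above could look like the following once eligible_users is added. The values are copied from the sample file and the example expression earlier in this section. The parent keys that enclose these parameters are intentionally omitted, so treat the indentation and placement as an assumption and match them to your own profiles.yaml.

```yaml
# Sketch only: parameters of the data configuration in the train block.
# Parent keys are omitted; nest these where your profiles.yaml defines them.
label_column: is_churned_7_days      # entity_var used as the label
label_value: 1                       # expected label for users who performed the event
prediction_horizon_days: 7           # how many days in advance to predict
model_name: 'shopify_user_features'  # feature table model name
# Train and predict only on paying users from the US.
# Leave this parameter blank to use all available users.
eligible_users: lower(country) = 'us' and amount_spent_overall > 0
```

The eligible_users expression sits alongside the other data parameters, so the same filter applies to both training and prediction.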

Optional: Run Python model locally

If you are using the RudderStack dashboard to run the models, you can skip this step. However, note that RudderStack runs the models locally a few times to get the correct setup.

To run Python models locally, you need to set up a Python environment with the required packages and add the Python path to the siteconfig.yaml file.

