Feature Launch: ProfilesML

Eric Dodds

Head of Product Marketing

October 3, 2023

Until now, if you wanted to build and operationalize predictive insights, you had to choose between black-box SaaS or building machine-learning capabilities yourself. Both of these options involve major drawbacks. SaaS tools have limited access to data and are often restricted to a specific use case. On the other hand, building the team and infrastructure required to enable predictive insights yourself is a significant investment of time and money.

With the launch of ProfilesML, that’s all about to change.

ProfilesML makes it easy for you to build predictive features on all of your customer data in Snowflake without additional MLOps and infrastructure, and it’s now available for early access.

Adding predictive capabilities to RudderStack Profiles

In June, we launched RudderStack Profiles at Snowflake Summit. Profiles empowers data teams to create a complete view of their customers without the complex modeling work required for identity resolution and user feature generation.

ProfilesML extends the capabilities of Profiles by automating the process of building predictive user features without complex MLOps infrastructure. Because the product is warehouse-native, it benefits from all the data available in your warehouse and allows you to use your existing workflows.

ProfilesML uses pre-built ML assets and training models to democratize access to ML feature development. Analytics engineers and BI specialists can work with those assets in a user-friendly UI. At the same time, advanced users retain the flexibility to build and configure predictive features in a code-based dev workflow.

Overcoming the limits of SaaS and DIY

The power of getting predictive features right is immense. Take customer churn, for example. If you identify churn risks and engage with your customers before they leave, you can make a double-digit impact on revenue.

While some SaaS tools can provide predictive ML capabilities, these capabilities are limited by the data inside the tool itself. For example, your email tool may be able to predict a churn risk and recommend the right message to send, but its prediction lacks the context of any customer journey activity or relevant customer information that lives outside of the tool. It’s unable to tap into all of the rich customer data in your warehouse.

The DIY approach eliminates the restrictions of black-box SaaS tooling, but it requires significant investment and involves several challenges.

Drawing from our collective experience of building numerous models in production, we’ve found that the true challenge of building ML capabilities lies in the work that comes before and after model training.

First, data must be collected and cleaned to train models. This prerequisite data engineering work can often consume more time than the training of the model itself. Working with time series data, which is essential for most customer data use cases, adds further complexity. With time series data, the features for the training data set need to be calculated as of a specific point in history (like when the customer churned), using only the data that existed at that moment. Plus, when definitions change, you have to recreate those event-based datasets, which are often quite large.
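To make the point-in-time requirement concrete, here is a minimal sketch in plain Python. It is not how ProfilesML is implemented; the event shape (`user_id`, `ts`), the function name `features_as_of`, and the 30-day window are all assumptions chosen for illustration. The idea is simply that features for a training row are computed only from events that happened before the cutoff (for example, the churn date).

```python
from datetime import datetime, timedelta


def features_as_of(events, user_id, cutoff, window_days=30):
    """Compute features for one user from events strictly before `cutoff`.

    Hypothetical helper: the event shape and feature names are assumptions.
    """
    window_start = cutoff - timedelta(days=window_days)
    # Only events before the cutoff are visible; later events would leak
    # information from the future into the training row.
    history = [e for e in events if e["user_id"] == user_id and e["ts"] < cutoff]
    recent = [e for e in history if e["ts"] >= window_start]
    last_seen = max((e["ts"] for e in history), default=None)
    return {
        "events_in_window": len(recent),
        "days_since_last_event": (cutoff - last_seen).days if last_seen else None,
    }


events = [
    {"user_id": "u1", "ts": datetime(2023, 1, 5)},
    {"user_id": "u1", "ts": datetime(2023, 2, 20)},
    {"user_id": "u1", "ts": datetime(2023, 3, 10)},  # after the cutoff, ignored
    {"user_id": "u2", "ts": datetime(2023, 2, 25)},
]

churn_date = datetime(2023, 3, 1)
row = features_as_of(events, "u1", churn_date)
```

If the definition of a feature changes (say, the window moves from 30 to 60 days), every historical row has to be recomputed this way, which is why this step tends to be expensive on large event tables.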

Then, once models are trained, they must be deployed into a production environment, where monitoring for model drift, such as deviations from expected precision and recall as new data arrives, becomes essential. Retraining becomes necessary once drift exceeds predefined thresholds.
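As a rough illustration of that monitoring loop, the sketch below recomputes precision and recall on newly labeled data and flags the model for retraining when either metric falls too far below its baseline. The function names, the baseline values, and the 0.10 threshold are assumptions for the example, not ProfilesML settings.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 means churned."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def needs_retraining(baseline, current, max_drop=0.10):
    """Flag drift when any metric drops more than `max_drop` below baseline."""
    return any(b - c > max_drop for b, c in zip(baseline, current))


# Baseline metrics recorded at deployment time (made-up numbers).
baseline = (0.80, 0.75)

# Newly observed outcomes versus what the deployed model predicted.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

current = precision_recall(y_true, y_pred)
retrain = needs_retraining(baseline, current)
```

In a DIY setup, a check like this has to be scheduled, stored, and wired to a retraining job, which is part of the operational cost described above.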

Each of these processes demands a significant investment in technical resources, both in terms of time and expertise.

Streamlining ML workflows with ProfilesML

We believe generating and acting on predictive insights fueled by a comprehensive data set shouldn’t be a massive undertaking. That’s why we built ProfilesML. The new product makes it easy for you to:

  • Quickly define a churn_score feature and train a provided churn model with warehouse data
  • Enrich customer profiles with model results, giving every customer an up-to-date churn_score
  • Build audiences of customers that are at high risk of churn
  • Send those audiences to a customer engagement platform like Braze so your marketing team can deliver a win-back offer

With ProfilesML, you can do all this without complex MLOps infrastructure or separate workflows. As part of our Warehouse Native Customer Data Platform, ProfilesML builds upon our Event Stream and Profiles products to collect first-party data, stitch identities, define user features, create training data, experiment with different models, and push the ML model with the best fit to production. ProfilesML also monitors your deployed models for drift and automatically retrains them to maintain their predictive power.
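To illustrate the audience-building step from the list above, here is a minimal sketch of selecting high-risk customers once every profile carries a `churn_score`. The profile shape, the `high_risk_audience` helper, and the 0.7 threshold are assumptions for the example; in practice the audience would be defined in the RudderStack workflow and synced to a destination rather than built by hand.

```python
def high_risk_audience(profiles, threshold=0.7):
    """Return profiles whose churn_score meets the threshold, highest risk first.

    Hypothetical helper: the threshold and profile fields are assumptions.
    """
    at_risk = [
        p for p in profiles
        if p.get("churn_score") is not None and p["churn_score"] >= threshold
    ]
    return sorted(at_risk, key=lambda p: p["churn_score"], reverse=True)


profiles = [
    {"user_id": "u1", "churn_score": 0.91},
    {"user_id": "u2", "churn_score": 0.35},
    {"user_id": "u3", "churn_score": 0.72},
    {"user_id": "u4", "churn_score": None},  # not scored yet, skipped
]

audience_ids = [p["user_id"] for p in high_risk_audience(profiles)]
```

The resulting list of IDs is what a marketing team would target with a win-back offer.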

How does ProfilesML work?

ProfilesML fits within our existing Profiles workflow, so you can build predictive features without changing your existing process. You can start building predictive features in four steps:

1. Choose your ProfilesML model

ProfilesML currently supports churn and lead scoring, and more models are coming soon. To start, choose the library project for either churn or lead scoring.

2. Specify the predictive feature or define it yourself

Depending on the model you choose, ProfilesML needs either a churn_score or lead_score feature to create training data and train the model.

If you use RudderStack’s eCommerce tracking spec, you can leverage our standardized definitions for churn_score and lead_score. If you have your own custom event data, you can use those definitions as a starting point for writing your own. (As a reminder, developing feature definitions in Profiles leverages a simple, declarative, YAML-based workflow, as opposed to complex SQL.)

If you are an existing Profiles user and have already defined either churn_score or lead_score in a Profiles project, you can specify that feature in your ProfilesML project.

3. Push the ProfilesML project to GitHub and kick off a run

After pushing your project to GitHub and adding the repo details in the RudderStack UI, you can trigger a run using the job scheduler in the UI. During the run, ProfilesML will:

  • Leverage the feature definition to understand the required training data
  • Create additional training data if necessary
  • Train the selected model
  • Produce an output value for all relevant users

4. Monitor model performance and tweak as necessary

Once a model has run, you can check the performance and tweak the feature definition and/or model parameters as needed to hone accuracy.

Get Started

With ProfilesML, you can build predictive features in your warehouse without MLOps. It enables you to anticipate opportunities for customer retention, upsells, and other personalized experiences to deliver better business outcomes and get more from your cloud data warehouse investment.

Sign up for a demo today to request early access to RudderStack ProfilesML, or contact us to let us know you’re interested, and we’ll let you know once ProfilesML is generally available. If you’re a current Profiles user, you can read the docs to get started.
