Early access feature launch: Predictions

Blog Banner

Until now, if you wanted to build and operationalize predictive insights, you had to choose between black-box SaaS or building machine-learning capabilities yourself. Both of these options involve major drawbacks. SaaS tools have limited access to data and are often restricted to a specific use case. On the other hand, building the team and infrastructure required to enable predictive insights yourself is a significant investment of time and money.

With the launch of RudderStack Predictions, that’s all about to change.

Predictions makes it easy for you to build predictive features on all of your customer data in Snowflake without additional MLOps and infrastructure, and it’s now available for early access.

Adding predictive capabilities to RudderStack Profiles

In June, we launched RudderStack Profiles at Snowflake Summit. Profiles empowers data teams to create a complete view of their customers without the complex modeling work required for identity resolution and user feature generation.

Predictions extends the capabilities of Profiles by automating the process of building predictive user features without complex MLOps infrastructure. Because the product is warehouse-native, it benefits from all the data available in your warehouse and allows you to use your existing workflows.

Predictions leverages pre-built ML assets and training models to democratize access to ML feature development. Analytics engineers and BI specialists can leverage those assets in a user-friendly UI. At the same time, advanced users retain the flexibility to build and configure predictive features in a code-based dev workflow.

Overcoming the limits of SaaS and DIY

The power of getting predictive features right is immense. Take customer churn, for example. If you identify churn risks and engage with your customers before they leave, you can make a double-digit impact on revenue.

While some SaaS tools can provide predictive ML capabilities, these capabilities are limited by the data inside the tool itself. For example, your email tool may be able to predict a churn risk and recommend the right message to send, but its prediction lacks the context of any customer journey activity or relevant customer information that lives outside of the tool. It’s unable to tap into all of the rich customer data in your warehouse.

The DIY approach eliminates the restrictions of black box SaaS tooling, but it requires significant investment and involves several challenges.

Drawing from our collective experience of building numerous models in production, we’ve found that the true challenge to building ML capabilities lies in the pre-and-post work.

First, data must be collected and cleaned to train models. This prerequisite data engineering work can often consume more time than the training of the model itself. Working with time series data, which is essential for most customer data use cases, adds additional layers of complexity. With time series data, the features for the training data set need to be calculated at a point in history (like when the customer churned). Plus, when definitions change, you have to recreate those event-based datasets, which are often quite large.

Then, once models are trained, they must be deployed into a production environment, where monitoring for model drift, such as deviations from expected precision and recall as new data arrives, becomes essential. Retraining becomes necessary once drift exceeds predefined thresholds.

Each of these processes demands a significant investment in technical resources, both in terms of time and expertise.

Streamlining ML workflows with RudderStack Predictions

We believe generating and acting on predictive insights fueled by a comprehensive data set shouldn’t be a massive undertaking. That’s why we built Precdictions. The new product makes it easy for you to:

  • Quickly define a churn_score feature and train a provided churn model with warehouse data
  • Enrich customer profiles with model results, giving every customer an up-to-date churn_score
  • Build audiences of customers that are at high risk of churn
  • Send those audiences to a customer engagement platform like Braze so your marketing team can deliver a win-back offer

With Predictions, you can do all this without complex MLOps infrastructure or separate workflows. As part of our Warehouse Native Customer Data Platform, Predictions builds upon our Event Stream and Profiles products to collect first-party data, stitch identities, define user features, create training data, experiment with different models, and push the ML model with the best fit to production. Predictions also monitors your deployed models for drift and automatically retains them to maintain their predictive power.

How does Predictions work?

Predictions fits within our existing Profiles workflow, so you can build predictive features without changing your existing process. You start building predictive features in 4 steps:

1. Choose your Predictions model

Predictions currently supports churn and lead scoring, and more models are coming soon. To start, choose the library project for either churn or lead scoring.

2. Specify the predictive feature or define it yourself

Depending on the model you choose, Predictions needs either a churn_score or lead_score feature to create training data and train the model.

If you use RudderStack’s eCommerce tracking spec, you can leverage our standardized definitions for churn_score and lead_score. If you have your own custom event data, you can use those definitions as a starting point for defining your own custom definitions. (As a reminder, developing feature definitions in Profiles leverages a simple, declarative, YAML-based workflow, as opposed to complex SQL.)

If you are an existing Profiles user and have already defined either churn_score or lead_score in a Profiles project, you can specify that feature in your Predictions project.

3. Push the Predictions project to Github and kick off a run

After pushing your model to Github and adding the repo details in the RudderStack UI, you can trigger a run using the job scheduler in the UI. During the run, Predictions will:

  • Leverage the feature definition to understand the required training data
  • Create additional training data if necessary
  • Train the selected model
  • Produce an output value for all relevant users

4. Monitor model performance and tweak as necessary

Once a model has run, you can check the performance and tweak the feature definition and/or model parameters as needed to hone accuracy.

Get Started

With Predictions, you can build predictive features in your warehouse without MLOps. It enables you to anticipate opportunities for customer retention, upsells, and other personalized experiences to deliver better business outcomes and get more from your cloud data warehouse investment.

Sign up for a demo today to request early access to RudderStack Predictions, or contact us to let us know you’re interested, and we’ll let you know once Predictions is generally available. If you’re a current Profiles user, you can read the docs to get started.

Start building predictive features in your warehouse
Schedule a demo with our team to learn more about RudderStack Predictions
October 3, 2023
Eric Dodds

Eric Dodds

Head of Product Marketing