How Pachyderm Uses RudderStack to Master Lead Qualification

Pachyderm is a data science platform that combines data lineage with end-to-end pipelines on Kubernetes.


Challenges

  • Building an efficient data tracking pipeline
  • Getting siloed data into a centralized data store for analysis
  • Gaining deeper insights into user product behavior and optimizing UX to increase customer adoption


Solutions

  • Unifying real-time user events, product usage data, and data from cloud sources in a centralized data warehouse
  • Leveraging enriched, transformed warehouse data for analytics and product optimization
  • Routing enriched data back to downstream tools for inbound and outbound marketing and sales

Pachyderm's Data Stack

  • Data Collection and Synchronization
    RudderStack Event Stream SDKs, Warehouse Actions, & RudderStack Cloud Extract
  • Data Warehouse
    Google BigQuery
  • Data Transformation
  • Business Intelligence
    Sigma
  • Cloud Toolset for Activation Use-cases
    HubSpot, Google Analytics, Facebook Pixel, Intercom, Google Tag Manager, Slack, Salesforce

Pachyderm's data challenges

Pachyderm is a data science platform that lets you build and manage data science pipelines, regardless of their scale and complexity. It lets you track data lineage and apply version control to your data. You can run Pachyderm in your development environment or in the cloud, or use Pachyderm Hub, their fully managed SaaS platform.

Pachyderm generates gigabytes of data from tracking users’ product interactions and from their cloud sources. Previously, all of this data was siloed in cloud tools like HubSpot, so their data team had to do a lot of plumbing to move it to their other marketing and sales tools. As the company grew and needed more customer insights, they knew they needed a better approach.

They wanted to get all of their data into a centralized data store which they could leverage for product analytics and more efficient marketing.

"RudderStack has given us better access to our data. Our data was siloed in cloud sources. Now we have it all in a warehouse, making it accessible to everyone."

Dan Baker, Marketing Ops Manager at Pachyderm

Single Source of Truth for Customer Data

Pachyderm’s data engineering team uses Sigma, a warehouse-focused BI tool, to aggregate and transform the data collected from various sources into a single source of truth for all their customers’ information.

They then use RudderStack’s Warehouse Actions feature to route this transformed, enriched customer data to downstream destinations like HubSpot (their inbound lead system).

Advanced, Behavior-driven Lead Qualification

When a user first signs up for Pachyderm, the first suggested action is to create a workspace. Pachyderm’s customer team encourages this action with drip emails. Once the user has created a workspace, the application backend sends an event to their data warehouse.
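As a rough sketch of the backend event described above, the snippet below builds a track-style payload for a workspace creation. The event name `Workspace Created`, the `workspace_id` property, and the `build_workspace_created_event` helper are hypothetical and not taken from Pachyderm's implementation. The payload shape (a `track` call with a `userId`, an `event` name, `properties`, and a `timestamp`) follows the general pattern of RudderStack's event spec, and in practice a server-side SDK would handle the actual delivery.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_workspace_created_event(
    user_id: str,
    workspace_id: str,
    created_at: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build a track-style event for a newly created workspace.

    The event and property names are assumptions for illustration,
    not Pachyderm's actual schema.
    """
    if not user_id:
        raise ValueError("user_id is required to tie the event to a user")
    created_at = created_at or datetime.now(timezone.utc)
    return {
        "type": "track",
        "userId": user_id,
        "event": "Workspace Created",  # hypothetical event name
        "properties": {"workspace_id": workspace_id},  # hypothetical property
        "timestamp": created_at.isoformat(),
    }


# Example with a fixed timestamp so the payload is reproducible.
event = build_workspace_created_event(
    "user-123", "ws-1", datetime(2021, 1, 5, 12, 0, tzinfo=timezone.utc)
)
print(event)
```

Keeping the user identifier on every event is what later allows the warehouse rows to be joined back to a contact in the CRM.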

The team then uses Sigma to compute the total number of workspaces created and the number created since the last run, and materializes the results in their data warehouse. RudderStack Warehouse Actions then sends this information to HubSpot, their inbound lead system, from where it is synced to Salesforce, their outbound lead system. Once the behavioral data from the application has reached their CRM, they use Outreach.io to drive personalized messaging and email campaigns. In this example, they stop sending drip emails to a user who has created a workspace.
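The aggregation and the drip-email rule described above can be sketched roughly as follows. This is only an illustration in Python: Pachyderm does this work in Sigma and the warehouse, and the names used here (`WorkspaceSummary`, `summarize_workspaces`, `workspaces_total`, `workspaces_new`, `send_drip_emails`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


@dataclass
class WorkspaceSummary:
    workspaces_total: int = 0  # all workspaces the user has created
    workspaces_new: int = 0    # workspaces created since the last run

    @property
    def send_drip_emails(self) -> bool:
        # Rule from the text: stop drip emails once a workspace exists.
        return self.workspaces_total == 0


def summarize_workspaces(
    user_ids: Iterable[str],
    events: Iterable[Tuple[str, datetime]],
    last_run: datetime,
) -> Dict[str, WorkspaceSummary]:
    """Aggregate (user_id, created_at) workspace events per user."""
    # Start every known user at zero so users without workspaces appear too.
    summary = {user_id: WorkspaceSummary() for user_id in user_ids}
    for user_id, created_at in events:
        row = summary.setdefault(user_id, WorkspaceSummary())
        row.workspaces_total += 1
        if created_at > last_run:
            row.workspaces_new += 1
    return summary


utc = timezone.utc
last_run = datetime(2021, 1, 10, tzinfo=utc)
events = [
    ("alice", datetime(2021, 1, 5, tzinfo=utc)),
    ("alice", datetime(2021, 1, 12, tzinfo=utc)),
]
summary = summarize_workspaces(["alice", "bob"], events, last_run)
for user_id, row in summary.items():
    print(user_id, row.workspaces_total, row.workspaces_new, row.send_drip_emails)
```

In this sketch the per-user rows stand in for the materialized table that a reverse-ETL sync would push to the CRM, where a flag like `send_drip_emails` could drive campaign enrollment.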

Pachyderm Data Stack

Sources: JS SDK

Destinations: Google Analytics

Warehouse: GCP BigQuery

