Streamline lead qualification and get the right data to sales and marketing


In this webinar Eric talks with Dan Baker from Pachyderm, a data science platform that provides end-to-end data science pipelines on Kubernetes.

Dan explains the challenges they faced pulling key internal events from their warehouse and how they implemented RudderStack's Reverse ETL to solve that problem.

Here's what Eric and Dan cover:

  • How data gets trapped in the warehouse

  • Lead qualification at Pachyderm

  • How was this happening before?

  • Using Reverse ETL to turn internal data into an event stream from the warehouse

  • Live demo of Reverse ETL

  • Q&A


Eric Dodds
Growth at RudderStack

Eric leads growth at RudderStack and has a long history of helping companies architect customer data stacks to use their data to grow.

Dan Baker
Technical Marketing Manager at Pachyderm, Inc.

Dan is a technology addict and automation specialist with over 20 years' experience helping businesses realize the potential of their data.


Eric Dodds (00:00)

All right. We'll cover just a couple of housekeeping items. First of all, thanks for joining us today. We will be respectful of your time. So we'll breeze through this content and then leave plenty of time for Q&A. Just a couple of quick things, feel free to raise your hand by clicking the button in Zoom. And feel free to just type the questions in the Q&A as we go along, and then we can address all those at the end of the webinar. Also, if something's wrong or you can't see the screen, just feel free to post in the chat so we can see. And with that, I think we're ready. Are you ready, Dan?

Dan Baker (00:44)

I certainly am, Eric.

Eric Dodds (00:47)

All right. So here's just a quick overview of what we'll cover today. So we'll just do quick introductions, we'll do an overview of lead qualification at Pachyderm and the process there. So Dan will walk us through that. And then Dan and I will talk through the various ways that data gets trapped in silos in the stack and talk through some of the specific ways that that was happening at Pachyderm. And then we'll talk about a really cool tool called Reverse ETL from RudderStack that Pachyderm uses to streamline their lead qualification process. So with that, let's jump in. Dan, you're the star of the show, so I'm going to let you go first.

Dan Baker (01:36)

Thanks, Eric. So hi everyone, I'm Dan Baker. I work at Pachyderm. I'm kind of in marketing ops, like data ops, that kind of area. My role involves making sure everyone in every team internally within Pachyderm has the right data to do their job as effectively as they can. And I kind of came across RudderStack a while back while really interested in kind of consuming event data for various websites I was working on at the time. And it's proved to be a greatly successful product for where we're going with the Pachyderm customer data platform.

Eric Dodds (02:19)

Great. And just a little bit of detail on what Pachyderm does.

Dan Baker (02:25)

So probably the best description is Pachyderm's an enterprise-grade open-source data science platform that makes explainable and repeatable and scalable ML and AI a reality. We run a SaaS product as well as having an open-source version. And we have several quite large clients running on both open source and enterprise and our SaaS version.

Eric Dodds (02:56)

That's really cool. I can say firsthand, you should check out Pachyderm if you need tooling around your data science workflow. I'm Eric, I'm the Director of Customer Success at RudderStack. And I do a lot of work with many teams, but I have the privilege of working with customers like Pachyderm, and tons of other customers who are trying to build a data infrastructure. Quick overview on RudderStack, so RudderStack helps you easily build customer data pipelines. So we do that a number of ways in the stack. So event streaming, so from real-time use cases to other use cases with event streams. We also can pull structured data from cloud sources.

And then, what we'll talk about today, one of our most exciting features is Reverse ETL, which allows you to pull tables from your warehouse and turn them into an event stream. So that's RudderStack. So let's dive in, lead qualification at Pachyderm. So Dan, talk us through the challenges you were having in terms of the lead qualification process. And I think the context of what you were doing will be helpful. I know we have a lot of technical people in the audience, but the business case I think is really helpful.

Dan Baker (04:20)

So yeah, I think it's a pretty typical kind of business case that we've got a bunch of cloud services, data being generated in those services via incoming leads or outgoing communication with those leads or any of those general marketing type processes and sales processes. And we were looking for a solution. We were having issues with getting aggregated data from all of those cloud sources back into specific cloud sources. So like in this case, we'd want to know metrics on our SaaS, our Pach Hub, and know stuff like total workspaces created. So this is like an internal value of how important a customer is. And in the same case, last workspace creation date, and account spend within the Hub.

And aside from that, we'd also want to know how engaged open-source users were in our Slack community and many, many more use cases for this that we haven't even explored yet, but we kind of know where we're going with this and know where our problems were. And until this point, we were spending our time looking around through Slack and through Hub database records, or Metabase, or those kinds of tools. And we were trying to onboard our own internal people into various tools that they weren't really comfortable with. So salespeople using Metabase isn't ideal and people using Slack as a source for understanding what our customers are doing really isn't ideal either. We need to combine that stuff in one place to give this unified view on a particular customer or lead. And so that was our issue. And that's how we came across what RudderStack was doing with their Reverse ETL and solved that by that solution.
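The per-account metrics Dan lists (total workspaces created, last workspace creation date, account spend) can be pictured as a single aggregated view in the warehouse. The following is only a sketch using an in-memory SQLite database; the `workspaces` table, its columns, the `lead_metrics` view name, and the sample rows are all hypothetical and are not Pachyderm's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per workspace created in the SaaS product.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workspaces (
    account_email TEXT,
    created_at    TEXT,   -- ISO 8601 date, so MAX() sorts chronologically
    spend_usd     REAL
);
INSERT INTO workspaces VALUES
    ('ana@example.com', '2021-03-01', 40.0),
    ('ana@example.com', '2021-04-15', 60.0),
    ('bo@example.com',  '2021-02-10', 15.0);

-- One row per account, ready to be read by a downstream sync.
CREATE VIEW lead_metrics AS
SELECT account_email,
       COUNT(*)        AS total_workspaces_created,
       MAX(created_at) AS last_workspace_created_at,
       SUM(spend_usd)  AS account_spend_usd
FROM workspaces
GROUP BY account_email;
""")

rows = conn.execute(
    "SELECT * FROM lead_metrics ORDER BY account_email"
).fetchall()
for row in rows:
    print(row)
```

A view like this gives sales and marketing tools one row per lead, instead of asking people to look through raw database records.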

Eric Dodds (06:33)

Great. And just a quick question on that, your experience from the sort of... You have a wide purview because you work on the data and the ops side of things, what were the conversations like internally as people were feeling the pain of this, and did that translate into a lot of requests for data from your team?

Dan Baker (06:55)

Yeah, I think there was just an internal lack of understanding, really. And it became obvious that we needed to combine this stuff in one form. And what I was seeing was that yeah, we could go down the dashboard route and we could give people all of this data in one place, but it was more about certain teams were happy with the tools that they were using. So marketing was happy with using HubSpot, sales were happy using Salesforce, neither wanted to switch to each other. And it became more about let's get that aggregated data or that unified view of data back into those tools so that people can see a true view of that customer at that point. And so yeah, I guess it was more about people not really wanting to budge from where they're at because it's just outside their normal workflow.

Eric Dodds (07:48)

Sure. That is really helpful context. So Dan and I thought it'd be good to quickly run through before we get to the specific use case with Reverse ETL, and Dan can talk through their stack, which is really cool. I thought we'd just quickly go through how data gets siloed in the stack. This is something I think every company experiences on some level. So I'll just go through a quick overview. And then Dan, maybe you can talk about the dynamic of this at Pachyderm. So this is something that I think is really common at a lot of companies, so data trapped in cloud tools. So you have data flowing from websites, apps, internal systems, et cetera. And usually, that lives in some sort of cloud toolset, right? So as Dan was saying, maybe marketing is using a tool like HubSpot or Marketo, sales are using Salesforce. And then the marketing team is using Google Analytics and Google Ads.

And what ends up happening is you have each of those driving reporting in different ways. And then of course that creates misalignment because not everyone is using the exact same definitions and the reporting is different and you have all these challenges. And then when data is trapped in cloud tools and you have that misalignment, and it's hard to do the analysis, to your point, Dan, I think a lot of companies end up in a place where you're just trying to get data out of a lot of those systems. And as crazy as it sounds, a lot of it just ends up in spreadsheets because you have to make decisions, you have to hit quarterly numbers. But that's not scalable, especially with the amount of data that many companies are collecting. Can you talk about just briefly a couple of examples of what were the problems caused by data being trapped in cloud tools?

Dan Baker (09:38)

Yeah. As you quite rightly said Eric, the obvious go-to at this point is that you pull data directly from these sources, these tools, into spreadsheets, like Google Sheets is a great source for that kind of stuff. And we all know there are integrations between all of these tools and Google Sheets, so we can get data out and we can present it to the right people, but it doesn't allow the joining of that data, which is a key part from our perspective. And it was more about... It's kind of a twofold approach, right? So people want to use the tool they're already using, as I mentioned previously, whether it's HubSpot, Salesforce, whatever tool, what everyone's used to using.

And that might be in some cases, it might be people that are just used to using Google Sheets. So let's not discount that as a great way of presenting data. But you want to get that data. You not only want the data from those sources, just as a direct export, but you also want to be able to join that stuff and give people that more true view of that picture. And that's exactly what we were seeing at Pachyderm. When it comes to our Hub system, our SaaS product, that database was more closed off in the sense that it was just a Postgres database and a lot of people don't have access to that. And there weren't the tools in place just for direct exports into Google Sheets. And we needed that data in all those other tools.

Eric Dodds (11:16)

Sure. Well, so let's say you get the data unified. So as Dan and I were talking before the webinar, we said, "Okay well, one of the great things about modern stacks is you can solve that problem by either sending data directly from your websites and apps to the warehouse, which is really common, but then also pulling data from those tools into your warehouse." So now you have unified data and you can do the analysis. So Dan, I would love to know what that process is like for you. But then also you were talking about how even though that was really helpful, you still face the problem of teams wanting that data in their own tools. Which even though you've unified it and solved the problem of data being trapped in cloud tools, you get it out of there and into the warehouse. But then the analysis is sort of trapped in the warehouse, which is its own unique problem.

Dan Baker (12:16)

Yeah. It's exactly as you said there, Eric. Moreover, the more tools you put in front of any kind of team, the less they're going to get used. People are comfortable with one, possibly two, possibly three tools that they use on a daily basis. But the more stuff we chuck at them, the less it's going to get used. And we see this with an awful lot of companies. We see this with dashboards: they're great from a high-level overview, but they don't really get used in the way that you would expect on a daily basis by most of the internal teams.

And so from our perspective, we were trying to get that data not only to be visible outside of those dashboards but more to be actionable outside of those dashboards, within marketing drips, within Salesforce reports so that we can see the status of a lead, how active they are, what they're doing, essentially lead score, based on our own internal metrics to build a bigger picture of that. And that was definitely an issue, and the idea of having that data in the database is great and the dashboards are great, but it feels like where we were at maybe 10 years ago. Maybe not 10 years, maybe five years ago, that was kind of the state of play as a standard thing five years ago. And I think we need to move on from that and be in a place where that data is now freely available in all tools that need that data.

Eric Dodds (13:43)

Sure. Yeah, it's interesting. We've heard lots of companies talk about how unified business intelligence from the warehouse is really powerful because it can really change decision-making in a powerful way. But it's hard to use that to drive the actual tactical customer journey. Like you said, triggering marketing emails or helping SDRs prioritize. Well, let's talk about how you broke those data silos at Pachyderm. So we'll talk about both of those use cases because you had data trapped in the cloud tools and then data trapped, or analysis trapped, in the warehouse. So talk through how you used different tools to get the data A, out of the cloud silos, and then how you got it from the warehouse back into the tools that teams were using.

Dan Baker (14:43)

Yeah. So I kind of looked at this as being a problem from the get-go really, we were working on creating a data platform for Pachyderm and I knew that this was ultimately going to be a problem that I needed to solve. And when choosing tools to implement as part of our customer data platform, that was a key consideration. How are we going to use this data rather than it just being sat within our warehouse or within our BI tool? That was a key part of it. And so from a kind of data source perspective, like we were having issues with people not really understanding Google Analytics, HubSpot wasn't being really used to its full potential because people didn't know how to access that stuff inside HubSpot. And the same applies to all of those other tools.

And so as I mentioned, the key part of that was how do we get that data. On one side, we want that in a dashboard so that people can see that stuff. And that works to a degree and that was working for us to a degree, but we needed that more aggregated view on that data, so total numbers of things, and cost analysis and that kind of stuff. And so the key part was really piping that data back into... Processing out of the warehouse, storing it, materializing back to the warehouse using the tool that we were using, Sigma, and then processing that back through with Reverse ETL to solve the problem that we were facing.

Eric Dodds (16:26)

Yeah. And just give us a quick overview of what does it look like to use Reverse ETL? I'll do a quick demo, but we'd just love to hear from your perspective. I mean, what is it like to use that day to day? So you have some sort of analysis in the warehouse that you've derived using Sigma and then what does it look like if you want to take that and push it back to the stack?

Dan Baker (16:47)

Yeah so, it's incredibly simple. We imagined this was going to be way more difficult and I architected a bunch of solutions that were going around the houses or as I like to say, sticky tape solutions that would almost work or do what we needed. And when RudderStack came to us with this feature, this saved the day really. It's so easy, we just turn it on, we point it to a view within our database that Sigma already generates. It writes that straight back out to the database. We process data in Sigma, we store that back to BigQuery, and RudderStack is able to read directly from BigQuery that view, which is essentially just an SQL query. And it pipes that data back into whatever destinations we're choosing in the same way that we were already used to doing with RudderStack in terms of connecting sources and destinations.

We just have a small config level that lets us map fields. So we can say this field from the SQL result maps into this field, in this destination. And we can pipe that through to any destination that we like within RudderStack within reason, but all the ones that we're looking at. And it just unlocks that process in a really simple way, it lets us do that on a scheduled basis. So we can choose a time-frequency in which we want to send that data back in. And our stuff is pretty flexible, we're going maybe hourly in some cases and daily in other cases for this data because it's not critical at that point. But it gives us that true flexibility to use a tool that we're already used to and in a very simple way.
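The field mapping Dan describes, where a column in the SQL result maps to a field in the destination, can be illustrated with a short sketch. This is not RudderStack's configuration format or API: the `FIELD_MAP` dictionary, the `row_to_event` helper, the column and field names, and the identify-style payload shape are assumptions made up for illustration.

```python
from typing import Any, Dict, List

# Hypothetical mapping: warehouse column name -> destination field name.
FIELD_MAP: Dict[str, str] = {
    "total_workspaces_created": "total_workspaces",
    "last_workspace_created_at": "last_workspace_date",
    "account_spend_usd": "hub_spend",
}


def row_to_event(row: Dict[str, Any], id_column: str = "account_email") -> Dict[str, Any]:
    """Turn one warehouse row into an identify-style event (assumed shape)."""
    # Only mapped columns are forwarded, renamed to the destination's fields.
    traits = {dest: row[src] for src, dest in FIELD_MAP.items() if src in row}
    return {"type": "identify", "userId": row[id_column], "traits": traits}


# Sample rows standing in for the result of the warehouse view's SQL query.
rows: List[Dict[str, Any]] = [
    {
        "account_email": "ana@example.com",
        "total_workspaces_created": 2,
        "last_workspace_created_at": "2021-04-15",
        "account_spend_usd": 100.0,
    },
]

events = [row_to_event(r) for r in rows]
print(events[0])
```

In the setup described in the webinar, the sync tool handles this step on a schedule (hourly or daily), so nobody has to write code like this by hand; the sketch only shows what the mapping accomplishes.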

Eric Dodds (18:39)

Yeah. We'll take a look at that in a second. One quick side note, you had mentioned this and I just thought we have to include it because I thought it was such a good point, when we were talking earlier this week, you said data validation was a big piece for you when you started to get all of the data into the warehouse and analyze it. Could you talk through the disparities that you saw using cloud tools and what that looked like when you actually unified it and then started leveraging Sigma to do deeper analysis on the raw data itself?

Dan Baker (19:13)

Yeah, we're in a strange position where I think anyone who's just using these kinds of cloud tools is blindly accepting the data that they're presented in the admin panels or the metrics panels for those tools without any question. And one thing that we were suspicious of, and a lot of people seem to be suspicious of this is the ad traffic data, PPC data. We would spend a lar