Lessons from Heroku and Mattermost: How to build a customer data stack that scales


In this on-demand webinar, Eric Dodds talks with Alex Dovenmuehle about the best approach for architecting data stacks that scale well as your business grows. Alex shares lessons learned from data engineering work at startups like Mattermost and huge businesses like Heroku.

When companies are early, it can be difficult to understand the long-term impact of decisions about the data stack. At the same time, when organizations become large, it can be difficult to implement flexible, efficient data architecture that leverages newer standards in technology. In both cases, getting the data stack right is key to enabling every team, from product to marketing, to build competitive advantage.

Here's what Eric and Alex cover:

  • A long-term view of the data stack: what are the problems at scale that early stage companies don’t think about?

  • The essential toolset: what does every company need, regardless of scale?

  • How business models, i.e., B2C vs. B2B, influence data stack architecture and tool choice

  • Evaluating near-term and long-term costs

  • Beyond the tools: the other critical components of a scalable data stack (data engineering resources, team structure, executive buy-in, etc.)

  • Q&A

Speakers

Eric Dodds
Growth at RudderStack

Eric leads growth at RudderStack and has a long history of helping companies architect customer data stacks to use their data to grow.

Alex Dovenmuehle
Co-Founder at Big Time Data

Alex is obsessed with driving business impact with data and automation. He's always looking to create automated and scalable business processes, supported by software and data, that allow businesses to scale their operations.

Transcript

Eric Dodds (00:00:00)

A couple of minutes past the hour, so go ahead and get started, officially kick this thing off. So I'm Eric Dodds. I lead growth at RudderStack, which we'll talk a little bit about that tool in the context of the webinar. But today what we're going to talk about is the data stack journey. So we have a really exciting guest, Alex, who you've already heard me bantering with. And we'll dig into his background a little bit, but wanted to give a quick overview. So the data stack journey is really about how in the age of data proliferating inside of an organization, the data that's collected and produced by an organization, one of the big challenges that businesses face is all of the effort, processes, and tooling that goes into having a data stack that works across their company, across teams and helps them build competitive advantage.

Now, the challenge is that businesses change. So what we're going to talk about today is how to architect a scalable stack. So what are the tools that you need to put together at various stages of growing a business, relative to your customer data stack? And what does that look like through the life cycle of the business? So here are the things we'll talk about. So we'll talk about why this stack needs to be dynamic. We'll talk about a long-term view of the stack, so what are the problems that we typically see at scale? We'll talk about the stages of companies as we've broken them out. There are different ways to do that, but we've put together a simple framework to walk through. And then we're going to talk through the toolset and cost at each stage of the business. Briefly touch on business models and how those influence the stack. And then we'll touch on things outside of the stack itself in terms of team structure and other stuff like that.

So without further ado, I would like to introduce Alex. And you have a really interesting background as an engineer in several different contexts, but you are currently a consultant at Big Time Data, which I want to hear about. Before that, you were at Mattermost and before that, you were at Heroku. And I wanted to point that out to the audience because you have a really interesting perspective. Heroku is a huge organization, of course, part of the Salesforce empire and doing things at a massive scale. Mattermost is really a startup. A large, fast-growing startup, but on the other end of the spectrum from Heroku. Now, as a consultant, you see all sorts of different things. So do you just want to give us a little bit of your background in your own words, and then we can dive into the content?

Alex Dovenmuehle (00:02:53)

Perfect. Yeah, so prior to Heroku, I was basically like a full stack developer, front end, back end. I just like to do it all. I joined Heroku about six years ago, and that's where my data engineering journey began. Like you touched on, the nice thing about Heroku was just the scale of everything was bigger, not only because you're part of the Salesforce borg, but also just Heroku itself had tons of data. They're processing just billions and billions. The number is higher than billions. What's over billions, trillions? I don't know. Many billions of requests per month. So it's a pretty big scale there. And basically got into the data engineering stuff there, which at the time was very not good. We were still using bash scripts and our data warehouse was literally a Postgres database. It was a giant Postgres database with tons of memory and stuff, but it was still just Postgres.

So I ended up moving them to a more modern data architecture, and I'll touch on that later as well. And basically, about 14 months ago, we moved to Mattermost. And the idea was, we're going to take everything that we learned at Heroku and just replay the playbook at Mattermost, which when we joined Mattermost, they literally had no data infrastructure at all, as many series A, series B companies do. So at Mattermost, we were able to build their whole data stack, analytic stack, go-to-market automation, and all this stuff that we built. And then now we've recently created our Big Time Data consulting company, because we saw, "Hey, everybody has all these problems. They're all pretty similar." It's like, "I think we can help all these different customers figure it out."

And that's where this idea of the data stack journey started to form in my head because there are differences between talking to companies that are at the seed stage or earlier, versus like a Heroku size, or even a Mattermost size. There are some differences between how these companies are operating, and what their concerns are and how do you build a data stack that allows you to grow from a seed company to a Mattermost size company, to the Heroku-sized company, and beyond that?

Eric Dodds (00:05:37)

Sure. Love it. Well, let's dig in and let's hear about that experience. So we'll just go back and forth here. I think we've actually already touched on this, so I'll intro it and then would love your thoughts on it, based on what you just talked about in terms of the stages of the company. But really the question is, why does your data stack have to be dynamic? And I think the best way is, Alex and I were talking about this in prepping for the webinar is, if you even just think about the last 10 years, you had companies who adopted, at the time, amazing basically on-prem infrastructure to handle all their data stuff.

And then in a relatively short amount of time, you have this massive migration to the cloud, and then you have the first wave of warehouses. And then the current phase is from early warehouse solutions to Snowflake. So, Alex, you want to give us a brief definition of the data stack journey in that context, right? We've seen a lot of stuff happening, we live in a great age as far as tools and all that stuff, but what is the data stack journey in a concise definition?

Alex Dovenmuehle (00:06:57)

Yeah, I like the quote that we have there. I don't want to read it, so I'm not going to, but definitely take that into your mind when you're reading that. But it's, how do I start based on the size of my company, and the number of customers, and the various things that would make you choose different things? Based on those, what tool should I pick? What technologies? How should I even, from an organizational perspective, organize who owns what and where those things live? And then, how do I leverage all this stuff so I'm actually getting data out of it the whole time? And then how does my data stack grow with me so that I'm not having to go back and do all this rework, because I made the wrong decision five years ago, and now I have to spend a year rewriting everything to get to that next modern data stack?

Eric Dodds (00:07:55)

Sure. Well, let's talk about the problems at scale. So this is something you've seen directly. And why don't we just walk through each one of these? I can give a brief explanation, and then why don't you talk about briefly the way that you've seen that play out at scale when it becomes a real issue. So internal tools become burdensome and costly to manage. We see this a lot where you're evaluating tools and you say, "We have a little bit more of a custom need. We're just going to build this data infrastructure ourselves." What does that look like when it becomes problematic at scale?

Alex Dovenmuehle (00:08:32)

Yeah, so obviously I think there's a few things. A is, can your custom solution actually satisfy all the needs that you actually have for it? Can you actually execute and build that, A? And then B, it's like, "Well now I have to pay these highly paid engineers to go build the thing. That takes time." And then it becomes a technical burden. I won't say debt because it could be amazing, but you know what I mean? Somebody has to know how to operate this thing. Somebody has to be monitoring this thing, make sure that it's working. If the person who built it left or whatever, well, now they have to train, and maybe they don't understand the full context. There are just layers and layers and layers of it that really can bite you.

So I think being able to pull something off the shelf, and I don't want to mention RudderStack already, but I think what's nice about RudderStack is, it's just having the data warehouse as the center of it, it's like, "Okay, we're getting the data into the data warehouse. Now, okay, I can control the data warehouse and I can make everything work in there." So I can still build those custom things that maybe I thought I needed elsewhere, but now it's just like, "Oh, it's in my data warehouse. I'm doing stuff with that all day."

Eric Dodds (00:09:52)

Sure. Yeah, I mean the way that we refer to that a lot is, there's a point at which you're building a product, like a data infrastructure product, and that takes away focus from the product that you're building for the customers who are paying. Lack of unified data, I think this one's pretty simple, and everyone's experienced this. Data gets trapped in different silos.

Alex Dovenmuehle (00:10:16)

Yeah, and I think the worst iteration of this is when you have ... Let's say you have Looker or some data visualization tool, and somebody goes and runs a report that says like, "Oh, this customer paid us X, Y, Z amount of dollars last month." But then somebody else, some sales guy goes into Salesforce and he's running his own reports in Salesforce and it says, "Oh, he only gave us this much." And then now people are like, "Well, which one do I trust now?" And the sales guys can be like, "Well, I like Salesforce, so I'm just going to trust Salesforce." And then the customer support person is looking at a totally different view.

So being able to, especially at Big Time Data, we really think about getting that data into the data warehouse so that you can have those real sources of truth that says, "This is the real source of truth." And then using reverse ETL to get that data to those other systems so that the data is consistent across all those systems. And then that means people are trusting the data, they're going to the right places for the data and all that kind of stuff.
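To make the pattern Alex describes concrete, here is a minimal Python sketch of a warehouse-as-source-of-truth sync. It is an illustration only, not anything shown in the webinar: the in-memory SQLite database stands in for a real warehouse, and the `customer_revenue` table, the `crm` and `support_tool` destinations, and the `reverse_etl` function are all hypothetical names chosen for the example.

```python
import sqlite3


def build_warehouse():
    """Create a stand-in warehouse (in-memory SQLite) holding the agreed revenue figures."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_revenue ("
        "customer_id TEXT PRIMARY KEY, last_month_usd REAL)"
    )
    conn.executemany(
        "INSERT INTO customer_revenue VALUES (?, ?)",
        [("acme", 1200.0), ("globex", 450.0)],
    )
    conn.commit()
    return conn


def reverse_etl(conn, destinations):
    """Copy warehouse values into every downstream system, overwriting stale copies.

    `destinations` maps a (hypothetical) system name to a dict of its records.
    Returns the number of warehouse rows that were synced.
    """
    rows = conn.execute(
        "SELECT customer_id, last_month_usd FROM customer_revenue"
    ).fetchall()
    for records in destinations.values():
        for customer_id, amount in rows:
            records[customer_id] = {"last_month_usd": amount}
    return len(rows)


warehouse = build_warehouse()

# The CRM starts with a stale figure for "acme"; the support tool has nothing yet.
destinations = {
    "crm": {"acme": {"last_month_usd": 900.0}},
    "support_tool": {},
}

synced = reverse_etl(warehouse, destinations)
print(synced, destinations["crm"] == destinations["support_tool"])
```

After the sync, a report run in either downstream system shows the same revenue figure as the warehouse, which is the consistency the conversation is pointing at. A real setup would use a warehouse client and each tool's API in place of the dicts.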

Eric Dodds (00:11:20)

Sure. A single view of the customer, customer 360 are just buzzwords that are thrown around so much, but it really is a big challenge for companies who ... Really it stems from lack of unified data. So you have problems around different versions of the same customer record, essentially.

Alex Dovenmuehle (00:11:44)

Yeah. I think everybody's trying to crack this nut, and I think a lot of people are not that good at it. And that's something that we try to bring to our customers. That's something that we built at Mattermost is, give us the customer 360 that can show you, "Here's all the sales metrics about this customer. Here are all the customer support tickets and stuff like that. Here's the product usage stuff." All in one place where they can go and see that kind of thing. And yeah, as you said, if you have all this stuff siloed around and people are, again, using different views of the data, it just gets very unwieldy and really inefficient I think is the term, because then you have to have people running around trying to be like, "Well, what's the real a