WEBINAR

Future-proof User Analytics Tables With dbt and Rudderstack Schemas

Duration: 1 hour

In this webinar, Eric Nelson, data engineer and analyst at Mattermost, shares a better way to clean and combine your customer event data from every source. Traditional methods involve constantly updating SQL or building inside of BI tools, which silos data.

Eric’s method uses RudderStack’s standardized event schemas and advanced dbt modeling to automatically generate clean tables for detailed clickstream reporting. The best part is that the dbt models are designed to automatically process and include new schemas as they show up in the event stream.

What we will cover:

  • The problem: customer data is messy
  • How companies solve this and why traditional methods are sub-optimal
  • A better way: standardized RudderStack schemas and data modeling with dbt
  • Live model walkthrough

Speakers

Eric Dodds

Head of Product Marketing at RudderStack
Eric leads Product Marketing at RudderStack and has a long history of helping companies build practical data stacks to fuel growth.

Eric Nelson

Analytics & Data Engineering at Mattermost
Scaling analytics infrastructure and data pipelines @ Mattermost.

Transcript

Eric Dodds (00:00)

Welcome, everyone, to the live webinar. We have Eric Nelson from Mattermost today, and we're going to dig into some pretty gutsy stuff related to dbt and managing schemas and all sorts of interesting stuff, which I'm super excited about. We'll do a couple of intros first if you want to go to the next slide. It might be the agenda.

Eric Nelson (00:32)

Yeah. It's got the agenda here, but we can give a few intros as well.

Eric Dodds (00:37)

Yeah. I think the intro slide is after this one. What we're going to talk about today is really how you manage tables that drive critical analytics related to customers. By drive tables, I mean the tables that you actually use as the foundation for building your BI, which ultimately turns into all sorts of reporting across the organization. Anyone who has worked with customer data knows that it's extremely messy, and there are a number of ways to solve for this. The reason we wanted to have Eric do a webinar with us was we've just noticed that he and the team at Mattermost have done some really incredible things around leveraging dbt and doing some pretty interesting things around scale in a way that keeps things low maintenance. I think that's really important for companies, especially if they're in a growth phase or dealing with an enterprise-level volume of events. Mattermost collects a huge amount of events. Having a low maintenance way to deliver rich analytics is pretty great, so that's what we're going to look at today.

If you want to click to the next slide, we can do some quick intros. I'll do mine first since it's really quick. I'm Eric. I run Growth at RudderStack and love digging into the technical side of things. Eric Nelson, do you want to give us a little bit of your background? Where were you before Mattermost? How did you get into data engineering and analytics, et cetera?

Eric Nelson (02:25)

Yeah. Sure. I started my career at Salesforce, and then moved over to Heroku, which is a Salesforce company. From there, I transitioned over to Mattermost, but basically it was a slow burn. Since my degree was in management information systems, I've always been kind of heavily involved in database management and data engineering, so it was just more or less honing my skills. From there, it was kind of a slow progression, and at Heroku is where I started ramping, doing a lot more collaborative stuff using open-source, Git, command line, and all of that stuff. Shout out to Alex Dovenmuehle, who is kind of my mentor as a data engineer. I'm more of a hybrid analytics engineer, so analyst and data engineer. I followed Alex over from Heroku to Mattermost, where we kind of set up a whole analytics infrastructure there.

Eric Dodds (03:22)

Very cool. All right. Let's dig in. Let's talk about messy customer data. This is a huge problem, even for companies that have really strictly defined tracking plans and data governance protocols in place across teams. Stuff just changes, especially when you're moving really quickly. Let's talk a little bit about why this is, if you want to click to the next slide. I can talk through a couple of these points, and then would just love to hear about your experience across the companies that you've worked at. One of the reasons is customer data is never static. Marketing, product, et cetera are constantly trying to optimize, which means that the actual interfaces that customers are using, i.e. websites, apps, whatever, are constantly changing, which means that the data that they produce that represents customer behavior is also changing. That's just very challenging to keep track of, especially when you're trying to go through really fast iteration cycles. The reality is that the ops team, data engineering team, analysts have to be able to iterate very quickly to react to that. Could you give us just a couple specific examples of maybe how customer data has changed at Heroku and Mattermost?

Eric Nelson (04:54)

Yeah. Especially at Mattermost, I think, because of the way that the product is designed, there's a lot of functionality and a lot of enhancements that are added constantly to give a monthly release cycle for the on-prem offering, and then a bimonthly or a biweekly release cycle for the cloud offering. Really, what happens is there's a lot of new features out and a lot of new things that we're tracking. It just produces a new column or a new property in a table, an existing table, and that is the base raw table. Then we have a transform table that needs to be surfaced in BI tools. When that new column is added, there's no automated way or there hasn't been in the past to just incorporate that property and ensure that all of this data is bundled together, especially because the way that our product functions is we have a web app interface, a desktop and a mobile, and all of those data sources go to different areas within our raw database. That causes a lot of issues trying to blend those schemas and those click-stream usage data together.

Eric Dodds (06:01)

Sure. If you think about the example that you just gave, where you have three different app infrastructures, those are running three different sets of code, a bunch of different platform specific SDKs, all of that makes data governance very hard. There's a point at which we want to say it very explicitly, whether you're doing your tracking plan in a Google Sheet or some sort of tool or whatever, the amount of work that it takes to align the payload structure across teams, across platforms, sometimes taking the time to do that doesn't actually benefit you as much as getting the feature out there and seeing if it works and seeing if it actually helps optimize user experience. That's a challenge that every company faces. Now, to some extent, data governance tooling, I think, still has a long way to go to manage some of the complexities across schemas and across platforms, but it's hard because a lot of times the data engineer and analyst roles also sort of have to pinch hit in terms of the data governance piece of it just to keep their jobs more manageable, and there a lot of times isn't a lot of clear ownership. Has that been your experience?

Eric Nelson (07:19)

Yeah. I think the one in which we rapidly hire engineers as well, ownership, there is no clear kind of articulated line. It's a lot of ebb and flow, a lot of new hires are coming in and contributing, and being open source, we also have so many outside community contributions that things are changing constantly. Sometimes we even have hackathons where people test our infrastructure, and that produces properties in our raw data warehouse. All of these things work together to produce a lot of what would be intensive data and analytics engineering work.

Eric Dodds (07:57)

Sure. All right. Let's talk about some of the traditional ways that companies solve this, I think is the next section in the slide deck.

Eric Nelson (08:08)

Yeah. Let me move that for you. Historically, for me, I guess I'll just jump into this, a full data engineer is dedicated to SQL and the DDL aspect of things and making sure that all of these table definitions are clearly defined, all of their columns are in there, and adding any new information, and sometimes there's latency in between an addition to the raw warehouse and then actually managing the transformed one, because you have to go through a data engineering team or other people in order to get those things added, and they have a backlog of requests, so doing things dynamically just isn't really possible. Then I'll let you speak to the next two.

Eric Dodds (08:56)

Yeah. One thing that we see a lot is you can do this in a BI tool. I think the reason that throwing lots of SQL work at it just is very, very common is that it gives you more flexibility. It's really hard to scale that, but once you've actually done the work, you can do way more with the data because you have a set of tables that you can use in a variety of ways throughout the organization. It sort of gets at the whole self-serve analytics thing as a foundational pillar there, but there's just a ton of work there, and it also requires a skill set that's in very short supply. Someone who has your skillset around data engineering, analytics engineering, et cetera, and doing that at the SQL level, very time intensive and really hard to scale that from a skillset standpoint, but it can be way easier to do some of those things in a downstream BI tool.

You have all this raw data, you solve for all of the edge cases and all the cleanup in the BI tool, but the problem there is that all of that work is very hard to share across the organization. It just lives in the BI tool. You can't really action it. People get access to it, so it still kind of gets at the self-serve analytics thing, but it's far less flexible because essentially you have a set of reports or the ability to build reports based on this, but to actually do anything else with the data requires a lot of manual work. Exporting data, that can be a huge amount of data. There's just all sorts of challenges there. The insights that come from the foundational data aren't really shareable outside of what you can derive from the reports. It's interesting.

Then as simple as it sounds, a lot of people just throw a ton of headcount at it. Whether that's on the data engineering side, where you have a ton of people hammering out SQL or doing constant cleanup projects, or just hiring a ton of analysts who hammer on the BI side of things. At some point, that's just not cost effective. We say okay, for a company of our size, having four analysts just try to keep our basic reporting clean doesn't really make sense. We need to be answering [inaudible 00:11:21] questions. Those are all very common solutions and very understandable, but what we're here to talk about today is a better way to do this with the underlying structure. Enlighten us.

Eric Nelson (11:40)

All right. The better way that I've identified after having instrumented RudderStack throughout our web properties and throughout our product itself is really creating a set of standardized schemas that clearly articulate their purpose. We have product schemas, as well as release candidate and quality assurance or dev schemas for testing so that we're able to easily identify our targets and which tables need to be blended for which purpose, because obviously we still surface a lot of our quality assurance data when devs are testing to make sure that the properties are being collected properly and that we're collecting all of the data that we're intending to and that the events are firing properly.

What we do once we have those standardized schemas is really leverage dbt and the macros that you're able to build within dbt to create low code master user analytics tables for various purposes, whether it's website interactions, whether it's our product usage. We have several master user analytics tables so that we can surface those insights or those blended click-stream or event data into a single table in our [inaudible 00:13:03], where we do the majority of our dashboarding and reporting. Really, the benefit of data modeling with dbt and using these macros is it's low code. It's modular, it's customizable. You have your list of schemas, you know your targets, you input your list of variables for table inclusions, schema inclusions, exclusions, and then from there it more or less allows you to blend all of these tables together without writing a massive union script that needs to be constantly updated and appended to when new properties are involved.

There's also other benefits where you're creating dummy columns or null values where properties don't exist for, say, one event type and they exist for another event type. It really allows for a simplified blending of all of this data into that single master table. Also, it allows you to account for things like more or less typos from devs who are working on one area of the product. Say they're missing an underscore in a column name and there's an underscore in the other column name for a separate property or a separate event. It allows you to capture all of that in a single table in separate columns and then you can coalesce and blend in your Looker instance, and it just makes everything a little bit more scalable. Not to mention... go ahead.
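The coalescing step described above can be illustrated with a few lines of Python. This is a minimal sketch, not Mattermost's implementation: the column names `channel_id` and `channelid`, the event names, and the `coalesce` helper are all hypothetical, and in practice the same logic would be a SQL `COALESCE` in the BI layer.

```python
from typing import Any, Optional


def coalesce(*values: Any) -> Optional[Any]:
    """Return the first value that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical rows from a blended table: one event wrote `channel_id`,
# another wrote the typo variant `channelid`. Each row has both columns
# because the blended table carries every column seen in any source.
rows = [
    {"event": "post_created", "channel_id": "c1", "channelid": None},
    {"event": "post_viewed", "channel_id": None, "channelid": "c2"},
]

# Merge the two spellings into one logical value per row.
channel_ids = [coalesce(row["channel_id"], row["channelid"]) for row in rows]
```

Keeping both spellings as separate columns in the blended table means no data is lost at load time; the merge happens later, where it is cheap to change.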

Eric Dodds (14:29)

I was going to say, I think one thing that would be good, and just to make sure everyone's on the same page, when we talk about standardized schemas, I'd love to hear from your point of view, what does that mean, and especially in the context of cross property, because I think one thing that I think is helpful and just for the audience who may not be familiar with it is that the schemas are standardized out of the box, and I think that's a really big piece of the foundation of why you can do this, but I would love for you to speak to that just a little bit in case anyone's not familiar with standardized schemas.

Eric Nelson (15:03)

Yeah. The standardized schemas that RudderStack produces really allow us to identify more or less the various properties that we're tracking. We have a customer portal where it automatically produces a set of identifying features, like pages, tracks, users. You can track all of those things. Then you can add additional data into those schemas, so you know exactly what you're targeting, where it's going, and you can specify the various RudderStack keys in order to either funnel to QA data, to release candidate data or to production data when you push your code live. It just really simplifies... go ahead.

Eric Dodds (15:47)

I was going to say, just thinking through a specific example here in the context of Mattermost, let's say you have a user signup event or account create event. The schema for that is going to be the same across mobile, web and other platforms. When the devs are instrumenting it, it's going to produce the same payload structure no matter the platform.

Eric Nelson (16:13)

Exactly, yeah. Produces the same properties, same payload structure. Everything is pretty uniform. It really simplifies that whole blending of data.

Eric Dodds (16:23)

Great.

Eric Nelson (16:24)

All right. Another one of the benefits from the macro is it allows you to track these nested data sources. When you're blending all these tables together, you want to be able to track the source of the data. We have things like anomaly detection algorithms that we run, so we identify when there are odd spikes or dips in data, so being able to track the source of that data using this Master User Analytics table is key in troubleshooting for any sort of bugs or errors that we might encounter, or any sort of anomalous data interactions. Again, it's customizable, so it allows you to do it across the board. We have a customer portal, we have our web app, our desktop app where we can blend all of this data together and specify nuances to each of those if you want to go in and actually modify the macro itself.

All right. The next portion kind of walks you through the macro that I've created currently. Eric, I don't know if there's anything else that you want to speak to prior to jumping into this and the live walk-through.

Eric Dodds (17:31)

Let's do it. No, this is great.

Eric Nelson (17:34)

All right. Basically, what you get with this macro is you have a set of variables that you can input. It goes in the model file or the individual relation that you're building and you specify what schemas you want to target, the standardized schemas we mentioned earlier, and then whether or not there are any table inclusions or exclusions. If you know there's a specific table that's generated, whether it's erroneously or generated and is of no use, you can exclude those, or if there's only a subset of the tables within a schema that you want to blend together, then you can just include those and you just add them to a list. Essentially, after that, what it does is it pulls these relation objects, so dbt creates relation objects from your warehouse looking at the information schema. It identifies all of those tables and then from there it looks at those relation objects, iterates through them, retrieves columns and creates dictionaries for all of the columns contained in each relation. Then from those column dictionaries that are tied back to a key relation, you create a super set, where you're capturing the column name and a column data type. If it exists already in the super set, then it won't be added, and if it doesn't exist yet, then it's appended.
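The superset step can be sketched in plain Python. This is only an illustration under stated assumptions: the real macro is written in Jinja and reads columns from the warehouse's information schema, whereas here the relation names, column names, and data types are invented and hard-coded.

```python
from typing import Dict

# Hypothetical stand-in for the per-relation column dictionaries:
# relation name -> {column name: data type}.
relation_columns: Dict[str, Dict[str, str]] = {
    "mobile.events": {
        "user_id": "varchar(64)",
        "timestamp": "timestamp",
        "os": "varchar(32)",
    },
    "web.events": {
        "user_id": "varchar(64)",
        "timestamp": "timestamp",
        "page": "varchar(256)",
    },
}


def build_superset(relations: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Collect every column seen in any relation, keeping the first data type."""
    superset: Dict[str, str] = {}
    for columns in relations.values():
        for name, data_type in columns.items():
            # Already in the superset: skip. Not there yet: append.
            if name not in superset:
                superset[name] = data_type
    return superset


superset = build_superset(relation_columns)
```

Because Python dictionaries preserve insertion order, the superset also fixes a single column order that every relation can later be projected into.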

What you're doing when you're actually generating a build script is you're looking through both of these dictionaries and you're saying okay, if this relation has this column, then we're good to go. If it doesn't, we'll cast a dummy column as a null value with the right data type and we'll more or less add that to that. We're looping through each relation, so we're adding it as a column property, and we add it in there so that they can be unioned in a way that you don't encounter any errors, because you know you have to have the exact type of data in order to union successfully. 
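A rough Python sketch of that build step follows. It assumes a column superset has already been computed, uses invented relation and column names, and emits `union all`; whether the actual macro uses `union` or `union all` is not stated in the talk.

```python
from typing import Dict

# Hypothetical inputs: per-relation columns and the superset across them.
relations: Dict[str, Dict[str, str]] = {
    "mobile.events": {"user_id": "varchar(64)", "os": "varchar(32)"},
    "web.events": {"user_id": "varchar(64)", "page": "varchar(256)"},
}
superset: Dict[str, str] = {
    "user_id": "varchar(64)",
    "os": "varchar(32)",
    "page": "varchar(256)",
}


def build_union_sql(relations: Dict[str, Dict[str, str]],
                    superset: Dict[str, str]) -> str:
    """Build one SELECT per relation with identical column lists, then union them."""
    selects = []
    for relation, columns in relations.items():
        expressions = []
        for name, data_type in superset.items():
            if name in columns:
                # The relation has this column: select it as-is.
                expressions.append(name)
            else:
                # Missing column: pad with a typed NULL so the union lines up.
                expressions.append(f"cast(null as {data_type}) as {name}")
        selects.append(f"select {', '.join(expressions)} from {relation}")
    return "\nunion all\n".join(selects)


sql = build_union_sql(relations, superset)
print(sql)
```

Every SELECT lists the superset columns in the same order and with the same types, which is what lets the union run without type or column-count errors.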

I guess what I'll do now is just a live model walkthrough to show you what I'm talking about, as opposed to blabbering on here. Let's just jump right into it. I have most of what I need up, I believe. A lot going on in this screen, but what you'll see to the left here, this is the helper dot SQL file that we used to build our macros that we referenced in our model files for dbt. There are several different nested macros and macros, but basically what you're getting is you're pulling in these RudderStack tables and you're iterating through them, but I'll show you what that looks like in real time. Let me just execute some code. I've just got to move some Zoom stuff out of the way here. All right. 

What we'll do is we'll go to our repo, and then we'll say, make dbt bash. We're basically containerizing and running a dbt image so that we can execute this code locally and then we can examine the log file in the SQL that it produces. dbt run, and then we'll specify the target as prod so that we're just running this job as though it's a production job and it will actually append data if there is any to append to our production tables. Then we'll do the model file user events telemetry, which is a blended table of all of our mobile, desktop and web app data, so all of our event data. Any user interactions that we're currently tracking will appear in this table across all of the various operating systems and platforms.

What you do is you hit run here and it will iterate through. This is the actual file itself. As you can see, once I expand this up here, it's very low code. You have 10 lines of code, only nine of which actually have text in them. Once you build up that macro, which is kind of the brunt of the work and it's modularized and we can actually push it live and make it available to people so that they don't have to build it themselves, then you just need to input these values here, like I was saying, the variables where you're specifying the schema that we're targeting, which is our mobile production schema for this one. Let's open the actual user events telemetry as opposed to the mobile schemas one. Let's go into... here we go. You specify some dependencies if you need to, where you're referencing various tables, and then you specify the schema, the database it's pointing to, and then the tables that you want to include in this instance. Then you're just [inaudible 00:22:21] these relations, and it produces.

All right. It already ran. We run this job pretty frequently, so there was zero rows actually inserted because it runs so frequently. Then it produces a log file. This log file is going to show you what is actually generated by executing. You'll see why you would never in a million years want to write all of this [inaudible 00:22:49]. What you're seeing is this block here, it's not the prettiest looking, but it is essentially the list of all columns and properties captured by any event that we're currently collecting in the platform. You can see, if I were to zoom in a little bit here... no zoom option, but there's a cast, and what you're doing is this is null column, for instance, and you're casting it as a character varying and then this length. What you're doing is you're basically identifying the max length of any of the columns for this specific context page search property, and you're making sure that you're incorporating that max length. 
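The max-length handling can be sketched as a small Python helper. This is an assumption-laden illustration: the `character varying(n)` type strings mirror what was shown on screen, but the function names and the sample lengths are hypothetical.

```python
import re
from functools import reduce
from typing import Optional


def varchar_length(data_type: str) -> Optional[int]:
    """Return n for 'character varying(n)', or None for any other type."""
    match = re.fullmatch(r"character varying\((\d+)\)", data_type)
    return int(match.group(1)) if match else None


def widest_type(current: str, candidate: str) -> str:
    """Keep the wider of two varchar types; leave other types unchanged."""
    current_len = varchar_length(current)
    candidate_len = varchar_length(candidate)
    if (current_len is not None and candidate_len is not None
            and candidate_len > current_len):
        return candidate
    return current


# Hypothetical types observed for one property across three tables.
observed = [
    "character varying(128)",
    "character varying(512)",
    "character varying(256)",
]
widest = reduce(widest_type, observed)
```

Casting every table's column, and every NULL placeholder, to the widest observed length avoids truncating values from the table with the longest strings.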

Then if it doesn't exist for a specific table, you're casting it as a null value as well. Basically, everything's taken care of. That column's super set I mentioned compares to the relations and it iterates through and then creates this one long loop of several tables. This is just one of the tables that we're unioning, but it occurs up here as well. This is all one SQL script. As it was said earlier, you would never want to write this yourself. It also creates those nested visual relations as well. Go ahead. 

Eric Dodds (24:16)

Yeah. Let's talk about the nested visualizations in a second. One thing, and this is somewhat unrelated, but it's just really fun to see in the wild. I remember when Mattermost migrated from Segment to RudderStack. One thing that's interesting and convenient is that we're API compatible. I love that you're able to just use the existing tables in your warehouse of [inaudible 00:24:41] Segment and we're just writing directly to those tables because the schemas are exactly the same from an API standpoint. That's something we talk about a lot, but it's really fun in the wild to see the incoming data all come from RudderStack, but you didn't have to do any sort of renaming or restructuring in the warehouse. That just makes me very happy.

Eric Nelson (24:58)

Makes my life a lot easier too. I was very concerned during the transition period about that, but it turned out that all of the properties were pretty much identical.

Eric Dodds (25:09)

Yeah. It works really, really well. It was fun to see that in the wild. The nested visualization, let's talk about the practical... I think we can infer some of the practical benefits, but I'd just love to know, what does that look like at Mattermost? Why is that a big deal? How does it make things easier, et cetera?

Eric Nelson (25:32)

Yeah. The thing that we really like to do... obviously, you want to identify, say, I'm troubleshooting an issue for the mobile team. We have this Master Analytics Table, but you only want to see mobile events. What it allows you to do is identify your mobile schema, filter by that schema, and include only the events that are within that. Then if you want to target a specific mobile event, there's literally a hierarchy of nesting that occurs, so you can track downstream. Say you want to know how many people are downloading plug-ins or admins are downloading plug-ins from their phone. You could track plug-in download events from mobile by admins, so it just allows you to really track that downstream effect and down the funnel.
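The drill-down described here amounts to successive filters on the blended table. The sketch below uses an in-memory list in place of the warehouse table; the `source_schema`, `event`, and `role` column names and all values are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical rows from a blended analytics table that records
# which schema each event came from.
rows: List[Row] = [
    {"user_id": "u1", "source_schema": "mobile_prod", "event": "plugin_download", "role": "admin"},
    {"user_id": "u2", "source_schema": "web_prod", "event": "plugin_download", "role": "admin"},
    {"user_id": "u3", "source_schema": "mobile_prod", "event": "plugin_download", "role": "member"},
    {"user_id": "u4", "source_schema": "mobile_prod", "event": "post_created", "role": "admin"},
    {"user_id": "u1", "source_schema": "mobile_prod", "event": "plugin_download", "role": "admin"},
]


def drill_down(rows: List[Row], **filters: Any) -> List[Row]:
    """Keep only rows whose columns equal every given filter value."""
    return [
        row for row in rows
        if all(row.get(column) == value for column, value in filters.items())
    ]


# Step 1: narrow to the mobile schema.
mobile_events = drill_down(rows, source_schema="mobile_prod")
# Step 2: narrow further to plug-in downloads performed by admins.
mobile_admin_downloads = drill_down(mobile_events, event="plugin_download", role="admin")
# Distinct admins who downloaded a plug-in from mobile.
admins = {row["user_id"] for row in mobile_admin_downloads}
```

Because the source is stored as a column, each level of the drill-down is just another filter on the same table instead of a lookup in a different one.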

Eric Dodds (26:22)

Right. Rapid drill down, as opposed to just wandering around the warehouse trying to remember what the name of that specific table was.

Eric Nelson (26:29)

Yeah. As well as understanding the distribution of usage on your platform, so how many active users, you can pivot data by that and look at them stacked side by side so you can see okay, how many users do we have on the web app daily or weekly? How many users are on the desktop and how many users are using mobile? That's another benefit.

Eric Dodds (26:50)

Yeah. From the standpoint of an analyst, that makes so many things easier, especially with a [inaudible 00:26:58] cross platform setup. When you're cross platform and you talk about a metric like weekly active users, it's always more complicated than that. It's generally weighted heavily towards a particular platform or set of platforms. I can absolutely see from the standpoint of an analyst how that would really expedite some of the reporting use cases where you have a metric in mind but you basically have to break it down by platform or source or some other sort of high level way.

Eric Nelson (27:38)

Yeah. Not having to hard code that in a script in order to do that in the future, so when you're unioning, it just makes it so much easier.

Eric Dodds (27:49)

Very cool.

Eric Nelson (27:51)

All right. Then I can take you guys through some of the actual code itself or the macro itself, but more or less what you'll see is you can use Jinja, which is a dbt syntax. It's kind of a pythonic overlay to SQL that you can manipulate. You iterate through these relations, you're creating these relation column dictionaries, and then from there you're creating this column super-set dictionary, and it's basically just checking to see if it's already stored or if the size of the column is different. If it is, then it'll update. If not, it does nothing. Then from there, that is when you use this incremental build logic. If you want to incrementally build your data, you specify this target here, which is another function of dbt and you're windowing across the last 12 hours up until the current time stamp, so you can limit any sort of variations or inaccuracies.

We've had issues where timestamps are in the future because of testing, 20 years. If that gets inserted and you're incrementally building, it causes a lot of issues because you're not incrementally building anything before that timestamp any longer. We've noticed some anomalies there, so hard coding that and making sure you don't have to do that for every single model that you're building is also important. Like I said, it's customizable, so we have different targets that we're using too. Say, a specific table or a schema contains a specific term, then we're leveraging some custom logic that I've built in myself. Then from there, this is where that looping occurs, and all those columns are generated in that one log file I was showing you earlier.
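The windowing logic can be sketched in Python. This is a minimal illustration, assuming a 12-hour lookback with the current time as the upper bound, as described in the talk; the exact boundary conditions, the event names, and the fixed clock value are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Lookback window described in the talk; the boundary handling is assumed.
LOOKBACK = timedelta(hours=12)


def in_incremental_window(event_time: datetime, now: datetime,
                          lookback: timedelta = LOOKBACK) -> bool:
    """True if the event falls within the lookback window and not in the future."""
    return now - lookback < event_time <= now


# Fixed "current time" so the example is deterministic.
now = datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc)

# Hypothetical events, including a test event stamped 20 years ahead.
events = {
    "recent": now - timedelta(hours=1),
    "stale": now - timedelta(hours=13),
    "future_test": now.replace(year=now.year + 20),
}

kept = [name for name, ts in events.items() if in_incremental_window(ts, now)]
```

Capping the window at the current time keeps a far-future test timestamp out of the table, so it cannot become the high-water mark that later incremental runs compare against.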

Eric Dodds (29:43)

Sure. The implications of managing testing these cases as well, I think, is a really interesting additional benefit here. That just makes a lot of that stuff and keeping that data tidy or excluding it or managing those edge cases is really clever. I love that.

Eric Nelson (30:06)

Mm-hmm (affirmative). Yeah. Basically, like I said, it's customizable. You loop through a lot of this. I've had to kind of add on as the years have progressed at Mattermost because of various whether you're trying to condense the table to make it more performant or you're trying to specific types of events. It just makes it very easy. It's a quick loop from there and it creates that master script that you'd never, ever want to write. Yeah. That's the macro. Those are the outputs and that's as easy as it is to run it. Just hit next [inaudible 00:30:49] the CLI.

Eric Dodds (30:51)

Awesome. Well, we have plenty of time for questions. We don't have to take up the entire hour, but we'd love to open it up for questions. Feel free to raise your hand or pop a question into the Q and A. Happy to unmute anyone if you want to ask your question or chat, but we have plenty of time so we'd love to hear from you. One question, how many hours did you put into building this? I know it's been an iterative process, but if someone were to take this basic framework, what do you think the lift is to actually replicate something like this?

Eric Nelson (31:49)

Yeah. Initially, when I was first [inaudible 00:31:53] dbt for Mattermost, it was a completely new topic, new software, just new functionality for me, a new language, Jinja. It did take me some time initially to get everything stood up and make sure I was executing things properly. I would say, collectively, probably a week straight of work if I were to have condensed it all together, but it has saved me a month of work at least after the fact. Now that the framework has been built, I wish I had had that. I actually leveraged some of Fishtown's macros already, but I had to customize them for the specific use case. That expedited the process a little bit, but this should expedite it even more. If you're familiar with Jinja and dbt, it should be a very quick process to stand up, I would say. A week tops. A day if you're very familiar.

Eric Dodds (32:57)

That's impressive. I was going to guess more, but I guess relative to the amount of time that it saves you, it seems like a no-brainer as far as the investment. Another question that came in, great concept, but can you go over the practical use case of what the aliquot model can be used for? Live, I'm not super sure what the columns and rows would be.

Eric Nelson (33:25)

All right. Yeah. I'd be happy. Let me actually just give you a demo of some code that is shareable. One of the things that I've been working on... let's see. We have a cloud dashboard where we're checking the engagement of our cloud customers, and this is blending together all of this usage data, all of our server telemetry as well. I guess the best way to go about this though would be to show you. We can just scroll down here. Again, it allows you to accurately track daily active usership, monthly active usership across all of your platforms, as opposed to having to try to blend them together. Can you give me a little bit more about the context that he's asking for? He said properties or columns?

Eric Dodds (34:28)

Not super sure what the columns and rows would be. Let's see. This is just an anonymous attendee, so I don't think I can unmute them, but feel free to give a little more detail there.

Eric Nelson (34:44)

Yeah. Let's do a QA dashboard instead. This will give you an example of the actual properties generated. Right here is all of our web and desktop quality assurance data, which is more or less dev test data that gets generated and the columns that it generates. Let me refresh that. What you can see here is I've created this properties column, and you'll notice that there are some values... all the null values are omitted here, but all the values that are populated are contained here. As you can see with the test data, you have all of these properties that would need to be blended together in unions and they're only contained in certain tables. What it does is it essentially generates these for you with the right data type so that you don't have to do any of this work yourself. The nice thing about dbt is it creates all of the dependencies for you so it understands what you're referencing, the sources, and the other tables that you're referencing, and it allows you to already produce DDL based off the script without you having to define the DDL yourself. That's another one of the major benefits here.

For instance, this one here, if you get an event, which is a workspace reactivation not needed, it's a specific type of event but it's blended with all these others. A lot of times events will come in different relations within a schema, and so what this is doing is just blending them all together for you.
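The blending described here can be illustrated with a small sketch. This is not the dbt model shown in the webinar; it is plain Python under stated assumptions, and the table names, column names, and the `blend_events` helper are all hypothetical. It shows the general idea: rows from several event tables are unioned into one shape, and each row's table-specific columns are collected into a `properties` object with null values omitted.

```python
import json

# Hypothetical rows from two event tables in the same schema.
# Table and column names are illustrative only, not Mattermost's.
EVENT_TABLES = {
    "workspace_reactivation_not_needed": [
        {
            "user_id": "u1",
            "timestamp": "2021-11-01T10:00:00Z",
            "workspace_id": "w9",
            "plan": None,  # null values are left out of properties
        },
    ],
    "channel_viewed": [
        {
            "user_id": "u2",
            "timestamp": "2021-11-01T10:05:00Z",
            "channel_type": "public",
        },
    ],
}

# Columns that every event table is assumed to share.
COMMON_COLUMNS = ("user_id", "timestamp")


def blend_events(tables):
    """Union rows from all tables into one list with a shared shape.

    Table-specific columns are gathered into a `properties` dict,
    skipping any column whose value is None.
    """
    blended = []
    for table_name, rows in tables.items():
        for row in rows:
            properties = {
                key: value
                for key, value in row.items()
                if key not in COMMON_COLUMNS and value is not None
            }
            record = {column: row.get(column) for column in COMMON_COLUMNS}
            record["event_table"] = table_name
            record["properties"] = properties
            blended.append(record)
    # Order the unified stream by event time.
    blended.sort(key=lambda record: record["timestamp"])
    return blended


if __name__ == "__main__":
    for record in blend_events(EVENT_TABLES):
        print(json.dumps(record))
```

Because the helper loops over whatever tables it is given, a new event table is picked up without any change to the blending logic, which is the property the dbt models in the talk are designed to have.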

Eric Dodds (36:30)

Yeah. If I had to simplify it, I would say that the columns and rows represent specific user behaviors. That makes it really easy to say okay, we want a report on workspace reactivation, so it produces the tables to make that really easy. Feel free to give us more color on that, again, anonymous attendee. Great question. We're happy to dig in more.

Had another question from Christian. He said, this is probably a dumb question, but how do RudderStack, GA360 and Adobe Analytics relate to one another? That is not a dumb question at all. It's a great question. I'll give my high-level answer, but would love to hear your thoughts as well, especially working as an analyst. I would say the fundamental difference between GA360 and Adobe Analytics type tools and RudderStack is that RudderStack gives you the raw data and allows you... I would say there's two things. RudderStack generates the raw data that represents user behavior. We talked about standardized schemas at the beginning of the talk. Those standardized schemas are JSON payloads that represent user behavior. That could be a page view on your website. That could be a mobile app install, mobile app open, account created. All of the user behaviors you would track in a normal analytics setup.

What RudderStack does is you install the SDKs and it actually emits the raw event data. When someone does a page view, you get that raw payload. The SDK emits that. It goes through RudderStack's system, and then there are two things that happen that are significant differences between GA360, Adobe Analytics and RudderStack. One is that we can send that raw data to all sorts of cloud tools. We actually have many customers who send the RudderStack data to GA360 or to Adobe Analytics to feed those with the raw data. They actually populate those analytics tools using RudderStack data. Why would you want to do that? Number one, you're collecting that first-party data using the RudderStack script and you don't have to bog your site or mobile app down by installing a bunch of third-party scripts, but more importantly, it allows you to also send that payload to any number of other destinations. Let's say you send it to GA360, you send it to Adobe Analytics, the marketing team wants to get it into Marketo, the sales team wants to get it into Salesforce, you want to send it to Intercom.
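As a rough illustration of the "capture once, send everywhere" idea, here is a minimal Python sketch. It does not use the RudderStack SDK or API; `build_page_event`, `fan_out`, the payload fields, and the destination names are all assumptions made up for this example. It only shows the pattern: one raw event payload is built once and then handed to every configured destination.

```python
import copy
import json


def build_page_event(user_id, url):
    """Build an illustrative raw page-view payload.

    The field names are assumptions for this sketch, not the exact
    RudderStack event specification.
    """
    return {
        "type": "page",
        "userId": user_id,
        "properties": {"url": url},
    }


def fan_out(event, destinations):
    """Send one raw event to every destination.

    `destinations` maps a destination name to a callable that accepts
    the payload. Each destination gets its own deep copy so that one
    cannot mutate the payload seen by another. Returns the names of
    the destinations the event was delivered to, in order.
    """
    delivered = []
    for name, send in destinations.items():
        send(copy.deepcopy(event))
        delivered.append(name)
    return delivered


if __name__ == "__main__":
    # Stand-ins for real destinations: each one just prints the payload.
    destinations = {
        "warehouse": lambda e: print("warehouse <-", json.dumps(e)),
        "analytics": lambda e: print("analytics <-", json.dumps(e)),
    }
    event = build_page_event("u1", "https://example.com/pricing")
    fan_out(event, destinations)
```

Adding another tool is then a matter of registering one more entry in `destinations`; the code that captures the event does not change.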

All of these various tools that you have, because you're capturing the raw payload, you can syndicate it to hundreds of tools across your stack. The other one, the other major distinction is that, again, you have the raw payload. You can send it into your data warehouse. That's really the use case that we're talking about here today. All of this raw data is coming into a data warehouse, and I believe you use Snowflake at Mattermost, right?

Eric Nelson (40:07)

Correct. Yeah.

Eric Dodds (40:08)

All those raw payloads get sent into the data warehouse and then it allows you to build BI on top of your data warehouse doing all the types of things that Eric Nelson talked about. It really is, I would say, a foundational level. You're capturing the raw data and you can do anything with it that you want across your stack, including the warehouse. It's a much more extensible way to drive both your cloud tools and your warehouse-based BI. The other thing I would say, and this is just sort of a subset of the warehouse case, is that generally with larger tools it takes a long time to actually get your data. If you want raw data out of GA360, A, it's phenomenally expensive, hundreds of thousands of dollars, but you can only get dumps, I think, every 24 hours. It's really slow. Modern companies who are trying to do real-time reporting and optimization, especially if you have user-focused, direct-to-consumer type situations, or just large amounts of data coming in real time, Mattermost is a communications app, so lots of data coming in real time, those tools can't actually give you your raw data in a timely manner, which is sort of becoming increasingly important. That was a lot of blabber. Eric Nelson, your thoughts?

Eric Nelson (41:38)

Yeah. I'd say that the first thing is cost savings. I think it's a little bit more cost effective to go about instrumenting everything using RudderStack, and also the customizability. Website interactions, actual events that are occurring on your website, it's, I would say, a simpler way of instrumenting button clicks and then being able to capture that, and like Eric Dodds was saying, sending it to the relevant places, so into our data warehouse, to surface in our BI tool, and also mark a lead record as having [inaudible 00:42:19] engaged in a certain area on your website. It's more functional and it's more versatile than GA has been, at least in my experience.

Eric Dodds (42:30)

Yeah. I think one other thing I would say, and this is just feedback we've had from lots of customers: again, it's less of a choice like, should I use GA or should I use RudderStack. Most of our customers use both and they feed GA with RudderStack. I think another thing, just thinking about the customers that we work with who also use GA or Adobe Analytics, is that the user-level tracking with RudderStack, because you can populate the schemas with specific traits, tends to be much, much richer than with tools like GA or Adobe Analytics, where the structure of user traits tends to be a lot more rigid and sort of follows a more limited structure. Whereas with RudderStack you can sort of populate the payload that represents the user in any way that you want, which is another good thing. Really good question, Christian. Definitely not a dumb question at all. We get asked that all the time. Yeah. Happy to provide any more information there. Cool. Any more questions? We'll give everyone 10 or 12 minutes back if another question doesn't pop up here in another minute or so. Eric Nelson, thank you so much for taking the time. This is incredible and I appreciate you using RudderStack, of course, and showing everyone all the awesome stuff you've done with our schemas and dbt.

Eric Nelson (44:13)

Yeah. Thanks for having me. I appreciate it.

Eric Dodds (44:15)

Cool. All right. Thanks for joining us everyone. We will send out a recording of this webinar afterwards so you can watch it again if you missed something.

Eric Nelson (44:25)

All right. Thank you.
