Events

DBT vs LookML – Modern Data Workflows with RudderStack

What we will cover:

  • What are dbt and LookML?
  • Basic concepts of each
  • What each does best
  • Why dbt & LookML are better together
  • Modeling event stream data from RudderStack
  • Using dbt and RudderStack to send modeled data from your warehouse
  • Q&A

Speakers

Gavin Johnson

Product Marketer

Product Marketer at RudderStack. Ex-PMM at New Relic & AT&T. Ex-consultant at Deloitte. Ex-sys admin. (Sometimes) Ex-developer.

Alex Dovenmuehle

Co-Founder at Big Time Data

Alex is obsessed with driving business impact with data and automation. He's always looking to create automated and scalable business processes, supported by software and data, that allow businesses to scale their operations.

Transcript

Gavin Johnson (00:00)

So before we jump into the presentation, I want to introduce who we are. I'm Gavin Johnson, I'm the product marketing lead at RudderStack. And this is Alex Dovenmuehle. Did I get that right, Alex?

Alex Dovenmuehle (00:13)

Yes. You've been practicing. So you get the points.

Gavin Johnson (00:17)

I have been practicing. Alex is the co-founder of Big Time Data. And he is a former data engineering lead at Mattermost and Heroku. So he has a lot of experience with building these types of data stacks, as well as the types of tools you use with them. A lot of experience with both dbt and Looker.

Alex Dovenmuehle (00:37)

That is right.

Gavin Johnson (00:40)

Okay. So today what we'll go over is: what are dbt and LookML? What are the basic concepts of each? What does each do best? And why are dbt and LookML actually better together? It's not an either-or situation. Before we jump into that, though, let's talk about how RudderStack is related to dbt and LookML. And really, RudderStack is related to dbt and LookML right here at the data warehouse. So what we do is provide software that basically helps you move all your data around: into your data warehouse, to all of your tools that need to use data, as well as from your warehouse to those tools.

So event streaming, very similar to what you'd see from Segment. We do that very well, but we store your data in your data warehouse instead of in our infrastructure, so you can use it for analysis. It makes it very, very usable for something like dbt or LookML, because we're already putting that data where those tools operate. Then we also have Cloud Extract, which is cloud-application-to-data-warehouse ELT, very similar to what you'd see from Fivetran. And then our most recent functionality, the one that people are really excited about, is reverse ETL functionality that we call Warehouse Actions.

Basically, you take any data in your warehouse, a lot of times data that's post-modeling or that's come out of an ML processing system, and send it to the event streams and all the tools that we support too. So we do all of this, we connect all these tools, that's what we do. But we store all the data in your data warehouse, so you have access to it, which is perfect for dbt and perfect for LookML. And we're warehouse first, that's what I was driving at. Our architecture is warehouse first: we don't store your data, we don't persist it. And we build your customer data lake for you in your data warehouse. So if you want to apply dbt and LookML to that customer data lake, we're the ones that can help you build it very easily.

Alex Dovenmuehle (02:35)

All right. Let's get into it.

Gavin Johnson (02:38)

Let’s do it.

Alex Dovenmuehle (02:39)

Delicious. So dbt is a great tool. It stands for data build tool. If you're looking for it on the internet, it's getdbt.com. And I'll introduce it as though you have no idea what dbt is, since I'm not sure whether you do or not. Basically, it's the T in your ELT process. You get all your data into your data warehouse and then you use dbt to transform it in a scalable way. You can take your raw data, like Gavin was saying, the streaming data, the Cloud Extract data, whether you're using Stitch or Fivetran or whatever, all your raw data. And then you can transform all of that into clean, enriched data. So then you put your BI tool on top of it, your Looker, Tableau, whatever. Obviously we'll be talking more about Looker in this.

And the really interesting thing about dbt is how it's based on SQL, so it's very familiar, and it adds just a little bit on top to give you a way to organize all your SQL and how you're building up these models. And when you're building models on top of models, and maybe some model depends on 10 different models, stuff like that, it really lets you manage that complexity without getting totally crazy with 1,000-line SQL files and all that crazy stuff. And that's where the data lineage comes in. Because of the way dbt works, it can tell you, "You have this raw source table, and then it's referenced by this other model, which is referenced by this other model up the chain."

And you're able to actually see a graph of where all that data's coming from and how it was transformed. It also lets you search through all these things. So if you're a new hire coming into a company that uses dbt in the right way, it makes it very easy to onboard, because you can understand how all this data joins together and where it's all used. And I think dbt has tried to create the quintessential analytics engineer, where you're mostly doing analytics stuff, but also a little bit of engineering, because of the way dbt is. I'm more on the data engineering side myself, but I heavily use dbt as well.

And so, just so you understand some of the concepts we'll be using here: your dbt project is what it sounds like, your overall container for all your code and everything dbt needs to run the commands. A source is a raw data source. Generally it's just a database table somewhere, and you specify what schema it's in, what the table name is, that kind of stuff. And then you can reference that source in your models, which brings us to models. The models are where the real magic happens. Basically, you have these sources and then you'll start building models based on those sources. But then you'll start joining all those models together, again, all in SQL, and referencing all those models in a bunch of different places.
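To make sources and refs concrete, here is a minimal sketch of what Alex is describing, with hypothetical schema, table, and model names (a raw RudderStack-style tracks table feeding a staging model):

```sql
-- models/staging/stg_tracks.sql
-- A staging model built directly on a raw source. The source itself is
-- declared once in a YAML file, roughly like:
--
--   sources:
--     - name: rudder        # schema holding the raw event data
--       tables:
--         - name: tracks    # raw event stream table (hypothetical)

select
    id as event_id,
    anonymous_id,
    user_id,
    event,
    received_at
from {{ source('rudder', 'tracks') }}
```

```sql
-- models/marts/user_event_counts.sql
-- A downstream model that references the staging model above. Using
-- {{ ref() }} instead of a hard-coded table name is what lets dbt build
-- the lineage graph discussed below.
select
    user_id,
    count(*) as event_count
from {{ ref('stg_tracks') }}
group by user_id
```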

And then, because you're referencing models, and it knows the lineage of what source and what other models the model you're working on was created from, that's where you get this directed acyclic graph, which is called the lineage graph in dbt parlance. That will show you: how did this data come to be? What is this model built upon? Seeds are basically a special kind of source. I don't know if it's limited to CSV, but we only ever use CSV. Basically, it's those tables where you have static data that's not really going to change, and it's not dynamic from another source. Sometimes we use it for things like excludable users.

If your company uses your own product, you don't want that to show up in the analytics, so you exclude it, and you put that list in the seed. That's what that is. Materialization is a concept that basically tells dbt, and this is going to sound crazy, how to materialize that model. So you have a number of different options. Obviously the easiest one is a table, where it literally just builds up the SQL and then dumps all that stuff into a table.

It creates the table for you, and it handles it if you add new columns or remove columns, all that stuff. But you can also do an incremental materialization, where it's still a table, but instead of rebuilding the entire table every time, it just rebuilds part of it. Generally you use that for event streaming data, where you can say, "Okay, I just want to rebuild since the last time I ran this," if you're running it every hour, or maybe every day, and you want to do that kind of thing. You can also materialize as a view, which is literally just a database view.

So that means when that model gets used again, or if Looker is referencing that view model, the database actually has to recalculate the results each time... I'm missing one... Ephemeral. That's the one. Ephemeral, basically, instead of making it a view or really a database object at all, dbt will compile the SQL at runtime and put it in a CTE, or common table expression, which can be handy if you're reusing things but you don't really want to have a view for it, and you don't want to be persisting it and having to update it and that kind of thing. So I think that's hopefully a decent overview of all the things.
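As a rough sketch of how those materializations are declared, here is a hypothetical incremental model; the other strategies are one-line config changes (model and column names are illustrative):

```sql
-- models/marts/events_incremental.sql
-- Incremental: on each run, only process events that arrived since the
-- last run, instead of rebuilding the whole table.
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    user_id,
    event,
    received_at
from {{ ref('stg_tracks') }}
{% if is_incremental() %}
  -- Only runs when the table already exists; {{ this }} is the target table.
  where received_at > (select max(received_at) from {{ this }})
{% endif %}

-- The other options Alex lists are one-line changes to the config above:
--   materialized='table'      full rebuild every run
--   materialized='view'       recomputed on every query
--   materialized='ephemeral'  inlined as a CTE, never persisted
```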

Gavin Johnson (08:52)

I mean, I think that covers it. It's simple in implementation when you actually go in [crosstalk 00:08:58]. But the technology behind it, the way they build the graph and do some of these things, like you were mentioning with the seeds being able to exclude whole groups of users just with a CSV file, and the different materialization options they give you, is very technically advanced and just really good and super useful. Talking about materializing tables, I'm sure there were 1,000 times five years ago when you would literally just write a piece of SQL that builds a table out of multiple queries to join data together, and you'd write that table yourself. You were doing your own materialization back in the day. You don't have to do that anymore.

You can literally leave it to dbt and make it consistent, and really, really good, and build on each other's work, and not have everything be one big loop of 1,000-line expressions, like you said.

Alex Dovenmuehle (09:45)

Yeah. That's the nightmare we're trying to get away from.

Gavin Johnson (09:50)

Exactly. So LookML is part of Looker. It's the modeling portion of Looker. Looker has two parts: one part is the modeling, which is LookML, and the other part is literally the visualization, which business users can use. LookML is just the language that describes the dimensions, aggregations, and relationships of your data. It basically just describes your data model. It's a custom syntax. It's like YAML, but not really, though I think that's where the "ML" came from. It's just a custom format, but it's relatively easy to pick up, nothing complex. And what LookML does well is separate the structure of your data from the content.

So if you don't want users seeing certain fields and tables, if they don't need to see the times that events were sent to the database or stuff like that, you can just hide those from them. It makes it very easy. And frequently it's analysts who will be using LookML. I'm sure data engineers use it to some degree as well, but it's really not a tool that does all the things data engineers need to do, which is the reason it falls more to analysts. The basics with LookML are projects. Projects exist for any data source that you want to expose inside of LookML: you need to have a data source and then have a project for it.

And once you have that project, it's literally just a structure. The main thing that you have to have is a model. In that model, you can reference the data source and the tables themselves to expose to users. You need to define your fields, but it'll auto-define the fields for you, so it's not really a big deal. One thing you can do that's cool is, when you expose these, you can also do views, just like you were talking about with dbt. dbt can build views, and here you can literally just build a view, a SQL expression that connects data from multiple tables or trims down data as you need, and use that as a data source in your models as well.

So you can expose tables that literally live in your data warehouse, or you can expose views that live inside of Looker itself. And then Explores are how you expose it. Basically, you take the table or the view that you want to expose to your users and put it in an Explore. Then when a business user logs in, they'll have an Explore feature, they can click on it, and they'll be able to explore that table or that view. And in your model, you also define what is and isn't visible to them, as I mentioned, so they only see the fields you want them to see. It makes it very, very easy for a business user to use. Joins work as you would expect: you can join different data sources or views. And persistent derived tables are actually the one thing about LookML that's interesting, and also something that they advise not really using.

So persistent derived tables are, frankly, just materializations back into your database, your data warehouse. And I think they advise not to use them because Looker isn't really an optimized tool for doing that, and they say it should be the last option you use. So this is one of those cases where it makes a lot of sense to use dbt to do that type of work, and then consume that data inside of Looker. Even Looker has the opinion that, "Yeah, you should probably do that and not use us to materialize back." So it's interesting that they provide that functionality but don't really advise anybody to use it.
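For readers who haven't seen it, here is a hedged LookML sketch of the pieces Gavin just walked through: a view over a warehouse table with a hidden field, and an explore in the model file that joins it to another view. All names are hypothetical.

```lookml
# orders.view.lkml -- a view over a table that lives in the warehouse
view: orders {
  sql_table_name: analytics.fct_orders ;;

  dimension: order_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.order_id ;;
  }

  dimension: user_id {
    type: number
    sql: ${TABLE}.user_id ;;
  }

  dimension: loaded_at {
    hidden: yes    # business users never see this field
    type: date_time
    sql: ${TABLE}.loaded_at ;;
  }

  measure: total_revenue {
    type: sum
    sql: ${TABLE}.amount ;;
  }
}

# ecommerce.model.lkml -- the model file exposes explores to business users
explore: orders {
  join: users {
    sql_on: ${orders.user_id} = ${users.user_id} ;;
    relationship: many_to_one
  }
}
```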

Alex Dovenmuehle (13:17)

Yeah. That was probably one of those... Some customer was like, "Hey, I really need this." And they're like, "Okay, we'll build it." And then it turned into a bad idea that everybody wishes they had never actually built. Actually, previously at Heroku, we were fairly early users of Looker. And this was before dbt times, the stone ages. And there were a lot of persistent derived tables that we had to use. They were mostly used for things that took forever to run, because they were crazy queries that joined a ton of stuff together. I mean, it was just a nightmare: either the query is wrong, or it takes forever to run. It was just a mess trying to figure out what was going on with these PDTs, as we'll call them.

And so we ended up... After we did migrate to dbt, it was a project. It was like, somebody needs to go pull out all these stupid persistent derived tables, put them into dbt, and let's never talk about PDTs again. So yes, I will strongly suggest not going down that path.

Gavin Johnson (14:34)

I've read a few articles on the topic of dbt and Looker, and a couple of other analytics tools, and what you should use where. And more than one person has mentioned that they've had to run full projects to rip PDTs out and move them into dbt. So it doesn't sound like it's-

Alex Dovenmuehle (14:51)

Yeah, it's not uncommon. Cool. So basically these two tools, they're both very useful. And they're close enough that sometimes people think they can do things in LookML that they should really do in dbt, and vice versa. So we just wanted to highlight, okay, what are the differences here? And then we'll get into how you should combine these tools to get the most out of each of them in a scalable way that also limits complexity, and is robust, and all that good stuff. So dbt, I mean, it's literally just running SQL inside your data warehouse. All this stuff is just in the warehouse.

And then with LookML, basically Looker is just parsing your LookML, and when the business users are using it, it generates the SQL and runs it on your warehouse. But then also with the PDTs, and don't do this, you can actually persist things back into the warehouse. So then, again, just getting back to the dbt materializations: tables, incremental, ephemeral, you've got tons of options for how you want to materialize your things. Personally, I generally use incremental for most things. I use tables if the data set is a little bit smaller and the runtime doesn't really matter, because it's not going against a bunch of data, so it's a quick operation to completely refresh the table. I have started using ephemeral a little bit more for certain things, where it's just those common little bits of code that you always...

The example I'm thinking about is, you've got users and then you've got a membership table, which is a joining table to organizations in your product, like an access control kind of thing. You run that query all the time; that little piece of code comes up all the time. So just get it into one place in your dbt project, and you can have an ephemeral model.
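A sketch of that ephemeral pattern, with hypothetical model and column names:

```sql
-- models/shared/user_organizations.sql
-- The "little piece of code you run all the time": users joined through
-- the membership table to their organizations. Ephemeral means dbt
-- inlines this as a CTE wherever it's ref()'d; nothing is persisted.
{{ config(materialized='ephemeral') }}

select
    u.user_id,
    u.email,
    o.organization_id,
    o.organization_name,
    m.role
from {{ ref('stg_users') }} u
join {{ ref('stg_memberships') }} m
    on m.user_id = u.user_id
join {{ ref('stg_organizations') }} o
    on o.organization_id = m.organization_id
```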

Gavin Johnson (17:09)

Because it doesn't make sense to actually persist that, because you're literally just combining two pieces of data that already exist, right?

Alex Dovenmuehle (17:15)

Yeah.

Gavin Johnson (17:15)

You might use it, but it doesn't really make a lot of sense to persist it. It doesn't take a long time to run. It's something that you just want to have there when you need it. Right?

Alex Dovenmuehle (17:22)

Yeah. And you definitely want to make sure, just even from a data quality perspective, that if you're doing an incremental build, you don't get any wonky results while it's running, especially depending on how your source data's coming in. We sometimes see product data where they'll hard delete some of these memberships, so you don't have a flag you could use to incrementally build that table anyway. So that kind of thing. On the LookML side, Looker will cache the results to try to improve performance. I wonder how much time they've spent on the whole caching layer, because I'm sure, like everybody... Sometimes people complain, "My Looker is slow." And it's, "Well, that's probably because your models aren't built right."

And we'll get more into dbt and LookML together, and how you can get around that, in a second. And again, we're back to the PDTs. I guess if you learn anything from this, it's going to be: don't use PDTs. That's all anybody's going to say about it. But you know what, we're doing God's work, man. So I'm okay with it. [crosstalk 00:18:27]. So then, again, dbt is all SQL, so all the permissions are just data warehouse permissions. You give it a user, you can give it a role. And I mean, Snowflake and BigQuery and Redshift and all that sometimes have different ways of handling permissions. But essentially, if you can run the query when you're logged in as that person with that role, then dbt can do it too.
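As a hedged illustration, on Snowflake that base permission set tends to look something like this (role, warehouse, and schema names are hypothetical; dbt's docs list the exact grants each warehouse needs):

```sql
-- Sketch of a typical grant set for a dedicated dbt role on Snowflake.
grant usage on warehouse transforming to role transformer;
grant usage on database analytics to role transformer;

-- Read access to the raw data dbt selects from:
grant usage on schema analytics.raw to role transformer;
grant select on all tables in schema analytics.raw to role transformer;

-- Write access where dbt materializes models:
grant usage on schema analytics.marts to role transformer;
grant create table on schema analytics.marts to role transformer;
grant create view on schema analytics.marts to role transformer;
```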

And they do have pretty good documentation: they show what permissions you need to run things, the base set of permissions, especially if you're running BigQuery, because BigQuery does permissions a little bit differently. So that's how that goes. And then with LookML, like you said, with the Looker roles and users and groups and things, you can control what data people are seeing. That's actually been really interesting for us at Mattermost, because with our implementation, we actually let board members have their own little section, and that's all they can see. So they can log into Looker and look at a board view of the data, which is pretty cool, right?

Gavin Johnson (17:22)

Yeah.

Alex Dovenmuehle (19:37)

You can tell your board members, "Hey, you can just log into our Looker, and you can go play around and look at stuff." So there's a lot of cool stuff you can do with the Looker roles: limit what people can see if you're doing financial stuff and you, for whatever reason, don't want people to see that, or whatever. You can do all sorts of fun stuff with that. And I think that's about it for that. So getting into it... When you're thinking about dbt versus LookML, you're really trying to manage complexity in a lot of ways. And it's: which tool is best at dealing with which kinds of complexity?

And so, really, to me, dbt is the tool you would use to do those complex transformations, especially because then you're going to have your data engineers or your analytics engineers work on it, who are going to be really close to that raw data. They're going to know that raw data. So when they're building their dbt models, they're going to know all the nuances of how all that data needs to be combined, massaged, filtered, transformed, all that kind of stuff. One thing that we like to do a lot is build models that are based on the raw data, but that hide the complexity.

So the models that we build that turn into tables, a business user could query them. Because we know not only the raw data, but also how the business thinks about their data, you can create models that have all sorts of crazy stuff going on under the covers, but this one model is really easy to query. Limiting people's ability to get the wrong answer is always the thing I'm striving for. And that goes into the data cleansing and aggregation. The business user doesn't want to query the raw data. At Heroku we had billions of events a month going into this event table, and no business user is going to go through that stuff. It's ridiculous.

So with dbt, you can cleanse that data and aggregate it up to the level that actually makes sense and has value. And then, and we'll get into this in a second, you add LookML on top of that and have it reference those built-up, clean models. All of a sudden, all your stuff in Looker becomes a lot easier too, because you're not having to make weird concessions to deal with weird data. It's all being handled at the dbt level.
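A sketch of that cleanse-and-aggregate step, reusing the hypothetical staging model from earlier:

```sql
-- models/marts/daily_active_users.sql
-- Roll raw events up to one row per user per day, so the table business
-- users (and Looker) query is small, clean, and hard to misread.
-- Sketched as a full rebuild; at real event volumes you'd likely make
-- this incremental, as discussed earlier.
{{ config(materialized='table') }}

select
    user_id,
    date_trunc('day', received_at) as activity_date,
    count(*) as event_count,
    count(distinct event) as distinct_event_types
from {{ ref('stg_tracks') }}
where user_id is not null    -- cleanse: drop events we can't tie to a user
group by 1, 2
```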

Gavin Johnson (22:28)

Yeah. You don't have to play around with the models in LookML anywhere near as much if you're using dbt.

Alex Dovenmuehle (22:33)

Yeah, exactly. And so with dbt, and really in general, just the way we think about it, especially the warehouse first thing, it's like: you've got all this data all over the place. You've got marketing data, Stripe data, you've got Salesforce, you've got the product data, just data, data, data, everywhere, all day. And it's from all different kinds of sources that don't combine very easily. That's where dbt really shines: it can manage the complexity of joining all this data together and making something that actually can join across all these properties. And again, you can build all these models up, and that gives you a simpler implementation on the Looker side.

Gavin Johnson (23:25)

And if you think about it even beyond Looker, if you are using a reverse ETL tool, we can be talking about user profiles, we can be talking about segmentation and stuff like that too. Having a place where you can build one piece of analysis that's applicable across multiple business lines, or multiple groups that want to use it, is really worthwhile. So I think that's a really nice way to use dbt, especially with the new tech that's coming out, reverse ETL. It turns into a lot more than just a data cleansing tool at that point, which is the bucket a lot of people put it in. And that's not at all the case. The modeling it can do now becomes extremely powerful.

Alex Dovenmuehle (24:08)

Yeah. And in fact, with the reverse ETL tools, now that we've started to implement those for some clients, our approach has been to build models that are specific to the reverse ETL tool, because I know the reverse ETL tool is expecting my data to be in exactly this format. So I'll use dbt to get all my data in, however it needs to, and then get it into just this one table. Because not only can you see, with the dbt docs, the lineage graph of how you got to that data, it also makes it super clean: if I'm looking at this kind of model, then I know that this one's for reverse ETL, or warehouse actions, however you want to do it.
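A sketch of that reverse-ETL-shaped model: one row per user, with columns in exactly the shape the downstream tool will map to destination fields (all names hypothetical, reusing the models sketched earlier):

```sql
-- models/reverse_etl/retl_user_traits.sql
-- One row per user, shaped for a reverse ETL tool such as RudderStack
-- Warehouse Actions to sync as user traits.
{{ config(materialized='table') }}

select
    u.user_id,
    u.email,
    org.organization_name as company,    -- assumes one org per user here
    act.last_active_date,
    act.events_last_30d
from {{ ref('stg_users') }} u
left join {{ ref('user_organizations') }} org
    on org.user_id = u.user_id
left join (
    select
        user_id,
        max(activity_date) as last_active_date,
        sum(case when activity_date >= current_date - 30
                 then event_count else 0 end) as events_last_30d
    from {{ ref('daily_active_users') }}
    group by user_id
) act
    on act.user_id = u.user_id
```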

Gavin Johnson (24:59)

Yeah. It's a very, very useful tool now with how data engineering is growing. The way the technology is growing and maturing, dbt is becoming more and more important.

Alex Dovenmuehle (25:12)

They love dbt.

Gavin Johnson (25:14)

They do. So what does LookML do best? LookML is great at expanding data access. So if you have a clean, transformed data set or table that you built with dbt in your data warehouse, and you want, once again, like you said, your board members to have access to it, or specific groups inside your business to have access to it without having access to other types of data, you can give them access via these roles. And you can really expand who can see your data and who can understand it.

Also, for business users, just frankly, the ease of building Looks and dashboards inside of Looker with the data that you provide them makes it super easy for people who are not data analysts to use this tool and derive value out of it. So there's a very real, valuable place for LookML. It makes data exploration super easy too. Sometimes, I can tell you, in marketing you don't 100% always know what you're looking for. We had an uptick in conversions, and we have ideas why that might've happened. But being able to explore and look at your website traffic, as well as your conversion traffic, and understand where that traffic came from and why that happened, it makes it easy to do that. Also, the data types in LookML are super nice for people.

They have advanced data types, like distance, duration, location, zip code. And they also break down all date-times super well. Handling date-times in SQL is a nasty mess, but it's ultra easy in LookML if you want to group by the hour, or something like that, or join by the hour too; there's a sketch of this below. So it's nice for some of that stuff. Also, it's good if you need to apply a view or transform data that you want only for a specific visual, or a specific data set for a specific group of people. Once again, as you were mentioning, something one group wants, that you might otherwise write into an ephemeral table, might be a decent thing to build into a view inside of LookML, like a user versus the account or the organization that they're in.

So it's something you can stand up for people relatively easily, without doing it inside dbt and making an ephemeral table that disappears. Something that's more narrow, or less used, I guess would be a good way to put it, is a good place for LookML to come in. But still, as Alex mentioned, if you can just get rid of that complication and do it in your data warehouse with dbt, that's the better approach at this point.
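As a hedged sketch of the date handling Gavin mentions above: in LookML, a single dimension_group fans one timestamp out into a whole family of groupable time fields (all names hypothetical):

```lookml
view: events {
  sql_table_name: analytics.fct_events ;;

  # One timestamp column becomes events.occurred_hour, occurred_date,
  # occurred_week, occurred_month... so "group by the hour" is a click,
  # not hand-written SQL.
  dimension_group: occurred {
    type: time
    timeframes: [hour, date, week, month, quarter, year]
    sql: ${TABLE}.received_at ;;
  }

  measure: event_count {
    type: count
  }
}
```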

Alex Dovenmuehle (27:50)

And that gets us full circle to dbt and LookML being better together. At a lot of our clients, we're always trying to push Looker and obviously dbt. dbt is the kind of thing where, if a client was like, "No, we don't want to use dbt," I'd be like, "Well, we don't want to work with you." It's a non-starter not to use it at this point, I think. And again, that's because you can deal with so much complexity in dbt and build these scalable data models. And also, especially for our purposes as consultants, being able to leave behind, once we're done with an engagement, something that's really easily understood. I mean, they have to understand SQL and some data and stuff, but it makes it a lot easier for them to onboard and explore how the data is being transformed, and where it's going, and where it's been.

So in my mind, you want to reference all your built-up, aggregated dbt models in LookML, as opposed to those raw tables. You create all these models in dbt, and Looker is just looking at the cleanest, most pristine, everything's-amazing, the-data's-clean stuff. And that'll really make your analysts' lives easier too: they don't have to deal with the mess. It's like, "Here's your clean data model. Go nuts." They can do what they do best, which is building the dashboards that stakeholders want, or digging into the data for more insightful exploration, which is what Looker is good at, and that's what you want to use it for.

So with that, we've covered data access governance and data governance in Looker, who can see what. And also, because of the way you're defining your Explores, and views, and dashboards, and things in Looker, you're limiting the mistakes that a business user can make. Because we all just want things to work. I mean, even me, a technical person, I can figure a lot of things out when something's not going right, even if it's just my computer or my phone not working. But it's like, I don't want to have to deal with that. Give me my data, and I want my numbers to be right. And I don't want to be able to mess it up.

So that's really the benefit of Looker there. And then obviously with Looker, I mean, you can't really get better... Just being able to create dashboards and click around and do filtering and stuff. And like you were saying, especially the date filters and the things you can do with dates are so much easier to use than anything else I've seen so far.

Gavin Johnson (30:50)

Yeah, it is. It's super, super easy to use for that type of thing, and that makes visualization really, really easy. So, we covered this in the presentation, but we might as well walk back through it. If you want to dig into dbt and really start trying it out, and the same with Looker, if you want to start leveraging them to help your business do well, or if you already are: I think the first thing you really need to do is build your customer data lake, and RudderStack is the best way to do that.

Sending your event data, as well as data from your cloud tools like Salesforce or Marketo, into your data warehouse, so you can do these advanced transformations on it and end up with really enriched data you can drive real value out of, is what RudderStack excels at. We excel at bringing that data to your data warehouse, and then also helping you activate it and send it to the tools that need it. So start using RudderStack, that's part one. Part two is, start using dbt, and start trying to clean and transform and prep your data with it. Because I promise you, once you start trying it, and I bet Alex can back this up too, it's going to make your life a lot easier.

Just being able to make things nice and set up the way the business user needs to see them, and take all of the complexity out for them so they only need to focus on what they're best at, or what they really care about, is extremely valuable. And then lastly, control your data access and let people build the visualizations in Looker. Let them go wild. After you've already built the models, you can let people do whatever they need to do to find out what they need to find out.

Alex Dovenmuehle (32:27)

That's the dream. Get on it. Just don't use persistent derived tables.

Gavin Johnson (32:33)

[crosstalk 00:32:33]. So at this point, I guess we can... If anyone has any questions, you can raise your hand. There's no hand yet. Questions? Also, I think we have a chat, so if you want to type your question in the chat rather than asking it, you can do that too. All sorts of options. So, I guess, Alex, while we're waiting to see if anybody has any questions, describe what you were using Looker for at Heroku. And how did you get introduced to dbt?

Alex Dovenmuehle (33:19)

It's actually funny. It was actually a data engineer on my team who was like, "Hey, have you heard of this tool dbt?" I was like, "No." And so we started looking at it, and I told him, "Dude, this looks great. You should just totally go nuts with it." And I think I gave him most of a month to start doing it. And after seeing what he was able to do... Because back in those days, we just had SQL everywhere, like everybody does. Just being able to see the benefits of it, it was like, we have to go whole hog on this. And so now we basically go out to all these clients to make that happen.

Gavin Johnson (34:15)

So we got a question. The question is: can you give an example of RudderStack, dbt, and Looker together, with some RudderStack data? I don't have an example set up with RudderStack data, mainly because I don't have access to our dbt, because I'm not an engineer, is the primary reason. Our team does use dbt to build tables. I can jump in and show you what some of that data looks like in Looker, if you'd like, but I don't really have an example set up. Alex, though, you've implemented this in a bunch of places. Can you give an example of one you've set up before, and what people used it for, how they use it?

Alex Dovenmuehle (34:50)

Yeah. I mean, like I said, at Mattermost, we're using RudderStack, dbt, and Looker. I'm trying to see if I can pull something up... I don't think they'd like me showing internal data, so that's the only issue. [crosstalk 00:35:09].

Gavin Johnson (35:09)

It's also hard to demo dbt too. dbt is a command line [crosstalk 00:35:12].

Alex Dovenmuehle (35:12)

Yeah. Well, and I want to show, if I can find it, the docs site. But I am not finding it at the moment. I mean, that's one thing that's been really cool. Let me see if I can pull it up, maybe. Give me just one second. So while I'm trying to find an actual example of dbt, here we go, I'll answer this other question from Peter: to what extent would you consider dbt and Snowflake as a very powerful code generator for Snowflake operations? For example, if I'm working entirely in Snowflake, is there any dbt runtime companion, or is everything running directly in Snowflake? That's actually an interesting question, and actually something I've thought about, and hopefully I'm understanding your question correctly. By default, dbt is literally just running SQL. So it's just running SQL in Snowflake.

But I have started to think about, what's the next level after dbt? Because if you think about dbt as an abstraction over SQL in Snowflake... Oh, my God! Can you make an abstraction over dbt? Which I think is what you're getting at. The way I'm thinking about it is, dbt is basically generating SQL, but what if you had something on top of dbt that generated dbt stuff? Now you're cooking. But anyway, that'll be for another session, because that stuff gets a little bonkers. Do you care if I share my screen? I don't even know if I can share my screen.

Gavin Johnson (37:06)

Let me stop sharing mine and then, I think, you can share yours. Give it a try.

Alex Dovenmuehle (37:15)

There's this one...

Gavin Johnson (37:18)

I'll see if I can just let you share the screen.

Alex Dovenmuehle (37:20)

Yeah. Oh, man. I'm trying to find where this... Here we go. No. Man, I'm going to be so sad if I can't find this. Where does that go? Config details? So what I can show you, at least, is... Yeah, here it is. Here's the boy. All right. Let me share here in a second. Oh, God! Is my Zoom blown up? Here we go. All right. Well, actually, before we blow your mind away with this thing, this is the dbt docs website. It gets automatically generated based on your dbt code. And what's really cool about this, as you can see, is it's showing you the columns, the data types, it's giving an approximate size, all this stuff that you might be interested in knowing.

And again, if I'm coming in as a new analyst, it's like, "Wow! This table has 34 million rows. That's a pretty good amount." You can also just click here, like, "What is this referenced by?" So this is referenced by this thing, but look at what it depends on. And then you can also see the code directly in it as well.

Gavin Johnson (38:51)

We should have mentioned that this is the hosted dbt, the dbt Cloud interface. dbt is also an open-source command-line tool that can basically run all this stuff, but the nice stuff Alex is showing you, you don't get as much of that with it.

Alex Dovenmuehle (39:04)

Yeah. And so, this is what I was talking about with the lineage graph. I mean, I'm showing you guys the most ridiculous model that we have, partially because the data that Mattermost has is just bananas. But what's really cool about this is you can click these different things and it'll show you how the data's flowing. Then you can see how it's all getting joined together, and I can just explore. Now I know how this all gets built up, and where this model lives. And one thing I will also note is, nobody wrote thousands of lines of code to join all this and union it together. This is all using dbt macros, which is something we didn't really cover much. With dbt macros, you can actually run queries on your database, and then use the results of those queries to generate more queries, if you will.

So basically what this is doing is querying a database schema for all the tables in it that match a certain pattern. I don't think we're using regular expressions, but you could. And it generates all this union stuff, and that's how we get... The whole point of this is to reduce complexity. So you take this crazy data model, what is this, 100 tables, 50 tables, and you get that down to one table that's actually consumable in Looker. And that's the real power.
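A hedged sketch of that macro-driven pattern, assuming the dbt-utils package (its get_relations_by_pattern and union_relations helpers); the schema and table pattern are hypothetical:

```sql
-- models/staging/all_event_tables.sql
-- Instead of hand-writing a union over dozens of per-event tables,
-- look up every table matching a pattern at compile time and generate
-- the union automatically.
{% set event_relations = dbt_utils.get_relations_by_pattern(
    schema_pattern='rudder_events',
    table_pattern='event_%'
) %}

{{ dbt_utils.union_relations(relations=event_relations) }}
```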

Gavin Johnson (40:46)

That's what dbt alone does. So the power of dbt, you can [inaudible 00:40:50] the command line tool. That visualization, though, is super cool. Because if you think about it, if you work on the origin nodes of that graph especially, you want to know what a change could possibly impact, so you can check for errors in the downstream results of using that model. So that's good. Also, it makes it super easy to see, "Hey, we have something going weird in this table." You can literally just go and see where it came from. It makes it super easy to figure out where problems are, identify them, and fix them. And if you do mess something up, it's all version controlled, I believe, too. So it makes it [inaudible 00:41:27] too.

Alex Dovenmuehle (41:28)

And also, let's say you totally mess something up and you need to rebuild a whole set of those tables. You can use their syntax and do what they call a full refresh. Even if your tables are incremental, it'll completely rebuild them from the ground up. So that's pretty useful. You're not having to think about, "Oh, crap! I've got to rebuild this table from a month ago to now," or whatever the time range is that I messed up. You can just completely rebuild it. Let's just do it, start over. A couple of comments I saw in the chat. That is correct: the dbt CLI will generate the docs locally, and it'll run a little server locally too. But obviously the nice thing about dbt Cloud is, well, a), you don't have to host it anywhere, and it all automatically gets generated.

And then you also have your regular access control, things like that. And also, somebody mentioned SQL plus Jinja. That's really what dbt is: basically just SQL and Jinja together; Jinja is a templating language, for those who are unaware. And because of that, that's where you really get the power. Like I was saying earlier with these macros, you can do some really cool stuff that lets you loop over data sets and generate even more queries, even more complex stuff, without the code itself being crazy. The underlying SQL it compiles to will be crazy, but the dbt code and the tables it generates aren't. And again, that's the whole point of dbt: how do I take this complex data set that I have and get it somewhere that's easy to use? And then, when you're looking at LookML, you have something that's really easy to use on that side, that doesn't have a bunch of craziness you would otherwise have to deal with.

Gavin Johnson (43:34)

Yeah.

Alex Dovenmuehle (43:36)

And I will drop these in the chat, if anybody wants to look more at Snowflake and dbt together. So the cool thing about Mattermost is, it's all open source. And so we've open-sourced all of our dbt code and our Looker code. I mean, Mattermost is a Series C company or something, a fairly large company. We have a lot of data. This is what your dbt models and your Looker models are going to look like when they really pull all this stuff together. So if you guys want to go check that out... I mean, again, it's all open source. If you want it, go nuts.

Gavin Johnson (44:33)

I think the only thing we missed in the chain was RudderStack, and that's because RudderStack is so simple. To event stream with RudderStack, it's literally like using Google Analytics. You put a JavaScript snippet on your site, or if you're using Gatsby or another platform, we have a lot of plugins for those. And it will send [inaudible 00:44:52] data to RudderStack. It's super simple to use. And then once it hits RudderStack, you can send it to your warehouse and whatever other tools you want. You can send it to Google Analytics, you can send it to GA4, you can run both side by side, and you can send it to a ton of other tools. I think there were 100 integrations or something at this point. I can't remember the exact number. But yeah, it's very useful, and it's super easy to set up.

And then you can literally just set up a warehouse destination, and your data will go there. And then you can start building on it. I could go through and show you guys how to hook it up, but it's really not super exciting. And frankly, that's great. That's what you want a data pipeline to be: go in, make it super easy to set up, and then never have to look at it again.

Alex Dovenmuehle (45:33)

Actually, I'll bring up my Mattermost experience, because I think it's a pretty good story. So when I joined Mattermost, they were using Segment. And because of the pricing model, and because it's an open source product, there are a lot of people who run Mattermost on their own servers, and we get telemetry from those. So you're talking about thousands and thousands of servers sending events daily, that kind of stuff. So before I joined, they had limited how much data they were sending, how many events they were tracking, because they didn't want their Segment bill to be crazy high. And then I was like, "Dude, we need to get off Segment." And that's when RudderStack came in. It was so easy to migrate Mattermost over from Segment to RudderStack.

I mean, it barely took any engineering time. And then we were able to turn back on all that telemetry, and we were able to implement it on docs.mattermost.com, www.mattermost.com, all the web properties and all this stuff. And by doing that, you get all this data in. And again, with dbt, you can then start joining it all together. This person who visited the website last week, did they go to the docs website? And what did they do there? And then how did they use the product once they actually got into it? What's their journey, and all that stuff. I mean, obviously I'm a RudderStack fan.

Gavin Johnson (47:09)

It's a super useful product. And a lot of our customers were prior Segment users, a whole lot. Because we do what they do really, really well. And we do a few things more than they do, and our pricing model is much nicer.

Alex Dovenmuehle (47:23)

I agree, for sure.

Gavin Johnson (47:26)

So I guess, are there any more questions? We'll give you guys another minute or so. And then if there aren't, we'll close out this webinar.

Alex Dovenmuehle (47:35)

Sounds good.

Gavin Johnson (47:39)

So you've mentioned Big Time Data. I guess it might be worthwhile for you to mention what you guys do. You've alluded to it a few times, but you haven't explicitly said it. So what do you guys do at Big Time Data?

Alex Dovenmuehle (47:50)

So I think the main thing is really helping companies actually leverage the data that they have, and helping them collect more data if they need to. What we noticed was, every company that we would go to, they had all this data all over the place. Like I was saying earlier: Marketo, Stripe, Salesforce, blah, blah, blah, the product data, the event streaming data, the web data, all this stuff. And they could do some light analytics, but mostly it was, "I'm going to go try to figure out what's going on in Stripe, and just this little tiny piece." So what we do is, let's get all your data into a data warehouse. Go for that warehouse-first approach.

We're going to use dbt to transform that data, like we've been talking about here, into something that's actually useful, and then we'll put Looker or another BI tool on top of it, and actually get you to where stakeholders can view it, understand it, and actually have an impact on your business.

The whole point of it is: how can I actually make actionable decisions based on this data, and then see, after I've made that decision, the uplift or whatever it is you're looking for in the data as well? And then the next part of it, with the reverse ETL, warehouse actions thing, is: let's go enrich some of these other tools. I'll use the example of Salesforce. You've got all these salespeople who are in Salesforce all day. They don't want to go clicking around to 100 different things. Just give them a little summarized data about the customer they're talking to, so they can have an intelligent conversation when they have a call with that client.

And so basically you transform the data with dbt, you use RudderStack Warehouse Actions, reverse ETL, and then get that data into Salesforce. And you know the data's clean, because you've vetted all your data models through dbt. It makes it so people aren't saying, "I saw this number in this tool. I saw that number in that tool." Because what really ends up happening otherwise is, nobody trusts the data except the data that's in their own tools. The salespeople are just like, "I don't care about that. I just look at Salesforce," you know what I mean? So that's the thing that we...

Gavin Johnson (50:15)

You know what else it helps with too, actually... You mentioned pushing back into Salesforce. But frankly, think about a lot of the times now when a marketing ops team, or let's say a demand generation person [inaudible 00:50:27]... Frankly, let's go back to a product manager. It doesn't matter. If anybody from any of these different groups across your company wants data, a lot of times they'll send an email to an analyst, like, "Hey, could you send me a spreadsheet? Could you send me a Google Sheet? Could you just send me a dump of the data, so I can use it, so I can do segmentation or something else?" And then they go and put it in a CSV, and then they upload it into whatever tool they want to use it in. And it works, but you get updates whenever people get around to it, once a week, once a month. And you have multiple people in a chain doing transformations to the data.

So if something goes wrong, you don't know who broke it or where. But with dbt to do the transformations, and then something like RudderStack Warehouse Actions to actually do the reverse ETL and push it in, doing it on a continual basis instead of just once a week, and doing it automatically instead of having people handle it, it's a much more streamlined approach. You should not have spreadsheets and emails in your freaking data pipelines. Your data pipelines should be coded and automated. And that's what we do well. So yeah, it's interesting that that's how people were using it. And it makes sense. And I'm super glad that we were able to build that, and that you guys are using it with your clients now.

Alex Dovenmuehle (51:38)

Yeah, man. It's all the rage these days.

Gavin Johnson (51:41)

It is. Reverse ETL is so hot right now. Anyway, let's close out the webinar. It was good talking to you, Alex. It was good. Everybody, thank you all for attending. And we'll talk to you all soon. Have a good day.

Alex Dovenmuehle (51:53)

All right. See you.