Building a Customer Data Platform on Your Data Warehouse

Why a Customer Data Platform?

Customer Data Platforms (CDPs) promise to unify all customer data in one platform. They aim to enable a range of use cases, from standard reporting to advanced user activation scenarios such as driving personalized UI experiences and making offers based on customers’ behavior.

Most of the CDP products out there started as something else and morphed into what they are today. There are good reasons for this. Customer-related data has always been important, not only for understanding customer behavior but also for acting on it. It also tends to be big, raw, and full of problems and inconsistencies. Yet it has to be usable by people without technical skills, such as a growth manager. These are some of the reasons CDPs have become viable in the past two to three years.

The Need for the Data-Driven Enterprise

Today it’s possible to capture the entire journey of your customers, from the products they searched for, to what they clicked or ignored, to an eventual purchase. It’s also possible to enrich this journey with data from every customer touchpoint the enterprise has, from customer support and management systems to marketing platforms and invoice management products.

This rich dataset enables many use cases beyond traditional enterprise reporting. It is currently driving the development of a number of products that together form a complete data stack, or infrastructure, that enterprises can use to capture, analyze, and productize this dataset.

The cornerstone of this data stack is the data warehouse. This technology has evolved so much in the past few years that it can now support almost infinite scale, separation of storage and processing, advanced analytics functionality, and even machine learning. Hence Cloud Data Warehouses (CDWs) are all the rage, being one of the main enablers of data-driven enterprises.

It’s no surprise that one of the most valuable startups in Silicon Valley currently is Snowflake, a CDW provider that just raised $500M at a $12B valuation.

Why Build Your Own Customer Data Platform on Your Data Warehouse

As CDW technology has evolved, it has become possible to build a custom, scalable, and efficient CDP on top of it. But why would someone prefer to build their own CDP instead of buying an off-the-shelf vendor solution?

There are plenty of reasons, but we would like to focus on the following.


No matter how mature an off-the-shelf vendor solution might be, it will always be a generic one. Customers are unique to each business, even in the same market. How is a customer defined? What sources of data does the business have? What data models are used? How should customers be identified? These are just some of the questions a CDP has to address. There’s no better way to answer them than to have access to all the raw data a company has collected in a data warehouse or Data Lake.
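To make the "how should customers be identified?" question concrete, here is a minimal sketch of one possible answer, written against raw event data. It is an illustration only: the `events` table, its columns (`anonymous_id`, `user_id`, `event`, `ts`), and the rule "an anonymous visitor becomes a known customer once any of their events carries a `user_id`" are all assumptions made for this example, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw events: (anonymous_id, user_id, event, ts).
# user_id is only known after the visitor logs in or purchases.
RAW_EVENTS = [
    ("anon-1", None, "product_searched", "2020-02-01T10:00:00"),
    ("anon-1", None, "product_clicked", "2020-02-01T10:01:00"),
    ("anon-1", "user-42", "order_completed", "2020-02-01T10:05:00"),
    ("anon-2", None, "product_searched", "2020-02-02T09:00:00"),
]


def build_customer_profiles(conn, events):
    """Load raw events and return one row per resolved customer."""
    conn.execute(
        "CREATE TABLE events (anonymous_id TEXT, user_id TEXT, event TEXT, ts TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)
    return conn.execute(
        """
        WITH identity_map AS (
            -- Assumed rule: map each anonymous_id to a user_id seen with it.
            SELECT anonymous_id, MIN(user_id) AS user_id
            FROM events
            WHERE user_id IS NOT NULL
            GROUP BY anonymous_id
        )
        SELECT COALESCE(m.user_id, e.anonymous_id) AS customer_id,
               COUNT(*)  AS event_count,
               MIN(e.ts) AS first_seen,
               MAX(e.ts) AS last_seen
        FROM events e
        LEFT JOIN identity_map m ON m.anonymous_id = e.anonymous_id
        GROUP BY customer_id
        ORDER BY customer_id
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for profile in build_customer_profiles(connection, RAW_EVENTS):
        print(profile)
```

Because the definition lives in a query over raw data, a business that identifies customers differently (by email, by account, by household) can change the rule without waiting on a vendor.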

As businesses evolve, more people need to interact with customer data. Everyone needs to probe the data, from marketing managers who use BI solutions such as Looker to data scientists who use Jupyter notebooks and advanced SQL. No one knows who is going to be using the data in the future, but it’s almost certain that the list will only grow longer. A data stack built around a CDW or Data Lake gives the company a flexible infrastructure that allows data access to everyone.

Data Privacy and Governance

We are entering an era in which different actors, including governments, will put more and more controls around data. Companies will have to comply and be ultra-protective of the data they manage. Anything generated by customers will be considered sensitive information. Adding more and more products around the data stack, each requiring data to be replicated across different tools, makes privacy and data governance a tricky job.

One of the best ways to address this issue is to rely on a well-defined, industry-accepted data architecture in which the data is stored in well-defined repositories, such as a Data Lake and a data warehouse, and strict policies around data access and security are implemented.

Thus, implementing your Customer Data Platform over your data warehouse will also contribute to the data privacy and governance of your company.


We always focus on new value creation, but reducing costs is another important impact technology has. Putting together an infrastructure to process this volume of data costs money. Add the SaaS margins on top, and we are looking at hundreds of thousands of dollars of investment to buy and deploy third-party SaaS solutions. By building a Customer Data Platform on an existing data warehouse instead, a company can reuse already-procured cloud infrastructure.

But Data Warehouse Infrastructure Alone Is Not Enough

The data infrastructure a company has is completely useless without data. Hence, the layer of the data stack that captures, collects, and delivers data is extremely important. This layer is responsible for the following:

  • Helping data engineers consistently and securely collect event stream data from the source (website or app) into the data warehouse.
  • Allowing analysts and data scientists to drive the advanced analytics and activation use cases discussed above. Here too, the layer should provide consistency and data quality guarantees, so that these data citizens can focus on creating value instead of figuring out whether and how the data can be used.
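As a rough illustration of what a "consistency and data quality guarantee" could look like in such a layer, the sketch below checks incoming events against a small contract before they are loaded and separates the ones that fail. The field names (`event`, `anonymous_id`, `timestamp`) and the function names are hypothetical and chosen only for this example; they are not taken from any particular product's schema.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical contract: every event must carry these fields with these types.
REQUIRED_FIELDS = {"event": str, "anonymous_id": str, "timestamp": str}


def validate_event(event: Dict) -> List[str]:
    """Return a list of problems found in one event (empty if it is valid)."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        value = event.get(name)
        if value is None:
            errors.append(f"missing field: {name}")
        elif not isinstance(value, expected_type):
            errors.append(f"wrong type for {name}")
    timestamp = event.get("timestamp")
    if isinstance(timestamp, str):
        try:
            datetime.fromisoformat(timestamp)
        except ValueError:
            errors.append("timestamp is not ISO 8601")
    return errors


def partition_events(events: List[Dict]) -> Tuple[List[Dict], List[Tuple[Dict, List[str]]]]:
    """Split events into those safe to load and those rejected with reasons."""
    valid, rejected = [], []
    for event in events:
        problems = validate_event(event)
        if problems:
            rejected.append((event, problems))
        else:
            valid.append(event)
    return valid, rejected


if __name__ == "__main__":
    incoming = [
        {"event": "product_clicked", "anonymous_id": "anon-1",
         "timestamp": "2020-02-26T12:00:00"},
        {"event": "page_viewed", "timestamp": "yesterday"},
    ]
    good, bad = partition_events(incoming)
    print(len(good), "event(s) ready to load")
    for event, problems in bad:
        print("rejected:", event, problems)
```

Rejecting or quarantining malformed events at collection time means analysts querying the warehouse do not have to rediscover the same problems later.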

The importance of this layer running on top of the data warehouse is paramount. That’s what we are building at RudderStack.

Interested? Talk to us.

February 26, 2020
Soumyadeb Mitra

Founder and CEO of RudderStack