Data Warehouses versus Data Marts
In the worlds of business intelligence and outcome modeling, the terms data warehouse and data mart are often used interchangeably. The differences are worth knowing, though, so in this post we’ll compare and contrast the two. For an in-depth analysis of each concept, please see our posts Key Concepts of a Data Warehouse and Key Concepts of a Data Mart.
What is a data warehouse?
A data warehouse (DW) is a type of data collection, created by extracting and combining data from multiple sources into a single target. A warehouse is designed to tackle objectives ranging from business intelligence (BI) analysis to managing inputs for machine learning. It is often used in support of strategic decisions, that is, for model generation and predictive analytics. It is sometimes referred to as an enterprise data warehouse (EDW).
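As a minimal sketch of "extracting and combining data from multiple sources into a single target," the Python below merges two made-up source extracts into one SQLite table. The source names (crm_rows, billing_rows), the table name (customer_revenue), and the name-normalization rule are all assumptions for illustration; a real warehouse load would involve far more cleansing and validation.

```python
import sqlite3

# Hypothetical extracts from two separate source systems.
crm_rows = [
    {"customer": "Acme", "region": "EU"},
    {"customer": "Globex", "region": "US"},
]
billing_rows = [("ACME", 1200.0), ("globex", 450.5), ("Acme", 300.0)]


def build_warehouse(crm, billing):
    """Combine both extracts into a single target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_revenue ("
        "customer TEXT PRIMARY KEY, region TEXT, revenue REAL)"
    )

    # Transform: the sources spell customer names differently, so
    # normalize them to a shared key before combining.
    totals = {}
    for name, amount in billing:
        key = name.strip().lower()
        totals[key] = totals.get(key, 0.0) + amount

    # Load: one row per customer, carrying data from both sources.
    for row in crm:
        key = row["customer"].strip().lower()
        conn.execute(
            "INSERT INTO customer_revenue VALUES (?, ?, ?)",
            (row["customer"], row["region"], totals.get(key, 0.0)),
        )
    conn.commit()
    return conn


warehouse = build_warehouse(crm_rows, billing_rows)
rows = warehouse.execute(
    "SELECT customer, region, revenue FROM customer_revenue ORDER BY customer"
).fetchall()
print(rows)
```

The point of the sketch is the shape of the work: each source has its own format, and the effort goes into reconciling them before anything is loaded.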
What is a data mart?
A data mart is a subset of the total information held in a data warehouse.
Logistically, a data mart is a curated subset of all the data, tailored for a specific line of research, serving the needs of a single department or business goal. Given their smaller scope and storage footprint, data marts are usually cheaper and faster for querying.
Conceptually, the data warehouse is data-oriented, whereas the data mart is project-oriented. The warehouse, as the name suggests, aggregates data for an entire business, while the mart aims to satisfy a niche group of customers.
Unsurprisingly, given its larger scope, the process of designing a data warehouse is complicated and takes a good deal of time. However, the effort put into a data warehouse pays off when designing a data mart. Given that the warehouse data sources are well understood, designing a data mart is a straightforward process of cherry-picking the data.
Comparing a data warehouse to a data mart
While the above may satisfy a cursory need to understand the differences between the two, let’s delve into more detail. Remember that this is how the industry uses these terms in general; your specific needs and implementation may differ.
Scope of collection
As mentioned above, the process of collecting data for a warehouse has great reach, spanning many different sources. Cleansing, sanity-checking, and transforming the collected data into a well-defined aggregate takes time, network and computing bandwidth, and (therefore) money.
Extracting a subset of this cleansed data from the data warehouse into a data mart is relatively trivial by comparison.
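To illustrate why the extraction step is comparatively simple, here is a sketch that copies one department's rows from an already-cleansed warehouse table into a mart table. The table names (warehouse_orders, sales_mart), the columns, and the sample rows are hypothetical, and SQLite stands in for whatever storage engine is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for a warehouse table that has already been cleansed.
conn.execute(
    "CREATE TABLE warehouse_orders ("
    "order_id INTEGER, department TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_orders VALUES (?, ?, ?, ?)",
    [
        (1, "sales", "EU", 100.0),
        (2, "finance", "EU", 250.0),
        (3, "sales", "US", 75.0),
        (4, "marketing", "US", 40.0),
    ],
)


def build_sales_mart(conn):
    """Copy only the sales rows, and only the columns sales needs."""
    conn.execute(
        "CREATE TABLE sales_mart (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.execute(
        "INSERT INTO sales_mart "
        "SELECT order_id, region, amount FROM warehouse_orders "
        "WHERE department = ?",
        ("sales",),
    )
    conn.commit()


build_sales_mart(conn)
mart_rows = conn.execute(
    "SELECT order_id, region, amount FROM sales_mart ORDER BY order_id"
).fetchall()
print(mart_rows)
```

Because the warehouse has already done the reconciliation, the mart is little more than a filter and a column selection.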
A data warehouse and a data mart have different audiences. The warehouse is a resource available to the entire organization, supplying inputs for machine learning and supporting strategic decisions through model generation and predictive analytics; in short, it serves all of the organization's business intelligence (BI) needs.
The data mart, being a curated subset of all the data, is extracted from the warehouse with a specific research goal, for a specific department, or to support a single business goal. There may be data marts for sales, finance, marketing, and engineering.
In both cases the data is read-only, with consumers able to sample data without the ability to change the warehouse or the mart. This protects the data and enables more widespread distribution of it.
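One way read-only access can be enforced is at the connection level. The sketch below uses SQLite's URI syntax with mode=ro, which Python's sqlite3 module supports through the uri=True argument; the file name and table are hypothetical, and other database systems typically enforce the same idea through user permissions instead.

```python
import os
import pathlib
import sqlite3
import tempfile

# Hypothetical mart file, populated by the load process.
path = os.path.join(tempfile.mkdtemp(), "sales_mart.db")
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE sales (region TEXT, amount REAL)")
writer.execute("INSERT INTO sales VALUES ('EU', 100.0)")
writer.commit()
writer.close()

# Consumers open the same file in read-only mode.
read_only_uri = pathlib.Path(path).as_uri() + "?mode=ro"
reader = sqlite3.connect(read_only_uri, uri=True)
rows = reader.execute("SELECT region, amount FROM sales").fetchall()

# Any attempt to change the data through this connection is rejected.
try:
    reader.execute("INSERT INTO sales VALUES ('US', 1.0)")
    write_rejected = False
except sqlite3.OperationalError:
    write_rejected = True
finally:
    reader.close()

print(rows, write_rejected)
```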
The lengthy, challenging task of designing and implementing a data warehouse is necessary to provide a single integrated data source that paints a comprehensive, coherent view of the historical data and decisions made by the business.
A data mart, on the other hand, is designed to provide a single business division with exactly the data required to make an informed decision on a single topic or a series of related topics.
It is precisely because the data warehouse captures a large part of the business surface area, which usually comprises many systems working with their own native data formats, that the undertaking is formidable. A data mart takes advantage of all the work done on the warehouse, and is relatively trivial to design, implement, and populate.
Different types of decisions depend on different types of data. The data warehouse supports strategic decisions. The data mart does the same for tactical decisions.
A strategic plan describes an organization's vision and mission. It takes a broad, long-term view, drawing on information from finance, operations, and a clear understanding of the external business environment.
A tactical plan answers the question of how to achieve an element of the strategic plan. It consists of short-term, narrowly-focused action items, targeted at business units or departments.
Many different types of data are stored within a data warehouse. This is because future needs aren’t yet known, so “everything” needs to be captured, resulting in a heterogeneous variety of data types and schemas.
A data mart, being built for a particular need or audience and containing a small subset of the warehouse’s data, has a more homogeneous data schema.
Data storage topology
A data warehouse is an integrated, time-variant, and non-volatile collection of data. “Time-variant” means the warehouse’s data is tied to a particular time period; it may be loaded daily, hourly, or on some other regular schedule. Within that period of time, though, the data is consistent and does not change.
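The time-variant and non-volatile properties can be sketched as an append-only table keyed by load period. The table name (inventory_snapshot), the columns, and the daily schedule below are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory_snapshot ("
    "load_date TEXT, sku TEXT, quantity INTEGER, "
    "PRIMARY KEY (load_date, sku))"
)


def load_snapshot(conn, load_date, rows):
    """Append one period's data; earlier periods are never touched."""
    conn.executemany(
        "INSERT INTO inventory_snapshot VALUES (?, ?, ?)",
        [(load_date, sku, quantity) for sku, quantity in rows],
    )
    conn.commit()


# Two hypothetical daily loads of the same item.
load_snapshot(conn, "2024-01-01", [("widget", 10)])
load_snapshot(conn, "2024-01-02", [("widget", 7)])

# Each period keeps its own consistent record, so history is preserved.
history = conn.execute(
    "SELECT load_date, quantity FROM inventory_snapshot "
    "WHERE sku = 'widget' ORDER BY load_date"
).fetchall()
print(history)
```

In this sketch the primary key on (load_date, sku) means a second load for the same period and item raises sqlite3.IntegrityError rather than silently overwriting what is already stored.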
Consolidating so many different data structures from such a wide variety of sources requires a more sophisticated storage design. It’s not uncommon to use complex designs, like star, centipede, or snowflake schemas.
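For readers unfamiliar with the star schema mentioned above, here is a minimal sketch: one central fact table of measurements, surrounded by dimension tables that describe them. All table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the "who, what, where" of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT);

    -- The fact table holds the measurements and points at the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        region_id INTEGER REFERENCES dim_region(region_id),
        amount REAL
    );

    INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO dim_region VALUES (1, 'EU'), (2, 'US');
    INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 50.0), (2, 1, 25.0);
    """
)

# A typical query joins the fact table to a dimension and aggregates.
totals = conn.execute(
    """
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()
print(totals)
```

A snowflake schema follows the same idea but further normalizes the dimension tables into sub-dimensions.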
Data marts: pieces of a warehouse
Data warehouses and data marts are essential to the strategic and tactical decision-making process of a business. While they both support business intelligence analysis, large-scale data collection has to be broken down into manageable subsets for particular use cases. Each such subset is a data mart, which gives a specific team or department the data it needs for the machine learning, model generation, and predictive analytics behind its tactical decisions. A well-designed data warehouse can serve up these slices of the whole data pie to the data marts on a case-by-case basis.