Why it's hard to build a 360-degree view of your customer

Written by Soumyadeb Mitra (Founder and CEO of RudderStack) and Eric Dodds (Head of Product Marketing)

Developing a deep customer understanding and disseminating it throughout the business is a primary goal (and challenge) for companies using data to drive commerce today. The more you know about your customers, the more value you can provide, and the more value you deliver, the higher your customers' lifetime value.

Data is the foundation of this deep customer understanding in the digital age. The mechanism to deliver this data-driven understanding is often called a 360-degree view of the customer, or customer 360. This coveted single source of truth can drive significant value, from uncovering previously hidden insights to enabling targeted marketing campaigns and fueling powerful ML use cases like personalization.

Every company wants to enable these use cases to improve business outcomes. However, anyone working in data knows that building a 360-degree view of your customer is no small task. But why is it still so hard in a world with modern data tooling? To answer that question, we’ll dive into a bit of history. Then, we’ll unpack the technical challenges behind delivering a customer 360.

The failure of SaaS CDPs and the rise of data infrastructure

Collecting your organization’s customer data in a single location is the first challenge to building a unified customer view. For most companies, this means aggregating data from various sources spanning websites, applications, and cloud tools. Data integration in and of itself is a serious technical challenge, but it’s just the beginning of building a customer 360. Traditional customer data platforms emerged to simplify this work and make customer 360 easy, but they never lived up to their billing.

These legacy CDPs were built with limited integration flexibility and proprietary customer profile models, so they ultimately failed to deliver a golden customer record. In fact, they exacerbated the underlying problem because they created additional data silos.

The failure of these legacy CDPs reminded companies that building a customer 360 is fundamentally a data problem. That’s why we argue Data and Engineering Teams should own the CDP – a trend currently playing out across the market.

Thankfully for these technical teams, the recent commoditization of customer data pipelines and the scalability of cloud data warehouses make it easier to pull every bit of customer data into a centralized store.

But creating a customer 360 and delivering value with it requires mastering the whole data activation lifecycle, and data collection is just the first step. After collecting your customer data, it must be unified into complete customer profiles to build a customer 360.

Why is it hard to build a customer 360 on the warehouse?

Before we dig into the details, let’s define what a 360-degree view of the customer looks like in your data warehouse. On a basic level, a customer 360 is a table with one row per user and columns representing everything you know about that user. These columns are often called user traits or attributes. Traits generally fall into a few categories:

  • Unique identifiers are the unique IDs for your users from every tool. These could range from web session anonymousId values to customerId values from payment systems.
  • Known user attributes are all the data points about a customer pulled from various tools and data sources. These are often demographic (job title, age, etc.), behavioral (lead source, product usage, etc.), and stage or state-related (sales opportunity status, active/inactive status, subscription status, etc.).
  • Computed traits are traits calculated by combining data sets that contain information related to the user. These are also called user features. User features are often related to key business metrics and are used to drive insights and optimizations. Examples include total revenue per user, last ten products viewed, and average time between logins. Some of these features are standardized. For instance, every eCommerce company needs features related to products viewed and abandoned carts. Other use cases and business models require custom features.
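As a rough illustration, one row of such a table could be modeled as follows. This is a minimal sketch, not a prescribed schema: every field name here (anonymous_ids, customer_id, total_revenue, and so on) is a hypothetical example chosen to mirror the three trait categories above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Customer360Row:
    """One row per user. All field names are illustrative assumptions."""

    # Unique identifiers collected from different tools
    user_id: str
    anonymous_ids: list[str] = field(default_factory=list)
    customer_id: Optional[str] = None  # e.g. from a payment system

    # Known user attributes (demographic, behavioral, stage or state)
    job_title: Optional[str] = None
    lead_source: Optional[str] = None
    subscription_status: Optional[str] = None

    # Computed traits (user features)
    total_revenue: float = 0.0
    last_products_viewed: list[str] = field(default_factory=list)
    avg_days_between_logins: Optional[float] = None


# A made-up user with two anonymous IDs and one payment-system ID.
row = Customer360Row(
    user_id="u_1",
    anonymous_ids=["anon_web_1", "anon_ios_7"],
    customer_id="cus_123",
    subscription_status="active",
    total_revenue=249.0,
)
```

In a warehouse, each field would be a column, and traits that are not yet known would simply be null.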

With a complete repository of every user trait and feature, shipping projects like full user journey analytics, granular marketing audiences, and advanced personalization becomes easier and faster.

Producing this unified table might not seem too difficult, but computing customer 360 is more complex than it looks.

Create a Customer 360 with RudderStack

See how you can use the Warehouse Native CDP to overcome data collection, unification, and activation challenges.

Customer 360 challenge 1: User identities

When it comes to data, the typical user journey begins with anonymous activity on a website or app before a user eventually signs up, logs in, or makes a purchase. When that key identifying event happens, the user provides a known identifier (or set of identifiers) like an email or phone number. Because of this anonymous-to-known transition, events that occur before identification are tied only to anonymous identifiers like a cookie ID or device ID, and only later can they be associated with the known identifiers (email, phone).

To construct this single user's journey accurately, you must logically combine the events associated with the anonymous ID and those associated with the known identifier into a single timeline representing the user's entire journey.

Computing semantic user features like ‘number of products viewed before first purchase’ requires this logically combined user journey, not the raw events. For most companies, though, managing anonymous identifiers and known identifiers, then combining the raw data, is a massive undertaking.

Things get more complex in multi-device scenarios where the same end user can be associated with multiple anonymous identifiers, such as a cookie ID on a browser and a device ID on a mobile device. You must tie all of these identities to one end user. To make things more complex, these associations are most often discovered over time.

Consider the following scenario. A user creates an account with their email address in a browser and then creates an account with their phone number on a mobile device. You may not know they’re the same user until they give you additional information that ties the email and phone together. For example, they might provide their phone number during a checkout event in their browser after logging in via email.

Even for this basic case, the required logic is significant:

  • Anonymous activities in the browser must be associated with the email address after the initial account creation in the browser.
  • Anonymous activities in the mobile device must be associated with the phone number after account creation in the mobile app.
  • All activities (anonymous and known) and unique identifiers (email, phone, etc.) must be merged into a single timeline and user profile representing the single user’s journey.

Stitching identities like this requires maintaining an identity graph and computing transitive closure. Performing this identity resolution process in SQL is non-trivial.
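To make the idea of transitive closure concrete, here is a minimal sketch of deterministic stitching in Python using a union-find structure. It is an illustration under assumptions, not how any particular product implements identity resolution: the IdentityGraph class and the identifier strings are hypothetical, and a warehouse implementation would have to express the same logic in iterative or recursive SQL.

```python
from collections import defaultdict


class IdentityGraph:
    """Union-find over identifiers; connected identifiers form one user."""

    def __init__(self):
        self.parent = {}

    def find(self, node):
        # Register unseen identifiers as their own root.
        self.parent.setdefault(node, node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[node] != root:
            next_node = self.parent[node]
            self.parent[node] = root
            node = next_node
        return root

    def link(self, a, b):
        """Record that identifiers a and b were observed together."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def clusters(self):
        """Return the sets of identifiers that resolve to the same user."""
        groups = defaultdict(set)
        for node in list(self.parent):
            groups[self.find(node)].add(node)
        return list(groups.values())


graph = IdentityGraph()
graph.link("cookie:abc", "email:ann@example.com")  # sign-up in the browser
graph.link("device:xyz", "phone:+15550100")        # sign-up in the mobile app
users_before = len(graph.clusters())               # still looks like two users

# Later, the phone number is provided at checkout while logged in via email.
graph.link("email:ann@example.com", "phone:+15550100")
users_after = len(graph.clusters())                # now a single user
```

The last link call is the association "discovered over time": one new edge merges two previously separate profiles, which is why every downstream timeline and feature has to be recomputed against the updated graph.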

Further, this stitching must be done for event-based user journeys and for data extracted from other SaaS sources like marketing tools, CRMs, customer success tools, and payment systems via ETL. Some of this data is event-based (like payments), while some is basic relational data (like lead records from a CRM). This requires you to make decisions about timestamps on relational data and how to structure joins into a single table.

If that weren’t enough, everything we just described refers to deterministic identity stitching, but there are many use cases for non-deterministic or probabilistic identity stitching.

It’s no wonder that this is the first significant roadblock companies face when they set out to build their customer 360.

Customer 360 challenge 2: Event semantics and metadata

Event Semantics

Data points with timestamps (events) are fundamental for building a 360-degree view of the customer, but user traits and features in the customer 360 table rarely map one-to-one to events. Features are highly semantic, involving multiple dimensions and events, which creates additional challenges for the team building them.

For example, a feature like ‘user lifetime revenue’ requires summing transactions from the website and mobile app plus any subscription revenue and reconciling with financial transactions from the payment system. This seemingly simple use case requires working with multiple events from four different data sources and performing numerous mathematical operations.

Even individual events have semantic complexity. An ‘added to cart’ event could occur on different first-party platforms (web and mobile), third-party platforms (like affiliate sites), or even at different positions on the same page (header or features section).

Semantic features and events also need to take into account the output of the identity stitching step mentioned above. The raw events need to be associated with the anonymous and known identifiers, while features need to be computed over all events across all IDs belonging to a user.

Accomplishing this requires many complex, repetitive SQL joins and unions across multiple events and identities.
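The sketch below shows the shape of that work for a simplified ‘user lifetime revenue’ feature. It assumes the identity stitching step has already produced a mapping from every identifier to one canonical user. The identifiers, event records, and the lifetime_revenue function are all hypothetical, and reconciliation against the payment system is left out for brevity.

```python
from collections import defaultdict

# Assumed output of identity stitching: identifier -> canonical user.
ID_TO_USER = {
    "cookie:abc": "user_1",
    "device:xyz": "user_1",
    "email:ann@example.com": "user_1",
    "cookie:def": "user_2",
}

# Made-up revenue events, each keyed by whatever identifier its source saw.
EVENTS = [
    {"source": "web", "id": "cookie:abc", "amount": 40.0},
    {"source": "mobile", "id": "device:xyz", "amount": 25.0},
    {"source": "subscription", "id": "email:ann@example.com", "amount": 10.0},
    {"source": "web", "id": "cookie:def", "amount": 15.0},
]


def lifetime_revenue(events, id_to_user):
    """Sum revenue per canonical user across all of that user's identifiers."""
    totals = defaultdict(float)
    for event in events:
        user = id_to_user.get(event["id"])
        if user is None:
            continue  # unresolved identifier; a real pipeline needs a policy here
        totals[user] += event["amount"]
    return dict(totals)


revenue = lifetime_revenue(EVENTS, ID_TO_USER)
```

In SQL, the loop becomes a union of the per-source event tables joined to the identity mapping and grouped by canonical user, and that pattern has to be repeated for every feature.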

Another challenge teams face with feature semantics is the amount of time it takes to compute attributes from the ground up. This is a challenge even when the schemas of the data sources and the metrics themselves are largely standardized. This dynamic is pervasive for metrics that are important for ML use cases. For example, the semantics for a metric like ‘total user revenue in the last 30 days’ don’t vary significantly from company to company.

For business models like eCommerce, 90% of key metrics could work off of a standard set of features, drastically accelerating time to market for companies building customer 360.

Metadata

Once you compute semantic features, you must also keep track of important metadata related to those features. At a high level, your metadata should include:

  • A description of the metric (what it means)
  • The time of last update
  • Provenance – who defined and built the metric
  • Any access/ownership requirements

With your metrics definitions and metadata in a centralized location, data producers (data engineers, analytics engineers) and data consumers (analysts, product managers, data scientists, and marketers) can collaborate more easily. The centralized metadata gives consumers access, clarity, and confidence without sacrificing visibility and control for producers.

It’s also important to track historical versions of your metrics, especially for ML algorithms. For example, a churn algorithm may model off of features like revenue and website activity in a 7-day period prior to the churn date. If the definition of revenue changes, the historical version of the metric must be marked as deprecated and the new metric recomputed. Because recomputing the entire history of a metric can be very costly, it’s best to recompute on demand.
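One way to picture this is a small registry that stores the metadata fields listed above and deprecates old versions when a definition changes. This is a minimal sketch under assumptions: FeatureVersion, FeatureRegistry, and the example metric are hypothetical names, not part of any specific tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FeatureVersion:
    name: str
    version: int
    description: str  # what the metric means
    owner: str        # provenance: who defined and built it
    access: str       # access/ownership requirement, e.g. a role name
    last_updated: Optional[datetime] = None  # time of last computation
    deprecated: bool = False


class FeatureRegistry:
    def __init__(self):
        # feature name -> list of FeatureVersion, oldest first
        self._versions = {}

    def define(self, name, description, owner, access):
        history = self._versions.setdefault(name, [])
        for old in history:
            old.deprecated = True  # a new definition supersedes earlier ones
        new = FeatureVersion(name, len(history) + 1, description, owner, access)
        history.append(new)
        return new

    def current(self, name):
        return self._versions[name][-1]

    def mark_computed(self, name):
        # Called only when a consumer requests a recompute (on demand).
        self.current(name).last_updated = datetime.now(timezone.utc)


registry = FeatureRegistry()
v1 = registry.define("revenue_7d", "Gross revenue, trailing 7 days", "data-eng", "analyst")
v2 = registry.define("revenue_7d", "Net revenue after refunds, trailing 7 days", "data-eng", "analyst")
```

Here the second define call marks v1 as deprecated, while v2 has no last_updated value until something calls mark_computed, which models recomputing on demand rather than eagerly rebuilding the full history.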

Needless to say, building out the models and pipelines required to manage event semantics and metadata based on an identity graph (that you’re also managing) is a complex undertaking.

Current tooling for semantics and metadata

As you would expect, many tools exist to address these pain points. There are several metrics layer or semantic layer tools to help data teams more easily manage metric semantics like the ones discussed above. Data cataloging and observability tools also exist to address the metadata challenge from various angles at various points in the pipeline.

Many of these tools are great, but most are young, and their solutions to these problems have yet to mature fully. The most important facet to note when it comes to customer 360, though, is that these tools are for defining global metrics in a traditional batch ETL analytics workflow.

They make managing aspects of semantics and metadata easier than doing everything manually or building your own tooling. However, they still require significant configuration and management for projects like customer 360. This is because customer 360 profiles focus on deriving user-specific metrics from customer data.

Finally, the end-user experience for most of these tools requires deep expertise in a particular language, specifically SQL or Python. So, the number of people within an organization who can build and manage these components of the customer 360 is limited. This limitation forces a significant amount of translation between multiple teams, and it’s a big reason customer 360 projects can take so long.

Customer 360 challenge 3: Data Ops

Even if your team builds the identity stitching and user feature layers, maintaining the underlying pipelines, scheduling, and infrastructure requires full-time data engineering and data ops teams. Pipelines can require orchestration across tools like dbt, Airflow, ETL jobs, and more. This fragmentation necessitates a health monitoring layer to ensure continued operation through the lifecycle of the data flow and compute process.
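Even the ordering of these steps is something you have to encode and maintain. The sketch below uses Python's standard graphlib module to derive a valid run order from declared dependencies. The step names are hypothetical, and a real deployment would hand this job to an orchestrator rather than a script.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can run.
PIPELINE = {
    "collect_events": set(),
    "extract_saas_data": set(),
    "stitch_identities": {"collect_events", "extract_saas_data"},
    "compute_features": {"stitch_identities"},
    "health_checks": {"compute_features"},
}

# static_order() yields every step after all of its dependencies.
run_order = list(TopologicalSorter(PIPELINE).static_order())
```

If a dependency cycle is declared by mistake, TopologicalSorter raises graphlib.CycleError, which is the kind of failure a health monitoring layer needs to surface.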

The resources required to build and manage this infrastructure and orchestration are another big roadblock for companies on the quest to create a usable customer 360.

RudderStack Profiles makes customer 360 accessible

We built our customer 360 solution, RudderStack Profiles, to solve these technical challenges for you. RudderStack Profiles takes care of the technical challenges behind identity resolution and automates the creation of a customer 360 table, so you can focus on improving business outcomes and ship high-impact projects faster.

October 18, 2022