Identity resolution: What it is and how to use it for customer personalization

Identity stitching is the process of matching different customer identifiers across multiple devices and digital touchpoints to build a cohesive customer profile. As customers interact with products and services across channels, they generate data under many identifiers: an anonymous ID on a first website visit, a device ID from a mobile app, an email address at signup, and an account ID at login. Without stitching, these identifiers appear as separate individuals in downstream systems, inflating user counts, fragmenting behavioral history, and undermining personalization and attribution. Identity stitching connects those fragments by producing a single canonical identifier per individual, which becomes the key for computing features, building customer profiles, and activating data downstream.

This article covers what identity stitching is, how the identity graph works, the difference between deterministic and probabilistic matching, what data identity stitching requires, the risk of identity explosion, and how RudderStack Profiles implements identity stitching in the warehouse.

Key concepts

Identity stitching is the process of matching different customer identifiers across devices and touchpoints and mapping them to a single canonical identifier, enabling a complete and accurate view of each individual across all data sources.
Identity graphs are the data structures that power identity stitching, representing customers as networks of nodes (individual identifiers such as email addresses, device IDs, and anonymous IDs) and edges (relationships between those identifiers observed in event data).
Deterministic matching resolves identity by linking identifiers that are directly confirmed to belong to the same individual, such as an anonymous device ID connected to a verified email address at login, producing high-confidence matches without inference.
Probabilistic matching infers identity connections using non-deterministic signals such as device co-location, IP address overlap, and behavioral patterns, generating likely matches where direct identifier links are unavailable.
Customer 360 is the unified customer profile produced by aggregating features and behavioral history across all data sources using the canonical identifier generated by identity stitching, typically output as a feature view table in the data warehouse.
Identity explosion is a failure mode in identity stitching where a highly connected identifier, such as a shared device or generic email address, incorrectly merges many distinct customer profiles into a single cluster, undermining the accuracy of the identity graph.
Entity resolution is the broader discipline of which identity stitching is a subset, linking data records to any real-world entity such as households, accounts, or products, not exclusively to individual people.

What is identity stitching?

Identity stitching is the process of matching different identifiers across multiple devices and digital touchpoints to build a cohesive and complete customer profile. Customers generate data under many identifiers as they move through digital experiences: an anonymous ID when they first arrive on a website, a device ID from a mobile app, an email address when they register, and an account ID when they log in from a new device. Absent any stitching, each of these identifiers is recorded as a separate individual in analytics tools, CRM systems, and marketing platforms.

The output of identity stitching is a canonical identifier: a single stable ID that links all of a customer's known identifiers together. That canonical identifier becomes the key used to pull together behavioral history across tables, compute features such as lifetime value or churn risk, and build a complete customer profile. This unified profile is commonly called a Customer 360, and it serves as the single source of truth for customer-facing workflows across product, marketing, and support.

Identity resolution is a closely related term. Identity stitching refers specifically to the process of connecting identifiers. Identity resolution refers to the outcome: a resolved, accurate view of a customer across all data sources. In practice, the terms are often used interchangeably, but the distinction is useful when evaluating what a tool or data pipeline actually does at each stage.

How does identity stitching work?

Identity stitching works by building and maintaining an identity graph. An identity graph has two components: nodes and edges. Nodes represent individual identifiers that belong to a customer, such as an email address, a device ID, an anonymous ID, a phone number, or an account user ID. Edges represent the relationships between those identifiers observed in event data. When two identifiers appear together in the same event, for example an anonymous ID and an email address both present in a login event, an edge is created between them, recording that they likely belong to the same person.

Once the graph is built, identity stitching performs a graph traversal, identifying all nodes connected by edges, directly or indirectly, and grouping them into a single cluster. That cluster is assigned a canonical identifier representing one individual. As new events arrive and new identifier relationships are observed, the graph is updated and clusters are re-evaluated. The result is a continuously maintained structure that maps which identifiers belong to each individual.
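
The grouping step can be sketched in Python with a union-find structure. This is a minimal illustration under stated assumptions, not any specific product's implementation: the function name stitch, the "type:value" identifier strings, and the choice of the smallest identifier in each cluster as its canonical key are all made up for the example.

```python
from collections import defaultdict


def stitch(edges):
    """Group identifiers into clusters using union-find over co-occurrence edges."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for left, right in edges:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Keep the smaller root so the result does not depend on edge order.
            low, high = sorted((root_left, root_right))
            parent[high] = low

    clusters = defaultdict(set)
    for node in parent:
        clusters[find(node)].add(node)
    return dict(clusters)


# Hypothetical edges observed in event data.
edges = [
    ("anon:a1", "email:ana@example.com"),  # login on the first device
    ("anon:a2", "email:ana@example.com"),  # login on a second device
    ("anon:b7", "user:42"),                # a different person
]
clusters = stitch(edges)
# Two clusters: one keyed "anon:a1" with three members, one keyed "anon:b7" with two.
```

Production systems usually assign a generated, stable ID to each cluster rather than reusing a member identifier, so that the canonical key does not change when a cluster later gains a member that sorts lower.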

How stitching handles known and unknown users

Before a customer identifies themselves, their activity is captured under an anonymous identifier, typically an auto-generated ID assigned by an SDK or tracking system. This allows behavioral data to be collected at an individual level even without any personally identifiable information. When the same user later logs in, makes a purchase, or fills out a form, a known identifier such as an email address or account ID is attached to the same session.

Identity stitching links the anonymous identifier to the known identifier by recording both in the same event and creating an edge between them in the graph. All prior anonymous behavior can then be attributed to the known customer. If the same user logs in from a second device, the new anonymous identifier generated on that device is also linked to the same account, extending the cluster further. The canonical identifier remains stable, anchoring all data about that individual regardless of which device or session generated it.
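
As a rough sketch of that backfill, the Python below attributes earlier anonymous events to the account that later identified itself. The field names anonymous_id and user_id and the function backfill_known_user are assumptions for illustration, and the sketch only follows direct anonymous-to-account links rather than traversing a full graph.

```python
def backfill_known_user(events):
    """Attribute earlier anonymous events to the account that later identified itself."""
    # Pass 1: learn anonymous_id -> user_id from events that carry both identifiers.
    anon_to_user = {}
    for event in events:
        if event.get("user_id"):
            anon_to_user[event["anonymous_id"]] = event["user_id"]

    # Pass 2: label every event, falling back to the anonymous ID when no link exists.
    return [
        {**event, "resolved_id": anon_to_user.get(event["anonymous_id"], event["anonymous_id"])}
        for event in events
    ]


events = [
    {"anonymous_id": "a1", "user_id": None, "name": "page_view"},  # before login
    {"anonymous_id": "a1", "user_id": "u42", "name": "login"},     # first device
    {"anonymous_id": "a2", "user_id": "u42", "name": "login"},     # second device
    {"anonymous_id": "a9", "user_id": None, "name": "page_view"},  # never identified
]
resolved = [event["resolved_id"] for event in backfill_known_user(events)]
# resolved == ["u42", "u42", "u42", "a9"]
```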

How deterministic and probabilistic matching apply to identity stitching

Identity stitching approaches differ primarily in how they establish the connections between identifiers. The two broad categories are deterministic matching and probabilistic matching. Neither approach is universally better; the right choice depends on the data available, the scale of the use case, and the consequences of incorrect matches.

Deterministic matching

Deterministic matching links identifiers that are directly confirmed to belong to the same individual. A confirmed link exists when two identifiers appear together in the same event or transaction: an anonymous ID and an email address in a login call, a device ID and an account ID in a purchase event, or a session identifier and a known user ID provided explicitly by the application. These links are established by observed data, not inference.
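
One way to picture deterministic edge creation is a function that emits an edge for every pair of identifiers present in the same event. The column names in ID_COLUMNS and the function edges_from_event are hypothetical, chosen only to make the sketch concrete.

```python
from itertools import combinations

# Hypothetical identifier columns that may appear on an event record.
ID_COLUMNS = ("anonymous_id", "device_id", "email", "user_id")


def edges_from_event(event):
    """Return one edge for every pair of identifiers observed in a single event."""
    present = [(column, str(event[column])) for column in ID_COLUMNS if event.get(column)]
    return list(combinations(present, 2))


login = {"anonymous_id": "a1", "email": "ana@example.com", "name": "login"}
page_view = {"anonymous_id": "a1", "name": "page_view"}

login_edges = edges_from_event(login)
# [(("anonymous_id", "a1"), ("email", "ana@example.com"))]
page_view_edges = edges_from_event(page_view)
# [] because a single identifier confirms nothing about any other identifier
```

No scoring or inference is involved: an edge exists only because both identifiers were recorded together.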

Deterministic matching produces high-confidence results and is appropriate when data security, compliance precision, or attribution accuracy is a priority. Because matches are based on confirmed co-occurrence rather than statistical likelihood, deterministic approaches are less likely to merge profiles incorrectly. The primary limitation is coverage: deterministic matching handles known users and authenticated sessions well, but produces sparse results for anonymous users who never identify themselves.

Probabilistic matching

Probabilistic matching infers identity connections using non-deterministic signals, estimating that two identifiers likely belong to the same person based on patterns in the data. Common signals include device co-location, shared IP addresses, similar behavioral patterns across sessions, and digital fingerprinting attributes. Where deterministic matching draws a confirmed line between two identifiers, probabilistic matching assigns a likelihood score that two identifiers represent the same individual and connects them if that score exceeds a threshold.
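
The score-and-threshold idea can be illustrated with a deliberately simple weighted score. The signal names, the integer weights, and the threshold below are invented for the example; real systems typically learn such weights from labeled data and produce calibrated probabilities rather than hand-tuned points.

```python
# Illustrative weights in points out of 100; these values are assumptions.
SIGNAL_WEIGHTS = {
    "same_ip": 30,
    "same_user_agent": 20,
    "same_city": 10,
    "overlapping_sessions": 40,
}
MATCH_THRESHOLD = 60


def match_score(signals):
    """Sum the weights of every signal that fired for a pair of identifiers."""
    return sum(weight for name, weight in SIGNAL_WEIGHTS.items() if signals.get(name))


def is_probable_match(signals, threshold=MATCH_THRESHOLD):
    """Connect two identifiers only when their score exceeds the threshold."""
    return match_score(signals) > threshold


strong = {"same_ip": True, "overlapping_sessions": True}  # 30 + 40 = 70
weak = {"same_ip": True, "same_user_agent": True}         # 30 + 20 = 50
# is_probable_match(strong) is True; is_probable_match(weak) is False
```

Raising the threshold trades coverage for precision, which is the same tradeoff the surrounding text describes.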

Probabilistic approaches are useful at scale and for anonymous data, where direct identifier links are rarely available. They are also commonly used in conjunction with third-party identity resolution services, which supplement first-party data with additional signals. The tradeoff is error rate: probabilistic matching will generate some incorrect merges, which can create compliance risk if profiles are incorrectly combined, and can corrupt downstream models and audiences if the error rate is significant. For use cases requiring compliance precision, such as honoring data subject rights or applying consent rules per individual, the error profile of probabilistic matching must be carefully evaluated before use.

Most organizations benefit from starting with deterministic matching to cover authenticated sessions and then evaluating probabilistic approaches only where anonymous data volume and use case requirements justify the additional complexity and risk.

What data identity stitching requires

Identity stitching requires two types of inputs: identifiers, which become nodes in the graph, and event data, which provides the co-occurrence signals that become edges.

Identifiers are the anchors of the graph. The most reliable are first-party identifiers collected directly from the customer: email addresses and phone numbers provided at registration, account IDs assigned at login, and any internally generated user IDs. These are the inputs that deterministic matching depends on. Anonymous identifiers such as device IDs and session-level anonymous IDs are also nodes in the graph, and they become connected to known identifiers when they co-occur in events. All identifiers used for stitching must be consistent across data sources: they should be formatted as strings, not as mixed types or numeric values.
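
A small normalization helper shows what consistent string formatting might look like in practice. The function name normalize_identifier is hypothetical, and lowercasing email addresses is an assumption that suits most consumer data but should be checked against your own sources.

```python
def normalize_identifier(id_type, value):
    """Return a trimmed string form of an identifier, or None when it is empty."""
    if value is None:
        return None
    text = str(value).strip()
    if not text:
        return None
    if id_type == "email":
        # Assumption: email addresses are treated as case-insensitive.
        text = text.lower()
    return text


# A numeric account ID from one source and a string from another now compare equal.
same_account = normalize_identifier("user_id", 42) == normalize_identifier("user_id", "42 ")
# same_account is True
```

Without a step like this, the integer 42 and the string "42" would become two separate nodes, and the edge connecting that account across sources would never form.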

Event tables, such as page view tables, track event tables, and identify call tables, serve as edge sources. These tables provide the records that show which identifiers appeared together at a given moment. The identity graph draws its edges from these records; without them, there is no signal to connect identifiers across sessions and devices. The selection of which event tables to include as edge sources matters: including tables with irrelevant or overly generic identifier columns can introduce noise into the graph and increase the risk of incorrect merges.

The risk of identity explosion

Identity explosion occurs when a highly connected identifier, such as a shared email address used by multiple household members, a generic company email alias, or a frequently shared device, creates an edge between many distinct individuals, causing their clusters to merge incorrectly. The result is a single canonical identifier that represents what should be many different customers, corrupting downstream profiles, audiences, and features.

Avoiding identity explosion requires careful selection of which identifier types to include in the graph. Identifiers that are not reliably unique to a single individual, such as household email addresses or shared device IDs in a corporate environment, should be excluded or handled with additional rules that limit how many connections a single node can form. Cardinality rules, which set maximum connection limits between identifier types, are one documented approach to controlling this risk.
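
A cardinality limit can be sketched as a filter that runs before clustering. This is a simplified illustration, not how any particular product enforces its rules: the function drop_overconnected, the (type, value) tuples, and the per-type max_links dictionary are assumptions for the example.

```python
from collections import defaultdict


def drop_overconnected(edges, max_links):
    """Drop edges that touch an identifier linked to more distinct nodes than its type allows."""
    neighbors = defaultdict(set)
    for left, right in edges:
        neighbors[left].add(right)
        neighbors[right].add(left)

    # Identifiers are (type, value) tuples; types missing from max_links are unlimited.
    blocked = {
        node
        for node, linked in neighbors.items()
        if node[0] in max_links and len(linked) > max_links[node[0]]
    }
    kept = [(left, right) for left, right in edges if left not in blocked and right not in blocked]
    return kept, blocked


shared = ("email", "info@acme.example")  # generic alias used by several people
edges = [
    (shared, ("anonymous_id", "a1")),
    (shared, ("anonymous_id", "a2")),
    (shared, ("anonymous_id", "a3")),
    (("email", "ana@example.com"), ("anonymous_id", "a4")),
]
kept, blocked = drop_overconnected(edges, max_links={"email": 2})
# blocked contains only the shared alias; kept contains only the last edge
```

Dropping every edge of an over-connected node is the bluntest possible policy. Keeping the earliest links up to the limit is another option, at the cost of making the result depend on event order.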

Benefits of identity stitching for business

The most immediate benefit of identity stitching is data quality. Consolidating fragmented identifiers into accurate per-individual profiles eliminates duplicate records and the downstream errors they cause: duplicate marketing sends, inflated user counts, miscounted conversions, and incorrect attribution. With reliable identity data, decisions made by product, marketing, and analytics teams are grounded in accurate counts and histories rather than fragmented approximations.

Identity stitching also strengthens compliance. Data subject rights under privacy regulations require organizations to locate, correct, and delete all data associated with a specific individual. If customer data is held under many unresolved identifiers across multiple systems, honoring those requests accurately is difficult. A resolved identity, with a canonical identifier linking all records back to a single individual, makes it possible to identify and act on all data associated with that person consistently.
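
To make the lookup concrete, the sketch below finds every identifier that shares a canonical ID with the one named in a request. The mapping id_to_canonical stands in for the output of a stitching step, and both it and the function identifiers_for_person are hypothetical.

```python
def identifiers_for_person(known_identifier, id_to_canonical):
    """Return every identifier stitched to the same canonical ID as the given one."""
    canonical = id_to_canonical.get(known_identifier)
    if canonical is None:
        return set()
    return {identifier for identifier, owner in id_to_canonical.items() if owner == canonical}


# Hypothetical stitching output: identifier -> canonical ID.
id_to_canonical = {
    "email:ana@example.com": "p1",
    "anon:a1": "p1",
    "device:ios-77": "p1",
    "email:ben@example.com": "p2",
}
in_scope = identifiers_for_person("email:ana@example.com", id_to_canonical)
# {"email:ana@example.com", "anon:a1", "device:ios-77"}
```

Each returned identifier is a key to search for in downstream systems when fulfilling an access or deletion request.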

For customer experience, identity stitching enables omnichannel consistency. When a customer contacts support after making a purchase on a mobile app, the representative's tools should reflect the full history of that customer, not just activity from a single channel. When a personalization system generates a recommendation, it should draw on all behavioral signals attributed to that individual, not a subset limited by device or session. Resolved identity makes these experiences possible by providing a single anchor for all customer data.

Identity stitching also improves attribution. Marketing attribution depends on connecting campaign touchpoints to conversion events that may occur on different devices or in different sessions. Deterministic stitching enables cross-device attribution by linking the device that saw an ad to the device that completed a purchase, when both are connected through a known identifier such as a login event.

Identity stitching use cases

Personalization and customer engagement

Personalization systems require complete, accurate behavioral histories to generate relevant recommendations, messaging, and experiences. Identity stitching provides the unified customer record that makes this possible. When email addresses, device IDs, and account IDs are resolved to a single canonical identifier, all behavioral signals across channels can be aggregated into a single customer profile. That profile can then be used to drive personalized product recommendations, triggered lifecycle campaigns, and contextually relevant in-app experiences that reflect the customer's actual history, not a fragmented subset of it.

Fraud detection

In financial services, insurance, and other sectors where fraud detection is critical, identity stitching provides the comprehensive data profile needed to identify anomalous behavior accurately. Fraud detection depends on understanding what behavior is normal for a given individual. An action that appears suspicious in isolation may be routine in context, and context requires a complete view of the individual's history across accounts and devices. Identity stitching, by linking identifiers to a single canonical profile, enables fraud detection systems to evaluate behavior against a complete history rather than a session-level snapshot.

Marketing attribution

Attribution requires connecting the marketing touchpoints a customer encountered to the conversion event they eventually completed. These touchpoints and conversions often occur across different devices and sessions. Deterministic identity stitching enables cross-device attribution by linking the device or session associated with an ad exposure to the account that later converted, when both are connected through a known identifier such as a login event or email address. The result is attribution that reflects the actual customer journey rather than one limited to single-device or single-session windows.
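
A short sketch shows how resolved identity changes an attribution result. It uses last-touch attribution purely as an example model; the source does not prescribe one. The record fields, the id_to_canonical mapping, and the function names are all assumptions, and timestamps are ISO 8601 strings so that string comparison matches chronological order.

```python
def resolve(identifier, id_to_canonical):
    """Map an identifier to its canonical ID, or to itself when it is unstitched."""
    return id_to_canonical.get(identifier, identifier)


def last_touch_attribution(touches, conversions, id_to_canonical):
    """Credit each conversion to the latest earlier touch by the same resolved person."""
    results = []
    for conversion in conversions:
        person = resolve(conversion["identifier"], id_to_canonical)
        earlier = [
            touch
            for touch in touches
            if resolve(touch["identifier"], id_to_canonical) == person
            and touch["timestamp"] <= conversion["timestamp"]
        ]
        campaign = max(earlier, key=lambda touch: touch["timestamp"])["campaign"] if earlier else None
        results.append({"order_id": conversion["order_id"], "campaign": campaign})
    return results


id_to_canonical = {"anon:laptop": "p1", "anon:phone": "p1", "user:42": "p1"}
touches = [{"identifier": "anon:laptop", "campaign": "spring_sale", "timestamp": "2024-03-01T10:00:00"}]
conversions = [{"identifier": "anon:phone", "order_id": "o-1", "timestamp": "2024-03-03T12:00:00"}]

stitched = last_touch_attribution(touches, conversions, id_to_canonical)
unstitched = last_touch_attribution(touches, conversions, {})
# stitched credits "spring_sale"; unstitched credits None because the devices look unrelated
```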

Data governance and compliance

Identity stitching supports data governance by creating a consistent, resolvable record of who each customer is across all systems. This matters most for data subject rights workflows under privacy regulations, where an organization must be able to locate and act on all data associated with a specific individual. A resolved canonical identifier makes that lookup reliable and complete. Importantly, the matching approach used during stitching affects compliance risk: probabilistic matching that incorrectly merges two individuals' profiles could result in responding to a data subject request with the wrong person's data, which organizations with significant compliance obligations should account for when selecting their identity strategy.

Identity stitching vs entity resolution: What's the difference?

Entity resolution is the broader discipline of linking data records to real-world entities. In this context, an entity is any unit an organization tracks: an individual person, a household, an account, a company, a subscription, or a product. Entity resolution connects data records that refer to the same entity, even when those records use different identifiers or come from different systems.

Identity stitching is a specific type of entity resolution focused on individuals. It connects the identifiers associated with a single person across systems and touchpoints. All identity stitching is a form of entity resolution, but entity resolution is not limited to people. An organization might apply entity resolution to match company records across a CRM and a billing system, or to match product records across a catalog and a warehouse inventory table.

The practical distinction matters when evaluating tools and approaches. A system designed for identity stitching optimizes for the specific characteristics of customer identity data: high identifier cardinality, continuous event streams, anonymous-to-known transitions, and per-device tracking. A general entity resolution framework may not make those optimizations by default.

Where RudderStack fits

RudderStack is the agentic CDP for the AI era. For teams implementing identity stitching, RudderStack Profiles is the warehouse-native framework for defining identity rules and computing customer features directly in the data warehouse, without requiring hand-written SQL.

Profiles implements identity stitching using the id_stitcher model, configured in YAML as part of a Profiles project. Teams define the identifier types relevant to their entity, such as user_id, anonymous_id, and email, and specify which event tables serve as edge sources, such as identify calls, page views, and track events. Profiles then performs connected component analysis over the resulting identifier graph, grouping all connected identifiers into a single cluster and assigning each cluster a canonical identifier called the rudderId.

Because identity stitching is configured declaratively in YAML rather than custom SQL, changes to identity rules are reviewable as code: pull request diffs show what changed in the identity configuration, and the system resolves dependencies across entity definitions automatically. RudderStack Profiles also supports cardinality rules, which set maximum connection limits between identifier types to prevent identity explosion in cases where a shared or generic identifier would otherwise incorrectly merge many distinct profiles.

The identity graph produced by Profiles serves as the foundation for the Customer 360 feature view, a warehouse table where each row represents a unique individual and each column is a computed feature or attribute. This feature view can be queried directly in the warehouse or activated downstream through Reverse ETL to populate CRM systems, marketing platforms, and personalization tools.
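
The shape of such a table can be illustrated in plain Python. This is a generic sketch of aggregating events by canonical identifier, not the Profiles implementation or its output schema: the feature names, event fields, and the id_to_canonical mapping are all assumptions.

```python
from collections import defaultdict


def build_feature_view(events, id_to_canonical):
    """Aggregate events into one feature row per canonical identifier."""
    rows = defaultdict(lambda: {"event_count": 0, "total_revenue": 0.0, "last_seen": None})
    for event in events:
        canonical = id_to_canonical.get(event["identifier"])
        if canonical is None:
            continue  # identifier is not in the graph; skipped in this sketch
        row = rows[canonical]
        row["event_count"] += 1
        row["total_revenue"] += event.get("revenue", 0.0)
        # ISO 8601 timestamps compare correctly as strings.
        if row["last_seen"] is None or event["timestamp"] > row["last_seen"]:
            row["last_seen"] = event["timestamp"]
    return dict(rows)


id_to_canonical = {"anon:a1": "p1", "user:42": "p1", "anon:b7": "p2"}
events = [
    {"identifier": "anon:a1", "timestamp": "2024-03-01T10:00:00"},
    {"identifier": "user:42", "timestamp": "2024-03-02T11:00:00", "revenue": 20.0},
    {"identifier": "user:42", "timestamp": "2024-03-05T09:30:00", "revenue": 5.5},
    {"identifier": "anon:b7", "timestamp": "2024-03-04T08:00:00"},
]
feature_view = build_feature_view(events, id_to_canonical)
# feature_view["p1"] has event_count 3, total_revenue 25.5, last_seen "2024-03-05T09:30:00"
```

The anonymous page view and the two logged-in purchases land in the same row only because both identifiers resolve to the same canonical ID.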

For auditability, the Profiles ID Stitcher Audit tool, embedded in the Profile Builder CLI, allows teams to analyze the health of their identity graph, including the count of distinct IDs before and after stitching, cluster size distributions, and drilled-down views of how specific identifiers are connected. This supports ongoing governance of the identity graph as data volumes and identifier types evolve.
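
The kinds of health metrics described here can be computed from any identifier-to-cluster mapping. The sketch below is a generic illustration and does not reproduce the Profiles audit tool or its output; the function graph_health and the id_to_canonical mapping are hypothetical.

```python
from collections import Counter


def graph_health(id_to_canonical):
    """Summarize an identity graph: ID counts before and after stitching, and cluster sizes."""
    cluster_sizes = Counter(id_to_canonical.values())
    return {
        "ids_before": len(id_to_canonical),  # distinct raw identifiers
        "ids_after": len(cluster_sizes),     # distinct canonical identifiers
        "size_distribution": dict(Counter(cluster_sizes.values())),  # size -> cluster count
        "largest_cluster": max(cluster_sizes.values(), default=0),
    }


id_to_canonical = {
    "anon:a1": "p1", "anon:a2": "p1", "email:ana@example.com": "p1",
    "anon:b7": "p2", "user:42": "p2",
    "anon:z9": "p3",
}
report = graph_health(id_to_canonical)
# ids_before 6, ids_after 3, largest_cluster 3
```

A largest cluster that is far bigger than the rest is a common first sign of identity explosion and a good place to start drilling into individual identifiers.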

Summary

Identity stitching is the process of connecting customer identifiers across devices, sessions, and systems into a single canonical identifier that enables accurate, complete customer profiles. The identity graph, built from identifier nodes and co-occurrence edges observed in event data, is the structure that makes this possible. Deterministic matching, which relies on confirmed identifier links, is appropriate for high-precision use cases including compliance and attribution. Probabilistic approaches extend coverage to anonymous data but introduce error risk that organizations must account for, particularly in compliance-sensitive contexts.

RudderStack Profiles implements deterministic identity stitching through its id_stitcher model, configured in YAML, running connected component analysis in the data warehouse, and producing a rudderId canonical identifier as the foundation for Customer 360 profiles.

RudderStack Profiles implements identity stitching directly in your warehouse. See how it works in the Profiles documentation or explore the Profiles product page.

FAQs

What is identity stitching?
Identity stitching is the process of matching different customer identifiers across multiple devices and digital touchpoints to build a cohesive customer profile. When a user visits a website anonymously and later logs into an account, identity stitching links the anonymous identifier to the known account, merging what appeared to be separate interactions into one profile anchored by a single canonical identifier. Identity resolution is a closely related term: identity stitching is the mechanism, and identity resolution is the outcome it produces.
What is the difference between identity stitching and identity resolution?
Identity stitching is the specific process of linking identifiers together into a connected graph. Identity resolution is the outcome: a unified, accurate customer profile produced by stitching identifiers across devices, channels, and data sources. Identity stitching is the mechanism; identity resolution is the result. In practice, the two terms are often used interchangeably, but the distinction is useful when evaluating what a tool or pipeline actually does at each stage.
How does identity stitching work?
Identity stitching builds and maintains an identity graph consisting of nodes and edges. Nodes represent individual identifiers such as email addresses, device IDs, anonymous IDs, and account IDs. Edges represent the relationships between those identifiers observed in event data, created when two identifiers appear together in the same event. Connected component analysis is then applied to the graph: all nodes connected by edges, directly or indirectly, are grouped into a cluster representing a single individual and assigned a canonical identifier. As new events arrive and new identifier relationships are observed, the graph is updated and clusters are re-evaluated.
What is the difference between deterministic and probabilistic identity stitching?
Deterministic identity stitching links identifiers that are confirmed to belong to the same individual, such as an anonymous device ID connected to a verified email address at login. It produces high-confidence matches and is the appropriate approach when data security, compliance precision, or attribution accuracy is a priority. Probabilistic identity stitching infers connections using non-deterministic signals such as device co-location, IP address overlap, and behavioral patterns, estimating that two identifiers likely belong to the same person. Probabilistic approaches are useful at scale and with anonymous data, but carry a higher risk of incorrect merges. Organizations with compliance obligations should carefully evaluate that risk before using probabilistic matching, since an incorrect profile merge can result in responding to data subject requests with the wrong person's data. Most use cases benefit from starting with deterministic matching and supplementing with probabilistic approaches only where data volume and use case requirements justify the added complexity.
What data does identity stitching require?
Identity stitching requires two types of inputs: identifiers and event data. Identifiers are the nodes of the identity graph: email addresses, phone numbers, account IDs, device IDs, and anonymous IDs. The most reliable are first-party identifiers collected directly from the customer, such as email addresses provided at registration. All identifiers must be consistent across data sources and represented as strings. Event tables such as page view tables, track event tables, and identify call tables serve as edge sources: they provide the records showing which identifiers appeared together at a given moment and are used to build the edges of the graph. Selecting which event tables to include matters, because including tables with overly generic or shared identifiers can introduce noise and increase the risk of identity explosion.
What is identity explosion and how is it prevented?
Identity explosion occurs when a highly connected identifier, such as a shared email address, a generic company alias, or a frequently shared device, creates edges between many distinct individuals and causes their clusters to merge incorrectly into a single canonical identifier. The result is a profile that incorrectly represents many different customers as one individual, corrupting downstream features, audiences, and compliance workflows. It is prevented by carefully selecting which identifier types to include in the graph, excluding identifiers that are not reliably unique to a single person, and applying cardinality rules that limit the maximum number of connections a single node can form to identifiers of a given type.
What is the difference between identity stitching and entity resolution?
Entity resolution is the broader discipline of linking data records to real-world entities, which can include households, accounts, products, companies, or individuals. Identity stitching is a specific type of entity resolution focused on people: connecting the identifiers associated with a single individual across systems and touchpoints. All identity stitching is a form of entity resolution, but entity resolution is not limited to identity use cases.
What are the benefits of identity stitching for business?
Identity stitching improves data quality by eliminating duplicate records and consolidating fragmented interactions into accurate customer profiles. It enables personalization by giving marketing, product, and support teams a reliable view of each customer's behavior across channels. It supports compliance by creating a consistent profile against which data subject requests can be honored. It enables attribution by connecting campaign touchpoints to conversions across devices and sessions. And it reduces the risk of sending duplicate or inconsistent communications to the same customer across tools and channels, which is a common consequence of unresolved identity data.
How does identity stitching support compliance and data governance?
Identity stitching consolidates what would otherwise be fragmented, duplicate records into a single consistent profile per individual. This consolidation matters for compliance because data subject rights, including the right to access, correct, or delete personal data, must be honored against a complete view of that individual's data, not just a subset of it. Without identity stitching, an organization may honor a deletion request against one record while leaving others untouched across different systems. A unified canonical identifier makes it possible to identify all data associated with an individual and act on it consistently. The matching approach used during stitching affects compliance risk: probabilistic matching that incorrectly merges two individuals' profiles could result in responding to a data subject request with the wrong person's data. Organizations with significant compliance obligations should prioritize deterministic matching where precision is required.
