Define maximum connection limits between different ID types to provide fine-grained control over identity stitching.
This guide explains how to use cardinality rules to limit the maximum number of connections between different ID types in your identity graph.
This feature is currently available only for Snowflake. Support for other warehouses is coming soon.
Overview
ID Graph cardinality rules let you define maximum connection limits between different ID types in your identity graph, providing fine-grained control over how identities are stitched together.
Cardinality rules help prevent identity explosion and maintain data quality by limiting how many connections a node of one type can have to nodes of another type.
This is particularly useful for maintaining data quality when dealing with shared identifiers like email addresses, device IDs, or anonymous IDs that might be used across multiple users.
Cardinality rules work seamlessly with other Profiles features:
Input filters reduce data volume before graph processing. The cardinality enforcement step comes after this in the pipeline, so if expected nodes are missing from your graph, input filters can be the reason.
Manual exclusion/inclusion rules provide explicit control over specific connections and happen after the cardinality enforcement step in the pipeline. If you find identity clusters that violate cardinality rules despite having the rules configured, those extra edges might be coming from manual inclusion overrides.
Key features
Data quality: Prevent identity graphs from becoming overly connected due to shared or recycled identifiers.
Performance: Maintain optimal graph size by controlling edge proliferation.
Compliance: Enforce business rules about identity relationships.
Transparency: Get a full audit trail of which edges were filtered and why.
Incremental consistency: Historical violations persist across runs for reliable behavior.
Performance considerations
Consider the following when implementing cardinality rules:
Initial runs may experience increased runtime as the system processes and enforces the new rules across your entire dataset.
Incremental runs should see minimal performance impact once the rules are established.
Trade-off evaluation: The increased runtime in initial runs delivers more precise ID stitching and better data quality. Evaluate whether the improved accuracy justifies the additional processing time for your use case.
How cardinality rules work
Directional rules
Cardinality rules are directional: they apply from a source ID type to a target ID type. For example:
user_id → email: max_edges = 2 means each user_id can connect to at most 2 email IDs.
Note that this rule does not limit how many user_id nodes an email can connect to, unless you add a separate email → user_id rule.
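To make the direction of a rule concrete, the following minimal Python sketch counts neighbors in each direction for a small graph. The edge representation and the neighbor_counts helper are hypothetical and exist only for illustration; they are not part of Profiles.

```python
from collections import defaultdict

# Hypothetical edge representation: ((id_type, id_value), (id_type, id_value)).
sample_edges = [
    (("user_id", "u1"), ("email", "a@example.com")),
    (("user_id", "u2"), ("email", "a@example.com")),
    (("user_id", "u3"), ("email", "a@example.com")),
]

def neighbor_counts(edges, source_type, target_type):
    """Count distinct target_type neighbors for each source_type node."""
    neighbors = defaultdict(set)
    for a, b in edges:
        # Edges are treated as undirected here; only the rule has a direction.
        for src, dst in ((a, b), (b, a)):
            if src[0] == source_type and dst[0] == target_type:
                neighbors[src[1]].add(dst[1])
    return {node: len(targets) for node, targets in neighbors.items()}

# Direction user_id -> email: each user_id has one email, so a limit of 2 holds.
user_to_email = neighbor_counts(sample_edges, "user_id", "email")
# Direction email -> user_id: the same edges give the email three user_ids.
email_to_user = neighbor_counts(sample_edges, "email", "user_id")
print(user_to_email)
print(email_to_user)
```

In this sketch, a user_id → email rule with a limit of 2 would pass, while the same edges would violate an email → user_id rule with a limit of 1.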
Rule enforcement
When a rule is violated:
All edges from the violating source are removed (not just the excess ones)
Nodes are preserved as isolated stubs to maintain graph integrity
Violations are audited with full details about what was removed and why
Historical context is maintained across incremental runs
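The enforcement behavior above can be sketched in Python as follows. This is an illustrative model, not the actual Profiles implementation. It assumes that a violation removes the edges between the violating source and the target type named in the rule; whether edges to other ID types are also affected is not covered here. The enforce function and the in-memory edge format are hypothetical, and the audit fields mirror the columns of the Cardinality Audit table.

```python
from collections import defaultdict

def enforce(edges, rules):
    """Apply rules; return (kept_edges, stub_nodes, audit_rows).

    edges: list of ((type, value), (type, value)) pairs, treated as undirected.
    rules: {(source_type, target_type): max_edges}
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[(a, b[0])].add(b)
        neighbors[(b, a[0])].add(a)

    # (source node, target type) -> (limit, observed count) for exceeded rules.
    violations = {}
    for (node, target_type), targets in neighbors.items():
        limit = rules.get((node[0], target_type))
        if limit is not None and len(targets) > limit:
            violations[(node, target_type)] = (limit, len(targets))

    kept, audit = [], []
    for a, b in edges:
        # Either endpoint may be the violating source, so check both orientations.
        hit = next(
            ((s, d) for s, d in ((a, b), (b, a)) if (s, d[0]) in violations),
            None,
        )
        if hit is None:
            kept.append((a, b))
            continue
        src, dst = hit
        limit, count = violations[(src, dst[0])]
        audit.append({
            "id1": src[1], "id1_type": src[0],
            "id2": dst[1], "id2_type": dst[0],
            "reason": "CARDINALITY_VIOLATION",
            "rule_details": {"max_edges": limit, "current_count": count},
        })

    # Nodes that lost every edge remain in the graph as isolated stubs.
    stubs = {n for e in edges for n in e} - {n for e in kept for n in e}
    return kept, stubs, audit

shared_email_edges = [
    (("email", "email@example.com"), ("user_id", "u1")),
    (("email", "email@example.com"), ("user_id", "u2")),
    (("email", "email@example.com"), ("user_id", "u3")),
]
kept, stubs, audit = enforce(shared_email_edges, {("email", "user_id"): 1})
print(len(kept), len(stubs), len(audit))
```

With an email → user_id limit of 1, the shared email loses all three edges rather than only the two excess ones, and each removed edge produces one audit row.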
Incremental behavior
The feature works seamlessly with incremental runs:
Historical violations persist: If a node violated a rule previously, new edges to that node type are automatically blocked.
New violations break all edges: If a new edge causes a violation, all current edges from that source are removed.
Rule removal: If you remove a rule from your configuration, the entire ID graph is rebuilt from scratch, so historical violations for that rule are no longer enforced.
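The first two incremental behaviors can be modeled with a small Python sketch. It assumes that the persisted history can be represented as a set of (source node, target type) pairs; the incremental_run function and this state format are hypothetical and only approximate what the feature does internally.

```python
from collections import defaultdict

def incremental_run(current_edges, new_edges, rules, past_violators):
    """Return (edges_after_run, updated_violators) for one incremental step.

    current_edges / new_edges: lists of ((type, value), (type, value)) pairs.
    rules: {(source_type, target_type): max_edges}
    past_violators: set of (source_node, target_type) from earlier runs.
    """
    def blocked(edge, violators):
        a, b = edge
        return (a, b[0]) in violators or (b, a[0]) in violators

    # 1. Historical violations persist: new edges touching a known violator are dropped.
    candidate = list(current_edges) + [
        e for e in new_edges if not blocked(e, past_violators)
    ]

    # 2. Count distinct neighbors per (source node, target type).
    neighbors = defaultdict(set)
    for a, b in candidate:
        neighbors[(a, b[0])].add(b)
        neighbors[(b, a[0])].add(a)

    # 3. A new violation breaks all edges from that source, including older valid ones.
    new_violators = {
        (node, target_type)
        for (node, target_type), targets in neighbors.items()
        if len(targets) > rules.get((node[0], target_type), float("inf"))
    }
    violators = past_violators | new_violators
    kept = [e for e in candidate if not blocked(e, violators)]
    return kept, violators

rules = {("user_id", "email"): 2}
u1 = ("user_id", "user_1")

# Run 1: two emails, exactly at the limit.
run1, seen = incremental_run([], [(u1, ("email", "a")), (u1, ("email", "b"))], rules, set())
# Run 2: a third email arrives and triggers a violation.
run2, seen = incremental_run(run1, [(u1, ("email", "c"))], rules, seen)
# Run 3: later edges from the same source to email stay blocked.
run3, seen = incremental_run(run2, [(u1, ("email", "d"))], rules, seen)
print(len(run1), len(run2), len(run3))
```

Run 1 keeps both edges, run 2 removes everything from user_1 to email, and run 3 shows that the recorded violation keeps blocking new email edges for that node.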
Configuration
You can add cardinality rules to your pb_project.yaml under the id_types section by giving an ID type a maximum_edges list, where each entry maps a target ID type to its limit.
The following limits ensure optimal performance for incremental runs:
Maximum edges per rule: 10 (The maximum_edges value for each rule cannot exceed 10)
Maximum target types per source: 5 (The maximum_edges list for each source ID type cannot have more than 5 entries)
The above limits only apply when using cardinality rules. Without these rules, the regular ID stitcher can connect unlimited edges, as usual.
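As a pre-flight check, the two limits above can be verified before a run. The Python sketch below assumes the id_types section has already been parsed into a list of dictionaries that mirrors the YAML structure used in this guide. The check_limits helper is hypothetical and is not part of the Profiles tooling.

```python
MAX_EDGES_PER_RULE = 10       # maximum_edges value per rule
MAX_TARGETS_PER_SOURCE = 5    # entries in the maximum_edges list per source type

def check_limits(id_types):
    """Return a list of problems; an empty list means the config is within limits."""
    problems = []
    for id_type in id_types:
        name = id_type["name"]
        rules = id_type.get("maximum_edges", [])
        if len(rules) > MAX_TARGETS_PER_SOURCE:
            problems.append(
                f"{name}: {len(rules)} target types (max {MAX_TARGETS_PER_SOURCE})"
            )
        for rule in rules:
            # Each entry is a single-key mapping such as {"email": 2}.
            for target, limit in rule.items():
                if limit > MAX_EDGES_PER_RULE:
                    problems.append(
                        f"{name} -> {target}: limit {limit} (max {MAX_EDGES_PER_RULE})"
                    )
    return problems

parsed_id_types = [
    {"name": "user_id", "maximum_edges": [{"email": 2}, {"device_id": 5}]},
    {"name": "email", "maximum_edges": [{"user_id": 12}]},  # deliberately too high
]
problems = check_limits(parsed_id_types)
print(problems)
```

Here only the email → user_id entry is reported, because its value of 12 exceeds the per-rule maximum of 10.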
Examples
This section provides examples of how to configure cardinality rules for different use cases:
id_types:
  - name: user_id
    maximum_edges:
      - email: 2       # Users can have up to 2 emails
      - device_id: 5   # Users can have up to 5 devices
  - name: email
    maximum_edges:
      - user_id: 1     # Each email belongs to only 1 user
  - name: device_id
    maximum_edges:
      - user_id: 2     # Shared devices (family tablets)

id_types:
  - name: anonymous_id
    maximum_edges:
      - email: 1       # Anonymous sessions link to 1 email
  - name: email
    maximum_edges:
      - user_id: 1     # Emails don't get shared between users
How rule application works
This section explains how cardinality rules are applied to your identity graph.
Fresh and incremental runs
Cardinality rules behave differently depending on whether you’re running a fresh or incremental run:
Fresh runs: Rules are applied to all incoming data. If a node stays within its limits, all edges are preserved.
Incremental runs: Rules consider historical violations. If a node previously violated a rule or if new data causes a violation, all edges from that source are removed, including previously valid ones.
Fresh run
# Input data
user_1 → email_a
user_1 → email_b  # Valid: within limit of 2
# Result: Both edges allowed
Incremental run
# Previous run: user_1 had 2 emails (at limit)
# New data:
user_1 → email_c  # This causes violation
# Result: ALL edges from user_1 are broken (including previous ones)
As another example, suppose email@example.com connects to 3 user_ids, violating the email → user_id: 1 rule. In that case:
All three connections are broken, not just the excess ones.
All nodes are preserved as isolated stubs.
Three audit entries are created documenting the violation.
Monitoring and debugging
This section explains how to monitor and debug cardinality rules.
Cardinality Audit table
Each violation is logged in the cardinality audit table, which resides in the same schema as the ID Stitcher output tables. The table name is derived by appending the suffix _CARDINALITY_AUDIT to the name of the ID Stitcher model and contains the following columns:
| Column | Description |
|---|---|
| run_id | ID of the run when the violation occurred |
| model_hash | Hash of the model configuration |
| id1 | Source ID value |
| id1_type | Source ID type |
| id2 | Target ID value |
| id2_type | Target ID type |
| reason | Always "CARDINALITY_VIOLATION" |
| rule_details | JSON with max_edges and current_count |
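The Python sketch below shows one way to derive the audit table name and build a summary query from the columns listed above. The schema name PROFILES and the model name USER_ID_STITCHER are hypothetical placeholders, and the helper functions are not part of Profiles. It also assumes that rule_details holds the keys max_edges and current_count exactly as written; verify this against your own audit table before relying on it.

```python
import json

AUDIT_SUFFIX = "_CARDINALITY_AUDIT"

def audit_table_name(schema, stitcher_model):
    """Append the audit suffix to the ID Stitcher model name."""
    return f"{schema}.{stitcher_model}{AUDIT_SUFFIX}"

def violations_by_rule_sql(schema, stitcher_model):
    """Build a query that counts audit rows per (source type, target type)."""
    table = audit_table_name(schema, stitcher_model)
    return (
        "SELECT id1_type, id2_type, COUNT(*) AS violations\n"
        f"FROM {table}\n"
        "GROUP BY id1_type, id2_type\n"
        "ORDER BY violations DESC"
    )

def exceeded_by(rule_details_json):
    """How far over the limit a source was, based on one rule_details value."""
    details = json.loads(rule_details_json)
    return details["current_count"] - details["max_edges"]

# Hypothetical schema and model names, for illustration only.
sql = violations_by_rule_sql("PROFILES", "USER_ID_STITCHER")
overflow = exceeded_by('{"max_edges": 1, "current_count": 3}')
print(sql)
print(overflow)
```

The generated SQL uses only standard GROUP BY and ORDER BY clauses, so it can be pasted into a Snowflake worksheet after you substitute your own schema and model names.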
Best practices
This section provides best practices for configuring the cardinality rules correctly in your Profiles project.
1. Start conservatively
Begin with lower limits and gradually increase based on your data patterns:
# Start here
id_types:
  - name: email
    maximum_edges:
      - user_id: 1

# Adjust based on audit results
id_types:
  - name: email
    maximum_edges:
      - user_id: 2   # If you see legitimate shared emails
2. Monitor audit tables
Make sure to review the Cardinality Audit table regularly to understand:
Which rules are triggering most often
Whether legitimate connections are being blocked
If limits need adjustment
3. Test with historical data
Before deploying rules to production:
Run rules against historical data in a test environment.
Analyze the audit results.
Adjust limits based on findings.
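Before running the rules themselves, it can help to estimate how many source nodes each candidate limit would affect. The Python sketch below does this for an in-memory sample of edges. The violators_by_limit helper and the sample data are hypothetical; in practice you would load edges from a copy of your own historical data.

```python
from collections import defaultdict

def violators_by_limit(edges, source_type, target_type, candidate_limits):
    """For each candidate limit, count source nodes that would exceed it."""
    targets = defaultdict(set)
    for a, b in edges:
        # Edges are treated as undirected; collect targets per source node.
        for src, dst in ((a, b), (b, a)):
            if src[0] == source_type and dst[0] == target_type:
                targets[src[1]].add(dst[1])
    return {
        limit: sum(1 for t in targets.values() if len(t) > limit)
        for limit in candidate_limits
    }

# Hypothetical sample: one email shared by 3 users, one by 2, one by 1.
history = [
    (("user_id", "u1"), ("email", "shared@example.com")),
    (("user_id", "u2"), ("email", "shared@example.com")),
    (("user_id", "u3"), ("email", "shared@example.com")),
    (("user_id", "u4"), ("email", "family@example.com")),
    (("user_id", "u5"), ("email", "family@example.com")),
    (("user_id", "u6"), ("email", "solo@example.com")),
]
impact = violators_by_limit(history, "email", "user_id", [1, 2, 3])
print(impact)
```

In this sample, a limit of 1 would break two emails, a limit of 2 would break one, and a limit of 3 would break none. A sweep like this gives a starting point for choosing limits, which you can then confirm against the audit results.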
4. Consider bidirectional rules
Make sure to consider both directions of a relationship:
# Often you want both directions
id_types:
  - name: user_id
    maximum_edges:
      - email: 2     # Users can have multiple emails
  - name: email
    maximum_edges:
      - user_id: 1   # But emails belong to one user
Troubleshooting
| Issue | Solution |
|---|---|
| Legitimate connections being blocked | Review audit table and increase limits if needed. |
| Identity graph still too connected | Add more restrictive rules or lower existing limits. |
Debugging
Check rule configuration in pb_project.yaml.
Query audit table to see what is blocked.
Review stub metadata in the final ID graph.
Compare fresh vs. incremental run results.
Migration guide
This section provides tips on managing cardinality rules in your Profiles models.
Enable rules on existing models
Add rules gradually: Start with one rule and observe its effects.
Test in development: Use a copy of production data to validate the rules.
Monitor after deployment: Watch audit tables and graph metrics.
Adjust as needed: Fine-tune limits based on real-world results.
Disable rules
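To stop enforcing a rule, remove it from the maximum_edges list of the relevant ID type in your pb_project.yaml. Removing a rule causes the entire ID graph to be rebuilt from scratch, so historical violations for that rule are no longer enforced.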