Identity Resolution Rulesets: Design Unified Profiles That Hold Up

The Core Problem

Your customer exists as many records. CRM has john@acme.com with name “John Smith.” E-commerce has js@acme.com with name “J. Smith.” The loyalty app has member #12345 with address “123 Main St” and phone “555-0100.”

Without resolution, each system sees a fragment. Your segments are fragmented. Your activations send duplicate messages. Your agent grounds on incomplete profiles.

Identity resolution in Data Cloud is how you merge these into one Unified Individual profile. The quality of the resolution determines the quality of everything downstream.

Anatomy of a Ruleset

A ruleset has:

Matching rules — conditions under which two source records count as the same person.
Reconciliation rules — when fields disagree, which value wins.
Match confidence — deterministic vs. probabilistic, scored.

Each ruleset runs on a schedule across the selected source DMOs and produces unified profiles.

Matching Rules

Three categories.

Deterministic Matches

Exact matches on strongly identifying fields.

Same email address (normalized — lowercase, trimmed).
Same phone number (normalized — digits only, country code).
Same external ID (CRM Contact Id, loyalty member number).
Same hashed email (if sources hash for privacy).

Deterministic matches are safe. They rarely produce false positives as long as the identifiers are actually identifying.
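The normalization and exact-match steps above can be sketched in Python. This is a minimal illustration, not Data Cloud's matching engine: the record field names (`email`, `email_sha256`, `phone`, `external_id`), the default country code, and the 10-digit rule are all assumptions made for the example. It also assumes that any source sharing hashed emails normalized the address the same way before hashing with SHA-256.

```python
import hashlib
import re


def normalize_email(raw: str) -> str:
    """Lowercase and trim the address."""
    return raw.strip().lower()


def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Keep digits only; prepend a country code when one is missing.

    Assumption: a 10-digit result is a national number without a country code.
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        digits = default_country_code + digits
    return digits


def hashed_email(raw: str) -> str:
    """SHA-256 hex digest of the normalized email."""
    return hashlib.sha256(normalize_email(raw).encode("utf-8")).hexdigest()


def match_keys(record: dict) -> set:
    """Collect every strong identifier a record carries as (kind, value) pairs."""
    keys = set()
    if record.get("email"):
        # Hash plain emails too, so they can meet sources that only share hashes.
        keys.add(("email_sha256", hashed_email(record["email"])))
    if record.get("email_sha256"):
        keys.add(("email_sha256", record["email_sha256"].lower()))
    if record.get("phone"):
        keys.add(("phone", normalize_phone(record["phone"])))
    if record.get("external_id"):
        keys.add(("external_id", record["external_id"].strip()))
    return keys


def deterministic_match(a: dict, b: dict) -> bool:
    """True when the two records share at least one normalized identifier."""
    return bool(match_keys(a) & match_keys(b))
```

With this sketch, a CRM record holding " John@Acme.com " matches an e-commerce record that only carries the SHA-256 hash of "john@acme.com", while john@acme.com and js@acme.com stay separate.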

Probabilistic Matches

Fuzzy matches on combinations of fields.

Same first name + last name + postal code.
Same last name + phone area code + postal code.
Name edit distance below threshold + same address.

Probabilistic matches are where false positives happen. Two different people who share a common name and a postal code get merged when they shouldn’t.
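To make the third rule concrete, here is a small sketch of "name edit distance below threshold + same address" in Python. The field names, the threshold of 3, and the plain Levenshtein distance are assumptions chosen for illustration; the product's actual fuzzy-matching method may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling row of the DP table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def full_name(record: dict) -> str:
    """Lowercased "first last" string used for comparison."""
    return f"{record['first_name']} {record['last_name']}".strip().lower()


def fuzzy_name_match(a: dict, b: dict, threshold: int = 3) -> bool:
    """Match when the addresses are equal and the names differ by fewer than
    `threshold` single-character edits."""
    same_address = a["address"].strip().lower() == b["address"].strip().lower()
    return same_address and edit_distance(full_name(a), full_name(b)) < threshold


john = {"first_name": "John", "last_name": "Smith", "address": "123 Main St"}
jon = {"first_name": "Jon", "last_name": "Smith", "address": "123 Main St"}
joan = {"first_name": "Joan", "last_name": "Smith", "address": "123 Main St"}

print(fuzzy_name_match(john, jon))   # intended match: a spelling variant
print(fuzzy_name_match(john, joan))  # also matches: a likely false positive
```

The last line is the point of the example: John and Joan Smith at the same address are one edit apart, so the rule merges two household members into one person.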

Transitive Matches

If a rule matches record A to record B, and a rule (the same one or a different one) matches record B to record C, then A and C are treated as the same person by transitivity, even though no rule matched A to C directly.

Transitivity amplifies matches — both correct and incorrect. A single bad probabilistic match can cascade into a merged super-profile of unrelated people.
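The cascade is easy to see with a union-find (disjoint-set) structure, a common way to compute transitive clusters from pairwise matches. This is a generic sketch with made-up record ids, not a description of how Data Cloud stores its clusters.

```python
class UnionFind:
    """Minimal disjoint-set structure over hashable record ids."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        while self.parent[item] != item:
            # Path halving: point the item at its grandparent as we walk up.
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def clusters(record_ids, matched_pairs):
    """Group records into unified clusters given pairwise matches."""
    uf = UnionFind()
    for a, b in matched_pairs:
        uf.union(a, b)
    groups = {}
    for record_id in record_ids:
        groups.setdefault(uf.find(record_id), set()).add(record_id)
    return sorted(groups.values(), key=sorted)


ids = ["A", "B", "C", "D"]
print(clusters(ids, [("A", "B"), ("B", "C")]))              # A, B, C together; D alone
print(clusters(ids, [("A", "B"), ("B", "C"), ("C", "D")]))  # one bad C-D match absorbs D
```

One extra pair is enough to pull D into the same profile as A, even though nothing ever compared A with D.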

Designing a Safe Ruleset

Start With Deterministic Only

Phase 1 of every deployment should be deterministic rules only. Run it, inspect merged profiles, and verify the merges are correct.

Only add probabilistic rules after you understand the baseline.

Rank Rules by Confidence

Probabilistic rules should have explicit confidence scores. Rules scored 90+ are high-confidence (unique combinations). Rules scored 50–70 are medium. Below 50 — reconsider.

Confidence affects conflict resolution and what downstream can trust.

Use Blocking Fields

Data Cloud’s matching engine uses “blocking” to avoid comparing every record with every record. You block on a high-cardinality field (email domain, postal code) and only compare within the block.

Blocking is how ruleset evaluation stays tractable at scale. Without it, matching millions of individuals is infeasible.
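The idea behind blocking can be sketched in a few lines of Python. This is the generic technique, not Data Cloud's internal implementation; the record shape and the choice of postal code as the blocking key are assumptions for the example.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, block_key):
    """Yield only the record-id pairs that share a blocking key value."""
    blocks = defaultdict(list)
    for record in records:
        key = block_key(record)
        if key:  # records without a blocking value are never compared
            blocks[key].append(record["id"])
    for ids in blocks.values():
        yield from combinations(ids, 2)


records = [
    {"id": 1, "postal_code": "94105"},
    {"id": 2, "postal_code": "94105"},
    {"id": 3, "postal_code": "10001"},
    {"id": 4, "postal_code": "10001"},
]

blocked = list(candidate_pairs(records, lambda r: r["postal_code"]))
all_pairs = list(combinations([r["id"] for r in records], 2))
print(len(blocked), "comparisons with blocking,", len(all_pairs), "without")
```

Comparing every pair grows quadratically with record count, while blocked comparison grows with the size of each block. The trade-off is that two records with different blocking values are never compared, so a poorly chosen key can hide true matches.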

Avoid Single-Field Probabilistic Rules

“Match if last name is the same” — a disaster. Millions of people share a last name.

Always combine fields: “Match if last name + postal code + street number” is much safer.

Reconciliation Rules

When two source records match, their field values must be reconciled into one unified profile.

Strategies:

Most recent wins. The source with the newest “Last Modified” timestamp wins.
Most reliable source wins. You rank sources: “CRM beats e-commerce beats loyalty.” CRM’s value survives.
Concatenate or list. For fields like email (which can have multiple values per person), don’t pick one — keep all.
Most complete wins. The source with a non-null, non-empty value wins over sources with blanks.

Different fields deserve different strategies. Phone number is probably “most recent.” Email should be a multi-value list. Some fields need an aggregate instead of a winner: customer lifetime value might be “sum across sources.”
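A per-field strategy table might look like the following sketch. The value shape (`source`, `value`, `last_modified`), the function names, and the source names are hypothetical; the priority order is taken from the “CRM beats e-commerce beats loyalty” example. Each strategy skips blank values first, which folds "most complete wins" into the other strategies.

```python
from datetime import datetime

SOURCE_PRIORITY = ["crm", "ecommerce", "loyalty"]  # most reliable first


def populated(values):
    """Drop null and empty values before picking a winner."""
    return [v for v in values if v["value"] not in (None, "")]


def most_recent(values):
    """Value from the record with the newest last_modified timestamp."""
    candidates = populated(values)
    if not candidates:
        return None
    return max(candidates, key=lambda v: v["last_modified"])["value"]


def most_reliable(values, priority=SOURCE_PRIORITY):
    """Value from the highest-ranked source (raises ValueError on unknown sources)."""
    candidates = populated(values)
    if not candidates:
        return None
    return min(candidates, key=lambda v: priority.index(v["source"]))["value"]


def keep_all(values):
    """Distinct non-blank values in first-seen order, for multi-value fields."""
    seen = []
    for v in populated(values):
        if v["value"] not in seen:
            seen.append(v["value"])
    return seen


phones = [
    {"source": "crm", "value": "555-0100", "last_modified": datetime(2023, 1, 1)},
    {"source": "loyalty", "value": "555-0199", "last_modified": datetime(2024, 6, 1)},
    {"source": "ecommerce", "value": "", "last_modified": datetime(2024, 9, 1)},
]

print(most_recent(phones))    # the loyalty value: newest non-blank
print(most_reliable(phones))  # the CRM value: highest-ranked source
```

The same inputs produce different winners under different strategies, which is why the strategy should be chosen field by field.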

Monitoring for False Merges

False positives (merging two people who aren’t the same) are the scariest outcome. A merged profile sends a customer someone else’s emails, shows them someone else’s history, or routes their case to the wrong account team.

Monitoring:

Merge ratio. Track the ratio of source records to unified profiles. A sudden rise means more merging is happening; investigate whether new rules are too aggressive.
Unusual unified profiles. Profiles with 10+ source records, mismatched addresses, or conflicting phone numbers are suspect.
User reports. Provide a way for agents or users to flag merges they think are wrong. Review weekly.
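The first two checks lend themselves to a scheduled script. The sketch below assumes you can export unified profiles as a mapping from profile id to its source records, with phone numbers and postal codes already normalized; the function names, the 10% tolerance, and the field names are illustrative choices.

```python
def merge_ratio(source_record_count: int, unified_profile_count: int) -> float:
    """Source records per unified profile."""
    return source_record_count / unified_profile_count


def ratio_alert(previous: float, current: float, tolerance: float = 0.10) -> bool:
    """True when the ratio moved by more than `tolerance` relative to the last run,
    in either direction."""
    return abs(current - previous) / previous > tolerance


def suspect_profiles(profiles: dict, max_sources: int = 10) -> list:
    """Ids of profiles with many source records or conflicting phones or postal codes."""
    flagged = []
    for profile_id, records in profiles.items():
        phones = {r["phone"] for r in records if r.get("phone")}
        postal_codes = {r["postal_code"] for r in records if r.get("postal_code")}
        if len(records) >= max_sources or len(phones) > 1 or len(postal_codes) > 1:
            flagged.append(profile_id)
    return flagged


# 1,000,000 source records resolving to 800,000 profiles gives a ratio of 1.25.
print(merge_ratio(1_000_000, 800_000))
print(ratio_alert(previous=1.25, current=1.60))  # large jump between runs
```

A flagged profile is a candidate for review, not proof of a bad merge: one person can legitimately have two phone numbers.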

Monitoring for False Separates

The opposite problem: two sources that should have merged don’t. The customer is still fragmented.

Monitoring:

Source records with no matches. If 40% of CRM Contacts never match anything in e-commerce, your rules may be too strict — or the integration has quality issues.
Known-customer spot checks. Spot-check 50 customers known to exist in multiple systems. Are all their records unified?
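The no-match figure can be computed from the same kind of export. In this sketch, `profiles` maps a unified profile id to its source records and each record carries a `source` label; both the shape and the labels are assumptions for illustration.

```python
def unmatched_rate(profiles: dict, source: str, other_source: str) -> float:
    """Share of `source` records whose unified profile has no `other_source` record."""
    total = 0
    unmatched = 0
    for records in profiles.values():
        sources = [r["source"] for r in records]
        count = sources.count(source)
        total += count
        if other_source not in sources:
            unmatched += count
    return unmatched / total if total else 0.0


profiles = {
    "u1": [{"source": "crm"}, {"source": "ecommerce"}],
    "u2": [{"source": "crm"}],
    "u3": [{"source": "ecommerce"}],
}

print(unmatched_rate(profiles, "crm", "ecommerce"))  # half of CRM records found no partner
```

Tracking this number per source pair over time shows whether a rule change or an integration problem is behind the gap.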

Handling Source Data Quality

Bad input data makes good rules ineffective. Common issues:

Email variations. john@acme.com and JOHN@ACME.COM should match. Normalize before comparing.
Phone formatting. (555) 0100 and 5550100 should match. Normalize both to digits only, with a country code, before comparing.
Name variations. Nicknames (Jon/John), missing middle names, different orderings.
Address noise. Abbreviations (St./Street), unit numbers, typos.

Data Cloud provides some built-in normalization, but the most robust approach is to clean data upstream — in ETL — before it lands in DLOs.
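For names and addresses, an upstream cleaning step might look like the sketch below. The lookup tables are tiny illustrative samples, not complete dictionaries, and the function names are hypothetical. Real address cleanup usually needs a dedicated standardization service: for instance, "St" can mean "Street" or "Saint", which this naive token replacement cannot tell apart.

```python
import re

# Illustrative samples only; a production table would be much larger.
NICKNAMES = {"jon": "john", "johnny": "john", "bill": "william"}
STREET_ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def normalize_first_name(raw: str) -> str:
    """Lowercase, trim, and map known nicknames to a canonical form."""
    name = raw.strip().lower()
    return NICKNAMES.get(name, name)


def normalize_address(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, and expand known abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    return " ".join(STREET_ABBREVIATIONS.get(token, token) for token in tokens)


print(normalize_first_name(" Jon "))
print(normalize_address("123 Main St."))
print(normalize_address("123 MAIN STREET"))
```

Both address spellings normalize to the same string, so an exact comparison on the cleaned value can stand in for a looser fuzzy rule.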

Rule Change Management

Rulesets are metadata. They deploy, version, and roll back like any other metadata.

Discipline:

Test rule changes in sandbox with a representative dataset.
Keep a changelog — “Added last-name+zip probabilistic match; monitor for false positives.”
Don’t change rules and data volumes simultaneously. Change one thing, measure, then change another.

Operational Cadence

Weekly: review merge ratio and flagged profiles.

Monthly: sample false-positive and false-negative audits.

Quarterly: rule effectiveness review — which rules are contributing most, which are redundant.

Frequently Asked Questions

How long does a ruleset take to run?

Depends on data volume and rule count. Small orgs (tens of thousands of records): minutes. Large orgs (tens of millions): hours. Plan scheduling accordingly.

Can I roll back a merge?

Not cleanly. Unmerging a unified profile means re-running resolution without the offending rule and re-activating segments. Costly. Prevention beats cure.

Does identity resolution honor privacy opt-outs?

You configure this. Records marked as “do not sell” or “opted out of marketing” can be excluded from activation, but typically still participate in resolution for operational needs.

Is there an API to resolve ad-hoc?

Yes — the real-time identity API lets you pass a candidate record and get a unified profile Id in sub-second time. Use for web personalization, not for bulk operations.