Deterministic and Probabilistic Matching: How Do They Work?

Identifying and tracking visitors across devices is one of the more persistent challenges in digital advertising — whether you're an advertiser, marketer, or publisher. There's no foolproof method for recognizing the same online user as they move from one device to another, largely because the dominant tracking mechanism — cookies — was never designed for a multi-device world.

Cookies have served desktop and laptop tracking reasonably well for years, but they can't be transferred between devices. The practical result: if a visitor hits your website on their desktop one day and their smartphone the next, most systems will record them as two distinct visitors rather than one. That's a significant blind spot for attribution, targeting, and audience analysis.

Two techniques have emerged to address this: deterministic matching and probabilistic matching. Neither is perfect, but together they represent the current state of the art for cross-device identity resolution.

What is Deterministic Matching?

Deterministic matching identifies the same user across different devices by linking user profiles that share a common, verifiable identifier.

Every device a person uses tends to accumulate its own user profile — a collection of data points tied to how that individual uses that particular device. Deterministic matching searches across these separate profiles and connects those belonging to the same physical person using a reliable shared data point.

Common identifiers used in deterministic matching include:

  • First and last name (if sufficiently uncommon)
  • Address
  • Email address
  • Date of birth
  • Phone number

How Deterministic Matching Works

In online advertising and marketing, email address is by far the most widely used deterministic identifier. Because an email address usually belongs to a single individual, it can be used to link that person's profiles across a wide range of datasets and platforms.
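
As a rough illustration of the idea, the sketch below groups device profiles by a normalized, hashed email address. The profile fields (`profile_id`, `email`), the sample data, and the choice of SHA-256 are assumptions made for this example, not a description of any particular platform.

```python
import hashlib
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivial variations still match."""
    return email.strip().lower()


def email_key(email: str) -> str:
    """Hash the normalized address so raw emails need not sit in the join table."""
    return hashlib.sha256(normalize_email(email).encode("utf-8")).hexdigest()


def link_profiles(profiles: list[dict]) -> list[list[str]]:
    """Group device profile IDs that share the same email key.

    Profiles without an email cannot be matched deterministically and are skipped.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for profile in profiles:
        email = profile.get("email")
        if email:
            groups[email_key(email)].append(profile["profile_id"])
    # Only groups spanning more than one profile represent a cross-device link.
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sample data: three device profiles, two with a login email.
profiles = [
    {"profile_id": "desktop-1", "email": "Ana@Example.com"},
    {"profile_id": "phone-7", "email": " ana@example.com "},
    {"profile_id": "tablet-3", "email": None},
]
linked = link_profiles(profiles)
print(linked)
```

The tablet profile stays unlinked because it never saw a login, which is the scale limitation in miniature.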

Applications like Facebook, Google Apps, and Twitter are well-positioned for deterministic matching because users sign in to the same account, typically tied to an email address, on every device they use.

The strength of deterministic matching is accuracy, typically cited at around 80–90%. The weakness is scale: not every application or website requires a login, and a significant portion of web browsing happens without any authenticated session, which limits how broadly deterministic matching can be applied.

To address the scale problem, publishers increasingly use two broad strategies to capture deterministic data (most commonly, an email address) from their audiences:

Encouragement. Publishers offer additional features, content, or functionality in exchange for a user providing their email address.

Restriction. Publishers gate access to content or features unless the user logs in or registers.

These approaches work reasonably well for large publishers — major news sites, for instance — but they're harder to execute for small and mid-sized publishers like niche blogs, where readers are unlikely to create an account just to read a handful of posts.

What is Probabilistic Matching?

Probabilistic matching takes a different approach: rather than relying on a confirmed shared identifier, it uses algorithms and behavioural signals to infer that two or more device profiles likely belong to the same person.

Consider a simple scenario: a person owns a smartphone, a laptop, and a tablet, and uses all three at home. All three devices share the same IP address and location, and the browsing patterns across them tend to mirror each other. That combination of signals makes it highly probable that a single user is behind all three devices.

The challenge gets more complex in shared-household scenarios. If two people in the same household each have their own smartphone, tablet, and desktop, every one of those devices will share the same IP address, Wi-Fi ID, and location. To distinguish between individuals in that situation, probabilistic algorithms need to incorporate additional signals — age, gender, browsing interests — and look for patterns that are consistent across a subset of devices but not all of them.
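
The household scenario can be sketched as a simple pairwise score. The example below is a toy, not how any production system works: the `DeviceProfile` fields, the weights, and the threshold are all assumptions chosen for illustration. It shows why a shared IP address and location alone are not enough, and how overlap in browsing interests can separate two people under one roof.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceProfile:
    device_id: str
    ip_address: str
    city: str
    interests: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two interest sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_score(x: DeviceProfile, y: DeviceProfile) -> float:
    """Combine signals into a score between 0 and 1.

    The weights are illustrative assumptions, not values from any real system.
    """
    score = 0.0
    if x.ip_address == y.ip_address:
        score += 0.3
    if x.city == y.city:
        score += 0.1
    score += 0.6 * jaccard(x.interests, y.interests)
    return score


# Hypothetical household: two people, one shared IP address and location.
alice_phone = DeviceProfile("phone-a", "203.0.113.5", "Leeds", {"cycling", "finance", "cooking"})
alice_laptop = DeviceProfile("laptop-a", "203.0.113.5", "Leeds", {"cycling", "finance", "travel"})
bob_phone = DeviceProfile("phone-b", "203.0.113.5", "Leeds", {"gaming", "football"})

THRESHOLD = 0.6  # assumed cut-off; a real system would tune this on labelled data
same_person = match_score(alice_phone, alice_laptop) >= THRESHOLD
different_people = match_score(alice_phone, bob_phone) < THRESHOLD
```

Both pairs share the IP address and city, so only the interest overlap pushes the first pair over the threshold.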

How Probabilistic Matching Works

Probabilistic matching algorithms are typically trained on a mixed dataset that includes both deterministic and probabilistic signals, usually on the order of a few hundred thousand records. The deterministic identifiers act as ground truth: the algorithm learns which combinations of probabilistic signals reliably correlate with confirmed identity matches. Once trained, the algorithm is applied to much larger datasets, potentially in the millions of records, where no deterministic identifier is present.
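
The train-then-apply pattern can be sketched with per-signal log-odds weights, in the spirit of classic Fellegi-Sunter record linkage. This is one possible technique swapped in for illustration; real vendors' algorithms are proprietary and likely differ. The signal names, the tiny training set, and the smoothing value are assumptions, and the model naively treats signals as independent.

```python
import math

SIGNALS = ("same_ip", "same_city", "similar_browsing")


def train_weights(labelled_pairs, smoothing=1.0):
    """Estimate (agreement, disagreement) log-odds weights per signal.

    labelled_pairs: iterable of (signals_dict, is_match), where is_match comes
    from a deterministic identifier such as a shared login email.
    """
    # counts[signal][is_match] = [pairs where the signal agrees, total pairs]
    counts = {s: {True: [0, 0], False: [0, 0]} for s in SIGNALS}
    for signals, is_match in labelled_pairs:
        for s in SIGNALS:
            counts[s][is_match][1] += 1
            if signals[s]:
                counts[s][is_match][0] += 1

    weights = {}
    for s in SIGNALS:
        m_agree, m_total = counts[s][True]
        u_agree, u_total = counts[s][False]
        # Laplace smoothing keeps the probabilities away from 0 and 1.
        m = (m_agree + smoothing) / (m_total + 2 * smoothing)
        u = (u_agree + smoothing) / (u_total + 2 * smoothing)
        weights[s] = (math.log(m / u), math.log((1 - m) / (1 - u)))
    return weights


def score(signals, weights):
    """Sum the agreement or disagreement weight for each signal."""
    return sum(weights[s][0] if signals[s] else weights[s][1] for s in SIGNALS)


def pair(ip, city, browsing):
    return {"same_ip": ip, "same_city": city, "similar_browsing": browsing}


# Hypothetical labelled pairs; a real training set would be vastly larger.
training = [
    (pair(True, True, True), True),
    (pair(True, True, True), True),
    (pair(True, True, False), True),
    (pair(False, True, True), True),
    (pair(True, True, False), False),
    (pair(False, True, False), False),
    (pair(False, False, False), False),
    (pair(False, False, True), False),
]
weights = train_weights(training)

# Apply the learned weights to pairs that have no deterministic identifier.
strong = score(pair(True, True, True), weights)
household = score(pair(True, True, False), weights)
weak = score(pair(False, False, False), weights)
```

Higher scores indicate stronger evidence of a match; where to draw the line between "match" and "no match" is a separate tuning decision.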

Compared to deterministic matching, probabilistic matching trades accuracy for scale. You don't need email addresses or other explicit personal data submissions; the system works from environmental and behavioural signals alone. That makes it deployable across a far larger share of the internet.

The drawbacks are real, though:

  • Lower accuracy than deterministic approaches
  • Lack of transparency: probabilistic algorithms are typically proprietary, meaning the matching logic is opaque to anyone using or being affected by it
  • Privacy exposure: regulatory bodies including the FTC in the United States and the European Union's Article 29 Data Protection Working Party (since succeeded by the European Data Protection Board) have treated data points such as IP addresses and device IDs as personally identifiable information (PII). Companies relying on probabilistic matching may face requirements to either stop collecting these signals or obtain explicit consumer consent.

Applications: What Are These Techniques Used For?

Cross-Device Attribution

Attribution has always been difficult in digital advertising, and cross-device behaviour makes it harder. When a consumer interacts with a brand across multiple devices before converting, there's no straightforward way to connect those touchpoints into a coherent journey — at least not without identity resolution.

Deterministic and probabilistic matching address this directly. By identifying the same user across their devices, advertisers can construct a more accurate picture of the customer journey, attribute conversions correctly, and make better-informed decisions about budget allocation.
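
To make the effect on attribution concrete, here is a minimal sketch that stitches touchpoints into per-person journeys once an identity map exists. The event fields, the `device_to_person` mapping, and the sample data are hypothetical; the mapping stands in for the output of whichever matching method is used.

```python
from collections import defaultdict


def build_journeys(events, device_to_person):
    """Group touchpoints by resolved person and order them by time.

    Devices missing from the identity map are treated as their own anonymous person.
    """
    journeys = defaultdict(list)
    for event in events:
        person = device_to_person.get(event["device_id"], event["device_id"])
        journeys[person].append(event)
    for touchpoints in journeys.values():
        touchpoints.sort(key=lambda e: e["timestamp"])
    return dict(journeys)


def channels_before_conversion(journey):
    """Channels touched, in order, before the first conversion in a journey."""
    channels = []
    for event in journey:
        if event["type"] == "conversion":
            return channels
        channels.append(event["channel"])
    return []  # no conversion in this journey, so nothing to attribute


# Hypothetical events: an ad click on a phone, then a search click and purchase on a desktop.
events = [
    {"device_id": "desktop-1", "timestamp": 3, "type": "conversion"},
    {"device_id": "phone-7", "timestamp": 1, "type": "click", "channel": "social"},
    {"device_id": "desktop-1", "timestamp": 2, "type": "click", "channel": "search"},
]

# Without identity resolution, each device is its own "visitor".
unstitched = build_journeys(events, {})
# With it, both devices resolve to one person.
stitched = build_journeys(events, {"desktop-1": "person-A", "phone-7": "person-A"})

path_without = channels_before_conversion(unstitched["desktop-1"])
path_with = channels_before_conversion(stitched["person-A"])
```

Without the identity map, the social click on the phone never appears in the converting journey, so any attribution model built on top would give it no credit.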

Cross-Device Tracking

Cross-device tracking — sometimes called cross-device targeting — is closely related to cross-device attribution, but the objective is different. Rather than attributing a conversion after the fact, cross-device tracking is about recognizing a user across devices in order to serve them relevant advertising across those platforms.

In practice, this means an advertiser can observe a user browsing a product on their laptop and then serve a retargeted ad to that same user on their smartphone. For an e-commerce retailer, that kind of continuity across devices can meaningfully improve conversion rates.
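
One way to picture the lookup behind retargeting is a device graph: matched device pairs are merged into groups, and a campaign asks which other devices sit in the same group. The sketch below uses a union-find structure for this. The `DeviceGraph` class and device IDs are hypothetical, and the pairs could come from either deterministic or probabilistic matching.

```python
class DeviceGraph:
    """Union-find over device IDs; each connected group is treated as one user."""

    def __init__(self):
        self._parent = {}

    def _find(self, device):
        self._parent.setdefault(device, device)
        root = device
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node straight at the root.
        while self._parent[device] != root:
            self._parent[device], device = root, self._parent[device]
        return root

    def link(self, a, b):
        """Record that devices a and b were matched to the same user."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def other_devices(self, device):
        """Devices in the same group as `device`, excluding itself."""
        if device not in self._parent:
            return []
        root = self._find(device)
        return sorted(
            d for d in list(self._parent) if d != device and self._find(d) == root
        )


graph = DeviceGraph()
graph.link("laptop-1", "phone-7")   # e.g. matched via a shared login
graph.link("phone-7", "tablet-3")   # e.g. matched via behavioural signals
graph.link("laptop-9", "phone-2")   # a different user

# A user browsed a product on laptop-1; these are the candidates for retargeting.
targets = graph.other_devices("laptop-1")
```

Because links are transitive here, one wrong probabilistic match can merge two people into a single group, so a real system would likely weigh match confidence before linking.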

Closing Thoughts

Neither deterministic nor probabilistic matching offers 100% certainty — both involve trade-offs between accuracy, scale, and privacy. As multi-device consumer behaviour continues to grow, these techniques are becoming increasingly central to how AdTech and MarTech platforms approach audience identification. Improving the accuracy and ethical grounding of both methods remains an active area of development across the industry.