
How Profile Merging and Audience Building Work in a DMP


Almost every data management platform (DMP) on the market allows advertisers to create audiences and use them for a range of purposes — improved online ad targeting, advanced analytics, and more.

To create those audiences, the platform must first build user profiles composed of numerous profile identifiers. Understanding how that process works — and specifically how profile merging fits into it — requires starting with the fundamentals: what audience building is, what profiles are, and what profile identifiers represent.

How Audience Building Works in a DMP

Audience building is one of the core data processes in a DMP.

Once advertisers create an audience in a DMP, they can export it to other systems — such as a demand-side platform (DSP) — for improved ad targeting.

An audience is simply a group of user profiles that share a common attribute or set of attributes.

For example, an advertiser might create an audience called "Visitors from the USA." That audience would contain all profiles carrying the attribute country = USA.

Figure: a diagram explaining how profile merging works in a data management platform (DMP).

Here's what the diagram above illustrates:

  • A new event occurs — in this case, a website visit.
  • The event carries two profile identifiers (cookie_id and click_id) and one attribute (country).
  • Those identifiers are matched to an existing profile. Any new identifier — here, the click_id — is added to that profile.
  • The profile is added to any existing audience for which it meets the conditions. In this case, it would be added to the Visitors from the USA audience because of the country = USA attribute.
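The last step, checking which audiences a profile qualifies for, can be sketched as a set of named conditions evaluated against the profile's attributes. This is only an illustration: the dictionary-based profile, the `AUDIENCES` registry, and the `matching_audiences` helper are hypothetical names, not part of any particular DMP.

```python
from typing import Callable

# Hypothetical shapes: a profile is a plain dict of attributes, and an
# audience is a name plus a condition (predicate) over that dict.
Profile = dict[str, object]
Condition = Callable[[Profile], bool]

AUDIENCES: dict[str, Condition] = {
    "Visitors from the USA": lambda p: p.get("country") == "USA",
}


def matching_audiences(profile: Profile) -> list[str]:
    """Return the names of every audience whose condition the profile meets."""
    return [name for name, condition in AUDIENCES.items() if condition(profile)]


# An existing profile receives a new click_id from the incoming event,
# then is re-evaluated against all audience conditions.
profile: Profile = {"cookie_id": "cookie-abc", "country": "USA"}
profile["click_id"] = "click-123"
print(matching_audiences(profile))
```

Re-running the evaluation whenever a profile changes keeps audience membership in step with the profile data.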

Note: Most DMPs hash personally identifiable information (PII) such as email addresses rather than storing it in plaintext. The examples below use unhashed email addresses for clarity.
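As a rough illustration of what hashing an email address can look like, the sketch below normalizes the address and applies SHA-256. Both the normalization rules and the choice of SHA-256 are assumptions made for this example; the hashing scheme a given DMP uses may differ.

```python
import hashlib


def hash_email(email: str) -> str:
    """Normalize an email address and return its SHA-256 hex digest.

    Trimming whitespace and lowercasing are assumed normalization steps so
    that the same address always produces the same hash.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Two spellings of the same (placeholder) address produce one identifier.
print(hash_email(" User@Example.com ") == hash_email("user@example.com"))
```

Because the hash is deterministic, it can still serve as a profile identifier for matching even though the plaintext address is never stored.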

Audience building rests on a few processing assumptions. The process starts with an input event (e.g., a web visit), which may contain several user identifiers. Every event generally needs at least one profile identifier before a profile, and subsequently an audience, can be created.

What Are Profiles and Profile Identifiers?

A profile is the set of data collected from events tracked by a DMP. It represents a single user and may contain the following:

  • profile id
  • cookie id (list)
  • hashed email (list)
  • sid / uuid (list)
  • country (last seen)
  • name (nullable)
  • device_type (last seen)
  • device_vendor (last seen)
  • device_os (last seen)
  • browser_vendor (last seen)
  • gender (nullable)
  • company (nullable)
  • company size (nullable)
  • matching ids (list)

This list can be extended depending on the specific use case of a given DMP. Some fields are left empty until relevant data arrives.
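One way to picture the profile described above is as a small record type. The sketch below mirrors the listed fields with a Python dataclass; the field names and types are assumptions chosen for illustration, and a real DMP would likely store profiles in a database rather than in memory.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Profile:
    profile_id: str
    # Identifier lists: a single user can accumulate several of each.
    cookie_ids: list[str] = field(default_factory=list)
    hashed_emails: list[str] = field(default_factory=list)
    sids: list[str] = field(default_factory=list)
    matching_ids: list[str] = field(default_factory=list)
    # "Last seen" attributes: overwritten as newer events arrive.
    country: Optional[str] = None
    device_type: Optional[str] = None
    device_vendor: Optional[str] = None
    device_os: Optional[str] = None
    browser_vendor: Optional[str] = None
    # Nullable attributes: left empty until relevant data arrives.
    name: Optional[str] = None
    gender: Optional[str] = None
    company: Optional[str] = None
    company_size: Optional[str] = None


new_profile = Profile(profile_id="profile-1", cookie_ids=["cookie-abc"])
print(new_profile)
```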

The general rule is straightforward: if an input event contains an unknown identifier — one not yet present in the DMP — a new profile is created. If the event contains an identifier the DMP already recognizes, the existing profile is updated with the incoming event data.
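The create-or-update rule can be sketched with an index from identifier to profile. Everything here (the in-memory `profiles` and `id_index` dictionaries and the `handle_event` function) is a hypothetical simplification; it also flags the case where an event matches more than one existing profile.

```python
profiles: dict[str, dict] = {}   # profile_id -> {"ids": set, "attrs": dict}
id_index: dict[str, str] = {}    # identifier -> profile_id


def handle_event(identifiers: list[str], attributes: dict) -> tuple[str, bool]:
    """Apply one event; return (profile_id, needs_merge)."""
    matched = sorted({id_index[i] for i in identifiers if i in id_index})
    if matched:
        # At least one identifier is already known: update that profile.
        profile_id = matched[0]
    else:
        # Every identifier is unknown: create a new profile.
        profile_id = f"profile-{len(profiles) + 1}"
        profiles[profile_id] = {"ids": set(), "attrs": {}}
    profile = profiles[profile_id]
    profile["ids"].update(identifiers)
    profile["attrs"].update(attributes)
    for identifier in identifiers:
        id_index.setdefault(identifier, profile_id)
    # Matching more than one profile means those profiles share a user.
    return profile_id, len(matched) > 1


first, _ = handle_event(["cookie-a"], {"country": "USA"})
second, _ = handle_event(["cookie-a", "click-1"], {"device_type": "mobile"})
print(first == second)
```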

After updating profiles with event data, two profiles may end up sharing a common identifier. When that happens, the DMP needs to perform what is known as profile merging.

What Is Profile Merging?

The profile-merging operation ensures there are no duplicate identifiers or attributes within a given profile, and that no two profiles carry the same unique identifiers (such as email addresses). It accomplishes this by consolidating all profiles that share a common identifier into a single profile.

Because a single user can generate events that carry different identifiers, separate events from the same user do not always appear related at first.

Consider the following three events:

Event 1: A user visits publisher.com using Firefox: {cookie_id = 7M-Q1P8-6AWG-1N3I}

Event 2: The same user subscribes to a newsletter on publisher.com using Chrome: {email = [email protected], cookie_id = eyJraWQiOiJzZXN}

Event 3: The user fills in a form on publisher.com using Firefox: {email = [email protected], cookie_id = 7M-Q1P8-6AWG-1N3I}

All three events come from the same person — but before Event 3 arrives, the system has no way of knowing that. Until then, Events 1 and 2 are treated as two entirely separate profiles.

Once Event 3 reveals the common identifiers, it becomes clear these should all belong to one profile. Leaving them as separate profiles would result in duplicate records and stale, incomplete data for a single user.
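The three events can be replayed in a few lines to show the consolidation step. This is a minimal sketch that tracks only identifier sets; `consolidate` is a hypothetical helper, and `user@example.com` is a placeholder standing in for the email address in the events above.

```python
def consolidate(events: list[set[str]]) -> list[set[str]]:
    """Group identifiers into profiles, merging profiles that share an ID."""
    profiles: list[set[str]] = []
    for ids in events:
        overlapping = [p for p in profiles if p & ids]
        # Union the event's identifiers with every profile they touch.
        merged = set(ids).union(*overlapping)
        # Drop the absorbed profiles and keep the single merged one.
        profiles = [p for p in profiles if not (p & ids)]
        profiles.append(merged)
    return profiles


event_1 = {"cookie:7M-Q1P8-6AWG-1N3I"}
event_2 = {"email:user@example.com", "cookie:eyJraWQiOiJzZXN"}
event_3 = {"email:user@example.com", "cookie:7M-Q1P8-6AWG-1N3I"}

print(len(consolidate([event_1, event_2])))           # before Event 3
print(len(consolidate([event_1, event_2, event_3])))  # after Event 3
```

Before Event 3 the identifiers fall into two profiles; once Event 3 links the email to the Firefox cookie, they collapse into one.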

At a minimum, profile merging means joining IDs and profile attributes together. Given the large volume of IDs and attributes that events can carry, only a subset of collected data may ultimately be used for audience creation.

When multiple user identifiers are found across profiles being merged, the system must also determine which identifier becomes the master ID — the single ID used to represent the merged profile and to assign future event data.

One practical approach is to construct a list of all known IDs, sort it, and use the first element as the master ID. This is the simplest method, but the right approach varies by DMP use case and business requirements.
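The sort-and-take-first approach is short enough to show directly. The function name `choose_master_id` is hypothetical, and the sketch assumes IDs are plain strings compared in Python's default (code point) order.

```python
def choose_master_id(known_ids: list[str]) -> str:
    """Pick the master ID as the first element of the sorted ID list."""
    if not known_ids:
        raise ValueError("a profile needs at least one ID")
    return sorted(set(known_ids))[0]


ids = ["eyJraWQiOiJzZXN", "7M-Q1P8-6AWG-1N3I"]
print(choose_master_id(ids))
```

The useful property is determinism: any process that sees the same set of IDs picks the same master ID, regardless of the order in which the IDs were collected.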

After profile merging, DMP taxonomies, segments, and audiences must all be regenerated to reflect the consolidated profile data. A merged profile may also qualify for segments or audiences that neither of its source profiles belonged to individually.

How to Merge Profiles Together

To carry out the profile-merging operation effectively, a merging strategy must be chosen. The challenge becomes clear when considering two profiles that both contain user-submitted information — for instance, a newsletter subscription form and a contact form filled in by the same person at different times.

Figure: a newsletter subscription form and a contact form.

The merging operation has to decide which value is the correct one when conflicts arise. There are four main approaches.

1. Overwriting Existing IDs and Attributes

The simplest approach is to overwrite all existing IDs and attributes with the new, incoming ones.

This can be implemented either by designating a master ID that stays fixed regardless of new data, or by replacing the master ID each time a new ID is collected.

2. Alphabetical Sorting

Alphabetical sorting resolves conflicts by sorting attribute values alphabetically across the profiles being merged and using the first result.

Using the example above where conflicting name values are "Ben" and "Obi-Wan," alphabetical sorting would select "Ben" as the correct value.
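A minimal sketch of alphabetical resolution follows. The helper name is hypothetical, and case-insensitive comparison is an assumption; a real system would need to settle its own collation rules.

```python
def resolve_alphabetically(values: list[str]) -> str:
    """Return the value that sorts first, ignoring letter case."""
    if not values:
        raise ValueError("no candidate values to choose from")
    return min(values, key=str.casefold)


print(resolve_alphabetically(["Obi-Wan", "Ben"]))
```

The result is deterministic, but it says nothing about which value is actually more accurate.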

3. Timestamp Sorting

Timestamp sorting selects the attribute value that carries the earliest or latest recorded timestamp, depending on configuration.

In most situations, timestamp sorting is the preferred method. Continuing the example, the event containing the name "Ben" occurred first, so a configuration that prefers the earliest value would select it over "Obi-Wan."

One important detail: timestamp sorting is determined by event time, not processing time. The time the event actually occurred takes precedence over the time it was processed by the system.
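The distinction between event time and processing time is easier to see in code. In this sketch, the `Observation` type, its field names, and the timestamps are all hypothetical; the point is only that the comparison key is the event time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    value: str
    event_time: datetime    # when the event actually occurred
    processed_at: datetime  # when the system handled it (ignored below)


def resolve_by_timestamp(observations: list[Observation], prefer: str = "latest") -> str:
    """Pick the value with the earliest or latest event time."""
    if prefer not in ("earliest", "latest"):
        raise ValueError("prefer must be 'earliest' or 'latest'")
    pick = max if prefer == "latest" else min
    return pick(observations, key=lambda o: o.event_time).value


def utc(hour: int, minute: int) -> datetime:
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


# "Ben" occurred first but was processed last, e.g. because of a delay.
ben = Observation("Ben", event_time=utc(10, 0), processed_at=utc(10, 5))
obi_wan = Observation("Obi-Wan", event_time=utc(10, 2), processed_at=utc(10, 3))

print(resolve_by_timestamp([obi_wan, ben], prefer="earliest"))
```

Sorting by `processed_at` instead would give the opposite answer here, which is why late-arriving events make the choice of timestamp matter.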

4. Wait-and-See (Multi-Value Retention)

A more complex approach retains all conflicting values until a definitive sorting method — such as timestamp sorting — can be applied. This defers the final determination until enough context is available to make a reliable decision about which value is correct.
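One possible shape for multi-value retention is a container that stores every candidate value and collapses to a single value only when asked. The `PendingAttributes` class below is a hypothetical sketch that uses event time as the eventual tie-breaker.

```python
from collections import defaultdict


class PendingAttributes:
    """Keeps every conflicting value until resolve() is called."""

    def __init__(self) -> None:
        # attribute name -> list of (event_time, value) candidates
        self._candidates: dict[str, list[tuple[float, str]]] = defaultdict(list)

    def observe(self, attribute: str, value: str, event_time: float) -> None:
        self._candidates[attribute].append((event_time, value))

    def candidates(self, attribute: str) -> list[str]:
        return [value for _, value in self._candidates.get(attribute, [])]

    def resolve(self, attribute: str, prefer_latest: bool = True) -> str:
        options = self._candidates.get(attribute)
        if not options:
            raise KeyError(f"no values recorded for {attribute!r}")
        pick = max if prefer_latest else min
        chosen = pick(options, key=lambda pair: pair[0])
        # Collapse the candidate list to the single chosen value.
        self._candidates[attribute] = [chosen]
        return chosen[1]


pending = PendingAttributes()
pending.observe("name", "Obi-Wan", event_time=200.0)
pending.observe("name", "Ben", event_time=100.0)
print(pending.candidates("name"))
```

The trade-off is storage and complexity: every conflicting value is kept until the decision is made.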

Choosing the Right Merging Strategy

Selecting a profile-merging algorithm is rarely a purely technical decision. The right choice depends on the DMP's use case, the type of data being merged, and business requirements. Each option involves trade-offs that need to be evaluated in context.

Merge Order

Another dimension to consider is the order in which merges are performed. When only two profiles need to merge, the operation is straightforward. But there will be cases where three or more profiles must be merged in a single operation.

For example, if three profiles need to be merged, the first two are merged together, and then the third is merged with the result of that first operation. The sequence matters: depending on the algorithm and attribute values involved, a different merge order can produce a different final profile.

Timestamp-ordering events before merging is one way to establish a consistent sequence, but depending on the business use case, a periodic verification service may also be needed to ensure profile merges remain accurate over time.
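The effect of merge order can be demonstrated with the overwrite strategy. In this sketch, profiles are plain dictionaries with a hypothetical `updated_at` field, and "Acme" is a made-up company value; merging the same three profiles in different orders yields different names, while sorting by timestamp first makes the result independent of arrival order.

```python
from functools import reduce
from itertools import permutations


def merge_overwrite(base: dict, incoming: dict) -> dict:
    """Overwrite strategy: values from `incoming` replace those in `base`."""
    return {**base, **incoming}


def merge_all(profiles: list[dict]) -> dict:
    """Merge pairwise, left to right, in the order given."""
    return reduce(merge_overwrite, profiles)


def merge_all_ordered(profiles: list[dict]) -> dict:
    """Sort by timestamp first so the merge sequence is always the same."""
    return merge_all(sorted(profiles, key=lambda p: p["updated_at"]))


profiles = [
    {"name": "Ben", "updated_at": 1},
    {"name": "Obi-Wan", "updated_at": 2},
    {"company": "Acme", "updated_at": 3},
]

# Collect the final name for every possible merge order.
unordered = {merge_all(list(order))["name"] for order in permutations(profiles)}
ordered = {merge_all_ordered(list(order))["name"] for order in permutations(profiles)}
print(unordered, ordered)
```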

Handling Concurrent Merging

DMP systems typically operate under very high processing demands — both in terms of speed and data volume. Concurrent profile merging addresses this by distributing processing across multiple parallel processes.

The complication is that when multiple events are being processed simultaneously, merges become significantly more difficult to coordinate.

The core problem is a race condition: if two processes each receive an event that should resolve to the same profile, but neither process yet knows about the other, both may independently create a new profile. This results in two profiles that should have been one — and at the scale a DMP operates, this scenario is practically inevitable without mitigation.

Routing Events to Dedicated Processors

The established approach to this problem is to route events to specific processors based on their identifier, using a master router that determines which processor handles which event. This way, all events belonging to the same profile are handled sequentially by one processor, eliminating the race condition.

When a new event arrives, the router checks its ID and dispatches it to the appropriate processor. A well-designed routing algorithm can also distribute load evenly across processors.

This reduces concurrent merging to a series of simpler, sequential merges — at the cost of a single point of sequential processing per profile. Even if two events from the same profile arrive simultaneously but carry different identifiers, they will both be routed to the same processor and handled one after the other.
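A routing function along these lines might look like the sketch below. It assumes the router can look up a known identifier's master ID in a shared index (the hypothetical `master_id_index`) and then maps that ID to one of a fixed number of processors with a stable hash. The processor count and all names are illustrative.

```python
import hashlib

NUM_PROCESSORS = 4

# Hypothetical lookup from any known identifier to its profile's master ID.
master_id_index: dict[str, str] = {
    "cookie-a": "master-1",
    "email-hash-x": "master-1",
}


def route(identifier: str) -> int:
    """Return the index of the processor that should handle this identifier."""
    # Unknown identifiers are routed by their own value.
    key = master_id_index.get(identifier, identifier)
    # hashlib gives a hash that is stable across processes and restarts;
    # Python's built-in hash() is randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PROCESSORS


# Two different identifiers of the same profile reach the same processor.
print(route("cookie-a") == route("email-hash-x"))
```

Hashing also spreads profiles roughly evenly across processors, although a fixed modulus means that changing the processor count reassigns most profiles.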

For this to work correctly, each processor needs access to all profiles — not just the ones it has handled directly. If each processor maintains its own profile store (e.g., a separate database), profiles must be copied between processors as needed to ensure every process can evaluate merges correctly.

Key Takeaways

A few practical conclusions from examining how profile merging works in practice:

  • Merging is a business decision as much as a technical one. The right algorithm depends on the DMP's specific use case and what the data represents — there is no universally correct approach.
  • Profiles change over time. Different merge operations, combined with the fact that data arrives at different times (real-time from web events, batch from first-party data onboarding), mean that profile composition is not static.
  • Data quality depends on merge correctness. To avoid contaminating user profiles with inaccurate information, each collected and populated attribute needs a well-justified merging rule applied consistently.