Run an audience overlap analysis in AWS Clean Rooms

Advertisers, publishers, and advertising technology providers are actively seeking efficient ways to collaborate with their partners to generate insights about their collective datasets. One common reason to engage in data collaboration is to run an audience overlap analysis, which is frequently used in media planning and when evaluating new partnerships.

In this post, we explore what an audience overlap analysis is, discuss the current technical approaches and their challenges, and illustrate how you can run secure audience overlap analysis using AWS Clean Rooms.

Audience overlap analysis

Audience overlap is the percentage of users in your audience who are also present in another dataset (calculated as the number of users present in both your audience and another dataset divided by the total number of users in your audience). In the digital media planning process, audience overlap analyses are often conducted to compare an advertiser’s first-party dataset with a media partner’s (publisher) dataset. The analysis helps determine how much of the advertiser’s audience can be reached by a given media partner. By evaluating the overlap, advertisers can determine whether a media partner provides unique reach or whether the media partner’s audience predominantly overlaps with the advertiser’s existing audience.
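The definition above can be sketched in a few lines of Python. This is only an illustration on made-up identifiers and is not part of AWS Clean Rooms; the function name overlap_percentage and the sample values are hypothetical.

```python
def overlap_percentage(your_audience, other_dataset):
    """Return the share of your audience also present in the other dataset, as a percentage."""
    your_users = set(your_audience)  # de-duplicate so each user is counted once
    if not your_users:
        raise ValueError("your_audience must contain at least one user")
    overlapping = your_users & set(other_dataset)  # users present in both datasets
    return 100.0 * len(overlapping) / len(your_users)


# Hypothetical identifiers, for illustration only.
advertiser = ["u1", "u2", "u3", "u4"]
publisher = ["u3", "u4", "u5"]

# Two of the four advertiser users also appear in the publisher dataset.
print(overlap_percentage(advertiser, publisher))
```

Note that the denominator is your own audience, so the result is not symmetric: swapping the two arguments generally gives a different percentage.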

Current approaches and challenges

Advertisers, publishers, third-party data providers, and other entities often share their data when running audience overlaps or match tests. Common methods for sharing data, such as using pixels and SFTP transfers, can carry risk because they involve moving sensitive customer information. Sharing this data with another party can be time consuming and can increase the risk of data breaches or unauthorized access. If the receiving party mishandles the data, it could violate privacy regulations, resulting in legal risks. Also, any perceived misuse or exposure of customer data can erode consumer trust, leading to reputational damage and potential loss of business.

Solution overview

AWS Clean Rooms can help you and your partners securely collaborate on and analyze your collective datasets without copying each other’s underlying data. With AWS Clean Rooms, you can create a data clean room in minutes and collaborate with your partners to generate unique insights. AWS Clean Rooms allows you to run an audience overlap analysis and generate valuable insights while avoiding the risks associated with the data-sharing approaches described earlier.

The following are key concepts and prerequisites to use AWS Clean Rooms:

  • Each party in the analysis (collaboration member) needs to have an AWS account.
  • One member invites the other member to the AWS Clean Rooms collaboration. It doesn’t matter which member creates the invitation. The collaboration creator uses the invitee’s AWS account ID as input to send invitations.
  • Only one member can query in the collaboration, and only one member can receive results from the collaboration. The abilities of each member are defined when the collaboration is created.
  • Each collaboration member stores datasets in their respective Amazon Simple Storage Service (Amazon S3) bucket and catalogs them (creates a schema with column names and data types) in the AWS Glue Data Catalog. You can also create the Data Catalog definition using the Amazon Athena create database and create table statements.
  • Collaborators need to have their S3 buckets and Data Catalog tables in the same AWS Region.
  • Collaborators can use the AWS Clean Rooms console, APIs, or AWS SDKs to set up a collaboration.
  • AWS Clean Rooms enables you to use any column as a join key, for example hashed MAIDs, emails, IP addresses, and RampIDs.
  • Each collaboration member associates their own data to the collaboration.
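Because both members must produce identical values for a join key to match, it can help to agree on how identifiers are normalized and hashed before the data is cataloged. The following is a minimal sketch, assuming SHA-256 over a trimmed, lowercased email address. The algorithm, the normalization rules, and the function name hash_email are our assumptions for illustration and are not prescribed by AWS Clean Rooms.

```python
import hashlib


def hash_email(email: str) -> str:
    """Normalize an email address and return its SHA-256 hex digest.

    Assumption: both collaborators agree to trim whitespace and lowercase
    the address before hashing, so that equal emails yield equal join keys.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Differently formatted copies of the same (hypothetical) address match after hashing.
print(hash_email("  Jane.Doe@Example.com ") == hash_email("jane.doe@example.com"))
```

If the two members normalize differently (for example, one lowercases and the other doesn't), the hashed values will not match and the overlap will be undercounted.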

Let’s look at a scenario in which an advertiser collaborates with a publisher to identify the audience overlap. In this example, the publisher creates the collaboration, invites the advertiser, and designates the advertiser as the member who can query and receive results.


To invite another person to a collaboration, you need their AWS account ID. In our use case, the publisher needs the AWS account ID of the advertiser.

Create a collaboration

In our use case, the publisher creates a collaboration using the AWS Clean Rooms console and invites the advertiser.

To create a collaboration, complete the following steps:

  1. On the AWS Clean Rooms console, choose Collaborations in the navigation pane.
  2. Choose Create collaboration.
  3. For Name, enter a name for the collaboration.
  4. In the Members section, enter the AWS account ID of the account you want to invite (in this case, the advertiser).
  5. In the Member abilities section, choose the member who can query and receive results (in this case, the advertiser).
  6. For Query logging, decide if you want query logging turned on. The queries are logged to Amazon CloudWatch.
  7. For Cryptographic computing, decide if you want to turn on support for cryptographic computing (pre-encrypt your data before associating it). AWS Clean Rooms will then run queries on the encrypted data.
  8. Choose Next.
  9. On the Configure membership page, choose if you want to create the membership and collaboration now, or create the collaboration but activate your membership later.
  10. For Query results settings defaults, choose if you want to keep the default settings to receive results.
  11. For Log storage in Amazon CloudWatch Logs, specify your log settings.
  12. Specify any tags and who is paying for queries.
  13. Choose Next.
  14. Review the configuration and choose to either create the collaboration and membership now, or just the collaboration.

The publisher sends an invitation to the advertiser. The advertiser reviews the collaboration settings and creates a membership.
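The same collaboration can also be created programmatically, because collaborators can use the APIs or AWS SDKs instead of the console. The following sketch only builds the request parameters and does not call AWS. The field names follow the CreateCollaboration API as we understand it, so verify them against the current API reference. The function name, display names, and account ID are hypothetical placeholders.

```python
def build_create_collaboration_request(name, description, creator_display_name,
                                       invitee_account_id, invitee_display_name,
                                       query_logging=True):
    """Build parameters for a CreateCollaboration call (sketch, not sent to AWS)."""
    return {
        "name": name,
        "description": description,
        "creatorDisplayName": creator_display_name,
        # The creator (publisher) neither queries nor receives results in this scenario.
        "creatorMemberAbilities": [],
        "members": [
            {
                "accountId": invitee_account_id,
                "displayName": invitee_display_name,
                # The invitee (advertiser) can query and receive results.
                "memberAbilities": ["CAN_QUERY", "CAN_RECEIVE_RESULTS"],
            }
        ],
        "queryLogStatus": "ENABLED" if query_logging else "DISABLED",
    }


request = build_create_collaboration_request(
    name="audience-overlap",                # hypothetical collaboration name
    description="Advertiser and publisher audience overlap",
    creator_display_name="publisher",
    invitee_account_id="111122223333",      # placeholder AWS account ID
    invitee_display_name="advertiser",
)
print(request["members"][0]["memberAbilities"])
```

With credentials configured, a dictionary like this could be passed to the AWS SDK for Python, for example boto3.client("cleanrooms").create_collaboration(**request).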

Create a configured table and set analysis rules

The publisher creates a configured table from the AWS Glue table (which represents the metadata definition of the S3 data, including location, so it can be read by AWS Clean Rooms when the query is run).

Complete the following steps:

  1. On the AWS Clean Rooms console, choose Configured tables in the navigation pane.
  2. Choose Configure new table.
  3. In the Choose AWS Glue table section, choose your database and table.
  4. In the Columns allowed in collaboration section, choose which of the existing table columns to allow for querying in the collaboration.
  5. In the Configured table details section, enter a name and optional description for the configured table.
  6. Choose Configure new table.
  7. Choose the analysis rule type that matches the type of queries you want to allow on the table. To allow an aggregation analysis, such as finding the size of the audience overlap, choose the aggregation analysis rule type.
  8. In the Aggregate functions section, choose COUNT DISTINCT as the aggregate function.
  9. In the Join controls section, choose whether your collaborator is required to join a table with yours. Because this is an audience overlap use case, select No, only overlap can be queried.
  10. Select the operators to allow for matching (for this example, select AND and OR).
  11. In the Dimension controls section, choose if you want to make any columns available as dimensions.
  12. In the Scalar functions section, choose if you want to limit the scalar functions allowed.
  13. Choose Next.
  14. In the Aggregation constraints section, choose the minimum aggregation constraint for the configured table.

This allows you to filter out rows that don’t meet a certain minimum threshold of users (for example, if the threshold is set to 10, rows that aggregate fewer than 10 users are filtered out).

  15. Choose Next.
  16. Review the settings and create the table.
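The console choices above correspond roughly to an aggregation analysis rule policy. The following sketch builds such a policy as a plain dictionary and does not call AWS. The structure follows the aggregation analysis rule of the AWS Clean Rooms API as we understand it, and the mapping of "only overlap can be queried" to joinRequired is our assumption. The function name and the threshold value are hypothetical.

```python
def build_aggregation_rule_policy(join_column, minimum_users):
    """Build an aggregation analysis rule policy for an overlap use case (sketch)."""
    return {
        "v1": {
            "aggregation": {
                # Allow only COUNT DISTINCT on the join key.
                "aggregateColumns": [
                    {"columnNames": [join_column], "function": "COUNT_DISTINCT"}
                ],
                "joinColumns": [join_column],
                # Assumed equivalent of "No, only overlap can be queried".
                "joinRequired": "QUERY_RUNNER",
                "allowedJoinOperators": ["AND", "OR"],
                # No dimensions or scalar functions are exposed in this example.
                "dimensionColumns": [],
                "scalarFunctions": [],
                # Rows that aggregate fewer distinct users than the minimum are filtered out.
                "outputConstraints": [
                    {
                        "columnName": join_column,
                        "minimum": minimum_users,
                        "type": "COUNT_DISTINCT",
                    }
                ],
            }
        }
    }


policy = build_aggregation_rule_policy("hashed_email", minimum_users=10)
print(policy["v1"]["aggregation"]["outputConstraints"])
```

A policy like this could be supplied as the analysisRulePolicy argument of create_configured_table_analysis_rule, with analysisRuleType set to AGGREGATION, when using the AWS SDK for Python.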

Associate the table to the collaboration

AWS Clean Rooms requires access to read the table in order to run the query submitted by the advertiser. Complete the following steps to associate the table:

  1. On the AWS Clean Rooms console, navigate to your collaboration.
  2. Choose Associate table.
  3. For Configured table name, choose the name of your configured table.
  4. In the Table association details section, enter a name and optional description for the table.
  5. In the Service access section, you can use the default settings to create an AWS Identity and Access Management (IAM) service role for AWS Clean Rooms automatically, or you can use an existing role. IAM permissions are required to create or modify the role and pass the role to AWS Clean Rooms.
  6. Choose Associate table.

The advertiser also completes the steps detailed in the preceding sections to create a configured table and associate it to the collaboration.
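For members who prefer the APIs over the console, the association step can be sketched as request parameters. The code below only builds a dictionary and does not call AWS. The field names follow the CreateConfiguredTableAssociation API as we understand it, and every identifier, the role ARN, and the function name are hypothetical placeholders.

```python
def build_table_association_request(name, membership_id, configured_table_id,
                                    role_arn, description=""):
    """Build parameters for associating a configured table to a collaboration (sketch)."""
    return {
        "name": name,
        "description": description,
        # The membership that ties this account to the collaboration.
        "membershipIdentifier": membership_id,
        "configuredTableIdentifier": configured_table_id,
        # Service role that AWS Clean Rooms assumes to read the table.
        "roleArn": role_arn,
    }


association = build_table_association_request(
    name="impressions",                                   # table name used in queries
    membership_id="example-membership-id",                # placeholder
    configured_table_id="example-configured-table-id",    # placeholder
    role_arn="arn:aws:iam::111122223333:role/example-cleanrooms-role",  # placeholder
)
print(sorted(association))
```

The name given to the association is the table name that the querying member references in SQL, which is why this example uses impressions for the publisher's table.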

Run queries in the query editor

The advertiser can now navigate to the Queries tab for the collaboration and review tables to query and their analysis rules. You can specify the S3 bucket where the output of the overlap query will go.

The advertiser can now write and run an overlap query. You can use a hashed email as a join key for the query (you have the option to use any column as the join key and can also use multiple columns for multiple join keys). You can also use the Analysis Builder no-code option to have AWS Clean Rooms generate SQL on your behalf. For our use case, we run the following queries:

-- Query 1 – count of overlapping users between advertiser and publisher datasets

SELECT COUNT(DISTINCT advertiser.hashed_email)
FROM consumer AS advertiser
INNER JOIN impressions AS publisher
ON advertiser.hashed_email = publisher.hashed_email

-- Query 2 – count of users in advertiser dataset

SELECT COUNT(DISTINCT advertiser.hashed_email)
FROM consumer AS advertiser

The query results are sent to the advertiser’s S3 bucket, as shown in the following screenshot.
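When submitting the query through the APIs instead of the query editor, the S3 destination is part of the request. The sketch below only assembles the parameters and does not call AWS. The field names follow the StartProtectedQuery API as we understand it, and the membership ID, bucket, prefix, and function name are hypothetical placeholders.

```python
OVERLAP_QUERY = """
SELECT COUNT(DISTINCT advertiser.hashed_email)
FROM consumer AS advertiser
INNER JOIN impressions AS publisher
ON advertiser.hashed_email = publisher.hashed_email
"""


def build_start_protected_query_request(membership_id, query_string, bucket, key_prefix):
    """Build parameters for a StartProtectedQuery call with CSV output to S3 (sketch)."""
    return {
        "type": "SQL",
        "membershipIdentifier": membership_id,
        "sqlParameters": {"queryString": query_string.strip()},
        "resultConfiguration": {
            "outputConfiguration": {
                "s3": {
                    "resultFormat": "CSV",
                    "bucket": bucket,         # the advertiser's results bucket
                    "keyPrefix": key_prefix,  # folder-like prefix for result files
                }
            }
        },
    }


query_request = build_start_protected_query_request(
    membership_id="example-membership-id",   # placeholder
    query_string=OVERLAP_QUERY,
    bucket="example-advertiser-results",     # placeholder bucket name
    key_prefix="overlap/",
)
print(query_request["resultConfiguration"]["outputConfiguration"]["s3"]["bucket"])
```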


Clean up

It’s a best practice to delete resources that are no longer being used. The advertiser and publisher should clean up their respective resources:

  • Advertiser – The advertiser deletes their configured table associations and collaboration membership. However, they don’t have to delete their configured table because it’s reusable across collaborations.
  • Publisher – The publisher deletes their configured table associations and the collaboration. They don’t have to delete their configured table because it’s reusable across collaborations.


In this post, we demonstrated how to set up an audience overlap collaboration using AWS Clean Rooms for media planning and partnership evaluation using a hashed email as a join key between datasets. Advertisers are increasingly turning to AWS Clean Rooms to conduct audience overlap analyses with their media partners, aiding their media investment decisions. Furthermore, audience overlaps help you accelerate your partnership evaluations by identifying the extent of overlap you share with potential partners.

To learn more about AWS Clean Rooms, watch the video Getting Started with AWS Clean Rooms.

About the Authors

Eric Saccullo is a Senior Business Development Manager for AWS Clean Rooms at Amazon Web Services. He is focused on helping customers collaborate with their partners in privacy-enhanced ways to gain insights and improve business outcomes.

Shamir Tanna is a Senior Technical Product Manager at Amazon Web Services.

Ryan Malecky is a Senior Solutions Architect at Amazon Web Services. He is focused on helping customers gain insights from their data, especially with AWS Clean Rooms.