Privacy Sandbox Simplified: No Code. In Simple, Plain English!
If you work in AdTech, or are even remotely connected to it, you have surely heard of the identity wars and Privacy Sandbox. Safari and Firefox started blocking 3rd party cookies long ago. Apple then restricted access to the IDFA (its advertising ID) and has now released Private Relay to mask IP addresses too. At the same time, you will have heard about the antitrust lawsuits against Google, and that Google is planning to deprecate 3rd party cookies in Chrome starting Q1 2024. Although cookies were invented in 1994 for an altogether different purpose, the online advertising industry latched on to them, and it became a multi-billion dollar industry projected to become a trillion dollar one. But this is not the time to go into the history of cookies. This is the time to deep dive into the future, and the best way to do that is to simplify Google’s Privacy Sandbox.
Google has been working on the Privacy Sandbox initiative for the last few years, renaming it multiple times along the way. Many other companies are participating in the discussion and working with Google to evolve a solution for the entire industry. Prima facie, everything looks great except one major thing (or 3 more things that I’ll mention in the last section of the article): it is excruciatingly difficult for people to understand Privacy Sandbox. Forget regular people; even seasoned AdTech experts and engineers are struggling to understand how the whole thing works. Believe me when I say this. Or actually, don’t believe me; just navigate through the hundreds of pages of documentation laced with APIs, technical jargon, code and thousands of hyperlinks, which give you the full experience of an escape room (or a labyrinth) you can’t find a way out of. So, after spending more than a month reading through everything, I thought of making an attempt to simplify it and kick off a discussion. My disclaimer still stands: I can be wrong in my understanding, because the project is still evolving and it is too cumbersome to learn everything just by reading hundreds of pages.
Anyway, let’s begin.
What is Privacy Sandbox?
As Google wants to deprecate 3rd party cookies and block advertising IDs that can identify a user (no judgements on right or wrong; let the courts and the industry decide), it needed to provide an alternative to the world. While Apple and Firefox did the same without worrying about what happens to the advertising industry, Google took the lead in building a solution. And rightly so, because if it doesn’t, its whole existence will have a question mark over it.
Privacy Sandbox is an attempt by Google to facilitate personalized (or targeted) advertising in a somewhat similar way to how it works today, while still taking the 3rd party cookie out of the system. In simple language: remove support for 3rd party cookies (data) and build cohorts (groups of similar users), so that advertising is targeted at these cohorts rather than at individuals. With this change, advertisers (and related parties like SSPs, DSPs, DMPs and similar platforms) will no longer be able to leave their trace (a 3rd party cookie) on the user’s device and look it up later to identify the user. Look at the oversimplified diagram below.
I hope the above diagram explains the key difference between a cookie (or in future, ID) based solution and Google’s Privacy Sandbox.
Let’s deep dive a bit more, and I’ll make sure not to use any references to code, APIs and other technical stuff. Privacy Sandbox is a collection of various modules, but the three key modules or components are:
- Topics: Pre-defined cohorts published by Google, with Chrome assigning users to cohorts as and when they browse the internet. As of today, there are approximately 500 predefined cohorts (oops!! Topics), and there are multiple ongoing discussions to increase the number to 1500 or so to match the IAB spec for seller defined audiences.
- Protected Audiences: Again, these are cohorts, except that they are defined by publishers, retailers and ad platforms. The key point is that these cohorts are still managed by Chrome. There is an ongoing discussion about moving this server side (trusted server etc.), but more on that later.
- Attribution & Reporting: Everyone involved in advertising wants reports and wants to attribute conversions (a sale etc.) to ads. Chrome manages all attribution and reporting, and ad platforms (or other interested parties) can get this data from it.
There are a bunch of other modules, but to understand Privacy Sandbox, these 3 are the most important ones. I’d like to steer away from the rest of the confusion and complexity at this stage.
Topics (umm… Cohorts)
Think of zodiac signs. A long long time ago (IDK when; too lazy to google), someone decided to divide human beings into 12 zodiac signs. So, 7 billion people on earth are now classified into 12 categories, on the assumption that people in the same category share similar traits. So, if there are 400 million Scorpios (human beings, not the arachnids) in the world, they must all have similar traits. What Google decided was that 12 categories are too few for 7 billion people, so let’s create 500+ categories (and maybe 1500+ in the future). And if these 7 billion people can be put into 500 categories (of course with permutations and combinations), then the privacy problem goes away and targeted advertising will work as magically as it works today.
Jokes apart!! Google wanted to address the privacy problem by deprecating 3rd party cookies (and rightly so!). So, as explained in the above diagram, they introduced Topics as a set of APIs where ad platforms can register their interest in any topic, and Chrome can then allow the auction to happen for any ad slot if the publisher allows it. Each topic has an identifier called a topic ID. The list of topics is pre-defined and published so that all interested parties can follow this standardized taxonomy and interact based on the topic ID. Typical steps would look something like this.
- A user browsing a website gets allocated to a pre-defined topic by Chrome.
- Chrome refreshes a user’s topic list every week (Chrome calls each such week an epoch).
- When an ad spot occurs on a publisher website (like Yahoo Finance), a bid request is sent with a topic ID.
- Ad platforms that are interested in that topic recognise the topic ID and decide whether they want to bid or not.
- Google Chrome receives the bids from the interested ad platforms.
- Google Chrome runs an auction on the user device itself (there are mechanisms to plug in auction algos etc., but that discussion is for another time).
- Based on the auction algo, Chrome fetches the winning ad creative and gives it to the publisher website.
- The publisher shows the ad to the user.
The steps are pretty simple, like the usual bidding process. The first key difference is that ad platforms no longer know about the individual user, as there is no 3rd party cookie, so they need to bid based on their understanding of the cohort. The second key difference is that the whole process is managed on the user device (a.k.a. Google Chrome).
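For anyone who does prefer to see it in code, here is a toy Python sketch of the steps above. It only models the flow as described in this article; it is not the real Topics API. The topic IDs, platform names, prices and the "highest bid wins" rule are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical topic IDs; the real taxonomy is the list published by Google.
TOPICS = {101: "Personal Finance", 202: "Running Shoes", 303: "Travel"}


@dataclass
class Bid:
    platform: str
    price: float
    creative: str


class AdPlatform:
    """An ad platform that only knows which topics it cares about, not who the user is."""

    def __init__(self, name: str, topic_prices: Dict[int, float]):
        self.name = name
        self.topic_prices = topic_prices  # topic ID -> price it is willing to bid

    def bid(self, topic_id: int) -> Optional[Bid]:
        price = self.topic_prices.get(topic_id)
        if price is None:
            return None  # not interested in this topic, so no bid
        return Bid(self.name, price, f"{self.name}-creative-for-{topic_id}")


def run_on_device_auction(topic_id: int, platforms: List[AdPlatform]) -> Optional[Bid]:
    """Collect bids for one ad slot and pick a winner (assumed rule: highest price)."""
    bids = [b for p in platforms if (b := p.bid(topic_id)) is not None]
    return max(bids, key=lambda b: b.price, default=None)


platforms = [
    AdPlatform("dsp_a", {101: 2.5, 303: 1.0}),
    AdPlatform("dsp_b", {101: 3.0}),
    AdPlatform("dsp_c", {202: 4.0}),
]

# The bid request carries only a topic ID, never a user ID.
winner = run_on_device_auction(101, platforms)
print(winner)
```

Notice that nothing in the bid request identifies the user: the ad platforms only ever see the topic ID.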
That’s it about Topics in a simple manner. And everything else is just the engineering documentation to understand the integration details.
Protected Audiences (umm… platform defined cohorts)
The key drawback of Topics is that these cohorts work like zodiac signs, as they are pre-determined by Google. What happens if a publisher or ad platform wants a different set of cohorts because they feel the current taxonomy is limiting and not doing justice to their business? For this, Google created the Protected Audiences module. For lack of a better term, I call them platform defined cohorts, as they can be defined by any player in the online advertising ecosystem. Let’s look at the following diagram.
There are a few key differences to note in Protected Audiences compared to Topics.
- Topics is a set of pre-defined cohorts (a standardized taxonomy), whereas Protected Audiences are cohorts (or audience segments, in legacy terms) owned by ad platforms and managed by Chrome on the device. As I mentioned earlier, there is a discussion on moving this server side, but let’s keep that out of scope for this article.
- Any ad platform can register interest in Topics and place bids. But as protected audiences are owned by ad platforms, the owners can either use them only for campaigns running on their own platform or permit others to use them, depending on the terms of usage of the protected audience.
- The most important difference is that a protected audience can leverage first party data and can be a lot more accurate (or relevant) than topics.
If both Topics and Protected Audiences are available, Chrome will make both of them available on the bid stream (of course, with the required permissions).
Typical steps to build and use protected audiences can look like this:
- The ad platform (or publisher) informs Chrome that it wants to create a protected audience. Typically, these audiences will be created based on first party data.
- Chrome registers the owner and the rules, defined by the ad platform, for adding users to the protected audience.
- If a user falls within the protected audience’s rule set, Chrome adds that user to that audience segment.
- When an ad spot occurs on a publisher website (like Yahoo Finance), Chrome decides whether the bid request needs to contain the protected audience signal or not, based on the ad platform’s partnership with the publisher.
- The ad platform gets a bid request containing the protected audience signal and decides whether to place a bid.
- The rest of the auction process remains the same: Google Chrome receives the bids, runs the auction algo, fetches the creative and gives it to the publisher to render.
Protected Audiences is a key module: ad platforms and publishers can create their own audiences while retaining the rights to their own data, and they can sell these audiences to advertisers and generate data revenue as they do today.
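Again, for readers who like to think in code, the steps above can be sketched as a toy Python model. This is not the real Protected Audience API. The `Browser` class, the rule function, the `partners` set and all the names in it are hypothetical stand-ins for "Chrome keeps the membership on the device and only exposes the signal where the owner has a partnership".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class ProtectedAudience:
    owner: str                      # ad platform (or publisher) that owns the audience
    name: str
    rule: Callable[[Dict], bool]    # decides whether a browsing event qualifies the user
    partners: Set[str]              # publishers where this signal may be used


class Browser:
    """Stands in for Chrome: holds audiences and memberships on the user's device."""

    def __init__(self) -> None:
        self.audiences: List[ProtectedAudience] = []
        self.memberships: Set[str] = set()

    def register(self, audience: ProtectedAudience) -> None:
        # Steps 1 and 2: the owner asks the browser to create the audience with its rules.
        self.audiences.append(audience)

    def observe(self, event: Dict) -> None:
        # Step 3: if the user's activity matches the rule, add them to the audience.
        for a in self.audiences:
            if a.rule(event):
                self.memberships.add(f"{a.owner}/{a.name}")

    def signals_for(self, publisher: str) -> List[str]:
        # Step 4: only include the signal when the owner partners with this publisher.
        return sorted(
            f"{a.owner}/{a.name}"
            for a in self.audiences
            if f"{a.owner}/{a.name}" in self.memberships and publisher in a.partners
        )


browser = Browser()
browser.register(ProtectedAudience(
    owner="shop_dsp",
    name="cart_abandoners",
    rule=lambda e: e.get("action") == "add_to_cart",   # built on first party data
    partners={"news.example"},
))
browser.observe({"site": "shop.example", "action": "add_to_cart"})

print(browser.signals_for("news.example"))   # signal is shared with a partner publisher
print(browser.signals_for("other.example"))  # no partnership, so no signal
```

From here on, the bids and the auction work the same way as in the Topics flow.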
Attribution and Reporting
This is a module that may need a significant upgrade. One of the biggest gaps in the online advertising industry is measuring attribution accurately and providing reports that are as granular as possible. With Privacy Sandbox, this becomes a lot more probabilistic than deterministic. Basically, Chrome classifies user actions into two types:
- Source: When a user views an ad or clicks on an ad, it is classified as a source.
- Trigger: When a user takes any action after the view or click, that action or event is called a trigger. For example: adding a product to the cart, purchasing a product, viewing a product etc.
With the help of this module, Chrome can provide two levels of reporting:
- Event Level: A detailed event log will be sent by Chrome if the events fall within the specified attribution window. For example: if a user views an ad, clicks on it after a few seconds, then adds a product to the cart and purchases it, a bunch of event logs will be sent to the registered ad platform.
- Aggregate Level: Chrome will aggregate all the data and create a report of that aggregated data. I don’t know why any ad platform would be interested in aggregate data unless they are licensing data to others and building controls over it.
There is nothing magical in the attribution and reporting module. But there are quite a few constraints as of today.
- The attribution window is supposedly a 3 day window. I am not sure if it’s configurable or dynamic based on publisher or ad platform needs. What it means is that if a user views or clicks on an ad today to browse the product but makes the purchase on the 4th day, then the purchase event will not be attributed to the ad provider.
- In the event level reports, only the last 3 triggers per source will be maintained. So, if a product sales lifecycle had 6 events (example: view product, add to cart, remove from cart, change SKU by changing color or size, add to cart, purchase), only the last 3 will be sent back to the ad platform, and that too only if all of them happened within the 3 day attribution window.
- Some noise, although at a low level (sub 1%), will be added to the attribution data. In simple language, some fake data will be mixed into the actual data.
- Reporting windows are set to 2 days, 7 days or 30 days.
- Custom parameters are not supported in event level reports. For example: if you want to see the product info, price etc. in the event log, you cannot get that data back.
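To make the first two constraints concrete, here is a small Python sketch. It uses the numbers as I have described them above (a 3 day window, last 3 triggers per source), which are my understanding and may not match the actual API. The function name and the sample events are made up for illustration, and the noise is left out so that the result is predictable.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Assumptions taken from the constraints described in this article.
ATTRIBUTION_WINDOW = timedelta(days=3)
MAX_TRIGGERS_PER_SOURCE = 3


def event_level_report(source_time: datetime,
                       triggers: List[Tuple[datetime, str]]) -> List[str]:
    """Return the trigger names that would reach the ad platform for one source."""
    deadline = source_time + ATTRIBUTION_WINDOW
    # Keep only triggers that happened after the source and inside the window.
    in_window = [t for t in sorted(triggers, key=lambda t: t[0])
                 if source_time <= t[0] <= deadline]
    # Of those, only the last few are kept.
    return [name for _, name in in_window[-MAX_TRIGGERS_PER_SOURCE:]]


click = datetime(2024, 1, 1, 10, 0)  # the source: user clicks an ad
triggers = [
    (click + timedelta(hours=1), "view_product"),
    (click + timedelta(hours=2), "add_to_cart"),
    (click + timedelta(days=1), "remove_from_cart"),
    (click + timedelta(days=2), "add_to_cart"),
    (click + timedelta(days=4), "purchase"),  # 4th day: outside the window
]

report = event_level_report(click, triggers)
print(report)
```

In this example the purchase never reaches the ad platform, because it happened on the 4th day, and the first product view is dropped because only the last 3 triggers are kept.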
My take on Privacy Sandbox (Questions I am still looking for answers to)
I read through a lot of the Privacy Sandbox documentation published by Google and other industry players. Generally, I am always in awe of Google products. Whether it’s Gmail, Android OS, Drive, Docs, Play Store, Google Pay or, of course, Search, Google has been great at building products and making them simple for users and developers alike. But with Privacy Sandbox, I felt a bit different. So, I am listing out a bunch of questions that are still open for me. If you know the answers, feel free to message me on Twitter or LinkedIn, or just comment on this article.
- More of a request to Google: Can the Privacy Sandbox team create human understandable documentation or a workflow for Privacy Sandbox? I mean, create a simple non-tech version first, before you publish the links to GitHub.
- Chrome and Android seem to be the centrepiece of Privacy Sandbox. How does that solve the industry problem? There are still a significant number of users on iPhones, Safari and Firefox. So, how can Privacy Sandbox address the Apple ecosystem challenges?
- Even within the Google ecosystem, if Chrome retains control, isn’t the whole system biased towards Google? I mean, how will Google make sure that Google services (DV360, Performance Max, AdSense or Google properties) are not getting more benefits than open internet players?
- I could not find “user consent” discussed much in any of the workflows. Does Privacy Sandbox assume consent is not necessary anymore, as individual targeting is no longer happening?
- While there are discussions about cohorts being server side (trusted server stuff), why didn’t Google take that approach from the beginning, instead of investing a huge deal in letting Chrome control it?
- Why can’t Google hide all the API complexity and just release a bunch of SDKs (whether client side or server side) to make integration easy for the ecosystem players?
I strongly feel that Privacy Sandbox is a step in the right direction, but I think the odds are stacked in favour of Google. On top of that, there is another discussion going on about ID (email/phone number) based identity and privacy, which conflicts with Privacy Sandbox. My opinion is that most ecosystem players will need integrations that support both ID and ID-less environments. Any company that takes only one approach might lose in the long term, unless one of the strategies itself fizzles out.
This was my first attempt to simplify the Privacy Sandbox topic. Next, I am on to a second version that dives a bit deeper. Or shall I make an attempt to build a service around it 🙂
By the way, if you like the article, have disagreements or would like to highlight any errors, please leave a comment. You can follow me on Twitter or LinkedIn too for further discussions.