A collaborative method to provide ad explanations
Our findings on the incompleteness of Facebook ad explanations, and on the existence of many advertising practices on Facebook that require auditing, motivate us to design a system that provides ad explanations to users independently of the advertising platform.
We develop and test a method that infers the targeting formula of an ad collaboratively, namely by looking at the common characteristics of the monitored users who received the ad. We base our method on the intuition that users who received the same ad have something in common that distinguishes them from the users who did not receive it. Our methodology uses only information about the users we monitor, together with estimated audience sizes of targeting formulas across all Facebook users. We demonstrate the feasibility of our method through a series of controlled experiments in which we target users that we monitor with our browser extension, and then try to infer the targeting formula from the users who received the ad. In total, we test our method with 34 experiments targeted at locations in Brazil and France, and 32 experiments targeted at the users we monitor by uploading lists with their PII (custom audiences). In all experiments, we used targeting formulas of the form T = a_j ∧ a_k, where a_j and a_k are two attributes that a user must both satisfy to receive the ad, and then tried to infer these formulas. Our analysis shows that our method accurately predicts the full targeting formula for 44% of the experiments launched with custom audiences, and predicts at least one of the attributes of the targeting formula for 21% of the experiments targeted at specific locations. Additionally, our results indicate that our method works better at predicting formulas that are shared by fewer users across Facebook, and can therefore pose a higher privacy risk for those users. To our knowledge, this is the first study of a collaborative method that can be used to infer the exact targeting formulas of advertisers.
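To make the intuition concrete, here is a minimal Python sketch of one collaborative inference step. It is not the formulation developed in Chapter 7. It assumes a hypothetical scoring rule: each candidate formula a_j ∧ a_k, built from attributes of the users who received the ad, is scored by how well it separates receivers from non-receivers among the monitored users (F1 score), and ties are broken in favor of the formula with the smaller estimated audience size across Facebook. The function names (`f1_score`, `rank_formulas`, `evaluate`) and the input dictionaries are illustrative assumptions. `evaluate` mirrors the two success criteria used in the evaluation: recovering the exact formula, or recovering at least one of its attributes.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# An unordered conjunction {a_j, a_k}: a user satisfies it if they have both.
Formula = FrozenSet[str]


def f1_score(formula: Formula, users: Dict[str, Set[str]], receivers: Set[str]) -> float:
    """F1 of 'user satisfies the formula' as a predictor of 'user received the ad'."""
    matched = {u for u, attrs in users.items() if formula <= attrs}
    hits = len(matched & receivers)
    if hits == 0:
        return 0.0
    precision = hits / len(matched)
    recall = hits / len(receivers)
    return 2 * precision * recall / (precision + recall)


def rank_formulas(
    users: Dict[str, Set[str]],
    receivers: Set[str],
    audience_size: Dict[Formula, int],
) -> List[Formula]:
    """Rank candidate formulas a_j AND a_k built from the receivers' attributes.

    Higher F1 comes first; ties go to the formula with the smaller estimated
    platform-wide audience, and formulas without an estimate rank last.
    """
    candidate_attrs = set().union(*(users[u] for u in receivers))
    candidates = [frozenset(pair) for pair in combinations(sorted(candidate_attrs), 2)]
    return sorted(
        candidates,
        key=lambda f: (-f1_score(f, users, receivers), audience_size.get(f, float("inf"))),
    )


def evaluate(predicted: Formula, true: Formula) -> str:
    """'exact' if both attributes match, 'partial' if at least one does, else 'miss'."""
    if predicted == true:
        return "exact"
    return "partial" if predicted & true else "miss"
```

The audience-size tie-break is one simple way to encode the observation that rarer formulas are easier to single out: if few Facebook users satisfy a formula, a match shared by the receivers is less likely to be a coincidence.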
AdAnalyst: a tool to help users understand their ads
Besides our scientific contributions, we offer the community AdAnalyst, a tool that we designed and developed to help users make sense of the ads they receive on Facebook. AdAnalyst is a browser extension for Google Chrome and Mozilla Firefox.
AdAnalyst collects the ads users receive as they browse their Facebook feed, the explanations about the targeting of each ad from the “Why am I seeing this?” button, and information from their Ad Preferences page. It combines this information with data from other sources, such as the Facebook advertising interface, advertisers’ Facebook pages, and the Google Maps API, to present users with several aggregated statistics about their targeting: a timeline of when Facebook inferred each attribute about them, what kinds of advertisers target them, what ads these advertisers send them, and what attributes they use to target them. In addition, AdAnalyst functions as a collaborative tool that uses information collected across users. We hope that AdAnalyst helps users protect themselves from dishonest practices and gain a better understanding of the ads they receive. The AdAnalyst extension can be downloaded and run from https://adanalyst.mpi-sws.org.
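The aggregation step described above can be sketched as follows. This is a hedged illustration under assumed inputs: the `CollectedAd` record, its fields, and the mapping from each Ad Preferences entry to a date are hypothetical and do not reflect AdAnalyst's actual data model. Parsing the explanations and the Ad Preferences page is omitted.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class CollectedAd:
    """Hypothetical record for one ad seen in the feed (not AdAnalyst's real schema)."""
    ad_id: str
    advertiser: str
    # Targeting attributes parsed from the "Why am I seeing this?" explanation.
    explained_attributes: List[str] = field(default_factory=list)


def attributes_by_advertiser(ads: List[CollectedAd]) -> Dict[str, Counter]:
    """Count, per advertiser, how often each attribute appears in explanations.

    Repeated sightings of the same ad are counted only once.
    """
    seen = set()
    stats: Dict[str, Counter] = defaultdict(Counter)
    for ad in ads:
        if ad.ad_id in seen:
            continue
        seen.add(ad.ad_id)
        stats[ad.advertiser].update(ad.explained_attributes)
    return dict(stats)


def inference_timeline(preferences: Dict[str, date]) -> List[Tuple[date, str]]:
    """Order Ad Preferences entries chronologically by their associated date."""
    return sorted((added, attr) for attr, added in preferences.items())
```

Deduplicating by ad identifier keeps an advertiser that shows the same ad many times from dominating the per-attribute counts.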
To date, 236 users have installed AdAnalyst and provided us with 133.5K unique ads. Furthermore, a second version of AdAnalyst, tailored for Brazilian audiences, has been disseminated as part of a project to provide transparency about political campaigns in the 2018 Brazilian elections. These two versions of AdAnalyst not only increase transparency for users, but have also provided us with data from real users, which enabled the studies in this thesis without relying on simulations or on fake accounts to collect data.
Using ad interfaces for demographic studies
In parallel, the many options of the Facebook advertising interface have given researchers the opportunity to exploit it for demographic studies. A growing number of recent studies use the Facebook advertising interface and its accompanying API to extract behavioral and demographic patterns of user populations from Facebook delivery estimates. This approach has been applied to many different problems. Araujo et al. used it to monitor lifestyle diseases. Garcia et al. used it to study worldwide gender inequality by calculating the gender divide on Facebook and associating it with other types of gender inequality, such as economic, health, or educational inequality. Similarly, Fatehkia et al. found correlations between the gender gap on Facebook and the internet and mobile phone gender gaps. Finally, two studies [85, 161] used Facebook’s advertising interface to study the movement of migrants, Ribeiro et al. used it to infer the political leaning of news outlets at large scale, and Fatehkia et al. used Facebook interest delivery estimates to improve models that predict crime rates.
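As a hedged illustration of how delivery estimates can be turned into a demographic indicator, the sketch below computes the female-to-male ratio of estimated audience sizes for a region and normalizes it by the corresponding ratio in the general population. This simplified indicator is chosen for illustration and is not necessarily the exact index defined by Garcia et al. or Fatehkia et al. Fetching the estimates from the advertising API is deliberately left out, and the `"female"`/`"male"` dictionary keys are assumptions.

```python
from typing import Dict


def gender_ratio(estimates: Dict[str, int]) -> float:
    """Female-to-male ratio of estimated audience sizes for one region.

    `estimates` maps "female" and "male" to counts, e.g. delivery estimates
    obtained from the advertising interface (how they are fetched is out of scope).
    """
    male = estimates["male"]
    if male <= 0:
        raise ValueError("male estimate must be positive")
    return estimates["female"] / male


def normalized_gap(platform: Dict[str, int], population: Dict[str, int]) -> float:
    """Platform ratio divided by the population ratio.

    Values below 1 suggest that women are under-represented on the platform
    relative to the population; values above 1 suggest the opposite.
    """
    return gender_ratio(platform) / gender_ratio(population)
```

Normalizing by the population ratio separates a platform-specific gap from demographic imbalances that already exist in the region.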
Studies of ad transparency mechanisms
Advertising platforms have started to provide users with privacy controls and transparency mechanisms that show users what data the platforms have inferred about them and why they received a particular ad; some even provide the public with Political Ad Archives [7, 8, 39]. However, transparency mechanisms in general pose a big challenge for the research community: ad transparency mechanisms are built by the advertising platforms themselves, so they require auditing to ensure that they deliver what they promise to the public. Consequently, researchers have tried to audit these mechanisms. Such studies usually look into two dimensions. First, they check whether transparency mechanisms show users all the attributes that the platform has inferred about them and that can be used to target them. Second, they study the attributes present in these mechanisms, trying to understand how they are inferred, how sensitive they are, and how accurate they are. We review such studies in Section 2.2.1. Note that these studies, unlike our work, do not focus on ad explanations, but only on Ad Preference Managers (APMs). Finally, while these studies focus more on the quality of the information in the APMs, other researchers have looked at the effect of explanations and ad transparency mechanisms on people. We review such studies in Section 2.2.2.
Auditing Ad Preference Managers
Several studies have looked into whether APMs show users the inferences that can be used to target them [82, 157, 159]. The works of Wills et al. and Datta et al. suggested that the information provided in the Google Ad Settings page might not be complete, as they found cases of targeted ads related to information that was not shown in the respective Ad Settings. Similarly, Facebook’s Ad Preferences page fell under scrutiny after ProPublica pointed out that Facebook did not show users the data broker attributes it had collected about them. Following our work, in which we look in more depth at such attributes, as well as at other attributes that Facebook has inferred about users but does not show them (Chapter 5), data broker attributes were studied further by Venkatadri et al. In their study, they report that more than 90% of Facebook accounts in the US are linked to some kind of data broker information. Additionally, they use a methodology they devised in prior work to reveal to 183 workers the data broker attributes that Facebook has inferred about them but does not reveal to them. Their methodology relies on targeting the workers with ads that use these attributes and observing which ads reach them and which do not. They find that 40% of the workers rate attributes inferred about them as “Not at all accurate”, even for attributes of a financial nature, raising further questions about the tradeoffs between the privacy costs for users and the utility of such inferences.
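The delivery-based probing described above can be sketched as follows. This is a simplified illustration, not Venkatadri et al.'s implementation: `ad_was_delivered` is a hypothetical callback standing in for running a real campaign against a user's PII-based custom audience and observing their feed, and the control ad (no attribute) is an added design assumption. Delivery is positive evidence that the attribute is present, whereas non-delivery is only weak evidence of absence, since ads can fail to reach a user for other reasons such as budget or auction competition; hence the label "not observed" rather than "absent".

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical callback: launch an ad whose audience is the user's PII-based
# custom audience, optionally narrowed to one attribute (None = no attribute),
# and report whether the user actually saw it.
DeliveryProbe = Callable[[str, Optional[str]], bool]


def probe_attributes(
    user_id: str, attributes: Iterable[str], ad_was_delivered: DeliveryProbe
) -> Dict[str, str]:
    """Classify each attribute as 'present', 'not observed', or 'inconclusive'."""
    attributes = list(attributes)
    # Control ad: if even the unrestricted ad does not reach the user,
    # a missing attribute-restricted ad tells us nothing.
    if not ad_was_delivered(user_id, None):
        return {attr: "inconclusive" for attr in attributes}
    return {
        attr: "present" if ad_was_delivered(user_id, attr) else "not observed"
        for attr in attributes
    }
```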
While missing attributes in the APMs are an important issue, the investigation of the attributes that are present in the APMs has also raised concerns. Cabañas et al. analyzed 126K interests from the Facebook Ad Preferences pages of more than 6K users and used the Facebook Ads API to show that Facebook has inferred sensitive interests for 73% of EU users. This is particularly worrisome since several studies have raised doubts about the accuracy of platform inferences and concerns about over-profiling. Degeling et al. examined how browsing behavior affects the interests inferred by Oracle’s BlueKai, and found that the inference process is very sensitive to noise: even identical browsing behaviors trigger the inference of different interests. Additionally, Bashir et al. examined the APMs of Google (Google Ad Settings), Facebook (Ad Preferences page), Oracle BlueKai, and Nielsen eXelate for 220 users, and found that recent browsing history cannot sufficiently explain the interest inferences of Facebook, BlueKai, and eXelate (<9%), and that even in the case of Google only 45% of them could be explained. In the same study, they also point out that Facebook infers significantly more interests than the other services, and they reveal that users were interested in only 27% of the interests in their profiles, a result also reaffirmed by a recent report from the Pew Research Center, which found that 27% of users found the information revealed to them by Facebook inaccurate. Similarly, Galán et al., in a study of 5K users, find that only 23% of the interests that Facebook infers for users are actually related to the ads they receive.
Effect of ad transparency mechanisms
Explanations lie at the heart of ad transparency mechanisms. As pointed out by Lipton and Ribeiro et al., one of the main purposes of explanations is to bring trust to a platform. However, this does not mean that all explanations are well intended. Weller warns that platforms can, for their own benefit, use explanations that are not useful to users in order to manipulate them into trusting the system. For example, if explanations offer no insightful or actionable information to consumers, their only purpose may be to gain consumer acceptance. This idea is not new and predates online advertising. For example, the “Copy Machine” study shows that useless explanations that did not provide any actual information were almost as successful in gaining trust as meaningful explanations. Our study shows the different ways in which explanations offered by Facebook fail to provide adequate information to end users or, worse, provide them with misleading information.
In addition to the studies on explanations and their potential undesirable effects, there are studies on the impact of ad transparency mechanisms and privacy controls on user behavior: Tucker showed that after the introduction of privacy controls on Facebook, users were twice as likely to click on personalized ads, and Eslami et al. found that users prefer interpretable, non-creepy explanations.
Studies of online ads
In this section we review studies that look at the final aim of the advertising process, namely the ads that users receive. First, in Section 2.3.1 we look at studies of web ads, then in Section 2.3.2 we discuss studies on mobile ads, and finally in Section 2.3.3 we look at studies on Facebook ads.
Table of contents
1.1 Auditing of ad transparency mechanisms
1.2 Measuring the Facebook advertising ecosystem
1.3 A collaborative method to provide ad explanations
1.4 AdAnalyst: a tool to help users understand their ads
1.5 Other works
1.6 Organization of thesis
2 State of the Art
2.1 Studies of advertising interfaces
2.1.1 Vulnerabilities of advertising interfaces
2.1.2 Using ad interfaces for demographic studies
2.2 Studies of ad transparency mechanisms
2.2.1 Auditing Ad Preference Managers
2.2.2 Effect of ad transparency mechanisms
2.3 Studies of online ads
2.3.1 Studies of web ads
2.3.2 Studies of mobile ads
2.3.3 Studies of Facebook ads
2.4.1 How tracking works?
2.4.2 Measurement studies of tracking
2.4.3 Defenses against tracking
3.1 How advertising works in social media?
3.2 The Facebook advertising interface
3.3 Types of targeting
3.3.1 Traditional Facebook targeting
3.3.2 Data broker targeting
3.3.3 Advertiser PII targeting and retargeting
3.3.4 Elaborate Facebook targeting
4.1 What does AdAnalyst collect?
4.1.3 Ad Preferences page
4.2 What does AdAnalyst offer to users?
4.2.5 How AdAnalyst enhances Facebook transparency?
4.3 Codebase and deployment
4.4 Ethical considerations
4.6 Impact & Awards
5 Auditing Transparency Mechanisms
5.1 Audience selection explanations (ad explanations)
5.1.1 What is an ad explanation?
5.1.2 Properties of ad explanations
5.1.3 Measurement methodology
5.1.4 Evaluation of Facebook’s ad explanations
5.2 Data inference explanations (data explanations)
5.2.1 What is a data explanation?
5.2.2 Properties of data explanations
5.2.3 Measurement methodology
5.2.4 Evaluation of Facebook’s data explanations
6 Measuring the Facebook Advertising Ecosystem
6.1.1 Data collection
6.1.2 Data limitations
6.2 Who are the advertisers?
6.2.1 Advertisers’ identity
6.2.2 Advertisers’ categories
6.3 How are the advertisers targeting users?
6.3.1 Analysis of targeting strategies
6.3.2 Analysis of targeting attributes
6.3.3 Analysis of targeted ads
7 A Collaborative Method to Provide Ad Explanations
7.1 Formalization of the problem
7.1.3 Generality of our model
7.2 Experimental evaluation of the method
7.2.1 Design of controlled experiments
7.2.2 Evaluation measures
7.2.3 Parameter tuning
8 Conclusion & Future Work
8.2 Future work
8.2.1 Mechanisms to make targeted advertising more transparent
8.2.2 Comparison of advertising ecosystems across platforms
8.2.3 Using ads for sociological research
9.1 AdAnalyst screenshots
Résumé en français