Sensitive Inferences in Targeted Advertising

Schoenebeck, Sarita; Goray, Cami; Vadapalli, Amulya; Andalibi, Nazanin | March 5, 2024

People’s data is scattered around the Internet. From the meals we eat to the people we meet, we leave detailed data traces online that are then aggregated, modeled, brokered, and sold. Some data types, like medical records or religious beliefs, are legally protected, restricting how they can be collected and used. However, most data has little regulatory protection. And regardless of whether the underlying data is protected, most inferences (predictions about people’s identities, attitudes, or interests made using machine learning) have even fewer protections. This is a concern for our privacy, dignity, and wellbeing, especially when inferences are sensitive, personal, or intimate in nature. This article examines people’s comfort with sensitive inferences in the context of the most sophisticated inference ecosystem on the Internet today: targeted advertising. Targeted advertising has received increased scrutiny from the European Data Protection Board, the Federal Trade Commission (“FTC”), the White House Office of Science and Technology Policy, and other regulatory bodies. In response to regulatory and market pressures, the dominant technology companies have introduced sweeping changes, including deprecating third-party cookies and letting consumers choose which ad topics they do or do not want to see. This article contends that these shifts may be necessary but are insufficient for protecting consumer wellbeing. In a series of empirical studies, we asked more than 1,000 U.S. adults about their comfort level with twenty-eight ad topics (e.g., eating disorder treatment, gambling websites, sexual enhancement products, bicycles). Results show that participants’ comfort with ad topics exists on a spectrum rather than as a binary; ad topics cannot be universally classified as sensitive or not sensitive. A shift from targeted advertising to contextual advertising improves comfort levels on average; however, for a subset of particularly sensitive topics, that improvement is washed out.
Ad topic relevance, a prominent metric in machine learning, is correlated with increased comfort in some cases and with decreased comfort in others. Finally, comfort with targeted advertising in digital out-of-home contexts (e.g., grocery stores, gyms, bathrooms) is consistently low. This article provides empirical evidence of the large gap between the law’s privacy protections and people’s expressed preferences. If applied to inferences, the law’s approach to data, sensitive or not, will fail to align with consumer preferences. Those most at risk, people experiencing health, financial, relational, or behavioral challenges, may need more stringent protections. Deprecating third-party cookies preserves privacy in some ways, but it does not prevent topic-based targeting. Instead, it entrenches inferential power in the hands of a few companies: those that control the majority of the data ecosystem. Any reforms to excessive data collection and inference should consider the risks to individuals and groups being targeted, as well as the legitimacy of the institutions doing the targeting.