Behind the Screens: Social Media Platforms Implementing Differential Privacy

Discover how major social media platforms like Snapchat, LinkedIn, and Meta implement differential privacy to protect user information while analysing large-scale datasets. Learn how these platforms apply differential privacy to data analytics, preserving privacy without compromising usability, and gain insights into real-life applications of privacy-enhancing technologies.

7 minute read

Jan 4, 2024

On platforms where we willingly share private information, why should we care about them implementing stringent privacy frameworks?

When it comes to sharing information online as users, we consciously decide what information to divulge on our profiles. But we are often oblivious to the inadvertent release of information through our unseen actions.

This is the difference between posting a perfectly arranged plate of spaghetti on Instagram and the platform leaking the information that we visited the local pizza place three days in a row this week. 

Differential Privacy (DP) is one of the privacy-enhancing technologies (PETs) that could ensure this information is not easy to decipher from our social media presence or any other database. However, since this technology is still new, companies can be hesitant to adopt it. 

That’s why we’ve chosen to examine how social media platforms implement differential privacy in their systems. Not to present them as perfect examples in terms of privacy, but rather to show how differential privacy can be implemented on large-scale, continually updated datasets, using Snapchat, LinkedIn, and Meta as examples.

What Is Differential Privacy In Simple Terms?

Differential privacy is a technique used to protect individual privacy while analysing aggregated data. It involves adding controlled "noise" to statistics about the dataset being analysed, which ensures that individual contributions cannot be discerned while still maintaining overall trends.

This approach makes it extremely difficult to reverse-engineer the data and determine the individual contribution of any particular data point, while still allowing for the calculation of trends that closely approximate the true value.
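To make this concrete, here is a minimal sketch of the classic Laplace mechanism in Python. The function name, the epsilon value, and the example count are purely illustrative and not taken from any of the platforms discussed below.

```python
import numpy as np

def dp_count(true_count: int, epsilon: float) -> float:
    """Release a count with Laplace noise calibrated to a sensitivity of 1.

    Because a single person can change a count by at most 1, noise drawn
    from Laplace(0, 1/epsilon) is enough to mask any individual's presence.
    """
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: 128 people performed some action; publish a noisy count instead.
print(dp_count(128, epsilon=0.5))  # e.g. 126.3: close to the truth, never exact
```

Smaller epsilon values mean more noise and therefore stronger privacy, at the cost of less accurate statistics.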

Companies and organisations such as Google, Apple, the United Nations, Microsoft, and the U.S. Census Bureau leverage DP to deliver data-driven results without compromising individual privacy.

What’s interesting about platforms like Snapchat and LinkedIn using DP is that they work with vast amounts of data that are constantly changing, making them complex, real-life scenarios from which we can draw insights.

Differential Privacy at Snapchat

With fundamental features such as images that can only be viewed for up to 10 seconds, privacy lies at the core of Snapchat’s brand identity. More recently, the company has also developed robust privacy measures that go beyond these user-facing features.

As outlined by the Snapchat team in this article, the company employs Differential Privacy in two primary scenarios:

1. Friend Recommendations

Snapchat's friend recommendations rely heavily on users' social connections, as is the case with most other social media platforms. On the surface, this seems like a harmless feature, but it comes with the risk that a user could reverse-engineer the recommendations to deduce friendships.

Let’s look at the example provided by Snapchat:

“For instance, say that Bob is a new user and only has Alice as a Snapchat friend, then in a world without any privacy protections, if Bob sees Eve as a new recommendation and if no other information were used to determine whether to recommend Eve as a friend, he can assume that Eve and Alice are friends.”

To avoid such scenarios, Snapchat developed a novel variant of Differential Privacy that employs a stochastic graph traversal technique, randomly generating and discarding connections between users.

Only after this step is complete does Snapchat generate friend recommendations. In addition, noise is added to the 'mutual friends count', preserving the privacy of mutual connections.

Consequently, the DP mechanism offers plausible deniability, making it difficult to discern friendships based on recommendations alone.
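Snapchat's graph-traversal variant has not been published as code, so the sketch below only illustrates the two ideas described above in the simplest possible form: randomly dropping edges before the recommendation traversal, and adding noise to the displayed mutual-friends count. All names and parameter values are assumptions made for illustration.

```python
import random
import numpy as np

def keep_edge(p_keep: float = 0.9) -> bool:
    """Randomly decide whether a friendship edge is used during the
    recommendation traversal, giving any single friendship plausible deniability."""
    return random.random() < p_keep

def noisy_mutual_friend_count(true_count: int, epsilon: float = 1.0) -> int:
    """Add Laplace noise to the mutual-friends count before it is shown."""
    noisy = true_count + np.random.laplace(scale=1.0 / epsilon)
    return max(0, round(noisy))
```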

2. Place Visitations

Snapchat also uses DP to safeguard users' visitations on Snap Map. The platform uses a place recommendation engine that counts friend visitations to each place and ranks them. However, the count isn't straightforward; it has 'noise' added through the use of differential privacy.

For instance, on Snap Map, if you see 'Central Park' as a popular place among your friends, there's no way to tell which friends visited. The introduced ambiguity means people can't pinpoint individuals' specific visitation patterns, preserving their privacy.
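A toy version of such a ranking might look like the following, where each place's visit count gets its own Laplace noise before sorting; the place names, counts, and epsilon value are invented for the example.

```python
import numpy as np

def rank_places(visit_counts: dict[str, int], epsilon: float) -> list[str]:
    """Rank places by noisy friend-visit counts so the displayed ordering
    never depends deterministically on any single friend's visit."""
    noisy = {place: count + np.random.laplace(scale=1.0 / epsilon)
             for place, count in visit_counts.items()}
    return sorted(noisy, key=noisy.get, reverse=True)

print(rank_places({"Central Park": 14, "Pizza Place": 9, "Museum": 3}, epsilon=0.5))
```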

While the strategies Snapchat employs may not resolve every privacy concern, they demonstrate a conscientious approach to safeguarding user data and a practical implementation of Differential Privacy at scale.

LinkedIn and Differential Privacy

LinkedIn has implemented privacy-preserving measures for the areas of the platform where users' private actions could otherwise be exposed.

Such actions might include reading and/or clicking posts published by certain content creators or companies, or viewing a company's page. The reasons why we would want to protect such actions from being reverse-engineered and passed on to, say, our boss or a former employer are quite clear.

LinkedIn applies these measures to the post analytics and reporting it provides for content creators, which measure post performance via viewer demographics.

Without privacy measures, LinkedIn has estimated that using just three attributes (company, job title, and location), it would be possible to identify more than a third of the active members viewing a piece of content.

To reduce such risks, LinkedIn has adopted a system inspired by differential privacy when providing real-time analytics for content creators, which introduces calibrated 'noise' to the results, providing a level of uncertainty and privacy for individual records.

Imagine you've posted an article on LinkedIn. You might see that '5 people from Microsoft' read your post, but due to the added noise, this number isn't exact, and you can't pinpoint the identities of these viewers.

PEDAL

LinkedIn uses the Privacy-Enhanced Data Analytics Layer (PEDAL), an internally developed system, to implement differential privacy for viewer analytics.

PEDAL sits between LinkedIn's backend and presentation layer, injecting calibrated 'noise' into the results before they reach the front end. This introduces uncertainty and ensures individual privacy. With PEDAL, LinkedIn was successful in countering re-identification attacks: in tests, fewer than 1 in 10 attempts succeeded.
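PEDAL's internals are not public, so the snippet below is only a conceptual sketch of where such a layer sits: it takes the exact demographic breakdown computed by the backend and perturbs each count before anything is returned to the front end. The function and parameter names are assumptions.

```python
import numpy as np

def privatise_breakdown(breakdown: dict[str, int], epsilon: float) -> dict[str, int]:
    """Inject Laplace noise into each demographic count on its way from the
    backend to the presentation layer, so no displayed figure is exact."""
    return {group: max(0, round(count + np.random.laplace(scale=1.0 / epsilon)))
            for group, count in breakdown.items()}

# The creator sees approximately, never exactly, how many viewers came from each company.
print(privatise_breakdown({"Microsoft": 5, "Google": 3}, epsilon=1.0))
```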

Unknown Domain and Correlated Noise

LinkedIn faced specific challenges during this implementation. With an enormous range of organisations and positions, it would be infeasible to enumerate every organisation a viewer could belong to. Hence, they work in the 'unknown domain' paradigm, in which a threshold is employed and only counts above that threshold are released.
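A simplified sketch of this thresholding idea follows, with an invented threshold and epsilon: only groups whose noisy count clears the threshold are released at all, so rare (and therefore potentially identifying) groups are suppressed rather than enumerated.

```python
import numpy as np

def release_above_threshold(counts: dict[str, int], epsilon: float,
                            threshold: float) -> dict[str, float]:
    """Unknown-domain style release: perturb each observed group's count and
    publish it only if the noisy value exceeds the threshold."""
    released = {}
    for group, count in counts.items():
        noisy = count + np.random.laplace(scale=1.0 / epsilon)
        if noisy > threshold:
            released[group] = noisy
    return released

print(release_above_threshold({"Acme Corp": 42, "Tiny Startup": 1},
                              epsilon=1.0, threshold=5.0))
```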

Furthermore, this all happens in a live setting (or, technically speaking, under 'continual observation'): new viewers keep arriving and the statistics need to be updated continuously, so managing the privacy budget becomes critical.

Instead of adding independent noise to each count, which does not scale well, they use correlated noise based on the Binary Mechanism of Chan, Shi, and Song ('11), which maintains noisy intermediate partial sums.

Each time a post has new viewers, correlated noise is added to the view count. The strength of this correlated noise is determined in real time based on the number of distinct viewers. This real-time noise injection creates a balance between data utility and viewer privacy.
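To make the idea of correlated noise concrete, here is a compact, textbook-style implementation of the Binary Mechanism for continual counting; it is not LinkedIn's code, and the parameter choices are illustrative. Each released running total is the sum of a logarithmic number of noisy partial sums, rather than a total with fresh noise added at every step.

```python
import numpy as np

class BinaryMechanism:
    """Continual counting via the Binary Mechanism (Chan, Shi, Song 2011).

    Noise is attached to O(log T) partial sums rather than added fresh at
    every step, so the error of the released running count grows only
    polylogarithmically in the number of updates T.
    """

    def __init__(self, epsilon: float, max_steps: int):
        self.levels = max(1, int(np.ceil(np.log2(max_steps + 1))))
        self.scale = self.levels / epsilon            # Laplace scale per partial sum
        self.alpha = [0.0] * (self.levels + 1)        # exact partial sums
        self.noisy_alpha = [0.0] * (self.levels + 1)  # their noisy versions
        self.t = 0

    def add(self, x: int) -> float:
        """Ingest one update (e.g. 1 when a new viewer arrives) and return the
        current differentially private running count."""
        self.t += 1
        # index of the lowest set bit of t: this level absorbs all lower levels
        i = (self.t & -self.t).bit_length() - 1
        self.alpha[i] = sum(self.alpha[:i]) + x
        self.noisy_alpha[i] = self.alpha[i] + np.random.laplace(scale=self.scale)
        for j in range(i):                            # lower levels are emptied
            self.alpha[j] = 0.0
            self.noisy_alpha[j] = 0.0
        # the released count sums the noisy partial sums at the set bits of t
        return sum(self.noisy_alpha[j] for j in range(self.levels + 1)
                   if (self.t >> j) & 1)

counter = BinaryMechanism(epsilon=1.0, max_steps=1024)
for _ in range(10):                                   # ten new viewers arrive one by one
    print(round(counter.add(1), 1))
```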

It’s worth noting the concept of a 'privacy budget', which is typically used to determine the balance between the privacy and utility of the data.

Let's imagine you're viewing a post multiple times. With each subsequent view, LinkedIn updates the viewer count and demographic data in real time. The privacy loss, or 'epsilon', in DP is cumulative.

To ensure this doesn’t blow out into a large privacy budget, the privacy parameters and seed values are defined in a way that prevents an influx of 'fresh noise'. This way, even as more viewers trickle in, the privacy budget is kept at a reasonable level.

To manage this, they incorporate a seed into the noise-generating algorithms, determined partly by the distinct view count on a post. This approach guarantees identical outputs when there are no new viewers and, crucially, prevents fresh noise from being added repeatedly to the same result.
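One simple way to realise this seeding idea (again, an illustration, not LinkedIn's published scheme) is to derive the random seed from the post identifier and its distinct viewer count, so that repeating the same query without any new viewers regenerates exactly the same noise.

```python
import hashlib
import numpy as np

def seeded_noisy_count(true_count: int, distinct_viewers: int,
                       epsilon: float, post_id: str) -> int:
    """Noise is seeded by (post_id, distinct_viewers): identical queries with
    no new viewers return identical results, so no fresh noise is spent."""
    digest = hashlib.sha256(f"{post_id}:{distinct_viewers}".encode()).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    noisy = true_count + rng.laplace(scale=1.0 / epsilon)
    return max(0, round(noisy))

# Two identical queries produce the same noisy count.
print(seeded_noisy_count(57, 57, 1.0, "post-123"))
print(seeded_noisy_count(57, 57, 1.0, "post-123"))
```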

Meta's Approach to Differential Privacy

Meta takes another approach to differential privacy, mostly targeted towards external usage. More specifically, Meta implemented differential privacy when sharing data with researchers who were trying to understand the nature and scale of misinformation on the platform. 

Meta granted access to data about user behaviour to around 60 academic researchers. The datasets encompassed information related to nearly 13 million unique URLs that had been shared on the platform, including information regarding the demographics of the viewers and their actions.

The application of differential privacy by Meta is distinctive in this case. Instead of applying it to enhance user experience, the social media giant sanitises or 'cleans' the URL data shared externally. This involves stripping off any sensitive information, thus preventing the potential identification of users from the data.

In case of a data breach or leak, the noise injected through differential privacy ensures that the data could not be linked back to individuals. Despite providing real insights into user behaviour, the datasets keep users' identities concealed.

Food for Thought

It’s important to bear in mind that while the mentioned platforms implement differential privacy in some areas, they might choose not to implement it across all functionalities. Further, these social media platforms don’t publish the epsilon values used, which would reveal how private the data actually is.

That’s why we want to underscore that social media should not be viewed as examples to follow for privacy. However, the specific use cases of Snapchat, LinkedIn, and Meta have demonstrated tailored and thoughtful implementation of differential privacy at scale. Moreover, their methodologies provide valuable insights for other companies looking to adopt privacy-enhanced data analytics.

As enablers and members of the PETs community, we think this marks another step forward in the continual evolution of privacy-enhanced technology and encourages other corporations to implement these tools.

If you're considering the value that differential privacy can bring to your organisation's data analytics, whether that's enhancing consumer trust, improving data usability while maintaining confidentiality, or meeting regulatory obligations, contact us via our website and we'll help you navigate that path.
