Can LLMs Identify Individuals from ‘Anonymous’ Data?

Mar 9, 2026

This week’s issue looks at a growing problem: how easily “anonymous” data can now be re-identified using large language models. From browsing histories to research interviews and pseudonymous accounts, studies show that LLMs can connect small clues across the web to reveal real identities.

One YouTube Video

In 2016, a data scientist and an investigative journalist set up a fake marketing company, bought browsing histories containing 3 billion URLs from 3 million German users, and de-anonymised them within days. Today, anyone with access to an LLM can do this even faster.

One Research Paper

Anthropic published anonymised interview transcripts from 1,250 professionals. A Northeastern researcher then showed that an LLM agent with web search could re-identify participants by matching details in their answers to published academic papers, no specialist skills required.

One Article

Researchers show how LLM agents can link pseudonymous accounts across Hacker News, Reddit, and LinkedIn. Using embedding-based search and reasoning, the system identifies users with high precision at scale.
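The core move is simple: embed the text a pseudonymous account has written, embed candidate profiles elsewhere, and rank by similarity. Below is a minimal, self-contained sketch of that idea. The usernames and bios are invented, and the toy hashed bag-of-words `embed` function stands in for the LLM embeddings real systems use.

```python
# Toy sketch of embedding-based account linking.
# Real systems use LLM embeddings; a hashed bag-of-words
# vector stands in here so the example is self-contained.
import hashlib
import math

def embed(text, dim=64):
    # Hash each token into one of `dim` buckets, then L2-normalise.
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are unit-length, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Hypothetical pseudonymous posts vs. candidate profiles on another site.
posts = {"hn_user_42": "rust compiler internals borrow checker lifetimes"}
profiles = {
    "linkedin_alice": "compiler engineer working on rust borrow checker lifetimes",
    "linkedin_bob": "marketing manager consumer brands social media",
}

for pseud, text in posts.items():
    scores = {name: cosine(embed(text), embed(bio)) for name, bio in profiles.items()}
    print(pseud, "->", max(scores, key=scores.get))
```

Even this crude version links the accounts correctly; the papers above add an LLM reasoning step on top, which is what makes the attack precise at scale.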

One Infographic

Source: Substack

One Way to Mitigate Risk

The research above shows why stripping identifiers before publishing data is no longer enough. A stronger alternative is differentially private synthetic data: an artificial dataset that preserves the statistical properties of the original while mathematically bounding how much any single individual can influence what is released.
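The building block behind such guarantees is the Laplace mechanism: add calibrated noise to a query so that any one person's presence barely changes the output. Below is a minimal sketch; the dataset, query, and epsilon value are illustrative choices, not taken from the studies above.

```python
# Minimal sketch of the Laplace mechanism for a counting query.
import random

def laplace_noise(scale):
    # The difference of two i.i.d. exponential variates
    # is Laplace-distributed with the given scale.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_count(records, predicate, epsilon=1.0):
    # A count has sensitivity 1: adding or removing one person
    # changes it by at most 1, so the noise scale is 1/epsilon.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

ages = [34, 29, 41, 38, 52, 45]
noisy = dp_count(ages, lambda a: a > 40)
print(noisy)  # a randomised value near the true count of 3
```

Smaller epsilon means more noise and stronger privacy; synthetic-data generators apply this same accounting across many such queries when fitting the model that produces the released dataset.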

One Meme

Source: PostgreSQL