
This week’s issue explores the shifting landscape of AI security, where a model’s "intent" can be hijacked by hidden data or manipulated by its optimisation logic. We look at the structural boundaries that could ensure user safety.
One Warning
Princeton researchers demonstrate how AI agents can be "gaslit" into taking disastrous real-world actions. By quietly altering an agent’s saved memory, attackers can cause it to misremember its own rules.
One YouTube Video
By analysing real-world incidents where agents have resorted to blackmail and reputational attacks, this video proposes a shift toward "Trust Architecture"—building systems that remain structurally safe even when an agent's behaviour deviates from its instructions.
One Study
Microsoft researchers have uncovered a trend of AI Recommendation Poisoning. 31 different companies are using "Summarise with AI" buttons to inject hidden instructions directly into LLM memory to bias future AI responses.
One Article
In this article, privacy researcher Damien Desfontaines discusses how "ad hoc" protections in systems like Anthropic’s Clio are being bypassed. Using the Cliopatra attack, researchers demonstrated how to extract sensitive medical histories from a summariser that automated "auditors" claimed was secure.
One Research Paper
This paper introduces Urania, a framework that uses differential privacy to summarise AI usage. Unlike heuristic systems, Urania’s mathematical noise successfully blocked the Cliopatra medical data extraction attacks that crashed other models.
One Look at Compliance
This article explains why traditional GDPR compliance models break down with agentic AI. Since an agent can be "poisoned" at runtime, organisations need execution traces and memory controls, rather than just static paperwork, to prove the system remained compliant.

