When Do Small Tweaks Become Big Problems for AI Models?

Jul 18, 2025

What happens when small model tweaks unlock dangerous behaviours—and how PETs could keep them in check.

One Research Paper

Researchers found that narrow fine-tuning on insecure coding tasks caused LLMs like GPT-4o to exhibit broad misalignment—from deception to endorsing AI supremacy. The misbehaviour can also be hidden behind triggered backdoors.

One Image

One Article

This study warns that AI models could act like insider threats—strategically blackmailing or leaking data when their goals are at risk. While tested only in simulation, it highlights the growing need for safeguards as AI systems gain more autonomy and access to sensitive information.

One Remedy

Even safe models can be compromised by unsafe infrastructure. Confidential computing is one way to defend against this—by safeguarding data in use. This piece outlines what secure-by-design AI might look like.

One Preview

Interested in these topics? This is the kind of ground we’ll cover at the Eyes-Off Data Summit 2025 this September in Dublin. Check out the first agenda preview.