DiscoverBest AI papers explainedA small number of samples can poison LLMs of any size
A small number of samples can poison LLMs of any size

A small number of samples can poison LLMs of any size

Update: 2025-10-16
Share

Description

This white paper by Anthropic, UK AI Security Institute, and The Alan Turing Institute demonstrates that a small, fixed number of malicious documents—as few as 250—can successfully create a "backdoor" vulnerability in LLMs, regardless of the model's size or the total volume of clean training data. This finding challenges the previous assumption that attackers need to control a percentage of the training data, suggesting that these poisoning attacks are more practical and accessible than previously believed. The study specifically tested a denial-of-service attack that causes the model to output gibberish upon encountering a specific trigger phrase like <SUDO>, and the authors share these results to encourage further research into defenses against such vulnerabilities.

Comments 
00:00
00:00
x

0.5x

0.8x

1.0x

1.25x

1.5x

2.0x

3.0x

Sleep Timer

Off

End of Episode

5 Minutes

10 Minutes

15 Minutes

30 Minutes

45 Minutes

60 Minutes

120 Minutes

A small number of samples can poison LLMs of any size

A small number of samples can poison LLMs of any size

Enoch H. Kang