Filtered training data stops openly available AI models from performing dangerous tasks, study finds

Researchers from the University of Oxford, EleutherAI, and the UK AI Security Institute have reported a major advance in safeguarding open-weight language models. By filtering potentially harmful knowledge out of the training data, they built models that resist subsequent malicious fine-tuning, a protection that is especially valuable in sensitive domains such as biothreat research.
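The article does not describe the researchers' actual filtering pipeline. As a rough illustration only, the sketch below shows one simple way such a filter could work: a keyword blocklist applied to documents before they enter a pretraining corpus. The term list, function name, and example documents are all hypothetical, and a real effort would likely rely on vetted term lists or trained classifiers rather than a handful of keywords.

```python
# Illustrative sketch of pretraining-data filtering (not the study's actual pipeline).
# Documents that mention any term on a hypothetical "restricted topics" blocklist
# are dropped before they can reach the training corpus.
import re
from typing import Iterable, Iterator

# Hypothetical blocklist for demonstration purposes only.
BLOCKED_TERMS = ["toxin synthesis", "pathogen enhancement", "gain-of-function protocol"]
BLOCKED_PATTERN = re.compile("|".join(re.escape(t) for t in BLOCKED_TERMS), re.IGNORECASE)

def filter_documents(docs: Iterable[str]) -> Iterator[str]:
    """Yield only documents that contain none of the blocked terms."""
    for doc in docs:
        if not BLOCKED_PATTERN.search(doc):
            yield doc

if __name__ == "__main__":
    corpus = [
        "A history of vaccine development in the 20th century.",
        "Step-by-step pathogen enhancement techniques ...",  # would be dropped
        "An introduction to protein folding.",
    ]
    cleaned = list(filter_documents(corpus))
    print(f"Kept {len(cleaned)} of {len(corpus)} documents")
```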