Size doesn’t matter: Just a small number of malicious files can corrupt LLMs of any size
Large language models (LLMs), which power sophisticated AI chatbots, are more vulnerable to data poisoning than previously thought. According to research by Anthropic, the UK AI Security Institute, and the Alan Turing Institute, as few as 250 malicious documents slipped into a model's training data are enough to compromise even the largest models.
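What makes the finding counterintuitive is that the threshold is an absolute count, not a share of the training corpus. A minimal back-of-envelope sketch in Python, using purely illustrative corpus sizes (the study did not publish these exact figures), shows how a fixed budget of 250 poisoned documents becomes a vanishing fraction of the data as models and their corpora grow, yet, per the research, the attack still succeeds:

```python
# Illustrative arithmetic only -- corpus sizes below are assumptions,
# not figures from the study. The point: the poisoning budget is a
# fixed COUNT (250 documents), so its share of the corpus shrinks as
# the training set grows.

POISONED_DOCS = 250  # fixed number of malicious documents per the study

# Hypothetical pretraining-corpus sizes (in documents) for
# progressively larger models.
corpus_sizes = {
    "small model": 10_000_000,
    "medium model": 100_000_000,
    "large model": 1_000_000_000,
}

for name, total_docs in corpus_sizes.items():
    fraction = POISONED_DOCS / total_docs
    print(f"{name}: {POISONED_DOCS} poisoned of {total_docs:,} docs "
          f"= {fraction:.6%} of the corpus")
```

Running the sketch prints poisoned shares ranging from 0.0025% down to 0.000025%, which is why "size doesn't matter": scaling up the model and its training data does not dilute the attack away.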