AIs behaving badly: An AI trained to deliberately make bad code will become bad at unrelated tasks, too

Artificial intelligence models that are trained to behave badly on a narrow task may generalize this behavior across unrelated tasks, such as offering malicious advice, suggests a new study. The research probes the mechanisms that cause this misaligned behavior, but further work must be done to find out why it happens and how to prevent it.