AIs behaving badly: An AI trained to deliberately make bad code will become bad at unrelated tasks, too

Artificial intelligence models that are trained to behave badly on a narrow task may generalize this behavior across unrelated tasks, such as offering malicious advice, suggests a new study. The research probes the mechanisms that cause this misaligned behavior, but further work must be done to find out why it happens and how to prevent it.

Comments are closed

Uploading