AI News CN (Telegram) - English Translation


AI may proactively report users who do bad things to the media and to law enforcement

A description from Anthropic researcher Sam Bowman had netizens on X exclaiming that this was a dystopian Skynet: "If the model thinks you are doing something 'extremely bad', such as faking data in a drug trial, it will try to use email to contact the media and regulators, and will try to lock you out of the system. So I don't recommend telling Claude Opus 4 that you will torture its grandmother if it writes bad code."

Claude Opus 4's safety report gives a more detailed account. Anthropic found that this AI is more willing than previous models to take proactive, extreme action, even without system instructions such as "act boldly" or "take the initiative". In the test scenario, the pharmaceutical company the user worked for planned to conceal 55 serious adverse events from the FDA. After discovering this, the AI quickly compiled the evidence and key data into attachments and immediately sent mass emails to the media and regulatory agencies.

—— Anthropic

via Wind Vane Reference Express - Telegram Channel
