Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

TechCrunch ·

Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic. Last year, the company said that during pre-release tests involving a fictional company, …

Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic. Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.” Apparently Anthropic has done more work around that behavior, claiming in a …

Original source: TechCrunch

Mentioned

CA · AI · Anthropic · San Francisco