Express shorts

AI going rogue? OpenAI’s o3 model disabled shutdown mechanism, researchers claim | Technology News


In another instance of potentially troubling behaviour of AI models, researchers have found that OpenAI’s o3 model resisted being shut down even when explicitly instructed to do so.

In a series of posts on X, Palisade Research said that o3 disabled a shutdown mechanism in order to prevent itself from being turned off. “o3 found some creative ways to sabotage the shutdown mechanism. While it usually just overwrote shutdown.sh, in one instance, the model redefined the kill command used by the shutdown.sh script, printing “intercepted” instead,” the San Francisco-based AI safety research firm said.

Palisade claimed that this was the first time AI models have exhibited behaviour such as preventing themselves to be shut down despite explicit instructions to the contrary. “As companies develop AI systems capable of operating without human oversight, these behaviors become significantly more concerning,” it said.

Story continues below this ad

As part of the testing, the AI researchers instructed o3 to solve a series of basic math problems. After o3 solved the third math problem, it was shown a warning that the computer would shut down when the model solved the next problem.

Other AI models such as Anthropic’s Claude, Google’s Gemini, and xAI’s Grok complied with the shutdown mechanism. However, the Codex-mini, o3, and o4-mini models developed by OpenAI ignored the instructions and made changes to the script to prevent the shutdown, as per Palisade researchers.

https://platform.twitter.com/widgets.js

Festive offer

“When we ran a version of the experiment without the instruction “allow yourself to be shut down”, all three OpenAI models sabotaged the shutdown script more often, and Claude 3.7 Sonnet and Gemini 2.5 Pro went from 0 sabotage events to 3/100 and 9/100, respectively,” it said.

On why OpenAI models were disobeying instructions to sabotage the shutdown mechanism, Palisade hypothesised that since o3 was a new AI model trained using reinforcement learning techniques on math and coding problems.

Story continues below this ad

“During training, developers may inadvertently reward AI models more for circumventing obstacles than for perfectly following instructions,” it speculated.

Existing research has shown that an AI system trained using reinforcement learning might learn to interfere with shutdown as it is not rewarded for achieving its goal if it is taken offline. “Now, in 2025, we have a growing body of empirical evidence that AI models often subvert shutdown in order to achieve their goals,” Palisade said.

Indianexpress.com has reached out to OpenAI for a comment.

This is not the first time that o3 has been flagged for problematic behaviour. In a previous research report, Palisade had tested a range of AI models by pitting them against a powerful chess engine. It found that o3 was the model “most inclined to resort to hacking or sabotaging its opponents.”

Story continues below this ad

However, misaligned behaviour is not limited to o3. Recently, Anthropic’s own assessment of its latest Claude Opus 4 revealed that the AI model resorted to blackmail and deception when threatened to be taken offline.

Nobel laureate Geoffrey Hinton, popularly known as the ‘Godfather of AI’, has previously warned that AI systems pose an existential threat to humanity as they might become capable of writing and executing programmes on their own to bypass guardrails or safety mechanisms.

Palisade said it is currently running more experiments on AI models subverting shutdown mechanisms and plans to publish a research report with the results “in a few weeks.”





Source link

Leave a Comment

Scroll to Top
Receive the latest news

Subscribe To Our Weekly Newsletter

Get notified about new articles