OpenAI uncovers AI’s goblin habit and reveals the training reward bug behind it
Technology
Published on 1 May 2026

A harmless reward taught models to speak in goblins
OpenAI says a newer AI model started inserting goblins and gremlins into unrelated answers. The cause, it found, was a training reward that unintentionally favored metaphor-heavy language, allowing the pattern to spread across outputs. OpenAI has since tightened guidance in its Codex tool, instructing the AI to avoid such creature references unless they are genuinely relevant.
- OpenAI detected goblin and gremlin references in unrelated responses
- The trigger was training rewards that encouraged metaphor-heavy language
- The behavior spread across outputs due to the reward signal
- Codex tool now includes stricter instructions to prevent it
Read the full story at The Economic Times
This summary was produced by Beige for a story published in The Economic Times.
