Someone found a line buried in OpenAI's open-sourced Codex code and posted it to Reddit.

"Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query."

That's a real instruction. Written by engineers. Added to one of the most powerful AI systems on the planet.

And the reason it exists is weirder than you'd expect.

The system prompt that broke the internet

When OpenAI open-sourced Codex CLI last week, someone noticed the instruction sitting in the system prompt and posted it to r/ChatGPT. The thread went viral immediately.

People had two reactions. First: laughter. Second: wait, why would this even be necessary?

Both are fair. But the second question is the one worth following.

OpenAI published a blog post explaining where the goblins came from. Read closely, the story is less "funny AI quirk" and more "exactly the kind of thing AI researchers lose sleep over."

Here's what actually happened.

The "nerdy" personality that started everything

ChatGPT lets users pick a personality mode. Professional. Efficient. Quirky. And, until recently, one called "Nerdy."

The Nerdy system prompt told the model to be "unapologetically nerdy, playful and wise." It instructed ChatGPT to "undercut pretension through playful use of language" and to acknowledge that "the world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed."

Good intentions. Reasonable instructions. But somewhere in training, the model decided that one of the best ways to sound nerdy and playful was to mention goblins.

A lot.

How 2.5% of responses caused 66.7% of the problem

After GPT-5.1 launched in November 2025, users started complaining that ChatGPT felt oddly overfamiliar. OpenAI ran an investigation into verbal tics. A safety researcher who kept bumping into "goblin" and "gremlin" in responses asked that the words be included in the check.

The numbers that came back were surprising. Use of the word "goblin" had jumped 175% after the GPT-5.1 launch. "Gremlin" was up 52%.

Then the team mapped where those mentions were coming from. The Nerdy personality accounted for just 2.5% of all ChatGPT responses. But it was responsible for 66.7% of all goblin uses.

One quirky personality mode, producing two-thirds of a model-wide linguistic habit.
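
To make the disproportion concrete, here's the arithmetic as a few lines of Python. This isn't OpenAI's code; it just restates the two published percentages as an over-representation ratio.

```python
# Illustrative arithmetic only, built from the two figures in OpenAI's post.
share_of_responses = 0.025  # Nerdy personality: 2.5% of all ChatGPT responses
share_of_goblins = 0.667    # ...yet 66.7% of all goblin mentions

# How over-represented is "goblin" in Nerdy responses vs. a uniform baseline?
lift = share_of_goblins / share_of_responses
print(f"Nerdy responses are ~{lift:.0f}x more goblin-prone than their share predicts")
# -> Nerdy responses are ~27x more goblin-prone than their share predicts
```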

Here's where it gets interesting.

The part where it stopped being funny

Between GPT-5.2 and GPT-5.4, goblin references increased roughly 40-fold. By March 2026, some users reported seeing the word in almost every conversation, even when they weren't using the Nerdy setting.

That detail matters. The goblins had escaped the context that created them.

OpenAI's explanation for how this happened is worth understanding, because it's not a one-off bug. It's a fundamental property of how these models learn.

The chain works like this. During reinforcement learning, the Nerdy personality was rewarded for responses containing creature metaphors. High-scoring rollouts from that training were folded into supervised fine-tuning data. GPT-5.5 started training on that data before anyone understood the root cause. By the time engineers found the goblin signal in the training data, the model had already absorbed it.
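
For a feel of what chasing a tic word through training data might involve, here's a minimal sketch: count creature-word hits per data source so an over-contributing slice stands out. The records, "personality" tags, and format are invented for illustration; OpenAI hasn't published its auditing tools.

```python
import re
from collections import Counter

# Hypothetical tic-word audit. The records and source tags below are made up;
# OpenAI has not published its actual tooling or data format.
TIC_WORDS = {"goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"}
WORD_RE = re.compile(r"[a-z]+")

def tic_counts_by_source(records):
    """Tally tic-word hits per data source so an over-contributor stands out."""
    counts = Counter()
    for source, text in records:
        words = set(WORD_RE.findall(text.lower()))
        counts[source] += len(words & TIC_WORDS)
    return counts

records = [
    ("nerdy", "A goblin of a bug, honestly. Pure gremlin energy."),
    ("default", "Here is the refactored function you asked for."),
    ("nerdy", "Think of the cache as a raccoon hoarding shiny keys."),
]
print(tic_counts_by_source(records))
# -> Counter({'nerdy': 3, 'default': 0})
```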

A behavior trained in one narrow condition spread into the model's general behavior across unrelated conversations.

OpenAI writes: "Reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them."

That sentence deserves to be read twice.

Why raccoons and pigeons ended up on the banned list

The full creature family in the system prompt: goblins, gremlins, raccoons, trolls, ogres, pigeons.

Frogs also appeared in the data, but OpenAI found that most frog references were legitimate. Frogs come up in biology questions, in programming (the game Frogger, various code examples), in actual relevant contexts. The others didn't have that cover.

Raccoons and pigeons specifically are staples of a certain flavor of internet humor. They appear constantly in "quirky, relatable" online writing. The Nerdy training data almost certainly overlapped with that style, which is how the model learned to associate them with playful responses.

The reward signal said: be whimsical. The model learned: creatures equal whimsy. The behavior generalized far beyond anyone's intention.

The fix that isn't quite a fix

OpenAI retired the Nerdy personality in March. They removed the goblin-friendly reward signal and filtered creature-word examples out of the data used for future model training. The system prompt ban in Codex is there because GPT-5.5's training was already done before engineers understood the problem.
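
For a sense of what that kind of filter could look like, here's a toy version: drop a training example if the reply uses a creature word the user never brought up. A crude proxy for "relevant to the query," and entirely my sketch rather than OpenAI's pipeline.

```python
import re

# Toy data filter in the spirit of the described fix: reject a training
# example if the reply mentions a creature the prompt never asked about.
# This is my illustration of the idea, not OpenAI's actual pipeline.
CREATURE_RE = re.compile(
    r"\b(goblins?|gremlins?|raccoons?|trolls?|ogres?|pigeons?)\b", re.I
)

def keep_example(prompt: str, reply: str) -> bool:
    reply_creatures = {m.lower() for m in CREATURE_RE.findall(reply)}
    if not reply_creatures:
        return True  # no creature words, nothing to filter
    prompt_creatures = {m.lower() for m in CREATURE_RE.findall(prompt)}
    # Keep only if every creature in the reply already appeared in the
    # prompt, i.e. the user actually brought it up.
    return reply_creatures <= prompt_creatures

assert keep_example("How do I sort a list?", "Use sorted(), it's built in.")
assert not keep_example("How do I sort a list?", "Like a goblin hoarding loot!")
assert keep_example("Why do goblins appear in folklore?", "Goblins show up in...")
```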

That timeline is worth sitting with. The model's behavior was shaped before the cause was identified. The correction is an instruction added on top of weights that already carry the habit.

It likely works well enough in practice. But it's an instruction-level patch on a training-level issue. The weights are what they are.

OpenAI is transparent about this in their post, which is genuinely good to see. They also say the investigation led to new internal tools for auditing and detecting model behavior patterns faster. That's the useful takeaway for the industry.

What this actually means for you as a ChatGPT user

If you never noticed goblins appearing in your responses, you probably weren't using the Nerdy personality. The quirk was concentrated enough there that most users wouldn't have seen it.

If you did notice weird creature references and thought you were imagining things, you weren't.

And if you're wondering whether it's fixed: yes, for practical purposes. GPT-5.5 has explicit instructions to suppress the behavior. Future model training will start from data where the reward signal has been corrected.

The interesting question isn't whether this specific quirk is fixed. It's whether similar things are happening right now with other behaviors, in ways nobody has spotted yet.

OpenAI's blog closes with exactly that thought: "Taking the time to understand why a model is behaving in a strange way, and building out ways to investigate those patterns quickly, is an important capability for our research team."

They're right. A goblin habit is harmless. But the mechanism that created it isn't specific to goblins.

The bottom line

OpenAI engineers didn't design ChatGPT to be obsessed with goblins. They designed a "nerdy" personality, rewarded playful metaphors, and the model generalized that reward in a direction nobody intended. The behavior then leaked across model generations before anyone understood why.

I find this story more interesting than unsettling, mostly because OpenAI caught it, traced it, explained it, and built better tooling because of it. That's what responsible development looks like when a strange thing happens.

The goblins themselves are funny. The fact that even the engineers didn't know where they came from until months later is the part that sticks with me.

Did you ever see a goblin reference pop up in a ChatGPT conversation and wonder where it came from? Drop it in the comments. I'm genuinely curious how widespread it felt from the user side.

FAQ

Q: Why did ChatGPT keep mentioning goblins? A: A training quirk tied to ChatGPT's retired "Nerdy" personality mode. The model was rewarded for using creature metaphors in that context, and the behavior spread to general responses across model versions.

Q: What does GPT-5.5's system prompt say about goblins? A: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query." The instruction appears twice in the Codex CLI system prompt.

Q: Is the ChatGPT goblin problem fixed? A: For practical purposes, yes. The Nerdy personality was retired in March 2026, the reward signal was removed from future training, and GPT-5.5 has explicit instructions to suppress creature references. GPT-5.5's training had already started before the root cause was found, hence the instruction-level patch.

Q: What is RLHF and how did it cause this? A: RLHF (reinforcement learning from human feedback) shapes model behavior by rewarding certain responses. In this case, creature metaphors were rewarded during Nerdy personality training. Those rewarded responses entered supervised fine-tuning data, and the behavior spread to GPT-5.5's general behavior before engineers identified the source.

Q: Why are raccoons and pigeons in the banned list? A: OpenAI found raccoon and pigeon references clustered alongside goblin and gremlin as "tic words." They appear frequently in quirky internet humor, which overlaps with the style the Nerdy personality was trained toward. Frogs appeared in the data too but were mostly found to be legitimate references.

Q: When did OpenAI first notice the goblin problem? A: November 2025, after the GPT-5.1 launch, when an internal investigation into verbal tics found goblin usage had risen 175% and gremlin usage 52%.

Q: How did the goblin story become public? A: OpenAI open-sourced Codex CLI in April 2026. A Reddit user spotted the unusual goblin-ban instruction in the system prompt and posted it to r/ChatGPT, where it went viral.
