The Experiment: Feeding AI Nonsense
Michael Vitevitch, a professor in the Speech, Language, and Hearing Department at the University of Kansas, decided to test the linguistic boundaries of ChatGPT with a clever three-stage experiment. Rather than trying to trick the AI for entertainment, he aimed to expose the true cognitive limitations of large language models (LLMs) like GPT.
The idea was simple but smart: What happens when you throw outdated, foreign-sounding, or completely made-up words at a language AI? Does it still make sense of them, or does it begin to “hallucinate” answers — something ChatGPT is known for?
Stage One: Obsolete Words and Hallucinations
In the first stage of the experiment, Vitevitch presented ChatGPT with 52 real but obsolete English words. These weren’t your everyday archaic terms — they included oddities like “upknocker,” a person whose job in the 1800s was to wake others up.
Here were the results:
✅ 36 words were defined correctly.
🤷 11 words got “I don’t know” responses.
🌐 3 were incorrectly translated into other languages.
🚨 2 responses were full hallucinations — fake definitions that sounded confident but were entirely made up.
Vitevitch noted, “It did hallucinate on a couple of things… I guess it was trying to be helpful.” This part of the test showed that while LLMs have impressive memory, they are not infallible — and we shouldn’t take their word as gospel, especially for uncommon or outdated language.
Stage Two: The Foreign Sound Trap
For the second stage, Vitevitch turned to real Spanish words and asked ChatGPT to list English words that sounded similar — a classic psycholinguistic method used on humans. But unlike human subjects, ChatGPT frequently responded with more Spanish.
This exposed a major difference between human cognition and AI logic. Instead of comparing phonetic similarity the way humans do, ChatGPT leaned on cross-language statistical associations — linking words that often appear together in multilingual data rather than how they actually sound.
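The human-style approach the experiment assumes can be sketched with a toy model: rank candidate words by edit distance to the probe. (This is a crude illustration only; it uses orthographic edit distance as a stand-in for phonological similarity, and the tiny lexicon below is hypothetical, not from the study.)

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # min of deletion, insertion, and match/substitution
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

# Tiny illustrative English lexicon (hypothetical)
LEXICON = ["mess", "mesa", "case", "maze", "miss"]

def similar_sounding(word: str, lexicon=LEXICON, max_dist=2):
    """Return lexicon entries within max_dist edits of the probe word."""
    return [w for w in lexicon if edit_distance(word, w) <= max_dist]

print(similar_sounding("mesa"))  # ['mess', 'mesa', 'miss']
```

A human subject given the Spanish probe “mesa” would answer with nearby English forms like these; the contrast is that an LLM may instead surface words that merely co-occur with “mesa” in multilingual text.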
It also revealed a quirk in how LLMs process confusion: humans stick to the requested language even when unsure. But ChatGPT? It may “switch languages,” go off-topic, or even make things up — because it prioritizes pattern completion over strict linguistic boundaries.
Stage Three: Invented Words & English-Sounding Terms
In the final stage, the professor tested ChatGPT with totally made-up words like “lexinize” and “stumblop.” These were designed with varying degrees of “English-likeness” using phonotactic probabilities — the likelihood that a sound combination is found in English.
ChatGPT was asked to rate each word from 1 (definitely not English-sounding) to 7 (very English-sounding). When compared with human participants’ scores, the AI’s judgments matched closely.
But that wasn’t the end.
Vitevitch also prompted the AI to invent new words for concepts that have no name in English. It succeeded, often using compounding or blending techniques similar to those found in actual language development.
Some of the most creative AI-generated words included:
Rousrage – anger felt after being suddenly woken up
Prideify – taking pride in someone else’s success
Lexinize – when a nonsense word starts gaining meaning
Stumblop – to trip over one’s own feet in an awkward way
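The blending technique behind coinages like these can be illustrated with a minimal sketch: splice the front of one source word onto the back of another. (The splice points below are chosen by hand for illustration; the study does not describe ChatGPT’s internal procedure.)

```python
def blend(head: str, tail: str, keep_head: int, keep_tail: int) -> str:
    """Naive blend: first keep_head letters of one word
    plus the last keep_tail letters of another."""
    return head[:keep_head] + tail[-keep_tail:]

# Plausible source words for two of the coinages above:
print(blend("stumble", "plop", 5, 3))  # stumblop
print(blend("rouse", "rage", 4, 4))    # rousrage
```

Compounding works the same way but keeps both words whole, as in “upknocker” from the first stage.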
What This Means for the Future of Language AI
Through this mix of nonsense words, language theory, and machine learning, Vitevitch didn’t just aim to expose where ChatGPT fails; he also wanted to show where it might actually help.
By observing how the AI handles nonwords, obsolete language, and creative tasks, researchers can better understand when it’s useful for language-based work and when it’s simply parroting patterns. This helps define the line between mimicry and cognition.
The takeaway? ChatGPT’s strength isn’t in accuracy or logic, but in statistical fluency — and that can be both a powerful tool and a critical weakness.