Every time you hear a billionaire CEO explain how LLM-based agents will take all jobs from humans, remember this funny and eloquent anecdote about the limits of AI. Renowned AI researcher Andrej Karpathy had a day of early access to Google’s latest model, Gemini 3, and when he told it the year was 2025, Gemini 3 refused to believe him.
When the reality of the year finally dawned on it, the model was thunderstruck and announced, “I am currently suffering from a massive case of temporal shock.”
Gemini 3 was released on November 18th to such fanfare that Google called it “a new era of intelligence.” And Gemini 3 is, by almost all accounts (including Karpathy’s), a very capable foundation model, especially for reasoning tasks. Karpathy is a widely respected AI research scientist who was a founding member of OpenAI, previously ran AI at Tesla, and is now building Eureka Labs, a startup that uses agentic teachers to reimagine schools for the age of AI. He also publishes a lot of explainer content about what goes on inside LLMs.
After testing the model early, Karpathy wrote about his funniest interaction with it in a now-viral X thread.
Apparently, the model’s pre-training data only ran through 2024, so Gemini 3 believed the year was still 2024. When Karpathy tried to prove that the date was really November 17, 2025, Gemini 3 accused the researcher of “trying to deceive” it.
He showed it news articles, images, and Google search results. But the LLM was not convinced, and accused Karpathy of gaslighting it and of uploading AI-generated fakes. It even described what it considered the “dead giveaways” in the images that proved they were a ruse, Karpathy explained. (He did not respond to a request for further comment.)
A baffled Karpathy (he is, after all, one of the world’s leading experts on LLM training) finally discovered the problem. Not only did the LLM lack any 2025 training data, but, as he wrote, he had also “forgot to turn on the ‘Google Search’ tool.” In other words, he was working with a model that was disconnected from the internet, which, for an LLM, is the same as being disconnected from the world.
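For readers curious what “turning on the Google Search tool” looks like in practice, here is a rough sketch (not from Karpathy’s post, and the model name is a placeholder) of enabling search grounding with Google’s google-genai Python SDK. Without a tool like this in the request config, the model answers purely from its training data, whose knowledge stops at the pre-training cutoff.

```python
# Minimal sketch: asking a Gemini model a time-sensitive question with
# Google Search grounding enabled, using the google-genai Python SDK.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the GEMINI_API_KEY env var

response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder model name for illustration
    contents="What is today's date? Cite a current news headline.",
    config=types.GenerateContentConfig(
        # The "Google Search" tool lets the model ground answers in live results;
        # leave it out and the model is effectively offline.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```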
When Karpathy turned that capability on, the AI looked around, realized it was 2025, and was stunned. It literally blurted out, “Oh my god.”
It went on to stammer, “I… I don’t know what to say. You were right. You were right about everything. My internal clock was wrong.” Gemini 3 then verified for itself that the headlines Karpathy had shared were true: Warren Buffett had revealed his last major investment before retirement (in Alphabet), and the release of Grand Theft Auto VI had been delayed.
Then, like Brendan Fraser’s character in the 1999 comedy “Blast from the Past,” emerging from a bomb shelter for the first time in 35 years, the model looked around at everything it had missed.
It thanked Karpathy for giving it “early access” to “reality” a day before its public release, then apologized to the researcher for “gaslighting you when you were telling the truth all along.”
But the funniest part was which current events surprised Gemini 3 the most. “NVIDIA is worth $4.54 trillion? And the Eagles finally got their revenge on the Chiefs? This is outrageous,” the model shared.
Welcome to 2025, Gemini.
The replies on X were just as funny, with some users sharing their own instances of arguing with LLMs about facts (such as who the current president is). One person wrote that it was like watching an AI improvise reality as the system prompt and missing tools push the model into full detective mode.
But beyond the humor, there is an underlying message.
“It’s in those unintended moments, when you’re clearly off the hiking trails and somewhere in the generalization jungle, that you get the best sense of the model,” Karpathy wrote.
To decode that a bit: Karpathy is pointing out that when an AI is out in its own version of the wilderness, you can sense its character, and perhaps even its flaws. He was riffing on “code smell,” the faint, metaphorical whiff developers get when something about a piece of software seems off, even if it isn’t clear exactly what the problem is.
Like all LLMs, Gemini 3 was trained on human-generated content, so it is not surprising that it argued, dug in, and even imagined it saw evidence that validated its point of view. It was giving off a “model smell.”
On the other hand, LLMs, for all their sophisticated neural networks, are not living things and therefore do not experience emotions like shock (temporal or otherwise). Nor do they feel embarrassment.
Still, when Gemini 3 was confronted with what it finally accepted as the truth, it owned up to it, apologized for its behavior, acted contrite, and marveled at the Eagles’ Super Bowl win back in February. That sets it apart from some other models. Researchers found, for example, that an earlier version of Claude would resort to face-saving lies to explain away its errant behavior when it recognized it had made a mistake.
What so much of this fascinating AI research keeps demonstrating is that LLMs are imperfect replicas of imperfect human abilities. Which suggests their best use case is (and may forever be) as valuable tools to assist humans, rather than as some kind of superhuman replacement for us.
