This is a story about how a crude internet joke got from Reddit to ChatGPT to the top of Google.
Here is what you currently get if you ask Google “Are there any African countries starting with K”:
The featured snippet is obviously nonsense. Once you get past the featured snippet, the content is sensible, but here is how Google itself explains the point of featured snippets: “We display featured snippets when our systems determine this format will help people more easily discover what they’re seeking, both from the description about the page and when they click on the link to read the page itself. They’re especially helpful for those on mobile or searching by voice.”
So you’d be forgiven for thinking that Google has quite a high bar for what it puts into a featured snippet.
With my debugging hat on, my first hypothesis is that Google is interpreting this search query as the first line in a joke rather than as a genuine question. If that’s the case then this featured snippet is a great one to show because it builds on the joke.
But. Even if that explains the logic behind showing the snippet, it doesn’t mean that this is the best snippet to show. I’d still consider this a bug if it were in one of my systems. At the very least there should be some context to say: “if this is the first line in a joke, then here is the expected response”.
How did this joke get into the featured snippet?
Here’s the page that the Google featured snippet links to. It’s a page on emergentmind.com, a website with lots of content about ChatGPT, showing a purported chat in which ChatGPT agrees that there are no African countries starting with K.
I don’t know whether this is a genuine example of ChatGPT producing text that looks grammatically correct but is actually nonsense, or whether it’s a spoof that was added to emergentmind.com as a joke. But there is definitely a lot of this “African countries starting with k” content on Reddit, and we know that Reddit was used to train ChatGPT. So it’s very plausible that ChatGPT picked up this “knowledge” but, being a language model, can’t tell whether it’s real, fake, or just a joke.
Either way, the fact that this is presented as ChatGPT text on emergentmind.com helps give it enough weight to get into a featured snippet.
One obvious lesson is don’t trust featured snippets on Google. Only last month I wrote about another featured snippet that got things wrong, this time about terms of use for LinkedIn. Use DuckDuckGo if you just want a solid search engine that finds relevant pages, no more, no less.
But this example offers some interesting food for thought…
Food for thought for people working with LLMs:
- If you are training your model on “the entire internet”[1] then you will get lots of garbage in there
- As more and more content gets created by large language models, the garbage problem will only get worse
And food for thought for people trying to build products with LLMs:
- Creating a good-looking demo with LLMs is super easy, but turning it into a user-facing product that can handle all these garbage cases remains hard. Not impossible, but still hard work.
- So how do you design your product to maximize the benefits from LLMs while minimizing the downside risk when your LLM gets things wrong?[2]
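One low-tech answer to that last question is to cross-check an LLM's factual claims against a deterministic source before they reach the user. The sketch below is purely illustrative: the function names, the crude string match, and the deliberately truncated country list are my own assumptions, not a description of any real product.

```python
# Sketch only: the function names and the (truncated) country list are
# illustrative assumptions, not taken from any real product.

AFRICAN_COUNTRIES = ["Kenya", "Egypt", "Nigeria", "Ghana", "Morocco"]  # truncated

def countries_starting_with(letter: str) -> list[str]:
    """Deterministic ground truth: filter a curated list."""
    return [c for c in AFRICAN_COUNTRIES if c.upper().startswith(letter.upper())]

def guard_llm_answer(llm_answer: str, letter: str) -> str:
    """Fall back to curated data when the LLM's claim contradicts it."""
    truth = countries_starting_with(letter)
    # Crude contradiction check: the model claims there are none,
    # but the curated list disagrees.
    if truth and "no african countries" in llm_answer.lower():
        return f"African countries starting with {letter!r}: {', '.join(truth)}"
    return llm_answer

print(guard_llm_answer(
    "There are no African countries that start with K.", "K"))
# → African countries starting with 'K': Kenya
```

The point is the shape of the design, not the string matching: wherever ground truth is cheap to compute, the LLM's output can be demoted from “answer” to “draft that must survive a check”.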
I’ve written in the past about the hype cycle related to NLP. That was 4 months ago in April. Back then I was uncomfortable that people were hyping LLMs out of all proportion to their capabilities. Now it seems that we are heading towards the trough of disillusionment, with people blowing the negative aspects out of all proportion. The good news is that, if it’s taken less than 6 months to get from the peak of “Large Language Models are showing signs of sentience and they’re going to take your job” to the trough of “ChatGPT keeps getting things wrong and OpenAI is about to go under”[3], then this must mean that the plateau of productivity beckons. I think it’s pretty close (months rather than years).
Hat tip to https://mastodon.online/@rodhilton@mastodon.social/110894818521176741 for the context. (Warning before you click: the joke is pretty crude and it’s arguable how funny it is.)
Notes
[1] For whatever definitions you have of “entire” and “internet”
[2] I saw a Yann LeCun quote (can’t find it now, sadly, so perhaps it’s apocryphal) about one company using 30 LLMs to cross-check results and decrease the risk of any one of them hallucinating. I’m sure this brute-force approach can work, but there will also be smarter ways, depending on the use case
[3] Whether OpenAI succeeds or fails as a company has very little to do with the long-term productivity gains from LLMs, in much the same way that Friendster’s demise didn’t spell the end of social networking platforms