Machine Learning, Technology Adoption

Productizing AI is Hard: Part 94

This is a story about how a crude internet joke made its way from Reddit, to ChatGPT, to the top of Google.

Here is what you currently get if you ask Google “Are there any African countries starting with K”:

The featured snippet is obviously nonsense. Once you get past the featured snippet the results are sensible, but the whole point of featured snippets, in Google’s own words, is: “We display featured snippets when our systems determine this format will help people more easily discover what they’re seeking, both from the description about the page and when they click on the link to read the page itself. They’re especially helpful for those on mobile or searching by voice.”

So you’d be forgiven for thinking that Google has quite a high bar for what it puts into a featured snippet.

With my debugging hat on, my first hypothesis is that Google is interpreting this search query as the first line in a joke rather than as a genuine question. If that’s the case then this featured snippet is a great one to show because it builds on the joke.

But. Even if that explains the logic behind showing the snippet, it doesn’t mean that this is the best snippet to show. I’d still consider this a bug if it were in one of my systems. At the very least there should be some context to say: “if this is the first line in a joke, then here is the expected response”.

How did this joke get into the featured snippet?

Here’s the page that the Google featured snippet links to. It’s a web page showing a purported chat in which ChatGPT agrees that there are no African countries starting with K. It’s from emergentmind.com, a website that includes lots of content about ChatGPT.

I don’t know whether this is a genuine example of ChatGPT producing text that looks grammatically correct but is actually nonsense, or whether it’s a spoof that was added to emergentmind.com as a joke. But there is definitely a lot of this “African countries starting with K” content on Reddit, and we know that Reddit was used to train ChatGPT. So it’s very plausible that ChatGPT picked up this “knowledge” but, being a language model, can’t tell whether it’s real, fake or just a joke.

Either way, the fact that this is presented as ChatGPT text on emergentmind.com helps give it enough weight to get into a featured snippet.

One obvious lesson: don’t trust featured snippets on Google. Only last month I wrote about another featured snippet that got things wrong, that time about the terms of use for LinkedIn. Use DuckDuckGo if you just want a solid search engine that finds relevant pages, no more, no less.

But this example offers some interesting food for thought…

Food for thought for people working with LLMs:

  1. If you are training your model on “the entire internet”[1] then you will get lots of garbage in there
  2. As more and more content gets created by large language models, the garbage problem will only get worse

And food for thought for people trying to build products with LLMs:

  1. Creating a demo of something that looks good using LLMs is super easy, but turning it into a user-facing product that can handle all these garbage cases remains hard. Not impossible, but still hard work.
  2. So how do you design your product to maximize the benefits from LLMs while minimizing the downside risk when your LLM gets things wrong?[2]

I’ve written in the past about the hype cycle related to NLP. That was 4 months ago, in April. Back then I was uncomfortable that people were hyping LLMs out of all proportion to their capabilities. Now it seems we are heading towards the trough of disillusionment, with people blowing the negative aspects out of all proportion. The good news is that if it’s taken less than 6 months to get from the peak of “Large Language Models are showing signs of sentience and they’re going to take your job” to the trough of “ChatGPT keeps getting things wrong and OpenAI is about to go under”[3], then the plateau of productivity must beckon. I think it’s pretty close (months away, not years).

Hat tip to https://mastodon.online/@rodhilton@mastodon.social/110894818521176741 for the context. (Warning before you click – the joke is pretty crude and it’s arguable how funny it is).

Notes

[1] For whatever definition you have for “entire” and “internet”

[2] I saw a Yann LeCun quote (can’t find it now, sadly, so perhaps it’s apocryphal) about one company using 30 LLMs to cross-check results and decrease the risk of any one of them hallucinating. I’m sure this brute-force approach can work, but there will also be smarter ways, depending on the use case.
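The core of that brute-force idea is simple to sketch. Here’s a minimal, illustrative version in Python – the stand-in “models” and the naive string normalization are placeholders; a real system would need semantic comparison of answers rather than exact string matching:

from collections import Counter
from typing import Callable, List, Optional

def normalize(text: str) -> str:
    # Naive normalization – two LLMs rarely phrase the same answer identically,
    # so a real system would compare answers semantically
    return text.strip().lower().rstrip(".")

def cross_check(prompt: str, models: List[Callable[[str], str]],
                quorum: float = 0.5) -> Optional[str]:
    # Ask every model the same question and keep the majority answer.
    # Returns None when no answer reaches the quorum, i.e. the models
    # disagree too much to trust any single response.
    answers = [normalize(m(prompt)) for m in models]
    best, votes = Counter(answers).most_common(1)[0]
    return best if votes / len(models) >= quorum else None

# Stand-in "models" so the example runs without any API calls
models = [
    lambda q: "Kenya.",
    lambda q: "kenya",
    lambda q: "There are no African countries starting with K.",
]
print(cross_check("Name an African country starting with K", models))  # -> kenya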

[3] Whether OpenAI succeeds or fails as a company has very little to do with the long-term productivity gains from LLMs, in much the same way that Friendster’s demise didn’t spell the end of social networking platforms

Anthropology

Knowing Your Onions: Adventures in cross-language misunderstandings

I recently bought a recipe book in Greek. It was the result of spending months looking for a great Kleftiko recipe – eventually I found one that worked with a few small modifications. Anyway, for anyone who wants the recipe, this is the one from Akis Petretzikis – but I swap the leg of lamb for shoulder. I wonder if lamb cuts vary from country to country – are they leaner in the UK than in Greece?

I made another recipe and got one of the ingredients a bit wrong. Here’s why.

‘κρεμμύδι’ (kremidhi) is “onion”. The ending ‘-άκι’ (aki) means “little”, so ‘κρεμμυδάκι’ would be a small onion. ‘φρέσκο’ (fresko) means “fresh”, so ‘φρέσκο κρεμμυδάκι’ should mean “little fresh onion”. Except, it turns out, it means “spring onion”. No wonder my meal looked a bit different from the picture in the recipe book. I knew the words individually, but not when they were put together.

It made me wonder what it must be like to work in an English-speaking organization when English isn’t your first language (or even if it is). Even if you understand the individual words, there must be so much scope for misunderstanding once they are put together.

Here is the Google Translate evidence for your delectation:

Software Development

Rip it up and start again, without ripping it up and starting again

Time for another update on my side project, Syracuse: http://syracuse-1145.herokuapp.com/

I got some feedback that the structure of the relationships was a bit unintuitive. Fair enough, let’s update it to make more sense.

Previously the code did the following:

  1. Use a RoBERTa-based LLM for entity extraction (the code is pretty old but works well)
  2. Use Benepar constituency parsing to link up relevant entities with each other
  3. Use FlanT5 LLM to dig out some more useful content from the text
  4. Build up an RDF representation of the data
  5. Clean up the RDF

Step 5 had become quite complex.
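For the curious, here is a minimal sketch of the shape of that pipeline, assuming the standard Hugging Face, spaCy/benepar and rdflib APIs. The model names and RDF properties are illustrative – this is not the actual Syracuse code:

import spacy
import benepar  # importing registers the "benepar" spaCy component
from transformers import pipeline
from rdflib import Graph, Literal, Namespace, RDF, URIRef

NS = Namespace("http://example.org/test/")

# Step 1: entity extraction with a RoBERTa-based NER model (illustrative choice)
ner = pipeline("ner", model="Jean-Baptiste/roberta-large-ner-english",
               aggregation_strategy="simple")

# Step 2: Benepar plugs into a spaCy pipeline; its parse tree is what
# decides which entities relate to which
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("benepar", config={"model": "benepar_en3"})

# Step 3: FlanT5 digs extra content out of the text
flan = pipeline("text2text-generation", model="google/flan-t5-base")

def process(text: str) -> Graph:
    g = Graph()
    for ent in ner(text):  # step 1
        uri = URIRef(NS + ent["word"].replace(" ", "_"))
        g.add((uri, RDF.type, NS.Organization))
        g.add((uri, NS.foundName, Literal(ent["word"])))
    doc = nlp(text)  # step 2 (the entity-linking logic is omitted here)
    status = flan("What is the status of the deal described here? "
                  + text)[0]["generated_text"]  # step 3
    activity = NS.Activity
    g.add((activity, RDF.type, NS.CorporateFinanceActivity))
    g.add((activity, NS.status, Literal(status)))
    # step 4 was building the RDF above; step 5 (clean-up) is where
    # all the complexity used to accumulate
    return g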

Also, I had a look at importing and exporting RDF in graph databases – specifically Neo4j – but couldn’t see much excitement about RDF. I even made a PR to update some of the Neo4j / RDF documentation. It’s been stuck for 2+ months.

I wondered if a better approach would be to start again with a different set of technologies. Specifically:

  1. Falcon7B instead of FlanT5
  2. Just building the representation in a graph rather than using RDF

It was exciting to get the chance to try out Falcon7B. But in my use case it wasn’t any more useful than FlanT5.
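One reason it was cheap to find that out: with the Hugging Face pipeline API, the swap is close to a one-line change. A minimal sketch (real model names, illustrative prompt; Falcon is a decoder-only model, so the task name changes too):

from transformers import pipeline

prompt = "Extract the acquiring company: Core Scientific completes acquisition of Stax Digital."

# FlanT5 is a seq2seq model...
flan = pipeline("text2text-generation", model="google/flan-t5-base")
print(flan(prompt)[0]["generated_text"])

# ...whereas Falcon-7B is a causal text-generation model
falcon = pipeline("text-generation", model="tiiuae/falcon-7b",
                  trust_remote_code=True)
print(falcon(prompt, max_new_tokens=20)[0]["generated_text"])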

Going down the graph route was fun for a while. I’ve used networkx quite a bit in the past, so I thought I’d try that first. But, guess what, it turned out to be more complicated than I needed. Also, I do like the simplicity and elegance of RDF, even if it makes me seem a bit old.
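To give a flavour of the comparison, here is the acquisition from the RDF further down, sketched as a networkx property graph (node and attribute names are illustrative, chosen to mirror the RDF):

import networkx as nx

# MultiDiGraph: directed, and two nodes can be linked by several edges
G = nx.MultiDiGraph()
G.add_node("Core_Scientific", type="Organization", name="Core Scientific")
G.add_node("Stax_Digital", type="Organization", name="Stax Digital",
           industry="blockchain mining")
G.add_node("Stax_Digital_Assets_Acquisition",
           type="CorporateFinanceActivity", status="completed")
G.add_edge("Core_Scientific", "Stax_Digital_Assets_Acquisition", key="buyer")
G.add_edge("Stax_Digital_Assets_Acquisition", "Stax_Digital",
           key="targetEntity")

# Easy to query...
print([n for n, d in G.nodes(data=True) if d.get("type") == "Organization"])
# ...but namespaces, serialization and merging graphs are all left to you,
# which is exactly what RDF gives for free.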

So the final choice was to rip up all my post-processing, turn it into pre-processing, and then generate the RDF. It was heartbreaking to throw away a lot of code but, as programmers, I think we know when we’ve built something that is just too brittle and needs some heavy refactoring. It worked well in the end – see the git stats below:

  • code: 6 files changed, 449 insertions, 729 deletions
  • tests: 81 files changed, 3618 insertions, 1734 deletions

Yes, a lot of tests. It’s a data-heavy application so there are a lot of tests to make sure that data is transformed as expected. Whenever it doesn’t work, I add that data (or enough of it) as a test case and then fix it. Most of this test data was just changed with global find/replace so it’s not a big overhead to maintain. But having all those tests was crucial for doing any meaningful refactoring.
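In case it’s useful, the pattern is roughly this (illustrative data and a stubbed-out entry point – not the actual Syracuse test suite):

import pytest

def extract_activity(text: str) -> dict:
    # Stand-in for the real pipeline entry point, just to keep the sketch runnable
    return {
        "activityType": "acquisition",
        "status": "completed" if "completes" in text else "announced",
        "targetDetails": "assets" if "assets" in text else None,
    }

# Each real-world snippet that broke the pipeline becomes a regression case
CASES = [
    ("Core Scientific completes acquisition of Stax Digital.",
     {"activityType": "acquisition", "status": "completed"}),
    ("Acme Corp has acquired assets of Widgets LLC.",
     {"activityType": "acquisition", "targetDetails": "assets"}),
]

@pytest.mark.parametrize("text,expected", CASES)
def test_extraction(text, expected):
    result = extract_activity(text)
    for key, value in expected.items():
        assert result[key] == value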

On the code side, it was very satisfying to remove more code than I was adding. It just showed how brittle and convoluted the codebase had become: as I discovered more edge cases I added more logic to deal with them, and eventually this ended up as lots of complexity. The new code is “cleaner”. I put clean in quotes because there is still a lot of copy/paste in there and similar functions doing similar things. That’s because I like to follow “Make it work, make it right, make it fast”. Code that works but isn’t super elegant is going to be easier to maintain, fix and refactor later than code that is super-abstracted.

Some observations on the above:

  1. Tests are your friend (obviously)
  2. Expect to need major refactoring in the future. However well you capture the requirements now, there will be plenty that haven’t been captured yet, and plenty of need for change
  3. Shiny new toys aren’t always going to help – approach with caution
  4. Sometimes the simple old-fashioned technologies are just fine
  5. However bad you think an app is, there is probably still 80% in there that is good, so beware of starting completely from scratch.

See below for the RDF as it stands now compared to before:

Current version

@prefix ns1: <http://example.org/test/> .
@prefix org: <http://www.w3.org/ns/org#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/test/abc/Core_Scientific> a org:Organization ;
    ns1:basedInRawLow "USA" ;
    ns1:buyer <http://example.org/test/abc/Stax_Digital_Assets_Acquisition> ;
    ns1:description "Artificial Intelligence and Blockchain technologies" ;
    ns1:foundName "Core Scientific" ;
    ns1:industry "Artificial Intelligence and Blockchain technologies" ;
    ns1:name "Core Scientific" .

<http://example.org/test/abc/Stax_Digital> a org:Organization ;
    ns1:basedInRawLow "LLC" ;
    ns1:description "blockchain mining" ;
    ns1:foundName "Stax Digital, LLC",
        "Stax Digital." ;
    ns1:industry "blockchain mining" ;
    ns1:name "Stax Digital" .

<http://example.org/test/abc/Stax_Digital_Assets_Acquisition> a ns1:CorporateFinanceActivity ;
    ns1:activityType "acquisition" ;
    ns1:documentDate "2022-11-28T05:06:07.000008"^^xsd:dateTime ;
    ns1:documentExtract "Core Scientific (www.corescientific.com) has acquired assets of Stax Digital, LLC, a specialist blockchain mining company with extensive product development experience and a strong track record of developing enterprise mining solutions for GPUs.",
        "Core Scientific completes acquisition of Stax Digital." ;
    ns1:foundName "acquired",
        "acquisition" ;
    ns1:name "acquired",
        "acquisition" ;
    ns1:status "completed" ;
    ns1:targetDetails "assets" ;
    ns1:targetEntity <http://example.org/test/abc/Stax_Digital> ;
    ns1:targetName "Stax Digital" ;
    ns1:whereRaw "llc" .

Previous version

@prefix ns1: <http://example.org/test/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/test/abc/Core_Scientific> a <http://www.w3.org/ns/org#Organization> ;
    ns1:basedInLow "USA" ;
    ns1:description "Artificial Intelligence and Blockchain technologies" ;
    ns1:foundName "Core Scientific" ;
    ns1:industry "Artificial Intelligence and Blockchain technologies" ;
    ns1:name "Core Scientific" ;
    ns1:spender <http://example.org/test/abc/Purchase_Stax_Digital> .

<http://example.org/test/abc/Acquired_Assets_Stax_Digital_Llc> a ns1:TargetDetails ;
    ns1:label "Assets" ;
    ns1:name "Acquired Assets Stax Digital, LLC" ;
    ns1:nextEntity "Stax Digital, LLC" ;
    ns1:previousEntity "acquired" ;
    ns1:targetEntity <http://example.org/test/abc/Stax_Digital> .

<http://example.org/test/abc/Purchase_Stax_Digital> a ns1:Activity ;
    ns1:activityType "Purchase" ;
    ns1:documentDate "2022-11-28T05:06:07.000008"^^xsd:dateTime ;
    ns1:documentExtract "Core Scientific (www.corescientific.com) has acquired assets of Stax Digital, LLC, a specialist blockchain mining company with extensive product development experience and a strong track record of developing enterprise mining solutions for GPUs.",
        "Core Scientific completes acquisition of Stax Digital." ;
    ns1:label "Acquired",
        "Acquisition" ;
    ns1:name "Purchase Stax Digital" ;
    ns1:targetDetails <http://example.org/test/abc/Acquired_Assets_Stax_Digital_Llc> ;
    ns1:whenRaw "has happened, no date available" .

<http://example.org/test/abc/Stax_Digital> a <http://www.w3.org/ns/org#Organization> ;
    ns1:basedInLow "LLC" ;
    ns1:description "blockchain mining" ;
    ns1:foundName "Stax Digital, LLC",
        "Stax Digital." ;
    ns1:industry "blockchain mining" ;
    ns1:name "Stax Digital" .

DIFF
1a2
> @prefix org: <http://www.w3.org/ns/org#> .
4,5c5,7
< <http://example.org/test/abc/Core_Scientific> a <http://www.w3.org/ns/org#Organization> ;
<     ns1:basedInLow "USA" ;
---
> <http://example.org/test/abc/Core_Scientific> a org:Organization ;
>     ns1:basedInRawLow "USA" ;
>     ns1:buyer <http://example.org/test/abc/Stax_Digital_Assets_Acquisition> ;
9,10c11
<     ns1:name "Core Scientific" ;
<     ns1:spender <http://example.org/test/abc/Purchase_Stax_Digital> .
---
>     ns1:name "Core Scientific" .
12,31c13,14
< <http://example.org/test/abc/Acquired_Assets_Stax_Digital_Llc> a ns1:TargetDetails ;
<     ns1:label "Assets" ;
<     ns1:name "Acquired Assets Stax Digital, LLC" ;
<     ns1:nextEntity "Stax Digital, LLC" ;
<     ns1:previousEntity "acquired" ;
<     ns1:targetEntity <http://example.org/test/abc/Stax_Digital> .
< 
< <http://example.org/test/abc/Purchase_Stax_Digital> a ns1:Activity ;
<     ns1:activityType "Purchase" ;
<     ns1:documentDate "2022-11-28T05:06:07.000008"^^xsd:dateTime ;
<     ns1:documentExtract "Core Scientific (www.corescientific.com) has acquired assets of Stax Digital, LLC, a specialist blockchain mining company with extensive product development experience and a strong track record of developing enterprise mining solutions for GPUs.",
<         "Core Scientific completes acquisition of Stax Digital." ;
<     ns1:label "Acquired",
<         "Acquisition" ;
<     ns1:name "Purchase Stax Digital" ;
<     ns1:targetDetails <http://example.org/test/abc/Acquired_Assets_Stax_Digital_Llc> ;
<     ns1:whenRaw "has happened, no date available" .
< 
< <http://example.org/test/abc/Stax_Digital> a <http://www.w3.org/ns/org#Organization> ;
<     ns1:basedInLow "LLC" ;
---
> <http://example.org/test/abc/Stax_Digital> a org:Organization ;
>     ns1:basedInRawLow "LLC" ;
36a20,34
> 
> <http://example.org/test/abc/Stax_Digital_Assets_Acquisition> a ns1:CorporateFinanceActivity ;
>     ns1:activityType "acquisition" ;
>     ns1:documentDate "2022-11-28T05:06:07.000008"^^xsd:dateTime ;
>     ns1:documentExtract "Core Scientific (www.corescientific.com) has acquired assets of Stax Digital, LLC, a specialist blockchain mining company with extensive product development experience and a strong track record of developing enterprise mining solutions for GPUs.",
>         "Core Scientific completes acquisition of Stax Digital." ;
>     ns1:foundName "acquired",
>         "acquisition" ;
>     ns1:name "acquired",
>         "acquisition" ;
>     ns1:status "completed" ;
>     ns1:targetDetails "assets" ;
>     ns1:targetEntity <http://example.org/test/abc/Stax_Digital> ;
>     ns1:targetName "Stax Digital" ;
>     ns1:whereRaw "llc" .