Intellectual Property

IP infringement in the world of GenAI

Gary Marcus has a compelling piece on his Substack about what looks like IP infringement. It shows how an apparently trivial prompt can make MidJourney create pictures that look startlingly similar to stills from films.

Assume, for the sake of argument, that MidJourney was trained on these specific films, and that this counts as IP infringement. These seem like reasonable assumptions to me, but obviously I am not a lawyer.

Anyway, assuming the above, who is liable?

[a] The people who train the model on content from films

[b] The person who inputs the prompt and then uses the resulting image as their own

There is a well-trodden debate about whether sites like Facebook or X are liable for the content on them. The consensus seems to be that the company is not liable for what people write on it.

Could that argument apply here, such that MidJourney is not held responsible for what human beings do with the stuff they create with it?

This is a thorny issue – a European Commission article from Feb 2023 has this to say:

the question of ownership and authorship of AI-generated works is not fully settled by the law yet, and as a “hot topic” may evolve in the years to come depending on regulatory changes and on case law. 

https://intellectual-property-helpdesk.ec.europa.eu/news-events/news/intellectual-property-chatgpt-2023-02-20_en

In my view there is a qualitative difference between running a GenAI model and hosting a social network. In a social network or other website, it is really easy to differentiate between the “code” (that the company writes) and the “data” (which comes from the users). With GenAI models, the data is interwoven into the model itself. Whoever trained the AI model consciously decided what content to train it on.

My first instinct, then, is that the model developers should be responsible for any IP infringement, because they are the ones who chose how to incorporate data into the model. My second instinct is to wonder whether taking this position causes more problems than it solves:

[a] in a world where people use open source models, and potentially train new models on top of open source models, does it make sense to hold the original model creator accountable?

[b] If you’ve got a chain of model developers, each one taking an existing model and training it further, then how could you prove where the IP infringement took place? It is conceivable that the original source data for a model is not even accessible in any useful way any more.

[c] If you end up saying that the model developer is infringing IP when its model outputs something that looks similar to someone else’s IP, then does this have implications for the IP ownership of anything else generated by the model? If a model developer can breach copyright when its model plagiarises something, then does it own the copyright to anything the model creates that is not plagiarism?

Technology Adoption

Where is the GenAI disruption going to happen?

I recently heard Saad Ansari (former Director of AI at Jasper) speaking about how he sees forthcoming evolution in the GenAI space. It was really thought provoking so here are some notes of mine from the talk.

He sees four key use cases for GenAI:

  1. Co-piloting. GitHub users are already familiar with an AI tool called Copilot. More generally, you can think of ChatGPT or similar as a co-pilot that is there to help you do your tasks, whether that is by drafting an email or a job spec for you, or by helping you learn a new topic or prepare for a meeting.
  2. Personalization
  3. Bringing everyone the “power of Pixar”
  4. Robotics (both virtual agents and physical robots)

I’m going to go into a bit more detail on the personalization piece.

Go back far enough and the internet was all about search[1]. You go to Google and get “about 8,400,000,000 results (0.35 seconds)”. Then you scan page 1 and possibly 2 to see if there’s anything relevant.

Then, over time, things became more personalised for the user. One high-profile example was the Netflix Prize, awarded in 2009. This was a competition with a $1m prize to use machine learning to improve Netflix’s recommendation algorithm (“if you liked show X then probably you will like shows Y and Z”). At the time this ML work was pretty groundbreaking.
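The core idea behind that kind of recommendation algorithm can be illustrated with a toy item-to-item similarity sketch. To be clear, this is just the general technique, not Netflix’s actual prize-winning system, and the ratings matrix is invented data:

```python
# Toy "if you liked show X, try show Y" recommender.
# Illustrates item-to-item similarity in general, NOT Netflix's
# actual algorithm; the ratings below are made-up example data.
import math

# rows = users, columns = shows A..D; 0 means "not rated"
ratings = [
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
]
shows = ["A", "B", "C", "D"]

def column(j):
    # all users' ratings for show j
    return [row[j] for row in ratings]

def cosine(u, v):
    # cosine similarity between two rating vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(show):
    # the show whose rating pattern most resembles this one's
    j = shows.index(show)
    scores = [(shows[k], cosine(column(j), column(k)))
              for k in range(len(shows)) if k != j]
    return max(scores, key=lambda s: s[1])[0]

print(most_similar("A"))  # prints "B": users who rated A highly also rated B highly
```

With this toy data, shows A and B are liked by the same users, so each recommends the other.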

Now with GenAI we are in a new world again. In this world, new things can be created to the user’s taste. Saad used the words “synthesis” and “remixing” to describe this. The GenAI models have seen enormous amounts of text, images, audio etc. in their training, which they can use to synthesise new things. They are like a music producer doing a remix. From their training data they can make something that is just what the user is interested in, that has never fully existed before, but that is similar to what they were trained on.

What does this sea change in personalization mean for future disruption?

From this perspective, Saad believes, someone like Adobe or TurboTax is safe. It’s easier for them to enhance their products with GenAI than it is for a new GenAI entrant to add the core features that companies like this have.

On the other hand, someone like Amazon might not be safe. A more personalized shopping service could well disrupt them. Imagine a service like:

  1. You upload some photos of your family
  2. Based on the photos an AI figures out your interests
  3. It gives you some ideas of local activities to do nearby
  4. And gives you some links to things you might want to buy

Be honest, it sounds pretty realistic, doesn’t it?
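As a thought experiment, the four steps of that hypothetical service could be wired together like this. Every function name here is invented, and each body is a stub standing in for what would really be a model or API call:

```python
# Sketch of the hypothetical personalized-shopping pipeline above.
# All names and data are invented placeholders, not a real service.

def infer_interests(photos):
    # step 2: in a real system, a vision model would analyse the photos
    return ["hiking", "board games"]

def suggest_activities(interests, location):
    # step 3: would query a local-events source
    return [f"{i} meetup near {location}" for i in interests]

def suggest_products(interests):
    # step 4: would search a retailer's catalogue; toy lookup here
    catalog = {"hiking": "trail shoes", "board games": "strategy game"}
    return [catalog[i] for i in interests if i in catalog]

def personal_shopper(photos, location):
    # step 1 is the user uploading photos; the rest chains together
    interests = infer_interests(photos)
    return {
        "activities": suggest_activities(interests, location),
        "products": suggest_products(interests),
    }

print(personal_shopper(["family.jpg"], "your-town"))
```

The point of the sketch is how short the chain is: once interests are inferred, activities and products are just two more lookups, which is why the idea feels so plausible.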

Notes

  1. Or you could go back a bit further, to the dark days of domain dipping, but it’s the same principle ↩︎