Compelling piece from Gary Marcus on his Substack about what looks like IP infringement. It shows how an apparently trivial prompt can make MidJourney create pictures that look startlingly similar to stills from films.
Assume, for the sake of argument, that MidJourney was trained on these specific films, and that this counts as IP infringement. These seem like reasonable assumptions to me, but obviously I am not a lawyer.
Anyway, assuming the above, then who is liable?
[a] The people who train the model on content from films
[b] The person who inputs the prompt and then uses the resulting image as their own
There is a well-trodden debate about whether sites like Facebook or X are liable for the content on them. The consensus seems to be that the company is not liable for what people write on it.
Could that argument apply here, such that MidJourney is not held responsible for what human beings do with the stuff they create with it?
This is a thorny issue – a European Commission article from Feb 2023 has this to say:
the question of ownership and authorship of AI-generated works is not fully settled by the law yet, and as a “hot topic” may evolve in the years to come depending on regulatory changes and on case law.
https://intellectual-property-helpdesk.ec.europa.eu/news-events/news/intellectual-property-chatgpt-2023-02-20_en
In my view there is a qualitative difference between running a GenAI model and hosting a social network. In a social network or other website, it is really easy to differentiate between the “code” (that the company writes) and the “data” (which comes from the users). With GenAI models, the data is interwoven into the model itself. Whoever trained the AI model consciously decided what content to train it on.
My first instinct, then, is that the model developers should be responsible for any IP infringement, because they are the ones who chose how to incorporate data into the model. My second instinct is to wonder whether taking this position would cause more problems than it solves:
[a] In a world where people use open-source models, and potentially train new models on top of open-source models, does it make sense to hold the original model creator accountable?
[b] If you’ve got a chain of model developers, each one taking an existing model and training it further, then how could you prove where the IP infringement took place? It is conceivable that the original source data for a model is not even accessible in any useful way any more.
[c] If you end up saying that the model developer is infringing IP when its model outputs something that looks similar to someone else’s IP, then does this have implications for the IP ownership of anything else generated by the model? If a model developer can breach copyright by plagiarising something, then does it own the copyright to anything it creates that is not plagiarism?