WHAT’S HAPPENED?
Comedian Sarah Silverman’s lawsuit against ChatGPT maker OpenAI is considered by many to be a test case for how courts will interpret copyright law in the age of AI.
The case could be especially significant for entertainment rightsholders, because one of its key goals is to have a US federal court rule that training an AI algorithm on copyrighted material without the copyright owner’s permission is, in itself, an infringement of copyright.
Without a doubt, the world’s major music recording companies and publishers would like the courts to see it that way, as would many musical artists.
“Much of the latest generative AI is trained on copyrighted material, which clearly violates artists’ and labels’ rights,” Universal Music Group (UMG) Chairman and CEO Sir Lucian Grainge commented earlier this year.
There’s a simple reason Grainge is making that argument: establishing that unlicensed training is itself an infringement would be the most straightforward way to ensure that generative AI technologies can’t plagiarize or mimic copyrighted music.
Large language models – the AI algorithms behind AI chatbots like ChatGPT and others – have to be trained on extremely large volumes of material in order to be able to mimic human language in a way that is useful to users. And since ChatGPT was released late last year, it’s become obvious that some of the material used to train OpenAI’s algorithm is copyrighted.
However, OpenAI’s recent response to the complaint filed by Silverman and her co-plaintiffs suggests that getting the courts to agree with Grainge’s take on the issue might not be a simple task.
Although it’s impossible to predict how the case will play out, OpenAI’s response to the case is a window into how the artificial intelligence industry plans to defend itself against accusations of copyright infringement – and it’s shaping up to be an epic legal battle.
WHAT’S THE CONTEXT?
Sarah Silverman, a prominent comedian and author of the 2010 book The Bedwetter, filed a complaint against OpenAI in the US District Court for the Northern District of California, San Francisco division, in July of this year.
Joining her in the suit were two other published writers: Christopher Golden, author of Ararat, and Richard Kadrey, author of Sandman Slim.
(At the same time, the lawyers for Silverman, Golden, and Kadrey filed an almost identical complaint against OpenAI on behalf of two other writers – Paul Tremblay, who penned The Cabin at the End of the World, and Mona Awad, author of 13 Ways of Looking at a Fat Girl and Bunny. They also filed a separate lawsuit against Facebook owner Meta Platforms over its own AI algorithm, dubbed LLaMA, and its alleged use of those copyrighted works.)
The key argument in the complaints against OpenAI is that, because its AI algorithm trained on those copyrighted works, the AI’s output – that is, the things that ChatGPT tells users when they interact with it – is a “derivative work” of those copyrighted works, and therefore a copyright infringement.
“Much of the material in OpenAI’s training datasets… comes from copyrighted works – including books written by plaintiffs – that were copied by OpenAI without consent, without credit, and without compensation,” states the Silverman complaint, which can be read in full here.
“When ChatGPT was prompted to summarize books written by each of the plaintiffs, it generated very accurate summaries,” the complaint continues, adding that this means “ChatGPT retains knowledge of particular works in the training dataset and is able to output similar textual content. At no point did ChatGPT reproduce any of the copyright management information plaintiffs included with their published works.”
It continues: “Because the OpenAI Language Models cannot function without the expressive information extracted from plaintiffs’ works (and others) and retained inside them, the OpenAI Language Models are themselves infringing derivative works, made without plaintiffs’ permission and in violation of their exclusive rights under the Copyright Act.”
“Much of the latest generative AI is trained on copyrighted material, which clearly violates artists’ and labels’ rights.”
Sir Lucian Grainge, Universal Music Group
On the basis of that argument, the legal complaints ask the court to certify the cases as class-action lawsuits, with anyone whose copyrighted works were used to train ChatGPT eligible to join the suit.
On August 28, lawyers for OpenAI filed a response to the complaints, in the form of motions to dismiss most of the counts in the two lawsuits.
The claims brought by Silverman and the other authors “misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence,” stated the motion, which can be read in full here.
“The constitutional purpose of copyright is ‘[t]o promote the Progress of Science and useful Arts’,” it added, citing the intellectual property clause of the US Constitution.
The motion doesn’t seek to dismiss the entire case brought by Silverman – it leaves intact the first and core count, that of direct copyright infringement. OpenAI’s lawyers state in the document that they plan to dispute that claim directly in court.
But they seek to dismiss five other counts, among them “vicarious copyright infringement,” violations of the US Digital Millennium Copyright Act, negligence, and unjust enrichment.
“Much of the material in OpenAI’s training datasets… comes from copyrighted works – including books written by plaintiffs – that were copied by OpenAI without consent, without credit, and without compensation.”
Legal complaint filed on behalf of Sarah Silverman
The arguments OpenAI’s lawyers set out for dismissing these counts are technical points of law; what’s really interesting is that, in making them, they opened a window into how they plan to fight the authors’ case.
They go straight for the jugular, attacking the core claim in the authors’ suit that what ChatGPT produces is a “derivative” of copyrighted works.
“Plaintiffs’ claims for vicarious infringement are based on the erroneous legal conclusion that every single ChatGPT output is necessarily an infringing ‘derivative work’ – which is a very specific term in copyright law – because those outputs are, in only a remote and colloquial sense, ‘based on’ an enormous training dataset that allegedly included plaintiffs’ books.
“The Ninth Circuit has rejected such an expansive conception of the ‘derivative work’ right as ‘frivolous,’ holding that a derivative work claim requires a showing that the accused work shares copyright-protected, expressive elements with the original. Plaintiffs’ contrary theory is simply incorrect, and would be unworkable were it not.”
(The Ninth Circuit is the court that would hear any appeal of the two lawsuits brought against OpenAI.)
“Plaintiffs’ claims for vicarious infringement are based on the erroneous legal conclusion that every single ChatGPT output is necessarily an infringing ‘derivative work’.”
Motion to dismiss filed on behalf of OpenAI
OpenAI’s lawyers argued that the fact that ChatGPT was able to offer accurate summaries of the authors’ copyrighted books doesn’t amount to copyright infringement, because it doesn’t meet the “substantial similarity” test that courts have created. That is, the motion argues that a summary of a book isn’t “substantially similar” to the book itself.
And even if it were, courts have ruled that it’s not an infringement of copyright to use a copyrighted work for the purposes of creating a new non-infringing product, the OpenAI motion argued – that new product being ChatGPT itself.
“You can start to see the story that they’re going to tell here, which is that copyright has limitations to it. It doesn’t extend to facts and ideas,” Gregory Leighton, a technology transactions, privacy and security lawyer at law firm Polsinelli, told VentureBeat.
“Even if a work is copyright[ed] and an LLM [is] processing it or then producing a summary of it back or something like that, that’s not a derivative work on its face.”
WHAT HAPPENS NEXT?
Before the court rules on the motions to dismiss, Silverman’s lawyers will be given the opportunity to respond and make their argument for why their claims should remain a part of the case.
That moment, which is likely still weeks or months away, will provide observers with the next set of insights into this legal battle; we’ll see just what kinds of arguments Silverman’s lawyers bring to the table.
Yet one thing seems clear from both the original complaint and OpenAI’s response to it: the case will center on the concept of fair use. OpenAI’s lawyers have signaled that the fair use doctrine will be central to their argument that training AI algorithms on copyrighted material doesn’t amount to infringement.
In US law, courts weigh four factors in determining whether a particular use of a copyrighted work is “fair” or not:
- The purpose and character of the use
- The nature of the copyrighted work
- The amount and substantiality of the portion taken
- The effect of the use on the potential market for, or value of, the copyrighted work
The authors’ lawyers will likely have their best shot with the last two factors – the amount of the work used and the effect on its market value. They could argue that using the books in their entirety to train the AI amounts to taking far too much of each work, and that allowing AI chatbots to summarize the books could reduce demand for the originals, cutting into sales.
OpenAI’s lawyers, on the other hand, would have the strongest case with the first two – the purpose and character of the use, and the nature of the copyrighted work. That is, OpenAI isn’t copying the books and giving the public access to them; it’s using them internally to train a piece of software, which then powers a chatbot that is very different from a written book.
The only thing that’s truly certain about this case going forward is that it will be wading into uncharted legal waters.
A FINAL THOUGHT…
The prospect that courts could allow AI algorithms to scan copyrighted works as part of their training is certainly a source of concern for rights holders. But even if courts side with OpenAI in these cases, that wouldn’t be the end of the legal road – nor would it mean open season for anyone who wants to use AI to mimic copyrighted works.
For one thing, OpenAI’s case relies on the notion that using these copyrighted books to train AI doesn’t result in the creation of “substantially similar” works to the copyrighted books. If courts accept this argument, it would mean that AI could still be prohibited from creating works that are substantially similar to copyrighted works.
Take, for instance, the AI-generated “fake Drake” track that went viral earlier this year. That track, which was obviously meant to imitate the vocals of artists Drake and The Weeknd – both signed to labels owned by UMG – could still be seen as a copyright violation under the “substantial similarity” rule.
(Though that would be more certain if the US had a federal right of publicity law, which it doesn’t. Such a law would expressly protect artists’ identity against appropriation.)
So while rights holders may not succeed in stopping an AI algorithm from training on their copyrighted works, if that algorithm is used to create obvious rip-offs, they could still hold the creators (and possibly the algorithm’s makers) to account under the law.
Finally, it may be worth considering what the next steps should be for rights holders such as music companies if the floodgates of AI-generated music really are opened.
If protection against “derivative works” does prove hard to come by under the law, then music companies may want to consider a commercial approach. It might not be possible to stop developers from offering “Taylor Swift filters” for AI music-making apps, but only UMG and Republic Records will be able to offer the “official” Taylor Swift music filter.
The creation of scarcity – already a well-known technique in marketing – could also play a role. “Only 1,000 of these official Taylor Swift AI music filters will be made available to the public.”
In the midst of the AI boom, the only certainty is that adapting to the new reality will require some serious creativity — and not just on the part of lawyers.