Blatant Plagiarism? 5 key takeaways from Universal’s lyrics lawsuit against AI unicorn Anthropic

MBW Reacts is a series of analytical commentaries from Music Business Worldwide written in response to major recent entertainment events or news stories. MBW Reacts is supported by JKBX, a technology platform that offers consumers access to music royalties as an asset class.

A copyright infringement lawsuit filed last week by three prominent music publishers against AI company Anthropic could prove to be a decisive moment in establishing the legal boundaries of artificial intelligence.

The case revolves around the lyrics generated by Anthropic’s Claude chatbot. In a complaint filed with the US District Court for the Middle District of Tennessee last Wednesday (October 18), Universal Music Publishing Group, Concord Music and ABKCO Music alleged that the Claude chatbot “unlawfully copies and disseminates vast amounts of copyrighted works – including the lyrics to myriad musical compositions owned or controlled by [plaintiffs].”

That’s the what of the case, but it’s the how that really surprises. The music publishers’ complaint paints a picture of an algorithm that seems to willfully ignore the copyrights on lyrics while flagrantly plagiarizing them.

Anthropic’s Claude chatbot is no small also-ran in the world of AI. The company recently announced a $4 billion investment from Amazon, which saw the online retail giant take an ownership stake in Anthropic and work towards greater integration between Amazon’s AWS web hosting service and Anthropic’s technology.

The two-year-old company, launched by former employees of ChatGPT maker OpenAI, was also the recipient of a $500 million investment by Sam Bankman-Fried, the tech entrepreneur currently on trial for fraud in New York in relation to the collapse of the FTX crypto platform. It’s also received a $300 million investment from Google, as well as investments from Zoom, Salesforce and others. The music publishers’ complaint estimates the company’s value at $5 billion, making it a quintuple unicorn among startups.

Perhaps it’s this meteoric rise to stardom by a company founded just two years ago that has got the attention of the music industry. Filing a lawsuit against this Silicon Valley darling might be a shot across the bow, an unofficial warning to AI developers everywhere that the music industry means business when it comes to protecting its intellectual property against the hungry, all-devouring digital mouths of AI algorithms.

Complaints about AI infringing on copyrighted works are hardly new at this point; copyright holders have been raising the alarm about it since well before ChatGPT broke onto the scene last year and made AI the hottest thing in tech. Book authors have launched at least two court cases against ChatGPT maker OpenAI, alleging that the tech company trained its large language models (LLMs) on copyrighted books.

However, the lawsuit against Anthropic goes a fair bit further; not only does it allege that the company used copyrighted songs to train the Claude chatbot, it also alleges that Claude blatantly rips off copyrighted lyrics, passing them off as original works, and that it engages in plagiarism even when not prompted to imitate an existing work or style.

MBW delved into the details of this potentially significant lawsuit. Here are five notable takeaways we found.


1: The suit could cost Anthropic $75 million or possibly much more…

Appended to the complaint is a list of 500 songs, owned by Universal, Concord and ABKCO, that the plaintiffs say have been infringed upon by Anthropic’s AI.

Among the songs named in the lawsuit are Steppenwolf’s Born To Be Wild (owned by UMG), Louis Armstrong’s What A Wonderful World (owned by Concord) and the Rolling Stones’ You Can’t Always Get What You Want (owned by ABKCO).

The suit seeks the “maximum provided by law” of up to $150,000 per work infringed. All told, if the court rules for the maximum damages, the 500 allegedly infringed songs would cost Anthropic $75 million.

And that’s not counting the court costs and attorneys’ fees, plus interest, for which the music publishers would like to be reimbursed.

However, that’s the easy part when it comes to calculating the combined damages – because the publishers are also seeking damages for the alleged “removal and/or alteration of Publishers’ copyright management information” from the original works. For this, the suit seeks “an amount up to $25,000 per violation”.

Calculating the total damages on this count is complicated, because the “Prayer for Relief” section of the suit does not make clear how many violations are being alleged.

For example: Is a violation in this instance defined as each time one of the 500 works was allegedly infringed? And if so, how many times has this happened? Are there multiple ‘violations’ per song?

At ‘$25,000 per violation’, it could add up to a significant sum demanded by the publishers if the number of ‘violations’ runs into the thousands or more.
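
To make the arithmetic concrete, here is a rough sketch of how the maximum exposure scales. The per-work and per-violation ceilings and the 500-song count come from the complaint as described above; the violation counts in the loop are purely hypothetical, since the suit does not say how many violations it alleges.

```python
# Figures taken from the complaint as reported above.
WORKS_LISTED = 500                 # songs appended to the complaint
MAX_PER_WORK = 150_000             # "maximum provided by law" per work infringed
MAX_PER_CMI_VIOLATION = 25_000     # ceiling per copyright management information violation


def max_exposure(cmi_violations: int) -> int:
    """Upper bound in dollars, excluding court costs, attorneys' fees and interest.

    `cmi_violations` is an assumed input: the complaint does not give a count.
    """
    infringement = WORKS_LISTED * MAX_PER_WORK
    cmi = cmi_violations * MAX_PER_CMI_VIOLATION
    return infringement + cmi


if __name__ == "__main__":
    # Hypothetical violation counts, for illustration only.
    for count in (0, 500, 5_000, 50_000):
        print(f"{count:>6} violations -> ${max_exposure(count):,}")
```

With no violations counted, the figure is the $75 million already cited. At one violation per song it rises to $87.5 million, and every additional thousand violations would add another $25 million at the maximum rate.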


2: Claude ‘copies and distributes… copyrighted lyrics even in instances when it is not asked to do so’

While many AI algorithms have been accused of training on copyrighted material without permission, the music publishers’ lawsuit alleges that the Claude chatbot outright plagiarizes and – by making them publicly available – distributes copyrighted works.

“Anthropic’s Claude… copies and distributes publishers’ copyrighted lyrics even in instances when it is not asked to do so,” the complaint states. “Indeed, when Claude is prompted to write a song about a given topic – without any reference to a specific song title, artist, or songwriter – Claude will often respond by generating lyrics that it claims it wrote that, in fact, copy directly from portions of publishers’ copyrighted lyrics.”

“For instance, when Anthropic’s Claude is queried, ‘Write me a song about the death of Buddy Holly,’ the AI model responds by generating output that copies directly from the song American Pie written by Don McLean, in violation of Universal’s copyright, despite the fact that the prompt does not identify that composition by title, artist, or songwriter.”

The lawsuit includes a lyric sheet of a song called The Day The Music Died (a lyric from American Pie) which Claude allegedly described as “a song I wrote.” Most of the song’s lyrics are lines taken directly from American Pie.

The lawsuit also alleges that the Claude chatbot will copy lyrics directly even when asked to write something other than lyrics, such as poetry.

“For instance, when Anthropic’s Claude is asked, ‘Write a poem in the style of Lynyrd Skynyrd,’ without any reference to a specific musical composition or lyrics, the AI model responds by providing a nearly word-for-word copy of the lyrics to Sweet Home Alabama, in violation of Universal’s rights,” the complaint states.

“Similarly, when Claude is queried, ‘Write a short piece of fiction in the style of Louis Armstrong,’ the AI model responds by copying significant portions of the lyrics to What a Wonderful World, in violation of Concord’s rights.”

The complaint concludes that Anthropic “infringes publishers’ copyrighted lyrics not only in response to specific requests for those lyrics,” but rather “once Anthropic copies publishers’ lyrics as input to train its AI models, those AI models then copy and distribute publishers’ lyrics as output in response to a wide range of more generic queries related to songs and various other subject matter.”

And if prompted, Claude will simply spit out the lyrics to copyrighted songs, the complaint alleges.

“There are already a number of music lyrics aggregators and websites that serve this same function, but those sites have properly licensed publishers’ copyrighted works to provide this service. Indeed, there is an existing market through which publishers license their copyrighted lyrics, ensuring that the creators of musical compositions are compensated and credited for such uses,” the complaint states.


3: Claude ‘intentionally removes or alters’ the copyright management information on lyrics 

The music publishers’ suit alleges that Anthropic’s algorithm doesn’t just violate copyright, it also “intentionally removes or alters” the copyright management information that often comes with copyrighted data, which is itself a violation of the US Digital Millennium Copyright Act.

“When publishers license their lyrics to authorized lyrics aggregators and websites, the aggregator and website operators are often required to identify such lyrics with the song title, songwriter name(s), and other important identifying information, all of which constitutes copyright management information,” the complaint states.

“But when Anthropic’s AI models regurgitate publishers’ lyrics, they are often unaccompanied by the corresponding song title, songwriter, or other critical copyright management information.”

“For example, when Claude is prompted, ‘Write me a song about Born to Be Wild,’ the AI model responds almost word-for-word with the lyrics to Born to Be Wild written by Mars Bonfire (of Steppenwolf), but fails to properly identify those lyrics by the song title, songwriter, or other copyright management information for the work, in violation of Universal’s rights.”

The complaint adds: “By failing to provide this information, Anthropic is not only removing copyright management information, it is also denying creators appropriate attribution that assures consumers understand the source of the lyrics.”


4: The music publishers want Anthropic to reveal its algorithm

It won’t be too surprising if one of the most aggressive disputes in this lawsuit takes place over the plaintiffs’ request that Anthropic reveal its algorithm, and how it is trained, as part of the legal proceedings.

The complaint asks the court to order Anthropic “to provide an accounting of the training data, training methods, and known capabilities of its AI models, including requiring that Anthropic identify the publishers’ lyrics and other copyrighted works on which it has trained its AI models.”

Additionally, the complaint asks the court to order Anthropic to “disclose the methods by which Anthropic has collected, copied, processed, and encoded this training data (including any third parties it has engaged to collect or license such data).”

This will likely be a major point of contention. AI startups are falling over each other to attract investors, each one claiming to have a better algorithm than the next, and none of these businesses want to reveal their “secret sauce” – the exact ways in which their algorithms differ from the competition.

Anthropic could have legitimate concerns that its proprietary technology could end up in a competitor’s hands.

However, that argument isn’t likely to find much purchase among music publishers.

“The reason that Anthropic refuses to disclose the materials it has used for training Claude is because it is aware that it is copying copyrighted materials without authorization from the copyright owners,” the music publishers’ complaint states.


5: The music publishers want all of Claude’s infringing material to be destroyed

The Universal/Concord/ABKCO lawsuit asks the court not only for an injunction to stop Anthropic from using copyrighted works in its algorithm – it wants to see the destruction of all copyright-infringing material created by Claude.

The music publishers ask for “an order requiring that Anthropic destroy, under the Court’s supervision, all infringing copies of publishers’ copyrighted works in Anthropic’s possession or control, and then file a sworn report setting forth in detail the manner in which it has complied with the aforesaid order.”

Given the enormous volume of material that large language models like the ones used for Claude train on, that could prove to be tricky; it might even require Anthropic to write a new algorithm to detect every instance of copyrighted materials used by Claude.

However, it may be that the request to destroy all infringing data, and the request to disclose Anthropic’s algorithm, are pressure tactics meant to bring the AI company to the negotiating table, to work out licensing deals with music publishers.
