Bytes

Authors Sue OpenAI For Copyright Infringement

OpenAI trained GPT-1 on more than 7,000 novels from BookCorpus; Sarah Silverman also sues Meta

Published July 18, 2023

Image source: HBO, Sarah Silverman: We Are Miracles

A group of authors is suing OpenAI for using their novels, without their consent, to train its generative AI chatbot, ChatGPT. As reported by The Hollywood Reporter, the lawsuit claims that, when prompted, ChatGPT generates summaries of the authors’ novels, which would only be possible if the AI had been trained on the copyrighted works. The suit seeks to represent hundreds of thousands of US authors.

The lawsuit filing states, “They copied the books from a website called Smashwords.com that hosts unpublished novels that are available to readers at no cost. Those novels, however, are largely under copyright. They were copied into the BookCorpus dataset without consent, credit, or compensation to the authors.”

Shadow libraries

The authors also argue that ChatGPT infringes their copyrights by creating derivative works. In 2018, OpenAI disclosed that it had trained GPT-1, the first iteration of its generative AI, on more than 7,000 novels from BookCorpus. According to the filing, newer versions of ChatGPT were trained on even larger collections of copyrighted works. In 2020, OpenAI revealed that 15 per cent of the data used to train GPT-3 came from “two internet-based books corpora.” The authors claim these datasets were drawn from shadow library websites such as Bibliotik.

“These flagrantly illegal shadow libraries have long been of interest to the AI-training community: for instance, an AI training dataset published in December 2020 by EleutherAI called ‘Books3’ includes a recreation of the Bibliotik collection and contains nearly 200,000 books,” writes Joseph Saveri, the lawyer representing the authors.

Sarah Silverman sues OpenAI and Meta

Comedian Sarah Silverman (pictured) is also suing OpenAI and Meta for copyright infringement. As reported by The Verge, Silverman’s lawsuit likewise cites the AI’s ability to summarise her book as evidence of infringement. Silverman also claims that ChatGPT’s output omitted the copyright management information included with her published work. Similarly, the suit against Meta alleges that the company trained its LLaMA models on datasets containing copyrighted books.


Written by Jack Brassell

Jack Brassell is a freelance journalist and aspiring novelist. Jack is a self-proclaimed nerd with a lifelong passion for storytelling. As an author, Jack writes mostly horror and young adult fantasy. Also an avid gamer, she works as the lead news editor at Hardcore Droid. When she isn't writing or playing games, she can often be found binge-watching Parks & Rec or The Office, proudly considering herself a cross between Leslie Knope and Pam Beesly.


BeyondGames.biz
