Authors Sue OpenAI For Copyright Infringement

OpenAI trained GPT-1 using a library of more than 7,000 novels on BookCorpus; Sarah Silverman also sues Meta

Image source: HBO, Sarah Silverman: We Are Miracles

A group of authors is suing OpenAI for using their novels to train ChatGPT, its generative AI, without their consent. As covered by The Hollywood Reporter, the lawsuit claims that, when prompted, ChatGPT generates summaries of the authors’ novels, which would only be possible if the AI had been trained on the copyrighted works. The suit aims to represent hundreds of thousands of US authors.

The lawsuit filing states, “They copied the books from a website called Smashwords.com that hosts unpublished novels that are available to readers at no cost. Those novels, however, are largely under copyright. They were copied into the BookCorpus dataset without consent, credit, or compensation to the authors.”

Shadow libraries

Additionally, the authors state that ChatGPT infringes on their copyrights by creating derivative works. OpenAI disclosed in 2018 that it trained GPT-1, the first iteration of its generative AI, on a library of more than 7,000 novels from BookCorpus. According to the filing, newer versions of ChatGPT were trained on even more extensive collections of copyrighted works. In 2020, OpenAI revealed that 15 per cent of the data used to train GPT-3 came from “two internet-based books corpora.” The authors claim the datasets came from shadow library websites like Bibliotik.

“These flagrantly illegal shadow libraries have long been of interest to the AI-training community: for instance, an AI training dataset published in December 2020 by EleutherAI called ‘Books3’ includes a recreation of the Bibliotik collection and contains nearly 200,000 books,” writes the lawyer representing the authors, Joseph Saveri.

Sarah Silverman sues OpenAI and Meta

Comedian Sarah Silverman (pictured) is also suing OpenAI and Meta for copyright infringement. As reported by The Verge, Silverman’s lawsuit likewise cites the AI’s ability to summarise her book as proof of infringement. In addition, Silverman claims ChatGPT’s output did not reproduce any of the copyright management information included with her publication. Similarly, the suit against Meta states that Meta used datasets that included copyrighted novels to train its LLaMA models.

Written By

Jack Brassell is a freelance journalist and aspiring novelist. Jack is a self-proclaimed nerd with a lifelong passion for storytelling. As an author, Jack writes mostly horror and young adult fantasy. Also an avid gamer, she works as the lead news editor at Hardcore Droid. When she isn't writing or playing games, she can often be found binge-watching Parks & Rec or The Office, proudly considering herself to be a cross between Leslie Knope and Pam Beesly.
