We’ve filed law­suits chal­leng­ing Chat­GPT and LLaMA, indus­trial-strength pla­gia­rists that vio­late the rights of book authors.
Because AI needs to be fair & eth­i­cal for every­one.

June 28, 2023

Hello. This is Joseph Saveri and Matthew Butterick. In Novem­ber 2022, we teamed up to file a law­suit chal­leng­ing GitHub Copi­lot, an AI cod­ing assis­tant built on unprece­dented open-source soft­ware piracy. In Jan­u­ary 2023, we filed a law­suit chal­leng­ing Sta­ble Dif­fu­sion, an AI image gen­er­a­tor built on the heist of five bil­lion dig­i­tal images.

Since the release of OpenAI’s ChatGPT system in November 2022, we’ve been hearing from writers, authors, and publishers who are concerned about its uncanny ability to generate text similar to that found in copyrighted textual materials, including thousands of books.

Today, on behalf of two wonderful book authors, Paul Tremblay and Mona Awad, we’ve filed a class-action lawsuit against OpenAI challenging ChatGPT and its underlying large language models, GPT-3.5 and GPT-4, which remix the copyrighted works of thousands of book authors (and many others) without consent, compensation, or credit.

Today’s fil­ings:

It’s a great plea­sure to stand up on behalf of authors and con­tinue the vital con­ver­sa­tion about how AI will coex­ist with human cul­ture and cre­ativ­ity.

July 7, 2023

We’ve filed a second class-action lawsuit against OpenAI on behalf of three more wonderful book authors: Sarah Silverman, Chris Golden, and Richard Kadrey. The claims are otherwise similar to the initial complaint filed on June 28.

Today’s fil­ings against OpenAI:

On behalf of the same three plain­tiffs, we’ve also filed an ini­tial class-action law­suit against Meta chal­leng­ing LLaMA, a set of large lan­guage mod­els trained in part on copy­righted books.

Today’s fil­ings against Meta:

The defen­dants

OpenAI

OpenAI, founded by Elon Musk and Sam Alt­man in 2015, is based in San Fran­cisco. Accord­ing to Alt­man, the two started OpenAI as a non­profit ven­ture “to develop a human pos­i­tive AI … freely owned by the world.”

OpenAI’s sta­tus as a non­profit was crit­i­cal to its ini­tial posi­tion­ing. As Alt­man said then, “any­thing [OpenAI] devel­ops will be avail­able to every­one.” Why? Because they believed this approach was “the actual best thing for the future of human­ity.”

When asked about the pos­si­bil­ity of mis­use by a hypo­thet­i­cal Dr. Evil fig­ure—the guy who wanted to “hold the world ran­som for 100 bil­lion dol­lars”—Alt­man con­tended that OpenAI was the anti­dote: “if Dr. Evil gets [a pow­er­ful AI] and there is noth­ing to coun­ter­act it, then we’re really in a bad place.”

Even then, how­ever, there was more to the Alt­man–Musk com­bi­na­tion than their deeply held com­mit­ment to put human­ity above profit. Musk, as CEO of Tesla, and Alt­man, then serv­ing as pres­i­dent of startup incu­ba­tor Y Com­bi­na­tor, planned to pool data from their respec­tive com­pa­nies for use as train­ing data in future OpenAI sys­tems. Accord­ing to report­ing at the time, Alt­man claimed that “Y Com­bi­na­tor com­pa­nies will share their data with OpenAI” and that this data would be “pair[ed] … with Tesla’s” so that OpenAI would have access to train­ing data suf­fi­cient to “rival Google.”

The hon­ey­moon would end. In 2018, Musk left OpenAI amidst dis­putes with Alt­man over its direc­tion. In 2019, Alt­man reversed him­self on OpenAI’s non­profit purity and cre­ated a for-profit sub­sidiary. Later that year, OpenAI took a $1 bil­lion invest­ment from Microsoft. By Jan­u­ary 2023, Microsoft had pro­gres­sively increased its invest­ment to $13 bil­lion.

In Feb­ru­ary 2023, Musk sharply crit­i­cized OpenAI, say­ing that it “was cre­ated as an open source … non-profit com­pany” but had “become a closed source, max­i­mum-profit com­pany effec­tively con­trolled by Microsoft.” By March 2023, Alt­man’s new “grand idea” was that “OpenAI will cap­ture much of the world’s wealth”. How much? Alt­man sug­gested “$100 bil­lion, $1 tril­lion, $100 tril­lion.” The total United States money sup­ply, accord­ing to the broad­est mea­sure (called M2), cur­rently sits at roughly $20.8 tril­lion.

Meta

Meta is a maker of vir­tual-real­ity prod­ucts, includ­ing Hori­zon Worlds. Meta also sells adver­tis­ing on its web­sites Face­book, Insta­gram, and WhatsApp. In 2019, Meta (then known as Face­book) was fined $5 bil­lion by the FTC for pri­vacy vio­la­tions aris­ing from improper han­dling of per­sonal data. More recently, Meta was fined €1.3 bil­lion by the EU, also for pri­vacy vio­la­tions aris­ing from improper han­dling of per­sonal data.

Since 2013, Meta has oper­ated an AI research lab, called Meta AI, founded by Mark Zucker­berg and Yann LeCun. Though in the past, Meta has used AI to spread fake news and hate speech, cur­rently Meta AI is work­ing on apply­ing AI tech­nol­ogy to sell­ing adver­tis­ing.

In Feb­ru­ary 2023, to com­pete with OpenAI’s Chat­GPT sys­tem, Meta AI released a set of large lan­guage mod­els called LLaMA. Though Meta had intended to share LLaMA only with a select group of users, the mod­els soon leaked to a pub­lic inter­net site. This led to an inquiry by the US Sen­ate Sub­com­mit­tee on Pri­vacy, Tech­nol­ogy, and the Law, which noted the “poten­tial for [LLaMA’s] mis­use in spam, fraud, mal­ware, pri­vacy vio­la­tions, harass­ment, and other wrong­do­ing and harms.”

Books as train­ing data

Though a large lan­guage model is a soft­ware pro­gram, it’s not cre­ated the way most soft­ware pro­grams are—that is, by human soft­ware engi­neers writ­ing code.

Rather, a large language model is “trained” by copying massive amounts of text from various sources and feeding these copies into the model. (This corpus of input material is called the training dataset.)

Dur­ing train­ing, the large lan­guage model copies each piece of text in the train­ing dataset and extracts expres­sive infor­ma­tion from it. The large lan­guage model pro­gres­sively adjusts its out­put to more closely resem­ble the sequences of words copied from the train­ing dataset. Once the large lan­guage model has copied and ingested all this text, it is able to emit con­vinc­ing sim­u­la­tions of nat­ural writ­ten lan­guage as it appears in the train­ing dataset.
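As a rough illustration of that idea, here is a toy sketch in Python. It is an assumption-laden simplification, not the method used by OpenAI or Meta: real large language models are neural networks trained on vastly more text, whereas this example only counts which word follows which. The function names `train_bigram_model` and `generate`, and the sample sentence, are hypothetical and chosen purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Record, for each word, how often each other word follows it."""
    words = text.split()
    model = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        model[current_word][next_word] += 1
    return model


def generate(model, start_word, max_words=8):
    """Emit text by repeatedly choosing the most frequent follower."""
    output = [start_word]
    for _ in range(max_words - 1):
        followers = model.get(output[-1])
        if not followers:
            break  # the last word never had a follower in the training text
        output.append(followers.most_common(1)[0][0])
    return " ".join(output)


# Hypothetical one-sentence "training dataset".
training_text = "the cat sat on the mat and the cat slept"
model = train_bigram_model(training_text)
print(generate(model, "sat", max_words=4))  # prints "sat on the cat"
```

Even this tiny model can only emit word sequences it picked up from its input, which is the point of the description above: the output is shaped entirely by the text that was copied in.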

Much of the material in the training datasets used by OpenAI and Meta comes from copyrighted works (including books written by Plaintiffs) that were copied by OpenAI and Meta without consent, without credit, and without compensation. Many of these books likely came from “shadow libraries”, websites that distribute thousands of pirated books and publications. These flagrantly illegal shadow libraries have long been of interest to the AI-training community: for instance, an AI training dataset called “Books3” (used by Meta) includes a recreation of a shadow library called Bibliotik and contains nearly 200,000 books.

Books in par­tic­u­lar are rec­og­nized within the AI com­mu­nity as valu­able train­ing data. A team of researchers from MIT and Cor­nell recently stud­ied the value of var­i­ous kinds of tex­tual mate­r­ial for machine learn­ing. Books were placed in the top tier of train­ing data that had “the strongest pos­i­tive effects on down­stream per­for­mance.” Books are also com­par­a­tively “much more abun­dant” than other sources, and con­tain the “longest, most read­able” mate­r­ial with “mean­ing­ful, well-edited sen­tences”.

Right—because as usual, “gen­er­a­tive arti­fi­cial intel­li­gence” is just human intel­li­gence, repack­aged and divorced from its cre­ators.

And the grift doesn’t end at train­ing. The project of steer­ing AI sys­tems toward some­thing other than the worst ver­sion of human­ity is known as align­ment. So far, the best known tech­nique is euphemisti­cally called rein­force­ment learn­ing from human feed­back, which for OpenAI entails hir­ing low-wage for­eign work­ers to spend hours with Chat­GPT, nudg­ing it away from toxic results.

The plain­tiffs

Our plain­tiffs are accom­plished book authors who have stepped for­ward to rep­re­sent a class of thou­sands of other writ­ers afflicted by gen­er­a­tive AI.

Paul Trem­blay

Paul Trem­blay has won the Bram Stoker, British Fan­tasy, and Mass­a­chu­setts Book awards and is the author of Sur­vivor Song, The Cabin at the End of the World, Dis­ap­pear­ance at Devil’s Rock, A Head Full of Ghosts, the crime nov­els The Lit­tle Sleep and No Sleep Till Won­der­land, and the short story col­lec­tion Grow­ing Things and Other Sto­ries.

His essays and short fic­tion have appeared in the Los Ange­les Times, New York Times, Enter­tain­ment Weekly online, and numer­ous year’s-best antholo­gies. He has a mas­ter’s degree in math­e­mat­ics and lives out­side Boston with his fam­ily.

Sarah Sil­ver­man

Sarah Sil­ver­man is a two-time Emmy Award-win­ning come­dian, actress, writer, and pro­ducer. She cur­rently hosts a crit­i­cally acclaimed weekly pod­cast, The Sarah Sil­ver­man Pod­cast. She can next be seen as the host of TBS’ Stu­pid Pet Tricks, an expan­sion of the famous David Let­ter­man late-night seg­ment. In spring 2022, Sil­ver­man’s off-Broad­way musi­cal adap­ta­tion of her 2010 New York Times best­selling mem­oir The Bed­wet­ter: Sto­ries of Courage, Redemp­tion, and Pee had a sold-out run with the Atlantic The­atre Com­pany.

On stage, Silverman continues to cement her status as a force in stand-up comedy. Silverman also lent her voice as “Vanellope” in the Oscar-nominated smash hit Wreck-It Ralph and the Golden Globe-nominated Ralph Breaks the Internet: Wreck-It Ralph 2. Silverman was nominated for a 2009 Primetime Emmy Award for “Outstanding Lead Actress in a Comedy Series” for portraying a fictionalized version of herself in her Comedy Central series The Sarah Silverman Program. In 2008, Silverman won a Primetime Emmy Award for “Outstanding Original Music and Lyrics” for her musical collaboration with Matt Damon.

Silverman grew up in New Hampshire and attended New York University for one year. In 1993 she joined Saturday Night Live as a writer and featured performer and has not stopped working since. She currently lives in Los Angeles.

Christo­pher Golden

Christo­pher Golden is the New York Times best­selling, Bram Stoker Award-win­ning author of such nov­els as Road of Bones, Ararat, Snow­blind, and Red Hands. With Mike Mignola, he is the co-cre­ator of the Out­er­verse comic book uni­verse, includ­ing such series as Bal­ti­more, Joe Golem: Occult Detec­tive, and Lady Bal­ti­more. As an edi­tor, he has worked on the short story antholo­gies Seize the Night, Dark Cities, and The New Dead, among oth­ers, and he has also writ­ten and co-writ­ten comic books, video games, screen­plays, and a net­work tele­vi­sion pilot. In 2015 he founded the pop­u­lar Mer­ri­mack Val­ley Hal­loween Book Fes­ti­val.

He was born and raised in Mass­a­chu­setts, where he still lives with his fam­ily. His work has been nom­i­nated for the British Fan­tasy Award, the Eis­ner Award, and mul­ti­ple Shirley Jack­son Awards. For the Bram Stoker Awards, Golden has been nom­i­nated ten times in eight dif­fer­ent cat­e­gories, and won twice. His orig­i­nal nov­els have been pub­lished in more than fif­teen lan­guages in coun­tries around the world.

Richard Kadrey

Richard Kadrey is the New York Times best­selling author of the Sand­man Slim super­nat­ural noir series. Sand­man Slim was included in Ama­zon’s “100 Sci­ence Fic­tion & Fan­tasy Books to Read in a Life­time,” and is in devel­op­ment as a fea­ture film. Some of Kadrey’s other books include King Bul­let, The Grand Dark, Butcher Bird, and The Dead Take the A Train (with Cas­san­dra Khaw). He’s writ­ten for film and comics, includ­ing Heavy Metal, Lucifer, and Hell­blazer. Kadrey also makes music with his band, A Demon in Fun City.

Email updates

If you’d like to receive occa­sional email updates on the progress of the cases, please use the links below.

Con­tact­ing us

If you’re a mem­ber of the press or the pub­lic with other ques­tions about this case or related top­ics, email llmlitigation@saverilawfirm.com. (Though please don’t send con­fi­den­tial or priv­i­leged infor­ma­tion.)

This web page is infor­ma­tional. Gen­eral prin­ci­ples of law are dis­cussed. But nei­ther Matthew Butterick nor any­one at the Joseph Saveri Law Firm is your law­yer, and noth­ing here is offered as legal advice. Ref­er­ences to copy­right per­tain to US law. This page will be updated as new infor­ma­tion becomes avail­able.