We’ve filed law­suits chal­leng­ing Chat­GPT, LLaMA, and other lan­guage mod­els for vio­lat­ing the legal rights of authors.
Because AI needs to be fair & eth­i­cal for every­one.

This is Joseph Saveri and Matthew Butterick. In Novem­ber 2022, we teamed up to file a law­suit chal­leng­ing GitHub Copi­lot, an AI cod­ing assis­tant built on unprece­dented open-source soft­ware piracy. In Jan­u­ary 2023, we filed a law­suit chal­leng­ing Sta­ble Dif­fu­sion and Mid­jour­ney, AI image gen­er­a­tors built on the heist of five bil­lion dig­i­tal images.

On behalf of seven won­der­ful book authors, we’ve filed four class-action law­suits—against OpenAI, Meta, NVIDIA, and Data­bricks—chal­leng­ing the legal­ity of large lan­guage mod­els trained on copy­righted works with­out con­sent, com­pen­sa­tion, or credit.

It’s a great plea­sure to stand up on behalf of authors and con­tinue the vital con­ver­sa­tion about how AI will coex­ist with human cul­ture and cre­ativ­ity.

The plain­tiffs

Our plain­tiffs are accom­plished book authors who have stepped for­ward to rep­re­sent a class of thou­sands of other writ­ers afflicted by gen­er­a­tive AI.

Paul Trem­blay

Paul Trem­blay has won the Bram Stoker, British Fan­tasy, and Mass­a­chu­setts Book awards and is the author of Sur­vivor Song, The Cabin at the End of the World, Dis­ap­pear­ance at Devil’s Rock, A Head Full of Ghosts, the crime nov­els The Lit­tle Sleep and No Sleep Till Won­der­land, and the short story col­lec­tion Grow­ing Things and Other Sto­ries.

His essays and short fic­tion have appeared in the Los Ange­les Times, New York Times, Enter­tain­ment Weekly online, and numer­ous year’s-best antholo­gies. He has a mas­ter’s degree in math­e­mat­ics and lives out­side Boston with his fam­ily.

Sarah Sil­ver­man

Sarah Sil­ver­man is a two-time Emmy Award-win­ning come­dian, actress, writer, and pro­ducer. She cur­rently hosts a crit­i­cally acclaimed weekly pod­cast, The Sarah Sil­ver­man Pod­cast. She can next be seen as the host of TBS’ Stu­pid Pet Tricks, an expan­sion of the famous David Let­ter­man late-night seg­ment. In spring 2022, Sil­ver­man’s off-Broad­way musi­cal adap­ta­tion of her 2010 New York Times best­selling mem­oir The Bed­wet­ter: Sto­ries of Courage, Redemp­tion, and Pee had a sold-out run with the Atlantic The­atre Com­pany.

On stage, Silverman continues to cement her status as a force in stand-up comedy. Silverman also lent her voice as Vanellope in the Oscar-nominated smash hit Wreck-It Ralph and the Golden Globe-nominated Ralph Breaks the Internet: Wreck-It Ralph 2. Silverman was nominated for a 2009 Primetime Emmy Award for “Outstanding Lead Actress in a Comedy Series” for portraying a fictionalized version of herself in her Comedy Central series The Sarah Silverman Program. In 2008, Silverman won a Primetime Emmy Award for “Outstanding Original Music and Lyrics” for her musical collaboration with Matt Damon.

Silverman grew up in New Hampshire and attended New York University for one year. In 1993 she joined Saturday Night Live as a writer and featured performer and has not stopped working since. She currently lives in Los Angeles.

Christo­pher Golden

Christo­pher Golden is the New York Times best­selling, Bram Stoker Award-win­ning author of such nov­els as Road of Bones, Ararat, Snow­blind, and Red Hands. With Mike Mignola, he is the co-cre­ator of the Out­er­verse comic book uni­verse, includ­ing such series as Bal­ti­more, Joe Golem: Occult Detec­tive, and Lady Bal­ti­more. As an edi­tor, he has worked on the short story antholo­gies Seize the Night, Dark Cities, and The New Dead, among oth­ers, and he has also writ­ten and co-writ­ten comic books, video games, screen­plays, and a net­work tele­vi­sion pilot. In 2015 he founded the pop­u­lar Mer­ri­mack Val­ley Hal­loween Book Fes­ti­val.

He was born and raised in Mass­a­chu­setts, where he still lives with his fam­ily. His work has been nom­i­nated for the British Fan­tasy Award, the Eis­ner Award, and mul­ti­ple Shirley Jack­son Awards. For the Bram Stoker Awards, Golden has been nom­i­nated ten times in eight dif­fer­ent cat­e­gories, and won twice. His orig­i­nal nov­els have been pub­lished in more than fif­teen lan­guages in coun­tries around the world.

Richard Kadrey

Richard Kadrey is the New York Times best­selling author of the Sand­man Slim super­nat­ural noir series. Sand­man Slim was included in Ama­zon’s “100 Sci­ence Fic­tion & Fan­tasy Books to Read in a Life­time,” and is in devel­op­ment as a fea­ture film. Some of Kadrey’s other books include King Bul­let, The Grand Dark, Butcher Bird, and The Dead Take the A Train (with Cas­san­dra Khaw). He’s writ­ten for film and comics, includ­ing Heavy Metal, Lucifer, and Hell­blazer. Kadrey also makes music with his band, A Demon in Fun City.

Stew­art O’Nan

Stew­art O’Nan’s award-win­ning fic­tion includes Snow Angels, A Prayer for the Dying, Last Night at the Lob­ster, and Emily, Alone. Granta named him one of Amer­ica’s Best Young Nov­el­ists. He lives in Pitts­burgh. (Photo: Beth Navarro)

Abdi Nazemian

Abdi Nazemian is the author of Like a Love Story (a Stonewall Honor Book), Only This Beautiful Moment, The Chandler Legacies, and The Authentics. His novel The Walk-In Closet won the Lambda Literary Award for LGBT Debut Fiction. His screenwriting credits include the films The Artist’s Wife, The Quiet, and Menendez: Blood Brothers and the television series Ordinary Joe and The Village. He has been an executive producer and associate producer on numerous films, including Call Me by Your Name, Little Woods, and The House of Tomorrow. He lives in Los Angeles with his husband, their two children, and their dog, Disco. (Photo: Michelle Schapiro)

Brian Keene

Brian Keene is the author of over fifty books and three hun­dred short sto­ries, mostly in the hor­ror, crime, fan­tasy, and non-fic­tion gen­res, includ­ing Ghost Walk. His 2003 novel The Ris­ing is cred­ited with inspir­ing pop cul­ture’s recur­rent inter­est in zom­bies. He has also writ­ten for such media prop­er­ties as Doc­tor Who, Thor, Aliens, Harley Quinn, The X-Files, Doom Patrol, Jus­tice League, Hell­boy, Super­man, and Mas­ters of the Uni­verse. He was the showrun­ner for Realm Media and Black­box TV’s Sil­ver­wood: The Door.

Sev­eral of Keene’s nov­els and sto­ries have been adapted for film, includ­ing Ghoul, The Naughty List, The Ties That Bind, and Fast Zom­bies Suck. Keene also served as Exec­u­tive Pro­ducer for the fea­ture-length film I’m Dream­ing of a White Dooms­day.

From 2015 to 2020, he hosted the immensely pop­u­lar The Hor­ror Show with Brian Keene pod­cast. He also hosted (along with Christo­pher Golden) the long-run­ning Defend­ers Dia­logue pod­cast. Keene also serves on the board of Scares That Care, and as a trustee for the Hor­ror Writ­ers Asso­ci­a­tion.

The father of two sons and step­fa­ther to one daugh­ter, Keene lives in Penn­syl­va­nia with his wife, author Mary San­Gio­vanni, and sev­eral cats. (Photo: John Urban­cik)

Books as train­ing data

Though an AI lan­guage model—often known as a large lan­guage model or LLM for short—is a soft­ware pro­gram, it’s not cre­ated the way most soft­ware pro­grams are—that is, by human soft­ware engi­neers writ­ing code.

Rather, an LLM is “trained” by copying massive amounts of text from various sources and feeding these copies into the model. (This corpus of input material is called the training dataset.)

Dur­ing train­ing, the LLM copies each work in the train­ing dataset and the copy­righted expres­sion con­tained therein. The LLM pro­gres­sively adjusts its out­put to more closely resem­ble the sequences of words copied from the train­ing dataset. Once the LLM has copied and ingested all this text, it is able to emit con­vinc­ing sim­u­la­tions of nat­ural writ­ten lan­guage as it appears in the train­ing dataset.

Much of the mate­r­ial in the train­ing datasets used by the defen­dants comes from copy­righted works—includ­ing books writ­ten by Plain­tiffs—that were copied and used for train­ing with­out con­sent, with­out credit, and with­out com­pen­sa­tion. Many of these books likely came from “shadow libraries”, web­sites that pirat­i­cally dis­trib­ute thou­sands of copy­righted books and pub­li­ca­tions.

Books in par­tic­u­lar are rec­og­nized within the AI com­mu­nity as valu­able train­ing data. A team of researchers from MIT and Cor­nell recently stud­ied the value of var­i­ous kinds of tex­tual mate­r­ial for machine learn­ing. Books were placed in the top tier of train­ing data that had “the strongest pos­i­tive effects on down­stream per­for­mance.” Books are also com­par­a­tively “much more abun­dant” than other sources, and con­tain the “longest, most read­able” mate­r­ial with “mean­ing­ful, well-edited sen­tences”.

Right—because as usual, “gen­er­a­tive arti­fi­cial intel­li­gence” is just human intel­li­gence, repack­aged and divorced from its cre­ators.

And the grift doesn’t end at training. The project of steering AI systems toward something other than the worst version of humanity is known as alignment. So far, the best-known technique is euphemistically called reinforcement learning from human feedback, which for OpenAI entails hiring low-wage foreign workers to spend hours with ChatGPT, nudging it away from toxic results.

The defen­dants

OpenAI

OpenAI, founded by Elon Musk and Sam Alt­man in 2015, is based in San Fran­cisco. Accord­ing to Alt­man, the two started OpenAI as a non­profit ven­ture “to develop a human pos­i­tive AI … freely owned by the world.”

OpenAI’s sta­tus as a non­profit was crit­i­cal to its ini­tial posi­tion­ing. As Alt­man said then, “any­thing [OpenAI] devel­ops will be avail­able to every­one.” Why? Because they believed this approach was “the actual best thing for the future of human­ity.”

The hon­ey­moon would end. In 2018, Musk left OpenAI amidst dis­putes with Alt­man over its direc­tion. In 2019, Alt­man reversed him­self on OpenAI’s non­profit purity and cre­ated a for-profit sub­sidiary. Later that year, OpenAI took a $1 bil­lion invest­ment from Microsoft. By Jan­u­ary 2023, Microsoft had pro­gres­sively increased its invest­ment to $13 bil­lion.

In February 2023, Musk sharply criticized OpenAI, saying that it “was created as an open source … non-profit company” but had “become a closed source, maximum-profit company effectively controlled by Microsoft.” By March 2023, Altman’s new “grand idea” was that “OpenAI will capture much of the world’s wealth”. How much? Altman suggested “$100 billion, $1 trillion, $100 trillion.” (The total United States money supply, according to the broadest measure, called M2, currently sits at roughly $20.8 trillion.) In March 2024, Musk sued OpenAI for breach of contract.

Meta

Meta is a maker of vir­tual-real­ity prod­ucts, includ­ing Hori­zon Worlds. Meta also sells adver­tis­ing on its web­sites Face­book, Insta­gram, and WhatsApp. In 2019, Meta (then known as Face­book) was fined $5 bil­lion by the FTC for pri­vacy vio­la­tions aris­ing from improper han­dling of per­sonal data. More recently, Meta was fined €1.3 bil­lion by the EU, also for pri­vacy vio­la­tions aris­ing from improper han­dling of per­sonal data.

Since 2013, Meta has operated an AI research lab, called Meta AI, founded by Mark Zuckerberg and Yann LeCun. Though Meta has in the past used AI to spread fake news and hate speech, Meta AI is currently working on applying AI technology to selling advertising.

In Feb­ru­ary 2023, to com­pete with OpenAI’s Chat­GPT sys­tem, Meta AI released a set of LLMs called LLaMA. Though Meta had intended to share LLaMA only with a select group of users, the mod­els soon leaked to a pub­lic inter­net site. This led to an inquiry by the US Sen­ate Sub­com­mit­tee on Pri­vacy, Tech­nol­ogy, and the Law, which noted the “poten­tial for [LLaMA’s] mis­use in spam, fraud, mal­ware, pri­vacy vio­la­tions, harass­ment, and other wrong­do­ing and harms.”

NVIDIA

NVIDIA is a tech­nol­ogy com­pany founded in 1993 that orig­i­nally focused on com­puter-graph­ics hard­ware and has since expanded to other com­pu­ta­tion­ally inten­sive fields, includ­ing soft­ware and hard­ware for train­ing and oper­at­ing AI soft­ware pro­grams. NVIDIA has trained a series of large lan­guage mod­els called NeMo Mega­tron on a dataset called The Pile, which includes thou­sands of copy­righted books.

Data­bricks and MosaicML

Data­bricks is an AI ser­vices com­pany in San Fran­cisco. In July 2023, it acquired MosaicML, a maker of gen­er­a­tive-AI tools. MosaicML has trained a series of large lan­guage mod­els called MPT on a dataset called Red­Pa­jama, which includes thou­sands of copy­righted books.

Con­tact­ing us

If you’re a mem­ber of the press or the pub­lic with other ques­tions about this case or related top­ics, email llmlitigation@saverilawfirm.com. (Though please don’t send con­fi­den­tial or priv­i­leged infor­ma­tion.)

This web page is infor­ma­tional. Gen­eral prin­ci­ples of law are dis­cussed. But nei­ther Matthew Butterick nor any­one at the Joseph Saveri Law Firm is your law­yer, and noth­ing here is offered as legal advice. Ref­er­ences to copy­right per­tain to US law. This page will be updated as new infor­ma­tion becomes avail­able.