• corbin@awful.systems
    2 days ago

    I’m gonna start by quoting the class’s pretty decent summary, which goes a little heavy on the self-back-patting:

    If approved, this landmark settlement will be the largest publicly reported copyright recovery in history… The proposed settlement … will set a precedent of AI companies paying for their use of pirated websites like Library Genesis and Pirate Library Mirror.

The stage is precisely the one we discussed previously on Awful, in the context of Kadrey v. Meta. The class was aware that Kadrey is an obvious obstacle to succeeding at trial, especially given how Authors Guild v. Google (Google Books) turned out:

    Plaintiffs’ core allegation is that Anthropic committed large-scale copyright infringement by downloading and commercially exploiting books that it obtained from allegedly pirated datasets. Anthropic’s principal defense was fair use, the same defense that defeated the claims of rightsholders in the last major battle over copyrighted books exploited by large technology companies. … Indeed, among the Court’s first questions to Plaintiffs’ counsel at the summary judgment hearing concerned Google Books. … This Settlement is particularly exceptional when viewed against enormous risks that Plaintiffs and the Class faced… [E]ven if Plaintiffs succeeded in achieving a verdict greater than $1.5 billion, there is always the risk of a reversal on appeal, particularly where a fair use defense is in play. … Given the very real risk that Plaintiffs and the Class recover nothing — or a far lower amount — this landmark $1.5 billion+ settlement is a resounding victory for the Class. … Anthropic had in fact argued in its Section 1292(b) motion that Judge Chhabria held that the downloading of large quantities of books from LibGen was fair use in the Kadrey case.

    Anthropic has agreed to delete its copies of the pirated works. This should suggest to folks that the typical model-training firm does not usually delete its datasets.

    Anthropic has committed to destroy the datasets within 30 days of final judgement … and will certify as such in writing…

    All in all, I think that this is a fairly healthy settlement for all involved. I do think that the resulting incentive for model-trainers is not what anybody wants, though; Google Books is still settled and Kadrey didn’t get updated, so model-trainers now merely must purchase second-hand books at market price and digitize them, just like Google has been doing for decades. At worst, this is a business opportunity for a sort of large private library which has pre-digitized its content and sells access for the purpose of training models. Authors lose in the long run; class members will get around $3k USD in this payout, but second-hand sales simply don’t have royalties attached in the USA after the first sale.