

I’m gonna start by quoting the class’s pretty decent summary, which goes a little heavy on the self-back-patting:
If approved, this landmark settlement will be the largest publicly reported copyright recovery in history… The proposed settlement … will set a precedent of AI companies paying for their use of pirated websites like Library Genesis and Pirate Library Mirror.
The stage is precisely the one that we discussed previously on Awful, in the context of Kadrey v. Meta. The class was aware that Kadrey is an obvious obstacle to succeeding at trial, especially given how Authors Guild v. Google (Google Books) turned out:
Plaintiffs’ core allegation is that Anthropic committed largescale copyright infringement by downloading and commercially exploiting books that it obtained from allegedly pirated datasets. Anthropic’s principal defense was fair use, the same defense that defeated the claims of rightsholders in the last major battle over copyrighted books exploited by large technology companies. … Indeed, among the Court’s first questions to Plaintiffs’ counsel at the summary judgment hearing concerned Google Books. … This Settlement is particularly exceptional when viewed against enormous risks that Plaintiffs and the Class faced… [E]ven if Plaintiffs succeeded in achieving a verdict greater than $1.5 billion, there is always the risk of a reversal on appeal, particularly where a fair use defense is in play. … Given the very real risk that Plaintiffs and the Class recover nothing — or a far lower amount — this landmark $1.5 billion+ settlement is a resounding victory for the Class. … Anthropic had in fact argued in its Section 1292(b) motion that Judge Chhabria held that the downloading of large quantities of books from LibGen was fair use in the Kadrey case.
Anthropic’s agreed to delete their copies of pirated works. This should suggest to folks that the typical model-training firm does not delete their datasets.
Anthropic has committed to destroy the datasets within 30 days of final judgement … and will certify as such in writing…
All in all, I think that this is a fairly healthy settlement for all involved. I do think that the resulting incentive for model-trainers is not what anybody wants, though; Google Books is still settled and Kadrey didn’t get updated, so model-trainers now must merely purchase second-hand books at market price and digitize them, just like Google has been doing for decades. At worst, this is a business opportunity for a sort of large private library which has pre-digitized its content and sells access for the purpose of training models. Authors lose in the long run; class members will get around $3k USD in this payout, but second-hand sales simply don’t have royalties attached in the USA after the first sale.
I think that you have useful food for thought. I think that you underestimate the degree to which capitalism recuperates technological advances, though. For example, it’s common for singers supported by the music industry to have pitch correction which covers up slight mistakes or persistent tone-deafness, even when performing live in concert. This technology could also be used to allow amateurs to sing well, but it isn’t priced for them; what is priced for amateurs is the gimmicky (and beloved) whammy pedal that allows guitarists to create squeaky dubstep squeals. The same underlying technology is configured for different parts of capitalism.
From that angle, it’s worth understanding that today’s generative tooling will also be configured for capitalism. Indeed, that’s basically what RLHF does to a language model; in the jargon, it creates an “agent”, a synthetic laborer, based on desired sales/marketing/support interactions. We also have uses for raw generation; in particular, we predict the weather by generating many possible futures and performing statistical analysis. Style transfer will always be useful because it allows capitalists to capture more of a person and exploit them more fully, but it won’t ever be adopted purely so that the customer has a more pleasant experience. Composites with object detection (“filters”) in selfie-sharing apps aren’t added to allow people to express themselves and be cute, but to increase the total and average time that users spend in the apps. Capitalists can always use the Shmoo, or at least they’ll invest in Shmoo production in order to capture more of a potential future market.
So, imagine that we build miniature cloned-voice text-to-speech models. We don’t need to imagine what they’re used for, because we already know; Disney is making movies and extending their copyright on old characters, and amateurs are making porn. For every blind person using such a model with a screen reader, there are dozens of streamers on Twitch using them to read out donations from chat in the voice of a breathy young woman or a wheezing old man. There are other uses, yes, but capitalism will go with what is safest and most profitable.
Finally, yes, you’re completely right that e.g. smartphones completely revolutionized filmmaking. It’s important to know that the film industry didn’t intend for this to happen! This is just as much of an exaptation as capitalist recuperation, and we can’t easily plan for it because of the same difficulty in understanding how subsystems of large systems interact (y’know, plan interference).