This thread makes it really clear that we need laws specifically focused on generative AI. Looking for an answer in current copyright law is like expecting the First Amendment to have a subsection specifically devoted to social media networks.
There are; they’re called TDM exceptions: https://www.reedsmith.com/en/perspectives/ai-in-entertainment-and-media/2023/06/text-and-data-mining-around-the-globe
Here we go again with all the reddit bros pretending statistical regurgitation “is just like a human, trust me, don’t you get inspired by reading stuff as well”
Oh, well, whatever… Soon enough all the data sets will be useless, maybe it will be time these companies hire writers for $5 an hour and lock them in a room to produce some unspoiled input.
Every other week someone tries.
A lawsuit is basically an angry letter with a filing fee. Whether they can actually win is another question entirely.
Going to be fun to see the influx of “case dismissed” articles in a few months though.
and academic journals without their consent.
Good.
Elsevier and their ilk are pure parasites. They take work paid for by public funding and charge scientists to publish. They do basically nothing: they don’t review the work, they don’t do formatting, they don’t even so much as check for spelling mistakes. They exist purely because of a quirk of history and the difficulty of coordinating a move away from assessing academics by the prestige and impact factor of their publications.
They’re parasitic organisations who try to lock up public information.
Academic journals should be free and available for everyone, they shouldn’t be getting fed into AI without permission.
You do realize you’re contradicting yourself, right?
Nope. Journals being accessible to everyone in an archive does not mean AI models have carte blanche to use them for training.
I understand what you’re going for, but that might be tricky legally. What special status does the archive have that allows it to make all that information accessible, that an AI model wouldn’t have?
The law is fucked and needs to catch up to AI stuff. DMCA, fair use etc is not built to handle scraping on the level AI does.
Academic journals should be free and available for everyone, they ~~shouldn’t~~ should be getting fed into AI without permission.
Here, FTFY. I don’t know if you recognize the dissonance between the first and the second part of your sentence.
There is no dissonance. I don’t think AI models should be getting stuff, because they’re not a public archive. They are using it to build a data model. There’s a difference between commercial use, which is the goal of AI companies, and spreading knowledge and research.
That’s not dissonance.
So your opinion is also that search engines should pay websites for the content they index? Explain to me how one is different from the other.
Explain to me how one is different from the other.
Man, he literally said it. Can you read? Wait sorry, you’re an AI techbro. You barely know how to write a prompt.
The goal of AI companies is to make money and give nothing back to the data that fed their model. Search indexes have a mutually beneficial relationship with whatever they index that drives traffic to websites.
I’m not sure I can make it any easier. Maybe ask chatgpt if you still don’t get it.
Feeding it into AIs is one of the things countless researchers would love to do with scientific literature in order to fuel more discoveries for the benefit of everyone,
but the parasitic journal owners try to heavily restrict what you can do with the text even after you’ve paid through the nose to publish and paid through the nose for subscriptions.
You’re speaking for the researchers. What they want is a free, public archive, which already exists (though not legally). AI is not there to make an archive.
Well, if it’s just so people have to pay openAI to get access to knowledge instead of having to pay Elsevier, it’s not really what I personally want to be honest…
You managed to contradict yourself in one sentence.
I don’t have a dog in this fight nor do I know the specifics of the relevant law here, but I would note that Susman Godfrey is probably the best litigation-focused law firm in America and it’s unlikely that they’re just moronically accepting a case without strong support in the law. Look at their track record and their attorney bios; these people absolutely do not screw around.
Distinguished lawyers and professors have done the same in the past, I wouldn’t rule it out.
People, particularly outside tech, have a tendency to imagine the chatbot is like a person they can ask to testify.
Elsevier and their ilk are pure parasites.
But Microsoft is cool and good.
Microsoft and OpenAI may scrape stuff but at least they don’t then try to lock everyone else out from being able to read the original.
A big step up from Elsevier
At this point everyone knows that these LLMs don’t know what they were trained on.
Just to emphasize this: machine learning algorithms don’t know anything. All training does is adjust constants in an equation.
Like Jon Snow, it knows nothing, and if you ask it for something complicated, it will put that on full display.
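To make the “adjusting constants in an equation” point concrete, here’s a toy sketch (hypothetical Python, nowhere near the scale of a real LLM): “training” a one-constant model y = w·x is nothing more than nudging the number w to shrink the error.

```python
# Toy illustration: "learning" y = 2x is just adjusting one constant.
# Hypothetical sketch; real LLMs apply the same idea to billions of constants.
data = [(1, 2), (2, 4), (3, 6)]  # inputs and targets for y = 2x
w = 0.0    # the single "constant in an equation"
lr = 0.05  # learning rate: how big each adjustment is

for _ in range(200):
    for x, y in data:
        pred = w * x
        grad = 2 * (pred - y) * x  # derivative of squared error w.r.t. w
        w -= lr * grad             # calibrate the constant

print(round(w, 3))  # converges to 2.0
```

Nothing in that loop “knows” anything; it only reduces a number measuring how wrong the equation currently is.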
Good for them. I wish them justice.
Justice would be them having to pay the defense’s legal fees for filing a frivolous suit.
If you are buying the hoax that genAI’s data laundering scheme is fair use, I would like you to spare me the frivolous argument!
It is truly depressing to see so many people watch massive mega corporations practice unrestrained access to our property and personal data, then use that to replace our jobs to fill their own pockets, and be dumb enough to take their side.
If you are buying the hoax that genAI’s data laundering scheme is fair use
Because it is. No legal scholar seriously doubts that argument. It comfortably meets all the requirements.
It is truly depressing to see so many people watch massive mega corporations practice unrestrained access to our property and personal data
Lmao, and you think abolishing fair use is somehow a win for people over corporations? Now I know you’re just trolling.
Because it is. No legal scholar seriously doubts that argument. It comfortably meets all the requirements.
Rationalization placed on the big corporations having good lawyers.
Lmao, and you think abolishing fair use is somehow a win for people over corporations? Now I know you’re just trolling.
You seriously think that’s what I’m arguing for? Or are you composing a strawman to comfort yourself? Asking for data laundering scams to be regulated so they don’t replace the working class’s jobs the moment it makes a mega corporation a single buck should not be insane. It doesn’t mean abolishing fair use. Helpful idiots like you are what these companies are depending on, though.
I thought I told you to spare me the frivolous argument … go bootlick somewhere else.
Rationalization placed on the big corporations having good lawyers.
I’m not talking about just OpenAI’s lawyers. This is actually a very clear-cut matter, despite your attempts to throw doubt on it.
You seriously think that’s what I’m arguing for?
Quite literally, yes. Training an AI model is rather clearly fair use, so to make that illegal, you need to either abolish fair use, or severely limit it from its current scope.
Asking for data laundering scams to be regulated so they don’t replace the working class’s jobs the moment it makes a mega corporation a single buck
And I’m sure you would have also suggested that we ban the automated loom for putting weavers out of business. There’s a reason the Luddites lost.
What is it with you AI circlejerkers and constantly calling people Luddites?
What is it with you AI circlejerkers and constantly calling people Luddites?
Calling a spade a spade. You have a better term for someone who wants to hold back technology because it threatens some small population in an existing industry?
When you don’t know the difference between stealing and copying
Why is this downvoted?
Books good
AI bad

Upvoted to the left
Learn to code
Makes you wonder if they’re now compelled to release their models if they used any GPL-licensed material
You’re a moron
Idiots: “Sam Altman was fired because of Super AI for some reason”
Normal people: “OpenAI now has several lawsuits that might make it impossible to monetize without being forced to pay billions, and it’s doubtful that Sam told Microsoft, before he sold his chatbot, that he was stealing authors’ works”
And that explains why he’s now back? And has had MS’s support the entire time?
Well, it was either fire him and lose all your money by losing your employees or keep him and try to salvage what you can.
Microsoft has always wanted to keep him. The OpenAI board fired him for ideological reasons/power struggle, realized they would be killing the entire company, and decided to salvage the company even at the cost of their jobs.
Why is it okay for a human to read and learn from copyrighted materials, but it’s not OK for a machine to do so?
Machines don’t have inspiration. They only do advanced versions of copy paste
It’s funny you say that because now that I think about it, inspiration basically is advanced copy and paste
Except a human gets inspiration from their environment, their life, their emotions. Unique experiences.
A bot only gets “inspiration” from other people’s work. And if that work is copyrighted… The author deserves compensation
This is an oversimplification of both human cognition and how machines work.
Your argument boils down to the fact that humans have a more diverse data set. This is a terrible legal basis.
What are you saying… It’s not about the amount of information, it’s about whether the source of information is copyrighted work or not.
Monet cultivated his own garden and painted the famous water lilies. That is 100% original work. No argument possible.
Your environment, emotions, and experiences are simply different forms of data and sources to pull from. Most stories are in some way inspired by other stories.
Why is it okay to own furniture, but not people?
By the way:
it’s not OK for a machine to do so
There are no machines that read and learn. “machine learning” is a technical term that has nothing to do with actual learning.
There are no machines that read and learn.
That’s exactly what large language models do.
That’s exactly what large language models do.
I can see how you would come to that conclusion, given that you clearly are incapable of either.
What’s the connection between owning slaves and using computer tools? I don’t really follow this jump in logic.
https://en.m.wikipedia.org/wiki/Master/slave_(technology)
Don’t quite agree with the above poster but this is the tool they’re referring to and they’re making the argument that it is a metaphor/just the name of the tool and there isn’t a direct connection.
I think they were talking about people slaves, not computer networks. The person above them asked why humans can learn from copyright materials, but machines aren’t allowed to. The next person asked why we can own furniture but not people. To me this seems like they are saying we don’t own slaves for the same reason computer programs shouldn’t be allowed to learn from copyright materials. I’d say we don’t own slaves because as a society we value and believe in individuality, personal choice, and bodily autonomy, and I don’t see how these relate to dictating what content you train computer models on.
Have you ever considered the possibility that unliving objects are not, in fact, people?
I guess you think “neural networks” work nothing like a brain right?
Of course machines can read and learn, how can you even say otherwise?
I could give an LLM an original essay, and it will happily read it and give me new insights based on its analysis. That’s not a conceptual metaphor; that’s bona fide artificial intelligence.
I think anyone who thinks neural nets work exactly like a brain at this point in time is pretty simplistic in their view. Then again, you said “like a brain”, so you’re already into metaphor territory, so I don’t know what you’re disagreeing with.
Learning as a human and learning as an LLM are just different philosophical categories. We have consciousness, we don’t know if LLMs do. That’s why we use the word “like”. Kind of like, “head-throbbed heart-like”.
We don’t just use probability. We can’t parse 10,000,000-parameter spaces. Most people don’t use linear algebra.
A simulation of something is not equal to that something in general.
What? We as humans literally learn through pattern recognition. How is it different from what a machine is doing? Of course it is not exactly the same process our brains use, but it is by no means a “metaphor”.
I fail to see how training an LLM with the material I choose is any different than me studying that material. Artists are just mad I can make awesome pictures on my graphics card.
Neural networks aren’t literally bundles of biological neurons but that doesn’t mean they’re not learning.
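For what it’s worth, the “loosely inspired by neurons” relationship is easy to see in how small an artificial neuron actually is. A hypothetical minimal sketch: a weighted sum squashed through a sigmoid is the entire unit.

```python
import math

# Hypothetical minimal "artificial neuron": a weighted sum of inputs passed
# through a squashing function -- loosely inspired by, not a copy of, biology.
def neuron(inputs, weights, bias):
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))  # sigmoid activation, output in (0, 1)

# Example firing: two inputs, two weights, one bias (all made-up values).
out = neuron([1.0, 0.5], [0.8, -0.4], 0.1)
print(out)  # a value strictly between 0 and 1
```

Stacking millions of these and tuning their weights is what “learning” means for the network; whether that counts as learning in the human sense is the philosophical question being argued above.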
Pretty sure humans paid for the materials. That’s the whole point. Authors have to be compensated for their work.
Homie is in /r/books and has never heard of a library
Because human beings have rights and machines don’t and shouldn’t. Humans read for enjoyment and self fulfillment. These AI machines only read for the purpose of regurgitating a soulless imitation of the original. Not even remotely similar.
This is going to go the way of the Silverman case. One quote from that judge:
“This is nonsensical,” he wrote in the order. “There is no way to understand the LLaMA models themselves as a recasting or adaptation of any of the plaintiffs’ books.”
The Silverman case isn’t over. The judge took the position that the outputs themselves are not infringement, as I think most people agree since it is a transformation, but the core of the case is still ongoing: that the dataset used to train these models contained their copyrighted work. Copying is one of the rights granted to copyright holders and, unlike the Google case a few years back, this is for a commercial product and the books were not legally obtained. Very different cases. I would be surprised if Silverman and the others lost this lawsuit.
The Silverman case isn’t over.
It is with respect to that argument. The claim in question was thrown out.
The remaining claim is unrelated.
Copyright is more about distribution and deprivation than copying.
There is absolutely nothing preventing me from sitting down and handwriting the entirety of the LOTR in calligraphic script.
I can even give that copy to other people, as it is a “derivative work,” and I’m not attempting to profit from it.
There’s not even anything preventing me from scanning every page and creating a .pdf file for personal use, as long as I don’t distribute it.
Hell, the DMCA even allows me to rip a movie as long as I’m keeping it for personal use.
I don’t see anything here that cannot be argued against with fair use. The case is predicated upon the idea that if you give it the correct prompts, it’ll spit out large amounts of copyrighted text.
If you were describing that as an interaction with a person, you’d call that coercion and maybe even entrapment.
The intent of the scraping was not explicitly distribution.
Just to clarify most people do not agree. A lot of people are explicitly arguing that.
Get ‘em!
I’m so surprised at the number of people defending AI in this subreddit. It truly makes me feel like we failed as a species. I’m not a writer, nor an artist or musician, but art and culture have walked hand in hand through human history. I struggle to understand why we aren’t more protective of it and instead just hand out thousands of years of human tradition to machines. Just because we could doesn’t mean we should.
Both Meta and OpenAI have been clear about pirating thousands of books for their training sets, so it’s not exactly surprising that lawsuits are following.
Good. It is theft. Data is not free