As a non-lawyer, I am very suspicious of the claim that "Plaintiffs and the Class have suffered monetary damages as a result of Defendants’ conduct." Flagrant disregard for copyright? Sure, maybe. The output of the model is subject to copyright? Who knows! But the copyright holders being damaged in some way? Seems doubtful. The best argument I could think of would be "GitHub would have had to pay us for this, and they didn't pay us, so we lost money," but that'd presumably work out to pennies per person.
The common practice in copyright cases is to calculate damages based on the theoretical cost that the infringer would have paid if they had bought the rights in the first place. This method was used in the Pirate Bay case to calculate the damages caused by the site's founders.
They did not actually calculate damages in terms of lost movie tickets, or of estimated versus actual sales of game copies. When it came to pre-releases, where such products wouldn't have been sold legally in the first place, they simply added a multiplier to indicate that the copyright owner wouldn't have been willing to sell.
For software code, another practice I have read about is to use the man-hours that rewriting the copyrighted code would cost. Using such calculations they would likely estimate the man hours based on number of lines of code and multiply that with the average salary of a programmer.
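To make the rewrite-cost idea concrete, here is a rough sketch of that arithmetic. The function name, the productivity figure, and the hourly wage are all made-up assumptions for illustration; nothing here reflects how a court would actually pick the inputs.

```python
def rewrite_cost(lines_of_code: int, lines_per_hour: float, hourly_wage: float) -> float:
    """Estimate what it would cost to rewrite code from scratch.

    All three inputs are hypothetical: lines_per_hour is an assumed
    productivity rate and hourly_wage an assumed programmer wage.
    """
    if lines_per_hour <= 0:
        raise ValueError("lines_per_hour must be positive")
    hours = lines_of_code / lines_per_hour  # estimated man-hours
    return hours * hourly_wage              # man-hours times wage


# Hypothetical inputs: 10,000 lines, 10 lines per hour, $50 per hour.
print(rewrite_cost(10_000, 10, 50.0))  # 50000.0
```

The result swings entirely on the two assumed rates, which is why the choice of wage matters so much.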
The one thing we can say with complete certainty is that most programmers who had their code used without permission will not receive very much money at all if this class action lawsuit is decided in their favor.
I don't care about the money. I support this because it will establish case law that other companies can't ignore licenses as long as they throw AI somewhere in the chain.
If "I took your code and trained an AI that then generated your code" is a legal defense, the GPL and similar licenses all become moot.
"This license does not grant GitHub the right to sell Your Content. It also does not grant GitHub the right to otherwise distribute or use Your Content"
Money likely isn't the main goal (maybe it is for the lawyers); these are open source repos. Maybe they didn't consent to have their code used as training data, and that seems like the kind of thing consent should be needed for. Maybe the AI spitting out copied snippets without attribution is a violation of open source licensing.
So for iseven, can we take what a student might accept, say $20 an hour, multiply that by the one minute required to write it, and offer them 33 cents?
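For what it's worth, the 33 cents checks out. A quick sketch, assuming a $20 hourly rate and one minute of work as in the comment above (the `payout` helper is just an illustrative name):

```python
from decimal import Decimal, ROUND_HALF_UP


def payout(hourly_rate: str, minutes: int) -> Decimal:
    """Pro-rate an hourly rate over a number of minutes, rounded to cents."""
    amount = Decimal(hourly_rate) * minutes / 60
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# $20 an hour for one minute of work.
print(payout("20", 1))  # 0.33
```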
"Using such calculations they would likely estimate the man hours based on number of lines of code and multiply that with the average salary of a programmer."
The average salary of a programmer in which country?
So much programming is outsourced these days, and in some places programmers are very cheap.
This is just my guess, but I think the intention from the judges is not to actually calculate a true number. The reason they used the cost of publishing fees in the Pirate Bay case was likely to illustrate how the court distinguished between a legal publisher and an illegal one. The legal publisher would have bought the publishing rights, and since the Pirate Bay did not do this, the court used those publishing fees to illustrate the difference.
If the court wanted to distinguish between Microsoft using their own programmers to generate code and taking code from GitHub users, then the salary in question would likely be that of Microsoft programmers. It would then be used to illustrate what a legal training data set would look like compared to an illegal one.
Those damages are enumerated on pages 50-52. Remember, "damages" is being used in a legal sense here -- for a non-lawyer, you can interpret it more like "a dollar value on something someone did that was wrong". This is more broad than the colloquial use of the word.
If your intent is to create a competing product for profit, chances are that won't be found as fair use, given that determining fair use depends on intent and how the content is used.
Using clips from a movie in a movie review is probably fair use.
Using clips from a movie in a knock-off of that movie for profit? Probably not fair use if it's not a parody.
Copilot is not like a movie reviewer using clips to review a movie. Copilot is like a production team for a movie taking clips from another movie to make a ripoff of that movie and selling it.
Consider every repo on GitHub to be a movie. Copilot is taking individual frames out of every movie on GitHub and compositing them into a new film.
I think most of us would agree that individually, each frame is copyrighted. But what if you take one frame from a million different movies and put them in an order that produces a new coherent movie?
The core question we need to settle in court is: does the new movie become its own copyrightable work, or is it plagiarism?
You're confusing the end-user's copyright infringement with Copilot's alleged infringement.
Copilot is fair use and transformative, unless there is an open source Copilot alternative that Copilot is training on. Only then would it be competing, and it's easy for GitHub or OpenAI to exclude the repos of Copilot alternatives from the training set.
> Copilot is like a production team for a movie taking clips from another movie to make a ripoff of that movie and selling it.
I can't think of a 5 line snippet I've written or read that makes sense to claim ownership of. They don't stand on their own in the way even a 30s movie clip does.
I don't think that's comparable. For starters, it's not just the length of a quote that makes it fair use, but the way quotes are used, i.e. to engage in commentary.
It's the license that matters, not whether the code is visible on Microsoft's website.
Code which anybody can view is called "source available". You aren't necessarily allowed to use the code, but some companies will let their customers see what is going on so they can better integrate the code, understand performance implications, debug and fix unexpected issues, etc. The customers would probably face significant legal risks if they took that code and started to sell it.
"Open source" code implies permission to re-use the code, but there is still some nuance. Some open-source licenses come with almost no restrictions, but others include limiting clauses. The GPL, for example, is "viral": anybody who distributes a project that incorporates GPL code must also make that project's source code available to the people who receive it.
What do you think the chances are that Microsoft would surrender the Copilot codebase upon receipt of a GPL request?
Aren't there statutory damages for copyright infringement, i.e. there is a presumption that each work infringed is worth at least a certain amount without proving actual damages?