As a non-lawyer, I am very suspicious of the claim that "Plaintiffs and the Clas...

belorn · on Nov 3, 2022

The common practice in copyright cases is to calculate damages based on the theoretical cost that the infringer would have paid if they had bought the rights in the first place. This method was used in the Pirate Bay case to calculate the damages caused by the site's founders.

They did not actually calculate damages in terms of lost movie tickets, or estimated versus actual sales numbers for game copies. When it came to pre-releases, where such a product wouldn't have been sold legally in the first place, they simply added a multiplier to indicate that the copyright owner wouldn't have been willing to sell.

For software code, another practice I have read about is to use the man-hours that rewriting the copyrighted code would cost. Using such calculations they would likely estimate the man hours based on number of lines of code and multiply that with the average salary of a programmer.
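
To make that arithmetic concrete, here is a rough sketch of the man-hours approach. Everything in it is an assumption for illustration: the function name, the lines-per-hour productivity figure, and the hourly rate are made up, and no court is known to use this exact formula.

```python
def rewrite_cost_estimate(lines_of_code: int,
                          lines_per_hour: float,
                          hourly_rate: float) -> float:
    """Estimate the cost of rewriting code as man-hours times hourly pay.

    All three inputs are hypothetical; a real estimate would need
    figures that both sides in a case accept.
    """
    if lines_per_hour <= 0:
        raise ValueError("lines_per_hour must be positive")
    man_hours = lines_of_code / lines_per_hour
    return man_hours * hourly_rate


# Made-up inputs: 10,000 lines, 10 lines written per hour, $50 per hour.
estimate = rewrite_cost_estimate(10_000, 10, 50.0)
print(f"Estimated rewrite cost: ${estimate:,.2f}")
```

The result scales linearly with both the productivity figure and the rate, so whichever salary gets picked dominates the outcome.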

karaterobot · on Nov 3, 2022

The one thing we can say with complete certainty is that most programmers who had their code used without permission will not receive very much money at all if this class action lawsuit is decided in their favor.

mike_d · on Nov 3, 2022

I don't care about the money. I support this because it will establish case law that other companies can't ignore licenses just by throwing AI somewhere in the chain.

If "I took your code and trained an AI that then generated your code" is a legal defense, the GPL and similar licenses all become moot.

bastardoperator · on Nov 3, 2022

But that's not what's happening here. Also, you grant GitHub a license.

https://docs.github.com/en/site-policy/github-terms/github-t...

"You grant us and our legal successors the right to store, archive, parse, and display Your Content"

Copilot displays content. Case closed.

mike_d · on Nov 3, 2022

Feel free to keep reading the next line down:

"This license does not grant GitHub the right to sell Your Content. It also does not grant GitHub the right to otherwise distribute or use Your Content"

sqeaky · on Nov 3, 2022

Money likely isn't the main goal (maybe it is for the lawyers); these are open source repos. Maybe they didn't consent to have their code used as training data, and that seems like the kind of thing consent should be needed for. Maybe the AI spitting out copied snippets without attribution is a violation of open source licensing.

heavyset_go · on Nov 3, 2022

I don't want money, I want the terms of my licenses to be adhered to.

whiddershins · on Nov 3, 2022

I believe there are statutory damages or penalties in many cases. At least with music and images.

michaelmrose · on Nov 3, 2022

So for iseven, can we take what a student might accept, say $20 an hour, multiply that by the one minute required to write it, and offer them 33 cents?

pmoriarty · on Nov 3, 2022

"Using such calculations they would likely estimate the man hours based on number of lines of code and multiply that with the average salary of a programmer."

The average salary of a programmer in which country?

So much programming is outsourced these days, and in some places programmers are very cheap.

belorn · on Nov 3, 2022

This is just my guess, but I think the judges' intention is not to calculate a true number. The reason they used the cost of publishing fees in the Pirate Bay case was likely to illustrate how the court distinguishes a legal publisher from an illegal one. The legal publisher would have bought the publishing rights, and since the Pirate Bay did not do this, the court used those publishing fees to illustrate the difference.

If the court wanted to distinguish between Microsoft using their own programmers to generate code and taking code from GitHub users, then the salary in question would likely be that of Microsoft programmers. It would then be used to illustrate what legal training data would look like compared to illegal training data.

imoverclocked · on Nov 3, 2022

Probably in the place where GitHub Copilot is used and where the court has authority.

kube-system · on Nov 3, 2022

Those damages are enumerated on pages 50-52. Remember, "damages" is being used in a legal sense here -- for a non-lawyer, you can interpret it more like "a dollar value on something someone did that was wrong". This is broader than the colloquial use of the word.

Sometimes damages are statutory, i.e. they have a fixed dollar amount written right into the law. This lawsuit references one such law: https://www.law.cornell.edu/uscode/text/17/1203

citilife · on Nov 3, 2022

Say I produce a licensed library. Someone can pay me $5/year per license. I keep the code private and compile the code before sending it to customers.

If you have Copilot trained on my code base (which was private), that then reproduces near replicas of my code, and then they sell it for $5/year...

Well, I'm eligible for damages.

cheriot · on Nov 3, 2022

> that then reproduces near replicas of my code

Copying a few lines is not the same as copying the whole thing. Sharing quotes from a book is not copyright infringement.

heavyset_go · on Nov 3, 2022

If your intent is to create a competing product for profit, chances are that won't be found as fair use, given that determining fair use depends on intent and how the content is used.

Using clips from a movie in a movie review is probably fair use.

Using clips from a movie in a knock-off of that movie for profit? Probably not fair use if it's not a parody.

Copilot is not like a movie reviewer using clips to review a movie. Copilot is like a production team for a movie taking clips from another movie to make a ripoff of that movie and selling it.

rolenthedeep · on Nov 3, 2022

Interesting analogy.

Consider every repo on GitHub to be a movie. Copilot is taking individual frames out of every movie on GitHub and compositing them into a new film.

I think most of us would agree that individually, each frame is copyrighted. But what if you take one frame from a million different movies and put them in an order that produces a new coherent movie?

The core question we need to settle in court is: does the new movie become its own copyrightable work, or is it plagiarism?

teddyh · on Nov 4, 2022

The question was basically settled with music sampling:

https://en.wikipedia.org/wiki/Sampling_(music)#Legal_and_eth...

I.e. any use without permission is illegal.

az226 · on Nov 5, 2022

You're confusing the end user's copyright infringement with Copilot's alleged infringement.

Copilot is fair use and transformative, unless there is an open source Copilot alternative that Copilot is trained on. Only then would it be competing, and it's easy for GitHub or OpenAI to exclude the repos of Copilot alternatives from the training set.

cheriot · on Nov 4, 2022

> Copilot is like a production team for a movie taking clips from another movie to make a ripoff of that movie and selling it.

I can't think of a 5 line snippet I've written or read that makes sense to claim ownership of. They don't stand on their own in the way even a 30s movie clip does.

bawolff · on Nov 3, 2022

I don't think that's comparable. For starters, it's not just the length of a quote that makes it fair use, but the way quotes are used, i.e. to engage in commentary.

test098 · on Nov 3, 2022

> Sharing quotes from a book is not copyright infringement.

It is if I take those quotes and publish them as my own in my own book.

yawnxyz · on Nov 3, 2022

I don't think this is possible for Copilot to do?

(If it was, please tell me how, since that would save me $5/year across multiple libraries..!)

sigzero · on Nov 3, 2022

I don't believe it does anything with private repos and that isn't what is being alleged.

mdaEyebot · on Nov 3, 2022

It's the license that matters, not whether the code is visible on Microsoft's website.

Code which anybody can view is called "source available". You aren't necessarily allowed to use the code, but some companies will let their customers see what is going on so they can better integrate the code, understand performance implications, debug and fix unexpected issues, etc. The customers would probably face significant legal risks if they took that code and started to sell it.

"Open source" code implies permission to re-use the code, but there is still some nuance. Some open-source licenses come with almost no restrictions, but others include limiting clauses. The GPL, for example, is "viral": anybody who uses GPL code in a project must also provide that project's source code on request.

What do you think the chances are that Microsoft would surrender the Copilot codebase upon receipt of a GPL request?

joxel · on Nov 3, 2022

But that isn’t what is being alleged

toomuchtodo · on Nov 3, 2022

The parallels to music sampling are somewhat humorous. Where is fair use vs misappropriation? To be discovered!

schappim · on Nov 3, 2022

Soon we'll have to use Mechanical Turk[0] to identify existing open source code, similar to what Girl Talk did with "Feed the Animals"[1].

Unrelated, how is it that Mechanical Turk was never truly integrated w/ AWS?

[0] https://www.mturk.com/

[1] https://waxy.org/2008/09/girl_turk/

TheCoelacanth · on Nov 3, 2022

Aren't there statutory damages for copyright infringement, i.e. there is a presumption that each work infringed is worth at least a certain amount without proving actual damages?

BenjiWiebe · on Nov 4, 2022

Well, the code I write is under the GPL, at least when I remember to attach an explicit license to it or if anyone asks me for permission to use it.

If someone wants to use it commercially without complying with the GPL, I have no problem with allowing that, for a price.

Either use the code freely and openly, or pay me so you can make money on my code.

Copilot could conceivably allow someone to use my code commercially (and in a closed manner) without negotiating with me, the copyright holder.