
Clearly Copilot had permission to make (unmodified) copies, the same way GitHub's web server had permission to make (unmodified) copies. The lawsuit is about making partial copies without attribution.


In my non-lawyerly opinion, GitHub's terms of service (TOS) clearly state that the license users grant for uploaded works doesn't cover using the data to train an LLM, or any kind of model beyond those used to improve the hosting service:

>You grant us and our legal successors the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time

>This license does not grant GitHub the right to sell Your Content. It also does not grant GitHub the right to otherwise distribute or use Your Content outside of our provision of the Service, except that as part of the right to archive Your Content, GitHub may permit our partners to store and archive Your Content in public repositories in connection with the GitHub Arctic Code Vault and GitHub Archive Program.

https://docs.github.com/en/site-policy/github-terms/github-t...

I think the important questions are (1) whether "the Service" includes Copilot, and (2) whether GitHub is selling users' content with Copilot.

For (1), I'm unhappy to admit Copilot probably does fall under "the Service," which is nebulously defined as "applications, software, products, and services provided by GitHub." But I'll still say that users could not have agreed to this use while GitHub was training the Copilot model but hadn't yet announced it. At that time, a reasonable user would've believed GitHub's services only covered repository hosting, user accounts, and the extra features attached to those (issue trackers, organizations, etc.).

GitHub could defend itself on point (2) by saying it isn't selling the code, but rather a product that used the code as input. But does that differ much from selling an online service that relies on running user code? The code is input for their servers, and it doesn't need to be distributed as part of that questionable service. Even so, it would be a clear break from the TOS.


GitHub’s web server is not the same thing as Copilot, which needs separate permission.

GitHub didn’t just copy open source code; they copied everything without regard to license. As such, attribution, which might have permitted some copying, isn’t generally relevant.

Really, a public repo on GitHub doesn’t even mean the person uploading it owns the code. If they had needed to verify ownership before training, they couldn’t have started. Thus, by necessity, they must take the stance that copyright is irrelevant.



