
Is anything known about to what extent, if any, non-public-domain books are used to train LLMs?

For example, the Google Books project digitized quite a few texts, but I've never heard whether Google considers these fair game for training Bard.

Most of the copyright discussions I’ve seen have been around images and code but not much about books.

This seems to become more relevant as things scale up, as this article indicates.



>we found 72,508 ebook titles (including 83 from Stanford University Press) that were pirated and then widely used to train LLMs despite the protections of copyright law

https://aicopyright.substack.com/p/the-books-used-to-train-l...



