AFAIK, using copyrighted data to train does not necessarily make the trained mod...

pabs3 · on April 15, 2021

The phrase is "toxic candy", not "toxic"; see the policy for what it means.

Most data is protected by copyright, but I assume you meant proprietary rather than copyrighted. Using proprietary data might not matter under copyright law, but it does matter in terms of the Debian machine learning policy and DFSG, because the non-free data cannot be shipped in Debian main and thus cannot be used to train a model shipped in main.

pabs3 · on April 15, 2021

Hmm, that case doesn't appear to be about ML, though. Could you explain how it is considered a precedent for ML?

donpark · on April 16, 2021

See https://towardsdatascience.com/the-most-important-supreme-co...

pabs3 · on April 18, 2021

Thanks. It's interesting that this only applies to countries with the concept of fair use, which is unfortunately not widespread.