
To state the obvious, IANAL.

> This is indeed a weak point in the contract approach: people can't be bound by a contract they never knew about nor agreed to.

"Prominent notice" is important in the terms of use approach. Many terms of use claims have been dismissed because there was failure to give prominent notice - however, there have been successes, such as Hubbert vs Dell[1], where an appeals court reversed the trial court's decision and ruled in Dell's favour on the basis that they had given prominent notice of terms.

[1]:https://caselaw.findlaw.com/court/il-court-of-appeals/124479...

There are other potential legal avenues besides contract law, such as unjust enrichment[2], which, according to Wikipedia, is analysed as:

    1. Was the defendant enriched?
    2. Was the enrichment at the expense of the claimant?
    3. Was the enrichment unjust?
    4. Does the defendant have a defense?
    5. What remedies are available to the claimant?
[2]:https://en.wikipedia.org/wiki/Restitution_and_unjust_enrichm...

Since AI companies are likely to be enriched (by a large amount), and that enrichment could be at the expense of a claimant if the AI (re)produces the claimant's work or closely related works derived from it (or, in a class action suit, potentially many works), an AI company which violates a terms of use would have to argue #3 and #4 in its favour - that the enrichment is not unjust, and that it has a defense.

There are certainly arguments that enrichment from AI trained on copyrighted works is unjust. Including terms of use which specifically prohibit such training would work in the claimant's favour: the AI company would then have to defend its decision to ignore those terms and claim that doing so was not unjust.

Arguably, if the AI is sufficiently "intelligent", it should be able to make sense of such terms, and it could be used to filter those works out of the training data (unless specifically prompted to ignore them). If AI companies are not filtering the data they aggregate, there's a potential argument for negligence.
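
As a rough illustration of how mechanical such filtering could be, here's a minimal Python sketch that skips pages containing an explicit "no AI training" clause. The marker phrases and the single-page fetch are hypothetical simplifications, not any company's actual pipeline:

    # Minimal sketch: skip pages whose text contains an explicit
    # "no AI training" clause. The marker phrases below are
    # hypothetical examples, not a standardized vocabulary.
    import urllib.request

    OPT_OUT_MARKERS = (
        "may not be used for ai training",
        "use in ai training is prohibited",
    )

    def allowed_for_training(url: str) -> bool:
        with urllib.request.urlopen(url) as resp:
            text = resp.read().decode("utf-8", errors="replace").lower()
        return not any(marker in text for marker in OPT_OUT_MARKERS)

    candidate_urls = ["https://example.com/licensed-article"]
    corpus = [url for url in candidate_urls if allowed_for_training(url)]

A keyword check like this is obviously crude, but the point stands: even without a "sufficiently intelligent" model, honouring such terms is technically feasible.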

There are some efforts being made, such as the IETF aipref working group[3], which could standardize the way AI companies collect and filter training data. Creative Commons has a related effort called "CC signals". These could also be helpful in a future claim if AI companies ignore the declared preferences.

[3]:https://datatracker.ietf.org/wg/aipref/about/
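
On the publisher side, here's a hypothetical Python sketch of serving a page with an aipref-style preference attached as an HTTP response header. The "Content-Usage: train-ai=n" header follows the general direction of the working group's drafts, but the field name, vocabulary, and value syntax here are assumptions that may not match the final standard:

    # Hypothetical sketch: declare an AI-training opt-out via an
    # HTTP response header. The header name and value syntax are
    # drawn from the aipref drafts' direction and may differ in
    # the final standard.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class OptOutHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Usage", "train-ai=n")
            self.end_headers()
            self.wfile.write(b"<p>My article, not for AI training.</p>")

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), OptOutHandler).serve_forever()

A crawler honouring the preference would read this header (or the equivalent robots.txt declaration) and drop the page from its training set.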

So it seems to me that having a clause in your license and/or terms of use which prohibits use in AI training, giving prominent notice of those terms, and indicating your AI preferences through such protocols is better than not having the clause - because if you don't have it, the defendant can simply claim in their defense that "there was nothing in the terms which said we can't".

It's up to us to take a proactive approach to protecting our works from AI, even if there is not yet any solid legal basis or case law, so I applaud OP's efforts even if the license doesn't carry any weight. If we don't attempt to protect our works from use in AI training, the AI companies will win at everyone else's expense.

Creators should decide how their works are used, and we need to normalize creators deciding whether or not use in AI training data is permitted.
