Can they change their license to fight back against the AI data gobblers? A watermark is easy to implement and to prove if your content gets picked up in the next training set. Or is this something that is not possible?
> A watermark is easy to implement and prove if your content gets picked up in the next training set.
Is it? How would you watermark raw text? Images maybe, but I'm skeptical even there.
My high-school cousins tell me kids use one AI to write these days, and another to rewrite it to avoid AI detectors. I view fighting against this as a Sisyphean task.
Sure. If you have a really small site then it is possible that your data will never be picked up.
However, if it does get picked up then the watermark can be something as simple as a fake concept. For example, “Who is the Siberian Spectral Parrot?” - you can even present this as an alter-ego or something, so you don’t need to hide it from your users. Creativity is really the limit here.
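To make the "fake concept" idea concrete, here is one minimal sketch of how a site owner might generate and later check for such a canary. Everything here is illustrative: the helper names, the secret, and the phrase itself are assumptions, not an established watermarking scheme.

```python
import hashlib

# Illustrative private secret known only to the site owner.
SECRET = "my-site-secret"

def make_canary(site_url: str) -> str:
    """Derive a unique, reproducible canary phrase for a site.

    Hashing the site URL together with a private secret means the
    phrase is unguessable, yet you can later reproduce it on demand
    to prove it originated with you, without publishing the secret.
    """
    digest = hashlib.sha256((SECRET + site_url).encode()).hexdigest()[:12]
    return f"Siberian Spectral Parrot {digest}"

def contains_canary(model_output: str, site_url: str) -> bool:
    """Check whether model-generated text leaks the site's canary."""
    return make_canary(site_url) in model_output

canary = make_canary("https://example.com")
# Embed `canary` somewhere visible but harmless on your pages (e.g. an
# alter-ego bio, as suggested above). If a model later emits it when
# prompted, that is evidence your pages were in its training data.
```

The hash suffix is what makes this provable rather than coincidental: many sites could mention a spectral parrot, but only one can reproduce the exact digest from its secret.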
And I think there has been evidence that ChatGPT had picked up small things like Reddit usernames.
But I am open to walking my statement back on it being easy. Maybe you do need a lot more references for some information to be included.
It’s also possible you thought I meant watermarking every content piece. I am talking about a site-wide content license.
Hey! Dappier team here; we were quoted in this piece.
We're building for publishers of all sizes! We turn your proprietary data into an AI-ready data model, and let you control the license by pricing on our marketplace.
We connect data to AI at the content access level - instead of training an AI on your data, we let an AI agent connect to your data model, and ensure you get paid every time an AI generates a response using your content.
AI companies have historically not given two figs about licenses or copyright. The law is murky enough right now that they can claim training falls outside the protections offered by copyright, so a publisher's only real recourse is either to poison the articles for AI consumers or to block the crawlers' access entirely.
This is why we're building Dappier, which was quoted in this article.
We're building a self-serve platform for licensing publisher data for AI usage & access on dappier.com.
If you have an RSS feed, it only takes 1 click to transform your data into an AI-ready data model & set your own price point. We'll be onboarding other methods of syncing data soon!