I want to create a bubble of a space that’s free from the direct influence of AI. I believe that choice should exist, just like the choice to not be indexed by search engine bots over the web.
>> I want to create a bubble of a space that’s free from the direct influence of AI.
It seems unlikely you will achieve that goal online, since there is no way (online) to differentiate between human and bot. (It's unclear what constitutes AI in your mind, but clearly a "dumb" crawler can gather information that is then scanned by an LLM.)
Of course nothing limits you to creating this space online. Think about creating such a space offline (where identifying humans is easier).
>> I believe that choice should exist,
Of course the choice already exists. There are no LLMs at my farmers market.
Equally, online you can choose not to use search engines, LLMs, social media, Copilot, GitHub, or any other tech you choose not to use. (Expanding that bubble beyond yourself may be harder.)
I'm not on social media. Neither are lots of other people. We tend not to socialise online.
>> just like the choice to not be indexed by search engine bots over the web.
I fear that choice does not exist. You can certainly -indicate- that choice via robots.txt, but you don't really get to "enforce" that choice, much less do you have an expectation that search engines are universally respecting that choice.
Let me say this next bit with respect. I say it with kindness, not malice. It's easier on you mentally if you fight battles you can win. At this point trying to define, much less live, a life unaffected by AI is like a cyclist railing against the use of cars [1]. Of course you can arrange your life without a car. Of course you can socialise with like-minded folk. But carving out spaces where cars are formally banned is rare.
[1] Yes, I'm aware there are places that are less car-dependent than the US. Yes, in Amsterdam there are a lot of bikes. But bikes tend to share the road with cars - there are very few car-free zones.
robots.txt is actually a really useful way to tell an attacker where to look for juicy content that doesn't want to be indexed, but following it is entirely voluntary. It's easy to imagine a dark web search engine that only has that content.
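To make that concrete, here's a rough sketch of how little effort it takes to read the "don't look here" list out of a robots.txt. The file contents and the `disallowed_paths` helper are made up for illustration; a real file would just be fetched from `/robots.txt` on the target site.

```python
# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/reports/
Allow: /
"""


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        # Strip trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


print(disallowed_paths(ROBOTS_TXT))
```

Anything listed there is public knowledge the moment the file is published, so it's a poor place to "hide" things.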
If you want your stuff to be off-limits in the same voluntary way, but for OpenAI training, just block GPTBot in your robots.txt.
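For reference, a minimal sketch of what that rule looks like and how a well-behaved crawler would interpret it, using Python's standard `urllib.robotparser`. The example.com URL, the `SomeOtherBot` name, and the `may_fetch` helper are placeholders I made up; `GPTBot` is the user agent being blocked.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that turns away GPTBot and allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent: str, url: str) -> bool:
    """What a crawler that chooses to honour robots.txt would conclude."""
    return parser.can_fetch(user_agent, url)


# Placeholder URL and bot name, for illustration only.
print(may_fetch("GPTBot", "https://example.com/post/1"))
print(may_fetch("SomeOtherBot", "https://example.com/post/1"))
```

Note the check runs on the crawler's side. A bot that never calls it fetches the page anyway, so this is a request, not a barrier.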
A bit snarky, but if you don't think about or use what you don't want AI to scan, then there's no possibility of AI scanning or getting the info you don't want it to have access to.
Of course, in order to make sure you're not 'thinking about things AI would scan/get access to', you have to think about things AI would scan/get access to.