> This led me to my next and (currently) final stop, Kullish, which searches through a number of link aggregation and discussion websites (including Reddit) for a URL before providing a single feed of comments from everywhere.
But Reddit, for instance, disallows everything in its robots.txt.
In this specific case, Reddit makes an API[1] available for developers.
And again, there is no crawling involved.
Crawling is a very specific, well-defined behavior. I'll assume in good faith that you aren't familiar with the definition of crawling, in which case you should probably become familiar with it before making inaccurate comments like these in the future.
If you had taken the time to do some relevant reading before writing your inaccurate comments, you would have seen:
> Our [Reddit] robots.txt is for search engines, not Data API users.
I was under the impression that Reddit had been charging for its API since the whole third-party client debacle some time ago, so I didn't think you would be using the API to power a free service.
If AI agents figure out how to buy a subscription and transfer money from their operators to me, they are more than welcome to scrape away.
[1]: https://lgug2z.com/articles/in-the-age-of-ai-crawlers-i-have...