
Would you like to propose an alternative solution that meets their needs and fits their budget?


Anubis has a 'slow' and a 'fast' mode [1], with fast mode selected by default. It used to be so fast that I rarely had time to read anything on the challenge page. I don't know why it's slower now - it could be that they're using the slower algorithm, or the algorithm itself may have become slower. Either way, it shouldn't be too hard to swap in a different algorithm or make the required work a parameter. This of course has the disadvantage of making it easier for the scrapers to get through.

[1] https://anubis.techaro.lol/docs/admin/algorithm-selection


The DIFFICULTY environment variable already lets you configure how much work the challenge requires; each extra level increases the expected number of iterations exponentially.

The fast/slow selection still applies, but if you raise the difficulty, even the fast algorithm will take some time.
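
As a rough illustration of why the difficulty setting dominates the solve time - a minimal sketch in Go, assuming the usual Anubis-style scheme where the SHA-256 hash must start with a given number of zero hex digits, so each extra level multiplies the expected work by 16; the challenge string and difficulty below are made up:

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "strconv"
        "strings"
    )

    // solve finds a nonce whose SHA-256(challenge + nonce) starts with
    // `difficulty` zero hex digits; expected work is 16^difficulty hashes.
    func solve(challenge string, difficulty int) (int, string) {
        prefix := strings.Repeat("0", difficulty)
        for nonce := 0; ; nonce++ {
            sum := sha256.Sum256([]byte(challenge + strconv.Itoa(nonce)))
            hash := hex.EncodeToString(sum[:])
            if strings.HasPrefix(hash, prefix) {
                return nonce, hash
            }
        }
    }

    func main() {
        // Difficulty 4 needs ~65k hashes on average; difficulty 5 needs ~1M.
        nonce, hash := solve("example-challenge", 4)
        fmt.Println(nonce, hash)
    }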


What about a static cache for anyone not logged in, and only doing this check when you are authenticated, since authentication is what gives access to editing pages?

edit: Because HN is throwing "you're posting too fast" errors again:

> That falls short of the "meets their needs" test. Authenticated users already have a check (i.e., the auth process). Anubis is to stop/limit bots from reading content.

The Arch Wiki is a high-value target for scraping, so they'll just solve the Anubis challenge once a week. It's not going to stop them.
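
For what it's worth, here's a minimal sketch of that split as a small Go reverse proxy, just to make the idea concrete - the backend address and session cookie name are placeholders, and a real deployment would more likely lean on nginx/Varnish or the wiki software's own caching. Anonymous GET requests are answered from a cache; anything carrying a session cookie (or a write) goes straight to the backend, where the auth check applies.

    package main

    import (
        "io"
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
        "sync"
    )

    // Placeholder values; a real deployment would use the wiki's actual
    // backend address and session cookie name.
    const (
        backendURL    = "http://127.0.0.1:8080"
        sessionCookie = "wiki_session"
    )

    type cacheEntry struct {
        body        []byte
        contentType string
    }

    func main() {
        target, _ := url.Parse(backendURL)
        proxy := httputil.NewSingleHostReverseProxy(target)

        var mu sync.Mutex
        cache := map[string]cacheEntry{}

        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            // Authenticated (editing) traffic goes to the backend and its checks.
            if _, err := r.Cookie(sessionCookie); err == nil || r.Method != http.MethodGet {
                proxy.ServeHTTP(w, r)
                return
            }

            // Anonymous readers get the cached copy if we have one.
            mu.Lock()
            entry, ok := cache[r.URL.Path]
            mu.Unlock()
            if ok {
                w.Header().Set("Content-Type", entry.contentType)
                w.Write(entry.body)
                return
            }

            // Cache miss: fetch once from the backend, store, and serve.
            // (A real cache would also check the status code and expire entries.)
            resp, err := http.Get(backendURL + r.URL.Path)
            if err != nil {
                http.Error(w, "backend unavailable", http.StatusBadGateway)
                return
            }
            defer resp.Body.Close()
            body, _ := io.ReadAll(resp.Body)

            mu.Lock()
            cache[r.URL.Path] = cacheEntry{body: body, contentType: resp.Header.Get("Content-Type")}
            mu.Unlock()

            w.Header().Set("Content-Type", resp.Header.Get("Content-Type"))
            w.Write(body)
        })

        log.Fatal(http.ListenAndServe(":8000", nil))
    }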


> The Arch Wiki is a high-value target for scraping, so they'll just solve the Anubis challenge once a week. It's not going to stop them.

The goal of Anubis isn't to stop them from scraping entirely, but rather to slow down aggressive scraping (e.g. sites with lots of pages being scraped every 6 hours[1]) so that the scraping doesn't impact the backend nearly as much.

[1] https://pod.geraspora.de/posts/17342163, which was linked as an example in the original blog post describing the motivation for Anubis[2]

[2] https://xeiaso.net/blog/2025/anubis/


The point of a static cache is that your backend isn't impacted at all.


That falls short of the "meets their needs" test. Authenticated users already have a check (i.e., the auth process). Anubis is to stop/limit bots from reading content.


... Are you saying a bot couldn't authenticate?

You'd still need a layer there, though it could also be a one-time manual login just to pull a session token.


> The Arch Wiki is a high-value target for scraping, so they'll just solve the Anubis challenge once a week.

ISTR that Anubis allows the site owner to control how long a passed check stays valid; if you're still getting hit by bots, drop the expiry to 5s with a lower "work" effort, so that each new check costs (say) 2s of work and only lasts 5s.

(It still might not help, though, because that penalises humans more than bots - a human will only make maybe one request every 30-200 seconds, so they pay the 2s on nearly every page load, while a bot can cram a lot of requests into each 5s window.)
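
Putting rough numbers on that - a quick Go sketch using the figures above (2s of work, 5s validity; the request gaps are just illustrative):

    package main

    import "fmt"

    // Back-of-the-envelope numbers from the comment above: a 2s
    // proof-of-work whose result stays valid for 5s.
    func main() {
        const workSec, validSec = 2.0, 5.0

        // A human every 60s, a slow bot every 1s, a fast bot every 0.1s.
        for _, gapSec := range []float64{60, 1, 0.1} {
            perSolve := validSec / gapSec // requests squeezed into one validity window
            if perSolve < 1 {
                perSolve = 1 // you always get at least the request you solved for
            }
            fmt.Printf("gap %5.1fs -> %4.1f requests per solve, %.2fs of work per request\n",
                gapSec, perSolve, workSec/perSolve)
        }
    }

The human ends up paying the full 2s on essentially every page load, while the fast bot amortises it down to a few hundredths of a second per request.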


Rather than a time-to-live, you probably want a number-of-requests-to-live: decrement a counter associated with the token on every request until it runs out.

An obvious follow-up is to decrement it by a larger amount when requests arrive at a higher frequency.
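
A minimal sketch of that idea in Go, assuming the server could attach such a budget to each issued token - the budget size, cost, and frequency threshold below are all made up:

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    // tokenBudget is a "requests to live" scheme: each solved challenge
    // grants a budget, every request spends from it, and rapid-fire
    // requests spend more.
    type tokenBudget struct {
        mu        sync.Mutex
        remaining int
        lastSeen  time.Time
    }

    func newTokenBudget(requests int) *tokenBudget {
        return &tokenBudget{remaining: requests, lastSeen: time.Now()}
    }

    // spend returns false once the budget is exhausted, forcing a new challenge.
    func (t *tokenBudget) spend() bool {
        t.mu.Lock()
        defer t.mu.Unlock()

        cost := 1
        // Requests arriving faster than once per second cost extra.
        if time.Since(t.lastSeen) < time.Second {
            cost = 5
        }
        t.lastSeen = time.Now()

        t.remaining -= cost
        return t.remaining >= 0
    }

    func main() {
        tok := newTokenBudget(20)
        for i := 0; ; i++ {
            if !tok.spend() {
                fmt.Println("budget exhausted after", i, "rapid requests; re-challenge")
                return
            }
        }
    }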


Does anyone know if static caches work? No one seems to have replied to that point. It seems like a simple and user-friendly solution.


Caches would only work if the bots were hitting routes that any human had ever hit before.


They'd also work if the bot, or another bot, had hit that route before. It's a wiki; the amount of content is finite, and each route getting hit once isn't a problem.



