If you expose a web server to the internet today, you'll get roughly 10 malicious requests for every legitimate one.
This constant and unrelenting beating at your doors doesn't go away unless you add perimeter protection.
The options here are:
1) Block the IPs and CIDR ranges that are giving you trouble
2) Silently scan the connection request and block it when things look fishy
3) Provide a challenge in the return response that is difficult for bots to complete
Most of the bot protection on the internet is #2, where you don't notice you've been verified as a human and the site just loads. People hate #3, completing a challenge, but the other option here is #1, where the site doesn't load at all.
4) Provide a challenge in the return response that is impossible for anyone to complete
One way to see this one in action is to use Selenium to launch your browser. E.g., run this code in Python:
from selenium import webdriver
browser = webdriver.Chrome()
Then, when the browser launches, start using it manually to surf the web [1]. This works great on most sites I've visited this way, including my financial institutions. But if it hits a Cloudflare CAPTCHA, it fails. For example, try this on fanfiction.net. It hits the browser check page if I try to go to any category or story page. I click the checkbox to tell it I'm real, get the challenge to identify the lions or whatever, do that until it is satisfied I really can identify lions... and then it just goes back to the browser check page. As far as I can tell it is just an endless loop of checking the box and identifying the things at that point.
There are some settings you can apply in Selenium to somewhat hide from the site that Selenium is involved (a sketch is below), which for a while allowed getting past the CAPTCHA, but that stopped working after a while.
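For what it's worth, the settings people usually mean are roughly these. This is only a minimal sketch, assuming Selenium 4.x with Chrome and a matching chromedriver on PATH; as noted above, they no longer seem to be enough on their own:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# Drop the "Chrome is being controlled by automated test software" banner
# and the automation extension that normally comes with it.
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
# Keep Chrome from flagging itself to the page via navigator.webdriver.
options.add_argument("--disable-blink-features=AutomationControlled")

browser = webdriver.Chrome(options=options)
browser.get("https://example.com")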
There's also a project somewhere on GitHub that provides a Selenium Chrome driver specifically designed not to trigger bot detection, which also worked for a while and then stopped.
[1] Why would I want a Selenium-launched browser if I'm going to be using it manually? It's for sites where I want to automate some things on just some pages. For example, one of my financial institutions has a lot of options on their transaction download page, so after I finish manually doing things like checking balances, looking at recent activity, and paying bills, and want to finish by downloading transactions, I can have the script that launched the browser handle that.
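In case it helps picture the flow, it's roughly this. Just a sketch; the URLs and element IDs are made up for illustration, not my bank's actual page:

from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get("https://bank.example.com/login")

# Do the manual part (log in, check balances, pay bills) in the launched
# browser, then come back to the terminal and press Enter.
input("Press Enter when you're ready to download transactions... ")

# Now let the script handle the fiddly download page.
browser.get("https://bank.example.com/transactions/download")
browser.find_element(By.ID, "format-csv").click()   # hypothetical element IDs
browser.find_element(By.ID, "download").click()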
Try launching the instance of Chrome with `--disable-web-security` and `--disable-features=IsolateOrigins,site-per-process` options. I use these when launching Chrome via Playwright, and CAPTCHAs seemed to work fine several months ago.
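If it's useful, here's roughly how those flags get passed, sketched with Playwright's sync Python API. The exact setup is my assumption, not necessarily how the parent commenter does it, and no promises the flags still help with CAPTCHAs:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,
        args=[
            "--disable-web-security",
            "--disable-features=IsolateOrigins,site-per-process",
        ],
    )
    page = browser.new_page()
    page.goto("https://example.com")
    input("Browse manually, then press Enter to close... ")
    browser.close()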
When a Selenium worker is attached to a pay-per-solve CAPTCHA service, an infinite loop of CAPTCHAs that can be solved but never grant access would be meant to drain you financially. You uncovered a pretty sweet (dark) pattern implemented by Cloudflare to screw bot owners.
This is just #2 and #3 combined.
It sounds like this is working as intended, and it also wastes your time on un-passable CAPTCHAs instead of you spending that time figuring out how to get around their bot protection.
Another observation here is that you really shouldn't be hacking some scripts on top of your bank login. The banks know this and they are trying everything possible to dissuade you from doing this.
Huh, apparently ‘the war on general computation’, of which Cory Doctorow spoke, won't be led only by Disney and such corporations, but also by people denying others the right to automate the workings of the GUI on their own machines.
(Coincidentally, this practice might also preclude the operation of accessibility (a11y) tools—again, as Doctorow noted, ‘there is no known general-purpose computer that can execute all the programs except the naughty ones’. It might be fun to see the faces of the ‘you shouldn't’ folks when they're asked why less-able clients can't use their websites.)
> you really shouldn't be hacking some scripts on top of your bank login
You can hack whatever you want, but from a SECURITY perspective this is horrible, and the banks know this. There are secure ways to store credentials for scripts, but most people will just hard-code the values or stick them in unencrypted ENV vars. Also, whose fault is it when the bank updates their website and the Selenium script does something horribly wrong? Tell me more about Disney...
Service providers always want full control of the user experience and bots get in the way of that. We know this, but very often, that's not in the interests of the users at all.
Hence there are legitimate reasons to write bots and continue the arms race - otherwise, we'll pretty soon end up in a world where YouTube's business model of "subscribe to premium so we'll stop interrupting the videos when you minimize the app" will be the standard mode of operation.
I have never had a site hacked and I don't even know or care if it's being attacked - just don't litter it with RCE vulns. If it's being DDoSed, on the other hand, then use an anti-DDoS solution, but your post is such corpo bullshit that I can't even tell if it's talking about defending against DDoS or defending against hacks (which you can't defend against; they will get around your filters within 5 minutes of playing around).
I'm not sure you understand what Cloudflare is. They have various protections for websites including ones you do not like.
They don't host attacks. They don't even offer a hosting service for that code to run on, really. Those attacks come from botnets, mostly hacked IoT devices and servers across the web.
My position is that Cloudflare's hosting of websites and forums where attackers coordinate and organize (what you refer to as "websites [I] do not like") contributes to an Internet environment in which Cloudflare's own tools are more necessary. I am not saying that the attacks themselves are sent from servers controlled by Cloudflare.
If I understand your position correctly, you're advocating for a world where Cloudflare is the arbiter of ethics, morality, and legality on the Internet? I think I'd rather they just continue to provide service for sites I disagree with? Thanks.
They can host all the politically dubious sites they want, but when they sell a service that protects from what they're hosting (booter sites, where DDoS attacks are offered for sale), it starts to look an awful lot like a protection racket.
That’s a nice website you’ve got there. It would be a shame if someone DDoS’d it. Hey btw, I sell DDoS protection services. Gee, isn’t that just so convenient!
Cloudflare provides products that protect against malicious requests. They don't help craft those requests, and they don't flood traffic to websites that aren't using their product. It's not a racket.
That seems more like a misconfiguration on the site owner's part; if something is designed to be read by bots/programs, they shouldn't put a challenge on it.
Is it covered in the onboarding docs for that feature (i.e., disabling it for routes used by bots: RSS feeds, API endpoints, etc.)? Or is it just the happy path: "Do these 3 steps to make your WordPress cake blog safe from botnets"?
The site owner has complete control over this in the CF dashboard, and can easily disable it or lower the threshold. Myself, I'm quite happy with stopping bad traffic (about 20% of the requests to my sites) at the edge with CF and keeping my hosting costs down.
Nope, if you use Tor without Tor Browser you still get the CAPTCHA (one for the main domain of the site; then you have to find the CDN subdomain, open that, and solve a separate CAPTCHA).
> bad traffic (about 20% of requests to my sites)
I have run many websites too and have not needed Cloudflare to deal with that "problem".
I've never used Cloudflare, so apologies for what is probably documented somewhere. Can the site owners not set JS requirements per URL? I ask because the same hidden JS browser tests can be added to Nginx and HAProxy using Lua scripting, and it can be done by ACL for specific URLs, e.g., no JS for static content and URLs that use GET, but require passing the hidden JS browser test before using a page that would require a POST. That is just one example of the myriad of possibilities. Can that not be set up in CF? Or is it all-or-none?
For people not using a CDN and wanting to keep bots off static content, this can for now be partially accomplished by doing two things: forcing HTTP/2.0, and adding one raw-table iptables rule to drop TCP SYN packets that do not have an MSS in the desired range. Most poorly written bots do not even bother to set the MSS. I'd wager this is something CF looks at in their eBPF logic. Blocking non-HTTP/2.0 requests will drop all search engine crawlers except for Bing.
To find the Lua scripts that do not depend on a centralized CAPTCHA service, search for "nginx lua ddos". [1][2] I do not have a live example at the moment, as I took my hobby sites offline while the dust settles around the new California AB 2273 law.
Most of these give site-wide examples, but one can run the Lua by location or other ACLs to protect specific resources or exclude specific resources from protection, e.g., RSS feeds.
Cloudflare as of this month shows propaganda on the CAPTCHA page, like "40% of the internet was historically bots" (as if that matters). It actually fits right in with the common sentiment that the old internet was bad; welcome to the new internet, where nothing is allowed unless it's a legitimate commercial use. This is getting out of hand.
How exactly do you imagine bot/attack protection (Cloudflare's main product) working without JS? Even bypassing a CAPTCHA by using your browser to assert trust requires JS.
Are captchas and DDoS bot protection ruining the web?
Anyone who monitors their web traffic would tell you the bots are ruining the web.
I hate these "are you human" checks too, but when a persistent threat is poking your defenses and legitimate web traffic is only 10% - 20% of your server load... you have to do something.
So the alternative here to receiving a challenge is that the site would just be blocked in your country or for your network provider.
Would you prefer to be outright blocked, or is it ok to have an annoying "are you human?" challenge?
Pretty interesting story about a tiny IP checking tool and how it sort of got out of hand sadly due to abuse. The solution? Major sold the site to Cloudflare for $1. But really kind of a shame overall.
> Seeing that over 90% of my traffic load was malicious and abusive was frustrating.
That story nailed it.
> If you’re curious, Cloudflare did pay me for the site. We made a deal for them to pay me $8.03; the cost of the domain registration. The goal was never to make money from the site.
A little more than $1, but basically the same idea.
Ah yes, I remembered it as a token amount, but obviously not completely accurate lol. But I worked with the author just a bit, so I used and still use the utility, and it's a shame because he's one of the more gracious people I've come across in the industry.
There is a clear distinction between bots (which are legitimate users) and DDoS. The latter isn't even repelled at the application layer, but much earlier in the stack.
There is definitely an inherent tension between forcing legitimate users to load JS and store CloudFlare's cookies, and keeping bots off of services. As an individual, having to load random nonsense from CloudFlare does not improve my experience!
I thought they do? I'm not really sure, but I think that CAPTCHAs also collect info about your browser, etc., as well as identifying you with some kind of "challenge", or am I wrong?
Modern captchas like Google's Recaptcha V3, hCaptcha, etc. require a lot more than an image. They track your reputation score, fingerprint your browser, OS, hardware, window size, installed fonts, analyze your mouse movements and probably more.
Cloudflare is not the one breaking the internet; bots are. Cloudflare is just providing a solution to deal with the bot problem.
This is also controlled by the Cloudflare customer. If I'm having issues with my server due to fake/hostile traffic coming to my website, you're dang right I will do what it takes to stop it.
It's annoying to some of us and will only result in escalation; browser plugins will prolly be made to only run JS in this context and not in the final render.
Some more wasted processing power that might block unwanted requests, but apart from DDoSes, these requests shouldn't be a threat anyway. Maybe DoS zombie agents will be updated to run a bit of JS, if it's worth the hassle.
Every day we stray further from what the web could have been if we could have nice things.
Technically this is just breaking the web, not the internet - none of the other protocols are being interfered with.
Even Cloudflare's DNS product is just standard DNS protocol, sitting behind network-level DDoS protections. It's only HTTP where they tamper with the application layer.
This is JS specifically on Cloudflare's own domain, is it not? I don't think you need to keep JS enabled afterwards. So a JS/cookie blocking setup should be versatile enough to let you allow only Cloudflare JS and cookies, which are then self-destructed.
Perhaps we should have a bare minimum definition of "web" as the core (HTTP and maybe a subset of HTML) and the rest be optional - like CSS, JS, cookies, web workers, etc. That way we can have "web" browsers that are tiny and fast for documents; and "web app" browsers that are huge but support running applications.
Yes. And judging by how many Americans die of gunshot wounds each year, and how many guns are sold, and the profits of the major gun manufacturers - how successful is that "blame the tool and manufacturer" strategy?
(Or is "success", for the anti-gun crowd, mostly about winning performative virtue contests in their own social media bubbles - while tens of thousands of "99.9% of 'em aren't like us, so we only pretend to care" people die?)
MEANWHILE, back at the Greatest Hypocrite Playoffs - I clicked on the link in a browser with cookies and js blocked. The article's web page (at Imgur.com) only says:
"If you're seeing this message, that means JavaScript has been disabled on your browser, please enable JS to make Imgur work."
(Privacy Badger & NoScript say they're blocking cookies from 6+ domains, and js from 12+ domains & subdomains. I know of Cloudflare-protected sites where allowing cookies from 1 domain and js from 2 subdomains are plenty to make them work right.)
Well...by my impression, "blaming the tool and manufacturer" is a rather simplistic and childish way of describing most of the successful national gun control strategies.
If you moved to the US, I'd suggest being very wary of gun ownership. That can feel very empowering. And it's pretty cool in a lot of social circles, kinda required in a few, seen less favorably in many others...but, unless our culture wars turn into actual shooting wars, there are few places in the U.S. where both (1) foreign-born people settle by choice and (2) gun ownership is net plus for the safety of your self and your family. (And (2) is dependent on you not merely owning a gun, but being fairly experienced in securing/handling/using it, plus pretty savvy about if/when/how to use it.)
Shouldn't the government regulate gun-related advertising, then? I have trouble assigning all blame to corporations playing by the rules. Those who set the rules wrongly are to be blamed, IMHO.
We have no problem blaming anyone, rightly or wrongly. Twitter is a fountain of blame, there's enough for everyone.
You can blame me, I think. I haven't done it yet, but I can see the day coming where one of the sites I maintain will move behind Cloudflare. That site gets too much shit. The other day Little Bobby Tables browsed the site, and it wasn't the first time. I have to choose: Deal with the low-level shit or require javascript from a bunch of users who mostly have that enabled anyway. So blame me, and all the site operators who face the same choice.
I'd argue that bots are breaking the internet.