Google's reCAPTCHA makes it impossible to use large portions of the web once you take reasonable measures to protect your privacy. The challenge will continuously fail, despite the time you spend carefully solving it. This cruel behavior is described in a patent [1] by Kyle Adams of Juniper Networks.
I'm logged in on a Chrome browser with a residential IP and get a reCAPTCHA 1-3 times a day when programming. It's the kind where I don't need to solve puzzles but still need JS enabled to click the button. So after the first few (SEO-optimized) pages I either get fingerprinted or temp banned. Ugh, this is getting ridiculous.
Cloudflare must be mentioned when talking about reCAPTCHA and cancer. They are the ones locking people out from whole websites and forcing you to fill out these reCAPTCHAs. They are also the ones who have almost destroyed browsing the internet using Tor due to these reCAPTCHAs.
While I agree with you -- I'd also like to point out that >90% of malicious traffic to the websites I administer comes through the Tor network.
It shouldn't be the case, and I don't want to block people who have a legitimate reason to use Tor. Unfortunately there isn't a "block Tor traffic from assholes" option, so all I can really do to reduce the malicious traffic is block exit nodes.
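For what it's worth, "block exit nodes" in practice just means checking client IPs against the published exit list. A rough sketch (the Tor Project's bulk exit list URL is correct as far as I know, but verify it before relying on it):

    import urllib.request

    EXIT_LIST_URL = "https://check.torproject.org/torbulkexitlist"

    def load_exit_nodes():
        # The endpoint returns one exit-node IP per line.
        with urllib.request.urlopen(EXIT_LIST_URL) as resp:
            return {line.strip() for line in resp.read().decode().splitlines() if line.strip()}

    exit_nodes = load_exit_nodes()  # refresh periodically; the list changes all the time

    def should_block(client_ip):
        return client_ip in exit_nodes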
This has nothing to do with Tor. Cloudflare frequently blacklists entire countries' or counties' worth of people (and rarely reverts those blacklists). There is a good chance that you have missed a lot of Indian/Vietnamese/Russian/Chinese visitors because Cloudflare concluded that forwarding their traffic to your site isn't financially viable for them.
> Unfortunately there isn't a "block Tor traffic from assholes" option
What exactly is "Tor traffic from assholes"? Bulk DDoS attacks? E-mail spam? SSH login attempts? Please share your valuable experience with everyone here, so that all of us could stay safe by learning from your example.
And for companies that don't do business with those countries - this is not a loss.
Most "asshole" traffic I see falls into one of two categories - attempts to exploit vulnerabilities (../../../etc/passwd stuff) and account takeover attacks.
The first I can forgive; frankly, I don't care where that traffic comes from, and the responsibility is entirely mine as website admin to prevent these types of attacks through good coding practices, a WAF, etc.
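To make the "good coding practices" bit concrete, here's the sort of small guard I mean for the ../../../etc/passwd case when serving user-supplied file names (an illustrative, hypothetical helper, not any particular framework's API):

    from pathlib import Path

    BASE_DIR = Path("/var/www/uploads").resolve()

    def safe_open(requested):
        # resolve() collapses "..", so anything that escapes BASE_DIR is rejected outright.
        candidate = (BASE_DIR / requested).resolve()
        if BASE_DIR not in candidate.parents:
            raise PermissionError("path traversal attempt: %r" % requested)
        return candidate.open("rb")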
The second I have less control over, because customers / the general public suck at security. They re-use passwords they've had for 10 years and won't opt in to 2FA. And as a merchant, my company generally eats the cost of the fraud that these attacks result in.
If no or little legitimate traffic is coming from Tor, and a significant percentage of malicious traffic is coming from Tor - at great cost to me / my company - why the hell would I allow it to continue?
One simple solution I can think of is to restrict POST requests from Tor exit nodes while still allowing GET requests. Cloudflare will give you an impossible-to-solve captcha even if you just try to visit site.com/index.html, and I see no reason for this.
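Something like this minimal WSGI middleware sketch is what I have in mind; the exit-node set is hard-coded with placeholder TEST-NET addresses here, but in practice it would be loaded from the Tor bulk exit list:

    # Hypothetical middleware: reads stay open to everyone, only state-changing
    # methods coming from exit nodes get refused.
    tor_exit_ips = {"203.0.113.7", "198.51.100.23"}  # placeholder addresses

    class BlockTorWrites:
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            ip = environ.get("REMOTE_ADDR", "")
            method = environ.get("REQUEST_METHOD", "GET")
            if ip in tor_exit_ips and method not in ("GET", "HEAD"):
                start_response("403 Forbidden", [("Content-Type", "text/plain")])
                return [b"Write access is not available over Tor.\n"]
            return self.app(environ, start_response)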
Is the issue Tor traffic, or that you know what traffic is Tor?
There are many types of "abuse" (not just trolling), such as mass downloading/scanning. (E.g. several types of port scanning can't be done via Tor, since it doesn't support UDP.)
I see that your heart is in the right place, but I think as web developers we should take a blood oath that we will always optimize for standard compliance, instead. And for a standard that is not a moving target, while we're at it.
But when do we move on? When most browsers implement something the same way, or when all do? What about polyfills? What do you do when you need a new API to better support a user's device with a new form factor, interaction model, wide colour gamut, resolution, background threads, etc.? Tell them to not upgrade? Stop the world? It seems impractical to suggest "target a standard: job done, go home..."
If we target standards, then the standards are driving. The browser gets supported when it builds to the standards. Perhaps the issue will then be getting standards in place quickly around new capabilities?
Then maybe the standards process needs disruption. But if we don't build to standards then we are building roads that only certain cars can drive.
This is unfortunately not true - browsers are driving. Especially when the entity everyone uses (Google) also owns the most popular browser. They can, and did, implement non-standard features that only worked in Chrome. Super cool tech demos, you have to see it, just install this browser from an advertising company. What could go wrong?
Well, considering that Google already specifically blocks Chromium-based Edge from its current YouTube version, maybe reCAPTCHA will stop working in it soon too.
Google is not blocking Edge, or at least we have no proof of that. In this instance I think it's safe to assume an oversight based on naive user-agent whitelisting.
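To illustrate what I mean, a hypothetical sketch (not YouTube's actual code) of how a stale user-agent allowlist fails the new Edge even though it runs the same engine:

    def browser_family(user_agent):
        if "Edg/" in user_agent:
            return "edge-chromium"  # new token, easy to miss in an old allowlist
        if "Chrome/" in user_agent:
            return "chrome"
        if "Firefox/" in user_agent:
            return "firefox"
        return "unknown"

    SUPPORTED = {"chrome", "firefox"}  # allowlist written before Chromium Edge existed

    def serve_modern_ui(user_agent):
        return browser_family(user_agent) in SUPPORTED

    edge_ua = "Mozilla/5.0 ... Chrome/79.0.3945.88 Safari/537.36 Edg/79.0.309.56"
    print(serve_modern_ui(edge_ua))  # False, purely because of the UA string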
And before I get accused of shilling, I hate chrome and despise Google with a passion.
Do you know of any good alternatives? I would love to get rid of reCAPTCHA, but it is a very convenient and quick-to-set-up way to stop most spam bots.
Remember that reCAPTCHA v1 used to be noble: reading books and converting them to text.
Now you're just training Google's many machine learning algorithms by classifying data, which makes them more useful for the consumer, and thus more powerful.
I hate them as much as you do, but you're wrong. Those storefront and traffic sign captchas are not useful for training ML models. If they were to be useful, they would be much more varied, like the original ones (used for OCR).
>Those storefront and traffic sign captchas are not useful for training ML models.
Not to get all tin-foil-hat (though this is going to sound like it), but if you have a car with 9+ cameras on it that drives in areas full of these, then maybe there would be some use for it for Google.
Bear in mind that I'm not saying that they are doing this, but to dismiss it unequivocally as something that can't or wouldn't be done entirely ignores the premise that it could prove useful to other areas of their business, which might have a vested interest in such use (say, for example, if Google or its parent company were trying to break into the self-driving car area[0]).
I would love to see some evidence (a link or something) of this. I see captchas that look like pretty good edge-detection discriminators: street lights in tree limbs, bicycles against brick, and so on.
Since they introduced the square-selecting captchas I have always assumed that they use it for identifying the user. I bet that depending on how you solve the captchas they can identify who you are if their system already has a theory of who you might be.
They're implemented this particular way to provide training data for image segmentation systems. They move the image around inside the frame, which allows them to use a few people doing the challenge to create a boundary representation that can be used to train things like YOLO-style ML systems.
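An illustrative sketch of that aggregation idea (made-up image size, tile size, and offsets; not Google's actual pipeline): map each user's selected tiles back into base-image coordinates and keep a per-pixel vote count, and the pixels most users agree on approximate the object's extent.

    import numpy as np

    TILE = 100
    heat = np.zeros((600, 600))  # vote map in base-image coordinates

    def add_response(offset_xy, selected_tiles):
        # offset_xy: (dx, dy) shift of the crop shown to this user;
        # selected_tiles: set of (row, col) grid cells they clicked.
        dx, dy = offset_xy
        for r, c in selected_tiles:
            y0, x0 = r * TILE + dy, c * TILE + dx
            heat[y0:y0 + TILE, x0:x0 + TILE] += 1

    # A few users see crops at different offsets and click the tiles containing the object.
    add_response((0, 0), {(1, 1), (1, 2)})
    add_response((50, 0), {(1, 1)})
    add_response((0, 50), {(0, 1), (1, 1)})

    mask = heat >= 2  # pixels most users agreed on
    ys, xs = np.where(mask)
    print("approx bounding box:", xs.min(), ys.min(), xs.max(), ys.max())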
They are able to verify that the user's selection is correct. That is possible only if they already have the right answer. If they already have the right answer, what are they training for?
They have some known right answers and some they don't know. They check that you get the ones they know correct, and then they take the other info you provide and add some confidence that they are correct. This bootstraps the system.
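Roughly like this (an illustrative sketch with invented tile names, vote weights, and thresholds; not Google's implementation):

    known = {"tile_a": True, "tile_b": False}  # tiles whose answers the system trusts
    votes = {"tile_c": 0, "tile_d": 0}         # candidate tiles, accumulating evidence
    PROMOTE_AT = 3                             # net votes needed before an answer is trusted

    def submit(selected_tiles):
        # 1. Grade the user on the tiles with known answers.
        if any((t in selected_tiles) != answer for t, answer in known.items()):
            return False  # wrong on a known tile: the challenge fails
        # 2. The user passed, so their choices on the unknown tiles count as evidence.
        for t in list(votes):
            votes[t] += 1 if t in selected_tiles else -1
            if abs(votes[t]) >= PROMOTE_AT:
                known[t] = votes.pop(t) > 0  # enough agreement: promote to a known answer
        return True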
The audio CAPTCHA always works first try for me. The image CAPTCHA can go eff itself; it would always take me five tries while the images loaded super slowly.
Yes, the audio CAPTCHA is easier to solve, but the audio challenge is blocked [1] if you are not in a good network neighbourhood or they can't collect enough tracking data to classify your visit.
Can confirm, it's rare for me to be able to get at the audio captcha. Occasionally I'll find that tabbing onto the button allows it to load when clicking directly on it won't. I assume if Google is observing behavior that makes them think you're sighted, they'll block access.
I kind of wonder if it would be possible to force the issue legally as an accessibility problem, but other people than me would need to do it, and in any case it feels kind of dirty to me to use blind accessibility as a tool in the fight for privacy.
On the other hand, it also feels dirty to me that being blind would mean you're not allowed to do as much on the web to protect your privacy. Blind people should be able to use Tor.
> Can confirm, it's rare for me to be able to get at the audio captcha. Occasionally I'll find that tabbing onto the button allows it to load when clicking directly on it won't. I assume if Google is observing behavior that makes them think you're sighted, they'll block access.
That would be very cruel to those who have vision but nothing close to perfect vision, or vision correctable through glasses. It would also ignore those who have poorer vision as well as difficulties in recognizing patterns. There's a whole spectrum of accessibility issues, and trying to "fail people" who seem to have enough vision to click on an audio button would be the definition of being evil.
> I kind of wonder if it would be possible to force the issue legally as an accessibility problem, but other people than me would need to do it, and in any case it feels kind of dirty to me to use blind accessibility as a tool in the fight for privacy.
Even if this is not possible legally in all jurisdictions, enough publicity and outrage could help. There should certainly be some journalists from major publications/sites reading HN (or HN readers with journalist contacts) who can investigate and write about this.
I dislike Google reCAPTCHA; however, it brought contact form and comment spam down to almost zero. (At the price of an unknown number of false positives and some frustrated users.)
[1] https://patents.google.com/patent/US9407661