That can be done already based on User-Agent, though. Other browsers don't spoof their agent strings to look like Chrome, and never have (or, they do, but only in the sense that everyone still claims to be Mozilla). And browsers have always (for obvious reasons) been very happy to identify themselves correctly to backend sites.
The purpose here is surely to detect sophisticated spoofing by non-user-browser software, like crawlers and robots. Robots are in fact required by the net's Geneva Convention equivalent to identify themselves and respect limitations, but obviously many don't.
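For reference, the "identify yourself and respect limitations" part amounts to roughly this for a well-behaved crawler (the bot name and URLs below are made up for illustration):

    # A compliant robot announces a distinct User-Agent and checks robots.txt
    # before fetching anything. Bot name and URLs are hypothetical.
    import urllib.robotparser
    import urllib.request

    BOT_UA = "ExampleBot/1.0 (+https://example.com/bot-info)"

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    url = "https://example.com/some/page"
    if rp.can_fetch(BOT_UA, url):
        req = urllib.request.Request(url, headers={"User-Agent": BOT_UA})
        with urllib.request.urlopen(req) as resp:
            body = resp.read()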
I have a hard time understanding robot detection as an issue of "user freedom" or "browser competition".
>I have a hard time understanding robot detection as an issue of "user freedom" or "browser competition".
The big one is that running a browser other than Chrome (or Safari) could come to mean endless captchas, degrading the experience. "Chrome doesn't have as many captchas" is a pretty good hook.
Not to mention how often you can get stuck in an infinite loop where it just will not accept your captcha results and keeps making you do it over and over, especially if you’re using a VPN. It’s maddening sometimes. You can’t even do a basic search.
So the market isn't allowed to detect robots because some sites have bad captcha implementations? I'm not following. Captchas aren't implemented by the browser.
> So the market isn't allowed to detect robots (...)
I don't know what you mean by "the market".
What I do know is that if I try to visit a site with my favourite browser and the site blocks me because it's so poorly engineered that it thinks I'm a bot just because I'm not using Chrome, then it's pretty obvious it isn't actually detecting bots.
Also worth noting: it might surprise you, but there are browser automation frameworks. Some of them, such as Selenium, support Chrome.
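For instance, a minimal sketch with Selenium's Python bindings that drives a real Chrome instance; from the site's point of view this is Chrome, because it is Chrome:

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # optional: run without a visible window
    driver = webdriver.Chrome(options=options)

    driver.get("https://example.com")
    print(driver.title)
    driver.quit()

So a check that boils down to "is this Chrome?" does very little against automation.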
I’m not sure who “the market” is in this case, but reCAPTCHA is owned and implemented by Google and clearly favors their browser. Any attempt to use other browsers or obfuscate your digital footprint in the slightest leads to all kinds of headaches. It’s a very convenient side effect of their “anti-bot” efforts, and one they have every incentive to steer into.
This isn't Google's doing but Mozilla's. Firefox's strict tracking protection blocks third-party cookies. The site you're trying to visit isn't hosting reCAPTCHA itself; reCAPTCHA was loaded from a third-party origin (Google); so the cookie that Google sets saying you passed the CAPTCHA is blocked by Firefox.
You can add an exception in Firefox's settings to allow third-party cookies for CAPTCHAs. Google's reCAPTCHA cookie is set by "recaptcha.net", and Cloudflare's CAPTCHA, whose domain is "challenges.cloudflare.com", has exactly the same problem.
If the cookies aren't set and passed back, then they can't know that you've solved it, so you get another one.
You're blaming Mozilla because they fixed a security vulnerability, and then saying that the workaround is to reenable the vulnerability so that Google can continue surveilling.
Yet for some inexplicable reason, all the other bot detection methods I encounter online don’t struggle with me at all and don’t stick me in infinite loops. Cloudflare, for instance, simply does not bug out for me, with rare exceptions.
Maybe my experience is atypical but it seems to me this is a reCAPTCHA problem, not a Mozilla one. It’s Google’s problem. I imagine they can solve this but simply don’t want to.
Maybe I’m wrong, but again, I encounter more issues with their “anti-bot” methods than with any other, by a massive margin.
Concretely: Google Meet blocks all sorts of browsers / private tabs with a vague “you cannot join this meeting” error. It lets mainstream ones in, though.
I use Safari (admittedly, with Private Cloud and a few tracking-blocking extensions) and get bombarded with Cloudflare's 'prove you are human' checkbox several times an hour.
> I have a hard time understanding robot detection as an issue of "user freedom" or "browser competition".
In the name of robot detection, you can lock down devices, require device attestation, prevent users from running non-standard devices/OSes/software, and prevent them from accessing websites (Cloudflare dislikes non-Chrome browsers and hates non-standard ones; reCAPTCHA locks you out if you're not on something Chrome-like, Safari, or Firefox). Web Environment Integrity[1] is also a good example of where robot detection ends up affecting the end user.
The purpose here isn't to deal with sophisticated spoofing. This is setting a couple of headers to fixed and easily discoverable values. It wouldn't stop a teenager with curl, let alone a sophisticated adversary. There's no counter-abuse value here at all.
It's quite hard to figure out what this is for, because the mechanism is so incredibly weak. Either it was implemented by some total idiots who did not bother talking at all to the thousands of people with counter-abuse experience that work at Google, or it is meant for some incredibly specific case where they think the copyright string actually provides a deterrent.
(If I had to guess, it's about protecting server APIs only meant for use by the Chrome browser, not about protecting any kind of interactive services used directly by end-users.)
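To make the "teenager with curl" point concrete, something along these lines is all it would take (Python instead of curl; the header values are placeholders to be copied from a real Chrome request, and the copyright header name is my assumption based on this thread, not something I've verified):

    import requests

    # Replaying whatever a real Chrome instance sends is trivial; nothing here
    # is secret or tied to the individual request. Values are placeholders.
    headers = {
        "User-Agent": "<Chrome's User-Agent string>",
        "X-Browser-Validation": "<value captured from a real Chrome request>",
        "X-Browser-Copyright": "<Google's copyright string>",  # header name assumed
    }
    resp = requests.get("https://example.com/some-endpoint", headers=headers)
    print(resp.status_code)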
I would imagine that this serves the same purpose as the way that early home consoles would check the inserted cartridge to see that it had a specific copyright message in it, because then you can't reproduce that message without violating the copyright.
In this case, you would need to reproduce a message that explicitly states that it's Google's copyright, and that you don't have the right to copy it ("All rights reserved."). Doing that might then give Google the legal evidence it needs to sue you.
In other words, a legal deterrence rather than a technical one.
> Why do you think Chrome bothers with these extra headers? Anti-spoofing, bot detection, integrity, or something else?
Bot detection. It's a menace to literally everyone. Not to piss anyone off, but if you haven't dealt with it, you don't have anything of value to scrape or get access to.
> Bot detection. It's a menace to literally everyone. Not to piss anyone off, but if you haven't dealt with it, you don't have anything of value to scrape or get access to.
What leads you to believe that bot developers are unable to set a request header?
They managed fine to set Chrome's user agent. Why do you think something like X-Browser-Validation is off limits?
Because you would need to reproduce an explicit Google copyright statement which states that you don't have the right to copy it ("All rights reserved.") in order to do it fully.
That presumably gives Google the legal ammunition it needs to sue you if you do it.
I'm no lawyer, but my take on it is that by reproducing this particular value for the validation header, you are stating that you are the Chrome browser. It's likely that this has been implemented in such a way that other browsers could use it too if they so choose; the expected contents of the copyright header can then change depending on what you have in the validation header.
To me, it seems likely that the spec is for a legally defensible User-Agent header.
Only if they know to implement it, and only while it relies on a fairly trivial approach. I expect it to gradually become more difficult. It's also yet another place to make a mistake and make it entirely obvious that one is forging Chrome.
Yes, I think it is part of their multi-level testing for new version rollouts. In addition to all the internal unit and performance tests, they want an extra layer of verification that weird things aren't happening in the wild.
It's still not clear to me, because it's called the default API key, and to me "default" means it is normally overridden. If it is overridden, does that happen at build time or at install time? That's what I'm asking myself.
Plenty of improvements to mouse movement algorithms have already been made, and they’re still evolving. While the blog post and the product it introduces offer some interesting ideas, they don’t yet reach the robustness of modern anti-bot solutions and still trail current industry standards. I doubt it would take me - or any average reverse engineer - more than five seconds to bypass something like this. There are already numerous open-source mouse movement libraries available, and even if they didn’t exist, writing one wouldn’t be difficult. Yes, mouse movement or keyboard data can be quite powerful in a modern anti-bot stack, and an in-depth analysis of it is genuinely valuable, but on its own it’s still insufficient. Relying on this data alone isn’t costly for the attacker and offers little real protection.
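To give a rough sense of the effort involved, here's a toy sketch of the kind of path generation such a library does: a Bezier curve between two points with jitter and uneven sample timing. Real libraries add acceleration profiles, overshoot, pauses, and so on, but this is the ballpark.

    import random

    def mouse_path(x0, y0, x1, y1, steps=50):
        # Random control point bows the curve, like a real hand movement would.
        cx = (x0 + x1) / 2 + random.uniform(-100, 100)
        cy = (y0 + y1) / 2 + random.uniform(-100, 100)
        points = []
        for i in range(steps + 1):
            t = i / steps
            # Quadratic Bezier interpolation between start and end points.
            x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
            y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
            # Small positional jitter and a variable delay per sample.
            points.append((x + random.gauss(0, 1.5),
                           y + random.gauss(0, 1.5),
                           random.uniform(0.005, 0.03)))
        return points

Feed those samples into whatever automation layer you're driving and you've already beaten any check that looks for straight lines at constant speed.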
To clarify, if I disclose the exploit publicly, my concern is that the company could take legal action against me, even if I don’t share any technical details or information that would allow someone to reproduce it. That's something I really don't want to deal with.
In its current state, the protections are pretty weak. I’m sure they’ll update it, and we’ll see what changes they bring. If this header is meant to serve as an anti-bot measure, there’s a lot more work they need to do on both the JS and WASM sides. On top of that, processing fingerprint data on the backend - building user/fingerprint profiles, analyzing detailed browser, device, and low-level connection info, and using AI to spot patterns - makes the system a lot more complex. However, based on the current implementation, I anticipate they’ll likely stick to a relatively simplistic approach.
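To illustrate what I mean by building fingerprint profiles on the backend, a crude sketch (the attributes and the threshold are made up; a real system would weight far more signals):

    import hashlib
    import json
    from collections import defaultdict

    # fingerprint_id -> set of client IPs seen using it
    profiles = defaultdict(set)

    def fingerprint_id(attrs: dict) -> str:
        # Canonicalise the collected attributes into a stable identifier.
        canonical = json.dumps(attrs, sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()[:16]

    def record(attrs: dict, client_ip: str) -> bool:
        fid = fingerprint_id(attrs)
        profiles[fid].add(client_ip)
        # Crude heuristic: one "device" appearing from many IPs is suspicious.
        return len(profiles[fid]) > 20

That's still toy-level; the point is that the interesting work is server-side correlation, not the header itself.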
You’re right. In this case, just knowing the guest_id is enough to break down the header. Twitter’s main goal here is mostly to obfuscate the data and make the reverse engineering process more painful.
Why do you think Chrome bothers with these extra headers? Anti-spoofing, bot detection, integrity, or something else?