Hacker News | dsekz's comments

Dug into chrome.dll and figured out how the x-browser-validation header is generated. Full write up and PoC code here: https://github.com/dsekz/chrome-x-browser-validation-header

Why do you think Chrome bothers with these extra headers? Anti-spoofing, bot detection, integrity, or something else?


Making it easier to reject "unapproved" or "unsupported" browsers and take away user freedom. Trying to make it harder for other browsers to compete.


That can be done already based on User-Agent, though. Other browsers don't spoof their agent strings to look like Chrome, and never have (or, they do, but only in the sense that everyone still claims to be Mozilla). And browsers have always (for obvious reasons) been very happy to identify themselves correctly to backend sites.

The purpose here is surely to detect sophisticated spoofing by non-user-browser software, like crawlers and robots. Robots are in fact required by the net's Geneva Convention equivalent to identify themselves and respect limitations, but obviously many don't.

I have a hard time understanding robot detection as an issue of "user freedom" or "browser competition".


>I have a hard time understanding robot detection as an issue of "user freedom" or "browser competition".

The big one is that running a browser other than Chrome (or Safari) could come to mean endless captchas, degrading the experience. "Chrome doesn't have as many captchas" is a pretty good hook.


Not to mention how often you can get stuck in an infinite loop where it just will not accept your captcha results and keeps making you do it over and over. Especially if you’re using a VPN. It’s maddening sometimes. Can’t even do a basic search.


So the market isn't allowed to detect robots because some sites have bad captcha implementations? I'm not following. Captchas aren't implemented by the browser.


> So the market isn't allowed to detect robots (...)

I don't know what you mean by "the market".

What I do know is that if I try to go to a site with my favourite browser and a site blocks me because it's so poorly engineered it thinks I am a bot just because I'm not using Chrome, then it's pretty obvious that it's not detecting bots.

Also worth noting: it might surprise you that there are browser automation frameworks. Some of them, such as Selenium, support Chrome.


So the cool thing is we can now add an x-browser-validation header to selenium (and firefox).


Exactly


I’m not sure who “the market” is in this case, but reCAPTCHA is owned and implemented by Google and clearly favors their browser. Any attempt to use other browsers or obfuscate your digital footprint in the slightest leads to all kinds of headaches. It’s a very convenient side effect of their “anti-bot” efforts that they have every incentive to steer into.


This isn't Google's doing but Mozilla's. Firefox's strict tracking protection blocks third-party cookies. The site you're trying to visit isn't hosting reCAPTCHA itself; reCAPTCHA was loaded from a third-party origin (Google); so the cookie that Google sets saying you passed the CAPTCHA is blocked by Firefox.

You can add an exception in Firefox's settings to allow third-party cookies for CAPTCHAs. Google's reCAPTCHA cookie is set by "recaptcha.net". Cloudflare's CAPTCHA, whose domain is "challenges.cloudflare.com", has exactly the same problem.

If the cookies aren't set and passed back, then they can't know that you've solved it, so you get another one.


You're blaming Mozilla because they fixed a security vulnerability, and then saying that the workaround is to reenable the vulnerability so that Google can continue surveilling.


Yet for some inexplicable reason all the other bot detection methods I encounter online don’t struggle at all with me and don’t stick me in infinite loops. Cloudflare for instance simply does not bug out with rare exceptions for me.

Maybe my experience is atypical but it seems to me this is a reCAPTCHA problem, not a Mozilla one. It’s Google’s problem. I imagine they can solve this but simply don’t want to.

Maybe I’m wrong but again, I encounter more issues with their “anti bot” methods than any other by a massive margin.


Concretely: Google Meet blocks all sorts of browsers / private tabs with a vague “you cannot join this meeting” error. They let mainstream ones in though.


I use Safari (admittedly, with Private Cloud and a few tracking-blocking extensions) and get bombarded with Cloudflare's 'prove you are human' checkbox several times an hour.

It's already a pretty degraded experience.


I mean you're using a VPN, they can't tell the diff between you and a bunch of bots


Requests per second?


Harder to scale, stateful.


I think you mean they can't profit from selling data from a bunch of bots.


> I have a hard time understanding robot detection as an issue of "user freedom" or "browser competition".

In the name of robot detection, you can lock down devices, require device attestation, prevent users from running non-standard devices/OS/software, and prevent them from accessing websites (Cloudflare dislikes non-Chrome browsers and hates non-standard browsers, reCAPTCHA locks you out if you're not on Chrome-like/Safari/Firefox). Web Environment Integrity[1] is also a good example of where robot detection ends up affecting the end user.

[1] https://en.wikipedia.org/wiki/Web_Environment_Integrity


Aren't all those solutions even more impactful on the user experience though? Someone who cares about user freedom would think they're even worse, no?


The purpose here isn't to deal with sophisticated spoofing. This is setting a couple of headers to fixed and easily discoverable values. It wouldn't stop a teenager with curl, let alone a sophisticated adversary. There's no counter-abuse value here at all.

It's quite hard to figure out what this is for, because the mechanism is so incredibly weak. Either it was implemented by some total idiots who did not bother talking at all to the thousands of people with counter-abuse experience that work at Google, or it is meant for some incredibly specific case where they think the copyright string actually provides a deterrent.

(If I had to guess, it's about protecting server APIs only meant for use by the Chrome browser, not about protecting any kind of interactive services used directly by end-users.)


I would imagine that this serves the same purpose as the way that early home consoles would check the inserted cartridge to see that it had a specific copyright message in it, because then you can't reproduce that message without violating the copyright.

In this case, you would need to reproduce a message that explicitly states that it's Google's copyright, and that you don't have the right to copy it ("All rights reserved."). Doing that might then give Google the legal evidence it needs to sue you.

In other words, a legal deterrence rather than a technical one.


It's easy to change the User Agent and we cannot handwave this fact away for the sake of argument.


> Why do you think Chrome bothers with these extra headers? Anti-spoofing, bot detection, integrity, or something else?

Bot detection. It's a menace to literally everyone. Not to piss anyone off, but if you haven't dealt with it, you don't have anything of value to scrape or get access to.


> Bot detection. It's a menace to literally everyone. Not to piss anyone off, but if you haven't dealt with it, you don't have anything of value to scrape or get access to.

What leads you to believe that bot developers are unable to set a request header?

They managed fine to set Chrome's user agent. Why do you think something like X-Browser-Validation is off limits?


Because you would need to reproduce an explicit Google copyright statement which states that you don't have the right to copy it ("All rights reserved.") in order to do it fully.

That presumably gives Google the legal ammunition it needs to sue you if you do it.


Companies like SEGA have tried doing stuff like that in the past, and lost.


It seems like the requirement to reproduce this copyright header alone, never mind the validation hash, would be enough to scare off scrapers?


I'm no lawyer, but my take on it is that by reproducing this particular value for the validation header, you are stating that you are the Chrome browser. It's likely that this has been implemented in such a way that other browsers could use it too if they so choose; the expected contents of the copyright header can then change depending on what you have in the validation header.

To me, it seems likely that the spec is for a legally defensible User-Agent header.


> They managed fine to set Chrome's user agent. Why do you think something like X-Browser-Validation is off limits?

It's not off-limits technically. But do you think it'll remain this simple going forward? I doubt that.


Do you mean bot and non-Chrome-using human detection?


Bots can easily copy the header though so I don't see how that helps?


Only if they know to implement it, and only while it uses such a trivial approach. I expect it to become gradually more difficult. It's also yet another way to make mistakes and make it entirely obvious that one is forging Chrome.


Bullshit. You don't have anything of value either. Scrapers will ram through _anything_, and figure out if it's useful later.


Seems like they are using these headers only for google.com requests.


Yes, I think it is part of their multi-level testing for new version rollouts. In addition to all the internal unit and performance tests, they want an extra level of verification that weird things aren't happening in the wild.


My theory is that they are probably using it to block bots scraping Google results.


I have two questions:

1. Do I understand it correctly and the validation header is individual for each installation?

2. Is this header only in Google Chrome or also in Chromium?


>1. Do I understand it correctly and the validation header is individual for each installation?

I'm not sure how you got that impression. It's generated from fixed constants.

https://github.com/dsekz/chrome-x-browser-validation-header?...
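
To illustrate what "generated from fixed constants" implies, here is a minimal sketch. It assumes the scheme is a Base64-encoded SHA-1 over a hard-coded key concatenated with the User-Agent string; treat that as an assumption and check the linked write-up for the actual details. The key below is a made-up placeholder, not the real constant, and `validation_header` is a hypothetical helper name.

```python
import base64
import hashlib

# Hypothetical placeholder. The real constant lives in the Chrome binary
# and is deliberately not reproduced here.
PLACEHOLDER_API_KEY = "EXAMPLE-KEY-NOT-REAL"


def validation_header(user_agent: str, api_key: str = PLACEHOLDER_API_KEY) -> str:
    """Assumed scheme: base64(SHA-1(api_key + user_agent))."""
    digest = hashlib.sha1((api_key + user_agent).encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0"

# The only inputs are a build-time constant and the User-Agent string,
# so two installations sending the same User-Agent produce the same value.
print(validation_header(ua) == validation_header(ua))
```

Under that assumption, nothing installation-specific goes into the value: it changes only when the key or the User-Agent string changes.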


It's still not clear to me because it's called the default API key. And for me, default means that this is normally overwritten. And if overwritten, during build or during install? That's what I'm asking myself.


I had the same question (2). https://news.ycombinator.com/item?id=44560664

If it's only in the closed-source Chrome, then it seems it's intended to help Google's servers distinguish between Google's own products and others.

But I've never seen a Google site which worked less well in Chromium than in Chrome, so I'm somewhat skeptical of this. Perhaps there are exceptions.


Is it not likely that it protects against AI bot Llama?


I don't see how you can "protect" against a large language model that cannot do browsing.


Plenty of improvements to mouse movement algorithms have already been made and they’re still evolving. While the blog post and the product it introduces offer some interesting ideas, they don’t yet reach the robustness of modern anti-bot solutions and still trail current industry standards. I doubt it would take me - or any average reverse engineer - more than five seconds to bypass something like this. There are already numerous open source mouse movement libraries available, and even if they didn’t exist, writing one wouldn’t be difficult. Yes, mouse movement or keyboard data can be quite powerful in a modern anti-bot stack and an in-depth analysis of it is genuinely valuable, but on its own it’s still insufficient. Relying on this data alone costs the attacker little and offers little real protection.


> they don’t yet reach the robustness of modern anti-bot solutions

Like what?


You can look at my previous answer:

> To clarify, if I disclose the exploit publicly, my concern is that the company could take legal action against me, even if I don’t share any technical details or information that would allow someone to reproduce it. – Something I really don't want to deal with.


To clarify, if I disclose the exploit publicly, my concern is that the company could take legal action against me, even if I don’t share any technical details or information that would allow someone to reproduce it. – Something I really don't want to deal with.


In its current state, the protections are pretty weak. I’m sure they’ll update it, and we’ll see what changes they bring. If this header is meant to serve as an anti-bot measure, then there’s a lot more work they need to do both on the JS and WASM sides. On top of that, processing fingerprint data on the backend, like building user/fingerprint profiles, analyzing detailed browser, device and low level connection info, and using AI to spot patterns, makes the system a lot more complex. However, based on the current implementation, I anticipate they’ll likely stick to a relatively simplistic approach.


You’re right. In this case, just knowing the guest_id is enough to break down the header. Twitter’s main goal here is mostly to obfuscate the data and make the reverse engineering process more painful.


Reversing will always win

