The problem is that Google is doing the wrong test to prove their point.
They should search for Justin Bieber and add some really horrible link to the Bieber search results.
MS should have a lot of relevant info for Justin Bieber, so the Google data should be weighted low. If this nonsensical result makes the first page, then you can begin to assume that they're weighting it heavily. But when your query is "erftqnvpwedf", that just means that the only relevant info is coming from the toolbar. Bing would show that result even if the toolbar only accounted for 1/(2^50) of the total relevance.
I suspect Google did this test and has nothing to report.
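To make the 1/(2^50) point concrete, here is a toy sketch. Everything in it is an assumption for illustration: the two signals, the linear weighted sum, the URLs, and the numbers are all made up, and nothing here reflects how Bing actually ranks.

```python
# Toy model only: two hypothetical signals combined by a weighted sum.
TINY = 2.0 ** -50  # the "1/(2^50) of the total relevance" case


def rank(candidates, toolbar_weight):
    """Return URLs ordered by score, highest first (hypothetical scoring)."""
    def score(c):
        return ((1 - toolbar_weight) * c["other_signals"]
                + toolbar_weight * c["toolbar_clicks"])
    return [c["url"] for c in sorted(candidates, key=score, reverse=True)]


# Nonsense query: no page has any other signal, so any nonzero
# toolbar weight is enough to put the planted page first.
nonsense = [
    {"url": "random.example", "other_signals": 0.0, "toolbar_clicks": 0.0},
    {"url": "planted.example", "other_signals": 0.0, "toolbar_clicks": 1.0},
]

# Popular query: a legitimate page has strong signals of its own.
popular = [
    {"url": "official.example", "other_signals": 0.9, "toolbar_clicks": 0.0},
    {"url": "planted.example", "other_signals": 0.0, "toolbar_clicks": 1.0},
]

print(rank(nonsense, TINY))  # planted page wins despite the tiny weight
print(rank(popular, TINY))   # legitimate page wins
print(rank(popular, 0.6))    # planted page wins only with a heavy weight
```

In this toy model the nonsense query says nothing about the size of the weight, while the popular query does: the planted page only surfaces there if the toolbar weight is large.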
The test you suggest doesn't address the actual problem that Google is complaining about - long-tail, rare searches, especially ones that contain misspellings.
This is the very thing that Google is saying that Bing is stealing with their clickstream data, and it's also the one that you're agreeing would occur - "that the only relevant info is coming from the toolbar" for highly unlikely queries.
So it seems to me, looking at your argument, that you actually agree with Google's point.
My comment is in response to a comment about "popular queries", and really to the meta issue of how heavily the toolbar data is weighted in general.
Google is trying to argue a huge PR point by saying "Bing copies", with the ramification being that when you search on Bing you're really searching on Google. Suppose Google had instead said this: "For extremely rare searches where Bing has few, if any, good signals, clickthroughs from Google will be weighted in such a way that these results may make the first page".
I'd buy that 100%. The Bing team may even buy it. Google seems to want to start with that thesis, but then try to shove the whole camel in too.
They're talking about their top-ranking result being used. Are you suggesting that Google place garbage as the top-ranking Bieber result for visitors who use the toolbar?
No. Just for visitors that are coming from a specific IP address (their engineers). The only people that would see the bad results are their engineers. But the URL and data that the toolbar gets look just like any other.
This is pretty straightforward and I'd be surprised if Google didn't have the infrastructure in place to do this today (like literally today).
But then their bad results would be outweighed by the hundreds or thousands of other Bing toolbar users doing Bieber searches getting legitimate results fed back to Bing.
And to be clear (if anyone from Google is reading) you don't just want to do 'Justin Bieber' since I suspect a lot of people with the toolbar actually do that search.
Also do things like 'radix-2 fft', something where there are a lot of additional signals (I suspect), yet something that other toolbar users probably aren't searching for. So the toolbar data MS gets is strongly skewed to your results.
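A rough way to see why the choice of query matters, with completely made-up click counts (I have no idea what the real volumes are): what counts is the share of toolbar clicks for that query that come from the engineers seeing the planted result.

```python
def planted_share(planted_clicks, organic_clicks):
    """Fraction of toolbar clicks for a query that came from the planted result.

    Both counts are hypothetical inputs; returns 0.0 when there are no clicks.
    """
    total = planted_clicks + organic_clicks
    return planted_clicks / total if total else 0.0


# Hypothetical numbers: 20 engineer clicks in both cases.
print(planted_share(20, 5000))  # popular query: drowned out by other users
print(planted_share(20, 0))     # niche query nobody else runs: all planted
```

With a query like 'Justin Bieber' the planted clicks are a rounding error; with something like 'radix-2 fft' they could be most of what the toolbar reports, while Bing still has plenty of other signals to weigh them against.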