Jacques suggests that this is all about those 9 results that Google was able to "force" into Bing's index. But -- unless Google are simply lying about this -- the point isn't really those 9 results; they are just the clearest evidence Google has of shady behaviour by Microsoft.
Google claim that they saw lots of (less obvious) evidence of Bing mining search results from Google before they began their sneaky test, and the point of the test was simply to confirm that Microsoft is doing what they thought.
This is not about whether Bing is easy to "game" -- whether Google can get nonsense into Bing's index by sneaky means. It's about whether real Bing searches commonly derive their results from Google.
Imagine that I think you're reading my email and using the information in it to play the stock market (maybe I have secret insider information about some companies, or something). So I do a test: I arrange to be sent a bunch of email that, if you acted on it, would make you buy particular companies' shares that you'd otherwise have no reason even to have heard of. And, lo, you do that for 10% of the companies involved. Would anyone, looking at that, say that the real news is that I was unable to "game" your stock market transactions effectively, and that your spam filters caught 90% of the junk I tried to inject into your information?
If Google think they have other evidence, let's see it. The burden of proof is on the accuser, I think.
BTW, that's not necessarily about whether they are "lying". The other evidence they think they have could be convincing to them (because, e.g., they already have a strong predisposition to believe that they are inherently far better than anyone else, and therefore anyone building a competitive search engine could only possibly be copying them - this is a caricature, but you get the idea) but not necessarily convincing to others.
Why should Google show other evidence? The evidence they have provided is enough for me to be suspicious: How could Bing possibly come up with the same odd spell correction without copying from Google?
Yes, it could be a side effect of some machine learning algorithm. But Bing never explained that, which would have been very easy if it was the case.
What Google presented was enough to show that there exist possible situations where an individual (query, URL) pair's presence in Google's search results causes it to appear in Bing's. That definitely demonstrates that Google has a nonzero influence on Bing, and you can call it "copying" if you like - on the level of individual (query, URL) pairs.
I personally don't think there's anything wrong with this, in itself. I think an individual (query, URL) pair is small enough to be 'fair use', more or less. But it's another matter (to me) if this sort of thing is happening often enough, and in important enough cases, to have a strong aggregate effect on the Bing search engine as a whole. Google has insinuated that they believe this to be the case, but they haven't shown it. That was what the post above mine was about and that was what mine was about.
If you think that what they actually did demonstrate is bad enough in itself, none of this matters.
This is also true, but is sort of tangential to the point we were discussing. (Personally, I think they probably were special-casing Google, at least insofar as it was one of a list of set URL patterns it knew how to parse.)
I like your stock market insider information analogy because it illustrates the flaw in Google's experiment very well. And very few people seem to realize it. I need to modify it slightly though to make it more analogous.
Suppose your financial advisor (Google) was suspecting that someone (Bing) was stealing their confidential financial reports on stocks. Suppose your financial advisor told you (Google Engineer) to buy 100 random shares (search and click on 100 specific search terms) and see if the suspect (Bing) acted on it.
Even if the suspect bought 100% of the shares (Bing indexes all the search terms with the irrelevant links), you still haven't proven the suspect is stealing information from the financial advisor because there's more than one source this information could have come from. It could have come from you (Google Engineer doing the clicking) or it could have come from your financial advisor (Google itself). A way to solve this issue is if you (Google engineer) had another financial advisor (another website) which told you to buy certain companies. If the suspect didn't act on those shares then you would have MUCH more conclusive evidence that the suspect was stealing from the financial advisor represented by Google.
For the point you are trying to make, the email analogy falls flat because it is not really an unsuccessful technology which creates the purchases but an appeal to human emotion. It's not as if an email can make a person take an action in the same way that a data packet can determine the actions of a computer.
(Determining the degree to which Google's methods are similar to a 419 scam is an exercise for the reader).
While I find the thought interesting, that Microsoft may have google-specific code to filter out redirects, I don't like the notion that Google should not complain because it allegedly ignores copyrights.
I see a difference between aggregating content and presenting it with a mention of the source, and just plain copying (such as spell corrections) with no mention of the source.
I also think there's a big difference ("ethically" - I'm thinking more in terms of an artistic/creative idea of originality than an academic or legal definition) between using content from a blog (or whatever) to make a search engine, and using content from another search engine to make a search engine.
Mentioning the google redirect is a point I haven't seen yet, and an extremely valid one. "Anonymous click data", as Bing said, can't account for what was seen.
But I think the author focused too much on the "9%" part. Who knows what Google did with the other 91%? Maybe they were trying different approaches, which actually would be the most sensible thing to do.
That is sensible, but can easily slide into the ends justifying the means. In other words, we know that the engineers were tasked with proving that Microsoft was copying Google. That's a different task than figuring out what the Bing toolbar does with data from the Google search page.
Given the low success rate, it is not implausible that the engineers pushed whatever ground rules there were (if any) in the pursuit of evidence. It seems to me that is the most plausible explanation for the uncertainty about whether it was 7, 8, or 9 cases.
To go beyond what is easy for the media to report, it is reasonable to expect that a company in Google's position has twenty or more full-time engineers analyzing its competitors' products. The story about "torsoraphy" really only makes sense if Google has such a program. Seriously, is anyone surprised that Google and Bing analyze each other's engineering?
[WildSpeculation]
One or more Google engineers tasked with analyzing the competition discovered the "torsoraphy" connection and identified its correlation with the Bing toolbar - however, keep in mind that Google has not claimed that "torsoraphy" occurred naturally in the wild.
[/WildSpeculation]
[GoogleClaim]
Based on the "torsoraphy" discovery, one or more Google engineers hard coded web pages - presumably with permission from senior managers, since a leak that it was done casually has such serious blowback potential - and twenty Google engineers armed with laptops were tasked with creating top keyword rankings. The project was active for at least two weeks over the traditional Christmas holiday.[/GoogleClaim]
[WildSpeculation]
The Google engineers, surprised at their initial lack of success despite the diversity of IPs they used as they traveled over the Christmas holidays, tried increasingly diverse and aggressive methods as time passed. Being good hackers, some even tried exotic techniques, perhaps even creating or manipulating existing web pages to influence the search rankings. On advice of lawyers, Google is not comfortable accusing Microsoft of copying in these cases.
A month later, Google plans to announce the Android Marketplace while Apple is going to proclaim that Rupert Murdoch is the future of journalism. Google is going head to head with the reigning PR champion of the world. They turn the experiment into a torrid story and release it on February 1.
Larry Page knows the difference between a Founder from Stanford and a former IBM'er from a cow college in Alabama. It's no match. Binggate drowns out the traditional eve-of-event Apple adulation. On the day of the events, the mudslinging is far sexier than "and it has a hundred pages" for the tech press.[/WildSpeculation]
How bad was it yesterday for Apple? Stories about Apple's triumph with The Daily didn't make the front page of HN. The tech press even found Microsoft more interesting than Apple yesterday. Binggate was Google's attempt to kill two birds with one stone.
Google shows only 50 pages out of about 300 and provides several links to buy that book, including to Microsoft Press.
So that 'excellent point' only shows that Google even promotes products of its competitor. I don't think it says anything about Google "ignoring" copyright. Heck, you could even look into that book at Amazon for free: http://www.amazon.com/gp/reader/0735622841
Along your reasoning, every library or friend that shows you a book is ignoring copyright.
Okay, Google links to pages to buy that book. Revenue goes to Microsoft. You can see clearly all over the place that it's a Microsoft book. You can browse where it came from originally.
Microsoft "copies" the results. Does not say where they came from and presents them as their own. In this context, that would be copying all the text from that book, removing all the branding and creating a new book saying that Google wrote it.
Can anyone quote the opt-in language that Microsoft uses for the Bing toolbar? Although I'm still not sure about whether there's a difference, I think there's a clear distinction between watching users of your site and taking action based on that (Google and search link clicks) and watching users of your toolbar everywhere on the internet and taking action based on that. (And that distinction is as clearly drawn between google.com and the Google toolbar.)
I've always felt that, yes, Google collects a lot of personal data, but they're up front about the collection and they give me good value in return - which is sadly not my general expectation of data collection.
It's clear that search engines use user 'feedback' to improve their search results.
But if
"Bing even sent Google synthetic and organic search terms, to analyze and make use of the results",
I would suggest they add 'powered by Google' to their search result page.
Further, I don't think it makes a difference if Bing is querying Google directly or using opt-in users to do so. In either case, they're copying search results from Google.
A long, long time ago I decided to build a machine to pass the Turing test. Part of this involved giving thousands of people microphones to record the questions they asked and responses they received in normal life. This data was streamed directly to my machine. I then programmed my machine with a complex algorithm that used this data, along with data from many other sources, to attempt to produce human-like responses to questions it received. Unfortunately, a group of friends who applied for the microphones decided to ruin my attempt to pass the Turing test by grouping together and repeatedly asking a friend, Jim, 100 different nonsensical questions to which Jim, who was in on the trick, gave pre-determined responses. When I finally finished I put my machine in a box and asked the public to ask it questions. It was doing pretty well for a while since no one could distinguish it from a human. Then Jim and his buddies turned up. They asked it the 100 nonsensical questions, and for 7 of them it gave the same response as Jim had. Jim got very angry. “This machine is copying me!” said Jim.
Interesting point that Microsoft may have special code to filter out google redirects.
On the other hand, do we know the percentage of searches that actually get redirects as results? The 'honeypot' was rather small, so the redirects just might have been too few to appear as a signal in Bing.
I'm just glad to see I'm not the only one who's been led to think: if Google can demonstrate that Microsoft is picking up search correlations from the Google site, and Microsoft just explained that, no, it's just that we pull in clickstream data, then couldn't one feed synthetic clickstream data to Bing as a blackhat SEO technique? That seems like much less work than setting up content farms or botnet clickfraud.
If I were building a toolbar that tracked people's clicks, I would take some measures to have it record the 'final' URL loaded by the browser and not the naive link from the DOM. There are all sorts of redirectors in use and not working around them generically would give distorted results.
The lack of google redirects in bing's results doesn't look like proof, or even a smoking gun to me.
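To make the redirector point concrete, here is a rough sketch of what "working around them" could look like. It is a simpler stand-in for recording the final URL the browser loads: it just parses the destination out of a known redirector link. The table of redirectors, the `url` and `q` parameter names, and the function name `unwrap_redirect` are all assumptions for illustration; nothing here is known to be what the Bing toolbar actually does.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical table of known redirectors: (host, path) -> query parameters
# that may carry the real destination. These entries are assumptions for
# illustration, not a description of any real toolbar.
REDIRECTORS = {
    ("www.google.com", "/url"): ("url", "q"),
}

def unwrap_redirect(link: str) -> str:
    """Return the destination embedded in a known redirector link,
    or the link unchanged if it is not a recognised redirector."""
    parts = urlparse(link)
    params = REDIRECTORS.get((parts.netloc.lower(), parts.path))
    if params is None:
        return link
    query = parse_qs(parts.query)  # parse_qs also undoes percent-encoding
    for name in params:
        values = query.get(name)
        # Only accept values that look like absolute http(s) URLs.
        if values and urlparse(values[0]).scheme in ("http", "https"):
            return values[0]
    return link
```

A generic version would need a much longer table (or would have to watch the actual navigation), but even this much would be enough to keep www.google.com/url links out of the recorded clicks without any intent to hide where they came from.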
In fact, he suggested that that might be what's going on: "It is also possible that the 91% that didn't 'make it' was actually because they were pointing to google rather than to the target. Of course Bing does not like to link to its competitor and filtering out www.google.com/url can't be that hard."