So, just pull and average. No attempt to align, no attempt to decompose, no attempt at PCA, no processing at all. Just pull and average.
I'm struggling to see why this is interesting.
Added in edit: Rather than just downvoting, perhaps you could tell me why this is interesting. Were there technical challenges to overcome? If so - what? What did the implementor learn by doing this? What are you learning by using it? Please - help me to see why this is at all interesting! I genuinely don't understand.
To the extent that it is interesting, I think it is in large part precisely because of the simplicity. It's doing the simplest thing that could possibly work (where "work" means creating an interesting image which in some sense is interpretable as the "essence" of the search), and surprisingly, even with such a simple approach, it does work in many cases. See the results for Rolex (http://imgessence.com/browse/view/1543). It's very watch-y, even without any processing.
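For anyone wondering what "pull and average" amounts to in practice, here is a minimal sketch. It is not the site's actual code: it assumes the images have already been downloaded, decoded to grayscale, and resized to a common size, and it represents each one as a plain list of pixel rows. The function name average_images is made up for illustration.

```python
def average_images(images):
    """Pixel-wise mean of equally sized grayscale images (each a list of rows)."""
    if not images:
        raise ValueError("need at least one image")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must share the same dimensions")
    count = len(images)
    # Each output pixel is the mean of that pixel across every input image.
    return [
        [sum(img[y][x] for img in images) / count for x in range(width)]
        for y in range(height)
    ]
```

Shapes that recur at the same position across many results (a watch face, say) reinforce each other, while everything else blurs toward a uniform gray.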
If you try to judge it more as art than tech, it's more interesting I think.
I agree that it needs to be tweaked a bit so the images align better. It doesn't handle portrait-oriented images very well.
I think as programmers, sometimes we fail to realize that just because something is simple to us doesn't mean it is to everybody else. The reason I put this together is because whenever I saw these types of images posted somewhere, non-technical people would reply saying they wish they had a way to do it themselves.
There weren't many technical challenges, honestly. The lessons I learned were more about finding something people want and delivering it quickly with the minimum features. Now I can tweak and iterate and try the more challenging aspects.
Spoon is an interesting example: http://imgessence.com/browse/view/190 - You can see the many basic outlines of a 'spoony' shape. However, all of those images of spoons show the object rotated at more or less random angles. It would be interesting to add an algorithm that tries to match each image (rotate / scale) to the average and hence gives less noisy output.
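A rough sketch of the "match to the average" idea, heavily simplified: it only tries the four 90-degree rotations of a square grayscale image (no arbitrary angles, no scaling) and keeps whichever one has the smallest squared pixel difference from a reference image such as the current average. The helper names rotate90, sq_error, and best_rotation are hypothetical, and images are plain lists of pixel rows.

```python
def rotate90(img):
    """Rotate a square image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def sq_error(a, b):
    """Sum of squared pixel differences between two same-size images."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def best_rotation(img, reference):
    """Return (quarter_turns, rotated_img) closest to the reference image."""
    best = None
    candidate = [list(row) for row in img]
    for turns in range(4):
        err = sq_error(candidate, reference)
        if best is None or err < best[0]:
            best = (err, turns, candidate)
        candidate = rotate90(candidate)
    return best[1], best[2]
```

One way to use it would be to average once, rotate every image toward that average, then average again. Handling arbitrary angles and scale would need proper image registration, which this sketch does not attempt.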
i actually kind of like that it doesn't attempt to rotate/scale to an average.. this way you get a rough sense of the most common orientations.
in this case the images with the spoon 'head' to top-right look to be most common (darkest/most defined). but the next most common orientations appear to be where the spoon is flipped with the head aligned somewhere between top and middle left, and with many more variations..
I threw this together so quickly that I honestly didn't do much benchmarking. I will be trying to improve performance as much as I can. It only pulls high-res photos from Bing, which in itself takes around 1 minute over cURL (longer if it gets hung up on a heavy file). The image processing usually takes another minute or two.
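On the download time: one common way to cut it down is to fetch the images in parallel and skip any that error out or stall. The sketch below is only an illustration in Python using the standard library, not the site's cURL-based code; fetch_url and fetch_all are hypothetical names, and the timeout and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_url(url, timeout=10):
    """Download one URL. The timeout applies to each blocking socket
    operation, not to the download as a whole."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls, fetch=fetch_url, max_workers=8):
    """Fetch URLs in parallel; return {url: bytes} for those that succeeded."""
    def safe_fetch(url):
        try:
            return url, fetch(url)
        except Exception:
            return url, None  # skip anything that errors or times out

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(safe_fetch, urls))
    return {url: data for url, data in results if data is not None}
```

Because the timeout is per socket operation, a large file that keeps trickling in would still not be cut off; capping total time per file would need extra logic, such as reading in chunks against a deadline.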