This is a toy example of the kind of problem that the field of Computer Vision i...

This is a toy example of the kind of problem that the field of Computer Vision is actively working on: object detection. In a (tiny) nutshell, our best answer for general images and objects is:

1) Instead of using the full color pixel image, use an "edge image" with some simple additional normalizations. If color is important, do this per color channel.

2) Create a dataset with as many cropped examples of the target object as you can find (mechanical turk is useful for annotating large datasets); every other crop of every image is a negative example.

3) Train a classifier (SVM if you want it to work, neural network if you're so inclined) using this dataset.

4) Apply the classifier to all subwindows of a new image to generate hypotheses of the target object location. This can be sped up in various ways, but this is the basic idea.

5) Post-process the hypotheses using context (can be as simple as simply finding the most confident hypotheses within a neighborhood).

If you're interested in object detection, an excellent recent summary of the recent decade of research is due to Kristen Grauman and Bastian Leibe: http://www.morganclaypool.com/doi/abs/10.2200/S00332ED1V01Y2... (do some googling if you don't have access to this particular PDF).

A cool paper from a few months ago that should be mentioned when commenting on a post called "Where's Waldo?" is http://www.cs.washington.edu/homes/rahul/data/WheresWaldo.ht...