Apart from the captioning work already mentioned, there's also visual analogy making work by Reed et al. http://www-personal.umich.edu/~reedscot/nips2015.pdf which is pretty exciting (sort of a converse to what you're proposing).
...for some definitions of "trivial" and "perfect". At this level, I suspect even a small advantage would result in winning the contest, which is the point here.
To be fair, I don't think OpenCV really solved computer vision. There's certainly no model out there that can do image-based question answering as well as a human can, or accurately interpret (parse, if you will) the contents of an image, except in a few special cases.
I'm curious to hear your thoughts about learning object saliency from these datasets. Most human-generated images have built-in biases toward framing things humans care about, and all of the captions will reflect the relative importance (to humans) of pictured objects.
Captioning images, for humans, is a subset of a much more general skill set. Humans can scan a broad visual scene for salient components, focus on those while ignoring non-salient objects, and then organize their thoughts about what has been seen in such a way as to produce an extremely low-dimensional description of the scene (a descriptive sentence).
Humans also have the advantage of immediate feedback on their generated descriptions from peers or parents.
I haven't seen much work that has attempted to tackle datasets that aren't pre-framed by humans, or ones that try to scale reinforcement learning. I'd love to hear your thoughts or get suggested reading if any pops to mind.
Just FYI, the only additional data used by the GoogLeNet entry was from the classification challenge (aka provided by the organizers), hardly something that would make you lose sleep at night.
I was not suggesting Google was pulling data from other sources in some sort of conspiratorial way, but rather pointing out for interest that its algorithmic superiority was weighted toward large data sets. Given the volume of data they see and store in their existing operations, I saw that as a potentially interesting correlation.
The Los Angeles office is right by Venice Beach, in a beautiful setting. It's a mid-sized office (~500 employees according to http://venice.patch.com/groups/business-news/p/silicon-beach...) in the same time zone as Mountain View, and less than a 1h flight from the mothership should you need to go there.
Unlike the main office, most people don't have to choose between commuting from SF or living in Mountain View, because Venice/Santa Monica is actually a nice area to live in :). Naturally, the breadth of projects is not as big as in Mountain View, but there are a number of exciting things happening here (computer vision, quantum AI, etc.).
It's actually already possible to train convolutional-network-like models to distinguish between a variety of dogs, cats, etc. with pretty much superhuman precision. The real problem is getting high-quality training data without involving tons of domain experts who would tell us with a high degree of confidence whether a given image is of a specific breed of dog (getting millions of images of dogs is easy, and so is building a classifier).
It's not immediately obvious to me how useful such an app would be, btw. Unless, of course, I misunderstood what a "real life pokedex app" is :).
Yes, though I think on public benchmarks this is still not the case. There's a dog-breed classification problem in this year's Fine-Grained challenge (https://sites.google.com/site/fgcomp2013/) so we'll see in December!
Yes, I'd be surprised if the straightforward implementation from https://code.google.com/p/cuda-convnet/, run on a GPU with lots of transformations, wasn't the winning entry.
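By "lots of transformations" I mean label-preserving data augmentation. Here's a minimal numpy sketch of the random-crop-plus-flip style of augmentation used in that family of pipelines (the function name and exact parameters are my own illustration, not the actual cuda-convnet code):

```python
import numpy as np

def augment(image, crop_size, rng):
    """Randomly crop and horizontally flip an (H, W, C) image array.

    A sketch of the kind of label-preserving transformations used to
    inflate a training set; the winning entry's exact pipeline may differ.
    """
    h, w, _ = image.shape
    # Random crop: pick a top-left corner uniformly at random.
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    crop = image[top:top + crop_size, left:left + crop_size]
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        crop = crop[:, ::-1]
    return crop

rng = np.random.default_rng(0)
img = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
patch = augment(img, crop_size=24, rng=rng)
```

Each call yields a slightly different view of the same image, so one labeled example effectively becomes many.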
It's possible that the underlying model is just not particularly good at learning from data. 11B parameters is a lot of free parameters to learn -- for instance, the main competitor to that paradigm is the work by Krizhevsky et al., i.e. convolutional networks with lots of parameter sharing, and I think they get better performance (on a comparable task) with ~60M free parameters.