Can we perhaps edit "singularity is near" out of the title? This sounds impressive, but having a bunch of racks able to classify the outline of a face is vastly disconnected from machine and humanity merging.
I was going to make the same request.
The singularity should be discussed where relevant, not added to everything.
This paper is producing high level features from noisy data in an unsupervised fashion -- a human still needs to indicate the task it should be targeted for and a human still needs to provide labelled training data for these high level features to be of use.
This work is interesting enough to warrant detailed discussion on the topic at hand, large scale machine learning, rather than just rehashing discussions of the singularity.
Added: As I can't reply to the comment below I'll do it here =] The network provides learned representations that are discriminative.
The aim of the network is to learn high level features representative of the content.
One of the many features it produced was one which accurately indicated the presence of a face in the image.
Note that they said train a face detector and not classify.
For example, from the same network there was a feature which accurately detected cats, yet they didn't explicitly train a cat detector either (see the section "Cat and human body detectors").
As the network represents the content as generic features it is clear that, if it reaches a high enough level, those features are essentially classifications themselves.
tldr; High-level features generated by this unsupervised network are so high-level that one of them aligns with "has a face in the image", others with "has cat in image", etc, but these features cannot be used without labelled training.
Actually, what's significant about this work is that labeled training data was not required:
"Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not."
I replied by adding to my comment above as it wouldn't allow me to reply earlier. Reference that.
tldr; High-level features generated by this unsupervised network are so high-level that one of them aligns with "has a face in the image", another to "has cat in image" (see the section "Cat and human body detectors") and so on.
Note however that they select the "best neuron" for face classification -- the only way they can do that is via using labelled data and testing all the neurons (where each neuron's activation is a feature).
Thus, these features cannot be used without labelled training.
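To make the "best neuron" step concrete, here's a minimal sketch of what that selection might look like. The function name and data layout are hypothetical (activations and labels as numpy arrays); the point is just that you need labels to score each neuron:

```python
import numpy as np

def best_neuron(activations, labels):
    """Pick the neuron whose thresholded activation best predicts the labels.

    activations: (n_images, n_neurons) array of neuron outputs
    labels:      (n_images,) array of 0/1 ground truth ("has face")
    Returns (neuron_index, accuracy) of the best single-neuron classifier.
    """
    best_idx, best_acc = -1, 0.0
    for j in range(activations.shape[1]):
        # Threshold at the neuron's median activation (a crude choice).
        preds = activations[:, j] > np.median(activations[:, j])
        acc = np.mean(preds == labels)
        acc = max(acc, 1.0 - acc)  # a neuron may be anti-correlated
        if acc > best_acc:
            best_idx, best_acc = j, acc
    return best_idx, best_acc
```

Note that the labels only score the already-learned features; they never touch the unsupervised training itself.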
But the difference is that you can show it 1 billion unclassified images, then show it 1,000 images you know to be faces, analyze how its neurons respond to the known inputs, and use that to classify the rest of the images.
Strictly speaking, you do need to have some labeled data at the end in order to determine how the neural net views faces, but I think that obscures what's notable about this system.
The amount of human participation involved in training is potentially six or more orders of magnitude less. That's a breakthrough, and a change in kind, not just degree.
In a more general response: I don't think what I stated obscures what's notable about the system, I feel I stated exactly what was notable and specifically avoided overstating it.
Overhyping when it comes to machine learning and AI seems to be the norm and has already hurt AI/ML severely in the past[1].
More specifically: I didn't disagree with anything you've stated, I simply pointed out that labeled training data is necessary, in response to the statement that it wasn't. The high-level feature extraction the paper discusses is unsupervised, but the classifiers it produces are semi-supervised. It's an important distinction.
Having a bachelor's emphasis in AI, I think you described it perfectly. I was wondering too from their abstract how they were recognizing "faces" entirely without labels, this makes it clearer. As you said, unsupervised they can find extremely high-level categories. That is pretty impressive.
How does this work? I thought neural nets only learned when they got some kind of feedback that let them know whether their classification was right (back propagation).
The neural network in this paper, an autoencoder, doesn't require labelled data.
Autoencoders take high dimensional input, map it to a lower dimensional space and then try to recreate the original high dimensional input as closely as possible.
The idea is to learn a compressed representation for the data and hope that this compressed representation works as a high level featureset.
As the model is just trying to represent the original input, no labelled data is required for the initial part. Labelled data is later introduced when the high level features are used for classification.
What's most interesting about this paper is that one of the features learned by the model maps quite well to "image contains a face" without any prompting by the researchers.
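As a toy illustration of the idea (this is not the paper's architecture, just a bare linear autoencoder in numpy, to show that the training signal is the input itself and no labels appear anywhere):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples in 10 dimensions that really live on a 3-D subspace.
basis = rng.normal(size=(3, 10))
X = rng.normal(size=(100, 3)) @ basis

# One-hidden-layer linear autoencoder: encode 10 -> 3, decode 3 -> 10.
W_enc = rng.normal(scale=0.1, size=(10, 3))
W_dec = rng.normal(scale=0.1, size=(3, 10))

lr = 0.01
for _ in range(2000):
    H = X @ W_enc        # compressed representation (the "features")
    X_hat = H @ W_dec    # attempted reconstruction of the input
    err = X_hat - X      # the input is its own target: fully unsupervised
    W_dec -= lr * H.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

mse = np.mean((X @ W_enc @ W_dec - X) ** 2)
```

After training, the 3-dimensional H is the learned featureset; in the paper's (far deeper, nonlinear, vastly larger) version, individual dimensions of the analogous representation turned out to track things like "face present."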
Did we? A lot of early works in AI were ... /overstated/.[1] While a lot of concepts were created way back when, a lot of results weren't really generated. It's extremely valuable for someone to actually go and do a thing, now that we can, even if someone had the idea for the thing eons ago.
The early works in AI with regards to unsupervised learning were in the 1940s and 1950s. Claude Shannon had demonstrated a chess learning system which taught itself by playing him to defeat him in under two weeks as early as 1949.
No, they weren't overstated. They were hyped by a clueless press. There's a pretty critical difference. It's a bit like how the early web pioneers didn't say that the web was going to revolutionize the delivery of dog food; it was a journalist who said that.
"It's extremely valuable for someone to actually go and do a thing, now that we can"
Self organized unsupervised learning was in use for optical classification of potatoes in the feeding of Frito Lay automated processing plants in the late 1970s.
Please distinguish between not having actually looked for earlier examples and imagining that none exist. Thanks.
I find both of your comments extremely condescending, both toward saalweachter and the authors of this article.
1. The fact that Claude Shannon succeeded in training a chess system has virtually no impact on saalweachter's claim that many AI results were overstated.
2. Certainly the press overstated them, which supports saalweachter's premise rather than weakening it. Even if the _implied_ claim was that _researchers_ overstated results, your argument does nothing to weaken this claim.
3. Frito Lay solved a problem several orders of magnitude easier than that of face recognition in natural images, which is still very much an open problem in computer vision.
4. Similar to 1., the Frito Lay example contributes nothing to your goal of weakening saalweachter's claim that this is valuable research--a claim which is exceedingly innocuous.
I understand that you've probably got a bone to pick with the many AI naysayers, and saalweachter's comments conjured a few common misrepresentations (e.g. (a) that the "AI revolution" burnt out because its researchers were somehow naive and (b) that neural networks are something new invented by computer vision researchers). You'd be justified in arguing against these claims, and I'm sure your father (respected AI researcher of the same name) would make them too, if saalweachter had tried to make them (which he didn't). But even if you were justified in making the argument, I would expect a less condescending one that made better use of evidence than the argument you've made here.
"I find both of your comments extremely condescending"
When a comment opens with a tone like this, I usually don't bother to respond, but I'll give you a chance, because you seem to have done a lot of honest mis-reading.
To wit, it may be of value for you to inspect your own tone, if you find public condescension inappropriate.
.
"1. The fact that Claude Shannon succeeded in training a chess system has virtually no impact on sallweachter's claim that many AI results were overstated."
It wasn't meant to. saalweachter's claim was silly. Who cares if many things were overstated? That has zero bearing on the fact that valid work was being done.
The purpose of that statement was to remind us that as early as the 1940s, machine learning was able to defeat its own creator at what remains regarded today as a highly intellectual pursuit. My goal was to reject the FUD of "some people got it wrong" being used to suggest that there was nothing right.
Some people always get some of everything wrong. His claim is tautological and uninteresting. I was politely declining to shame him for it, but since you've presented me as having false goals, I now have no choice but to clarify.
It is generally inappropriate, for reasons like these, to chastise strangers over imagined motivations. Frequently, you don't know strangers' motivations as well as you might imagine from a simple read of a few paragraphs.
.
"2. Certainly the press overstated them, which supports saalweachter's premise"
You are now repeating something I said to me back to me. From that, you are deriving the false conclusion that because a journalist somewhere said something wrong, an important thing has been discovered.
What I'd like to point out is that the net result of observing that journalists made mistakes is still "so what?"
"Even if the _implied_ claim was that _researchers_ overstated results"
It isn't.
"your argument does nothing to weaken this claim."
You have not correctly identified what I was speaking to. This is akin to telling someone discussing environmental damage that some farmer is talking about crop yield and the speaker hasn't weakened their claim.
Again: so what? I never argued that there are journalists who got things wrong. I'm the one who brought it up.
What does that have to do with my original discussion?
.
"Frito Lay solved a problem several orders of magnitude easier that of face recognition in natural images"
Discovering defects in potatoes moving at 45 miles an hour inside a water sluice from a single blurry image from a single angle in hard realtime using 1970s hardware is not several orders of magnitude easier than locating things on a face in slow time on modern hardware.
It's actually quite a bit more difficult even in fair conditions. Potato defects are under the surface, and have to be located by subtle color variation. It is not hard to find the characteristic shape and shadow of the nose.
With respect, sir, it's quite clear that this is not something you've done. You're claiming that easy things are more difficult than hard things, and you're forgetting the 40-year technology gap in between in your rush to show that a 2012 project is more impressive than a 1973 project.
To be clear, Babbage's mechanical calculator is also more impressive than an algebra solving system made in prolog. Why? Because it's more work and it's more difficult.
Your claim of several orders of magnitude simpler suggests that you are inventing data for the sake of feeling correct in an argument, and that you do not actually have the experience to show correct guesses in this field. That, combined with a tone suggesting that you feel it appropriate to rebuke strangers in public, suggests that I don't really want to talk to you much anymore.
.
"Frito Lay example contributes nothing to your goal of weakening saalweachter's claim that this is valuable research"
Again, you've misidentified my goal, and the way by which you've done that is to drop a critical piece of his actual claim.
I don't know why you feel that it's okay to guess at people's goals, then tell people how morally wrong your guesses are. I really don't.
My actual goal was to point out the jarring unfamiliarity with the field that both he and you evidence:
"It's extremely valuable for someone to actually go and do a thing, now that we can, even if someone had the idea for the thing eons ago"
The thing I was focussing on was to show him that this thing that he's applauding someone for doing in 2012 for the first time now that it's practical, even though it isn't being used in industry, was actually outclassed by a much more difficult problem on much more limited hardware in realtime 40 years ago by a company that nobody would think of as a technology giant.
The goal was to display just how far out of touch saalweachter was with the state of the industry.
Please don't speak to my goals anymore. For someone who'd like to speak about condescension (when I think you actually mean arrogance), to tell me what I meant and what I was getting at - incorrectly - then lambast me for it in a tone far more severe than that which you're criticizing is, I admit, difficult to swallow politely.
.
"I'm sure your father (respected AI researcher of the same name) would make them too"
Do not speak for, or involve, my recently deceased father in your attempt to be correct, sir. Especially not while you're telling someone else they're being rude.
"I would a less condescending one that made better use of evidence than the argument you've made here."
Unfortunately, though you suggest this, taking a brief look through your comment shows that this is not in fact correct. You have been radically uglier than that which you are criticizing, involving personal attacks, false claims of other people's intent, false claims of other people's goals, and the repeat involvement of a recently deceased relative.
I would prefer not to hear from you again. Thanks.
Also, this paper is about 20,000 object categories, not just one (faces). And the neural network is not the standard type but of the deep learning variety, which has only existed since 2005 (invented by Geoff Hinton, who was also big in neural net circles in the 80s, so he's not some newcomer who hasn't done his literature search). One of the co-authors of the paper is Andrew Ng, head of the Stanford AI Lab, so he's pretty legit.
I have no interest in your presenting your unwillingness to do basic research as if it was a valid form of skepticism.
Whether or not you believe me, everyone else just went ahead and took a quick look, and learned something.
Frankly, given your seeming inability to be a part of this conversation in a polite way, and your seeming unwillingness to depart it even after being asked, I would be happier if you went on believing I'm wrong and went around "calling people on this," so that everyone gets early warning of just how much you actually know about this field, instead of having to wait to listen to you speak.
"Shannon was a much better chess player than any program available in 1949."
On a technicality, this is correct: he started his work on December 29, and it wasn't until five days later, January 2 of 1950, that it was able to beat him.
All the same, you have no idea what you're talking about, and are asserting your beliefs as fact.
The correct way to handle "that doesn't sound right" is a search engine, not putting your hands on your hips and telling someone they're wrong in public.
Normally I advocate adherence to posting the original article title on HN, but if that had been the case I doubt this article would ever have got enough attention to be upvoted. "Singularity is near" is over the top.
It does this for 20,000 different object categories - this is getting close to matching human visual ability (and there are huge societal implications if computers reach that standard).
This is the most powerful AI experiment yet conducted (publicly known).
"It does this for 20,000 different objects categories - this is getting close to matching human visual ability"
No, it isn't. This classifier cannot identify theme variations, unknown rotations, will confuse new objects for objects it already knows, is unable to cope with camera distortion, needs fixed lighting, has no capacity for weather, does not work in the time you need to run away from a tiger, requires hundreds of times more data than a human eye presents, and does a far lower quality job, all while completely losing the ability to give a non-boolean response.
To say this is approaching human abilities is to have no idea what human abilities actually are.
"This is the most powerful AI experiment yet conducted"
No, it isn't. Please stop presenting your guesses as facts. Cyc runs circles around this, as do quite a few things from the Netflix challenge, as well as dozens of other things.
I personally have run far larger unsupervised neural networks than this, and I am not a cutting edge researcher.
I'm not a Machine Learning / AI expert, so I have to ask: if running a neural network on 16,000 cores with a training set of 10 million objects isn't cutting edge research -- and if running "far larger" networks than this, as you say you have, also isn't cutting edge research -- then please tell me: what is cutting edge research?
I ask this question in all seriousness; I'd really like to know.
(And yes, I see that your username is that of a noted AI researcher. Who died in 2010. So if you're actually his beta simulation, then I'll indeed be rather impressed...)
Let's take the example of The Netflix Prize, a $1 million bounty that the movie shipping organization ran several years ago. Their purpose was to improve their ratings prediction algorithm, on the premise that people frequently ran out of ideas of what to rent, and that a successful suggestion algorithm would keep people as customers longer after that point.
So, they carefully defined the success rate of their algorithm - that is, make it predict some set of actually-rated movies X on a 1-5 half-integer scale, take the square root of the arithmetic mean of the squared errors from the real ratings - which we'll call root mean squared error, or RMSE - and you have your "score," where towards zero is perfect.
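In code, that score is nothing exotic (a hypothetical `rmse` helper, just to pin down the formula):

```python
import math

def rmse(predicted, actual):
    """Root mean squared error: sqrt of the mean of squared rating errors."""
    assert len(predicted) == len(actual)
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

# Predictions off by exactly one star everywhere give an RMSE of 1.0:
rmse([3.0, 4.0, 2.0], [4.0, 3.0, 3.0])  # → 1.0
```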
Their predictor had a score of I think 0.973 something (it's been years, don't quote me on that.) Their challenge was simple.
Beat their score by ten percent, and you trigger a one month end-of-game. At the end of that month, whoever's best wins le prize. One million dollars, obligatory Doctor Evil finger and all.
Netflix provided (anonymized) a little over 100 million actual ratings, where all you had was a userID, a movieID, a real rating, and separately, a mapping "this movieID is this title." You were only allowed to use datasets in your solution that were freely available to everybody, and you had to reveal them and write a paper about your strategy within one month after you accept le prize, honh honh honh.
Seriously, it was awesome. They were going to do a second one, but lawyers, and the world sadded.
So, there, you've got a ten times larger dataset. So surely sixteen thousand cores is the drastic thing, right?
Well, not really. I was running my solution on 32 Teslas, which in the day were $340 in bulk and had 480 cores each. So I actually "only" had 15,360 cores, which falls a whopping four percent short of Google's approach, which several years ago cost me about the price of a recently used car, and which I was able to resell afterwards as used, but without the bulk discount, for almost exactly what I paid for them in the first place.
Swish.
And I mean, I've got to imagine that someone else chasing that million dollar prize who thought they were going to get it invested more than I did. There were groups of up to a dozen people, data mining companies, etc.
So if one dude sitting in his then-Boise apartment can spend like $11k on a dataset ten times this size over a commercial prize?
An image is a lot more complicated than a pair of ids and a rating. Counting the number of rows in the training database is misleading. I can build a reasonable dataset for a prediction task from a set of 100M rows from a database that I maintain in my spare time (http://councilroom.com , predict player actions given partial game states).
Don't get me wrong, the Netflix prize was cool.
What's cool about this is that Google hasn't given the learning system a high level task. They basically say, figure out a lossy compression for these 10 million images. And then when they examine that compression method, they find that it can effectively generate human faces and cats.
"An image is a lot more complicated than a pair of ids and a rating."
Predicting someone's reaction to a given movie is a lot more complicated than a pair of IDs and a rating, too, it turns out.
Let's take the speculation out of this.
You can get features of an image with simple large blob detection; four recurring Boltzmann machines with half a dozen wires each can find the corners of a nose-bounding trapezoid quite easily. They'll get the job done in less than the 1/30 sec screen frame on the limited Z80 knockoff in the original dot-matrix Game Boy. You'll get better than 99% prediction accuracy. It takes about two hours to write the code, and you can train it with 20 or 30 examples unsupervised. I know, because I've done it.
On the other hand, getting 90% prediction accuracy from movie rating results takes teams of professional researchers years of work.
.
"I can build a reasonable dataset for a prediction task from a set of 100M rows from a database that I maintain in my spare time"
And you won't get anywhere near the prediction accuracy I will with noses. That's the key understanding here.
It's not enough to say "you can do the job." If you want to say one is harder than the other, you actually have to compare the quality of the results.
There is no meaningful discussion of difficulty without discussion of success rates.
I mean I can detect noses on anything by returning 0 if you ignore accuracy.
.
"What's cool about this is that Google hasn't given the learning system a high level task."
Yes it has. Feature detection is a high level task.
.
"They basically say, figure out a lossy compression for these 10 million images."
I have never heard a compelling explanation of the claim that locating a bounding box is a form of lossy compression. It is my opinion that this is a piece of false wisdom that people believe because they've heard it often and have never really thought it over.
Typically, someone bumbles out phrases like "information theory" and then completely fails to show any form of the single important characteristic of lossy compression: reconstructibility.
Which, again, is wholly defined by error rate.
Which, again, is what you are casually ignoring while making the claim that finding bounding boxes is harder than predicting human preferences.
Which is false.
.
"they find that it can effectively generate human faces and cats."
Filling in bounding boxes isn't generation. It's just paint by number geometry. This is roughly equivalent to using a point detector to find largest error against a mesh, then using that to select voronoi regions, then taking the color of that point and filling that region, then suggesting that that's also a form of compression, and that drawing the resulting dataset is generation.
And it isn't, because it isn't signal reductive.
Here, I made one for you, so you could see the difference. Those are my friends Jeff and Joelle. Say hi. The code is double-sloppy, but it makes the point.
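Something in the spirit of that demo could be sketched like this. This is a simplified stand-in, not the linked code: it picks the highest-error pixels against the mean color in a single pass rather than iteratively refining a mesh, and the function name is made up:

```python
import numpy as np

def voronoi_fill(image, n_points=64):
    """Crude 'paint by number': pick the pixels where a flat guess errs most,
    then fill each pixel with the color of its nearest picked point.

    image: (H, W, 3) float array. Returns an (H, W, 3) approximation.
    """
    h, w, _ = image.shape
    # "Point detector": the pixels farthest from the overall mean color.
    flat_err = np.linalg.norm(image - image.mean(axis=(0, 1)), axis=2)
    ys, xs = np.unravel_index(np.argsort(flat_err.ravel())[-n_points:], (h, w))
    points = np.stack([ys, xs], axis=1)            # (n_points, 2)
    # Assign every pixel to its nearest point (its Voronoi cell)...
    yy, xx = np.mgrid[0:h, 0:w]
    d2 = (yy[..., None] - points[:, 0]) ** 2 + (xx[..., None] - points[:, 1]) ** 2
    nearest = d2.argmin(axis=2)                    # (H, W) cell index
    # ...and paint the whole cell with that point's color.
    return image[points[nearest, 0], points[nearest, 1]]
```

The output uses at most n_points distinct colors, which is the "paint by number geometry" point: it's a fixed geometric reduction, with no learned generation involved.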
The person who invented Boltzmann machines - is - the inventor of this technique. He invented Boltzmann machines in the 80s and spent over 20 years trying to get them to actually work on difficult tasks.
Your rant about this not being compression or whatever you're trying to say is completely off the mark. You don't seem to understand what this work is about.
The netflix challenge is a supervised learning challenge. You have lots of 'labeled data'. This technique is about using 'unlabeled' data.
(Side note: At one point, Geoff Hinton and his group using this technique had the best result in the netflix challenge, but were beaten out by ensembles of algorithms.)
Cyc has nothing to do with this and is a huge failure at AI.
tldr; From reading your comments, you don't seem to know what you're talking about, and you readily discount some of the most prominent machine learning researchers in the world today. You're obscuring important results that newcomers might have found interesting to follow up on.
I'm not a large scale ML person, and not intending to take away from the achievement of the team in the OP, but experiments in large scale, unsupervised learning have been going for a long time (even using the autoencoder approach). When you think about it, large scale requires unsupervised...
Here is an old example with hundreds of millions of records and instances:
Also, people here may not be as up to speed on the state of the art in face rec as they think they are. It's not as much of an unsolved problem as it was even 10 years ago.
"When you think about it, large scale requires unsupervised..."
Not necessarily. Crowdsourcing is another option, like Google's image tagging game, reCAPTCHA, et cetera.
Pay a herd of people to do things, and they'll do things for you. You don't have to pay them in money. Telling them they have a high score is often enough.
Yes, "requires" was too strong. I should have said they go well together. I was trying to get at the fact that it's highly common for large-scale work to be unsupervised.
Face recognition usually uses a hand-coded layer followed by a machine learning algorithm. This technique automatically devises that hand-coded layer. It also did this for 20,000 other categories and can also be applied equally well to audio or any other data type. Huge difference.
> It does this for 20,000 different objects categories
With 15.8% accuracy.
> This is the most powerful AI experiment yet conducted (publicly known).
It's only powerful because they threw more cores at it than anyone else has previously attempted. From a quick skimming of the paper, there does not appear to be a lot of novel algorithmic contribution here. It's the same basic autoencoder that Hinton proposed years ago. They just added in some speed ups for many cores.
It's a great experiment though. You shouldn't detract from its legitimate contributions by making outlandish claims.
That in itself is fairly interesting: it says we can make dramatic improvements just by throwing more processing power at the problem. Whatever happens on the algorithms research side of the problem in coming years, you can count on us having access to more processing power.
I think this is the most important aspect of this paper. Throwing more computing power at the problem increases performance significantly. It is possible that our algorithms are adequate but our hardware is not.
> This is the most powerful AI experiment yet conducted (publicly known).
That's an ill-defined statement. AI is a vast and diverse field: what makes one demonstration more "powerful" than another? There are definitely other projects that could be viewed as being in the same class of "powerful" as this cluster.
This is certainly an interesting paper, but it has to be viewed in the context of a large and active field.
In return, I will offer you two interesting non-sequiturs, because I don't have anything topical and a non-sequitur seems like it's worth half what something germane would be.
.
Bret Victor, "Inventing on Principle." First 5 minutes are terribly boring. Give him a chance; it's 100% worth it.
I put the singularity bit in to make it relevant for those who are non-technical. This experiment is significant because it shows that large artificial neural networks can be made to work. People have tried and failed at this for decades.
This technique was "discovered" by Geoff Hinton at the University of Toronto in 2005. However, nobody had tried (or maybe had the funds to try) it at this scale.
If this continues to work at larger and larger scales, this would be a machine learning technique that can work accurately on tasks that are hugely important to society:
- accurate speech recognition
- human-level computer vision (making much human manual labor redundant)
Even so, the singularity bit editorializes a link to a white paper on an equally significant scale. Nowhere in the link is the singularity referenced.
As for the point about it being for non-technical people, I don't understand where you're coming from. This is hacker news. If people don't understand it and don't upvote it, then that's their problem, not yours.
"I put the singularity bit in to make it relevant for those who are non-technical." Yeah, I'm sure there's a lot of those on HN...I'd expect this kind of crap in something like Wired, but not here.