
Humans can build associations with very few samples

This to me is an example of whatever the opposite of anthropomorphism is - assuming that humans sample like computers, and then extrapolating to "low" relative to computing. It's also my #1 pet peeve in DL debates.

As someone who has also raised children, I can see how this conclusion (low sample rate) can be reached; however, as someone also deep into ML/RL, I see how wrong it is.

You say "very few" samples without a metric. I've seen people in the past cite 2 or 3 presentations of a stimulus to a child, for example in the form of a toy, and then state that the child has correctly visually identified the toy with a verbal label in subsequent tests.

Assuming that these 2 or 3 presentations correlate with 2 or 3 samples is wrong because it doesn't take into account sample rate.

Every presentation batch is a 4D (continuous time + three spatial dimensions) multi-sensory supervised labeling exercise at first (no RL until the first recitation/exploration). Using rough abstractions: at a 60 "frames per second" input rate, and let's assume there was a "supervised labeler" (aka parent/guardian) who said the word "toy" multiple times across a 5 minute play period, you have up to 18,000 "labeled" pieces of training data across multiple sensory inputs for one object.
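A quick back-of-envelope check of that arithmetic; the frame rate and session length are the same rough assumptions as above, nothing more:

```python
# Back-of-envelope: "labeled samples" per play session under the rough
# assumptions above (60 "frames per second", one 5 minute labeled session).
frames_per_second = 60
session_seconds = 5 * 60  # a 5 minute play period

labeled_samples = frames_per_second * session_seconds
print(labeled_samples)  # 18000 "labeled" multi-sensory samples
```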

If you blindfolded the child and had them identify the object by feel you may need more batches, similarly with other senses (smell for example).

Obviously this is a gross simplification - but the constant 1:1 batch comparison at the sampling rate between humans and [linear models/MDP/differentiable programs/Neural Networks] really is way off.



The problem is that when you show a single example of "a toy" (let's say a fire engine) to a child, they don't just learn to recognise the unique object you showed them as a "fire engine"; they learn the concept of "fire engine" that they can subsequently correctly recognise in different objects, with very different characteristics. Having learned what a fire engine is, they can then recognise fire engines of all shapes and sizes as belonging to the same category of "fire engine" as the original; a blue fire engine as a special case with a surprising colour; or a real fire engine as a different class of fire engine that is not a toy; and so on.

Machine vision classifiers can do nothing of the sort, no matter how many examples you give them and for how long they learn to look at them. If you label a fire engine toy as a "fire engine" then either the classifier will only be able to recognise toy fire engines, or it will have to mislabel real fire engines as "toy fire engine".

I agree that the difference between the sampling rate of humans and machine vision classifiers is not well defined, but it is obvious (and as far as I can tell there's a strong consensus on this) that machine vision algorithms are many orders of magnitude less sample efficient than humans.


when you show a single example of "a toy" (let's say- a fire engine) to a child, they don't just learn to recognise the unique object you showed them as a "fire engine"; they learn the concept of "fire engine" that they can subsequently correctly recognise in different objects

I don't have that same experience at all. In fact if anything it's the opposite. My kids called ambulances "fire trucks" until I - the supervised labeler - corrected them.

that machine vision algorithms are many orders of magnitude less sample efficient than humans.

I don't think anyone disputes that - but they are at least in the same ballpark in terms of structure, especially if you look at the way RL works.


As someone who raised children and grandchildren, I can't find any explanation for how fast they learn the language, based on very few samples (where your 4D argument doesn't apply). Sure, children learn from conversations with adults, but those are mostly trivial, and involve trivial concepts. And children seem to be able to learn not only from the very limited number of samples, but also (in a sense) learn more than these samples contain. BTW, did anyone try to analyze how many words/phrases the child has heard, say, by the age of 7, when they develop perfect understanding of the language and the ability to speak like adults? And after that age, one can spend 50 years learning a foreign language and still not get it.


Initially, kids don't learn language that fast; they spend a whole year getting samples from parents, where we try and try and try over and over to get them to say something, so there is a high sample rate going on for sure. However, it is also true that humans learn faster at some point by using other tools, like consciousness. Not sure how exactly that works in a toddler's brain, but in mine, if you ask me to remember a phone number, I will repeat it in my head several times and try to make associations; those higher-level learning processes seem to be the algorithms that we have yet to discover and implement successfully.


> (where your 4D argument doesn't apply)

The 4D argument is even more applicable to human language, IMO. Object recognition pretty exclusively involves sight and touch. Human language involves all the senses, frequently at once.

My Spanish is not great, but usually I can communicate pretty well despite that, partially because there are a lot of other contextual cues (body language, nonverbal vocalizations, known objects) I can use to figure things out.

It's amazing how frequently the words don't matter at all, and the meaning is almost entirely contained in tone and pacing of speech.


based on very few samples (where your 4D argument doesn't apply)

Again, define "few." Language development starts in utero [1] and is basically a constant stream thereafter.

Children who have more consistent exposure to directed language and singing from their parents learn language faster, so there is absolutely correlation between exposure rate (sample rate) and acquisition time.

Additionally, the idea that language isn't 4D is just completely missing the concept. There is no linguistic association with a "ball" if there is no physical (visual/tactile) representation of said ball. Assuming a child doesn't have a disability, there are no single-sense concepts that I can think of.

[1]https://www.washington.edu/news/2013/01/02/while-in-womb-bab...


I often wonder how much the multi-sensory aspect plays into it. When I see an image, I don't really process it as an image; I map that visual cue into the full gamut of sensory memories (?) of that object. I could write a page's worth of these descriptive meatspace 'vectors' that are invoked when I see a banana and that color my interpretation of its context.

If my understanding is remotely correct, an RNN's view of a banana is basically like the face in Aphex Twin's Equation - https://youtu.be/M9xMuPWAZW8?t=5m30s (headphone users beware). No qualitative or quantitative information about the object, just a certain tone of integer triplets in a cacophony of noise.

It seems like a many-dimensional view of the world around us is going to be necessary for systems to more effectively intuit about interacting with it. It could be something we synthetically inject or we may need to give our models new senses they can use to extract their own meaning.


Well that's why I call it 4D. It's a multidimensional understanding of "banana" that crosses multiple sensory barriers.

As you more or less correctly point out, the way a DNN understands a 2D image of a banana is basically by compressing (convolving and pooling) the image into a mathematical "fingerprint" for which we provide a label. If the labeling process is homogenized, then we can relatively rapidly generate high-probability inferences when testing the fingerprints against new images.

That is to say the complexity of the "fingerprint" of a banana is several orders of magnitude greater in humans than it is for even our most advanced object detectors - if for no other reason than the mapped data is multi-sensory.
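The convolve-and-pool "fingerprint" idea can be sketched in a few lines. This is a toy illustration with random numbers standing in for an image and a learned filter, not any real architecture:

```python
# Toy sketch of "compressing an image into a fingerprint": one
# convolution pass followed by max pooling. The image and kernel are
# random stand-ins, not trained values.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8))   # stand-in for a 2D banana photo
kernel = rng.random((3, 3))  # stand-in for one learned conv filter

# Valid convolution: slide the 3x3 kernel over the 8x8 image -> 6x6 map.
feature_map = np.array([
    [(image[i:i + 3, j:j + 3] * kernel).sum() for j in range(6)]
    for i in range(6)
])

# 2x2 max pooling: compress the 6x6 map into a 3x3 "fingerprint"
# that a classifier would then associate with the label "banana".
fingerprint = feature_map.reshape(3, 2, 3, 2).max(axis=(1, 3))

print(fingerprint.shape)  # (3, 3) -- a compressed summary we can label
```

Even in this toy form, the point above is visible: 64 input pixels collapse to a 9-number fingerprint, and that fingerprint carries far less information than a multi-sensory human representation of the same object.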


Also the pre-training takes 2 years


We are also rather prone to seeing patterns in noise.



