This represents the absolute worst kind of science journalism, completely devoid of context and domain knowledge. Virtually every definitive statement in here is wrong. Their explanation of spiking alone is a complete disaster.
Out of all the modalities, vision is easily the one we know the most about. And we do so at a fairly deep level. The discussed work seems fine but it's not the groundbreaking insight it's made out to be. Great PR work from the involved scientists (or their enterprising university marketing department).
Cognitive scientists have been studying the computational foundations of vision in depth since at least the 1980s (see David Marr's 1982 book Vision), and AI scientists have been using neural networks for computer vision tasks for at least as long. So yeah, I'm no expert, but this probably isn't the ground-breaking work the article makes it out to be.
This work is about the very early steps of the primate visual pathway -- retina to LGC to the input layer of V1. Marr's opus is meant to be a much more holistic view of the visual system. A more appropriate context for this article is perhaps David Hubel's (very accessible) Eye, Brain, and Vision.
(The book used to be freely available on a website hosted by Harvard Med, but I can't seem to find it anymore.)
FWIW, I work a little bit in computational neuroscience. While I think "ground breaking" is an exaggeration, and I wish the article spent more time explaining the general thinking in the field and why it matters, the content is not terribly written for an article of this type and length. And it should be emphasized that the point of this modeling is really understanding the biology of the primate visual system; what it says about the general problem of vision is a separate question.
Disclaimer: I was not involved in this work, but did collaborate with one of the scientists extensively in the past.
Can't edit anymore, so two very minor corrections: "LGC" should have been "LGN" (had "RGC" and "LGN" both on my mind), and "disclaimer" should really have been "disclosure."
> Not only are LGN cells scarce — they can’t do much either. LGN cells send a pulse to the visual cortex when they detect a change from dark to light, or vice versa, in their tiny section of the visual field. And that’s all.
[I think the "scarcity" is real, but some areas have better coverage than others. But I really don't remember anything similar to the other part of the model.]
>>> LGN cells send a pulse to the visual cortex when they detect a change from dark to light, or vice versa
This looks like a binary toggle encoding (implying that the receiving end must remember and count how many pulses it has received to know whether that part is dark or light).
I vaguely remember something more like the neurons in that area (or nearby) sending pulses periodically, with the time between pulses getting smaller (or bigger?) when there is more light. (Or perhaps the interval stays the same, but double/triple/... pulses are used for more light.) Perhaps add some slow adaptation to the light level, so that after some time at a fixed light level the neuron goes back to its default interval between pulses. I'm not sure about the actual encoding, but everything I remember is very different from the encoding in the article.
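Purely for illustration (this is my own toy sketch, not a model of real RGC/LGN responses, and "toggle_code"/"rate_code" are made-up names), the difference between the article's description and the kind of rate code I'm half-remembering would look something like this:

    import numpy as np

    def toggle_code(luminance, threshold=0.5):
        """The article's description: a spike only when the patch
        crosses from dark to light or from light to dark."""
        bright = luminance > threshold
        return np.flatnonzero(np.diff(bright.astype(int)) != 0) + 1

    def rate_code(luminance, dt=0.001, max_rate=100.0):
        """Toy rate code: brighter input -> higher firing rate,
        i.e. shorter intervals between spikes (Poisson-ish)."""
        rng = np.random.default_rng(0)
        p_spike = np.clip(luminance, 0.0, 1.0) * max_rate * dt
        return np.flatnonzero(rng.random(luminance.size) < p_spike)

    t = np.linspace(0.0, 1.0, 1000)
    lum = (np.sin(2 * np.pi * 2 * t) + 1) / 2            # slowly varying brightness
    print("toggle spikes:", len(toggle_code(lum)))        # a handful of transitions
    print("rate-code spikes:", len(rate_code(lum)))       # spikes throughout, denser when bright

In the toggle version the downstream area has to keep state to know the current light level; in the rate version the level can be read off the recent interspike intervals.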
Agree - very disappointing. I studied human and machine vision in grad school 20 years ago and fail to see the breakthrough hinted at by the title of the article.
Running Windows is perfectly fine; the major libraries for GPU-accelerated autodiff and networks (cuDNN with PyTorch or TensorFlow) have great support nowadays. It's the AMD GPU that remains essentially useless, as of 2019. If you want to get into the game, I'd recommend buying a middle-of-the-road NVIDIA GPU like the RTX 2060.
For toying with autodiff and basic CNNs, CPU works just fine by the way...
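In case it helps anyone getting started: a minimal PyTorch sketch (my own, nothing specific to the article) that uses the GPU when CUDA is available and quietly falls back to the CPU otherwise -- the tiny-CNN-on-CPU case really is perfectly workable:

    import torch
    import torch.nn as nn

    # Pick the GPU if CUDA is available, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # A tiny CNN; perfectly usable on a CPU for toy experiments.
    model = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Flatten(),
        nn.Linear(8 * 28 * 28, 10),
    ).to(device)

    x = torch.randn(32, 1, 28, 28, device=device)  # fake batch of 28x28 images
    loss = model(x).sum()
    loss.backward()   # autodiff works the same on CPU and GPU
    print(device, loss.item())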
The point is that an AMD GPU is far from useless. The only thing it DOES lack is out-of-the-box support from the major Python/R/whatever libraries. Why? Not because AMD GPUs don't work, but because most (perhaps all) of these high-level libraries rely on underlying performance libraries provided by Nvidia.
Despite all the talk about autodiff this or that, the stuff that matters is implemented by hand by Nvidia and Intel engineers and then high level libraries build on top. AMD is simply lagging in providing low-level C libraries and GPU kernels for that.
For example, let me chip in with the libraries I develop, in Clojure, no less. They support BOTH Nvidia GPU AND AMD GPU backends. Most of the stuff is equally good on AMD GPU and Nvidia GPU. With less fuss than in Julia and Python, I'd argue.
Check out Neanderthal, for example: https://neanderthal.uncomplicate.org
Top performance on Intel CPU, Nvidia GPU, AND AMD GPU, from Clojure, with no overhead, faster than Numpy etc. You can even mix all three in the same thread with the same code.
That's not quite how the process works. These papers go through multiple (> 2) revisions. At any iteration, there's ample opportunity for updating references. This applies doubly given how long the Cueva/Wei work has been available (preprint & CCN'17 (?) contribution).
It's certainly an interesting paper, but there's a bit of publication weirdness at play here.
In October '17, Cueva & Wei put out a(n anonymous) paper that recapitulates the core result almost exactly -- that training a recurrent neural network to perform dead reckoning/path integration gives you intermediate units whose place fields strongly resemble grid cells. Critically, this only happens when regularization is applied; Cueva/Wei used noisy inputs and DeepMind implemented 50% stochastic dropout in the intermediate linear layer. There are some superficial differences (generic RNN units vs. LSTM), but at their core these studies are virtually identical. Check it out:
What I don't get -- why doesn't DeepMind acknowledge this result? Sure, the Nature paper was submitted in July '17, but these things go through many revisions. Clearly, DeepMind went a bit further with the whole integrating visual CNNs/grid cells part. Nonetheless: Fig. 1 is the core result, everything from Fig. 2 onwards is nice-to-have but not essential, and I feel like Cueva/Wei got there first.
Ah, well. At least the minor controversy brings in great publicity for the Cueva/Wei paper.
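For readers who want a concrete feel for the shared recipe (my own bare-bones paraphrase in PyTorch, not either group's actual code; layer sizes are made up): feed 2-D velocities into a recurrent net, regularize an intermediate linear layer, read out place-cell-like targets, then inspect the regularized layer for grid-like tuning after training.

    import torch
    import torch.nn as nn

    class PathIntegrator(nn.Module):
        """Toy version of the shared recipe: velocity in, place-cell
        readout out, with 50% dropout on an intermediate linear layer
        (the regularization both papers found to be critical)."""
        def __init__(self, n_hidden=128, n_bottleneck=256, n_place_cells=256):
            super().__init__()
            self.rnn = nn.RNN(input_size=2, hidden_size=n_hidden, batch_first=True)
            self.bottleneck = nn.Linear(n_hidden, n_bottleneck)
            self.dropout = nn.Dropout(p=0.5)
            self.readout = nn.Linear(n_bottleneck, n_place_cells)

        def forward(self, velocities):
            h, _ = self.rnn(velocities)           # (batch, time, n_hidden)
            g = self.dropout(self.bottleneck(h))  # units inspected for grid-like tuning
            return self.readout(g)                # predicted place-cell activations

    model = PathIntegrator()
    vel = torch.randn(8, 100, 2)   # fake batch of 2-D velocity trajectories
    print(model(vel).shape)        # torch.Size([8, 100, 256])

DeepMind used an LSTM and Cueva/Wei a vanilla RNN with noisy inputs, but at this level of description the setups are interchangeable.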
As someone who works in visual neuroscience, this article's a tough read. Lots of statements that are semi-accurate at best.
1) Eyes don't work like cameras; there's no real "exposure" phase as such (even though there are lots of thresholds). So it's misleading to talk about discrete images that we sample at some fixed frequency. Instead, it's much more helpful to think of photoreceptors and subsequent processing stages as continuous band-pass filters (see the little sketch after point 3). At some point, high frequencies are simply cut off because the electro-chemistry of the cell can't keep up. For us, that cut-off comes earlier than it does for invertebrates.
2) There's no mechanical interaction between light and the photoreceptor. Rather, the transduction cascade of the dipteran eye seems to include a mechanical (as opposed to purely biochemical) step.
3) It's pure conjecture to talk about a fly's slowed-down "perception" of the world. The reason they take off before you get to them is much simpler -- there's a highly optimized reflex that connects eye and flight muscles via the giant fiber (a particularly rapid nerve). We have similar responses, like eyelid closing etc. Additionally, their photoreceptors are sensitive and fast. But there's zero evidence that flies have any sense of continuous time that could be faster than ours.
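Here's the sketch promised under point 1) -- a toy first-order low-pass filter standing in for the photoreceptor's temporal filtering (just the high-frequency cut-off half of the band-pass story; the time constants are made up, not measured values):

    import numpy as np

    def low_pass(signal, dt, tau):
        """First-order low-pass: a crude stand-in for a photoreceptor's
        temporal filtering. Larger tau -> lower cut-off frequency."""
        out = np.zeros_like(signal)
        for i in range(1, len(signal)):
            out[i] = out[i - 1] + (dt / tau) * (signal[i] - out[i - 1])
        return out

    dt = 0.001                               # 1 ms steps
    t = np.arange(0.0, 1.0, dt)
    flicker = np.sin(2 * np.pi * 60 * t)     # 60 Hz flicker

    slow = low_pass(flicker, dt, tau=0.05)    # "human-like": 60 Hz largely filtered out
    fast = low_pass(flicker, dt, tau=0.002)   # "fly-like": 60 Hz still gets through
    print(slow.std(), fast.std())             # smaller std -> stronger attenuation

There's no frame rate anywhere in that picture -- the output is continuous, it just stops following the input once the input changes faster than the filter can track.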
Thanks for the informed opinion. I know way less about animal vision than machine vision but the statement that eyes have a "frame rate" and "send images to the brain a fixed number of times a second" smelled really bad.
A maybe-dumb question about point (3) - I've noticed that when I get a blink/flinch response from something (usually some sand or a bug hitting my face when I'm on the bike), it feels like I blink just a split second before the thing hit me. Given that I'm unlikely to have any kind of precognition, do you think this might be related to the blink reflex being 'hard wired' and so my brain gets the "hey, a thing hit your face" signal after the "hey, your eyes just closed" signal? (Alternately, I read something once about our perception of audio being delayed by ~100ms so that it synchs up with our perception of vision, despite our visual processing being slower than audio - maybe the signal that caused the flinch gets 'buffered'?)
> I've noticed that when I get a blink/flinch response from
> something (usually some sand or a bug hitting my face when
> I'm on the bike), it feels like I blink just a split second
> before the thing hit me.
We know very little about conscious perception or even the locus at which sensory signals are integrated to generate a conscious percept. But it's perfectly possible that delays differ across modalities and that the proprioceptive signal about lid-closing reaches whatever-relevant-area before your visual system catches up.
> I read something once about our perception of audio being
> delayed by ~100ms so that it synchs up with our perception
> of vision
Not an expert on audition, but the brain is really good at generating coherent representations of the physical world across modalities. I wouldn't be surprised if such cross-sensory synchronisation happened in some form.
Is it possible that there is a delay for one-off unanticipated events, but the feedback loop we form between perception and playing is much tighter? I definitely have a hard time playing anything with much more than 5ms audio buffers myself.
How about an analogy to CPU clock speed, bus speed, etc.? I think that's really what the camera analogy is getting at -- the rate at which the signals are getting processed and acted upon. Clock speed / bus speed would similarly determine how high of frequency you could "hear" stuff if we were talking about ears instead of eyes. I know the computer model of the brain is way off in many respects but I find it pretty useful for stuff like this.
Short abstract snippet (the "parallel circuits" are other, non-giant descending neurons that also trigger the escape behavior upon a looming stimulus):
"Intracellular recording of the descending giant fiber (GF) interneuron during head-fixed escape revealed that GF spike timing relative to parallel circuits for escape actions determined which of the two behavioral responses was elicited. The process was well described by a simple model in which the GF circuit has a higher activation threshold than the parallel circuits, but can override ongoing behavior to force a short takeoff. Our findings suggest a neural mechanism for action selection in which relative activation timing of parallel circuits creates the appropriate motor output."
If your brain and body spanned the size of the Earth, the signals into and within the brain would, as a matter of physics, take much longer than they do in a human.
It seems reasonable to suppose this is also true when you compare a tiny fly with a comparatively massive human.
That said, I haven't seen specific, explicit evidence to prove this seemingly logical theory.
Regarding point 3, I have no sources at the moment but have read several scientific articles in the past claiming that different organisms really do have a fundamentally faster or slower perception of continuous time, and also that drugs can temporarily influence this perception. Is there no truth to this at all?
Great! Any idea what mechanism adrenaline activates that gives the perception of time slowing down? That is something I've always wondered about having experienced it probably half a dozen times in my life so far.
Still, what about the sense of "time" outside of escape reflexes? Say, landing. Don't they perceive the world at a faster rate (or, I should say, a rate systemically adequate for them)?
I'm really surprised at the resistance to this idea. Granted that we can never know another person's conscious experience, never mind another species', what reason could there be to think that it would be the same in this regard as ours? Seems to me the burden of proof falls more on that claim than on the claim that they're different, which I find completely plausible.
I only skimmed the notebook, but the code looks fairly inadequate. The pandas portion, for instance, treats the data frame as a dumb array and critically ignores the grouping functionality, which should offer a tremendous speedup. Moreover, for the exact same task pandas should never be slower than NumPy.
Benchmarks are hard to get right but this one falls way short of saying anything at all about performance penalties incurred by various libraries and abstractions.
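To make that concrete (a generic sketch with made-up column names, since I don't have the notebook's exact data in front of me): the slow path treats the frame as a dumb array and builds one boolean mask per group, while groupby does the same aggregation in vectorised form and is typically orders of magnitude faster.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "key": rng.integers(0, 100, size=500_000),
        "value": rng.standard_normal(500_000),
    })

    # Slow path: one boolean mask per group, scanning the whole frame each time.
    slow = {k: df.loc[df["key"] == k, "value"].mean() for k in df["key"].unique()}

    # Fast path: let pandas do the grouping.
    fast = df.groupby("key")["value"].mean()

    # Same numbers either way.
    assert np.allclose(pd.Series(slow).sort_index().to_numpy(),
                       fast.sort_index().to_numpy())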
We didn't have to reverse-engineer the CPU. Also, CPUs are rationally designed; we have zero guarantees that an evolved brain follows any principles at all. Every brain region may be highly specialised to a single algorithm or task with zero mechanistic overlap. Figuring out deep neural nets is a closer analogy, and at the moment we have very little intuition for how to do even that. Keep in mind that the brain is vastly more complex than even the most advanced ANNs.
On the technical side: the most advanced neural techniques allow for parallel measurement of calcium signals in ~10,000 neurones. That's a long way off from complete observation!
Even in these apparently simple feedforward sensory networks, connectomics haven't been the anticipated panacea. There's been a flurry of follow-up papers to Takemura et al., essentially refuting the suggested model.
Turns out, even where they should, connections don't constrain circuits to a sufficient degree.
Awesome, do you have a good cite for that? The last time I paid attention in this space was a year+ ago, when the vision people were trumpeting these results, so I'd love to know more about the current thinking. It's been my go-to example for "connectomics will help with some things", but I'm not a sensory physiologist.
I guess the key lesson is -- don't rely on a single approach, because its limitations may well lead you astray. Applies to connectomics, physiology, modelling, etc.