Friday 13 January 2017

neuroscience - Ventral stream pathway and architecture proposed by Poggio's group


Could you please give me a very brief explanation of all the functions in the ventral stream architecture summarized in this figure? [Figure: ventral stream architecture, not preserved]


This figure is from Serre et al.'s A quantitative theory of immediate visual recognition. Prog Brain Res. 2007.


I have read multiple articles about this model, but I still don't understand its basic aim, especially the purpose of the two operations (Gaussian-like and max-like). Could someone please explain in detail the ventral stream pathway (V1-V2-V4-IT-PFC), including the two operations in this model?


For example, I don't understand how the cells in S1 are constructed.



Answer



This is a typical computational architecture proposed as a model of the ventral stream of visual processing in primates. It has a long history (e.g., Fukushima's Neocognitron dates from 1980) and is still widely accepted in machine learning (e.g., deep learning) and neuroscience.



[Figure: Neocognitron]


It is motivated by the organization of V1 simple cells and complex cells. Simple cells in V1 can be approximately thought of as edge detectors at a specific retinal location. This is why, in the figure you cite, they are represented as a circle with a bar (a cartoon receptive field). A simple cell can only detect things very locally: if the edge appears at a different location in your field of view, the cell will not respond.


Mathematically, you can think of a spatial filter that detects an edge (e.g., an oriented Gabor patch) being multiplied pointwise with your retinal image and then summed. For example, the filter below will match a 45 degree bar aligned with the hot-colored area, but will have less activity if the bar is shifted away from that specific position.


[Figure: oriented Gabor patch]
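The multiply-and-sum idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the model from the paper: the function names (`across_bar`, `gabor_patch`, `bar_image`, `simple_cell`), the 11x11 patch size, the wavelength, the Gaussian width, and the final rectification are all hypothetical choices made for the example.

```python
import math

def across_bar(x, y, theta_deg):
    """Signed distance from the patch centre, measured across the bar."""
    theta = math.radians(theta_deg)
    return x * math.cos(theta) + y * math.sin(theta)

def gabor_patch(size=11, theta_deg=45.0, wavelength=6.0, sigma=3.0):
    """Oriented Gabor filter: a cosine grating under a Gaussian envelope."""
    half = size // 2
    return [[math.exp(-(x * x + y * y) / (2 * sigma ** 2))
             * math.cos(2 * math.pi * across_bar(x, y, theta_deg) / wavelength)
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]

def bar_image(size=11, theta_deg=45.0, offset=0.0, width=2.0):
    """Binary image of a bar, displaced by `offset` pixels across its width."""
    half = size // 2
    return [[1.0 if abs(across_bar(x, y, theta_deg) - offset) <= width / 2 else 0.0
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]

def simple_cell(image, patch):
    """Multiply filter and image pointwise, sum, then rectify (an assumption)."""
    drive = sum(p * i
                for patch_row, image_row in zip(patch, image)
                for p, i in zip(patch_row, image_row))
    return max(0.0, drive)

patch = gabor_patch()
aligned = simple_cell(bar_image(offset=0.0), patch)   # bar on the filter's positive lobe
shifted = simple_cell(bar_image(offset=3.0), patch)   # bar moved onto the negative lobe
print(aligned, shifted)
```

With these toy parameters the aligned bar drives the cell, while the bar shifted by half a wavelength falls on the negative lobe of the filter and the rectified response drops to zero. That is the strict position dependence of a simple cell.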


The complex cells in V1, on the other hand, are still edge detectors but have some location invariance. In other words, when the edge is slightly displaced, the response of a complex cell does not seem to change. It is believed that this is because complex cells pool from multiple simple cells with the same orientation. This is what you see in your figure, where a single complex cell pools information from simple cells of the same orientation but at different locations.


Mathematically, a soft-max or max operation over the simple cell outputs gives a good complex cell model. The choice is not limited to these operations, however: quadratic and other nonlinear models are also widely used in computational neuroscience.
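As a rough sketch of the pooling step, the toy example below uses a one-dimensional "image" and a three-pixel bar detector. Everything here is an assumption made for illustration: the filter weights in `BAR_FILTER`, the function names, the pooling range, and the particular soft-max form (an exponentially weighted average with a hypothetical sharpness parameter `beta`), which is only one of several smooth approximations to a max.

```python
import math

BAR_FILTER = [-1.0, 2.0, -1.0]  # toy 1-D bar detector, not fitted to any data

def simple_cell(signal, pos):
    """Rectified filter response at one fixed position."""
    drive = sum(w * signal[pos + i] for i, w in enumerate(BAR_FILTER))
    return max(0.0, drive)

def complex_cell_max(signal, positions):
    """Max over simple cells that share a filter but differ in position."""
    return max(simple_cell(signal, p) for p in positions)

def complex_cell_softmax(signal, positions, beta=5.0):
    """Smooth alternative: average weighted by exp(beta * response)."""
    responses = [simple_cell(signal, p) for p in positions]
    weights = [math.exp(beta * r) for r in responses]
    return sum(w * r for w, r in zip(weights, responses)) / sum(weights)

def bar_at(center, length=9):
    """1-D image that is zero everywhere except one bright pixel."""
    return [1.0 if i == center else 0.0 for i in range(length)]

pool = range(1, 6)  # five simple cells at neighbouring positions

for center in (3, 5):
    image = bar_at(center)
    print(center,
          simple_cell(image, 2),              # a single simple cell: position-specific
          complex_cell_max(image, pool),      # pooled response: position-tolerant
          complex_cell_softmax(image, pool))
```

In this toy case the single simple cell responds to the bar at pixel 3 but not at pixel 5, whereas both pooled responses are the same for the two bar positions. A larger `beta` pushes the soft-max closer to the hard max.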


The full hierarchy for the ventral stream is then obtained simply by applying the simple-cell/complex-cell analogy repeatedly. At each stage, the simple cell layer extracts some local feature (by computing on the outputs of the previous stage's complex cells), and the complex cell layer makes that feature invariant over space. From edges in V1, one can get corners in the next layer, then complex contours, and so on all the way up to objects. At least that's how the story goes.
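The stacking described above can be sketched as a loop that alternates a template-matching layer with a pooling layer. This is a minimal one-dimensional illustration, not the published model. The template-matching step uses a Gaussian of the distance between input and template as one plausible reading of the "Gaussian-like" operation mentioned in the question, and the names `s_layer`, `c_layer`, `hierarchy`, the templates, and the pool size are hypothetical.

```python
import math

def s_layer(inputs, template, sigma=1.0):
    """Gaussian-like tuning: response peaks when the local input equals the template."""
    k = len(template)
    out = []
    for p in range(len(inputs) - k + 1):
        dist2 = sum((inputs[p + i] - template[i]) ** 2 for i in range(k))
        out.append(math.exp(-dist2 / (2 * sigma ** 2)))
    return out

def c_layer(inputs, pool=2):
    """Max-like pooling over non-overlapping neighbourhoods."""
    return [max(inputs[i:i + pool]) for i in range(0, len(inputs), pool)]

def hierarchy(signal, templates, pool=2):
    """Alternate S (selectivity) and C (invariance) layers, one pair per template."""
    x = signal
    for template in templates:
        x = c_layer(s_layer(x, template), pool)
    return x

def bar_at(center, length=16):
    """1-D image that is zero everywhere except one bright pixel."""
    return [1.0 if i == center else 0.0 for i in range(length)]

templates = [[0.0, 1.0, 0.0],  # stage 1: a single bright pixel (toy "edge")
             [0.6, 1.0]]       # stage 2: arbitrary pattern over stage-1 outputs

out_a = hierarchy(bar_at(5), templates)
out_b = hierarchy(bar_at(6), templates)
print(out_a)
print(out_b)
```

Here the bar at pixel 5 and the bar at pixel 6 fall inside the same pooling neighbourhood at the first stage, so the two inputs produce the same output at the top. Each additional stage shortens the representation and widens the region over which the response tolerates shifts.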

