Most human language understanding systems are based on statistical and machine-learning pattern-matching techniques, implemented either as graphical models (HMMs, language models, etc.) or as formal neural networks encoding the firing rates of neurons (convolutional neural networks, deep neural networks, Boltzmann machines, etc.). Impressive practical classification and pattern-matching results can now be reached thanks to recent developments in computing power and hardware implementations, notably on GPUs (Graphics Processing Units).
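As a minimal illustration of the graphical-model approach mentioned above, the sketch below evaluates an observation sequence under a two-state HMM with the forward algorithm. The model parameters are hypothetical toy numbers, not taken from any system described on the poster:

```python
def hmm_forward(pi, A, B, obs):
    """Forward algorithm: P(obs | HMM), summing over all hidden state paths.

    pi  -- initial state distribution, pi[i]
    A   -- transition probabilities, A[i][j] = P(next state j | state i)
    B   -- emission probabilities, B[i][o] = P(symbol o | state i)
    obs -- sequence of observed symbol indices
    """
    n = len(pi)
    # alpha[i] = P(obs[0..t] and state at time t is i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

# Toy two-state, two-symbol model (illustrative numbers only).
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.5],
     [0.1, 0.9]]

likelihood = hmm_forward(pi, A, B, [0, 1, 1])
```

Such a likelihood score is what pattern-matching recognizers compare across competing word or phone models; the argument of the poster is that this matching step alone is not enough.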
Human communication through language is essential to our survival and interacts strongly with motor control, vision, emotion, etc. Abstract interpretation of the acoustic signal (semantics, emotion, etc.) involves most areas of our brain (motor, visual, planning, … areas). One of the striking capabilities of the brain is auditory scene analysis: the capacity to decompose auditory scenes into auditory streams and objects. Its best-known manifestation is the cocktail party effect, which is in practice only a side effect of a more complex and general process involving our multisensory brain. In fact, auditory scene analysis is also fundamental to the acquisition of a new language and to the understanding of speech and sounds. Our ability to analyse auditory scenes by integrating visual and motor feedback is fundamental to how we build up human language understanding. Taking this feedback and these multisensory interactions into account for better human language understanding and acquisition systems cannot be reduced to pattern-matching or classification algorithms. Dynamic feedback, the active cochlea, attentional processes, anticipation, intention, planning, … occurring in the multisensory brain have to be taken into account and implemented for better auditory scene analysis modules as part of human language understanding systems.
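In computational auditory scene analysis, the decomposition of a scene into streams is often approximated by time-frequency masking: each time-frequency cell of the mixture is assigned to the locally dominant stream. The following is a toy sketch of ideal binary masking on hypothetical spectrogram values (illustrative numbers, not real audio, and only the pattern-matching part of the richer process the poster argues for):

```python
def ideal_binary_mask(spec_a, spec_b):
    """Assign each time-frequency cell to the locally dominant source.

    spec_a, spec_b -- magnitude spectrograms (lists of rows) of two sources.
    Returns a 0/1 mask selecting the cells where source A dominates.
    """
    return [[1 if a >= b else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(spec_a, spec_b)]

def apply_mask(mask, spec_mix):
    """Keep only the masked cells of the mixture spectrogram."""
    return [[m * x for m, x in zip(row_m, row_x)]
            for row_m, row_x in zip(mask, spec_mix)]

# Toy 2x4 magnitude spectrograms for a "voice" and a "noise" stream.
voice = [[5.0, 0.2, 4.0, 0.1],
         [6.0, 0.3, 3.5, 0.2]]
noise = [[0.5, 3.0, 0.4, 2.5],
         [0.4, 2.8, 0.6, 2.0]]
mix = [[v + n for v, n in zip(rv, rn)] for rv, rn in zip(voice, noise)]

mask = ideal_binary_mask(voice, noise)
recovered = apply_mask(mask, mix)
```

The mask here is "ideal" because it assumes access to the clean sources; real systems must estimate it, which is exactly where the top-down, multisensory cues discussed above come into play.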
The poster discusses potential research directions and solutions for the design of better human language understanding systems comprising robust auditory scene analysis modules in interaction with the other sensory-motor modalities of the brain. Software and hardware implementations, in relation to state-of-the-art machine learning and NPUs (Neural Processing Units), are also presented on the poster.