Roger K. Moore
Prof. Roger K. Moore is Chair of Spoken Language Processing in the Speech and Hearing Research Group at the University of Sheffield, and Visiting Professor at the Bristol Robotics Laboratory and at Psychology/Language Sciences, University College London. He has over forty years' experience in speech technology R&D and has authored or co-authored over 150 scientific publications. He is well known for investigating the similarities and differences between human and machine spoken language behaviour, and has championed a unified theory of spoken language processing known as PRESENCE (PREdictive SENsorimotor Control and Emulation), which weaves together accounts from a wide variety of disciplines with a view to breathing new life into the next generation of spoken language processing systems, especially for human-robot interaction. His recent work in this area includes a novel mathematical interpretation of the 'Uncanny Valley' effect, which was published in Nature.
The release of Siri - Apple's speech-driven automated personal assistant for the iPhone 4S - in October 2011 heralded a new era in public awareness of (and engagement with) spoken language technology. Before Siri, automatic speech recognition tended to be used in specialist dictation applications (such as transcribing medical notes), and text-to-speech synthesis was beginning to appear in our cars (in the more advanced satellite navigation systems). After Siri, users began to appreciate the potential for more general-purpose interaction in everyday productivity applications, and the market began to realise the importance of understanding what was said and conversing appropriately, rather than just writing it down or speaking it back. Since 2011, a number of competitors to Siri have arisen from major players such as Google and Microsoft, and there is plenty of research back in the labs, but it is generally acknowledged that there is a long way to go before spoken language interfaces can support the demands of many real-world applications in a consistent and reliable manner.
This talk will address the fundamental issues facing spoken language understanding, and will highlight the need to go beyond the current fashion for using machine learning in a more-or-less blind attempt to train static models on ecologically unrealistic amounts of unrepresentative training data. Rather, the talk will focus on critical developments outside the field of speech and language - particularly in the neurosciences and in cognitive robotics - and will show how insights into the behaviour of living systems in general and human beings in particular could have a direct impact on the next generation of spoken language systems. In particular, it will be suggested that future progress in spoken language understanding might require us to refocus our attention on generative models of spoken language production (derived from models of movement and action), and that traditional learning paradigms using off-line training with static corpora need to be replaced by on-line interactive skill acquisition in real-world situations and environments.
It will be suggested that such an approach is necessary not only if we are to have technology that can figure out why people are saying what they are saying (and what to do about it), but also if we are to move towards a new generation of context-aware intelligent agents that are capable of engaging in genuinely communicative behaviour with their human users.