Towards a greater understanding of the neurophysiology of natural audiovisual speech processing: a system identification approach to EEG
Citation:
Michael J. Crosse, 'Towards a greater understanding of the neurophysiology of natural audiovisual speech processing: a system identification approach to EEG', [thesis], Trinity College (Dublin, Ireland). Department of Electronic & Electrical Engineering, 2016
Abstract:
Seeing a speaker’s face as he or she talks can greatly help in understanding what the speaker is saying, especially in adverse hearing conditions – a principle known as inverse effectiveness. This is because the speaker’s facial movements relay information not only about what the speaker is saying but, importantly, about when the speaker is saying it. Studying how the brain exploits this timing relationship to combine information from continuous auditory and visual speech has traditionally been difficult due to methodological constraints. In contrast, when incongruent auditory and visual information is presented concurrently, it can not only hinder a listener’s perception but even cause him or her to perceive illusory information that was not presented through either modality. Efforts to determine the neurophysiological underpinnings of this phenomenon have also been hampered by outdated methodological approaches and the inaccessibility of state-of-the-art modelling techniques. Here, we introduce a new system identification (SI) framework for investigating these everyday neural processes using relatively inexpensive and non-invasive scalp recordings (EEG).
Chapter 3 begins by describing the application of SI techniques for studying sensory processing in humans using naturalistic stimuli, specifically in the context of neurophysiology. The aim of this chapter is to introduce a new MATLAB-based SI toolbox, called mTRF Toolbox, developed as part of this research work. Concrete examples demonstrating how to model the relationship between continuous speech stimuli and continuous EEG responses are worked through in full (a usage sketch is given below). Several key features of the toolbox are explored and compared to traditional methods, and its applications and limitations are discussed.
Chapter 4 examines the role of temporal and contextual congruency in audiovisual (AV) speech processing using the mTRF Toolbox. The development of a novel framework for studying multisensory integration is described, yielding new insights into AV speech processing. Specifically, we show that cortical activity tracks the acoustic speech signal more reliably during congruent AV presentation, whereas incongruent AV stimuli actually inhibit neural entrainment to speech. The enhancement effect produced by congruent AV stimuli is shown to be most prominent at the rate of syllabic information (2-6 Hz). Furthermore, we demonstrate that neural entrainment to auditory speech during silent lipreading is highly predictive of speech-reading accuracy.
Chapter 5 examines AV speech processing at an acoustic signal-to-noise ratio that maximizes the perceptual benefit conferred by multisensory processing relative to unisensory processing. Here we show that the influence of visual input on the neural tracking of acoustic speech is significantly greater in noisy than in quiet listening conditions, in line with the principle of inverse effectiveness. While envelope tracking during audio-only speech is greatly reduced by background noise at an early processing stage, it is markedly restored by the addition of visual speech input. We also find that multisensory integration occurs at much lower frequencies in background noise and is predictive of the multisensory gain in behavioural performance at a time lag of ~250 ms. Critically, we demonstrate that inverse effectiveness in natural audiovisual speech processing relies on crossmodal integration over long temporal windows.
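To make the modelling approach concrete, the following MATLAB sketch illustrates the forward (encoding) analysis worked through in Chapter 3: estimating a temporal response function (TRF) that maps the speech amplitude envelope to the EEG response via regularised linear regression, then using it to predict held-out EEG. The variable names (env, eeg, envTest, eegTest) are illustrative, and the function signatures follow the original (v1) mTRF Toolbox interface; later releases use a different calling convention, so treat this as a sketch rather than a definitive recipe.

    % Forward TRF (encoding) model: predict EEG from the speech envelope.
    % env, eeg         - training envelope (samples x 1) and EEG (samples x channels)
    % envTest, eegTest - held-out data for evaluating the model
    fs     = 128;    % sampling rate of stimulus and EEG (Hz)
    tmin   = -150;   % minimum time lag (ms)
    tmax   = 450;    % maximum time lag (ms)
    lambda = 0.1;    % ridge regularisation parameter

    % map = 1 selects the forward (stimulus-to-response) direction.
    % w: model weights, t: time lags (ms), con: constant (bias) term.
    [w,t,con] = mTRFtrain(env,eeg,fs,1,tmin,tmax,lambda);

    % Predict held-out EEG; r gives prediction accuracy per channel
    % (Pearson's correlation), a measure of cortical speech tracking.
    [pred,r,p,mse] = mTRFpredict(envTest,eegTest,w,fs,1,tmin,tmax,con);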
Chapter 6 investigates the temporal dynamics of auditory cortical activation associated with silent lipreading by examining the impact of speech-reading accuracy on neural entrainment to the absent acoustic signal. Specifically, this study provides moderate evidence that cortical activity in auditory regions is modulated in a way that reflects the temporal dynamics of the absent acoustic information, as if synthesising the missing auditory signal from the correlated visual speech input. While the non-invasive brain imaging technique employed in this body of research lacks the spatial resolution to definitively elucidate certain aspects of the neural mechanisms underpinning AV speech processing, its high temporal resolution allows for accurate characterisation of the spectrotemporal dynamics of multisensory integration. The findings presented here provide new and valuable insights into AV speech processing in the human brain. The application of this novel SI framework to the study of AV speech processing is also considered in the context of clinical disorders involving impaired multisensory processing and of brain-computer interface technology.
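The decoding results summarised above (envelope tracking during speech in noise and during silent lipreading) rest on the same framework run in the backward direction: reconstructing the stimulus envelope from the EEG and taking reconstruction accuracy as an index of neural entrainment. A minimal cross-validation sketch under the same assumptions as before (v1 interface; envs and eegs are hypothetical cell arrays holding one trial per cell):

    % Backward (decoding) model with leave-one-out cross-validation:
    % reconstruct the speech envelope from multichannel EEG.
    fs      = 128;             % sampling rate (Hz)
    tmin    = 0;               % minimum time lag (ms)
    tmax    = 500;             % maximum time lag (ms)
    lambdas = 10.^(-2:2:6);    % candidate ridge parameters

    % map = -1 selects the backward (response-to-stimulus) direction.
    % r holds reconstruction accuracy per trial and lambda value.
    [r,p,mse] = mTRFcrossval(envs,eegs,fs,-1,tmin,tmax,lambdas);

    % Select the regularisation value maximising mean accuracy across trials.
    [~,best] = max(mean(r,1));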
Sponsor:
Programme for Research in Third-Level Institutions
European Regional Development Fund
Author: Crosse, Michael J.
Qualification name:
Doctor of Philosophy (Ph.D.)
Publisher:
Trinity College (Dublin, Ireland). Department of Electronic & Electrical Engineering
Note:
TARA (Trinity’s Access to Research Archive) has a robust takedown policy. Please contact us if you have any concerns: rssadmin@tcd.ie
Type of material:
thesis
Availability:
Full text available