Browsing School of Engineering by Author "Harte, Naomi"

Attention-based audio-visual fusion for robust automatic speech recognition

Harte, Naomi (2018)

Automatic speech recognition can potentially benefit from the lip motion patterns, complementing acoustic speech to improve the overall recognition performance, particularly in noise. In this paper we propose an audio-visual ...

Automatic recognition of ageing speakers

Kelly, Finnian (Trinity College (Dublin, Ireland). Department of Electronic & Electrical Engineering, 2014)

Discriminative Multi-Resolution Sub-Band and Segmental Phonetic Model Combination

Harte, Naomi (2000)

A joint discriminative framework for multi-resolution sub-band HMMs and a hybrid segmental phonetic model is presented which combines independent likelihood scores using class dependent weightings trained to a minimum ...

Few-Shot Learning and Learnable Frontends for Remote Monitoring of Bird Populations

Anderson, Mark William (Trinity College Dublin. School of Engineering. Discipline of Electronic & Elect. Engineering, 2024)

In response to changing ecological and environmental factors, automated monitoring of bird populations has become imperative for conservation efforts and as an indicator of change in its own right. This is particularly ...

Mapping theoretical and methodological perspectives for understanding speech interface interactions

Harte, Naomi (2019)

The use of speech as an interaction modality has grown considerably through the integration of Intelligent Personal Assistants (IPAs, e.g. Siri, Google Assistant) into smartphones and voice-based devices (e.g. Amazon Echo). ...

Multimodal continuous turn-taking prediction using multiscale RNNs

Harte, Naomi (2018)

In human conversational interactions, turn-taking exchanges can be coordinated using cues from multiple modalities. To design spoken dialog systems that can conduct fluid interactions it is desirable to incorporate cues ...

Objective Assessment of Perceptual Audio Quality Using ViSQOLAudio

Harte, Naomi; Kokaram, Anil (2017)

Digital audio broadcasting services transmit substantial amounts of data that is encoded to minimize bandwidth whilst maximizing user quality of experience. Many large service providers continually alter codecs to improve ...

On Parsing Visual Sequences with the Hidden Markov Model

Harte, Naomi; Lennon, Daire; Kokaram, Anil (2009)

Hidden Markov Models have been employed in many vision applications to model and identify events of interest. Their use is common in applications where HMMs are used to classify previously divided ...

Perceived Audio Quality for Streaming Stereo Music.

Harte, Naomi; Kokaram, Anil; Hines, Andrew; Gillen, Eoin; Kelly, Damien; Skoglund, Jan (2014)

Users of audio-visual streaming services expect an ever increasing quality of experience. Channel bandwidth remains a bottleneck commonly addressed with lossy compression schemes for both the video and audio streams. ...

Predicting speech intelligibility

Hines, Andrew (Trinity College (Dublin, Ireland). Department of Electronic & Electrical Engineering, 2012)

Segmental evaluation of Text-to-Speech synthesis

Pandey, Ayushi (Trinity College Dublin. School of Engineering. Discipline of Electronic & Elect. Engineering, 2024)

Advancements in speech synthesis technology have mandated the need for reliable methods for its evaluation. Present day evaluation, dominated by subjective listening tests, provides at best, a general overall picture of ...

Survival at the museum: A cooperation experiment with emotionally expressive virtual characters

Harte, Naomi; Mc Donnell, Rachel (2018)

Correctly interpreting an interlocutor's emotional expression is paramount to a successful interaction. But what happens when one of the interlocutors is a machine? The facilitation of human-machine communication and ...

Towards predicting dialog acts from previous speakers' non-verbal cues

Harte, Naomi (2017)

In studies of response times during conversational turn-taking, a modal time of 200 ms has been observed to be a universal value that exists across languages and cross-culturally. This 200 ms value is also seen as the limit ...

Variation of features of interframe dependent HMM for speech recognition

Harte, Naomi (1998)

The effects are explored of using different dynamic features in conjunction with an HMM that permits a dependency on both preceding and succeeding frames. In particular, features which capture dynamic information are ...

ViSQOLAudio: An objective audio quality metric for low bitrate codecs

Harte, Naomi; Kokaram, Anil; Hines, Andrew; Gillen, Eoin; Kelly, Damien; Skoglund, Jan (2015)

Streaming services seek to optimise their use of bandwidth across audio and visual channels to maximise the quality of experience for users. This letter evaluates whether objective quality metrics can predict the audio ...
