Tools for analysing the voice : developments in glottal source and quality analysis

This thesis documents a range of research carried out on the topic of glottal source and voice quality analysis. It begins with a review of the physiological and acoustic correlates of different vocal settings. This is followed by a discussion of the importance of glottal source and voice quality variation in spoken communication, and of the impact of modelling these aspects on speech technology. Despite the potential benefit of acoustic characterisation of the glottal source for speech technology, existing algorithms often lack robustness. To address this, the present thesis describes and evaluates a set of novel algorithms aimed at improving robustness. The algorithms fall under two headings: fine-grained, glottal-synchronous methods and coarse-grained voice quality detection methods. Among the fine-grained methods, a new algorithm, SE-VQ, has been developed and optimised for the analysis of a range of voice qualities. While maintaining the precision of the state of the art on neutral speech, the new algorithm is shown to significantly improve performance on creaky voice regions. SE-VQ is then used as part of a novel LF-model-based method (DyProg-LF) for parameterising estimated glottal source signals. The dynamic programming algorithm used in DyProg-LF is shown to avoid the common problem of inconsistent parameter trajectories, and to provide better parameterisation than the state of the art both on a carefully controlled dataset with manually obtained reference values and on a larger speech dataset. Among the coarse-grained methods, a new parameter, the Maxima Dispersion Quotient (MDQ), is proposed for discriminating voice qualities ranging from breathy to tense. MDQ is shown to outperform existing parameters in discriminating these voice qualities, particularly for continuous speech, and also to outperform them in robustness to additive noise.
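The abstract states that DyProg-LF uses dynamic programming to avoid inconsistent parameter trajectories but does not give its cost functions. As a generic illustration of the idea only, the sketch below assumes each analysis frame offers several candidate values for one parameter, each with a hypothetical "target cost" (e.g. a fitting error), and adds a penalty proportional to the jump between consecutive frames. The function `smooth_trajectory`, its arguments and the absolute-difference transition cost are illustrative assumptions, not the thesis's actual formulation.

```python
from typing import List, Sequence, Tuple


def smooth_trajectory(
    candidates: Sequence[Sequence[float]],
    target_costs: Sequence[Sequence[float]],
    transition_weight: float = 1.0,
) -> Tuple[List[float], float]:
    """Hypothetical Viterbi-style DP: choose one candidate per frame so that
    the sum of target costs plus weighted frame-to-frame jumps is minimal."""
    n_frames = len(candidates)
    if n_frames == 0:
        return [], 0.0
    # cost[t][j]: cheapest cumulative cost of any path ending at candidate j of frame t
    cost: List[List[float]] = [list(target_costs[0])]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for t in range(1, n_frames):
        row_cost, row_back = [], []
        for j, value in enumerate(candidates[t]):
            # Cost of arriving at this candidate from each candidate of the previous frame
            scores = [
                cost[t - 1][i] + transition_weight * abs(value - candidates[t - 1][i])
                for i in range(len(candidates[t - 1]))
            ]
            best_i = min(range(len(scores)), key=scores.__getitem__)
            row_cost.append(scores[best_i] + target_costs[t][j])
            row_back.append(best_i)
        cost.append(row_cost)
        back.append(row_back)
    # Trace back from the cheapest candidate in the final frame
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    total = cost[-1][j]
    path: List[float] = []
    for t in range(n_frames - 1, -1, -1):
        path.append(candidates[t][j])
        j = back[t][j]
    path.reverse()
    return path, total


path, total = smooth_trajectory(
    [[0.5, 0.9], [0.52, 0.1], [0.55, 0.95]],
    [[0.1, 0.0], [0.1, 0.0], [0.1, 0.0]],
)
print(path)  # [0.5, 0.52, 0.55]
```

Picking the lowest target cost in each frame independently would give the jumpy sequence 0.9, 0.1, 0.95; the transition penalty trades a slightly higher per-frame cost for a consistent trajectory, which is the kind of inconsistency the abstract says the dynamic programming step avoids.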
A new method for detecting creaky voice is also described, which uses two parameters derived from the Linear Prediction (LP) residual signal. These parameters serve as input features to a decision tree classifier, which is shown to significantly outperform the state of the art on a range of speech data varying in speaker, gender, language, recording condition and speaking style. Finally, a software package, the Voice analysis toolkit, containing the algorithms developed as part of this thesis, has been made publicly available to encourage use of the new algorithms in applied work and in future algorithm evaluations.
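The abstract does not specify the two creaky-voice parameters or how the LP residual is obtained. For orientation only, here is a minimal sketch of the standard way to compute an LP residual for a single frame: estimate predictor coefficients with the autocorrelation method and Levinson-Durbin recursion, then inverse-filter the signal. The function names are hypothetical, and windowing, pre-emphasis and frame-by-frame processing are deliberately left out; the sketch does not compute the thesis's features.

```python
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n)) for lag in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the normal equations for a[1..order], where the prediction is
    x_hat[n] = sum_k a[k] * x[n - k]. Returns (coefficients, residual energy)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    if err == 0.0:
        return a[1:], 0.0  # silent frame: nothing to predict
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
        if err <= 0.0:
            break  # perfectly predictable frame; higher orders stay zero
    return a[1:], err


def lp_residual(x: Sequence[float], order: int) -> Tuple[List[float], List[float]]:
    """Inverse-filter x with its own LP coefficients: e[n] = x[n] - x_hat[n].
    Samples before the frame start are treated as zero."""
    a, _ = levinson_durbin(autocorrelation(x, order), order)
    residual = []
    for n in range(len(x)):
        prediction = sum(a[k - 1] * x[n - k] for k in range(1, order + 1) if n - k >= 0)
        residual.append(x[n] - prediction)
    return residual, a


# A decaying exponential is almost perfectly predicted by a first-order model,
# so its residual is close to zero everywhere except the first sample.
frame = [0.9 ** n for n in range(200)]
residual, coeffs = lp_residual(frame, order=1)
```

For voiced speech, the residual of such an inverse filter tends to show sharp peaks near glottal closure instants, which is why residual-based features are a common starting point for glottal and creaky-voice analysis.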
