Integrating a voice analysis-synthesis system with a TTS framework for controlling affect and speaker identity
Item Type: Conference Paper
Citation: Murphy, A., Yanushevskaya, I., Ní Chasaide, A., Gobl, C., Integrating a voice analysis-synthesis system with a TTS framework for controlling affect and speaker identity, 2021 32nd Irish Signals and Systems Conference (ISSC), Athlone, Ireland, 10-11 June 2021, 2021, 1 - 6
Murphy et al. ISSC 2021 camera ready 12 5.pdf (Published (author's copy) - Peer Reviewed) 770.3Kb
This paper reports an experiment exploring how a voice analysis-synthesis system, GlórCáil, can be used to add expressiveness to the synthetic voice in text-to-speech (TTS) systems. The implementation focuses on the Irish ABAIR TTS voices, where such voice control would facilitate many current and envisaged applications. GlórCáil allows voice control of synthesized speech and, for this experiment, was integrated into a DNN-based TTS framework. Utterances were generated with f0, voice quality and vocal tract parameter manipulations targeting shifts in speaker identity and in the affective coloring of utterances. The scaling factors used for the manipulations were suggested in an earlier study. They involved global changes without sentence-internal dynamic variation, with a view to ascertaining whether such global shifts might alter listeners' perception of speaker identity and affect. Results demonstrate affect shifts compatible with expectations. However, there were confounding factors. The female and child voices were poorly differentiated, which was expected given the similarity in the scaling factors used. The affect transformations suggest that the baseline voice had an intrinsically sad quality, so that the sad and no-emotion stimuli were only weakly differentiated. The male angry voice was the least successful, suggesting that dynamic, within-utterance variation is essential for signaling certain affects.
Other Titles: 2021 32nd Irish Signals and Systems Conference (ISSC)
Type of material: Conference Paper
Availability: Full text available