Catharine Oertel, Stefan Scherer, Nick Campbell, On the use of multimodal cues for the prediction of involvement in spontaneous conversation, INTERSPEECH-2011, Interspeech 2011, Florence, Italy, 28-31 August 2011, 2011, 1541-1544
Quantifying the degree of involvement of a group of participants in a conversation is a task that humans accomplish every day, but one that machines are, as yet, unable to perform. In this study we first investigate the correlation between visual cues (gaze and blinking rate) and involvement. We then test the suitability of prosodic cues (acoustic model), as well as gaze and blinking (visual model), for predicting the degree of involvement using a support vector machine (SVM). We also test whether fusing the acoustic and the visual model improves the prediction. We show that we are able to predict three classes of involvement with a reduction in error rate of 0.30 (accuracy = 0.68).
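The abstract does not say how the acoustic and visual models are fused. As an illustration only, here is a minimal sketch of one common option, decision-level (late) fusion. It assumes that each model outputs one score per involvement class, then combines them by a weighted average. The class labels, the weight `w`, and the function names are hypothetical and are not taken from the paper. Feature-level (early) fusion, where the acoustic and visual features are concatenated before a single SVM is trained, would be an equally plausible reading.

```python
from typing import Sequence

# Hypothetical labels for the three involvement classes (not from the paper).
CLASSES = ("low", "medium", "high")


def late_fusion(acoustic: Sequence[float], visual: Sequence[float], w: float = 0.5) -> str:
    """Fuse per-class scores from two models by weighted averaging; return the top class.

    Assumption: w is the weight given to the acoustic model, 1 - w to the visual model.
    """
    if len(acoustic) != len(CLASSES) or len(visual) != len(CLASSES):
        raise ValueError("expected one score per involvement class")
    fused = [w * a + (1.0 - w) * v for a, v in zip(acoustic, visual)]
    # Choose the class whose fused score is highest.
    best = max(range(len(CLASSES)), key=lambda i: fused[i])
    return CLASSES[best]


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of segments whose predicted class matches the reference label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty label sequences")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

For example, `late_fusion([0.2, 0.5, 0.3], [0.1, 0.3, 0.6])` averages the scores to roughly `[0.15, 0.40, 0.45]` and returns `"high"`, whereas the acoustic model alone (`w=1.0`) would pick `"medium"`. This shows how fusion can change a decision.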
Items in TARA are protected by copyright, with all rights reserved, unless otherwise indicated.