An analysis of content-free dialogue representation, supervised classification methods and evaluation metrics for meeting topic segmentation
Citation:
Jing Su, 'An analysis of content-free dialogue representation, supervised classification methods and evaluation metrics for meeting topic segmentation', [thesis], Trinity College (Dublin, Ireland). School of Computer Science & Statistics, 2011, pp. 164
Download Item:
Su, Jing_TCD-SCSS-PHD-2011-06.pdf (PDF), 1.444 MB
Abstract:
Automatic topic segmentation of meeting recordings has been intensively investigated because topics are salient discourse structures that provide natural reference points into a meeting's content. Unlike commonly used text-based topic segmentation methods, this thesis investigates content-free methods. Among the reasons for investigating such methods are: understanding the influence of conversational features on the structure of meeting dialogues, avoiding the complexity of transcription, and protecting confidentiality in sensitive recordings. The research reported here encompasses three major components: classifier selection, sample representation and feature selection, and a set of robust evaluation metrics.

Classification, as a supervised learning method, is employed to distinguish vocalisations that signal topic boundaries from other vocalisations. The unbalanced nature of such vocalisation data sets poses a challenge to commonly used classifiers. However, adapted proportional-threshold naïve Bayes classifiers and Boosting classifiers are found to perform well with proper combinations of vocalisation features, exhibiting segmentation accuracy competitive with text-dependent approaches.

Sample representation determines the effectiveness of content-free features. A Vocalisation Event (VE) is proposed as the classification unit (instance), in contrast to the fixed-length analysis window employed by previous approaches. A VE has the advantage of naturally accommodating features such as speaker change, pause, overlap and speaker role; moreover, VEs can be located in audio recordings with speaker segmentation techniques. Experiments show that vocalisation features are more effective than prosodic features for topic segmentation. Building on VEs, a Vocalisation Horizon (VH) is proposed as a novel feature concept to capture temporal or sequence information among classification instances.
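As a rough illustration of what a content-free classification unit might look like, the sketch below models a Vocalisation Event as a record of timing and speaker features with a boundary label. The class and field names here are assumptions for illustration, not the thesis's actual representation; the key point is that no transcript content appears among the features.

```python
from dataclasses import dataclass

# Hypothetical sketch of a Vocalisation Event (VE) record. Field names are
# illustrative assumptions; only content-free features (timing, speaker
# identity/role, overlap) are stored -- no words from a transcript.
@dataclass
class VocalisationEvent:
    speaker: str          # anonymised speaker label
    role: str             # speaker role, e.g. project manager
    start: float          # start time in seconds
    end: float            # end time in seconds
    pause_before: float   # silence preceding this vocalisation (s)
    overlap: float        # duration of overlapped speech (s)
    topic_boundary: bool  # supervised label: does a new topic start here?

    @property
    def duration(self) -> float:
        return self.end - self.start

ve = VocalisationEvent("A", "PM", 10.0, 14.5, 0.8, 0.0, False)
print(ve.duration)  # 4.5
```

A sequence of such events, rather than fixed-length windows, would then form the instances fed to a classifier, which is what lets speaker-change and pause features fall out of the representation naturally.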
VH is found to increase segmentation accuracy considerably. Although Pk and WindowDiff (WD) are commonly used segmentation metrics, Pk and WD alone do not suffice to assess a predicted segmentation. A supplemental metric, the balance factor ω, is proposed to gauge the ratio of predicted to reference boundaries; together with Pk and WD, it supports more reliable judgements of segmentation quality. These content-free methods were successfully tested both on the Augmented Multi-party Interaction (AMI) corpus, which contains simulated meetings, and on the Multidisciplinary Medical Team Meeting (MDTM) corpus, which contains real meetings. MDTM meetings are more strongly structured than AMI meetings and are segmented with higher accuracy, which indicates a relationship between meeting content and structure.
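To make the evaluation setup concrete, the sketch below implements the standard Pk (Beeferman et al.) and WindowDiff (Pevzner and Hearst) definitions. The balance factor shown is an assumption based on the description above (a simple predicted-to-reference boundary ratio), not necessarily the thesis's exact formula for ω.

```python
def pk(ref, hyp, k=None):
    """Pk: probability that two units k apart are inconsistently grouped.
    ref/hyp are sequences of segment labels, one per unit."""
    if k is None:
        # Conventional choice: half the mean reference segment length.
        k = max(1, round(len(ref) / (2 * len(set(ref)))))
    errs = sum((ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k])
               for i in range(len(ref) - k))
    return errs / (len(ref) - k)

def windowdiff(ref_b, hyp_b, k):
    """WindowDiff over boundary indicators (1 = boundary after that unit):
    penalise windows whose boundary counts disagree."""
    n = len(ref_b) - k + 1
    errs = sum(sum(ref_b[i:i + k]) != sum(hyp_b[i:i + k]) for i in range(n))
    return errs / n

def balance_factor(ref_b, hyp_b):
    """Assumed form of the balance factor: ratio of predicted to
    reference boundary counts (1.0 = same number of boundaries)."""
    return sum(hyp_b) / sum(ref_b)

ref = [0, 0, 0, 1, 1, 1]   # reference segment label per vocalisation unit
hyp = [0, 0, 0, 0, 1, 1]   # predicted segmentation, boundary one unit late
print(pk(ref, hyp))        # 0.5
```

A hypothesis that predicts far too few boundaries can still score a deceptively low Pk or WD, which is exactly the failure mode a boundary-count ratio exposes.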
Author: Su, Jing
Advisor:
Luz, Saturnino
Qualification name:
Doctor of Philosophy (Ph.D.)
Publisher:
Trinity College (Dublin, Ireland). School of Computer Science & Statistics
Note:
TARA (Trinity's Access to Research Archive) has a robust takedown policy. Please contact us if you have any concerns: rssadmin@tcd.ie
Type of material:
thesis
Collections:
Availability:
Full text available
Licences: