Show simple item record

dc.contributor.advisor: Luz, Saturnino
dc.contributor.author: Su, Jing
dc.date.accessioned: 2016-11-07T16:30:18Z
dc.date.available: 2016-11-07T16:30:18Z
dc.date.issued: 2011
dc.identifier.citation: Jing Su, 'An analysis of content-free dialogue representation, supervised classification methods and evaluation metrics for meeting topic segmentation', [thesis], Trinity College (Dublin, Ireland). School of Computer Science & Statistics, 2011, pp 164
dc.identifier.other: THESIS 9508
dc.identifier.uri: http://hdl.handle.net/2262/77667
dc.description.abstract: Automatic topic segmentation in meeting recordings is intensively investigated because topic is a salient discourse structure and indicates natural reference points for content. Unlike commonly used text-based topic segmentation methods, this thesis investigates content-free topic segmentation methods. Among the reasons for investigating such methods are: understanding the influence of conversational features on the structure of meeting dialogues, avoiding the complexity of transcription, and protecting confidentiality in sensitive recordings. The research reported here encompasses three major components: classifier selection, sample representation and feature selection, and a set of robust evaluation metrics. Classification, as a supervised learning method, is employed to distinguish vocalisations that signal topic boundaries from other vocalisations. The unbalanced nature of such vocalisation data sets poses a challenge to commonly used classifiers. However, adapted proportional threshold naïve Bayes classifiers and Boosting classifiers have been found to perform well with proper combinations of vocalisation features. They exhibit segmentation accuracy competitive with text-dependent approaches. Sample representation determines the effectiveness of content-free features. A Vocalisation Event (VE) is proposed as the classification unit (instance), in contrast to the fixed-length analysis window employed by previous approaches. VE has the advantage of naturally accommodating features such as speaker change, pause, overlap and speaker role. Moreover, VE can be located from audio recordings with speaker segmentation techniques. Experiments show that vocalisation features are more effective than prosody features in topic segmentation. Based on VE, a Vocalisation Horizon (VH) is proposed as a novel feature concept, in order to indicate temporal or sequence information among classification instances. VE is found to increase segmentation accuracy considerably. Although Pk and WD are commonly used segmentation metrics, it was found that Pk and WD alone do not suffice to assess the predicted segmentation. A supplemental metric, balance factor ω, is proposed to gauge the ratio of predicted and reference boundaries. Balance factor ω together with Pk and WD supports more reliable judgements of segmentation goodness. These content-free methods were successfully tested on both the Augmented Multiparty Interaction corpus (AMI), which contains simulated meetings, and on the Multidisciplinary Medical Team Meetings (MDTM) corpus, which contains real meetings. MDTMs are better structured meetings than AMI and are segmented with higher accuracy, which indicates the relationship between meeting content and structure.
dc.format: 1 volume
dc.language.iso: en
dc.publisher: Trinity College (Dublin, Ireland). School of Computer Science & Statistics
dc.relation.isversionof: http://stella.catalogue.tcd.ie/iii/encore/record/C__Rb15116178
dc.subject: Computer Science, Ph.D.
dc.subject: Ph.D. Trinity College Dublin
dc.title: An analysis of content-free dialogue representation, supervised classification methods and evaluation metrics for meeting topic segmentation
dc.type: thesis
dc.type.supercollection: refereed_publications
dc.type.supercollection: thesis_dissertations
dc.type.qualificationlevel: Doctoral
dc.type.qualificationname: Doctor of Philosophy (Ph.D.)
dc.rights.ecaccessrights: openAccess
dc.format.extentpagination: pp 164
dc.description.note: TARA (Trinity's Access to Research Archive) has a robust takedown policy. Please contact us if you have any concerns: rssadmin@tcd.ie


