The University of Dublin | Trinity College -- Ollscoil Átha Cliath | Coláiste na Tríonóide
TARA Trinity's Access to Research Archive

Please use this identifier to cite or link to this item: http://hdl.handle.net/2262/2418

Title: Efficient Ensemble Methods for Document Clustering
Authors: Greene, Derek; Cunningham, Pádraig
Keywords: Document clustering
Issue Date: 18-Aug-2006
Publisher: Department of Computer Science, Trinity College Dublin
Citation: Efficient Ensemble Methods for Document Clustering, Derek Greene and Pádraig Cunningham, 18 August 2006, TCD-CS-2006-48
Series/Report no.: Computer Science Department, Technical Report TCD-CS-2006-48
Abstract: Recent ensemble clustering techniques have been shown to be effective in improving the accuracy and stability of standard clustering algorithms. However, an inherent drawback of these techniques is the computational cost of generating and combining multiple clusterings of the data. In this paper, we present an efficient kernel-based ensemble clustering method suitable for application to large, high-dimensional datasets such as text corpora. To decrease the time required to generate the ensemble members, we employ a prototype reduction scheme that makes use of a density-biased selection strategy to construct a smaller kernel matrix that represents a good proxy for the original data. Evaluations performed on text data demonstrate that this process leads to a significant decrease in running time, while maintaining high clustering accuracy.
URI: https://www.cs.tcd.ie/publications/tech-reports/reports.06/TCD-CS-2006-48.pdf
http://hdl.handle.net/2262/2418
Appears in Collections: Computer Science (Scholarly Publications)

Files in This Item: TCD-CS-2006-48.pdf (193.78 kB, Adobe PDF)


This item is protected by original copyright

