Efficient Ensemble Methods for Document Clustering
File Type:
PDF
Item Type:
Technical Report
Date:
2006-08-18
Citation:
Efficient Ensemble Methods for Document Clustering, Derek Greene and Padraig Cunningham, 18 August 2006, TCD-CS-2006-48
Abstract:
Recent ensemble clustering techniques have been shown to
be effective in improving the accuracy and stability of standard clustering
algorithms. However, an inherent drawback of these techniques is
the computational cost of generating and combining multiple clusterings
of the data. In this paper, we present an efficient kernel-based ensemble
clustering method suitable for application to large, high-dimensional
datasets such as text corpora. To decrease the time required to generate
the ensemble members, we employ a prototype reduction scheme that
makes use of a density-biased selection strategy to construct a smaller
kernel matrix that represents a good proxy for the original data. Evaluations
performed on text data demonstrate that this process leads to
a significant decrease in running time, while maintaining high clustering
accuracy.
Author: Greene, Derek; Cunningham, Padraig
Publisher:
Department of Computer Science, Trinity College Dublin
Type of material:
Technical Report
Series/Report no:
Computer Science Department, Technical Report TCD-CS-2006-48
Availability:
Full text available
Keywords:
Document clustering