Recent ensemble clustering techniques have been shown to
be effective in improving the accuracy and stability of standard clustering
algorithms. However, an inherent drawback of these techniques is
the computational cost of generating and combining multiple clusterings
of the data. In this paper, we present an efficient kernel-based ensemble
clustering method suitable for application to large, high-dimensional
datasets such as text corpora. To decrease the time required to generate
the ensemble members, we employ a prototype reduction scheme that
makes use of a density-biased selection strategy to construct a smaller
kernel matrix that represents a good proxy for the original data. Evaluations
performed on text data demonstrate that this process leads to
a significant decrease in running time, while maintaining high clustering
accuracy.
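
The abstract does not specify how the density-biased selection is carried out, so the following is only a minimal sketch under stated assumptions, not the method evaluated in the paper. It assumes that density is approximated by the mean kernel similarity of each object to its most similar neighbours, and that prototypes are sampled without replacement with probability proportional to that density. The function names `density_biased_prototypes` and `reduced_kernel`, the `n_neighbors` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np

def density_biased_prototypes(K, n_prototypes, n_neighbors=5, seed=0):
    """Sample prototype indices, favouring objects in dense regions.

    Assumption: local density is approximated by the mean kernel
    similarity to the n_neighbors most similar other objects.
    """
    n = K.shape[0]
    S = K.astype(float).copy()
    np.fill_diagonal(S, -np.inf)           # ignore self-similarity
    top = np.sort(S, axis=1)[:, -n_neighbors:]
    density = top.mean(axis=1)
    density = density - density.min() + 1e-12  # keep all weights positive
    probs = density / density.sum()
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=n_prototypes, replace=False, p=probs)
    return np.sort(idx)

def reduced_kernel(K, idx):
    """Extract the smaller kernel matrix over the chosen prototypes."""
    return K[np.ix_(idx, idx)]

# Synthetic stand-in for a document collection: unit-length rows, so the
# linear kernel X @ X.T is the cosine similarity matrix.
data_rng = np.random.default_rng(42)
X = data_rng.random((200, 50))
X /= np.linalg.norm(X, axis=1, keepdims=True)
K = X @ X.T

idx = density_biased_prototypes(K, n_prototypes=40)
K_small = reduced_kernel(K, idx)
```

In this sketch the ensemble members would be generated on `K_small` (40 x 40) rather than on `K` (200 x 200), which is where the reduction in running time would come from. How the remaining objects are then assigned to clusters is not covered here.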