Consistent Mode-Finding for Parametric and Non-Parametric Clustering
Citation:
Tobin, Joshua, Consistent Mode-Finding for Parametric and Non-Parametric Clustering, Trinity College Dublin, School of Computer Science & Statistics, Statistics, 2022
Abstract:
Density peaks clustering detects modes as points with high density and large distance to points of higher density. To cluster the observed samples, points are assigned to the same cluster as their nearest neighbor of higher density. This efficient and intuitive approach has, in recent years, grown in popularity in applications. Despite its widespread use, little work has been done to understand the theoretical properties of the density peaks method, or its strengths and limitations as a clustering tool. Here, we provide a detailed analysis of the density peaks clustering algorithm. We demonstrate that it recovers consistent estimates of the modes of the underlying density and correctly clusters the data with high probability. However, deficiencies of the density peaks clustering methodology are also highlighted. Noise in the density estimates can lead to errors when estimating modes and to incoherent cluster assignments. Two adaptations of the density peaks clustering approach are proposed to remedy these issues. The first method seeks to detect modal sets rather than point modes in the data. This reduces the sensitivity of the clusterings to fluctuations in the density estimate. The second approach partitions the data into regions mutually separated by areas of low density, before applying the density peaks clustering algorithm. Doing so ensures that the result of the cluster assignment method meets the conceptual understanding of a correct clustering. Both approaches are analyzed theoretically and their superior performance is demonstrated on simulated and real-world datasets. Moreover, they are shown to be suitable for modern clustering applications in computer vision.
Model-based clustering methods, where clusters are taken to be unimodal components in a finite mixture model, are then considered. Motivated by the consistent estimates of the modes provided by the density peaks clustering algorithm, a novel model-based clustering method is proposed. This approach uses a set of high density points as initial mean parameters, and iteratively prunes them to return a sequence of nested clusterings. The method outperforms popular model-based clustering methods. To conclude, the contributions of the thesis are used to motivate suggestions for future research.
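The basic density peaks procedure summarized at the start of the abstract can be illustrated with a short sketch. This is not the thesis's implementation. The function name `density_peaks`, the Gaussian kernel density estimate, the `bandwidth` parameter, and the rule of ranking candidate modes by density multiplied by distance are all assumptions made for illustration. The abstract only states that modes have high density and a large distance to points of higher density, and that each remaining point joins the cluster of its nearest neighbor of higher density.

```python
import numpy as np


def density_peaks(X, bandwidth, n_clusters):
    """Illustrative density peaks clustering (hypothetical helper, not the thesis code)."""
    n = len(X)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Assumption: unnormalised Gaussian kernel density estimate at each sample.
    density = np.exp(-((dist / bandwidth) ** 2)).sum(axis=1)
    # Indices sorted from highest to lowest density.
    order = np.argsort(-density, kind="stable")

    delta = np.empty(n)        # distance to the nearest point of higher density
    parent = np.full(n, -1)    # index of that nearest higher-density point
    delta[order[0]] = np.inf   # the global density peak has no higher-density point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j

    # Assumption: pick as modes the points with the largest density * delta.
    modes = np.argsort(-(density * delta), kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[modes] = np.arange(n_clusters)
    # Visit points in decreasing density so each parent is labelled before its children.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return modes, labels


# Usage on two well-separated synthetic blobs (synthetic data, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
modes, labels = density_peaks(X, bandwidth=0.5, n_clusters=2)
```

Because labels are propagated in decreasing order of density, every point inherits the label of a point that has already been assigned. The same property means that one noisy density estimate can redirect a whole chain of assignments, which is the kind of sensitivity the abstract's two adaptations are meant to address.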
Sponsor:
Government of Ireland
Description:
APPROVED
Author: Tobin, Joshua
Advisor:
Zhang, Mimi
Publisher:
Trinity College Dublin. School of Computer Science & Statistics. Discipline of Statistics
Type of material:
Thesis
Availability:
Full text available