Statistics
http://hdl.handle.net/2262/76
2022-05-19T04:38:10Z

Saddlepoint Approximation for the Generalized Inverse Gaussian Levy Process
Zhang, Mimi
http://hdl.handle.net/2262/98306
2022-04-01T07:45:39Z
2022-01-01T00:00:00Z
The generalized inverse Gaussian (GIG) Lévy process is a limit of compound Poisson processes, including the stationary gamma process and the stationary inverse Gaussian process as special cases. However, fitting the GIG Lévy process to data is computationally intractable because the marginal distribution of the GIG Lévy process is not convolution-closed.
The current work reveals that the marginal distribution of the GIG Lévy process admits a simple yet extremely accurate saddlepoint approximation. Particularly, we prove that if the order parameter of the GIG distribution is greater than or equal to -1, the marginal distribution can be approximated accurately – no need to normalize the saddlepoint density. Accordingly, maximum likelihood estimation is simple and quick, random number generation from the marginal distribution is straightforward by using Monte Carlo methods, and goodness-of-fit testing is undemanding to perform. Therefore, major numerical impediments to the application of the GIG Lévy process are removed. We demonstrate the accuracy of the saddlepoint approximation via various experimental setups.
PUBLISHED

DCF: An Efficient and Robust Density-Based Clustering Method
Zhang, Mimi
http://hdl.handle.net/2262/97142
2022-03-11T08:33:19Z
2021-01-01T00:00:00Z
Density-based clustering methods have been shown to achieve promising results in modern data mining applications.
A recent approach, Density Peaks Clustering (DPC), detects modes as points with high density and large distance to points of higher density, and hence often fails to detect low-density clusters in the data. Furthermore, DPC has quadratic complexity. We here develop a new clustering algorithm, aiming at improving the applicability and efficiency of the peak-finding technique. The improvements are threefold: (1) the new algorithm is applicable to large datasets; (2) the algorithm is capable of detecting clusters of varying density; (3) the algorithm is competent at deciding the correct number of clusters, even when the number of clusters is very high. The clustering performance of the algorithm is greatly enhanced by directing the peak-finding technique to discover modal sets, rather than point modes. We present a theoretical analysis of our approach and experimental results to verify that our algorithm works well in practice. We demonstrate a potential application of our work for unsupervised face recognition.
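To make the peak-finding idea concrete, the following is a minimal sketch of the two quantities that standard DPC computes for each point: a local density and the distance to the nearest point of higher density. It illustrates the baseline DPC step only, not the DCF algorithm. The function name `density_peaks`, the `cutoff` parameter, the cutoff-count density estimate, and the toy data are assumptions chosen for illustration.

```python
import math

def density_peaks(points, cutoff):
    """Return (rho, delta) per point, the two DPC peak-finding quantities.

    Hypothetical helper for illustration; density is a simple cutoff count.
    """
    n = len(points)
    # All pairwise distances: this is the quadratic-cost step.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho[i]: number of other points closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbour gets its largest distance instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Toy data (assumed): a dense group near the origin and a sparse group near (5, 5).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (5.0, 5.0), (5.4, 5.0), (5.0, 5.4)]
rho, delta = density_peaks(points, cutoff=0.12)
for p, r, d in zip(points, rho, delta):
    print(p, r, round(d, 3))
```

In this toy run the centre of the dense group has both the highest density and the largest distance, so it stands out as a peak, whereas every point in the sparse group has zero density under this cutoff. That is one way to see why a rule requiring high density can miss low-density clusters.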
PUBLISHED

Semantic image segmentation based on spatial relationships and inexact graph matching
Dahyot, Rozenn
http://hdl.handle.net/2262/95731
2021-03-17T18:01:55Z
2020-01-01T00:00:00Z
We propose a method for semantic image segmentation, combining a deep neural network and spatial relationships between image regions, encoded in a graph representation of the scene. Our proposal is based on inexact graph matching, formulated as a quadratic assignment problem applied to the output of the neural network. The proposed method is evaluated on a public dataset used for segmentation of images of faces, and compared to the U-Net deep neural network that is widely used for semantic segmentation. Preliminary results show that our approach is promising. In terms of Intersection-over-Union of region bounding boxes, the improvement over U-Net is 2.4% on average, and up to 24.4% for some regions. Further improvements are observed when reducing the size of the training dataset (up to 8.5% on average).
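For readers unfamiliar with the evaluation metric, here is a minimal sketch of Intersection-over-Union for two axis-aligned bounding boxes. The function name `bbox_iou` and the `(x_min, y_min, x_max, y_max)` box convention are assumptions for illustration; the abstract does not specify how the authors represent region bounding boxes.

```python
def bbox_iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap rectangle, clamped at zero if disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter
    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 7.
print(bbox_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

The value ranges from 0 for disjoint boxes to 1 for identical boxes.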

An Integrated Framework for Estimating the Number of Classes with Application for Species Estimation
Al-Ghamdi, Asmaa
http://hdl.handle.net/2262/95385
2021-02-25T18:02:20Z
2021-01-01T00:00:00Z
The two most common approaches for estimating the number of distinct classes
within a population are either to use sampling data directly with combinatorial
arguments or to extrapolate historical discovery data. However, in the former
case, such detailed sampling data is often unavailable, while the latter approach
makes assumptions about the form of the parametric curves used to fit the discovery
data, and these assumptions often lack theoretical justification. Instead, we propose an
integrated transdisciplinary framework that dissolves the boundaries between the
above two approaches. This is achieved by directly describing the sampling-discovery
process in parallel with describing a co-variate latent effort process,
where we have historical discovery data for the former process and some proxy
data for the latent process. The linkage between these two processes allows one to
form data on sampling records by forcing some constraints on how many samples
were taken over time. Due to the nature of the constrained data, many inference
techniques become infeasible. However, simulation-based methods such as
Approximate Bayesian Computation remain available. Our proposed approach
is demonstrated and analysed through many simulation experiments, and finally
applied in the ecology field to estimate the number of species as an example of
the number of classes problem.
APPROVED