DSpace Academic/Research Unit: Statistics
Handle: http://hdl.handle.net/2262/76
Updated: 2014-02-26T18:37:55Z

Handle: http://hdl.handle.net/2262/67367
Updated: 2013-11-25T15:46:25Z
Issued: 2011-01-01T00:00:00Z
Title: Considerations on the UK Re-Arrest Hazard Data Analysis (How Model Selection Can Alter Conclusions for Policy Development)
Author: WILSON, SIMON PAUL; HOULDING, BRETT
Abstract: The offence risk posed by individuals who are arrested, but where subsequently no charge or caution is administered, has been used as an argument for justifying the retention of such individuals’ DNA and identification profiles. Here we consider the UK Home Office arrest-to-arrest data analysis, and find it to have limited use in indicating risk of future offence. In doing so, we consider the appropriateness of the statistical methodology employed and the implicit assumptions necessary for making such inference concerning the re-arrest risk of a further individual. Additionally, we offer an alternative model that would provide an equally accurate fit to the data, but which would appear to have sounder theoretical justification. Finally, we consider the implications of using such statistical inference in formulating national policy, and highlight a number of sociological factors that could be taken into account so as to enhance the validity of any future analysis.
Description: PUBLISHED

Handle: http://hdl.handle.net/2262/67366
Updated: 2014-01-09T11:33:38Z
Issued: 2010-01-01T00:00:00Z
Title: Bayesian spatiotemporal model of fMRI data using transfer functions
Author: WILSON, SIMON PAUL
Abstract: This research describes a new Bayesian spatiotemporal model to analyse
BOLD fMRI studies. In the temporal dimension, we describe the shape of
the hemodynamic response function (HRF) with a transfer function model.
In the spatial dimension, we use a Gaussian Markov random field prior on
the parameter indicating activations that embody our prior knowledge that
evoked responses are spatially contiguous. The proposal constitutes an extension
of the spatiotemporal model presented in a previous approach [Quirós,
A., Montes Diez, R. and Gamerman, D. (2010). Bayesian spatiotemporal
model of fMRI data, Neuroimage, 49: 442-456.], offering more flexibility in
the estimation of the HRF and computational advantages in the resulting
MCMC algorithm. Simulations from the model are performed in order to
ascertain the performance of the sampling scheme and the ability of the posterior
to estimate model parameters, as well as to check the model sensitivity
to the signal-to-noise ratio. Results are shown on synthetic data and on a real
data set from a block-design fMRI experiment, showing good performance
in the detection of activity and significant flexibility in the estimation of the
Description: PUBLISHED

Handle: http://hdl.handle.net/2262/67365
Updated: 2013-12-11T12:40:21Z
Issued: 2011-01-01T00:00:00Z
Title: Predicting the number of known and unknown species in European seas using rates of description
Author: WILSON, SIMON PAUL
Abstract: Aim In this paper, we compare species description rates to predict the numbers of undescribed species. These data are used to discuss the merits of various attempts to estimate species richness in the oceans.
Location European marine areas.
Methods Predictions of how many species may exist on Earth have lacked an inventory of how many have been described, except for a few small taxa. The ocean is a good place to start an inventory because it includes all but one of the phyla and most classes of life on Earth. The European Register of Marine Species (ERMS) was compiled by taxonomic experts, covered all marine taxa, and accounted for synonyms. Reflecting taxonomic history, Europe's species are the best described in the world.
Results ERMS listed 29,713 species of animals, plants and protists, but excluded bacteria and viruses. An estimated 6500 described species were not included. The best prediction of the number of species remaining to be described was 5613. Plots of years when species were first described showed no decrease in the rate of description for any taxa except birds, mammals and krill. If taxonomic effort has increased, whether due to more resources globally or greater efficiencies of productivity, then description rates per unit effort may be declining and the number of undescribed species may be lower than predicted. However, apart from reduced rates of description during the World Wars, there were no changes in description rates that could be easily attributed to such factors.
Conclusions There are about 36,000 species described from European seas, and we predict that 40,000 to 48,000 may exist. This comprises 15% of the estimated 230,000 described marine species. However, this area is well known compared with other seas and the proportion of species yet to be discovered will be higher elsewhere.
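The counts quoted in the Results and Conclusions can be cross-checked with simple arithmetic. The Python sketch below uses only figures stated in the abstract. The variable names are illustrative, and the assumption that the totals combine by plain addition is ours rather than something the abstract spells out.

```python
# Figures quoted in the abstract (European Register of Marine Species, ERMS).
erms_listed = 29_713           # species listed in ERMS
described_not_listed = 6_500   # described species estimated to be missing from ERMS
predicted_undescribed = 5_613  # best prediction of species still to be described
global_described = 230_000     # estimated described marine species worldwide

# Assumption: the "about 36,000" described species is the sum of the first two counts.
described_total = erms_listed + described_not_listed

# Assumption: adding the best prediction gives a point estimate of total richness.
point_estimate = described_total + predicted_undescribed

# European described species as a share of all described marine species.
share_of_global = described_total / global_described

print(described_total)                  # 36213
print(point_estimate)                   # 41826
print(round(100 * share_of_global, 1))  # 15.7
```

Under these assumptions the point estimate falls inside the predicted range of 40,000 to 48,000 species, and the European share comes out between 15% and 16%, in line with the figure quoted in the Conclusions.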
Description: PUBLISHED

Handle: http://hdl.handle.net/2262/67364
Updated: 2013-11-25T15:47:34Z
Issued: 2012-01-01T00:00:00Z
Title: Short-term traffic flow forecasting with A-SVARMA
Author: WILSON, SIMON PAUL
Abstract: Short-term Traffic Flow Forecasting (STFF), the process of predicting future traffic conditions based on historical and real-time observations, is an essential aspect of Intelligent Transportation Systems (ITS). The existing well-known algorithms used for STFF include techniques based on time-series analysis, among which the seasonal Autoregressive Moving Average (ARMA) model is one of the most precise methods used in this field. The effectiveness of STFF in an urban transport network can be fully realized only in its multivariate form, where traffic flow is predicted at multiple sites simultaneously. In this paper, this concept is explored utilizing an Additive Seasonal Vector ARMA (A-SVARMA) model to predict traffic flow in the short-term future, considering the spatial dependency among multiple sites. The Dynamic Linear Model (DLM) representation of the A-SVARMA model has been used here to reduce the number of latent variables. The parameters of the model have been estimated in a Bayesian inference framework employing a Markov Chain Monte Carlo (MCMC) sampling method. The efficiency of the proposed prediction algorithm has been evaluated by modelling real-time traffic flow observations available from a certain junction in the city centre of Dublin.
Description: ACCEPTED

Handle: http://hdl.handle.net/2262/67363
Updated: 2013-11-25T15:53:30Z
Issued: 2009-01-01T00:00:00Z
Title: A probability model of system downtime with implications for optimal warranty design
Author: WILSON, SIMON PAUL; HOULDING, BRETT
Abstract: Traditional approaches to modeling the availability of a system often do not formally take into account
uncertainty over the parameter values of the model. Such models are then frequently criticised because the observed
reliability of a system does not match that predicted by the model. Instead this paper extends a recently published
segregated failures model so that, rather than providing a single figure for the availability of a system, uncertainty
over model parameter values is incorporated and a predictive probability distribution is given. This predictive
distribution is generated in a practical way by displaying the uncertainties and dependencies of the parameters
of the model through a Bayesian network. Permitting uncertainty in the reliability model then allows the user to
determine whether the predicted reliability was incorrect due to inherent variability in the system under study, or
instead due to the use of an inappropriate model. Furthermore, it is demonstrated how the predictive distribution
can be used when reliability predictions are employed within a formal decision-theoretic framework.
Use of the model is illustrated with the example of a high-availability computer system with multiple recovery
procedures. A Bayesian network is produced to display the relations between parameters of the model in this case
and to generate a predictive probability distribution of the system’s availability. This predictive distribution is then
used to make two decisions under uncertainty concerning offered warranty policies on the system: a qualitative
decision, and an optimisation over a continuous decision space.
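The idea of replacing a single availability figure with a predictive distribution, and then using that distribution in a decision, can be illustrated with a small Monte Carlo sketch in Python. This is not the segregated failures model or the Bayesian network from the paper. It swaps in the textbook steady-state formula availability = MTTF / (MTTF + MTTR), and every prior, threshold, penalty and premium below is a hypothetical value chosen purely for illustration.

```python
import random
import statistics

def simulate_availability(n_draws=10_000, seed=0):
    """Draw availability values under parameter uncertainty (illustrative only)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Hypothetical priors on mean time to failure and mean time to repair (hours).
        mttf = rng.gammavariate(20.0, 50.0)  # shape 20, scale 50: mean 1000 h
        mttr = rng.gammavariate(4.0, 0.5)    # shape 4, scale 0.5: mean 2 h
        # Stand-in model: steady-state availability, not the paper's model.
        draws.append(mttf / (mttf + mttr))
    return draws

def expected_cost(draws, threshold, penalty, premium):
    """Expected vendor cost of guaranteeing availability >= threshold."""
    p_breach = sum(a < threshold for a in draws) / len(draws)
    return penalty * p_breach - premium

draws = simulate_availability()
mean_availability = statistics.fmean(draws)

# Hypothetical warranty options: guaranteed threshold -> premium received.
options = {0.995: 100.0, 0.998: 250.0, 0.999: 600.0}
best = min(
    options,
    key=lambda t: expected_cost(draws, t, penalty=1000.0, premium=options[t]),
)

print(f"mean predicted availability: {mean_availability:.4f}")
print(f"threshold with lowest expected cost: {best}")
```

The list `draws` plays the role of the predictive distribution: its spread reflects uncertainty in the parameters as well as the point prediction. The choice among warranty options here is a discrete comparison of expected costs. An optimisation over a continuous decision space, as mentioned in the abstract, would replace the dictionary with a search over threshold values.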
Description: PUBLISHED

Handle: http://hdl.handle.net/2262/67362
Updated: 2013-11-25T15:54:30Z
Issued: 2012-01-01T00:00:00Z
Title: The magnitude of global marine species diversity
Author: WILSON, SIMON PAUL
Abstract: Background
The question of how many marine species exist is important because it provides a metric for how much we do and do not know about life in the oceans. We have compiled the first register of the marine species of the world and used this baseline to estimate how many more species, partitioned among all major eukaryotic groups, may be discovered.
Results
There are ∼226,000 eukaryotic marine species described. More species were described in the past decade (∼20,000) than in any previous one. The number of authors describing new species has been increasing at a faster rate than the number of new species described in the past six decades. We report that there are ∼170,000 synonyms, that 58,000–72,000 species are collected but not yet described, and that 482,000–741,000 more species have yet to be sampled. Molecular methods may add tens of thousands of cryptic species. Thus, there may be 0.7–1.0 million marine species. Past rates of description of new species indicate there may be 0.5 ± 0.2 million marine species. On average 37% (median 31%) of species in over 100 recent field studies around the world might be new to science.
Conclusions
Currently, between one-third and two-thirds of marine species may be undescribed, and previous estimates of there being well over one million marine species appear highly unlikely. More species than ever before are being described annually by an increasing number of authors. If the current trend continues, most species will be discovered this century.
Description: PUBLISHED

Handle: http://hdl.handle.net/2262/67361
Updated: 2013-11-25T15:45:01Z
Issued: 2011-01-01T00:00:00Z
Title: Bayesian kernel projections for classification of high dimensional data
Author: WILSON, SIMON PAUL
Abstract: A Bayesian multi-category kernel classification method is proposed. The algorithm performs the classification of the projections of the data to the principal axes of the feature space. The advantage of this approach is that the regression coefficients are identifiable and sparse, leading to large computational savings and improved classification performance. The degree of sparsity is regulated in a novel framework based on Bayesian decision theory. The Gibbs sampler is implemented to find the posterior distributions of the parameters, so that probability distributions of predictions can be obtained for new data points, giving a more complete picture of classification. The algorithm is aimed at high dimensional data sets where the dimension of measurements exceeds the number of observations. The applications considered in this paper are microarray, image processing and near-infrared spectroscopy data.
Description: PUBLISHED

Handle: http://hdl.handle.net/2262/67357
Updated: 2013-09-04T09:39:19Z
Issued: 2012-01-01T00:00:00Z
Title: Predicting total global species richness using rates of species description and estimates of taxonomic effort
Author: WILSON, SIMON PAUL; HOULDING, BRETT
Abstract: We found that trends in the rate of description of 580,000 marine and terrestrial species, in the taxonomically
authoritative World Register of Marine Species and Catalogue of Life databases, were similar until the 1950s. Since then,
the relative number of marine to terrestrial species described per year has increased, reflecting the less explored nature
of the oceans. From the mid-19th century, the cumulative number of species described has been linear, with the highest
number of species described in the decade of 1900, and fewer species described and fewer authors active during the World
Wars. There were more authors describing species since the 1960s, indicating greater taxonomic effort. There were fewer
species described per author since the 1920s, suggesting it has become more difficult to discover new species. There was no
evidence of any change in individual effort by taxonomists. Using a nonhomogeneous renewal process model we predicted
that 24–31% and 21–29% more marine and terrestrial species, respectively, remain to be discovered. We discuss why we
consider that marine species comprise only 16% of all species on Earth although the oceans contain a greater phylogenetic
diversity than occurs on land. We predict that there may be 1.8–2.0 million species on Earth, of which about 0.3 million are
marine, significantly less than some previous estimates.
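The 16% marine share quoted above can be checked against the predicted totals. The Python sketch below uses only the figures in the abstract (about 0.3 million marine species out of 1.8–2.0 million species on Earth). Evaluating the share at the two ends and the midpoint of the range is our own illustrative choice.

```python
# Figures quoted in the abstract, in millions of species.
marine_total = 0.3
earth_total_low, earth_total_high = 1.8, 2.0

# Marine share at each end of the predicted range for all species on Earth.
share_at_low_total = marine_total / earth_total_low    # smaller total, larger share
share_at_high_total = marine_total / earth_total_high  # larger total, smaller share
share_at_midpoint = marine_total / ((earth_total_low + earth_total_high) / 2)

for label, share in [
    ("1.8 million", share_at_low_total),
    ("1.9 million", share_at_midpoint),
    ("2.0 million", share_at_high_total),
]:
    print(f"total {label}: marine share {share:.1%}")
```

The share lies between roughly 15% and 17% across the range, and the midpoint value is close to the 16% stated in the abstract.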
Description: PUBLISHED

Handle: http://hdl.handle.net/2262/67355
Updated: 2013-11-25T15:43:32Z
Issued: 2011-01-01T00:00:00Z
Title: Considerations on the UK re-arrest hazard rate analysis
Author: WILSON, SIMON PAUL; HOULDING, BRETT
Abstract: The offence risk posed by individuals who are arrested, but where subsequently no charge or caution is administered, has been used as an argument for justifying the retention of such individuals’ DNA and identification profiles. Here we consider the UK Home Office arrest-to-arrest data analysis, and find it to have limited use in indicating risk of future offence. In doing so, we consider the appropriateness of the statistical methodology employed and the implicit assumptions necessary for making such inference concerning the re-arrest risk of a further individual. Additionally, we offer an alternative model that would provide an equally accurate fit to the data, but which would appear to have sounder theoretical justification and suggest alternative policy direction. Finally, we consider the implications of using such statistical inference in formulating national policy, and highlight a number of sociological factors that could be taken into account so as to enhance the validity of any future analysis.
Description: PUBLISHED

Handle: http://hdl.handle.net/2262/67354
Updated: 2013-11-25T15:55:18Z
Issued: 2012-01-01T00:00:00Z
Title: Dependent Gaussian mixture models for source separation
Author: WILSON, SIMON PAUL
Abstract: Source separation is a common task in signal processing and is often analogous to factor analysis. In this study, we look at a factor analysis model for source separation of multi-spectral image data where prior information about the sources and their dependencies is quantified as a multivariate Gaussian mixture model with an unknown number of factors. Variational Bayes techniques are used for model parameter estimation. The development of this methodology is motivated by the need for an efficient solution to the separation of components in the microwave radiation maps being obtained by the Planck satellite mission, which has the objective of uncovering cosmic microwave background radiation. The proposed algorithm successfully incorporates a rich variety of the prior information available in this problem, in contrast to many previous solutions that assume completely blind separation of the sources. Results on realistic simulations of Planck maps and on Wilkinson Microwave Anisotropy Probe fifth-year images are shown. The suggested technique is easily applicable to other source separation applications by modifying some of the priors.
Description: PUBLISHED