Statistics (Theses and Dissertations)
http://hdl.handle.net/2262/201
2024-05-20T00:13:16Z
Statistical Methods to Extrapolate Time-To-Event Data
http://hdl.handle.net/2262/107280
Statistical Methods to Extrapolate Time-To-Event Data
Cooney, Philip
This thesis investigates methods used to predict the long-term survival of observations (typically survival times) beyond the time for which follow-up data are available. Current practice is to use parametric survival models; however, different models can produce different survival predictions, particularly when the lifetimes of many of the observations are censored.
We focus on applying novel statistical techniques to improve existing methods of predicting survival. One existing approach assumes that after a certain timepoint the hazard is approximately constant, and this constant hazard is used to estimate long-term survival. The choice of this timepoint is arbitrary and subject to considerable uncertainty. To improve on this methodology we estimate a statistical model known as a change-point survival model, which allows the observed data to inform the timepoint after which a constant hazard is appropriate. Statistical goodness-of-fit measures can identify whether the additional complexity associated with the inclusion of a change-point is warranted. We also estimate other, more complex change-point survival models which allow us to model multiple treatments.
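As an illustration of the underlying idea (not the thesis's implementation, and with hypothetical hazard rates and change-point), a single change-point model with a constant hazard after time tau has a simple closed-form survival function:

```python
import numpy as np

def survival_changepoint(t, h1, h2, tau):
    """Survival function for a piecewise-constant hazard: rate h1
    before the change-point tau, rate h2 afterwards."""
    t = np.asarray(t, dtype=float)
    # Cumulative hazard H(t); the survival function is S(t) = exp(-H(t))
    H = np.where(t <= tau, h1 * t, h1 * tau + h2 * (t - tau))
    return np.exp(-H)

# Hypothetical rates: higher early hazard, lower constant long-term hazard
S = survival_changepoint([0.0, 1.0, 2.0, 5.0], h1=0.30, h2=0.05, tau=2.0)
```

Extrapolation beyond follow-up then amounts to evaluating the exponential tail with rate `h2`; in the change-point model, `tau` is estimated from the data rather than fixed in advance.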
Another topic investigated is the incorporation of expert opinion into statistical models. In the case of survival predictions, even if survival is not observed at a timepoint, there are often opinions on the plausible ranges that these values may take. In this thesis, we investigate how these opinions can be incorporated in a robust manner, allowing the predicted survival to take account of both the precision of the expert's opinion and the sample size of the observed data. We also investigate how to quantify the strength of an expert's opinion, allowing for appropriate calibration of their opinions at the elicitation stage.
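The balance between an expert's precision and the sample size can be sketched with a standard normal-normal conjugate update (a generic illustration, not the thesis's specific method; all values hypothetical):

```python
import numpy as np

def pool_expert_with_data(prior_mean, prior_sd, data, data_sd):
    """Normal-normal conjugate update: the posterior mean is a
    precision-weighted average of the expert's opinion and the data,
    so a vaguer expert (large prior_sd) or a larger sample pulls the
    estimate towards the observed data."""
    n = len(data)
    prior_prec = 1.0 / prior_sd**2   # precision of the expert's opinion
    data_prec = n / data_sd**2       # precision of the sample mean
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * np.mean(data)) / post_prec
    return post_mean, np.sqrt(1.0 / post_prec)

# A precise but biased expert (mean 0.8) versus 25 observations near 0.6
m, s = pool_expert_with_data(prior_mean=0.8, prior_sd=0.1,
                             data=np.full(25, 0.6), data_sd=0.2)
```

The posterior mean lands between the expert's value and the data, closer to whichever carries more precision, which is the qualitative behaviour the robust incorporation methods above are designed to control.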
We found that the change-point model we estimated can robustly detect the timepoint at which a constant hazard becomes appropriate. In several real-world applications, it provided the closest predictions to the follow-up survival data. The proposed method for incorporating expert opinion allowed for the straightforward synthesis of different types of expert opinion with data. We demonstrate by way of a simulation study that including expert opinion can produce more accurate survival predictions, even when the expert's belief is biased away from the true estimate. By numerically quantifying the strength of experts' beliefs, we can more easily identify situations where experts' opinions are overconfident, allowing for re-calibration of their beliefs.
The key methods from the thesis are implemented as open-source software packages to allow the methods to be used in practical applications. The ideas in this thesis can also be extended and improved upon in future research. We believe that the methods
illustrated in this work will improve the ability of decision makers to model hypotheses relating to the prediction of long-term survival outcomes.
APPROVED
2024-01-01T00:00:00Z
Bayesian Tree Regression within a Streaming Context
http://hdl.handle.net/2262/104157
Bayesian Tree Regression within a Streaming Context
Ferreira, Michael Antonio
This thesis concerns regression in a statistical streaming environment, in which either large amounts of data or data that are continually being generated must be explored in a meaningful way. The streaming setting is challenging because either the volume of data to be analysed far exceeds the available resources, or the rate at which the data arrive is at odds with the timeliness of the inference required on those data.
Bayesian methods for streaming regression analysis have focused on particle filtering and sequential Monte Carlo (SMC). Bayesian regression trees have been used as particles because they offer a tractable approach to nonlinear regression, providing conditional basis functions that can be smoothed over while still allowing sudden changes in the data to be modelled. The Kalman filter, arguably the progenitor of SMC methods, epitomises the Bayesian methodology: data update current beliefs, which then become the prior beliefs for new data.
MCMC methods have been largely ignored in the statistical streaming setting because ergodic averaging over Markov chains requires that stationarity of the chains of sample measurements be established. Introducing new data invalidates the claim of stationarity, requiring that new chains of measures be sampled to re-establish it. What has not been shown is whether MCMC can be used in the streaming setting if one is willing to accept that, at least temporarily, the theoretical requirements for certainty of stationarity be set aside in favour of reaching a target distribution that is, for all intents and purposes, either the same as or very close to the ``true'' target distribution.
This document sets out to show that, using Bayesian regression trees to provide a collection of conditional filters, MCMC can be used in the streaming setting for nonlinear, nonstationary regression.
A tree filter based on the Kalman filter is developed. This initial stepping stone shows that the explanatory variables are needed only to indicate a refinement of a partition created by the tree, within which a filter provides an estimate and prediction for the level of the signal in that refinement. There is thus no need to store either explanatory variables or observations because, by the Markov assumption, all histories of the processes are retained in the previous state of the latent process at the refinements and in the tree model. This fixed tree filter is then developed into an on-the-fly adaptive learning model that searches the space of tree models as new data arrive. It is shown that, using Markov chain Monte Carlo, it is possible to get sufficiently close to the target distribution having seen each new data point only once. A single tree represents only a single chain, limiting the search of the model space, so an ensemble of chains of tree measures is provided so that a more comprehensive search of the distribution of trees can be carried out. This ensemble provides an approximation to the probability distribution of the trees. A mixture of tree models over this distribution allows tree-model-weighted predictions for the observations and estimates of the state, along with their uncertainty estimates, to be made on-the-fly.
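The leaf filters described above rest on the standard Kalman recursion, in which the latest state summarises all past data. A minimal scalar local-level filter (a generic sketch with hypothetical noise variances, not the tree filter itself) shows the predict/update cycle and why no observation needs to be stored:

```python
def kalman_local_level(ys, q, r, m0=0.0, p0=1e6):
    """Scalar local-level Kalman filter: the state (m, p) summarises
    all past data, so each observation is seen once and discarded."""
    m, p = m0, p0
    estimates = []
    for y in ys:
        # Predict: random-walk state evolution adds process variance q
        p = p + q
        # Update: blend the prediction with observation y (noise variance r)
        k = p / (p + r)        # Kalman gain
        m = m + k * (y - m)
        p = (1.0 - k) * p
        estimates.append(m)
    return estimates

est = kalman_local_level([1.0, 1.2, 0.9, 1.1], q=0.01, r=0.1)
```

In the tree filter, one such recursion runs per leaf of the partition, with the tree routing each observation to the appropriate leaf.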
Showing that MCMC can be used in the streaming setting opens up a whole gamut of MCMC methods for Bayesian statistical analysis, broadening the scope of problems that can be tackled over large and streaming data sets. This method can be adapted to existing Bayesian tree regression methods and extended to cover variable selection. The independent nature of the trees, and the fact that the algorithm has constant complexity with respect to the stream of data, mean that the size of the ensemble is limited only by available resources and is amenable to both parallel and concurrent computation. Problems of almost any size can be explored with this method and, because the Kalman filter handles vectors with ease, the dimension of the response matters only for local (to the leaf filter) matrix manipulation. The model provides a method for autoregressive, on-the-fly Gaussian process regression but is also extendable to multi-output Gaussian process regression.
APPROVED
2023-01-01T00:00:00Z
Distributed Lag Regression Methods and Compartmental Models for Analysis of Disease Progression
http://hdl.handle.net/2262/102514
Distributed Lag Regression Methods and Compartmental Models for Analysis of Disease Progression
Dempsey, Daniel
ANCA vasculitis is an autoimmune disease characterised by relapses, or flares, that can have a severe detrimental impact on patient health. Flares can be prevented by suppressing the immune system, but this exposes the patient to infection. It is hard to prepare patients for flares because clinicians are still unclear on how to predict flare events. Some attention has been given to uncovering environmental predictors, but so far results have been inconclusive. Investigating this for ourselves is the main focus of this thesis.
We construct a distributed lag / MIDAS model to analyse the accumulation of environmental exposure over time in a parsimonious manner, and how that may impact the probability of a flare occurring. Our model employs Bayesian variable selection and adjustment for imbalanced response data using latent variable representation and reversible-jump MCMC. The construction of this model is the primary novel contribution of this thesis.
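One common way distributed lag / MIDAS models achieve parsimony (shown here as a generic illustration with hypothetical parameters, not necessarily the exact weight function used in the thesis) is an exponential Almon weighting that collapses a window of lagged exposures into a single covariate governed by two parameters:

```python
import numpy as np

def exp_almon_weights(n_lags, theta1, theta2):
    """Exponential Almon lag weights: a two-parameter curve over the
    lag window, normalised to sum to one."""
    k = np.arange(1, n_lags + 1)
    w = np.exp(theta1 * k + theta2 * k**2)
    return w / w.sum()

def midas_covariate(exposure, n_lags, theta1, theta2):
    """Collapse the last n_lags exposure values into one weighted covariate."""
    w = exp_almon_weights(n_lags, theta1, theta2)
    recent = np.asarray(exposure[-n_lags:])[::-1]  # most recent lag first
    return float(recent @ w)

# 30 days of a hypothetical pollution index, summarised over a 7-day window
x = midas_covariate(np.linspace(10, 20, 30), n_lags=7, theta1=-0.5, theta2=0.0)
```

With `theta1 < 0` the weights decay over the lags, so recent exposure dominates; the two theta parameters are estimated alongside the regression coefficients rather than fitting one coefficient per lag.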
The method is validated via a simulation study, and then applied to real data comprising clinical information on flare events and satellite data tracking weather and pollution indices for each patient's region of residence. Despite our focus on vasculitis, we believe this model is applicable to many similar research problems.
We also look at a compartmental model to estimate the effect of lockdowns in combating the COVID-19 pandemic in Dublin, Ireland. The compartments are split into age groups, and the flow within and between compartments is adjusted to account for non-homogeneous mixing within and between age groups. Uncertainty estimates are constructed using parametric bootstraps. With these, we can create projections of compartmental growth under different lockdown measures; a proof-of-concept app is discussed to demonstrate this.
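The age-structured model is more involved, but the basic mechanism can be sketched with a single-group discrete-time SIR in which a lockdown scales the contact rate (all parameter values here are hypothetical, for illustration only):

```python
def sir_lockdown(days, beta, gamma, contact_scale, n=1_000_000, i0=100):
    """Discrete-time SIR model; contact_scale < 1 mimics a lockdown
    by reducing the effective transmission rate beta."""
    s, i, r = n - i0, i0, 0
    infected = []
    for _ in range(days):
        new_inf = contact_scale * beta * s * i / n   # new infections today
        new_rec = gamma * i                          # new recoveries today
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        infected.append(i)
    return infected

open_curve = sir_lockdown(120, beta=0.3, gamma=0.1, contact_scale=1.0)
lock_curve = sir_lockdown(120, beta=0.3, gamma=0.1, contact_scale=0.5)
```

Comparing the two trajectories shows the flattening effect of reduced contact; the thesis's model refines this by splitting each compartment by age and adjusting the flows for age-specific mixing.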
APPROVED
2023-01-01T00:00:00Z
Incorporating Ignorance within Game Theory: An Imprecise Probability Approach
http://hdl.handle.net/2262/101972
Incorporating Ignorance within Game Theory: An Imprecise Probability Approach
Fares, Bernard
Ignorance within non-cooperative games, reflected as a player's uncertain preferences towards a game's outcome, is examined from a probabilistic point of view. This topic has had scarce treatment in the literature, which emphasises exogenous uncertainties caused by other players or nature rather than by players themselves. That is primarily because a player's endogenous uncertainty over an outcome poses significant challenges, introducing complex sequences of reciprocal expectations. It is therefore often ignored, and preferences are either assumed from a continuous domain or set using introspection.
Decisions under ignorance could be optimised by permitting a player to compute rational strategies with respect to elicited lower and upper expectations of an uncertain outcome, allowing them to update these strategies when new observations are available, and helping them assess the impact and value of acquired information. Therefore, this dissertation aims to develop a complete framework for decision optimisation within strategic settings that include uncertainty. We explore a solution concept based on recent research in imprecise probabilities and de Finetti's approach to defining subjective probabilities, which utilises bets to assess beliefs.
An in-depth literature review of game theory and imprecise probabilities is provided, focusing on existing normative theories and their plausible generalisations. The motivation behind a solution permitting ignorance is presented, and foundational issues with existing approaches are examined. Afterwards, we introduce a framework that allows a risk-neutral player with constant marginal utilities for money to incorporate and dynamically learn about uncertain outcomes. This framework is then generalised to cover risk-averse players whose marginal utilities across outcomes are state-dependent.
The resulting framework is proposed as a possible solution to the problem of utility induction and decision-making in game-theoretic settings that include uncertainty. It is analysed and demonstrated through motivating examples modified to include uncertainty. For each example, the convex polytope of correlated equilibria is computed and compared to its uncertainty-free equivalent. Exceptional cases such as extreme ignorance are also examined and assessed through a Monte Carlo simulation, in which we demonstrate that, in repeated games, vacuous lower and upper previsions converge to a single linear value that reflects the true expected preference over the uncertain outcome.
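The convergence behaviour described above can be illustrated with the imprecise Dirichlet model, a standard device in imprecise probability (used here purely as an illustration, not necessarily the thesis's construction): starting from vacuous bounds [0, 1], the lower and upper probabilities tighten towards the empirical frequency as observations accumulate.

```python
def idm_bounds(successes, n, s=2.0):
    """Imprecise Dirichlet model bounds for a binary outcome: with no
    data the interval is vacuous [0, 1]; as n grows, both bounds
    converge to the empirical frequency. s is the prior-strength
    hyperparameter."""
    lower = successes / (n + s)
    upper = (successes + s) / (n + s)
    return lower, upper

vacuous = idm_bounds(0, 0)          # (0.0, 1.0): total ignorance
after_10 = idm_bounds(7, 10)        # interval still wide
after_1000 = idm_bounds(700, 1000)  # both bounds near 0.7
```

The interval width is s / (n + s), so it shrinks at rate 1/n, mirroring the collapse of vacuous previsions to a single linear value in repeated games.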
Moreover, inadequate value of information under uncertainty is considered, and a model to assess the impact of information patterns on strategic interactions is proposed. This model enables a player to compute their expected and actual values of a piece of information with respect to a Pareto-efficient strategy. We showcase it within a game that includes uncertainty by applying utility diagnostics to two types of players, pessimistic and optimistic.
Finally, since the foundations of the normative game theory introduced by von Neumann and Morgenstern assume that all outcomes are known, the consistency of its axiomatic rules under ignorance is reviewed. We show that uncertainty can alter the zero-sum and symmetry properties of relevant games, and propose an approach to enforce these properties.
APPROVED
2023-01-01T00:00:00Z