Statistics electronic theses and dissertations (MU)

The items in this collection are the theses and dissertations written by students of the Department of Statistics. Some items may be viewed only by members of the University of Missouri System and/or University of Missouri-Columbia. Click on one of the browse buttons above for a complete listing of the works.

Recent Submissions

Now showing 1 - 5 of 115
  • Item
    A Bayesian approach to discovery of latent dependency in point-referenced data
    (University of Missouri--Columbia, 2024) Wang, Shuwan; Wikle, Christopher K.; Micheas, Athanasios C.
    In spatial statistics, where data are usually observed as point-referenced, classical parametric spatial models have been assumed and used to describe real-world phenomena. This is generally due to the large number of spatial locations that the spatial models should cover. However, such classical methods usually rely strongly on unrealistic assumptions that real-world data do not follow. Due to this limitation, this dissertation focuses on developing statistical models that are more flexible and lead to a better understanding and explanation of the latent dependency in real-world point-referenced data. This dissertation begins with a statistical model accounting for collective animal movement. The model highlights how to incorporate the ecological perspective into the hierarchical modeling structure, motivating the need to consider the underlying ecological structure to better understand the driving dynamics of animal behavior. Then, a statistical model is proposed to explain a real-world phenomenon, lightning strikes. Starting with an exploratory analysis, we found that the lightning strike data do not follow classical assumptions in spatial models. Thus, a data-driven statistical approach is proposed, where the latent spatio-temporal dependency considered in the log-Gaussian Cox process (LGCP) is not only non-stationary but also time-varying. The proposed method relaxes the standard assumptions in spatial models (stationarity and isotropy) and thus is able to better account for the latent spatio-temporal dependency in real-world data. Last, we propose using a novel neural-network method to overcome the computational burden in LGCP computations for posterior inference. The proposed neural-network-based method provides faster and accurate parameter estimation as well as reliable uncertainty quantification for inference.
  • Item
    Dynamic spatio-temporal models integrating physics for extreme environmental processes
    (University of Missouri--Columbia, 2024) Yoo, Myungsoo; Wikle, Christopher K.
    Spatio-temporal processes are ubiquitous across disciplines. Understanding the mechanisms underlying these processes and integrating this information into models is of great interest, as it can improve forecasting accuracy and align with scientific motivation. Examples of such models include Partial Differential Equation (PDE) Models or Physics-Informed Neural Network Models in the applied mathematics or deep learning communities, respectively. However, these models often overlook uncertainty quantification despite its crucial role, considering that real-world processes necessarily involve inherent errors that physical laws cannot fully explain. Dynamic Spatio-Temporal Models (DSTMs) offer a flexible and effective approach by embedding physical laws within the Bayesian Hierarchical Model (BHM) framework and accounting for dependencies in space and time conditionally. This dissertation explores integrating physical laws while accounting for uncertainty within BHMs and neural network models for complex environmental processes. To start, a novel approach utilizing a level-set method and low-rank representation within a BHM is developed to model the evolution of wildfire boundaries in the presence of uncertainty in data and a lack of knowledge about the boundaries. Subsequently, a hybrid model that nests an echo state network within a level-set method to accommodate nonlinearity is developed. This model is computationally efficient and includes calibrated uncertainty quantification. Lastly, a new class of DSTMs, capable of accommodating both high and low extremes through a regime-switching scheme of stable distributions with varying tail indices, is presented. This last method is illustrated on fine particulate matter (PM2.5) observations emanating from wildfires in the prairie region of the US.
  • Item
    Bayesian and machine learning models for dependent data with applications to official statistics and survey methodology
    (University of Missouri--Columbia, 2023) Nandy, Saikat; Holan, Scott H.
    Small area estimation has garnered much interest in recent times from both private entities and government agencies as a means of guiding public policy, formulating programs for regional and national planning, allocating government funds, and advocating investments. Small-area models comprise area-level models, which relate direct estimators to area-specific covariates, and the alternative unit-level models, which directly model survey responses. Both types of models have their advantages and their unique sets of challenges. In this dissertation, we address some of these challenges. Modern complex surveys collect and report data on a variety of topics, and often the response types are not continuous numeric responses but can be categorical, count-valued, or completely non-numeric, like text and functional responses. Here we concentrate on multiple types of numeric responses. First, we extend the measurement error modeling paradigm to areal non-Gaussian data that can be distributed from one or multiple classes of distributions (e.g., Gaussian, Poisson, Binomial), when the covariates are measured with error and spatially correlated. Survey responses are prone to measurement error, which can result from a host of different sources, and failing to address this error can lead to biased and erroneous inference. Second, traditional area-level models tend to model spatial dependence structures based on the nearest-neighbor approach. These models assume that geographically connected areal units exhibit strong correlations and that the strength of this connection grows weaker as we move farther away. We propose a new neighborhood network where the probability of connection between two areal units is determined not just by their geographical proximity, but also by their socio-demographic similarity. We embed this in a traditional spatial model, e.g., the Conditional Autoregressive (CAR) model. Third, we address the issue of computational efficiency when working with data from complex surveys, which can include millions of individuals. We introduce Bayesian data sketching algorithms to compress high-dimensional survey data at both the area level and the unit level onto a lower-dimensional subspace using random projection matrices. This framework relies on the Bayesian pseudo-likelihood to accommodate the survey design, as well as the Bayesian hierarchical framework to model various dependence structures. We motivate our proposed frameworks with applications to public-use data from the American Community Survey (ACS).
  • Item
    Modeling chronic wasting disease using Gaussian Process Boosting
    (University of Missouri--Columbia, 2023) Emanuel, Joseph Alexander; Chakraborty, Sounak
    Chronic Wasting Disease (CWD) is a fatal neurological condition that affects cervids (white-tailed deer, elk, mule deer, etc.). Veterinary epidemiologists at the state and federal level are interested in methods to accurately predict the presence of CWD in free-range cervids, and to provide inferences about how location and other environmental factors could affect the spread of CWD. The data for this project were provided by Dr. Ram Raghavan at the University of Missouri School of Veterinary Medicine and were originally collected by researchers in Kansas. Each observation notes the presence of CWD in the cervid carcass along with the coordinates and certain soil measurements from the location where hunters harvested the animal. Understanding the spread of CWD is key, since government officials, scholars, and farmers who are in the business of captive breeding are interested in developing methods to contain it geographically, eradicate it, and ultimately preserve the health of existing herds. To this end, machine learning models, such as gradient boosting, offer flexibility and predictive accuracy, but lack interpretability. In order to make inferences regarding the features, we use SHapley Additive exPlanations (SHAP), which quantifies the influence of each feature on predictions. Spatial dependence between locations is not accounted for with gradient boosting, but can be modeled with Gaussian Processes. In this project, we use a recently developed spatial modeling method known as Gaussian Process Boosting, which preserves the flexibility and accuracy of gradient boosting while capturing spatial random effects with Gaussian Processes. Results show an 87.5 percent prediction accuracy on a binary response, and that the prediction contributions from the spatial random effects helped accuracy. SHAP values allowed for useful inferences to be made regarding the features.
  • Item
    Modeling spatio-temporal data using a Bayesian probabilistic cellular automata framework
    (University of Missouri--Columbia, 2023) Grieshop, Nicholas James; Wikle, Christopher K.
    Regularly gridded, or cellular, discrete-valued spatio-temporal data are common in many application areas. Such data can be considered from many perspectives, including deterministic or stochastic cellular automata, where local rules govern the transition probabilities that describe the evolution of the state of the cells across space and time. One implementation of a stochastic cellular automaton for such data is with a spatio-temporal generalized linear model (or mixed model), with the local rule covariates being included in the transformed mean response. However, in real-world applications, we seldom have a complete understanding of the local rules, and it is helpful to augment the transformed linear predictor with a latent spatio-temporal dynamic process. This dissertation considers new approaches to augment latent processes to improve model predictions. To start, a novel approach utilizing a dynamic neighborhood structure with a latent process linked to the spatial domain via the use of empirical orthogonal functions is developed. An alternative augmentation strategy is developed that considers techniques from machine learning. This approach considers traditional Bayesian modeling techniques in conjunction with an echo state network to further improve model predictions. In addition to the echo state augmentation, symbolic regression is used to learn the functional form of available covariates for improved model accuracy and exploration of high-dimensional interactions. A novel model weighting strategy is used in this echo state network augmentation approach, and prediction probability uncertainties are fully captured.
Items in MOspace are protected by copyright, with all rights reserved, unless otherwise indicated.