Statistics electronic theses and dissertations (MU)

Permanent URI for this collection

https://hdl.handle.net/10355/5244

The items in this collection are the theses and dissertations written by students of the Department of Statistics. Some items may be viewed only by members of the University of Missouri System and/or University of Missouri-Columbia. Click on one of the browse buttons above for a complete listing of the works.


Now showing 1 - 5 of 118

Bayesian hierarchical unit-level models for longitudinal survey data under informative sampling
(University of Missouri--Columbia, 2025) Vedensky, Daniel; Holan, Scott H.
[EMBARGOED UNTIL 08/01/2026] This dissertation makes a number of contributions to the unit-level modeling literature. First, the connection between preferential sampling in ecological statistics and informative sampling, which arises in complex surveys for official statistics, is described. Solutions to the problems are reviewed and compared and suggestions for bridging the gap between the two disciplines are made. Second, longitudinal models for Gaussian, binary, and count data in the context of complex surveys are developed and applied to the Household Pulse Survey. Third, a cross-sectional ordinal model that addresses informative sampling is introduced, as well as longitudinal ordinal and nominal models. In addition, efficient variational Bayesian algorithms are presented. Lastly, an extension to wealth data is considered alongside an application to the Survey of Income and Program Participation.
Regression analysis of semi-competing risk data and precision medicine
(University of Missouri--Columbia, 2025) Zheng, Tiange; Sun, Jianguo
[EMBARGOED UNTIL 05/01/2026] Right-censored failure time data commonly occur in various fields, including economics, medical studies and public health, and a great deal of literature on their analysis has been established. However, some problems related to their analysis have not yet been investigated. In this dissertation, we discuss three such topics and provide statistical methods for them. The first part of this dissertation focuses on the semi-competing risk data problem, which is often encountered in clinical studies when there are related endpoints. The data structure consists of a terminal event and a non-terminal event, where the terminal event may censor the non-terminal event, but not vice versa. Such a relationship between the two events complicates estimation and has been well studied over the past twenty years. However, most of the existing methods, whether copula models or illness-death models, require the specification of the underlying correlation between the non-terminal event and the terminal event. In Chapter 2, we propose an alternative conditional approach, which is more attractive and natural if the non-terminal event is of main interest. In the proposed method, a class of flexible additive and multiplicative models and the additive hazards model are employed to model the non-terminal and terminal events, respectively. For inference, an estimating equation-based procedure is developed and the asymptotic properties of the resulting estimators are established. In addition, a model checking procedure is provided. The numerical results indicate that the proposed methodology works well in practical situations, and it is applied to a real set of data that motivated this study. The second and third parts of this dissertation focus on precision medicine.
There are two types of heterogeneity considered in this dissertation: the same treatment can have different effects for different patients in a clinical study, or the effect can be heterogeneous on the same patient across different quantiles of the survival time. The first type of heterogeneity is addressed by subgroup analysis, and the second type by quantile regression. There has been a great deal of literature on subgroup analysis methods for censored data and on quantile regression for censored data, but no existing method considers the case in which both types of heterogeneity exist. In Chapter 3, to address such double heterogeneity, we propose a pairwise fusion penalty approach that can identify the subgroup structure and estimate the covariate effects simultaneously. It is similar in spirit to regularized variable selection, but the penalty term is the pairwise difference between the coefficients of subjects. For the implementation of the proposed method, an alternating direction method of multipliers algorithm is developed and the asymptotic properties of the resulting estimators are established. To assess the empirical performance of the proposed methodology, a simulation study is performed and indicates that it works well in practical situations. Finally, it is applied to the well-known Stanford heart transplant data and suggests the possible existence of a threshold with respect to the diagnostic effect of the T5 mismatch score. In Chapter 4, we focus on censored quantile regression (CQR) and propose a prediction method for CQR with high-dimensional covariates. Instead of variable selection, we adopt the model averaging framework since we are more interested in prediction.
Unlike variable selection methods, model averaging methods do not select a single best model but assign different weights to a group of candidate models, so that prediction accuracy is increased, especially when the noise level is high and affects model selection. We use the jackknife criterion to search for the optimal weights for the submodels. To evaluate the prediction performance of the proposed method, we conduct a simulation study and apply it to a real data example, and compare the prediction error with that of other variable selection methods established for high-dimensional CQR.
Advanced statistical methods for failure time data : variable selection, subgroup analysis, and conformal inference
(University of Missouri--Columbia, 2025) Wu, Yuxiang; Sun, Jianguo
[EMBARGOED UNTIL 05/01/2026] Failure time data, often subject to censoring, arise frequently in biomedical and clinical research. This dissertation develops advanced statistical methodologies to address key challenges in analyzing such data, particularly in the presence of interval censoring, high-dimensional covariates, and heterogeneity across subjects. The work is organized around three main contributions. First, we propose a group variable selection procedure for the Cox model with interval-censored data, using a penalized sieve maximum likelihood approach and establishing its oracle properties. Second, to account for heterogeneity in treatment effects and patient characteristics, we develop a method for simultaneous subgroup identification and variable selection in high-dimensional survival settings via penalized fusion and model averaging. Third, we introduce a novel nonparametric conformal inference framework for comparing two conditional survival distributions, accommodating both regular and high-dimensional covariates under right censoring. Each method is supported by theoretical justification, extensive simulation studies, and real data applications, including analyses of Alzheimer's disease and cancer genomics datasets. The proposed techniques advance the toolkit for survival analysis, with significant implications for precision medicine and high-dimensional data modeling.
A Bayesian approach to discovery of latent dependency in point-referenced data
(University of Missouri--Columbia, 2024) Wang, Shuwan; Wikle, Christopher K.; Micheas, Athanasios C.
In spatial statistics, where data are usually observed as point-referenced, classical parametric spatial models are typically assumed and used to describe real-world phenomena. This is generally due to the large number of spatial locations that the spatial models must cover. However, such classical methods usually rely strongly on unrealistic assumptions that real-world data do not follow. Due to this limitation, this dissertation focuses on developing statistical models that are more flexible and lead to a better understanding and explanation of the latent dependency in real-world point-referenced data. This dissertation begins with a statistical model accounting for collective animal movement. The model highlights how to incorporate the ecological perspective into the hierarchical modeling structure, motivating the need to consider the underlying ecological structure to better understand the driving dynamics of animal behavior. Then, a statistical model is proposed to explain a real-world phenomenon, lightning strikes. Starting with an exploratory analysis, we found that the lightning strike data do not follow the classical assumptions of spatial models. Thus, a data-driven statistical approach is proposed, where the latent spatio-temporal dependency considered in the log-Gaussian Cox process (LGCP) is not only non-stationary but also time-varying. The proposed method relaxes the standard assumptions of spatial models (stationarity and isotropy) and thus is able to better account for the latent spatio-temporal dependency in real-world data. Last, we propose a novel neural-network method to overcome the computational burden of posterior inference in LGCPs. The proposed neural-network-based method provides fast and accurate parameter estimation as well as reliable uncertainty quantification for inference.
Dynamic spatio-temporal models integrating physics for extreme environmental processes
(University of Missouri--Columbia, 2024) Yoo, Myungsoo; Wikle, Christopher K.
Spatio-temporal processes are ubiquitous across disciplines. Understanding the mechanisms underlying these processes and integrating this information into models is of great interest, as it can improve forecasting accuracy and align with scientific motivation. Examples of such models include Partial Differential Equation (PDE) Models and Physics-Informed Neural Network Models in the applied mathematics and deep learning communities, respectively. However, these models often overlook uncertainty quantification despite its crucial role, considering that real-world processes necessarily involve inherent errors that physical laws cannot fully explain. Dynamic Spatio-Temporal Models (DSTMs) offer a flexible and effective approach by embedding physical laws within the Bayesian Hierarchical Model (BHM) framework and accounting for dependencies in space and time conditionally. This dissertation explores integrating physical laws while accounting for uncertainty within BHMs and neural network models for complex environmental processes. To start, a novel approach utilizing a level-set method and low-rank representation within a BHM is developed to model the evolution of wildfire boundaries in the presence of uncertainty in data and a lack of knowledge about the boundaries. Subsequently, a hybrid model that nests an echo state network within a level-set method to accommodate nonlinearity is developed. This model is computationally efficient and includes calibrated uncertainty quantification. Lastly, a new class of DSTMs, capable of accommodating both high and low extremes through a regime-switching scheme of stable distributions with varying tail indices, is presented. This last method is illustrated on fine particulate matter (PM2.5) observations emanating from wildfires in the prairie region of the US.
