Data combining using mixtures of g-priors with application on county-level female breast cancer prevalence
As more and more data become available, data synthesis has become an indispensable task for researchers. Taking a Bayesian perspective, this dissertation comprises three related projects that aim to quantify the benefits of combining data under various scenarios in terms of theoretical properties, including biases, frequentist variances, and mean squared errors. In the first project, data combining for linear models with the classical mixtures of g-priors is investigated. We calculate and compare the posterior estimates and the frequentist properties of the Bayesian estimators from models fitted to individual and combined data. To resolve the newly identified conditional Lindley paradox and to relax constraints on the design matrix, data combining with independent mixtures of g-priors is explored, where a different scale is used for each group of coefficients. We not only perform a posterior variance analysis but also offer a conditional asymptotic analysis of the Bayesian estimators, and we apply these results to compare models for individual and combined data. Furthermore, to reflect how the choice of sample size impacts the estimates in a data combining context, we compare the Zellner-Siow prior with its adjustment based on the effective sample size. Finally, an application to data combining for the 2016 county-level female breast cancer prevalence is presented using data from the Missouri Cancer Registry and Research Center and the Missouri County-level Study. To broaden the scope of the data combining framework, we also study the linear mixed model and the generalized linear mixed model with a conditional autoregressive prior on the random effects.