Bayesian and machine learning models for dependent data with applications to official statistics and survey methodology
Format
Thesis
Abstract
Small area estimation has garnered much interest in recent years from both private entities and government agencies as a means of guiding public policy, formulating programs for regional and national planning, allocating government funds, and advocating investments. Small area models comprise area-level models, which relate direct estimators to area-specific covariates, and unit-level models, which model survey responses directly. Both types of models have their advantages and their own sets of challenges. In this dissertation, we address some of these challenges. Modern complex surveys collect and report data on a variety of topics, and often the responses are not continuous but categorical, count-valued, or entirely non-numeric, such as text and functional responses. Here we concentrate on multiple types of numeric responses. First, we extend the measurement error modeling paradigm to areal non-Gaussian data, which may follow one or multiple classes of distributions (e.g., Gaussian, Poisson, Binomial), when the covariates are measured with error and are spatially correlated. Survey responses are prone to measurement error, which can arise from a host of different sources, and failing to address this error can lead to biased and erroneous inference. Second, traditional area-level models tend to model spatial dependence using a nearest-neighbor approach. These models assume that geographically connected areal units exhibit strong correlations and that the strength of this correlation weakens as the distance between units grows. We propose a new neighborhood network in which the probability of connection between two areal units is determined not just by their geographical proximity but also by their socio-demographic similarity. We embed this network in a traditional spatial model, e.g., the Conditional Autoregressive (CAR) model.
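As a rough illustration of the neighborhood-network idea, the sketch below builds link probabilities that decay with both geographic distance and covariate dissimilarity, samples a symmetric adjacency matrix, and forms a proper CAR precision matrix. The abstract does not give the functional form used in the dissertation, so the exponential-decay probability, the function names, and the parameters `alpha`, `beta`, `rho`, and `tau` are all assumptions made for illustration.

```python
import numpy as np

def connection_probs(coords, covariates, alpha=1.0, beta=1.0):
    """Hypothetical link probabilities: decay with geographic distance
    and with socio-demographic (covariate) dissimilarity."""
    geo = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    soc = np.linalg.norm(covariates[:, None, :] - covariates[None, :, :], axis=-1)
    p = np.exp(-alpha * geo - beta * soc)
    np.fill_diagonal(p, 0.0)  # no self-links
    return p

def sample_adjacency(p, rng):
    """Draw one symmetric 0/1 adjacency matrix from the link probabilities."""
    upper = np.triu(rng.random(p.shape) < p, k=1)
    return (upper | upper.T).astype(float)

def car_precision(W, rho=0.9, tau=1.0):
    """Proper CAR precision matrix tau * (D - rho * W), with D = diag(row sums).
    Positive definite when |rho| < 1 and every unit has at least one neighbor."""
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)

# Toy example: 5 areal units with 2-D centroids and 3 covariates (simulated).
rng = np.random.default_rng(0)
coords = rng.random((5, 2))
covariates = rng.normal(size=(5, 3))
W = sample_adjacency(connection_probs(coords, covariates), rng)
Q = car_precision(W)
```

A sampled network can leave a unit with no neighbors, in which case this precision matrix is singular; a full model would need to handle that case, for example by constraining the network or by treating the adjacency as a latent quantity inside the hierarchy.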
Third, we address the issue of computational efficiency when working with data from complex surveys, which can include millions of individuals. We introduce Bayesian data sketching algorithms that compress high-dimensional survey data, at both the area level and the unit level, onto a lower-dimensional subspace using random projection matrices. This framework relies on the Bayesian pseudo-likelihood to accommodate the survey design and on the Bayesian hierarchical framework to model various dependence structures. We illustrate the proposed frameworks with applications to public-use data from the American Community Survey (ACS).
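To make the data sketching step concrete, the following minimal sketch compresses n unit-level records to m rows with a Gaussian random projection matrix and then computes a conjugate Gaussian posterior mean on the compressed data. It is an illustration under simplifying assumptions, not the dissertation's algorithm: the names `sketch` and `posterior_mean` are hypothetical, the model is a plain Gaussian linear regression with known noise variance, and the survey weights (pseudo-likelihood) and hierarchical dependence structures are omitted.

```python
import numpy as np

def sketch(y, X, m, rng):
    """Compress n observations to m rows with a random projection matrix
    Phi whose entries are i.i.d. N(0, 1/m)."""
    n = X.shape[0]
    Phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
    return Phi @ y, Phi @ X

def posterior_mean(y, X, prior_prec=1.0, noise_var=1.0):
    """Posterior mean of beta under y ~ N(X beta, noise_var * I)
    and beta ~ N(0, I / prior_prec)."""
    p = X.shape[1]
    A = X.T @ X / noise_var + prior_prec * np.eye(p)
    return np.linalg.solve(A, X.T @ y / noise_var)

# Toy example with simulated data: 2000 units compressed to 200 rows.
rng = np.random.default_rng(0)
n, p, m = 2000, 3, 200
beta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, p))
y = X @ beta_true + 0.1 * rng.normal(size=n)

y_s, X_s = sketch(y, X, m, rng)
beta_full = posterior_mean(y, X)        # uses all n rows
beta_sketch = posterior_mean(y_s, X_s)  # uses only the m sketched rows
```

After sketching, the linear algebra scales with m rather than n, which is where the computational saving comes from; the price is some additional posterior uncertainty, since the projection inflates the effective noise.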
Degree
Ph. D.
