Bayesian and machine learning models for dependent data with applications to official statistics and survey methodology

Nandy, Saikat

Files

NandySaikatResearch.pdf (6.63 MB)

Authors

Nandy, Saikat

Date

2023

Format

Thesis

Abstract

Small area estimation has recently garnered much interest from both private entities and government agencies as a means of guiding public policy, formulating programs for regional and national planning, allocating government funds, and advocating investments. Small area models comprise area-level models, which relate direct estimators to area-specific covariates, and unit-level models, which model the survey responses directly. Both types of models have their advantages and their own sets of challenges. In this dissertation, we address some of these challenges. Modern complex surveys collect and report data on a variety of topics, and the responses are often not continuous numeric values but can be categorical, count-valued, or entirely non-numeric, such as text and functional responses. Here we concentrate on multiple types of numeric responses. First, we extend the measurement error modeling paradigm to areal non-Gaussian data, which can follow one or multiple classes of distributions (e.g., Gaussian, Poisson, Binomial), when the covariates are measured with error and are spatially correlated. Survey responses are prone to measurement error, which can arise from a host of different sources, and failing to address this error can lead to biased and erroneous inference. Second, traditional area-level models tend to model spatial dependence structures using a nearest-neighbor approach. These models assume that geographically connected areal units exhibit strong correlations and that the strength of this connection grows weaker as we move farther away. We propose a new neighborhood network in which the probability of connection between two areal units is determined not just by their geographical proximity but also by their socio-demographic similarity. We embed this network in a traditional spatial model, e.g., the Conditional Autoregressive (CAR) model.
Third, we address the issue of computational efficiency when working with data from complex surveys, which can include millions of individuals. We introduce Bayesian data sketching algorithms that compress high-dimensional survey data, at both the area level and the unit level, onto a lower-dimensional subspace using random projection matrices. This framework relies on the Bayesian pseudo-likelihood to accommodate the survey design, as well as on the Bayesian hierarchical framework to model various dependence structures. We motivate our proposed frameworks with applications to public-use data from the American Community Survey (ACS).
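
As a rough illustration of the random projection idea mentioned in the abstract, the following Python sketch compresses a simulated regression data set from n rows to m rows with a Gaussian random matrix and compares least-squares fits before and after compression. It is not the dissertation's algorithm: the function name `sketch_data`, the simulated data, and the use of ordinary least squares in place of a Bayesian pseudo-likelihood and hierarchical model are all assumptions made for illustration.

```python
import numpy as np

def sketch_data(X, y, m, rng):
    """Compress n observations to m rows with a Gaussian random projection.

    Hypothetical helper for illustration only.
    """
    n = X.shape[0]
    # Entries are N(0, 1/m), so the expectation of S.T @ S is the identity.
    S = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
    return S @ X, S @ y

# Simulated data (assumed, not from the ACS): n units, p covariates.
rng = np.random.default_rng(0)
n, p, m = 5000, 3, 500
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 0.1 * rng.normal(size=n)

# Compress the data, then fit the same linear model on both versions.
X_s, y_s = sketch_data(X, y, m, rng)
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
beta_sketch = np.linalg.lstsq(X_s, y_s, rcond=None)[0]

print("full data  :", np.round(beta_full, 3))
print("sketched   :", np.round(beta_sketch, 3))
```

In this toy setting the sketched fit uses one tenth of the rows and should recover coefficients close to those from the full data; how much compression is tolerable in practice depends on the model and the survey design.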

URI

https://hdl.handle.net/10355/97065
https://doi.org/10.32469/10355/97065

Degree

Ph. D.

Thesis Department

Statistics (MU)

Collections

2023 MU Dissertations - Freely available online
Statistics electronic theses and dissertations (MU)
