Two sample inference for high dimensional mean with application to gene expression data
[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] Motivated by gene expression analysis in the biological sciences, we consider the two-sample problem in which the number of variables is much larger than the sample size. Owing to technological advances, high dimensional data have been encountered increasingly often in applications of statistics over the past few decades. Classical inference procedures from multivariate statistical analysis, such as Hotelling's T-squared test, cannot be applied directly to such high dimensional data sets. To tackle the challenges that arise in high dimensions, several testing procedures have recently been proposed in the literature. We briefly review the sum-of-squares type tests and maximum type tests as powerful alternatives to Hotelling's T-squared test in the high dimensional setting. We then provide an extension that aims to combine the strength of the maximum type test with that of the sum-of-squares type test. Furthermore, we propose a bootstrap calibration for the maximum type test that allows the data vectors to be temporally dependent. Simulation studies are conducted to compare the finite sample performance of these tests. We apply these methods to test the significance of sets of genes in a real data example.
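As a rough illustration of the two families of statistics named in the abstract, the following minimal Python sketch computes a sum-of-squares type statistic and a maximum type statistic from per-coordinate two-sample t-statistics. It is not the procedure developed in the thesis: the function names `two_sample_stats` and `permutation_pvalues` are hypothetical, the pooled-variance standardization is one common choice among several, and a plain permutation calibration (which assumes independent observations) stands in for the bootstrap calibration for temporally dependent data that the abstract describes.

```python
import math
import random
from statistics import mean, variance


def two_sample_stats(x, y):
    """Return (sum-of-squares statistic, maximum statistic).

    x and y are lists of observations; each observation is a list of p
    coordinates (e.g. expression levels of p genes). Both statistics are
    built from per-coordinate pooled-variance t-statistics (an assumption
    made for this sketch).
    """
    n1, n2 = len(x), len(y)
    t_values = []
    # zip(*x) iterates over coordinates (columns) instead of observations.
    for col_x, col_y in zip(zip(*x), zip(*y)):
        pooled_var = ((n1 - 1) * variance(col_x)
                      + (n2 - 1) * variance(col_y)) / (n1 + n2 - 2)
        se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
        t_values.append((mean(col_x) - mean(col_y)) / se)
    sum_of_squares = sum(t * t for t in t_values)  # pools many weak signals
    maximum = max(abs(t) for t in t_values)        # targets a few strong signals
    return sum_of_squares, maximum


def permutation_pvalues(x, y, n_perm=200, seed=0):
    """Permutation p-values for both statistics.

    Stand-in calibration for illustration only: it assumes exchangeable
    (independent) observations, unlike the bootstrap in the abstract.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n1 = len(x)
    observed = two_sample_stats(x, y)
    exceed = [0, 0]
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        permuted = two_sample_stats(pooled[:n1], pooled[n1:])
        for k in range(2):
            exceed[k] += permuted[k] >= observed[k]
    # Add-one correction keeps the p-values strictly positive.
    return tuple((e + 1) / (n_perm + 1) for e in exceed)
```

Calling `permutation_pvalues(x, y)` on two groups of expression vectors returns one p-value per statistic. In general, a sum-of-squares type statistic tends to respond to many small mean differences, while a maximum type statistic tends to respond to a few large ones, which is the contrast that motivates combining the two.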
Access to files is limited to the University of Missouri--Columbia.