
dc.contributor.advisor: Sun, Jianguo (Tony) [eng]
dc.contributor.author: Xie, Jing [eng]
dc.date.issued: 2021 [eng]
dc.date.submitted: 2021 Spring [eng]
dc.description.abstract: The advent of next-generation sequencing (NGS) technology has facilitated the recent development of RNA sequencing (RNA-seq), a novel method for mapping and quantifying transcriptomes. With RNA-seq, one can measure features such as gene expression, allelic expression, and intragenic expression in the form of read counts. These features have provided new opportunities to study and interpret the molecular intricacy and variations that are potentially associated with the occurrence of specific diseases. There has therefore been an emerging interest in statistical methods for analyzing RNA-seq data from different perspectives. In this dissertation, we focus on three important challenges: identifying allele specific expression (ASE) at the gene level and the single nucleotide polymorphism (SNP) level simultaneously; detecting ASE regions in a control group and regions of ASE alteration in a case group simultaneously; and detecting genes whose expression levels differ significantly across treatment groups (DE genes).
In Chapter 2, we propose a method to test, separately and simultaneously, for ASE of a gene as a whole and for variation in ASE across exons within a gene. A generalized linear mixed model is employed to incorporate variations due to genes, SNPs, and biological replicates. To improve the reliability of statistical inferences, we assign priors to each effect in the model so that information is shared across genes in the entire genome. We use the Bayes factor to test the hypotheses of ASE for each gene and of variation across SNPs within a gene. We compare the proposed method to competing approaches through simulation studies that mimic the real datasets. The proposed method exhibits improved control of the false discovery rate (FDR) and improved power over existing methods when SNP variation and biological variation are present. In addition, the proposed method maintains low computational requirements that allow for whole-genome analysis. As an example of real data analysis, we apply the proposed method to four tissue types in a bovine study to detect ASE genes de novo in the bovine genome, and uncover intriguing predictions of regulatory ASEs across gene exons and across tissue types.
In Chapter 3, we propose a new and powerful algorithm for detecting ASE regions in a healthy control group and regions of ASE alteration in a disease/case group compared to the control. Specifically, we develop a bivariate Bayesian hidden Markov model (HMM) and an expectation-maximization inferential procedure. The proposed algorithm gains advantages over existing methods by addressing their limitations and by recognizing the complexity of biology. First, the bivariate Bayesian HMM detects ASEs for different mRNA isoforms due to alternative splicing and RNA variants. Second, it models spatial correlations among genomic observations, unlike existing methods, which often assume independence. Finally, the bivariate HMM draws inferences simultaneously for control and case samples, which maximizes the use of the information available in the data. Real data analysis and simulation studies that mimic real data sets illustrate the improved performance and practical utility of the proposed method.
In Chapter 4, we present a new method to detect DE genes in any sequencing experiment. The read counts for the different treatment groups are modelled by two negative binomial distributions, which may have different means but share the same dispersion parameter. We propose a mixture prior model for the dispersion parameters, consisting of a point mass at zero and a lognormal distribution. The mixture model allows shrinkage across genes within each of the two mixture components, and thus prevents the overcorrection that results from shrinkage across all genes. The simulation studies demonstrate that the proposed method yields better dispersion estimation and FDR control, and higher accuracy in gene ranking. In addition, the proposed method is robust to misspecification of the bimodal distribution for the dispersion parameters, and is thus flexible and easily generalized. [eng]
dc.description.bibref: Includes bibliographical references. [eng]
dc.format.extent: xiii, 112 pages : illustrations (color) [eng]
dc.identifier.uri: https://hdl.handle.net/10355/90104
dc.identifier.uri: https://doi.org/10.32469/10355/90104 [eng]
dc.language: English [eng]
dc.publisher: University of Missouri--Columbia [eng]
dc.title: Statistical methods to detect allele specific expression, alterations of allele specific expression and differential expression [eng]
dc.type: Thesis [eng]
thesis.degree.discipline: Statistics (MU) [eng]
thesis.degree.level: Doctoral [eng]
thesis.degree.name: Ph. D. [eng]


