Large-scale soybean genome-wide variation workflow and association analysis using deep learning
Abstract
[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] With advances in next-generation sequencing (NGS) technology and a significant reduction in sequencing costs, it is now possible to sequence large collections of crop germplasm to detect genome-scale genetic variation and to apply that knowledge to trait improvement. To enable efficient analysis of genomic variation in large-scale NGS resequencing data, we developed a systematic solution that combines a high-performance computing environment, cloud data storage resources, and graphics processing unit (GPU) computing with a cutting-edge deep learning approach. The solution comprises an integrated and optimized variant calling workflow called 'PGen', a quantitative phenotype prediction model based on a convolutional neural network, and an algorithm for genome-wide association analysis based on a deep convolutional neural network model. We reviewed and compared statistical and deep learning methods for genomic selection and genome-wide association, presented our work on a sequencing dataset of thousands of soybean lines, summarized ongoing progress in large-scale genome-association studies, and discussed future work and development.
Degree
Ph. D.
Thesis Department
Rights
Access to files is limited to the University of Missouri--Columbia.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.