Deep learning methods for protein prediction problem
[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] Computational protein structure prediction is very important for many applications in bioinformatics. Many prediction methods have been developed, including Modeller, HHpred, I-TASSER, Robetta, and MUFOLD. In the process of predicting protein structures, it is essential to accurately assess the quality of the generated models. Consensus quality assessment (QA) methods, such as Pcons-net and MULTICOM-refine, which are based on structure similarity, have performed well on QA tasks. The drawback of consensus QA methods is that they require a pool of diverse models to work well, and such a pool is not always available. More importantly, they cannot evaluate the quality of a single protein model, which is a very common task in protein structure prediction and other applications. Although many single-model quality assessment methods, such as ProQ2, MQAPmulti, OPUS-CA, DOPE, DFIRE, and RW, have been developed to address that problem, their accuracy is not good enough for most real applications. In this dissertation, two methods are proposed for assessing the quality of protein structures, both based on the idea of combining the Cα atom distance matrix with deep learning. First, a novel algorithm based on deep learning techniques, called DL-Pro, is proposed. From training examples of distance matrices corresponding to good and bad models, DL-Pro learns a stacked autoencoder network as a classifier. In experiments on selected targets from the Critical Assessment of Structure Prediction (CASP) competition, DL-Pro obtained promising results, outperforming state-of-the-art energy/scoring functions, including OPUS-CA, DOPE, DFIRE, and RW. Second, a new method, DeepCon-QA, is developed to predict the quality of a single protein model. Based on the idea of using a protein vector representation and the distance matrix, DeepCon-QA achieved performance comparable to the best state-of-the-art QA method in our experiments.
It also takes advantage of the strength of deep convolutional neural networks to “learn” and “understand” the input data in order to predict the output precisely. In addition, this dissertation proposes several new methods for the loop modeling problem. Five new loop modeling methods based on machine learning techniques, called NearLooper, ConLooper, ResLooper, HyLooper1, and HyLooper2, are proposed. NearLooper is based on the nearest neighbor technique; ConLooper applies deep convolutional neural networks to predict the Cα atom distance matrix, an orientation-independent representation of protein structure; ResLooper uses residual neural networks instead of deep convolutional neural networks; HyLooper1 combines the results of NearLooper and ConLooper, while HyLooper2 combines NearLooper and ResLooper. Three commonly used benchmarks for loop modeling are used to compare the performance of these methods with existing state-of-the-art methods. The experimental results are promising: our best method improves on existing state-of-the-art methods by 28% and 54% in average RMSD on two datasets while being comparable on the third.
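The sense in which a Cα distance matrix is orientation-independent can be illustrated with a minimal sketch. It is not taken from the dissertation: the function names `ca_distance_matrix` and `rotate_z_and_shift` and the toy coordinates are hypothetical, chosen only for illustration. The sketch computes pairwise Cα distances and checks that they are unchanged when the structure is rotated and translated.

```python
import math

def ca_distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha atoms.

    coords is a list of (x, y, z) tuples, one per residue.
    Returns an n x n nested list where entry [i][j] is the distance
    between residues i and j.
    """
    return [[math.dist(a, b) for b in coords] for a in coords]

def rotate_z_and_shift(coords, angle, shift):
    """Rigid-body motion: rotate about the z axis by angle (radians), then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]

# Toy C-alpha coordinates in angstroms (made-up values, not a real protein).
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (5.5, 3.4, 0.0),
    (9.0, 4.1, 1.2),
    (10.2, 7.6, 2.0),
]

original = ca_distance_matrix(coords)
moved = ca_distance_matrix(rotate_z_and_shift(coords, 1.1, (5.0, -2.0, 1.0)))

# The largest entry-wise difference should be zero up to floating-point error.
max_diff = max(
    abs(original[i][j] - moved[i][j])
    for i in range(len(coords))
    for j in range(len(coords))
)
print(f"max difference after rigid motion: {max_diff:.2e}")
```

Because the matrix depends only on inter-atomic distances, two copies of the same structure in different poses yield the same input, so a network operating on it does not need the structures to be superimposed first.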
Access to files is limited to the University of Missouri--Columbia.