Template-based methods for protein model quality assessment
Abstract
[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] Protein structure prediction is an important open problem in the field of bioinformatics. One of the difficulties in solving this problem is developing an effective approach to evaluate the quality of the generated models. Various quality assessment (QA) methods have been developed and tested in the CASP (Critical Assessment of Techniques for Protein Structure Prediction) competition. However, most of them are either not accurate enough to be useful or not robust enough across different kinds of model sets. In pursuit of a balance between high accuracy and robustness, two QA methods have been developed: MUfoldQA_S and MUfoldQA_C. MUfoldQA_S is a quasi-single-model QA method. It assesses the quality of a predicted model based on the structures of proteins with similar sequences. These similar proteins, called templates, are found in the Protein Data Bank (PDB) by sequence search. We calculate the pairwise GDT-TS between the input model and each template. Then, for each C-alpha position of the model, a score is calculated as the weighted average of the template GDT-TS values, with weights given by a BLOSUM-based heuristic. Finally, the model score is the average of all C-alpha position scores. The other method, MUfoldQA_C, is a two-stage multi-model QA method that combines the idea behind MUfoldQA_S with consensus. Stage 1 evaluates the quality of each C-alpha position of the reference models based on their similarity to the templates. Stage 2 evaluates the quality of the given predicted model based on its similarity to the reference models and on the quality of those reference models. Both methods have been tested on the CASP 11 dataset. MUfoldQA_S performs significantly better than ProQ2, and MUfoldQA_C also outperforms the naive consensus method.
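The MUfoldQA_S scoring step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the input layout (one GDT-TS value per template and one row of weights per C-alpha position), and the choice to score a position with no template coverage as 0.0 are all assumptions made for illustration. The BLOSUM-based heuristic is not specified in the abstract, so the weights are taken as given inputs.

```python
from typing import List, Sequence


def position_scores(template_gdt: Sequence[float],
                    weights: Sequence[Sequence[float]]) -> List[float]:
    """Per-position score: weighted average of the template GDT-TS values.

    template_gdt[t] -- GDT-TS between the input model and template t.
    weights[i][t]   -- non-negative weight of template t at C-alpha position i
                       (hypothetical stand-in for the BLOSUM-based heuristic).
    """
    scores = []
    for row in weights:
        if len(row) != len(template_gdt):
            raise ValueError("each weight row needs one entry per template")
        total = sum(row)
        if total > 0:
            weighted = sum(w * g for w, g in zip(row, template_gdt))
            scores.append(weighted / total)
        else:
            # Assumption: a position that no template covers scores 0.0.
            scores.append(0.0)
    return scores


def model_score(template_gdt: Sequence[float],
                weights: Sequence[Sequence[float]]) -> float:
    """Model score: plain average of all C-alpha position scores."""
    scores = position_scores(template_gdt, weights)
    return sum(scores) / len(scores) if scores else 0.0


# Toy input: two templates, three C-alpha positions (made-up numbers).
gdt = [0.8, 0.4]
w = [[1.0, 1.0],   # both templates weighted equally
     [3.0, 1.0],   # first template weighted more heavily
     [0.0, 0.0]]   # position not covered by any template
print(position_scores(gdt, w))
print(model_score(gdt, w))
```

Taking the weights as an input keeps the sketch independent of how the heuristic is actually computed, which the abstract does not describe.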
Degree
M.S.
Thesis Department
Rights
Access to files is limited to the University of Missouri--Columbia.