New consensus-based algorithms for quality assessment in protein structure prediction
Two essential tasks in protein tertiary structure prediction are assessing the quality of predicted models and selecting the best-quality model from a given set of model structures. Solving these problems is fundamental to understanding the nature of proteins and to advancing protein research. In this thesis, we present efficient algorithms that tackle both problems. The algorithms build on the well-known consensus-based idea, which has been consistently successful since CASP6. For quality assessment, we develop several new methods based on removing redundant structures and outliers. These algorithms aim to find suitable reference sets for computing the consensus score, thereby improving on existing algorithms. The methods can use any suitable pairwise similarity measure between two models, such as GDT-TS or Q score. We also develop a very efficient method for computing Q score on large problems. In our experiments on the CASP8 dataset, the algorithms outperform existing state-of-the-art methods, including the top-1 method in the QA category of CASP8. For selecting the best model structure, our new methods outperform other best-performing scoring functions by up to 7.6%, measured by the actual GDT-TS between the top-1 selected model and the native structure. The selection results obtained using Q score are slightly worse than those obtained using GDT-TS, but the pairwise Q score method is in general about 15 times faster than the pairwise GDT-TS method.
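The consensus idea described above can be illustrated with a minimal Python sketch. It is not the thesis's actual algorithm: the function names, the two cutoff parameters, and the greedy filtering rule are all assumptions made for illustration. The sketch takes a precomputed symmetric matrix of pairwise similarities (for example GDT-TS or Q score scaled to the range 0 to 1), builds a reference set by dropping outliers and near-duplicate models, and scores every model by its mean similarity to that reference set.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def consensus_scores(sim: Matrix, reference: Sequence[int]) -> List[float]:
    """Score each model by its mean similarity to the reference models.

    A model is never compared with itself, so a reference model is scored
    against the remaining reference models only.
    """
    scores = []
    for i in range(len(sim)):
        refs = [j for j in reference if j != i]
        scores.append(sum(sim[i][j] for j in refs) / len(refs) if refs else 0.0)
    return scores


def build_reference_set(
    sim: Matrix,
    redundancy_cutoff: float = 0.95,  # hypothetical threshold
    outlier_cutoff: float = 0.30,     # hypothetical threshold
) -> List[int]:
    """Greedy filter (an assumed rule, not the thesis's): skip outliers and
    models nearly identical to one that has already been kept."""
    everyone = list(range(len(sim)))
    naive = consensus_scores(sim, everyone)
    kept: List[int] = []
    for i in everyone:
        if naive[i] < outlier_cutoff:
            continue  # outlier: low agreement with the rest of the pool
        if any(sim[i][j] >= redundancy_cutoff for j in kept):
            continue  # redundant: near-duplicate of a kept model
        kept.append(i)
    return kept


def select_best(sim: Matrix, reference: Sequence[int]) -> int:
    """Return the index of the model with the highest consensus score."""
    scores = consensus_scores(sim, reference)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy pool: models 0-2 are near-duplicates, 3 is distinct, 4 is an outlier.
    sim = [
        [1.00, 0.98, 0.98, 0.60, 0.10],
        [0.98, 1.00, 0.98, 0.60, 0.10],
        [0.98, 0.98, 1.00, 0.60, 0.10],
        [0.60, 0.60, 0.60, 1.00, 0.20],
        [0.10, 0.10, 0.10, 0.20, 1.00],
    ]
    reference = build_reference_set(sim)
    print("reference set:", reference)
    print("consensus scores:", consensus_scores(sim, reference))
    print("selected model:", select_best(sim, reference))
```

Because the scoring function only reads the similarity matrix, the same code applies whether the entries come from GDT-TS or from Q score. The choice of measure affects only the cost of filling the matrix, which is where the abstract reports the roughly 15-fold speed difference.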