Predicting protein model quality and quaternary structures

Format

Thesis

Abstract

[EMBARGOED UNTIL 12/01/2025] Protein structure prediction is a critical challenge in bioinformatics, and one of its difficulties is evaluating model quality. In this dissertation, we present several quality assessment (QA) methods that we have developed over the years: MUfoldQA_S, MUfoldQA_C, MUfoldQA_M, MUfoldQA_T, and MUfoldQA_G. These methods achieved top rankings in various stages of the CASP competitions, demonstrating their effectiveness. MUfoldQA_S is a quasi-single-model QA method that assesses model quality based on known protein structures with similar sequences. The algorithm can be applied directly to protein fragments, without building a full structural model. MUfoldQA_C extends this approach to a multi-model framework. In CASP 12, MUfoldQA_S ranked No. 1 in stage 1 average GDT-TS difference. MUfoldQA_C ranked No. 1 in top 1 model GDT-TS loss and No. 2 in average GDT-TS difference for both stage 1 and stage 2. MUfoldQA_M and MUfoldQA_T both use a set of reference models to score each candidate model, but they differ in how the reference models are selected and in how the scores of a candidate model are calculated and weighted. In CASP 13, MUfoldQA_M and MUfoldQA_T together achieved two No. 1, five No. 2, and four No. 3 rankings. MUfoldQA_G is based on two of our new algorithms, MUfoldQA_Gp and MUfoldQA_Gr, and combines their information in a way that achieves good performance in terms of both Pearson correlation and average GDT-TS difference. In CASP 14, MUfoldQA_G ranked No. 1 in terms of Pearson correlation and No. 2 in terms of average GDT-TS difference. Additionally, we designed and implemented the public PSICA web server, which, for a given predicted 3D model, generates 1) a predicted global GDT-TS score; 2) an interactive comparison between the model and other known protein structures; 3) a visualization of the predicted local quality of the model; and 4) a JSmol rendering of the model.
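The three evaluation measures named above (Pearson correlation, average GDT-TS difference, and top 1 model GDT-TS loss) can be illustrated with a minimal Python sketch. It assumes the commonly used definitions: mean absolute difference between predicted and true GDT-TS, and the gap between the best true GDT-TS in the pool and the true GDT-TS of the model the QA method ranks first. The function names and the sample scores are hypothetical and are not taken from the dissertation or from CASP data.

```python
from math import sqrt


def pearson(predicted, true):
    """Pearson correlation between predicted and true quality scores."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_t = sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, true))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov / sqrt(var_p * var_t)


def average_gdt_ts_difference(predicted, true):
    """Mean absolute difference between predicted and true GDT-TS (assumed definition)."""
    return sum(abs(p - t) for p, t in zip(predicted, true)) / len(true)


def top1_gdt_ts_loss(predicted, true):
    """True GDT-TS of the best model minus that of the model ranked first (assumed definition)."""
    picked = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[picked]


# Hypothetical scores for four candidate models of one target.
predicted = [0.62, 0.48, 0.71, 0.30]
true = [0.60, 0.55, 0.66, 0.35]

print(pearson(predicted, true))
print(average_gdt_ts_difference(predicted, true))
print(top1_gdt_ts_loss(predicted, true))
```

A lower average difference and a lower top 1 loss are better, while a higher Pearson correlation is better, which is why a method can rank differently under each measure.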
Our work on quaternary structure prediction focuses mainly on cyclic protein oligomers, which play vital roles in many biological processes and have captured the attention of protein designers. An important step in protein design is predicting the quaternary structure for a given primary sequence of a protein complex. In recent years, deep-learning-based methods for protein structure prediction have become increasingly accurate. Unfortunately, these methods are usually slow and require a large amount of memory, which becomes especially problematic for the reverse problem: finding a sequence that folds into a required structure. In this work, we present a new method, AF2Ring, that predicts the quaternary structures of cyclic protein assemblies with repeating subunits accurately and with moderate memory requirements. AF2Ring runs AlphaFold2 with a reduced number of chains as input and then uses the partial results to construct the full quaternary structure. Our experimental results show that AF2Ring significantly reduces memory consumption and computation time while achieving prediction accuracy comparable to running AlphaFold2 directly on the full protein sequences, where that is possible. The method facilitates protein design by enabling large-scale validation of different designs and by allowing longer sequences.
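The abstract does not state how AF2Ring assembles the full ring from the partial prediction. As a rough illustration of the general idea only, the Python sketch below assumes that a rigid transform (rotation plus translation) mapping one subunit onto its neighbor has already been extracted from a reduced-chain prediction, and it applies that transform repeatedly to place the remaining subunits. All names, the 6-fold symmetry, and the coordinates are hypothetical; this is not the dissertation's implementation.

```python
from math import cos, sin, pi, dist


def apply_transform(rotation, translation, point):
    """Return rotation * point + translation for a 3D point."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def build_ring(subunit, rotation, translation, n_subunits):
    """Place n_subunits copies by applying the neighbor transform repeatedly."""
    ring = [list(subunit)]
    for _ in range(n_subunits - 1):
        ring.append([apply_transform(rotation, translation, p) for p in ring[-1]])
    return ring


def closure_error(ring, rotation, translation):
    """Largest distance between the first subunit and the image of the last one."""
    wrapped = [apply_transform(rotation, translation, p) for p in ring[-1]]
    return max(dist(a, b) for a, b in zip(ring[0], wrapped))


# Hypothetical neighbor transform: a 6-fold rotation about an axis
# parallel to z that passes through the point (10, 0, 0).
n = 6
angle = 2 * pi / n
rotation = [
    [cos(angle), -sin(angle), 0.0],
    [sin(angle), cos(angle), 0.0],
    [0.0, 0.0, 1.0],
]
center = (10.0, 0.0, 0.0)
rotated_center = apply_transform(rotation, (0.0, 0.0, 0.0), center)
# translation = center - R * center keeps the rotation axis through `center`.
translation = tuple(c - r for c, r in zip(center, rotated_center))

# Hypothetical coordinates standing in for one predicted subunit.
subunit = [(15.0, 0.0, 0.0), (16.0, 1.0, 2.0), (14.5, -1.0, 1.0)]

ring = build_ring(subunit, rotation, translation, n)
print(len(ring), closure_error(ring, rotation, translation))
```

In this sketch the closure error serves as a sanity check: for an exact cyclic symmetry, applying the neighbor transform once more to the last subunit should land back on the first, so a large value would suggest the estimated transform or the assumed subunit count is off.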

Degree

Ph. D.
