Computer Science electronic theses and dissertations (MU)

The items in this collection are the theses and dissertations written by students of the Department of Computer Science. Some items may be viewed only by members of the University of Missouri System and/or University of Missouri-Columbia.

Recent Submissions

Now showing 1 - 5 of 353
  • Item
    Predicting protein model quality and quaternary structures
    (University of Missouri--Columbia, 2024) Wang, Wenbo; Shang, Yi
    [EMBARGOED UNTIL 12/01/2025] Protein structure prediction is a critical challenge in bioinformatics. One of the difficulties is evaluating model quality. In this dissertation, we present several quality assessment (QA) methods we have developed over the years, including MUfoldQA_S, MUfoldQA_C, MUfoldQA_M, MUfoldQA_T, and MUfoldQA_G. These methods achieved top rankings in various stages of the CASP competitions, demonstrating their effectiveness. MUfoldQA_S is a quasi-single-model QA method that assesses model quality based on known protein structures with similar sequences. This algorithm can be applied directly to protein fragments without building a full structural model. MUfoldQA_C extends this approach to a multi-model framework. During CASP 12, MUfoldQA_S ranked No. 1 in stage 1 average GDT-TS difference. MUfoldQA_C ranked No. 1 in top-1 model GDT-TS loss and No. 2 in average GDT-TS difference for both stage 1 and stage 2. MUfoldQA_M and MUfoldQA_T both use a set of reference models to score each candidate model, but they differ in how the reference models are selected and in how the scores of a candidate model are calculated and weighted. During CASP 13, MUfoldQA_M and MUfoldQA_T together achieved two No. 1, five No. 2, and four No. 3 rankings. MUfoldQA_G is based on two of our new algorithms, MUfoldQA_Gp and MUfoldQA_Gr. It combines their information in a way that achieves good performance in terms of both Pearson correlation and average GDT-TS difference. In CASP 14, MUfoldQA_G ranked No. 1 in terms of Pearson correlation and No. 2 in terms of average GDT-TS difference. Additionally, we designed and implemented the public PSICA web server, which, for a given predicted 3D model, generates 1) a predicted global GDT-TS score; 2) an interactive comparison between the model and other known protein structures; 3) a visualization of the predicted local quality of the model; and 4) a JSmol rendering of the model.
Our work on quaternary structure prediction mainly focuses on cyclic protein oligomers, which play vital roles in many biological processes and have captured the attention of protein designers. An important step of protein design is to predict the quaternary structure for a given primary sequence of a protein complex. In recent years, results from deep-learning-based methods for protein structure prediction have become increasingly accurate. Unfortunately, these methods are usually slow and require a large amount of memory, which can become extremely problematic for researchers, especially when facing the reverse problem: finding a sequence that folds to a required structure. In this work, we present a new method, AF2Ring, that achieves accurate prediction with moderate memory requirements for quaternary structures of cyclic protein assemblies with repeating subunits. AF2Ring runs AlphaFold2 with a reduced number of chains as input and then uses the partial results to construct the full quaternary structure. Our experimental results show that AF2Ring significantly reduces memory consumption and computation time while achieving prediction accuracy comparable to running AlphaFold2 directly, when possible, on full protein sequences. This method facilitates protein design by enabling large-scale validation of different designs and allowing longer sequence lengths.
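For readers unfamiliar with the measures cited in the abstract above, the following Python sketch shows how a GDT-TS-style score, the average GDT-TS difference, and the top-1 GDT-TS loss are commonly computed. It is an illustration only and is not taken from the dissertation: the function names are hypothetical, and `gdt_ts` assumes the per-residue C-alpha distances come from one fixed superposition, whereas the standard GDT-TS procedure searches over many superpositions.

```python
import numpy as np


def gdt_ts(distances):
    """GDT-TS-style score in [0, 100] from per-residue distances (angstroms).

    Assumption: distances come from a single, already chosen superposition
    of the model onto the native structure.
    """
    d = np.asarray(distances, dtype=float)
    # Fraction of residues within each of the four standard cutoffs.
    fractions = [(d <= cutoff).mean() for cutoff in (1.0, 2.0, 4.0, 8.0)]
    return 100.0 * float(np.mean(fractions))


def average_difference(predicted, actual):
    """Mean absolute difference between predicted and true global scores."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(predicted - actual)))


def top1_loss(predicted, actual):
    """True score of the best model minus true score of the model ranked first."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(actual.max() - actual[np.argmax(predicted)])
```

Lower values are better for both the average difference and the top-1 loss; a loss of zero means the QA method ranked the truly best model first.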
  • Item
    Machine learning prediction of interchain contacts and structural model quality for protein complexes
    (University of Missouri--Columbia, 2024) Roy, Raj Shekhor; Cheng, Jianlin
    [EMBARGOED UNTIL 12/01/2025] Proteins are the building blocks of life. They take part in various biochemical reactions that sustain life. The role a protein plays in a reaction depends predominantly on its structure. As a result, determining protein structures could unravel the mystery of life, and it has become a field of principal interest to researchers worldwide. Traditional approaches to determining protein structure are time-consuming and expensive, so computational methods must be employed to speed up the process. In the last decade, deep learning has established the impact of computational methods. Deep learning is now used extensively to predict protein interactions and structures, to assess model quality, and to determine cryo-EM electron density maps (EDMs) from 2D micrographs. This dissertation presents three contributions. First, DRCON introduces a deep dilated residual network that predicts the interchain contact map of homodimers using features derived from sequence information only. Second, Multicom_qa uses a hybrid approach that combines pairwise structural similarity with interface contacts predicted by deep learning tools to estimate the accuracy of protein complex models. In CASP15, it ranked top in estimating the global structure accuracy of assembly models. Third, we present "Benchmarking Techniques for Electron Density Map Reconstruction from Imaging Data: A Comparative Study". Cryo-EM has become very popular for the study of large protein molecules, but EDM reconstruction from imaging data has still not received much attention. Hence we provide a comprehensive overview of current methodologies and their effectiveness in 3D cryo-EM EDM reconstruction from images, offering insights for future research and development.
  • Item
    Improving protein structure prediction with multicom
    (University of Missouri--Columbia, 2024) Liu, Jian; Cheng, Jianlin
    [EMBARGOED UNTIL 12/01/2025] Proteins play essential roles in biological processes and understanding their three-dimensional (3D) structures is vital for revealing their complex functions. This study addresses the challenge of accurately predicting protein tertiary and quaternary structures from amino acid sequences, which offers an efficient and cost-effective alternative to experimental methods such as X-ray crystallography and cryo-electron microscopy. Despite advancements in deep learning, including methods like AlphaFold2, challenges remain in predicting protein tertiary and quaternary structures and assessing model quality due to limited evolutionary data and complex inter-chain interactions in multimers. This research aims to enhance protein structure prediction by integrating template-based and template-free methods, improving AlphaFold2 for tertiary structure predictions, and enhancing AlphaFold-Multimer for complex quaternary structures. Other key contributions include exploring the use of AlphaFold3 to predict protein complex stoichiometry and developing GATE, a novel quality assessment tool based on graph transformers and pairwise similarity graphs, for more precise model selection from large pools of predicted tertiary structures and quaternary structures. Results from the recent Critical Assessment of Structure Prediction (CASP) experiments (e.g., CASP14, CASP15 and CASP16) demonstrate the impact of these advancements, and the study concludes with potential future directions for protein structure prediction and quality assessment improvements.
  • Item
    New methods to improve detection and classification on waterfowl and tree in aerial images
    (University of Missouri--Columbia, 2024) Zhang, Yang; Shang, Yi
    Monitoring waterfowl populations and their classification is vital for wetland conservation, and deep learning has recently demonstrated significant potential in detecting waterfowl from aerial images. This dissertation introduces several innovative methods to enhance waterfowl detection and classification. First, the Self-Rectification Network (SRN) was developed to improve detection performance in complex backgrounds by reducing false positives. SRN employs a novel sampling mechanism that ensures a balanced ratio of positive and negative proposals in each training batch and prioritizes challenging cases, achieving a 10.1 percent improvement on hard-to-detect images. Second, a Size-Related Non-Maximum Suppression (NMS) technique was introduced to eliminate false detections of objects with abnormal sizes, leveraging the consistent pixel dimensions of waterfowl in aerial imagery. This approach improved detection accuracy by approximately 3 percent on specific datasets. Finally, to address the labeling burden, a class-balanced sampling strategy was devised, enabling effective training with 50 percent labeled data and significantly reducing the time and effort required for annotation. Tree canopy classification, another focus of this dissertation, holds great potential for monitoring environmental diversity and stability. While trees can be categorized by family, genus, and species, existing classifiers often overlook the hierarchical relationships among these categories. To address this gap, ConsistentNet was developed to ensure consistency in predictions across these hierarchical levels. A novel consistency loss function supervises these relationships during training, resulting in superior performance on datasets with class-subclass structures. ConsistentNet demonstrated a 5 percent improvement in accuracy over baseline models on tree canopy classification datasets and a 2 percent improvement on the CIFAR-100 dataset.
Together, these advancements underscore the promise of deep learning for ecological monitoring and classification tasks, offering robust solutions for challenging scenarios and hierarchical datasets.
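As a rough illustration of the size-based idea mentioned in the abstract above, the Python sketch below combines standard greedy non-maximum suppression with a size filter. It is not the dissertation's algorithm: the function name, the median-area reference, and the `size_tol` parameter are all assumptions made for this example, chosen only to show how consistent object size in aerial imagery could be used to discard implausible detections.

```python
import numpy as np


def size_filtered_nms(boxes, scores, iou_thr=0.5, size_tol=2.0):
    """Greedy NMS that first drops boxes with implausible areas.

    boxes: sequence of [x1, y1, x2, y2]; scores: one confidence per box.
    Assumption: a box is plausible if its area is within a factor of
    size_tol of the median area of all candidate boxes.
    Returns the indices of the kept boxes, highest score first.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if boxes.size == 0:
        return []

    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    median_area = np.median(areas)
    plausible = (areas >= median_area / size_tol) & (areas <= median_area * size_tol)

    # Candidates sorted by descending score, abnormal sizes removed.
    order = [int(i) for i in np.argsort(-scores) if plausible[i]]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        remaining = []
        for i in order:
            # Width and height of the overlap, clamped at zero.
            iw = min(boxes[best, 2], boxes[i, 2]) - max(boxes[best, 0], boxes[i, 0])
            ih = min(boxes[best, 3], boxes[i, 3]) - max(boxes[best, 1], boxes[i, 1])
            inter = max(iw, 0.0) * max(ih, 0.0)
            union = areas[best] + areas[i] - inter
            if inter / union <= iou_thr:
                remaining.append(i)
        order = remaining
    return keep
```

In this sketch, a very large high-confidence box is rejected by the size filter before suppression, so it cannot suppress the correctly sized detections it overlaps.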
  • Item
    Leveraging deep learning for change detection in bi-temporal remote sensing imagery
    (University of Missouri--Columbia, 2024) Alshehri, Mariam S; Hurt, J. Alex
    Deforestation in the Brazilian Amazon poses significant threats to global climate stability, biodiversity, and local communities. This dissertation presents advanced deep learning approaches to improve deforestation detection using bi-temporal Sentinel-2 satellite imagery. We developed a specialized dataset capturing deforestation events between 2020 and 2021 in key conservation units of the Amazon. We first adapted transformer-based change detection models to the deforestation context, leveraging attention mechanisms to analyze spatial and temporal patterns. While these models showed high accuracy, limitations remained in effectively capturing subtle environmental changes. To address this, we introduce DeforestNet, a novel deep learning framework that integrates advanced semantic segmentation encoders within a siamese architecture. DeforestNet employs cross-temporal interaction mechanisms and temporal fusion strategies to enhance the discrimination of true deforestation events from background noise. Experimental results demonstrate that DeforestNet outperforms existing models, achieving higher precision, recall, and F1-scores in deforestation detection. Additionally, it generalizes well to other change detection tasks, as evidenced by its performance on the LEVIR-CD urban building change detection dataset. This research contributes a robust and efficient framework for accurate change detection in remote sensing imagery, offering valuable tools for environmental monitoring and aiding global efforts in sustainable forest management and conservation.
Items in MOspace are protected by copyright, with all rights reserved, unless otherwise indicated.