
dc.contributor.advisor: Cheng, Jianlin [eng]
dc.contributor.author: Chen, Chen [eng]
dc.date.embargountil: 8/1/2023
dc.date.issued: 2022 [eng]
dc.date.submitted: 2022 Summer [eng]
dc.description.abstract: Proteins are large, complex molecules that perform most essential functions within organisms. In this work, we focus on two important aspects that determine their functional properties: the tertiary structure of proteins and their interaction patterns with the genome. Understanding these properties brings valuable insights into the fundamentals of biology and results in new applications in areas such as agriculture, precision medicine, and drug discovery. With recent developments in bioinformatics and structural biology, machine learning, and in particular deep learning, has proven to be extremely powerful for inference and interpretation of experimental observations by taking advantage of the large amount of data publicly available today. We aim to propose novel machine learning frameworks that can both extract information from higher-level features and provide explainability, yielding meaningful insights beyond the predictions themselves. However, due to the volatility of biological phenomena, the design of data processing and modeling needs to be extensive for features derived from proteins. In addition, the different geometric measurements (1D, 2D, and 3D) of protein properties bring new challenges for the selection of model architectures that can effectively leverage different forms of data structure. In this dissertation, four major contributions are described. First, DeepGRN is a method for transcription factor binding site prediction using a 1D transformer-based network. Second, GNET2 is a data-assisted method to infer the interactions between proteins and genes from gene expression data using decision trees and information theory. Third, ATTContact is a tool for protein contact prediction based on 2D residual neural networks with an attention mechanism. Finally, EnQA is a method based on 3D equivariant graph networks for protein model quality assessment and selection of the most accurate model as the final protein structure prediction.
All the methods described have been released as open source software and are freely available to the scientific community. [eng]
dc.description.bibref: Includes bibliographical references. [eng]
dc.format.extent: xiv, 143 pages : illustrations (color) [eng]
dc.identifier.uri: https://doi.org/10.32469/10355/94056 [eng]
dc.identifier.uri: https://hdl.handle.net/10355/94056
dc.language: English [eng]
dc.publisher: University of Missouri--Columbia [eng]
dc.relation.ispartofcommunity: University of Missouri--Columbia. Graduate School. Theses and Dissertations [eng]
dc.title: Protein-DNA interaction prediction and protein structure modeling by machine learning [eng]
dc.type: Thesis [eng]
thesis.degree.discipline: Computer Science (MU) [eng]
thesis.degree.grantor: University of Missouri--Columbia [eng]
thesis.degree.level: Doctoral [eng]
thesis.degree.name: Ph. D. [eng]

