Data analysis and prediction of protein posttranslational modification

Yao, Qiuming

URI

https://hdl.handle.net/10355/63994
https://doi.org/10.32469/10355/63994

dc.contributor.advisor	Xu, Dong, 1965-	eng
dc.contributor.author	Yao, Qiuming	eng
dc.date.issued	2014	eng
dc.date.submitted	2014 Fall	eng
dc.description.abstract	[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] Protein posttranslational modification (PTM) occurs broadly during or after protein biosynthesis and assists folding or activates function during the protein's lifetime. Among all types of PTM, protein phosphorylation is widely recognized as the most pervasive enzyme-catalyzed posttranslational modification in eukaryotes. Plants in particular rely heavily on this signaling mechanism: protein kinases occur more frequently in plant genomes than in those of other eukaryotes. Phosphorylation site mapping using high-resolution mass spectrometry has grown exponentially. In Arabidopsis alone there are thousands of experimentally determined phosphorylation sites. Data on other types of posttranslational modification are increasing rapidly as well; the acetylation proteome is another large PTM data set. To provide easy access to these modification events in a user-intuitive format, we have developed P3DB, the Plant Protein Phosphorylation Database (p3db.org). This database is a repository for plant protein phosphorylation site data. The data can be queried for a protein of interest using an integrated BLAST function, which searches for similar sequences with known phosphorylation sites among the multiple plants currently investigated. This resource can thus help identify functionally conserved phosphorylation sites in plants using a multi-system approach. Centered on these phosphorylation data, P3DB provides multiple related data sets and annotations, including protein-protein interaction (PPI), gene ontology, protein tertiary structures, orthologous sequences, kinase/phosphatase classification, and Kinase Client Assay (KiC Assay) data. P3DB is thus not only a repository but also a context provider for studying phosphorylation events. 
In addition, P3DB incorporates multiple network viewers for the above features, such as the PPI network, kinase-substrate network, phosphatase-substrate network, and domain co-occurrence network, to help study phosphorylation from a systems point of view. Furthermore, P3DB reflects a community-based design through which users can share data sets and automate data deposition for publication purposes. Because P3DB is a comprehensive, systematic, and interactive platform for phosphoproteomics research, many data analyses can be built on it. For example, disorder analysis and sequence conservation analysis can be performed on the P3DB data sets. Many researchers have downloaded P3DB data and carried out meaningful analyses based on the P3DB infrastructure. Although high-resolution mass spectrometry can now identify protein phosphorylation sites reliably, the experimental approach is time-consuming and resource-dependent. Furthermore, it is unlikely that an experimental approach could catalog an entire phosphoproteome. Computational prediction of phosphorylation sites provides an efficient and flexible way to reveal potential phosphorylation sites, facilitate experimental phosphorylation site identification, and provide hypotheses for experimental design. Musite is a powerful tool that we developed to predict phosphorylation sites based solely on protein sequence. Musite integrates data preprocessing, feature extraction, machine-learning methods, and prediction models into one comprehensive tool. Musite (http://musite.net) can be extended to the study of all types of posttranslational modification, as long as the data set contains sufficient modification sites. To further improve the performance of Musite, a generalized motif tree applying fuzzy logic is introduced to complement the machine-learning-based prediction. 
On one hand, the tree-based approach and fuzzy variables make the final rules easier to interpret, which helps biologists identify significant patterns. On the other hand, the extracted rule sets essentially generalize the motifs and reveal more information. The motif tree can be paired with traditional classification methods to provide better interpretation, pre-filtering, and analysis power. Compared to traditional motif extraction, the fuzzy motif decision tree is able to borrow more information from the observations and thus may extract more novel motifs or more comprehensive patterns. It can be applied to kinase-specific phosphorylated peptides to gain more insight into phosphorylation events. Combined, a comprehensive database (P3DB), a well-developed prediction tool (Musite), and a generalized motif constructor (Fuzzy Motif Tree) enable researchers to investigate phosphorylation and other posttranslational modification events more thoroughly and thus to reveal more of their underlying biological significance.	eng
dc.identifier.uri	https://hdl.handle.net/10355/63994
dc.identifier.uri	https://doi.org/10.32469/10355/63994	eng
dc.language	English	eng
dc.publisher	University of Missouri--Columbia	eng
dc.relation.ispartofcommunity	University of Missouri--Columbia. Graduate School. Theses and Dissertations	eng
dc.rights	Access is limited to the campus of the University of Missouri--Columbia.	eng
dc.title	Data analysis and prediction of protein posttranslational modification	eng
dc.type	Thesis	eng
thesis.degree.discipline	Computer science (MU)	eng
thesis.degree.grantor	University of Missouri--Columbia	eng
thesis.degree.level	Doctoral	eng
thesis.degree.name	Ph. D.	eng

Files in this item

Name: research.pdf
Size: 2.481Mb
Format: PDF

Name: public.pdf
Size: 2.332Kb
Format: PDF

This item appears in the following Collection(s)

2014 MU dissertations - Access restricted to MU
Computer Science electronic theses and dissertations (MU)
The electronic theses and dissertations of the Department of Computer Science.
