Using machine learning approach to predict enzyme family classes by fusing AM-PSE-AAC and PSE-PSSM
[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] Protein function prediction is one of the most challenging problems in the postgenomic era. One approach is to classify a protein into a functional family, which supports the functional characterization of newly identified proteins. Early family classification of a newly found enzyme provides detailed information about which specific targets it acts on and which catalytic processes it is involved in. According to the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB), the first Enzyme Commission met in 1961 and set up the classification of enzymes into six classes: oxidoreductases, transferases, hydrolases, lyases, isomerases and ligases. Predicting the enzyme family class is thus a typical multi-class classification problem. Here, two automatic enzyme identifiers based on machine learning are introduced: a Support Vector Machine (SVM) identifier and a Deep Belief Network (DBN) identifier. Both use feature vectors obtained by fusing Am-Pse-AAC and Pse-PSSM features extracted from protein amino acid sequences. We validated both methods on an enzyme benchmark dataset that contains the six main enzyme families. Proteins within the same functional class in the dataset have less than 60% sequence identity. In predicting enzyme families, the SVM classifier achieves 87.33% accuracy and the DBN classifier achieves 84.55% accuracy. The performance of both methods is therefore comparable to that of existing methods reported in previous work.
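The feature fusion described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation, and it rests on several assumptions: fusion is taken to be plain concatenation; the Pse-PSSM part follows the commonly used form (column means of a normalized profile plus lagged mean squared differences); and a plain 20-dimensional amino acid composition stands in for Am-Pse-AAC, whose hydrophobicity and hydrophilicity correlation terms are omitted. The names `amino_acid_composition`, `pse_pssm` and `fuse` are hypothetical.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence: str) -> List[float]:
    """Stand-in for Am-Pse-AAC: fraction of each standard amino acid."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def pse_pssm(profile: Sequence[Sequence[float]], max_lag: int = 1) -> List[float]:
    """Column means of an L x N profile, then lagged squared-difference terms.

    Assumes `profile` is an already normalized PSSM (one row per residue).
    """
    length = len(profile)
    if length <= max_lag:
        raise ValueError("profile must have more rows than max_lag")
    n_cols = len(profile[0])
    # First block: average score of each column over the whole sequence.
    features = [sum(row[j] for row in profile) / length for j in range(n_cols)]
    # Following blocks: mean squared difference between rows `lag` apart.
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum(
                (profile[i][j] - profile[i + lag][j]) ** 2
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


def fuse(*vectors: Sequence[float]) -> List[float]:
    """Fuse feature vectors by concatenation (an assumption of this sketch)."""
    fused: List[float] = []
    for vector in vectors:
        fused.extend(vector)
    return fused


if __name__ == "__main__":
    # Toy 3 x 2 profile; a real PSSM has 20 columns and comes from PSI-BLAST.
    toy_profile = [[0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    vector = fuse(amino_acid_composition("ACCA"), pse_pssm(toy_profile))
    print(len(vector))
```

The fused vector would then be passed to a multi-class classifier such as an SVM; that step is left out here because the abstract does not state the kernel, parameters or training protocol.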
Access to files is limited to the University of Missouri--Columbia.