PRO3DCNN : convolutional neural network for mapping protein structure into folds
Format
Thesis
Abstract
Motivation: SCOPe 2.07 is a dataset of 276,231 protein domains that have been partitioned into folds according to their shape and function. Since a protein's fold reveals valuable information about its shape and function, it is important to find a mapping between proteins and their folds. There are existing techniques to map a protein's sequence into a fold [2], but none to map a protein's shape into a fold for the entire SCOPe 2.07 dataset. We focus on the topological features of a protein to map it into a fold, and we introduce several new techniques that accomplish this.
Results: We develop a 2D convolutional neural network to classify any protein structure into one of 1232 folds. We extract two classes of input features for each protein: distance matrices and persistent homology images. Due to restrictions in our computing resources, we sample every other point in the alpha-carbon chain, and we find that this does not lead to a significant loss in accuracy. Using the distance matrix, we achieve an accuracy of 90% on the entire dataset. With persistence images of 100x100 resolution, we achieve an accuracy of 54% on SCOP 1.55.
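The distance-matrix feature and the every-other-point sampling described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `subsample_chain` and `distance_matrix` and the toy coordinates are assumptions made for illustration only, and reading alpha-carbon coordinates from a structure file is left out.

```python
import math

def subsample_chain(coords, step=2):
    """Keep every `step`-th alpha-carbon coordinate (step=2 keeps every other point)."""
    return coords[::step]

def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances between points."""
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            matrix[i][j] = d
            matrix[j][i] = d  # distances are symmetric
    return matrix

# Hypothetical toy chain: five alpha carbons spaced 3.8 units apart on a line.
ca_coords = [(3.8 * k, 0.0, 0.0) for k in range(5)]

sampled = subsample_chain(ca_coords)  # keeps points 0, 2 and 4
dm = distance_matrix(sampled)         # 3 x 3 matrix, zeros on the diagonal
```

Halving the number of points reduces the matrix to roughly a quarter of its original number of entries, which is presumably where the savings in computing resources come from. The resulting matrix would still need to be resized or padded to a fixed shape before being passed to a 2D convolutional network; the abstract does not say how that step is done.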
Degree
M.S.
Rights
OpenAccess.
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.
