PRO3DCNN : convolutional neural network for mapping protein structure into folds

Format

Thesis

Abstract

Motivation: SCOPe 2.07 is a dataset of 276,231 protein domains that have been partitioned into folds according to their shape and function. Since a protein's fold reveals valuable information about its shape and function, it is important to find a mapping between proteins and their folds. There are existing techniques to map a protein's sequence into a fold [2], but none to map a protein's shape into a fold for the entire SCOPe 2.07 dataset. We focus on the topological features of a protein to map it into a fold, and we introduce several new techniques that accomplish this.

Results: We develop a 2D convolutional neural network to classify any protein structure into one of 1232 folds. We extract two classes of input features for each protein: distance matrices and persistent homology images. Due to restrictions in our computing resources, we sample every other point in the alpha-carbon chain. We find that this subsampling does not lead to a significant loss in accuracy. Using the distance matrix, we achieve an accuracy of 90% on the entire dataset. With persistence images of 100x100 resolution, we achieve an accuracy of 54% on SCOP 1.55.
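
The distance-matrix feature described in the abstract can be illustrated with a minimal sketch. It assumes the alpha-carbon coordinates are already available as an (N, 3) array. The function name `ca_distance_matrix`, the `stride` parameter, and the toy coordinates are hypothetical and are not taken from the thesis, whose actual preprocessing may differ.

```python
import numpy as np

def ca_distance_matrix(ca_coords, stride=2):
    """Pairwise Euclidean distances between subsampled alpha-carbon coordinates.

    ca_coords: array-like of shape (N, 3), one row per alpha carbon.
    stride:    keep every `stride`-th point (2 = every other point,
               mirroring the subsampling mentioned in the abstract).
    """
    coords = np.asarray(ca_coords, dtype=float)
    if coords.ndim != 2 or coords.shape[1] != 3:
        raise ValueError("expected an (N, 3) array of coordinates")
    sampled = coords[::stride]
    # Broadcasting yields an (M, M, 3) array of coordinate differences.
    diff = sampled[:, None, :] - sampled[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy chain of five points (made-up coordinates, for illustration only).
toy = np.array([[0, 0, 0],
                [1, 0, 0],
                [3, 0, 0],
                [3, 4, 0],
                [3, 4, 5]], dtype=float)
dm = ca_distance_matrix(toy)  # keeps points 0, 2 and 4
```

The result is a symmetric matrix with a zero diagonal. Before it can be passed to a 2D convolutional network it would still need to be resized or padded to a fixed shape, a step the abstract does not specify.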

Degree

M.S.

Rights

OpenAccess.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.