
dc.contributor.advisor: Shyu, Chi-Ren [eng]
dc.contributor.author: Chi, Pin-Hao, 1976- [eng]
dc.date.issued: 2007 [eng]
dc.description: The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. [eng]
dc.description: Title from title screen of research.pdf file (viewed on September 19, 2007) [eng]
dc.description: Vita. [eng]
dc.description: Includes bibliographical references. [eng]
dc.description: Thesis (Ph. D.) University of Missouri-Columbia 2007. [eng]
dc.description: Dissertations, Academic -- University of Missouri--Columbia -- Computer science. [eng]
dc.description.abstract: Functionally important sites of proteins are potentially conserved to specific three-dimensional structural folds. To understand the structure-to-function relationship, life sciences researchers and biologists have a great need to retrieve similar structures from protein databases and classify these structures into the same protein fold. Traditional protein structure retrieval and classification methods are known to be either computationally expensive or labor intensive. In the past decade, more than 35000 protein structures have been identified. To meet the needs of fast retrieval and classifying high-throughput protein data, our research covers three main subjects: (1) Real-time global protein structure retrieval: We introduce an image-based approach that extracts signatures of three-dimensional protein structures. Our high-level protein signatures are then indexed by multi-dimensional indexing trees for fast retrieval. (2) Real-time global protein structure classification: An advanced knowledge discovery and data mining (KDD) model is proposed to convert high-level protein signature into itemsets for mining association rules. The advantage of this KDD approach is to effectively reveal the hidden knowledge from similar protein tertiary structures and quickly suggest possible SCOP domains for a newly-discovered protein. In addition, we develop a non-parametric classifier, E-Predict, that can rapidly assign known SCOP folds and recognize novel folds for newly-discovered proteins. (3) Efficient local protein structure retrieval and classification: We propose a novel algorithm, namely, the Index-based Protein Substructure Alignment (IPSA), that constructs a two-layer indexing tree to capture the obscured similarity of protein substructures in a timely fashion. Our research works exhibit significantly high efficiency with reasonably high accuracy and will benefit the study of high-throughput protein structure-function evolutionary relationships. [eng]
dc.identifier.merlin: b59624255 [eng]
dc.identifier.oclc: 173147500 [eng]
dc.identifier.uri: http://hdl.handle.net/10355/4817
dc.language: English [eng]
dc.publisher: University of Missouri--Columbia [eng]
dc.relation.ispartofcollection: University of Missouri--Columbia. Graduate School. Theses and Dissertations [eng]
dc.subject.lcsh: Proteins -- Databases [eng]
dc.subject.lcsh: Algorithms [eng]
dc.subject.lcsh: Information storage and retrieval systems -- Life sciences [eng]
dc.subject.lcsh: Data mining [eng]
dc.title: Efficient protein tertiary structure retrievals and classifications using content based comparison algorithms [eng]
dc.type: Thesis [eng]
thesis.degree.discipline: Computer science (MU) [eng]
thesis.degree.grantor: University of Missouri--Columbia [eng]
thesis.degree.level: Doctoral [eng]
thesis.degree.name: Ph. D. [eng]

