DC Field | Value | Language
dc.contributor.advisor | Kazic, Toni Marie | eng
dc.contributor.author | Vatsa, Avimanyou Kumar | eng
dc.date.issued | 2017 | eng
dc.date.submitted | 2017 Summer | eng
dc.description.abstract | Recently emerging approaches to high-throughput phenotyping have become important tools in unraveling the biological basis of agronomically and medically important phenotypes. These experiments produce very large sets of either low- or high-dimensional data. Finding clusters in the entire space of high-dimensional data (HDD) is a challenging task because, as dimensionality increases, the difference between the distances from an object to its nearest and farthest neighbors becomes vanishingly small relative to those distances. Additionally, real data may not be mathematically well behaved. Finally, many clusters are expected on biological grounds to be "natural", that is, to have irregular, overlapping boundaries in different subsets of the dimensions. More precisely, the natural clusters of the data could differ in shape, size, density, and dimensionality, and they might not be disjoint. In principle, such data could be clustered after applying dimension reduction methods. However, these methods map many dimensions onto a smaller set of derived dimensions, which makes the clustering results difficult to interpret and may also cause a significant loss of information. Another possible approach is to find subspaces (subsets of dimensions) in the entire data space of the HDD. However, existing subspace clustering methods do not discover natural clusters. Therefore, in this dissertation I propose a novel data preprocessing method, demonstrate that a group of phenotypes are interdependent, and propose a novel density-based subspace clustering algorithm for high-dimensional data, called Dynamic Locally Density Adaptive Scalable Subspace Clustering (DynaDASC). This algorithm is relatively locally density adaptive, scalable, dynamic, and nonmetric in nature, and it discovers natural clusters. | eng
dc.description.bibref | Includes bibliographical references (pages 62-73). | eng
dc.description.statementofresponsibility | Dr. Toni Kazic, Dissertation Supervisor. | eng
dc.description.statementofresponsibility | Includes vita. | eng
dc.format.extent | 1 online resource (x, 85 pages) : illustrations (chiefly color) | eng
dc.identifier.merlin | b121363326 | eng
dc.identifier.oclc | 1022949418 | eng
dc.identifier.uri | https://hdl.handle.net/10355/62341 |
dc.identifier.uri | https://doi.org/10.32469/10355/62341 | eng
dc.language | English | eng
dc.publisher | University of Missouri--Columbia | eng
dc.relation.ispartofcommunity | University of Missouri--Columbia. Graduate School. Theses and Dissertations | eng
dc.rights | OpenAccess. | eng
dc.rights.license | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License. |
dc.subject.FAST | Phenotype | eng
dc.subject.FAST | Food security | eng
dc.subject.FAST | Cluster analysis | eng
dc.title | An approach to clustering biological phenotypes | eng
dc.type | Thesis | eng
thesis.degree.discipline | Computer science (MU) | eng
thesis.degree.grantor | University of Missouri--Columbia | eng
thesis.degree.level | Doctoral | eng
thesis.degree.name | Ph. D. | eng
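
The abstract motivates subspace clustering with the observation that distances lose contrast in high dimensions. The short Python sketch below illustrates that general effect. It is not taken from the dissertation and is not DynaDASC. It assumes points drawn uniformly at random from the unit hypercube and measures the relative contrast (D_max - D_min) / D_min between one random query point and a sample of other points. The function names and parameters (`relative_contrast`, `mean_contrast`, `n_points`, `trials`) are hypothetical and chosen only for this illustration.

```python
import math
import random


def relative_contrast(n_points: int, dim: int, rng: random.Random) -> float:
    """Return (D_max - D_min) / D_min for Euclidean distances from one random
    query point to n_points random points, all uniform in [0, 1]^dim.
    (Assumption: uniform synthetic data, not phenotype data.)"""
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


def mean_contrast(dim: int, n_points: int = 200, trials: int = 10, seed: int = 0) -> float:
    """Average the relative contrast over several independent query points."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return sum(relative_contrast(n_points, dim, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    # The mean contrast shrinks as dimensionality grows: nearest and farthest
    # neighbors become almost equally far away.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  mean relative contrast={mean_contrast(dim):.3f}")
```

Running the script should print a mean relative contrast that falls steadily from dimension 2 to dimension 1000. Under these assumptions, this is why distance-based clusters defined over all dimensions at once become hard to separate.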