An approach to clustering biological phenotypes /

Vatsa, Avimanyou Kumar

Vatsa, Avimanyou Kumar

View/Open

research.pdf (53.63Mb)

public.pdf (15.30Kb)

Date

2017

Format

Thesis

Metadata

[+] Show full item record

Abstract

Recently emerging approaches to high-throughput phenotyping have become important tools in unraveling the biological basis of agronomically and medically important phenotypes. These experiments produce very large sets of either low or high-dimensional data. Finding clusters in the entire space of high-dimensional data (HDD) is a challenging task, because the relative distances between any two objects converge to zero with increasing dimensionality. Additionally, real data may not be mathematically well behaved. Finally, many clusters are expected on biological grounds to be "natural" -- that is, to have irregular, overlapping boundaries in different subsets of the dimensions. More precisely, the natural clusters of the data could differ in shape, size, density, and dimensionality; and they might not be disjoint. In principle, clustering such data could be done by dimension reduction methods. However, these methods convert many dimensions to a smaller set of dimensions that make the clustering results difficult to interpret and may also lead to a significant loss of information. Another possible approach is to find subspaces (subsets of dimensions) in the entire data space of the HDD. However, the existing subspace methods don't discover natural clusters. Therefore, in this dissertation I propose a novel data preprocessing method, demonstrating that a group of phenotypes are interdependent, and propose a novel density-based subspace clustering algorithm for high-dimensional data, called Dynamic Locally Density Adaptive Scalable Subspace Clustering (DynaDASC). This algorithm is relatively locally density adaptive, scalable, dynamic, and nonmetric in nature, and discovers natural clusters.

URI

https://hdl.handle.net/10355/62341
https://doi.org/10.32469/10355/62341

Degree

Ph. D.

Thesis Department

Computer science (MU)

Rights

OpenAccess.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.