    An in depth evaluation of internal criteria for determining the number of clusters in a data set

    Stevens, Jordan
    View/Open
    [PDF] research.pdf (377.4Kb)
    Date
    2018
    Format
    Thesis
    Abstract
    [ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] Clustering procedures partition data sets into groups, minimizing the distance within clusters while maximizing the distance between clusters. Clustering can be a powerful tool for data reduction or hypothesis generation; however, it rests on a strong assumption that the number of clusters is known before clustering can occur. To address this issue, many clustering indices have been developed to aid researchers in identifying the number of clusters in a data set. Milligan and Cooper (1985) evaluated 30 clustering indices and identified the Calinski and Harabasz (1974) index (CH) as the best index for determining the number of clusters, and it has been the most popular clustering index ever since. With the increase in computing power, many researchers have developed new indices that have demonstrated superiority over the CH index. Over 30 clustering indices have been developed since the Milligan and Cooper study, with no clear indication of when these clustering indices should be used. This paper sought to replicate and expand upon the findings of Milligan and Cooper by examining the performance of 66 clustering indices. Clusters were generated with hierarchical and non-hierarchical clustering procedures with varying degrees of cluster overlap. The Friedman and Rubin (1967) index was found to have the greatest degree of accuracy when cluster overlap was present. Bootstrapping procedures were performed in order to examine the performance of the Friedman index when conditions are less than ideal. Finally, an application of the clustering index was demonstrated with a nationally representative data set assessing symptoms of Alcohol Use Disorder (AUD).
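For readers unfamiliar with internal clustering indices, the following is a minimal sketch of the Calinski and Harabasz (1974) index named in the abstract, using its standard definition: the between-cluster sum of squares divided by (k - 1), over the within-cluster sum of squares divided by (n - k). It is not taken from the thesis, whose full text is restricted. The function name `calinski_harabasz` and the toy data are assumptions made purely for illustration. Higher values indicate tighter, better separated clusters, so the index is typically computed for several candidate partitions and the largest value is preferred.

```python
from collections import defaultdict


def calinski_harabasz(points, labels):
    """Return the CH index, (B / (k - 1)) / (W / (n - k)), for a labelled partition.

    B is the between-cluster sum of squares and W the within-cluster sum of
    squares. The function name and signature are illustrative assumptions.
    """
    n = len(points)
    dim = len(points[0])

    # Group the observations by their cluster label.
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("the index is defined only for 2 <= k < n clusters")

    def centroid(rows):
        return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = 0.0
    within = 0.0
    for rows in groups.values():
        center = centroid(rows)
        between += len(rows) * sq_dist(center, overall)
        within += sum(sq_dist(row, center) for row in rows)

    if within == 0:
        return float("inf")  # every point coincides with its cluster centroid
    return (between / (k - 1)) / (within / (n - k))


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = calinski_harabasz(points, [0, 0, 0, 1, 1, 1])  # matches the true groups
bad = calinski_harabasz(points, [0, 1, 0, 1, 0, 1])   # mixes the two groups
print(good > bad)
```

On this toy data the partition that matches the two groups scores far higher than the mixed one, which is the behavior an index of this kind relies on when it is used to choose the number of clusters.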
    URI
    https://hdl.handle.net/10355/66214
    Degree
    M.A.
    Thesis Department
    Psychology (MU)
    Rights
    Access is limited to the campuses of the University of Missouri.
    This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.
    Collections
    • 2018 MU theses - Access restricted to UM
    • Psychological Sciences electronic theses and dissertations (MU)

    hosted by University of Missouri Library Systems
     

     

