    •   MOspace Home
    • University of Missouri-Columbia
    • Graduate School - MU Theses and Dissertations (MU)
    • Theses and Dissertations (MU)
    • Theses (MU)
    • 2015 Theses (MU)
    • 2015 MU theses - Access restricted to MU
    • View Item
    Distributed association rule mining using an in-memory cluster computer framework

    Phinney, Michael
    View/Open
    [PDF] public.pdf (2.071Kb)
    [PDF] research.pdf (2.407Mb)
    [PDF] short.pdf (27.99Kb)
    Date
    2015
    Format
    Thesis
    Metadata
    Abstract
    [ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] Association rule mining is a key data mining technique used in a wide range of application domains. Although it is a mature area, much remains to be done to take full advantage of the massive distributed computing environments available today. A variety of algorithms for efficiently identifying frequent patterns have been proposed over the past two decades; in general, they fall into two broad categories: Apriori-based and pattern-growth-based. Because the number of potential frequent itemsets grows exponentially with the number of distinct items (n distinct items yield up to 2^n - 1 nonempty itemsets), the resources required for computation can quickly exceed the capacity of a single machine, so distributed approaches are needed to extend our reach. In this thesis, we discuss the algorithms used in an association rule mining pipeline designed to take advantage of in-memory cluster computing environments. We propose several mechanisms to tailor the Apriori algorithm to distributed computing ecosystems and evaluate the scalability of our approach. We demonstrate significant improvements over an existing frequent pattern mining method, achieving a speedup of nearly 1000 times on certain datasets. Our proposed association rule mining package is modular, which promotes extensibility and flexibility in rule extraction, filtering, and analysis.
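    The abstract does not describe the thesis's actual distribution mechanisms, so the following is only a minimal single-process sketch of the general idea, assuming a generic count-distribution scheme: each data partition counts candidate supports locally, and the per-partition counts are summed before filtering by minimum support. On a cluster framework the per-partition loop is the part that would run in parallel. Candidates are pruned with the Apriori (downward-closure) property, and rules are then extracted by confidence. The function names (count_partition, generate_candidates, apriori, rules) and the toy transactions are hypothetical illustrations, not the author's package or any framework's API.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical type alias for illustration: an itemset is a frozenset of item names.
Itemset = FrozenSet[str]


def count_partition(partition: Sequence[FrozenSet[str]],
                    candidates: List[Itemset]) -> Counter:
    """Local support counts for one data partition (the 'map' step)."""
    counts: Counter = Counter()
    for transaction in partition:
        for cand in candidates:
            if cand <= transaction:  # candidate is contained in the transaction
                counts[cand] += 1
    return counts


def generate_candidates(frequent: List[Itemset], k: int) -> List[Itemset]:
    """Join frequent (k-1)-itemsets into k-itemsets, then prune any
    candidate that has an infrequent (k-1)-subset (downward closure)."""
    prev = set(frequent)
    joined = {a | b for a in prev for b in prev if len(a | b) == k}
    return [c for c in joined
            if all(frozenset(s) in prev for s in combinations(c, k - 1))]


def apriori(partitions: List[List[FrozenSet[str]]],
            min_support: int) -> Dict[Itemset, int]:
    """Level-wise Apriori; local counts are summed across partitions
    (the 'reduce' step) before applying the support threshold."""
    items = {i for part in partitions for t in part for i in t}
    candidates = [frozenset([i]) for i in sorted(items)]
    result: Dict[Itemset, int] = {}
    k = 1
    while candidates:
        total: Counter = Counter()
        for part in partitions:  # a cluster would count these in parallel
            total.update(count_partition(part, candidates))
        level = {c: n for c, n in total.items() if n >= min_support}
        result.update(level)
        k += 1
        candidates = generate_candidates(list(level), k)
    return result


def rules(freq: Dict[Itemset, int],
          min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Extract rules lhs -> rhs with confidence = support(itemset) / support(lhs).
    Every subset of a frequent itemset is frequent, so freq[lhs] always exists."""
    out = []
    for itemset, support in freq.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = support / freq[lhs]
                if conf >= min_conf:
                    out.append((lhs, itemset - lhs, conf))
    return out


# Toy data split into two hypothetical partitions.
partitions = [
    [frozenset({"bread", "milk"}), frozenset({"bread", "butter"})],
    [frozenset({"bread", "milk", "butter"}), frozenset({"milk"})],
]
freq = apriori(partitions, min_support=2)
strong = rules(freq, min_conf=0.8)
```

With a minimum support of 2, the frequent itemsets are the three single items plus {bread, milk} and {bread, butter}; the candidate {bread, milk, butter} is pruned because {milk, butter} occurs only once. At a confidence threshold of 0.8, only butter -> bread (confidence 1.0) survives. Summing partition-local counts gives the same supports as counting the whole dataset at once, which is why this step distributes cleanly; real systems differ mainly in how candidates are broadcast and counts are aggregated.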
    URI
    https://hdl.handle.net/10355/50166
    Degree
    M.S.
    Thesis Department
    Computer science (MU)
    Rights
    Access to files is limited to the University of Missouri--Columbia.
    Collections
    • 2015 MU theses - Access restricted to MU
    • Computer Science electronic theses and dissertations (MU)

    hosted by University of Missouri Library Systems