[-] Show simple item record

dc.contributor.advisorShyu, Chi-Reneng
dc.contributor.authorPhinney, Michaeleng
dc.date.issued2017eng
dc.date.submitted2017 Springeng
dc.descriptionField of study: Computer science.eng
dc.descriptionDr. Chi-Ren Shyu, Dissertation Supervisor.eng
dc.descriptionIncludes vita.eng
dc.description"May 2017."eng
dc.description.abstractFrequent pattern mining is a classic data mining technique, generally applicable to a wide range of application domains, and a mature area of research. The fundamental challenge arises from the combinatorial nature of frequent itemsets, scaling exponentially with respect to the number of unique items. Apriori-based and FPTree-based algorithms have dominated the space thus far. Initial phases of this research relied on the Apriori algorithm and utilized a distributed computing environment; we proposed the Cartesian Scheduler to manage Apriori's candidate generation process. To address the limitation of bottom-up frequent pattern mining algorithms such as Apriori and FPGrowth, we propose the Frequent Hierarchical Pattern Tree (FHPTree): a tree structure and new frequent pattern mining paradigm. The classic problem is redefined as frequent hierarchical pattern mining where the goal is to detect frequent maximal pattern covers. Under the proposed paradigm, compressed representations of maximal patterns are mined using a top-down FHPTree traversal, FHPGrowth, which detects large patterns before their subsets, thus yielding significant reductions in computation time. The FHPTree memory footprint is small; the number of nodes in the structure scales linearly with respect to the number of unique items. Additionally, the FHPTree serves as a persistent, dynamic data structure to index frequent patterns and enable efficient searches. When the search space is exponential, efficient targeted mining capabilities are paramount; this is one of the key contributions of the FHPTree. This dissertation will demonstrate the performance of FHPGrowth, achieving a 300x speed up over state-of-the-art maximal pattern mining algorithms and approximately a 2400x speedup when utilizing FHPGrowth in a distributed computing environment. In addition, we allude to future research opportunities, and suggest various modifications to further optimize the FHPTree and FHPGrowth. Moreover, the methods we offer will have an impact on other data mining research areas including contrast set mining as well as spatial and temporal mining.eng
dc.description.bibrefIncludes bibliographical references (pages 121-133).eng
dc.format.extent1 online resource (xi, 135 pages) : illustrations (chiefly color)eng
dc.identifier.merlinb129185243eng
dc.identifier.oclc1099179294eng
dc.identifier.urihttps://hdl.handle.net/10355/63867
dc.identifier.urihttps://doi.org/10.32469/10355/63867eng
dc.languageEnglisheng
dc.publisherUniversity of Missouri--Columbiaeng
dc.relation.ispartofcommunityUniversity of Missouri--Columbia. Graduate School. Theses and Dissertationseng
dc.rightsOpenAccess.eng
dc.rights.licenseThis work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.
dc.titleDistributed frequent hierarchical pattern mining for robust and efficient large-scale association discoveryeng
dc.typeThesiseng
thesis.degree.disciplineComputer science (MU)eng
thesis.degree.grantorUniversity of Missouri--Columbiaeng
thesis.degree.levelDoctoraleng
thesis.degree.namePh. D.eng


Files in this item

[PDF]
[PDF]

This item appears in the following Collection(s)

[-] Show simple item record