dc.contributor.advisor: Lee, Yugyung, 1960- [en]
dc.contributor.advisor: Zheng, Yongjie [en]
dc.contributor.author: Krishnan, Malathy
dc.date.issued: 2015
dc.date.submitted: 2015 Fall [en]
dc.description: Title from PDF of title page, viewed on March 23, 2016 [en]
dc.description: Thesis advisors: Yugyung Lee and Yongjie Zheng [en]
dc.description: Vita [en]
dc.description: Includes bibliographical references (pages 75-76) [en]
dc.description: Thesis (M.S.)--School of Computing and Engineering. University of Missouri--Kansas City, 2015 [en]
dc.description.abstract: The open source code base has grown enormously, and understanding the functionality of its projects has become extremely difficult. Existing approaches to feature discovery that aim to identify functionality are typically semi-automatic and often require human intervention. In this thesis, an innovative framework is proposed for automatically and dynamically discovering the features of any open source project, and the components that implement them, using machine learning. The overall goal of the approach is to create an automated and scalable model that produces accurate results. The initial step is to extract the metadata and perform pre-processing. The next step is to dynamically discover topics using Latent Dirichlet Allocation and to form components optimally using K-Means. The final step is to discover the features implemented in the components using the Term Frequency-Inverse Document Frequency algorithm. The framework is implemented in Spark, a fast, parallel processing engine for big data analytics. The ArchStudio tool is used to visualize the feature-to-class mapping. As case studies, Apache Solr and Apache Hadoop HDFS are used to illustrate the automatic discovery of components and features. We demonstrated the scalability and accuracy of the proposed model, using a manual evaluation by software architecture experts as the baseline. The accuracy is 85% when compared with the manual evaluation of Apache Solr. In addition, many new features were discovered for both case studies through the automated framework. [eng]
dc.description.tableofcontents: Introduction -- Background and related work -- Framework of feature-based analysis -- Component identification and feature discovery -- Implementation -- Results and evaluation -- Conclusion and future work [en]
dc.format.extent: x, 78 pages [en]
dc.identifier.uri: https://hdl.handle.net/10355/48345
dc.subject.lcsh: Machine learning [en]
dc.subject.lcsh: Big data [en]
dc.subject.lcsh: Open source software [en]
dc.subject.other: Thesis -- University of Missouri--Kansas City -- Computer science [en]
dc.title: Feature-based Analysis for Open Source using Big Data Analytics [eng]
dc.type: Thesis [eng]
thesis.degree.discipline: Computer Science (UMKC) [en]
thesis.degree.grantor: University of Missouri--Kansas City [en]
thesis.degree.level: Masters [en]
thesis.degree.name: M.S. [en]
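
The final step named in the abstract, ranking terms by Term Frequency-Inverse Document Frequency to label each component with candidate features, can be illustrated with a minimal sketch. This is not the thesis's Spark implementation: the function name `top_terms_by_tfidf`, the component names, and the token lists are hypothetical, and it assumes the unsmoothed weighting tf * ln(N / df), which may differ from the weighting the thesis used.

```python
import math
from collections import Counter


def top_terms_by_tfidf(components, k=2):
    """Return the k highest-scoring TF-IDF terms for each component.

    `components` maps a (hypothetical) component name to its list of tokens.
    """
    n = len(components)

    # Document frequency: number of components in which each term appears.
    df = Counter()
    for tokens in components.values():
        df.update(set(tokens))

    result = {}
    for name, tokens in components.items():
        counts = Counter(tokens)
        total = len(tokens)
        # Assumed weighting: relative term frequency times ln(N / df).
        scores = {
            term: (count / total) * math.log(n / df[term])
            for term, count in counts.items()
        }
        # Sort by descending score, then alphabetically so ties are stable.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result[name] = ranked[:k]
    return result


# Made-up tokens standing in for pre-processed class metadata.
components = {
    "search": ["query", "index", "query", "parser", "file"],
    "storage": ["block", "file", "replica", "block", "node"],
}

for name, terms in top_terms_by_tfidf(components).items():
    print(name, terms)
```

A term that occurs in every component, such as `file` here, gets a weight of zero, which is why TF-IDF tends to surface terms that distinguish one component from the others.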
