dc.contributor.advisor: Lee, Yugyung, 1960- [eng]
dc.contributor.advisor: Zheng, Yongjie [eng]
dc.contributor.author: Punyamurthula, Sravani
dc.date.issued: 2015
dc.date.submitted: 2015 Fall [eng]
dc.description: Title from PDF of title page, viewed on March 23, 2016 [en]
dc.description: Thesis advisors: Yugyung Lee and Yongjie Zheng [en]
dc.description: Vita [en]
dc.description: Includes bibliographical references (pages 86-87) [en]
dc.description: Thesis (M.S)--School of Computing and Engineering. University of Missouri--Kansas City, 2015 [en]
dc.description.abstract: Open source software is ubiquitous and meets most of the common software needs that developers come across. Many open source projects are considered better than their commercial equivalents because a larger pool of developers constantly improves them. However, one challenge in using open source software is that the code must be analyzed manually to understand its dependencies, which is very time consuming for larger projects. Hence, there is strong demand for an automated process that can analyze the code and build an accurate model of the software system. The objective of this thesis is to address this problem by building a framework that extracts features, identifies components and connectors in an open source project, and gives the user a way to search for functionality. The first step of this process is to extract the metadata and dependency information from the source code using a call graph. A call graph is a directed graph that represents the execution logic of the program and helps with analyzing the relationships between classes. The extracted data is then transformed using natural language processing (NLP) [15] techniques such as lemmatization. In the second step, the transformed data is semantically analyzed: features are extracted using Term Frequency-Inverse Document Frequency (TF-IDF), synonyms are detected using Word2Vec [3], and components are detected dynamically using machine learning. The dependency information extracted from the call graph is then used to identify the connectors between the detected components. It is also used to build a class dependency matrix, which in turn is used to identify dependency-based components. In the final step, an ontology is used to represent the features, components, connectors, and classes discovered in the previous step and the relationships between them.
The generated ontology can be queried to search for functionality using the SPARQL [5] query language. Protégé [4] is used to visualize the generated ontology. The proposed solution is built on Spark, a parallel processing framework, and provides a fully automated and scalable model for representing the software. In this thesis, we analyzed two open source projects, Apache Solr and Apache Lucene, as a case study. Apache Solr is built on the Apache Lucene core library. The results of the Apache Solr analysis are compared to a manual evaluation of the software architecture by experts. We observed that 90% of the features identified in the manual analysis were recovered by the automated approach and that many new features were also discovered. This thesis also analyzes the dependencies between the components detected for the Apache Solr and Apache Lucene projects. From this analysis of the two systems, we observed that Apache Solr is highly dependent on Apache Lucene. [eng]
dc.description.tableofcontents: Introduction -- Background and related work -- Proposed framework -- Results and evaluation -- Conclusion and future work [en]
dc.format.extent: xi, 88 pages [en]
dc.identifier.uri: https://hdl.handle.net/10355/48352
dc.subject.lcsh: Open source software [en]
dc.subject.lcsh: Big data [en]
dc.subject.lcsh: Machine learning [en]
dc.subject.lcsh: Software architecture [en]
dc.subject.other: Thesis -- University of Missouri--Kansas City -- Computer science [en]
dc.title: Dynamic Model Generation and Semantic Search for Open Source Projects using Big Data Analytics [eng]
dc.type: Thesis [eng]
thesis.degree.discipline: Computer Science (UMKC) [en]
thesis.degree.grantor: University of Missouri--Kansas City [en]
thesis.degree.level: Masters [en]
thesis.degree.name: M.S. [en]
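
Illustrative note on the abstract: the abstract mentions that dependency information from a call graph is used to build a class dependency matrix. The sketch below shows one plausible way such a matrix could be derived from method-level call edges. It is not the thesis's implementation, which is built on Spark. The edge list, the "Class.method" naming scheme, and the function names class_of and build_dependency_matrix are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical call-graph edges (caller -> callee) as "Class.method" strings.
# These names are illustrative only; they are not taken from the thesis or
# from the Apache Solr / Apache Lucene code bases.
CALL_EDGES = [
    ("QueryParser.parse", "Tokenizer.next"),
    ("QueryParser.parse", "Tokenizer.next"),
    ("QueryParser.parse", "Query.create"),
    ("Searcher.search", "QueryParser.parse"),
    ("Searcher.search", "Index.lookup"),
    ("Searcher.rank", "Searcher.search"),  # call within one class, skipped
]


def class_of(method):
    """Return the class part of a 'Class.method' identifier."""
    return method.rsplit(".", 1)[0]


def build_dependency_matrix(edges):
    """Aggregate method-level call edges into a class-level matrix.

    Returns (names, matrix) where names is the sorted list of classes and
    matrix[i][j] is the number of calls from names[i] into names[j].
    Calls that stay inside a single class are not counted.
    """
    counts = Counter()
    classes = set()
    for caller, callee in edges:
        src, dst = class_of(caller), class_of(callee)
        classes.update((src, dst))
        if src != dst:
            counts[(src, dst)] += 1
    names = sorted(classes)
    matrix = [[counts.get((a, b), 0) for b in names] for a in names]
    return names, matrix


if __name__ == "__main__":
    names, matrix = build_dependency_matrix(CALL_EDGES)
    for name, row in zip(names, matrix):
        print(f"{name:12s} {row}")
```

In this sketch a row with many non-zero entries marks a class that depends on many others. Rows with similar patterns could then be grouped into candidate components, although the grouping method the thesis actually uses is not described in this record.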

