    Dynamic Model Generation and Semantic Search for Open Source Projects using Big Data Analytics

    Punyamurthula, Sravani
    View/Open
    [PDF] PunyamurthulaDynModGen.pdf (2.451Mb)
    Date
    2015
    Format
    Thesis
    Abstract
    Open source software is ubiquitous and meets most of the common software needs that developers come across. Many open source projects are considered better than their commercial equivalents because a larger pool of developers constantly improves them. However, one challenge of using open source is that the code must be analyzed manually to understand its dependencies, which is very time consuming, especially for larger projects. Hence, there is strong demand for an automated process that can analyze the code and build an accurate model of the open source software system. The objective of this thesis is to address this problem by building a framework that extracts features, identifies components and connectors in an open source project, and gives the user a way to search its functionality.
    The first step of this process is to extract the metadata and dependency information from the source code using a call graph. A call graph is a directed graph that represents the execution logic of the program and helps analyze the relationships between classes. The extracted data is then transformed using natural language processing (NLP) [15] techniques such as lemmatization. In the second step, the transformed data is semantically analyzed: features are extracted using Term Frequency-Inverse Document Frequency (TF-IDF), synonyms are detected using Word2Vec [3], and components are detected dynamically using machine learning. The dependency information extracted from the call graph is then used to identify the connectors between the detected components. It is also used to build a class dependency matrix, which in turn is used to identify dependency-based components. In the final step, an ontology is used to represent the features, components, connectors, and classes discovered in the previous step, together with the relationships between them. The generated ontology can be queried to search for functionality using the SPARQL [5] query language, and Protégé [4] is used to visualize it.
    The proposed solution is built on Spark, a parallel processing framework, and provides a fully automated and scalable model for representing the software. As a case study, we analyzed two open source projects, Apache Solr and Apache Lucene; Apache Solr is built on the Apache Lucene core library. The results of the Apache Solr analysis are compared with a manual evaluation of the software architecture by experts. We observed that 90% of the features identified in the manual analysis are recovered by the automated approach, and that many new features are discovered as well. This thesis also analyzes the dependencies between the components detected for Apache Solr and Apache Lucene. From this analysis of the two systems, we observed that Apache Solr is highly dependent on Apache Lucene.
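    The idea of turning call-graph edges into a class dependency matrix and then into component-level dependencies can be illustrated with a minimal Python sketch. This is not the thesis implementation, which runs on Spark and detects components with machine learning. The class names, the edge list, and the rule that a component is the first dotted segment of a class name are all assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical call-graph edges (caller class -> callee class).
# These names are illustrative only and are not taken from the thesis.
CALL_EDGES = [
    ("solr.SearchHandler", "solr.QueryParser"),
    ("solr.SearchHandler", "lucene.IndexSearcher"),
    ("solr.QueryParser", "lucene.Query"),
    ("solr.UpdateHandler", "lucene.IndexWriter"),
    ("lucene.IndexSearcher", "lucene.Query"),
    ("lucene.IndexWriter", "lucene.Document"),
]


def build_dependency_matrix(edges):
    """Return (classes, matrix); matrix[i][j] counts calls from class i to class j."""
    classes = sorted({name for edge in edges for name in edge})
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for caller, callee in edges:
        matrix[index[caller]][index[callee]] += 1
    return classes, matrix


def component_of(class_name):
    """Assumption: the component is the first dotted segment of the class name.

    The thesis detects components with machine learning; this prefix rule is
    only a stand-in so the example stays self-contained.
    """
    return class_name.split(".", 1)[0]


def component_dependencies(classes, matrix):
    """Aggregate class-level dependencies into cross-component call counts."""
    counts = defaultdict(int)
    for i, caller in enumerate(classes):
        for j, callee in enumerate(classes):
            src, dst = component_of(caller), component_of(callee)
            if matrix[i][j] and src != dst:
                counts[(src, dst)] += matrix[i][j]
    return dict(counts)


classes, matrix = build_dependency_matrix(CALL_EDGES)
deps = component_dependencies(classes, matrix)
print(deps)
```

    In this toy input, every cross-component call goes from the hypothetical `solr` classes to the `lucene` classes and none goes the other way, which is the kind of one-directional dependency the abstract reports between the two projects.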
    Table of Contents
    Introduction -- Background and related work -- Proposed framework -- Results and evaluation -- Conclusion and future work
    URI
    https://hdl.handle.net/10355/48352
    Degree
    M.S.
    Thesis Department
    Computer Science (UMKC)
    Collections
    • 2015 UMKC Theses - Freely Available Online
    • Computer Science and Electrical Engineering Electronic Theses and Dissertations (UMKC)

    hosted by University of Missouri Library Systems