SAF-DL: Semantic Analysis Framework for Deep Learning Open Source Projects
Many open source projects are available on the internet. In particular, growing interest in Deep Learning (DL) has increased the number of DL open source projects. This project is motivated by the opportunity to reuse existing projects, either to develop new, innovative projects or to create more refined versions of existing ones. In addition, these projects can guide software developers and students toward effective programming in their own DL projects. The challenge is how to analyze the functionalities or features described in the source code of these projects. The semantics of the source code are not easy to understand because the dependencies are deeply intertwined. As the complexity and scale of the projects grow, manually analyzing the workflow and semantics of these open source projects does not scale. This thesis proposes a semantic analytics framework, called SAF-DL, that aims (i) to analyze the sequences of operations in a given open source project and build a graph model, known as a call graph; (ii) to cluster similar functional paths in the call graphs using Machine Learning algorithms; (iii) to find the abstractions (clusters) of the function flows; (iv) to identify the semantics of the function flows; and (v) to discover the workflow by analyzing the dependencies and similarities between functional paths and between projects. The SAF-DL pipeline, which transforms source code into a semantic workflow model, was designed with Machine Learning and NLP techniques. In this thesis, Python/TensorFlow/Keras-based open source projects hosted on GitHub are analyzed. A comparative analysis of models was used to evaluate how effectively the SAF-DL framework discovers code abstractions and workflows. The SAF-DL framework was implemented in Python with scikit-learn and tested using three open source projects.
This thesis demonstrates that the SAF-DL framework can be used in various applications, such as search or retrieval of open source projects, source-code-to-source-code plagiarism detection, and automatic code or test case generation.
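To make the first stage of the pipeline more concrete, the following is a minimal sketch of how a call graph and its functional paths might be extracted from Python source. It is not the thesis's implementation: the helper names `build_call_graph` and `call_paths`, the `SAMPLE` snippet, and the choice of Python's standard `ast` module are all assumptions made for illustration, and calls are matched by name only, without resolving imports or classes.

```python
import ast
from collections import defaultdict

# Hypothetical toy input standing in for a DL project's source file.
SAMPLE = '''
def load_data():
    return read_csv("train.csv")

def build_model():
    return Sequential()

def train():
    data = load_data()
    model = build_model()
    model.fit(data)
'''

def build_call_graph(source):
    """Map each top-level function name to the names it calls directly."""
    tree = ast.parse(source)
    graph = defaultdict(set)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # register functions that make no calls
            for child in ast.walk(node):
                if isinstance(child, ast.Call):
                    func = child.func
                    if isinstance(func, ast.Name):         # e.g. load_data()
                        graph[node.name].add(func.id)
                    elif isinstance(func, ast.Attribute):  # e.g. model.fit()
                        graph[node.name].add(func.attr)
    return dict(graph)

def call_paths(graph, start, path=None):
    """Enumerate acyclic call paths from `start` down to leaf calls."""
    path = (path or []) + [start]
    # Skip callees already on the path so recursion cannot loop forever.
    callees = sorted(c for c in graph.get(start, ()) if c not in path)
    if not callees:
        return [path]
    paths = []
    for callee in callees:
        paths.extend(call_paths(graph, callee, path))
    return paths
```

Calling `call_paths(build_call_graph(SAMPLE), "train")` yields one list of function names per path. Paths of this kind could then be vectorized and passed to a clustering algorithm, which corresponds to step (ii) of the abstract; that stage is not shown here.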
Table of Contents
Introduction -- Background and related work -- Proposed framework -- Results and evaluation -- Applications -- Conclusion and future work