A Semantic Approach for Automatic Recovery of Software Architecture
Format
Thesis
Abstract
Open source projects continue to grow in popularity, and a number of them have come to play an important role in current software development. In practice, however, developers get limited assistance in searching and reusing open source software systems. The limitation is primarily due to the lack of an automatic approach to recovering architecture models from source code. In particular, the increasing size of most open source systems makes manual recovery of the architecture a challenge. Thus, there is a strong demand for an automatic approach to model building. The recovered model can subsequently let users search through large amounts of source code. This thesis presents a semantic approach to automatically recovering the architecture from a source code repository. It leverages information such as the functional similarity between code entities (e.g., classes) and applies a machine learning technique to cluster source code into architecture components, an essential activity in architecture recovery. Specifically, the approach includes three steps: feature extraction, component clustering, and architecture refinement. 1) Feature extraction analyzes source code, identifies functional specifications (i.e., features) from the metadata (e.g., names) of code entities, and creates a model that captures the significance (e.g., frequency) of each feature in a specific code entity. 2) Component clustering examines the model generated by feature extraction and applies K-Means clustering, a machine learning technique, to group similar code entities into architecture components. The similarity is calculated from the features to which the code entities are related. 3) Architecture refinement further modifies the recovered architecture based on the degree to which the extracted components interact. During this step, components may be merged or split.
The overall goal of the approach is to reduce the cost and increase the accuracy of recovering software architecture during software development. We applied the approach to recover the architecture of the Hadoop Distributed File System as a case study.
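As a rough illustration of the first two steps described in the abstract, the sketch below extracts word features from entity names, builds frequency vectors, and groups them with K-Means. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the camel-case splitting rule, the farthest-point seeding, and the sample class names are all hypothetical choices made for this example. The refinement step is left out because the abstract does not say how component interaction is measured.

```python
import math
import re
from collections import Counter


def extract_features(name):
    """Split an identifier such as 'BlockReader' into lowercase word counts."""
    # Assumption: features are the camel-case words of the entity name.
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return Counter(word.lower() for word in words)


def build_vectors(names):
    """Return (vocabulary, one term-frequency vector per entity)."""
    counts = [extract_features(name) for name in names]
    vocabulary = sorted(set().union(*counts))
    return vocabulary, [[c[term] for term in vocabulary] for c in counts]


def kmeans(vectors, k, iterations=20):
    """Plain K-Means with Euclidean distance; returns one label per vector."""
    # Assumption: farthest-point seeding, used only to keep the sketch deterministic.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        centroids.append(list(max(
            vectors, key=lambda v: min(math.dist(v, c) for c in centroids))))
    labels = []
    for _ in range(iterations):
        # Assignment step: each entity joins its nearest centroid.
        labels = [min(range(k), key=lambda i: math.dist(v, centroids[i]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [v for v, label in zip(vectors, labels) if label == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Illustrative entity names only; they are not taken from the thesis's data set.
names = ["BlockReader", "BlockWriter", "BlockScanner",
         "NameNodeRpcServer", "NameNodeHttpServer", "NameNodeServer"]
vocabulary, vectors = build_vectors(names)
labels = kmeans(vectors, k=2)

components = {}
for name, label in zip(names, labels):
    components.setdefault(label, []).append(name)
print(components)
```

In this toy input the names that share the word "block" end up in one component and those sharing "name", "node", and "server" in the other. A real recovery would need a richer feature model and a principled choice of k.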
Table of Contents
Introduction -- Background and related work -- Proposed model -- Features extraction -- Clustering for software architecture -- Refinement -- Evaluation and results -- Conclusion and future work
Degree
M.S.
