Detecting model refactoring opportunities using heuristic search

Model Driven Engineering (MDE) is an approach to software development in which software is specified, designed, implemented and deployed through a series of models (Bull, 2008). Hence, building appropriate models, evolving them and maintaining their quality are key activities when implementing an MDE approach.

Model maintenance covers the modifications made to a model in order to improve its quality, add new functionality, and so on (Brown et al., 1998). This effort accounts for a large share of the time and money spent on a project. It is therefore important to propose automated solutions for improving model quality.

Different automated maintenance solutions have been proposed in the literature (Kessentini et al., 2010; Khomh et al., 2009; Liu et al., 2009; Marinescu, 2004; Moha et al., 2010). The majority of these works are concerned with the detection and correction of bad design fragments, called design defects or refactoring opportunities (Fowler and Beck, 1999). Examples of such defects include large classes in UML models and long parameter lists. Detecting and fixing design defects is, to some extent, a difficult, time-consuming, and manual process (Fowler and Beck, 1999).

To ensure the detection of design defects, several approaches have already been proposed (Khomh et al., 2009; Liu et al., 2009; Marinescu, 2004). Most of these studies are based on declarative rule definition. The rules are manually defined to identify the symptoms that characterize a defect, and the symptoms are described using metrics, structural information, and/or lexical information. For example, a large class shows symptoms such as a high number of attributes, relations and methods, all of which can be expressed using quantitative metrics. However, in an exhaustive scenario, the number of possible defects to be manually characterized with rules can be very large. Moreover, for each defect, rules expressed as metric combinations require substantial calibration effort to find the right threshold for each metric, that is, the value above which a defect is said to be detected.
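A manually written, threshold-based rule of the kind described above can be sketched as follows. This is only an illustration: the `ClassMetrics` container, the `is_large_class` function and every threshold value are assumptions made for the example, not rules taken from the cited approaches.

```python
from dataclasses import dataclass


@dataclass
class ClassMetrics:
    """Metric values measured for one class (hypothetical container)."""
    name: str
    attributes: int
    methods: int
    relations: int


# Hypothetical thresholds: these are the values an expert would have to calibrate.
THRESHOLDS = {"attributes": 15, "methods": 20, "relations": 10}


def is_large_class(m: ClassMetrics, thresholds=THRESHOLDS) -> bool:
    """Flag a class when every metric exceeds its threshold."""
    return all(getattr(m, metric) > limit for metric, limit in thresholds.items())


order_manager = ClassMetrics("OrderManager", attributes=22, methods=35, relations=12)
address = ClassMetrics("Address", attributes=5, methods=6, relations=1)

print(is_large_class(order_manager))  # True
print(is_large_class(address))        # False
```

Changing any single value in `THRESHOLDS` can change which classes are flagged, which is why calibration is costly when many defect types and metrics are involved.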

In addition, many companies maintain defect repositories in which defects found in projects under development are manually identified, corrected and documented. Although it is available, this valuable knowledge is not used to mine regularities about defect manifestations. Such regularities could be exploited both to detect defects and to correct them.

Starting from this observation, we propose in this paper an approach that overcomes some of the above-mentioned limitations. Our approach is based on the use of defect examples, which are generally available in the defect repositories of software development companies. We translate the regularities found in such defect examples into detection rules. Instead of specifying rules manually for each defect type, or semi-automatically using defect definitions, we extract these rules from instances of design defects. This is achieved using Genetic Programming (GP). The proposal has three benefits: it does not require the different defect types to be defined, only some defect examples to be available; it does not require an expert to write rules manually; and it does not require the metrics to use, or their threshold values, to be specified.
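To make the idea of learning rules from examples more concrete, the sketch below shows how one candidate rule could be scored against labelled defect examples. It covers only this scoring step, not the genetic search itself. The tuple-based rule encoding, the metric names, the sample data and the accuracy-style score are all assumptions made for illustration; the text above does not specify the fitness function actually used.

```python
# A candidate rule is a small expression tree (an assumed encoding):
#   (">", metric_name, threshold), ("AND", left, right) or ("OR", left, right)
def evaluate(rule, metrics):
    """Return True when the rule flags the class described by `metrics`."""
    op = rule[0]
    if op == ">":
        return metrics[rule[1]] > rule[2]
    if op == "AND":
        return evaluate(rule[1], metrics) and evaluate(rule[2], metrics)
    if op == "OR":
        return evaluate(rule[1], metrics) or evaluate(rule[2], metrics)
    raise ValueError(f"unknown operator: {op}")


def fitness(rule, examples):
    """Fraction of labelled examples that the rule classifies correctly."""
    hits = sum(evaluate(rule, m) == is_defect for m, is_defect in examples)
    return hits / len(examples)


# Hypothetical examples: (metric values of a class, is it a known defect?)
examples = [
    ({"attributes": 25, "methods": 40}, True),
    ({"attributes": 18, "methods": 30}, True),
    ({"attributes": 4, "methods": 6}, False),
    ({"attributes": 20, "methods": 3}, False),
]

rule_a = (">", "attributes", 10)
rule_b = ("AND", (">", "attributes", 10), (">", "methods", 15))

print(fitness(rule_a, examples))  # 0.75
print(fitness(rule_b, examples))  # 1.0
```

A search procedure such as GP would generate and recombine many such trees, keeping those with the best score, so that neither the metrics nor the thresholds have to be chosen by hand.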

Basic concepts

To better understand our contribution, it is important to clearly define the concepts relevant to our proposal, including design defects and software metrics.

Design defects

In this paper, we focus on detecting a specific type of refactoring opportunity for improving model quality: design defects. Design defects, also called design anomalies, refer to design situations that adversely affect the development of models (Brown et al., 1998). Different types of defects, presenting a variety of symptoms, have been studied with the intent of facilitating their detection and suggesting improvement solutions.

Fowler and Beck (1999) define 22 sets of symptoms of common defects. These include large classes, feature envy, long parameter lists, and lazy classes. Each defect type is accompanied by refactoring suggestions to remove it. Brown et al. (1998) define another category of design defects documented in the literature, named anti-patterns.

In our approach, we focus on detecting defects that can appear at the model level, especially in class diagrams. We choose from (Fowler and Beck, 1999) three important defects that can be detected at the model level: 1) Blob, which is found in designs where one large class monopolizes the behavior of a system (or part of it) and the other classes primarily encapsulate data; 2) Functional decomposition, which occurs when a class is designed with the intent of performing a single function, and is typically found in models (class diagrams) produced by inexperienced object-oriented developers; 3) Poor usage of abstract classes, which occurs when abstract classes are not widely used in the application design.

Quality metrics

Quality metrics provide useful information that helps assess the level of conformance of a software system to a desired quality such as evolvability or reusability. Metrics can also help detect some design defects in software systems. The most widely used metrics are those defined by Genero et al. (2002). For our defect detection process, we select from these metrics only those that can be calculated on models (class diagrams). These metrics include:
1. Number of associations (Naccoc): the total number of associations.
2. Number of aggregations (Nagg): the total number of aggregation relationships.
3. Number of dependencies (Ndep): the total number of dependency relationships.
4. Number of generalizations (Ngen): the total number of generalization relationships (each parent-child pair in a generalization relationship).
5. Number of aggregation hierarchies (NAggH): the total number of aggregation hierarchies.
6. Number of generalization hierarchies (NGenH): the total number of generalization hierarchies.
7. Maximum DIT (MaxDIT): the maximum of the DIT (Depth of Inheritance Tree) values over all classes in a class diagram. The DIT value of a class within a generalization hierarchy is the longest path from the class to the root of the hierarchy.
8. Number of attributes (NA): the total number of attributes.
9. Number of methods (LOCMETHOD): the total number of methods.
10. Number of private attributes (NPRIVFIELD): the number of private attributes in a specific class.
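As an illustration, two of the inheritance-related metrics above, Ngen and MaxDIT, could be computed on a toy class diagram as sketched below. The `parents` mapping, the class names and the restriction to single inheritance are assumptions made to keep the example short; with multiple inheritance, DIT would have to take the longest of several paths.

```python
# Each class maps to its parent, or to None for a root class.
# Single inheritance is assumed, so the path to the root is unique.
parents = {
    "Vehicle": None,
    "Car": "Vehicle",
    "Truck": "Vehicle",
    "SportsCar": "Car",
    "Invoice": None,
}


def dit(cls, parents):
    """Depth of Inheritance Tree: number of edges from cls up to its root."""
    depth = 0
    while parents[cls] is not None:
        cls = parents[cls]
        depth += 1
    return depth


def ngen(parents):
    """Ngen: one generalization per parent-child pair."""
    return sum(1 for parent in parents.values() if parent is not None)


def max_dit(parents):
    """MaxDIT: the largest DIT value over all classes in the diagram."""
    return max(dit(cls, parents) for cls in parents)


print(ngen(parents))     # 3
print(max_dit(parents))  # 2
```

Here SportsCar reaches the root Vehicle through Car, which gives the maximum depth of 2, while the three parent-child pairs give Ngen = 3.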

Our detection solution selects, from this list, the combination of metrics that best detects the different defect types. In the next section, we describe the specific problems addressed by our detection approach.

Table of contents

INTRODUCTION
CHAPTER 1 LITERATURE REVIEW
1.1 Basic concepts
1.1.1 Design defect
1.1.2 Refactoring
1.2 Detection of defects
1.2.1 Detection in source code level
1.2.2 Detection in model level
1.3 Synthesis on detection
1.4 Correction of design defects
1.4.1 Traditional approaches to software refactoring
1.4.2 Search-based software refactoring approaches
1.5 Synthesis on correction
1.6 Limitations of existing works
CHAPTER 2 DETECTING MODEL REFACTORING OPPORTUNITIES USING HEURISTIC SEARCH
2.1 Introduction
2.2 Basic concepts
2.2.1 Design defects
2.2.2 Quality metrics
2.3 Problem Statement
2.4 Heuristic Search for Model Refactoring
2.4.1 Overview
2.4.2 Heuristic Search Using Genetic Programming
2.4.3 Heuristic Search Adaptation
2.4.3.1 Individual Representation
2.4.3.2 Generation of an Initial population
2.4.3.3 Genetic Operators
2.4.3.4 Decoding of an Individual
2.5 Validation
2.5.1 Experimental settings
2.5.2 Results
2.6 Related Work
2.7 Conclusion
CHAPTER 3 A DESIGN DEFECT EXAMPLE IS WORTH A DOZEN DETECTION RULES
3.1 Introduction
3.2 Background and Problem Statement
3.2.1 Design Defects
3.2.2 Software Metrics
3.2.3 Problem Statement
3.3 A Search Based Approach to Detecting Design Defects
3.3.1 Adaptation of the Genetic Algorithm to Design Defects Detection
3.3.2 Individual representation
3.3.3 Genetic Operators
3.3.3.1 Selection
3.3.3.2 Crossover
3.3.3.3 Mutation
3.3.4 Fitness function
3.4 Validation of the Approach
3.4.1 Research Questions
3.4.2 Experimental Setup
3.4.3 Results and discussion
3.5 Related work
3.6 Conclusion
CHAPTER 4 MODEL REFACTORING USING EXAMPLES: A SEARCH BASED APPROACH
4.1 Introduction
4.2 Basic concepts
4.2.1 Model refactorings
4.2.2 Quality Metrics
4.2.3 Heuristic search
4.3 A heuristic search approach to model refactoring
4.3.1 Overview of the Approach
4.3.2 Adaptation of the genetic algorithm to model refactoring
4.3.3 Individual Representation
4.3.4 Genetic Operators
4.3.5 Decoding of an Individual
4.4 Implementation and experimental settings
4.4.1 Supporting tool
4.4.2 Research questions
4.4.3 Selected projects for the analysis
4.4.4 Measures of precision and recall
4.5 Results and discussion
4.5.1 Precision and recall
4.5.2 Stability
4.5.3 Effectiveness of our approach
4.5.4 Threats to validity
4.6 Related work
4.7 Conclusion and future work
CHAPTER 5 MODEL REFACTORING USING INTERACTIVE GENETIC ALGORITHM
5.1 Introduction
5.2 Background
5.2.1 Class diagrams refactorings and quality metrics
5.2.2 Interactive Genetic Algorithm (IGA)
5.2.3 Related work
5.3 Heuristic Search Using Interactive Genetic Algorithm
5.3.1 Interactive Genetic Algorithm adaptation
5.3.2 Representing an individual and generating the Initial Population
5.3.3 Evaluating an individual within the Classic GA
5.3.4 Collecting and Integrating the Feedbacks from Designers
5.4 Experiments
5.4.1 Supporting Tool and Experimental Setup
5.4.2 Results and discussions
5.4.3 Threats to Validity
5.5 Conclusion and Future Work
CHAPTER 6 EXAMPLE-BASED MODEL REFACTORING USING MULTI-OBJECTIVE OPTIMIZATION
6.1 Introduction
6.2 Model Refactoring using multi Objective optimization
6.2.1 Approach Overview
6.2.2 NSGA-II for Model refactoring
6.2.2.1 NSGA-II overview
6.2.2.2 NSGA-II adaptation
6.3 Experimentations with the approach
6.3.1 Supporting Tools
6.3.2 Research questions
6.3.3 Experimental Setup
6.3.4 Results and discussion
6.4 Related Work
6.5 Conclusion
CONCLUSION