Character Representation
The main issue in pattern recognition is to compare observations in order to classify models and determine their similarity. The classifier role in a pattern recognition system is to compare unknown observations and measure their similarity to models of known classes. The similarity measure may be a distance, cost or a probability, which is used todecide to which known class the observation belongs to. However, image pixel information has drawbacks and they are not usually applied directly on classifiers. The first aspect is related to the dimensionality curse, the higher the dimensionality of the information used in the classifier for leaming, the less general the models produced are. The second aspect is that the information in the image is vulnerable to rotation, translation and size changes.To overcome these problems we apply mathematical operations to transform the pixel information in the image to other type of information. This process is called feature extraction. Features can be extracted in different ways, Trier et al. presented in [22] asurvey on many feature extraction techniques. Bailey and Srinath presented in a feature extraction methodology based on orthogonal moments, applying it on handwritten digits. Gader discussed in [24] a methodology to extract features using zoning, experimenting also on handwritten digits. Oh and Suen proposed in [25] features based on distance measurements, applying them on handwritten characters. Heutte et al. discussed in [26] a feature set based on statistical and structural features, aiming at handwritten characters recognition.
Feature Subset Selection
Feature subset selection (FSS) aims to optimize the classification process, eliminating irrelevant features from the representation. Three aspects are changed by FSS. It first reduces the cost to extract features from unknown observations during classification. It also may improve classification accuracy, as correlated features may be eliminated. This is supported by the generalization power of smaller feature sets. The third aspect is that by reducing feature number we have a higher reliability on performance estimates. Kudo and Sklansky compared in [35] many techniques for FSS, including sequential forward selection (SFS) [36], sequential forward floating selection (SFFS) , branch and bound (BAB) and genetic algorithms. This work indicates that GA based techniques are suitable for FSS where a large number of features is considered, where large is a value higher than 50, and the algorithm aims to search for a compromise between dimensionality and classification accuracy. This statement is supported by the search mechanism inherent to GAs, which improves population by searching for good13 building blocks in an efficient way. Also, GAs can deal with problems with nonmonotonie functions to evaluate the feature set quality, as they are based on a random approach for initialization and the building blocks help to avoid local optimal solutions. If one compares the GA building blocks search mechanism with a sequential algorithm, such as SFS, this GA property for FSS becomes clear. A sequential method usually treats features one by one, like SFS that searches the feature that will better improve the discriminating power of the current set, until it cannot be further improved. The problem is that this does not address feature correlation in a way that given a set P, 1P 1= n, the best feature f selected for the set P’= P + {/} is not necessarily a good feature for the set P », IP »I = n + 2 . Sorne algorithms include the possibility to solve this problem byallowing the removal of a given number of features after sorne iterations if this improves the set, like SFFS, which allows this backtrack operation.However, this is not true for large feature sets. The backtrack operation may not be able to address completely this problem, as the number of features to manipulate at once may be overwhelming to the algorithm. On the other hand, GA is suitable for this kind of manipulation, as mating two different individuals will result in the inclusion and removal of many features at the same time. As this operation is made on many individuals, which are later selected and mated according to their fitness, i.e., how good the feature set they represent, GAs can find better solutions on large problems. This ability to produce subsets that are very different in just one pass allows GAs to perform better than algorithms like SFFS and SBFS.
Solution Over-Fit
Classifiers that are trained in severa! iterations, such as MLP classifiers, suffer from an effect known as over-fit. Due to over-fit, the classifier memorizes the training data set, instead of generalizing for unknown observations. To avoid the phenomena, a validation stage is performed using unknown observations.The same effect has been observed in the literature when optimizing classification approaches using wrapped classifiers. Thus, solutions in the approximation set are specialized to the optimization data set. Reunanen demonstrated solution over-fit to compare sequential and floating FSS algorithms in [46]. In this context, solutions are evolved based on the accuracy measured over an optimization data set. Sorne solutions have been proposed in the literature to avoid the over-fit phenomena during MOGA based classifier optimization. Loughrey and Cunningham discussed an early stoping approach to remove over-fit from solutions optimized by GAs [47] and simulated annealing [48]. Their iterative approach requires two stages. During the first stage, solutions are optimized and validated to determine at which generation solutions start to over-fit to the optimization data set, i.e., the generation where the algorithm has to stop earlier. In order to avoid solution over-fit, the second stage uses this information to optmize solutions until the previously calculated early stopping generation. Llorà et al. used in [49] an approach that analyzed the optimized Pareto-front to determine a rupture point. The rupture point divides the Pareto-front in two segments, one over-fitted segment, and another segment where solutions provide higher accuracy on unknown observations. The same approach was also used by Mansilla in [50]. Another known solution to this problem is related to a validation process. Instead of selecting solutions using the traditional approach, i.e. based on the accuracy calculated during the optimization process, a validation data set is used to select solutions in the optimized Pareto-front. This approach was used in [51, 9, 52] and provided better results than selecting solutions using the traditional approach.
Parallel MOGAs
To speedup the process and improve the solutions quality, studies have been made to use MOGAs in parallel environments. Three primary models are considered, the masterslave approach, the island mode! and the cellular mode!. These models employ concepts similar to those of parallel GAs. Cantu-paz presented in [64] a complete study on the issues of parallel GAs, where these three models are further detailed. The master-slave model approach [65] is suitable to speed up the search on parallel MOGAs. This is due to the fact that the algorithm does not change, it only assign the evaluation of the objective functions to slave nodes, while the genetic operations are made on a master node. Oliveira et al. in [30] used a master-slave parallel MOGA for feature subset selection. The parallel cellular model is interesting for machines featuring processor arrays. However, the high cost associated to these machines reduces the interest on this specifie research field, which focuses on specifie problems that require such processor organization.The island model, also called distributed mode!, have seen an increase on research effort after the Beowulf cluster computer boom. This type of cluster computer is based on inexpensive workstations that are connected through a local network to exchange information and complete a task. Cantu-Paz discussed in [64] the benefits of the island models, posing questions on the differences between their sequential counterparts. He also presented a survey of these algorithms in [66]. Hiroyasu et al. discuss in [67, 68] two different approaches for MOGAs based on island models. Zhu and Leung proposed in [69] an asynchronous MOGA with a self-adjustable objective space partitioning. A24 similar concept is developed by Streichert et al. in [70], using clustering to partition ther objective function space. Our interest lies on the master-slave model, which provides speedup to the optimization process, with no change to the optimization algorithm behavior
Intelligent Feature Extraction
Human experts are traditionally responsible for choosing the feature vector used in classification systems. This vector is most often determined using domain knowledge and domain context on a trial-and-error basis. This section details the Intelligent Feature Extraction (IFE) methodology, which uses the domain knowledge and domain context in an approach formulated as a MOOP to genetically evolve a candidate solution set. The goal of IFE is to help the human expert define representations (feature vectors) in the context of isolated handwritten symbols, using a wrapper approach with a fast training classifier. IFE models isolated handwritten symbols as features extracted from specifie foci of attention on images using zoning. This is a strategy known to provide better recognition results than the extraction of features from the whole image [28, 27, 76]. Two operators are used to generate representations with IFE: a zoning operator to define foci of attention over images, and a feature extraction operator to apply transformations in zones. The choice of transformations for the feature extraction operator constitutes the domain knowledge introduced by the human expert. To obtain representations for specifie PR problems, such as digits or uppercase letters, the domain context is introduced in the form of actual observations in the optimization data set used to evaluate and compare solutions. Hence, IFE optimizes the zoning operator to the selected feature extraction operator.
|
Table des matières
ABSTRACT
SOMMAIRE
RÉSUMÉ
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS
LIST OF ABREVIATIONS AND NOTATIONS
INTRODUCTION
CHAPTER 1 STATE OF THE ART
1.1 State of the Art in Pattern Recognition
1.2 Character Representation
1.1.1 Zoning
1.1.2 Isolated Character Recognition
1.1.3 Feature Subset Selection
1.1.4 Ensemble of Classifiers
1.1.5 Solution Over-Fit
1.1.6 State ofthe Art in Multi-Objective Genetic Algorithms
1.2.1 Definitions
1.2.2 Multi-Objective Genetic Algorithms
1.2.3 Parallel MOGAs
1.2.4 Performance Metrics
1.2.5 Discussion
CHAPTER 2 CLASSIFICATION SYSTEMS OPTIMIZATION
2.1 System Overview
2.2 Intelligent Feature Extraction
2.2.1 Zoning Operators
2.2.1.1 Divider Zoning Operator
2.2.1.2 Hierarchical Zoning Operator
2.2.2 Feature Extraction Operator
2.2.3 Candidate Solution Evaluation
2.3 Feature Subset Selection
2.4 EoC Optimization
2.5 Optimization Algorithm Requirements
2.6 Discussion
CHAPTER3 MULTI-OBJECTIVE OPTIMIZATION MEMETIC ALGORITHM
3.1 Algorithm Concepts
3.2 Algorithm Discussion
3.3 MOMA Validation Tests with the IFE
3.4 Algorithmic Parameters
3.5 Validation and Tests
3.3.1 IFE Test Results
3.3.2 Discussion
CHAPTER 4 VALIDATION STRATEGIES TO CONTROL SOLUTION
4.1 Adapting Optimization Algorithms
4.2 OVER-FIT
CHAPTER5 STOPPING CRITERION
5.1 Concepts
5.2 Stopping MOGAs
5.3 Discussion
CHAPTER 6 VALIDATION EXPERIMENTS ON ISOLATED HANDWRITTEN DIGITS
6.1 Experimental Protocol
6.2 Divider Zoning Experimental Results
6.2.1 IFE and EoC Experimental Results
6.2.2 FSS Experimental Results
6.2.3 Stopping Criterion Experimental Results
6.2.4 Experimental Results Discussion
6.3 Hierarchical Zoning Experimental Results
6.3.1 IFE and EoC Experimental Results
6.3.2 FSS Experimental Results
6.3.3 Stopping Criterion Experimental Results
6.3.4 Experimental Results Discussion
6.4 Discussion
CHAPTER 7 EXPERIMENTS ON ISOLATED HANDWRITTEN
7.1 UPPERCASE LETTERS
7.2 Experimental Protocol
7.2.1 Experimental Results
7.2.2 IFE and EoC Experimental Results
7.2.3 FSS Experimental Results
7.3 Stopping Criterion Experimental Results
Discussion
CONCLUSIONS
APPENDIX 1 Fast elitist non-dominated sorting genetic algorithm- NSGA II
APPENDIX 2 Statistical analysis
APPENDIX 3 Rejection strategies
BIBLIOGRAPHY
Télécharger le rapport complet