Statistical word recognition system 

State-of-the-art word recognition systems are based on a statistical framework that models the sequential behavior of the handwriting process. More precisely, the input image is decomposed into vertical frames, which are fed sequentially to the WRS, and visual features are extracted from each frame. The WRS then selects the most probable word from the given lexicon: its goal is to find the word ŵ that maximizes P(w|O), i.e. the probability of the word w given the input sequence of features O = {o1, o2, …, oT}. Using Bayes' theorem, this probability can be written in multiple forms:

ŵ = argmax_w P(w|O) = argmax_w P(O|w) P(w) / P(O) = argmax_w P(O|w) P(w),

where P(O) can be dropped because it does not depend on w. In the last line, P(w) represents the prior probability of a given word w based on the language model. In this work, all words belonging to the lexicon are given the same prior probability, because we limit ourselves to single-word recognition. Optionally, a lexicon reduction module can be added to dynamically select specific word hypotheses based on the query word image, in order to improve the recognition rate and/or the processing speed. In the following, we detail the two most competitive models for handwriting recognition, namely hidden Markov models (HMM) and recurrent neural networks (RNN).
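As an illustration, the word selection step can be sketched as an argmax over lexicon scores. This is a minimal sketch under our own naming: the function and its input format are hypothetical, and a real WRS would compute P(O|w) with an HMM or RNN rather than receive it as a dictionary.

```python
import math

def recognize(lexicon_scores, priors=None):
    """Select the most probable word: argmax_w P(O|w) * P(w).

    lexicon_scores: dict mapping each lexicon word w to its
    likelihood P(O|w) produced by the recognizer.
    priors: optional dict of word priors P(w); a uniform prior
    (as in single-word recognition) is used when omitted.
    """
    if priors is None:
        uniform = 1.0 / len(lexicon_scores)
        priors = {w: uniform for w in lexicon_scores}
    # Work in log space to avoid underflow on long feature sequences.
    return max(lexicon_scores,
               key=lambda w: math.log(lexicon_scores[w]) + math.log(priors[w]))
```

With uniform priors the decision reduces to the maximum-likelihood word, which is why the language model only matters once multi-word priors are introduced.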

Hidden Markov models

The hidden Markov model (HMM) is a statistical model for sequential data (Fink, 2008). HMMs can both replicate the generation process of the data and segment it into meaningful units. They describe a doubly stochastic process. The first stage is discrete: a random variable models the state of a system through time, taking values from a finite set of states. The probability of the future state depends only on its immediate predecessor; hence there is only a first-order dependency. In the second stage, an emission is generated at every time step, whose probability distribution depends only on the current state. The model is called a 'hidden' Markov model because the states are not directly observable.
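The likelihood of an observation sequence under such a doubly stochastic process is computed by the classical forward algorithm, sketched below for a discrete-emission HMM. The variable names and input format are ours, chosen for illustration.

```python
def forward_likelihood(pi, A, B, obs):
    """Forward algorithm: P(O | model) for a discrete first-order HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
              (first-order dependency: next state depends only on current)
    B[i][o] : probability that state i emits symbol o
    obs     : observed symbol sequence (the states themselves stay hidden)
    """
    n = len(pi)
    # alpha[i] = P(o1..ot, state_t = i), initialized with the first symbol
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # Propagate one time step: sum over predecessor states, then emit.
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)
```

Summing over all state paths in this way is what lets HMMs be trained and evaluated on unsegmented sequences.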

Recurrent neural networks

A recurrent neural network (RNN) is a class of neural networks (NN) in which connections between neurons form directed cycles. This allows the model to keep a 'memory' of its previous state and therefore to make use of past context. This ability is important for the task of handwriting recognition, where context plays an important role. Moreover, like most neural networks, this model is discriminative, unlike standard HMMs, which are generative; it therefore outperforms HMMs in many recognition applications. The current state-of-the-art method in most handwriting recognition tasks is based on the combination of the long short-term memory (LSTM) layer (Gers et al., 2003) and the connectionist temporal classification (CTC) layer (Graves et al., 2009).

The LSTM layer is made of nodes with a specific architecture called memory blocks, which are able to preserve contextual information over long time ranges. Each memory block contains a memory cell, and its interaction with the rest of the network is controlled by three multiplicative gates, namely: an input gate, an output gate and a forget gate. For example, if the input gate is closed, the block input has no influence on the memory cell. Similarly, the output gate has to be open for the rest of the network to access the cell activation. The forget gate scales the recurrent connection of the cell. The gates' behavior is controlled by the rest of the network.
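The gating mechanism can be sketched for a single memory block as follows. This is an illustrative single-unit step: the weight names in `w` are hypothetical and bias terms are omitted for brevity.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a single-unit LSTM memory block (illustrative sketch).

    The three multiplicative gates control the memory cell c: the input
    gate scales what enters the cell, the forget gate scales the
    recurrent connection of the cell, and the output gate controls what
    the rest of the network sees.
    """
    i = sigmoid(w["ix"] * x + w["ih"] * h_prev)    # input gate
    f = sigmoid(w["fx"] * x + w["fh"] * h_prev)    # forget gate
    o = sigmoid(w["ox"] * x + w["oh"] * h_prev)    # output gate
    g = math.tanh(w["gx"] * x + w["gh"] * h_prev)  # block input
    c = f * c_prev + i * g                         # cell state update
    h = o * math.tanh(c)                           # block output
    return h, c
```

Driving the input gate toward zero while keeping the forget gate open leaves the cell state essentially untouched, which is how a block can preserve information over a long range of time steps.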

For the specific task of handwriting recognition, both 'past' and 'future' context are needed for better performance. Therefore, a bidirectional LSTM (BLSTM) layer is used, in which one LSTM layer processes the sequence in the forward direction while another processes it in the backward direction.

The connectionist temporal classification (CTC) layer is then plugged at the output of the BLSTM layer. The CTC layer has been designed for sequence labeling tasks. It is trained to predict the probability P(w|O) of an output character sequence, i.e., a word w, given an input sequence O, making the training discriminative. Its activation function provides the probability of observing each character at each time step of the sequence. One notable feature of CTC is that, like HMMs, it can be trained with unsegmented data.
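A common way to read a character sequence out of the per-time-step CTC activations is greedy best-path decoding, sketched below. This is an illustration only: it assumes the usual CTC 'blank' symbol, and CTC training itself relies on a forward-backward procedure not shown here.

```python
def ctc_best_path(frame_probs, alphabet, blank="-"):
    """Greedy (best-path) CTC decoding sketch.

    frame_probs: one probability distribution over the alphabet per
    time step (e.g. one per vertical frame), as output by a CTC layer.
    The label sequence is obtained by taking the most probable symbol
    at each step, collapsing consecutive repeats, then removing blanks,
    which is how CTC aligns unsegmented frame sequences with strings.
    """
    path = [max(range(len(alphabet)), key=lambda k: p[k]) for p in frame_probs]
    out, prev = [], None
    for k in path:
        if k != prev and alphabet[k] != blank:
            out.append(alphabet[k])
        prev = k
    return "".join(out)
```

The blank symbol is what allows genuinely repeated characters (e.g. 'aa') to survive the collapsing step, since a blank frame separates the two identical labels.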

Features and strategies for lexicon reduction

Lexicon reduction is a high-level task in which word hypotheses are pruned from the lexicon. As it is used as a pre-processing step before the actual recognition, it must have a low computational overhead. Therefore, most methods rely on high-level features to make fast decisions. In the following, lexicon reduction (LR) approaches are detailed for Latin and Arabic scripts, as well as for specific document types.

Latin script

Lexicon reduction can be performed by comparing the optical shape of the query word to those of the lexicon words in order to improve recognition speed. When the word's optical shape is used, the simplest criterion for lexicon reduction, yet still an effective one, is word length, as it easily discriminates between long and short words. More refined knowledge about the word's shape can also be used. Zimmermann and Mao (1999) propose the concept of key characters, which are characters that can be accurately identified without a full contextual analysis. Character-class-specific geometrical properties are used, such as the average number of horizontal transitions, the normalized vertical position and the normalized height. Lexicon reduction is performed by keeping only the lexicon entries that match the regular expression generated by the key characters. They also estimate the letter count of a word using a neural network for further reduction. A similar approach is proposed by Palla et al. (2004), where regular expressions are built from the detection of ascenders and descenders in the query word image.
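The key-character idea can be sketched as follows. The regular-expression form and the input format are our assumptions for illustration, not the exact formulation of Zimmermann and Mao (1999).

```python
import re

def reduce_lexicon(key_chars, length_range, lexicon):
    """Key-character lexicon reduction sketch.

    key_chars: characters recognized with high confidence, in reading
    order; unknown characters may occur between and around them.
    length_range: (min, max) word length from a letter-count estimator.
    Entries matching both the regular expression built from the key
    characters and the length estimate form the reduced lexicon.
    """
    lo, hi = length_range
    # e.g. key characters ['a', 't'] yield the pattern 'a.*t'
    pattern = re.compile(".*".join(map(re.escape, key_chars)))
    return [w for w in lexicon
            if lo <= len(w) <= hi and pattern.search(w)]
```

Both filters are cheap string operations, which is consistent with the requirement that lexicon reduction add little overhead before the actual recognition.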

Bertolami et al. (2008) propose mapping each character of a word to a shape code. There are fewer shape codes than characters, as they only discriminate between characters based on their ascenders/descenders and basic geometry and topology. The mapping is performed by a hidden Markov model (HMM), which outputs the n best shape-code sequences for a query word. The lexicon is reduced by keeping only the words that correspond to one of these shape-code sequences. Kaufmann et al. (1997) propose a holistic approach using the quantized feature vectors extracted sequentially from the word image. These vectors are already used by the HMM recognizer, so their extraction adds no overhead. A model is created for each class of the lexicon, and the word hypotheses are ranked according to the distance between their models and the features of the query word. Several other holistic approaches to lexicon reduction extract a string-based descriptor for each shape, which is then matched using dynamic programming; the lexicon entries with the smallest edit distances form the reduced lexicon. The holistic approach of Madhvanath et al. (2001) is based on descriptors of downward pen strokes. These pen strokes are extracted from the word shape using a set of heuristic rules and categorized according to their positions relative to the baseline. Lexicon reduction is then performed by matching the word descriptors against the ideal descriptors derived from the lexicon's ASCII strings. Carbonnel and Anquetil (2004) compared two lexicon-reduction strategies, one based on lexicon indexing and the other on lexicon clustering; using ascender/descender-based shape descriptors, the indexing approach showed better performance.
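The combination of shape codes with dynamic-programming string matching can be sketched as follows. The shape-code inventory here is a deliberate simplification (Bertolami et al. (2008) derive their codes with an HMM), so the mapping and ranking below are illustrative only.

```python
def shape_code(word):
    """Map a word to a coarse shape-code string (illustrative inventory):
    'A' = letter with ascender, 'D' = letter with descender, 'x' = small.
    """
    ascenders, descenders = set("bdfhklt"), set("gjpqy")
    return "".join("A" if c in ascenders else
                   "D" if c in descenders else "x" for c in word)

def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def reduce_by_shape(query_code, lexicon, k):
    """Keep the k lexicon entries whose shape codes are closest
    (smallest edit distance) to the query's shape-code sequence."""
    return sorted(lexicon,
                  key=lambda w: edit_distance(query_code, shape_code(w)))[:k]
```

Because many characters share a shape code, the reduced lexicon typically still contains several visually similar candidates, which the full recognizer then disambiguates.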


Table of contents

INTRODUCTION
0.1 Handwriting recognition for paper-based documents
0.2 Document scripts
0.3 Problem statement
0.3.1 Text visual appearance
0.3.2 Simulating the human reading process
0.4 Contributions
0.5 Context of the thesis
0.6 Outline of the thesis
CHAPTER 1 LITERATURE REVIEW 
1.1 Statistical word recognition system
1.1.1 Hidden Markov models
1.1.2 Recurrent neural networks
1.2 Features and strategies for lexicon reduction
1.2.1 Latin script
1.2.2 Arabic script
1.2.3 Specific document knowledge
1.3 Features for sequential word recognition
1.3.1 Distribution features
1.3.2 Concavity features
1.3.3 Visual-descriptor-based features
1.3.4 Automatically learned features
1.4 Current limitations
1.4.1 Limitation 1: Lack of descriptors tailored for Arabic script lexicon reduction
1.4.2 Limitation 2: Lack of methods to identify relevant features for handwriting recognition
CHAPTER 2 GENERAL METHODOLOGY 
2.1 Research objectives
2.1.1 Objective 1: to design a descriptor for Arabic subword shape with application to LR
2.1.2 Objective 2: to efficiently embed all Arabic word features into a descriptor with application to LR
2.1.3 Objective 3: to efficiently evaluate features for the task of handwriting recognition
2.2 General approach
2.2.1 Descriptor design for Arabic lexicon reduction
2.3 Feature evaluation for handwriting recognition
CHAPTER 3 ARTICLE I – W-TSV: WEIGHTED TOPOLOGICAL SIGNATURE VECTOR FOR LEXICON REDUCTION IN HANDWRITTEN ARABIC DOCUMENTS
3.1 Introduction
3.2 Features of ancient and modern Arabic documents for lexicon reduction
3.3 Related works
3.4 Weighted topological signature vector (W-TSV)
3.4.1 Background
3.4.2 Generalization to weighted DAG
3.4.3 Stability and robustness of the W-TSV
3.4.4 Proposed fast computation
3.5 Proposed Arabic subword graph representation
3.6 Experiments
3.6.1 Databases
3.6.2 Experimental protocol
3.6.3 Results and discussion
3.6.4 Comparison with other methods
3.7 Conclusion
3.8 Acknowledgments
3.9 Appendix – Archigraphemic subword shape classifier
CHAPTER 4 ARTICLE II – ARABIC WORD DESCRIPTOR FOR HANDWRITTEN WORD INDEXING AND LEXICON REDUCTION
4.1 Introduction
4.2 Pixel descriptor
4.2.1 Pattern filters and pixel descriptor formation
4.2.2 Structural interpretation
4.3 Structural descriptor
4.4 Arabic Word Descriptor
4.5 Lexicon reduction system
4.5.1 System overview
4.5.2 Performance measure
4.6 Experiments
4.6.1 Databases
4.6.2 Experimental protocol
4.6.3 Lexicon reduction performance
4.6.4 Analysis of the AWD formation steps
4.6.5 Combination with a holistic word recognition system
4.6.6 Combination with an analytic word recognition system
4.6.7 Comparison with other methods
4.7 Conclusion
4.8 Acknowledgments
CHAPTER 5 ARTICLE III – FEATURE EVALUATION FOR OFFLINE HANDWRITING RECOGNITION USING SUPERVISED SYSTEM WEIGHTING 
5.1 Introduction
5.2 Related work
5.3 Feature evaluation framework overview
5.4 RNN-based reference recognition system
5.4.1 Long short-term memory (LSTM) layer
5.4.2 Connectionist temporal classification (CTC) layer
5.5 Word image features
5.5.1 Distribution features
5.5.2 Concavity feature
5.5.3 Visual descriptor-based feature
5.5.4 Automatically learned feature
5.6 Feature evaluation using agent combination
5.6.1 Supervised agent weighting
5.6.2 Score definition
5.7 Experimental setup
5.7.1 Databases
5.7.2 Experimental protocol
5.8 Results and discussion
5.8.1 Optimization results
5.8.2 Feature evaluation
5.8.3 Combination comparison
5.9 Conclusion
5.10 Acknowledgments
CHAPTER 6 GENERAL DISCUSSION 
6.1 Shape indexing based lexicon reduction framework for Arabic script
6.2 Holistic descriptor of Arabic word shape for lexicon reduction
6.3 Holistic Arabic subword recognition
6.4 Feature evaluation for handwriting recognition
6.5 Benchmarking of popular features for handwriting recognition
GENERAL CONCLUSION
