Spatial data imperfections
Two types of data are generally used to describe a spatial phenomenon: (1) qualitative data and (2) quantitative data. These data may be vague, imprecise, incomplete, contradictory, etc. (Dutta 1991). Works such as Smithson (1989), Fisher (1999a) and Mowrer (1999) proposed categorizations of spatial objects as well as definitions and taxonomies of spatial imperfection types. Other works such as Burrough (1996), Cohn and Gotts (1996a), Clementini and Di Felice (1997), Erwig and Schneider (1997), Tang (2004), Dilo (2006) and Reis et al. (2006) studied how to model spatial objects with vague shapes and how to compute their topological relationships. Finally, studies such as Pfoser and Jensen (1999), Pfoser and Tryfona (2001) and Pfoser et al. (2005) addressed the modeling of imperfection types in spatio-temporal phenomena. Section 2.2.1 presents the principal taxonomies of spatial imperfection types. Section 2.2.2 defines the principal terms used in the literature to express the various types of spatial data imperfections. Sections 2.2.3 and 2.2.4 present the levels of spatial data imperfections and the principal strategies to manage them, respectively. Section 2.2.5 relates spatial data imperfections to transactional spatial databases. In the same way, Section 2.2.6 examines the forms of imperfection in spatial data warehouses. Section 2.2.7 addresses the relation between spatial data quality and spatial data imperfections.
Taxonomies of spatial imperfections
Defining spatial imperfection types is a very complex question in which disciplines such as philosophy, science and technology overlap. The objective of this section is to show how the taxonomies of spatial data imperfections proposed in the GIS and spatial database domains diverge. These taxonomies form the background of any framework aiming at modeling a spatial imperfection type (Dilo 2006). Generally, the taxonomies organize spatial imperfection types by using generalization/specialization relationships. Devillers (2005) reviewed the principal taxonomies in this domain (Smithson 1989, Smets 1996, Worboys 1998a, Fisher 1999a, Hazarika and Cohn 2001, Smith 2001).
Smithson (1989) considers the concept of ignorance as the origin of any other type of spatial data imperfection (figure 2.1). This philosophical point of view has its roots in the works of Socrates, who limited perfect knowledge to only one certainty: ignorance. Using the reflexivity property, he considers the ignorance of this basic knowledge to be a double ignorance. This idea was also reused by Bédard (1987), who introduced the notion of “metauncertainty”: the uncertainty about uncertainty (cf. Section 2.2.3). Fisher (1999a) focuses, in his taxonomy, on the notion of uncertainty, which appears differently for well-defined objects and ill-defined ones. Smith (2001) likewise distinguishes two types of objects: bona fide (well-defined) objects and fiat (ill-defined) objects (see section 2.3.1). For well-defined objects, uncertainty is often modeled through probability theory, for example with a confusion matrix that shows whether objects are misclassified (Fisher 1999b). For ill-defined objects, uncertainty refers to the ambiguity of the object definition as well as of thematic and/or spatial attributes. The latter case relates to a qualitative imperfection which occurs at the conceptual level.
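As a minimal sketch of the probabilistic treatment mentioned above for well-defined objects, the following Python snippet builds a confusion matrix from reference and assigned classes and derives an overall accuracy. The class names, the sample labels and the helper names `confusion_matrix` and `overall_accuracy` are invented for illustration; they are not taken from Fisher (1999b).

```python
from collections import Counter


def confusion_matrix(reference, assigned, classes):
    """Count how often each reference class (rows) was assigned each class (columns)."""
    counts = Counter(zip(reference, assigned))
    return [[counts[(r, a)] for a in classes] for r in classes]


def overall_accuracy(matrix):
    """Share of objects on the diagonal, i.e. objects that were correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical land-cover labels for six parcels (illustrative data only).
classes = ["forest", "water", "urban"]
reference = ["forest", "forest", "water", "urban", "urban", "water"]
assigned = ["forest", "urban", "water", "urban", "urban", "forest"]

matrix = confusion_matrix(reference, assigned, classes)
accuracy = overall_accuracy(matrix)
```

Off-diagonal cells of `matrix` count the misclassified objects, so the matrix shows both how many objects are ill-classified and which classes are confused with each other.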
Terminology related to spatial data imperfections
In the literature, several terms have been used to express the different types of spatial data imperfection. In this section, we review the definitions of these terms.
• Uncertainty: it characterizes the state of knowledge about a given assertion (Smets 1996). It refers to the difficulty of determining whether a piece of data is true or false. Uncertainty is considered the root of several categorizations of spatial data imperfections (Smets 1996, Worboys 1998b, Fisher 1999a). It is presented as a generic imperfection that can be specialized into different forms, such as imprecision for quantitative data and fuzziness for qualitative data (Bédard 1987, Erwig and Schneider 1997). According to Bédard (1987), uncertainty can result from the intrinsic limitations of the modeling process (omission of details, lack of compatibility between the cognitive and physical levels, etc.). It can also result from the gap between geographic reality and its description. For example, this gap occurs when fiat spatial objects such as air pollution zones (i.e., regions with broad boundaries in reality) are represented using crisp polygons. Uncertainty can appear at various levels and in different forms during the development process of a spatial database (see section 2.3.1). The terms “imperfection” and “uncertainty” can therefore be used interchangeably, since uncertainty includes different types of spatial imperfections. In section 2.3.1, we use the term “uncertainty” in order to respect the contributions of Bédard (1987). However, in the remainder of this thesis, the term “imperfection” is generally preferred.
• Error: it refers to the difference between the available value and another one considered as true (Goodchild 1995a, David and Fasquel 1997). Error can result from inadequate calibration of the measurement device, inadequate use of this device, or erroneous application of the procedures that use these measurements as input data. As a result, erroneous measurements of spatial phenomena are stored in the database as if they were true values. Error is also related to the concept of reliability, which expresses the closeness of the collected data to the observed reality (Azouzi 1999).
• Imprecision: it refers to limitations on the granularity or resolution at which the observation is made or the information is represented (Worboys 1998b). A data value is imprecise when it corresponds to an interval (e.g., the age of a person is between 35 and 45), a disjunction of values (e.g., John is either 35 or 36 years old) or a negation of a given assertion (e.g., John is not 35 years old) (Motro 1995). In the context of spatial data, precision can be statistical when it refers to the dispersion around an average value (Mowrer 1999). It can also be numerical when it corresponds to the number of significant decimals given by a measurement device (Goodchild 1995a, Mowrer 1999). Statistical precision is generally computed through a probabilistic method using available measurements. It can also be given by computing an error ellipse (Chrisman 1991). Error and imprecision are orthogonal concepts since the level of the first does not affect that of the second (Mowrer 1999, Duckham et al. 2001). For example, the statement “Quebec is in North America” is more accurate and, at the same time, less precise than the statement “Quebec is in the United States”. The second statement is simply inaccurate.
• Vagueness: according to Fisher (1999a), vagueness is an inherent imperfection that characterizes the definitions of some concepts, called vague concepts (e.g., young person, bald person, large surface, North, South, etc.). The membership degree of an element in a given vague concept cannot be computed using binary logic (i.e., 0 or 1) because, in most cases, the elements involved only partially satisfy its definition. Vague concepts can be modeled using Fuzzy Logic (Zadeh 1965). A membership degree is then expressed as a value in the interval [0,1], computed using a membership function that defines the vague concept. In the spatial domain, vagueness is an inherent property of the geometries of fiat spatial objects such as valleys or oceans. It relates to the difficulty of distinguishing an object's shape from its neighborhood. For example, an air pollution zone is a region with a vague shape because it is surrounded by a broad boundary rather than a sharp one. Navratil and Frank (2006) consider that the vagueness of concepts entails an ambiguous classification of spatial objects. Spatial vagueness can also characterize bona fide objects when there is uncertainty about their locations. In this case, Hazarika and Cohn (2001) speak of “location vagueness”. Nonetheless, an object with a vague shape can also be vaguely located. Hazarika and Cohn (2001) do not correlate shape vagueness with the difficulty of drawing a linear boundary for a given region (e.g., a lake). They consider the temporal dimension of the data, which may affect certainty about the shapes of spatial objects. Accordingly, it is important to note that shape vagueness is a more general notion than fuzziness. Fuzziness is generally associated with the problem of drawing linear boundaries for regions (Hazarika and Cohn 2001). However, shape vagueness can also refer to the broadness of a line's interior and/or boundary (Reis et al. 2006).
In the same way, shape vagueness may occur for composite geometries that contain uncertain parts in addition to certain ones (Schneider 1999). In this work, we are interested only in shape vagueness for simple fiat objects, without considering the temporal dimension. We use the term “shape vagueness” because it is more comprehensive than fuzziness for describing the shape imperfection of some geographic objects. Moreover, fuzziness is often associated with the use of Fuzzy Logic (Zadeh 1965) to model boundary broadness. Using this term could be misinterpreted as implying that we use Fuzzy Logic to achieve the objectives of this thesis (which is not the case, as explained later).
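Although this thesis does not rely on Fuzzy Logic, the notion of a membership degree in [0,1] introduced above for vague concepts can be illustrated with a short Python sketch. The function name `young_membership` and its two age breakpoints are assumptions chosen purely for illustration; they are not taken from Zadeh (1965) or from any other cited work.

```python
def young_membership(age, full_until=25.0, zero_from=45.0):
    """Degree in [0, 1] to which `age` satisfies the vague concept 'young person'.

    The breakpoints are illustrative assumptions, not values from the literature.
    """
    if age <= full_until:
        return 1.0  # fully satisfies the concept
    if age >= zero_from:
        return 0.0  # does not satisfy the concept at all
    # Linear decrease between the two breakpoints (partial membership).
    return (zero_from - age) / (zero_from - full_until)


# Membership degrees for a few hypothetical ages.
degrees = {age: young_membership(age) for age in (20, 30, 35, 50)}
```

Binary logic would force each age into either 0 or 1, whereas the intermediate ages here receive partial degrees, which is the behavior the definition of a vague concept calls for.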
• Ambiguity: it appears when different results are obtained using different classification methods for the same set of elements. In this context, broad boundaries can be considered the result of an ambiguity in assigning a set of spatial points to different object classes. Nonetheless, it is important to note that ambiguity results from the classification process and not from an inherent property of the classes. It corresponds to an imperfection type occurring at the conceptual level defined in Bédard (1987). Ambiguity can affect the identification (is it such an entity or not?) or the categorization (is it an entity of type A or type B?) of a given object.
• Discord: it appears when different designers propose different conceptual schemas for the same geographic phenomenon. According to Van Oort (2006), each designer uses his or her own terminology to define the spatial concepts in the database dictionary, and thereby defines a specific “product ontology”. The existence of different product ontologies is a first type of discord. In the same way, database users have their own terminologies and definitions (i.e., their own problem ontologies). The heterogeneities between the product and problem ontologies constitute a second type of discord.
• Indeterminacy: it occurs when a spatial object is ill-classified because its definition is ambiguous or coarsely described (Roy and Stell 2001). Indeterminacy is a reflexive, symmetric and transitive relation and is generally modeled through the theory of Rough Sets (Pawlak 1994).
• Incompleteness: it refers to the lack of some relevant values and/or occurrences of the spatial objects involved. It is generally defined as a partial description of a spatial phenomenon.
• Inconsistency: it relates to the existence of logical contradictions in the same database (Worboys and Duckham 2004). For example, an implicit inconsistency can be deduced from the following premises:
Dijon has 300,000 inhabitants.
A city of fewer than 500,000 inhabitants is not a big city.
Dijon is a big city.
Inconsistencies are generally managed through integrity constraints (Kainz 1995, Motro 1995, Cockcroft 1997, Normand 1999, Servigne et al. 2000, Pinet et al. 2004). They arise when integrity constraints are violated. According to Rodriguez (2005), inconsistency is related to what are called primary and secondary forms of error. The primary form of error corresponds to a wrong description of the location or of the characteristics/qualities of spatial objects. For example, if an integrity constraint states that a given object has only one location, an inconsistency derived from a primary error exists when more than one location is stored for that object. This type of inconsistency occurs because of differences in data accuracy or precision, but also because many observations of spatial phenomena are essentially vague. For example, the boundaries of forests, mountains, lakes and oceans cannot be determined with precision; i.e., two observers may draw two different shapes/locations for the same object. A spatial inconsistency related to a secondary error refers to a contradiction between stored data and the constraints associated with the definitions of geometric primitives. For example, a polygon must be bounded by a closed, non-self-intersecting polyline that represents its boundary. Inconsistency may also be related to semantic contradictions, such as when a road overlaps a building. These types of inconsistency depend on the spatial domain, and they are captured by rules that should be expressed within the data model.
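The Dijon example above can be sketched as a small integrity check. The snippet below is only an illustration: the `City` record, the `BIG_CITY_THRESHOLD` constant and the `find_inconsistencies` helper are hypothetical names, the threshold is taken from the second premise, and the second record is invented to show a consistent case.

```python
from dataclasses import dataclass

# Assumed threshold, taken from the premise
# "a city of fewer than 500,000 inhabitants is not a big city".
BIG_CITY_THRESHOLD = 500_000


@dataclass
class City:
    name: str
    inhabitants: int
    is_big_city: bool  # stored assertion, which may contradict the rule


def find_inconsistencies(cities):
    """Return the names of cities whose stored 'big city' flag violates the rule."""
    return [
        city.name
        for city in cities
        if city.is_big_city and city.inhabitants < BIG_CITY_THRESHOLD
    ]


cities = [
    City("Dijon", 300_000, True),           # the premises from the text
    City("Hypothetical City", 900_000, True),  # invented, consistent record
]
violations = find_inconsistencies(cities)
```

The rule only forbids small cities from being flagged as big, so a large city with either flag value passes the check. Expressing the rule as an explicit constraint is what makes the implicit contradiction detectable.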
Management of uncertainty
Bédard (1987) distinguishes two approaches to managing uncertainty in spatial databases:
• Reduction: uncertainty reduction refers to a rigorous definition of modeling rules (i.e., defining the contents of a model: what to observe and how) and communication rules (i.e., defining the form of the model and the modeling language to use). From a technical point of view, uncertainty reduction is achieved by using specific tools: mathematical procedures to improve data precision (e.g., statistics with overabundant measurements), Fuzzy Logic to reduce qualitative uncertainty, inclusion of lineage in digital maps, the use of standard specifications and symbols (e.g., ISO standards), etc. (Bédard 1987, Hunter 1998).
• Absorption: uncertainty absorption refers to the risk related to the uncertainty that remains after all means of reduction have been used. For example, it may refer to the guarantees given by a database producer in order to compensate users harmed by poor data. In the same way, users absorb the imperfection when they agree to use non-guaranteed databases. Absorption can also take place when a professional guarantees data (his or her professional liability insurance then absorbs the risk). Bédard (1987) defined uncertainty absorption as the level of monetary risk in providing or using a given database. When damages occur, the uncertainty is absorbed by those who pay for these damages. This solution is often perceived as a protection against potential liability claims if the database causes damage to users (Hunter 1998). Finally, reduction and absorption are substantially different. Reduction is ensured through technical tools and methods, whereas absorption is guaranteed through institutional and legal tools. In practice, imperfection is managed by combining these two approaches.
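To illustrate the statistical side of uncertainty reduction (statistics with overabundant measurements), the sketch below averages redundant measurements of the same coordinate and reports the standard error of the mean, which shrinks as more measurements are added. The measurement values and the helper name `mean_and_standard_error` are invented, and the standard-error formula is a generic statistical tool rather than the specific procedure described by Bédard (1987).

```python
import math
import statistics


def mean_and_standard_error(measurements):
    """Return the sample mean and the standard error of that mean."""
    n = len(measurements)
    if n < 2:
        raise ValueError("at least two measurements are needed")
    mean = statistics.fmean(measurements)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    std_error = statistics.stdev(measurements) / math.sqrt(n)
    return mean, std_error


# Hypothetical repeated measurements of one easting coordinate, in metres.
few = [100.2, 99.8, 100.1, 99.9]
# Illustrative larger sample with the same spread (four times as many values).
many = few * 4

mean_few, se_few = mean_and_standard_error(few)
mean_many, se_many = mean_and_standard_error(many)
```

With the same dispersion, the larger sample yields a smaller standard error, which is the sense in which redundant measurements reduce imprecision. They do not remove a systematic error such as a miscalibrated device.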
Table of contents
CHAPTER 1: INTRODUCTION
1.1 RESEARCH CONTEXT
1.2 PROBLEM STATEMENT
1.3 OBJECTIVES AND HYPOTHESES OF THE RESEARCH
1.3.1 Objectives
1.3.2 Hypotheses
1.4 METHODOLOGY
1.5 STRUCTURE OF THE THESIS
CHAPTER 2: LITERATURE REVIEW
2.1 INTRODUCTION
2.2 SPATIAL DATA IMPERFECTIONS
2.2.1 Taxonomies of spatial imperfections
2.2.2 Terminology related to spatial data imperfections
2.2.3 Levels of uncertainty
2.2.4 Management of uncertainty
2.2.5 Spatial imperfections in spatial databases
2.2.5.1 Introduction
2.2.5.2 Imperfection aspects in spatial databases modeling
2.2.5.3 Management of imperfections in spatial databases
2.2.6 Spatial imperfections in spatial data warehouses
2.2.7 Spatial data quality and management of imperfections
2.2.7.1 Notion of spatial data quality
2.3 SPATIAL OBJECTS WITH VAGUE SHAPES AND THEIR TOPOLOGICAL RELATIONSHIPS
2.3.1 Fiat objects vs bona fide objects
2.3.2 Modeling of spatial objects with vague shapes
2.3.2.1 Definitions based on exact models
2.3.2.2 Models based on mathematical approaches
2.3.3 Topological relationships between spatial objects with vague shapes
2.4 CONSISTENCY OF SPATIAL DATABASES AND INTEGRITY CONSTRAINTS
2.4.1 Introduction
2.4.2 Classification of integrity constraints
2.4.3 Formal specification of spatial integrity constraints
2.4.3.1 First-order logic based languages
2.4.3.2 Visual specification of spatial integrity constraints
2.4.3.3 Tabular specification
2.4.3.4 Spatial Extension of Object Constraint Language (OCL)
REFERENCES
CHAPTER 3: QUALIFIED TOPOLOGICAL RELATIONSHIPS BETWEEN OBJECTS WITH POSSIBLY VAGUE SHAPES
3.1 SUMMARY OF THE ARTICLE
3.2 ABSTRACT
3.3 INTRODUCTION
3.4 PREVIOUS WORKS
3.4.1 Spatial vagueness
3.4.2 Formal definitions of objects with vague shapes
3.5 PROBLEM STATEMENT
3.6 SPATIAL OBJECTS WITH VAGUE SHAPES
3.6.1 Broad point
3.6.2 Line with a vague shape
3.6.3 Region with a broad boundary
3.7 TOPOLOGICAL RELATIONS BETWEEN SPATIAL OBJECTS WITH VAGUE SHAPES
3.7.1 Principles
3.7.2 Topological relations between a region with a broad boundary and a crisp one
3.7.3 Topological relations between regions with broad boundaries
3.8 CLUSTERING OF TOPOLOGICAL RELATIONS BETWEEN REGIONS WITH BROAD BOUNDARIES
3.8.1 Principles
3.8.2 Clustering results
3.8.3 Overlapping clusters
3.9 SPECIFICATION OF SPATIAL QUERIES AND INTEGRITY CONSTRAINTS
3.11 CONCLUSIONS AND FUTURE WORKS
REFERENCES
CHAPTER 4: QUALITATIVE MIN-MAX MODEL FOR LINES WITH VAGUE SHAPES AND THEIR TOPOLOGICAL RELATIONS
4.1 SUMMARY OF THE ARTICLE
4.2 ABSTRACT
4.3 INTRODUCTION
4.4 SHAPE VAGUENESS FOR LINES
4.5 QMMDEF MODEL FOR LINES WITH VAGUE SHAPES
4.5.1 Evaluation of shape vagueness for linear geometries
4.5.2 Definition of lines with vague shapes
4.6 QMM TOPOLOGICAL RELATIONSHIPS BETWEEN LINES WITH VAGUE SHAPES
4.6.1 Extending of CBM method
4.6.2 Principles of identification of topological relations in the QMMTR model
4.7 CLUSTERING OF TOPOLOGICAL RELATIONS BETWEEN LINES WITH VAGUE SHAPES
4.7.1 Principles
4.8 SPECIFICATION OF TOPOLOGICAL INTEGRITY CONSTRAINTS AND SPATIAL QUERIES FOR LINES WITH VAGUE SHAPES
4.9 CONCLUSION
REFERENCES
CHAPTER 5: REDUCING THE VAGUENESS OF TOPOLOGICAL RELATIONSHIPS IN SPATIAL DATA INTEGRATION
5.1 SUMMARY OF THE ARTICLE
5.2 ABSTRACT
5.3 INTRODUCTION
5.4 PREVIOUS WORKS
5.4.1 Geometric heterogeneities in spatial data integration
5.4.2 Formal specification of objects with vague shapes and their topological relationships
5.5 PROBLEM STATEMENT
5.6 MERGING HETEROGENEOUS POLYGONS THROUGH REGIONS WITH BROAD BOUNDARIES
5.6.1 Regions with broad boundaries resulting from integration
5.6.2 Topological relationships between regions with broad boundaries
5.7 CONTROLLING THE VALIDITY OF TOPOLOGICAL RELATIONSHIPS IN SPATIAL DATA INTEGRATION
5.7.1 The different situations
5.7.2 Characterizing the possible topological relationships for the final geometries when a same topological relationship is specified in the sources
5.7.3 Strategies to reduce the vagueness of topological relationships
5.7.3.1 Principles of the strategies
5.7.3.2 First strategy: modifying the final geometries
5.7.3.3 Second strategy: using an adverbial approach to reduce the vagueness of the topological relationships
5.8 EXAMPLE OF REDUCING THE INTRA-LEVEL TOPOLOGICAL RELATIONSHIPS VAGUENESS IN A SPATIAL DATA WAREHOUSE
5.9 CONCLUSION
REFERENCES
CHAPTER 6: AN ADVERBIAL APPROACH FOR THE FORMAL SPECIFICATION OF TOPOLOGICAL INTEGRITY CONSTRAINTS INVOLVING REGIONS WITH BROAD BOUNDARIES
6.1 SUMMARY OF THE ARTICLE
6.2 ABSTRACT
6.3 INTRODUCTION
6.4 OBJECTS WITH VAGUE SHAPES IN QMM MODEL
6.4.1 Categorization of spatial objects with vague shapes
6.4.2 Regions with broad boundaries and their topological relations
6.5 SPECIFICATION OF TOPOLOGICAL INTEGRITY CONSTRAINTS IN SPATIAL DATABASES
6.5.1 OCL
6.5.2 Spatial OCL
6.6 ADVERBIAL SPATIAL OCL FOR OBJECTS WITH VAGUE SHAPES (AOCLOVS)
6.7 EXAMPLE IN AGRICULTURAL SPREADING ACTIVITIES
6.7.1 Formal expression of constraints
6.7.2 Implementation of AOCLOVS
6.8 CONCLUSION
REFERENCES
CHAPTER 7: CONCLUSIONS AND DISCUSSION
7.1 CONTRIBUTIONS
7.2 DISCUSSION
7.3 FUTURE RESEARCHES
7.4 GENERAL CONCLUSION
REFERENCES
APPENDIX 1: 242 TOPOLOGICAL RELATIONS BETWEEN REGIONS WITH BROAD BOUNDARIES AND REQUIRED RULES TO DEDUCE THEM
APPENDIX 2: RULES OF CONSISTENCY
APPENDIX 3: DEMONSTRATIONS OF THE POSSIBLE TOPOLOGICAL RELATIONSHIPS BETWEEN REGIONS WITH BROAD BOUNDARIES RESULTED FROM AN INTEGRATION PROCESS
APPENDIX 4: EXCERPT FROM THE JOINT SUPERVISION (COTUTELLE) AGREEMENT