Positioning and State of the Art
Positioning
Grid computing aims at providing transparent access to computing power and data storage from many heterogeneous resources in different geographical locations – this is also called virtualization of resources. Taking advantage of these resources involves both an environment providing this virtualization and a programming model and deployment framework, so that applications may be designed, deployed, and executed. We aim at defining a high-level programming model along with a deployment framework for Grid applications.
Positioning with respect to general Grid architecture
We start by describing our target audience, then we position our contribution with respect to the Grid software layers. From a user point of view, Grid application developers may be classified in three groups, as proposed in [GAN 02]: the first group consists of end users who build packaged Grid applications by using simple graphical or Web interfaces; the second group consists of programmers who know how to build Grid applications by composing them from existing application “components”; the third group consists of researchers and experts who build the individual components. In this work we essentially address the second group of users, by providing them with a programming model and a deployment and execution environment. From a software point of view, the development and execution of Grid applications involve three concepts which have to be integrated: virtual organizations, programming models, and deployment and execution environments. Virtual organizations are defined in the foundational paper [FOS 01] as:
individuals and/or institutions who share access, in a highly controlled manner with clearly defined policies, to computers, software, data and other resources required for solving computational problems in industry, science and engineering.
Virtual organizations are set up through consultations and agreements between the involved partners. Such organizations are becoming popular within the academic community, and they take advantage of large infrastructures of federated resources such as Grid’5000 [CAP 05] and NorduGrid [EER 03] in Europe, TeraGrid in the U.S. [TER], Naregi [NAR] and ChinaGrid [JIN 04] in Asia, etc. Global interactions between these infrastructures may also be created on specific occasions, resulting in even larger virtual organizations, as during the Grid PlugTests events [GPT05, GPT]. Programming models provide standards and a basis for developing Grid applications; the definition of standards is a complex and lengthy process, which has to take into account existing standards used in Internet technologies. The execution environments give access to Grid resources, and usually require complex installation and configuration phases. The concepts of virtual organizations, programming models, and deployment and execution environments may be gathered in a layered view of Grid software.
• At the bottom lies the Grid Fabric, which consists of all the accessible distributed resources. These resources physically consist of CPUs, databases, sensors and specialized scientific instruments, which are federated through virtual organizations. These physical resources are accessed from the software stack through operating systems (and virtual machines), network connection protocols (rsh, ssh, etc.) and cluster schedulers (PBS, LSF, OAR, etc.).
• Above, layer 2 is the Grid middleware infrastructure (sometimes referred to as the Core Grid Middleware), which offers core services such as remote process management and supervision, information registration and discovery, security and resource reservation. Various frameworks are dedicated to these aspects. Some of them are global frameworks providing most of these services, such as the Globus toolkit [FOS 05a], which includes software services and libraries for resource monitoring, discovery, and management, plus security and file management. Nimrod [ABR 95] (a specialized parametric modeling system), Ninf [SEK 96] (a programming framework with remote procedure calls) and Condor [THA 05] (a batch system) take advantage of the Globus toolkit to extend their capabilities towards Grid computing (appending the -G suffix to their names). Unicore [UNI] and Gridlab [ALL 03] services are other examples of global frameworks providing access to federated Grid resources, with services including resource management, scheduling and security. Other frameworks in the Core Grid Middleware layer are more specialized: JXTA [JXT], for example, is a peer-to-peer framework which may be integrated within Grids for managing large amounts of data, such as in JuxMem [ANT 05]. GridBus [BUY 04] is a Grid middleware designed to optimize performance/cost ratios by aggregating the most suitable services and resources for a given task. The deployment and execution environments for Grid entities are provided by the Grid middleware layer.
• Layer 3 is the Grid Programming Layer, which includes programming models and tools. This layer contains high-level programming abstractions facilitating interaction between application and middleware, such as the GGF’s initiative called SAGA [GOO, GGF04], which is inspired by the Grid Application Toolkit (GAT) [ALL 05], and Globus-COG [LAS 01], which enhances the capabilities of the Globus Toolkit by providing workflows, control flows and task management at a high level of abstraction. As reported by some users [BLO 05], however, some of these abstractions still leave the programmer with the burden of dealing with explicit brokerage issues, sometimes forcing the use of literal host names in the application code.
• The top layer is the Grid Application Layer, which contains applications developed using the Grid programming layer, as well as web portals through which users control applications and feed in data. Applications may collaborate at various levels of coupling, from service-based interactions to optimized direct communications between internal entities.
Positioning with respect to existing programming models for the Grid
We describe here the main programming models applicable to Grid computing. A number of articles [LAF 02, PAR 05, FOX 03, LEE 01] present overviews of Grid programming models and environments, highlighting usability as a fundamental requirement. In this section our objective is to justify why we consider component-based programming as the most suitable paradigm.
Message passing
Message-passing programming models are popular within the scientific community, as they represent an efficient means to solve computational problems on dedicated clusters, notably using the SPMD programming style. Message-passing frameworks such as MPI or PVM [GEI 94] provide an optimized communication layer. Moreover, numerous algorithms are available for further optimizing the communications and the distribution of tasks between the distributed processes involved. Message passing is one of the most basic ways to achieve process communication in the absence of shared memory, but it is a low-level programming style: on the one hand it provides efficiency, latency management, modularity and global operations, but on the other hand it fails to provide the abstractions necessary for Grid computing, such as support for heterogeneity, interoperability, resource management, adaptivity and dynamicity. Nevertheless, given the advantages of latency management, modularity and global operations, some frameworks have been implemented in order to bring the message-passing programming paradigm to Grid computing. They usually rely on existing Grid middleware, such as Globus. This is the case, for instance, of MPICH-G2 [KAR 03], which uses Globus for authentication, authorization, executable staging, process creation, process monitoring, process control, communication, redirection of standard input and output, and remote file access. PACX-MPI [KEL 03] is another implementation of MPI geared towards Grid computing, which contains a wide range of optimizations for Grid environments, notably for efficient collective communications. MagPIe [KIE 99] also focuses on the efficiency of collective communications: it can optimize a given MPI implementation by replacing the collective communication calls with optimized routines.
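The SPMD style and the collective operations that message-passing libraries automate can be illustrated with a small sketch. The following is a hypothetical, simplified illustration in Java (not part of any framework cited above): ranks are emulated with threads and the communication layer with blocking queues, since no MPI runtime is assumed. Each rank runs the same program on its own data slice, and partial sums are reduced to rank 0 by explicit send/receive messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SpmdSketch {
    static final int NPROCS = 4;
    // One inbox per rank, standing in for the message-passing layer.
    static final List<BlockingQueue<Integer>> inbox = new ArrayList<>();
    static volatile int result; // reduced total, written by rank 0

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < NPROCS; i++) inbox.add(new ArrayBlockingQueue<>(NPROCS));
        Thread[] ranks = new Thread[NPROCS];
        // SPMD: every rank executes the same program on its own data.
        for (int r = 0; r < NPROCS; r++) {
            final int rank = r;
            ranks[r] = new Thread(() -> run(rank));
            ranks[r].start();
        }
        for (Thread t : ranks) t.join();
        System.out.println("sum = " + result);
    }

    static void run(int rank) {
        // Each rank owns the slice [rank*10, rank*10 + 10).
        int partial = 0;
        for (int i = rank * 10; i < rank * 10 + 10; i++) partial += i;
        try {
            if (rank != 0) {
                inbox.get(0).put(partial);        // non-root ranks send to root
            } else {
                int total = partial;
                for (int i = 1; i < NPROCS; i++)  // root receives from all others
                    total += inbox.get(0).take();
                result = total;                   // 0 + 1 + ... + 39 = 780
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A collective such as MPI_Reduce performs this send/receive pattern internally with an optimized communication schedule; this is precisely the kind of routine that frameworks like MagPIe replace with schedules tuned for wide-area Grid links.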
In conclusion, although message passing is not a high-level programming model, it may be used for Grid programming by relying on the services of other Grid middleware; however, direct control over low-level communications is attractive only for certain kinds of applications.