Academic Paper

Fault tolerant objects in distributed systems using hot replication
Document Type
Conference
Source
Conference Proceedings of the 1996 IEEE Fifteenth Annual International Phoenix Conference on Computers and Communications, pp. 89-95, 1996
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Signal Processing and Analysis
Fault tolerant systems
Fault tolerance
Object oriented modeling
Protocols
Computer science
Checkpointing
Distributed computing
Physics computing
Message passing
Design methodology
Abstract
This paper presents a new algorithm for supporting fault-tolerant objects in distributed systems. The fault tolerance it provides is fully transparent to the user. The algorithm uses a variation of the object replication scheme, which we call the Hot Replication Scheme, and it supports nested object invocations. The chief advantages of the scheme are: (a) no action is needed when a secondary replica fails, (b) the time to recover from a primary failure is minimal, and (c) the replication protocol is separated from the reliable communication protocol. To recover from a primary failure, the system needs to detect the failure and select one of the secondaries to become the new primary. The designated secondary can become the primary once it has made sure that its current state is equivalent to the state of the failed primary, which it can do by processing any outstanding requests. This is in contrast with checkpointing and rollback recovery schemes, where the recovery time can be substantial. Our algorithm takes advantage of the general features and concepts associated with objects and object interactions.
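The failover behavior summarized in the abstract can be illustrated with a small toy model. This is a minimal sketch under stated assumptions, not the paper's algorithm. It assumes the following:

- Object operations are deterministic.
- A reliable communication layer delivers every request, in order, to each live replica.
- Some failure detector calls `fail()`.
- The new primary is simply the first live secondary.

Every name here (`Replica`, `HotReplicaGroup`, `invoke`, `tick`, `fail`, `fail_over`) is hypothetical, and the replicated object is a plain integer counter chosen for illustration. Secondaries buffer requests and apply them in the background, so they may lag behind the primary. A secondary failure requires no action. On a primary failure, the chosen secondary drains its outstanding requests before taking over.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a hypothetical replicated counter object."""
    name: str
    state: int = 0
    # Requests delivered but not yet applied; a secondary may lag behind.
    pending: deque = field(default_factory=deque)
    alive: bool = True

    def deliver(self, request: int) -> None:
        # Assumption: the communication layer delivers reliably and in order.
        self.pending.append(request)

    def apply_one(self) -> None:
        # Deterministic operation: add the request value to the state.
        self.state += self.pending.popleft()

    def catch_up(self) -> None:
        # Apply all outstanding requests so the state matches the old primary.
        while self.pending:
            self.apply_one()


class HotReplicaGroup:
    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.primary = self.replicas[0]

    def secondaries(self):
        # Live replicas other than the current primary.
        return [r for r in self.replicas if r is not self.primary and r.alive]

    def invoke(self, request: int) -> int:
        # The primary applies the request and replies immediately...
        self.primary.deliver(request)
        self.primary.catch_up()
        # ...while every live secondary receives the same request to apply later.
        for s in self.secondaries():
            s.deliver(request)
        return self.primary.state

    def tick(self) -> None:
        # Simulated background work: each live secondary applies one request.
        for s in self.secondaries():
            if s.pending:
                s.apply_one()

    def fail(self, replica: Replica) -> None:
        # Assumed to be called by a failure detector.
        replica.alive = False
        if replica is self.primary:
            self.fail_over()
        # A failed secondary needs no action: it is simply skipped from now on.

    def fail_over(self) -> None:
        candidates = self.secondaries()
        if not candidates:
            raise RuntimeError("no live secondary available")
        # Hypothetical selection policy: take the first live secondary.
        new_primary = candidates[0]
        new_primary.catch_up()  # process outstanding requests, if any
        self.primary = new_primary
```

For example, after `invoke(5)`, `invoke(3)`, `invoke(2)` and a single `tick()`, each secondary has applied only the first request. If the primary then fails, the promoted secondary applies the two outstanding requests and resumes with state 10. Recovery cost is therefore bounded by the backlog of unapplied requests rather than by a rollback to an earlier checkpoint.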