Academic Paper
Fault tolerant objects in distributed systems using hot replication
Document Type
Conference
Author
Source
Conference Proceedings of the 1996 IEEE Fifteenth Annual International Phoenix Conference on Computers and Communications, pp. 89-95, 1996
Subject
Language
Abstract
This paper presents a new algorithm for supporting fault-tolerant objects in distributed systems. The fault tolerance provided by the algorithm is fully transparent to the user. The algorithm uses a variation of the object replication scheme, which we call the Hot Replication Scheme, and it supports nested object invocations. The chief advantages of the scheme are: (a) no action is needed when a secondary replica fails, (b) the time to recover from a primary failure is minimal, and (c) the replication protocol is separated from the reliable communication protocol. To recover from a primary failure, the system needs to detect the failure and select one of the secondaries to become the primary. The designated secondary can become the primary once it has made sure that its current state is equivalent to the state of the failed primary, which it can do by processing any outstanding requests. This is in contrast with checkpointing and rollback recovery schemes, where the recovery time can be substantial. Our algorithm exploits the general features and concepts associated with objects and object interactions.
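The recovery behaviour described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, whose details the abstract does not give; it is a minimal single-process model written under several assumptions: every live replica receives every request, secondaries buffer requests and apply them only at promotion time, the object state is a simple running total, and failure detection is reduced to an explicit `fail` call. The names `Replica`, `ReplicatedObject`, `invoke`, `fail`, and `promote` are all hypothetical.

```python
from collections import deque


class Replica:
    """One copy of a toy object whose state is a running total (assumption)."""

    def __init__(self, name):
        self.name = name
        self.state = 0          # toy object state
        self.applied = 0        # sequence number of the last applied request
        self.pending = deque()  # requests received but not yet applied
        self.alive = True

    def receive(self, seq, amount):
        self.pending.append((seq, amount))

    def process_outstanding(self):
        # Apply buffered requests in arrival order, skipping duplicates.
        while self.pending:
            seq, amount = self.pending.popleft()
            if seq > self.applied:
                self.state += amount
                self.applied = seq


class ReplicatedObject:
    """Hypothetical front end: the first replica starts as primary."""

    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.primary = self.replicas[0]
        self.next_seq = 1

    def invoke(self, amount):
        seq = self.next_seq
        self.next_seq += 1
        # Every live replica sees the request; only the primary applies it now.
        for replica in self.replicas:
            if replica.alive:
                replica.receive(seq, amount)
        self.primary.process_outstanding()
        return self.primary.state

    def fail(self, replica):
        replica.alive = False
        # A secondary failure needs no further action.
        if replica is self.primary:
            self.promote()

    def promote(self):
        survivors = [r for r in self.replicas if r.alive]
        if not survivors:
            raise RuntimeError("no live replica left to promote")
        new_primary = survivors[0]
        # Catch up to the failed primary's state before taking over.
        new_primary.process_outstanding()
        self.primary = new_primary
```

In this model, failing a secondary only marks it as dead, while failing the primary makes the first surviving secondary drain its buffered requests and take over. No checkpoint is restored and no work is rolled back, which is the contrast the abstract draws with checkpointing and rollback recovery.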