Academic paper

Fault-tolerant replication management in large-scale distributed storage systems

Document Type

Conference

Author

Source

Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems, 1999, pp. 144-155

Subject

Computing and Processing
Fault tolerant systems
Large-scale systems
Access protocols
Educational institutions
Read only memory
Storage automation
Earthquakes
Storms
Hardware
Fault detection

Language

ISSN

1060-9857

Abstract

Failures of all forms happen, from the loss of a single network packet to site-wide disasters. Since businesses rely heavily on their data, it is imperative that failures require minimal time and effort to repair and that service interruption during the failure or repair period be as short as possible. To this end, the ideal system should repair itself, relying on humans only when absolutely necessary. This paper describes one component of a self-healing storage system: the component that automatically recovers access to data when the power comes back on after a large-scale outage. Our failure recovery protocol is part of a suite of modular protocols that make up the Palladio distributed storage system. This protocol guarantees that service will be restored quickly and automatically once enough failures are repaired.
