Academic paper
Pluribus—An operational fault-tolerant multiprocessor
Document Type
Periodical
Author
Source
Proceedings of the IEEE. 66(10):1146-1159, Oct. 1978
Subject
Language
ISSN
0018-9219
1558-2256
Abstract
The authors describe the Pluribus multiprocessor system, outline several techniques used to achieve fault-tolerance, describe their field experience to date, and mention some potential applications. The Pluribus system places the major responsibility for recovery from failures on the software. Failing hardware modules are removed from the system, spare modules are substituted where available, and appropriate initialization is performed. In applications where the goal is maximum availability rather than totally fault-free operation, this approach represents a considerable savings in complexity and cost over traditional implementations. The software-based reliability approach has been extended to provide error-handling and recovery mechanisms for the system software structures as well. A number of Pluribus systems have been built and are currently in operation. Experience with these systems has given us confidence in their performance and maintainability, and leads us to suggest other applications that might benefit from this approach.