Academic Paper

Adaptive fault recovery for networked reconfigurable systems
Document Type
Conference
Source
11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2003), pp. 143-152, 2003
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Field programmable gate arrays
Fault tolerant systems
Routing
Network servers
Computer networks
Table lookup
Reconfigurable logic
Fault detection
Design automation
Redundancy
Abstract
The device-level size and complexity of reconfigurable architectures make fault tolerance an important concern in system design. In this paper, we introduce a fully automated fault recovery system for networked systems that contain FPGAs (field-programmable gate arrays). If a fault is detected that cannot be addressed locally, fault information is transferred to a reconfiguration server. Following design recompilation to avoid the fault, a new FPGA configuration is returned to the remote system and computation is reinitiated. To illustrate the benefit of this approach, we have implemented a complete fault recovery system that requires no manual intervention. An important part of the system is a timing-driven incremental router for Xilinx Virtex devices. This router is directly interfaced to Xilinx JBits and uses no CAD tools from the standard Xilinx Alliance tool flow. Our completed system has been applied to three benchmark designs and exhibits complete fault recovery in up to 12x less time than the standard incremental Xilinx PAR flow.
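
The recovery flow described in the abstract (handle the fault locally if possible, otherwise send it to a reconfiguration server and receive a new configuration) can be sketched roughly as follows. This is only an illustrative model, not the authors' implementation: the names Fault, ReconfigurationServer, recover, and the resource label "CLB_R3C7" are all hypothetical, and the "configuration" is a plain dictionary standing in for a real FPGA bitstream.

```python
from dataclasses import dataclass, field


@dataclass
class Fault:
    resource: str          # hypothetical label for the faulty FPGA resource
    locally_fixable: bool  # True if the remote system can address it itself


@dataclass
class ReconfigurationServer:
    """Hypothetical stand-in for the server that recompiles around faults."""
    avoided: set = field(default_factory=set)

    def recompile(self, fault: Fault) -> dict:
        # Remember the faulty resource and return a placeholder configuration
        # that lists every resource the design must now avoid.
        self.avoided.add(fault.resource)
        return {"avoid": sorted(self.avoided)}


def recover(fault: Fault, server: ReconfigurationServer, current_config: dict):
    """Return (configuration, path), where path is 'local' or 'server'."""
    if fault.locally_fixable:
        # Fault is addressed locally; the existing configuration is kept.
        return current_config, "local"
    # Otherwise the fault information goes to the server, which returns
    # a new configuration for the remote system to load.
    return server.recompile(fault), "server"


if __name__ == "__main__":
    server = ReconfigurationServer()
    config, path = recover(Fault("CLB_R3C7", False), server, {"avoid": []})
    print(path, config)
```

In the system the abstract describes, the recompile step would involve the timing-driven incremental router and JBits; the sketch deliberately leaves that out and models only the decision between local and server-side recovery.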