40 Years of Computing at Newcastle

Department Technical Report Series No. 581

Implementing Fail-Silent Nodes for Distributed Systems

F.V. Brasileiro, P.D. Ezhilchelvan, S.K. Shrivastava, N.A. Speirs and S. Tao

University of Newcastle upon Tyne. 1997.

Abstract

A fail-silent node is a self-checking node that either functions correctly or stops functioning after an internal failure is detected. Such a node can be constructed from a number of conventional processors. In a software-implemented fail-silent node, the non-faulty processors of the node need to execute message order and comparison protocols to 'keep in step' and to check each other, respectively. In this paper the design and implementation of efficient protocols for a two-processor fail-silent node are described in detail. The performance figures obtained indicate that, in a wide class of applications requiring a high degree of fault tolerance, software-implemented fail-silent nodes constructed simply from standard 'off-the-shelf' components are an attractive alternative to their hardware-implemented counterparts, which require special-purpose hardware components such as fault-tolerant clocks, comparators and bus interface circuits.
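
The fail-silent behaviour described in the abstract can be illustrated with a minimal sketch. It is not the protocol from the report: it assumes both processors already see the same messages in the same order (standing in for the message order protocol) and reduces the comparison protocol to a simple equality check. The names `FailSilentNode`, `handle`, `run`, `correct` and `faulty` are hypothetical and chosen only for this illustration.

```python
from typing import Callable, List, Optional


class FailSilentNode:
    """Toy model of a two-processor fail-silent node (illustrative only)."""

    def __init__(self,
                 replica_a: Callable[[int], int],
                 replica_b: Callable[[int], int]) -> None:
        self.replica_a = replica_a
        self.replica_b = replica_b
        self.stopped = False

    def handle(self, message: int) -> Optional[int]:
        """Return the agreed output, or None once the node has gone silent."""
        if self.stopped:
            return None
        # Assumption: both replicas receive the same message in the same order.
        out_a = self.replica_a(message)
        out_b = self.replica_b(message)
        if out_a != out_b:
            # A mismatch means at least one processor is faulty,
            # so the node stops producing output permanently.
            self.stopped = True
            return None
        return out_a


def run(node: FailSilentNode, messages: List[int]) -> List[Optional[int]]:
    """Feed an ordered message sequence to the node and collect its outputs."""
    return [node.handle(m) for m in messages]


def correct(x: int) -> int:
    return x * 2


def faulty(x: int) -> int:
    # Hypothetical fault: wrong results from the third message onwards.
    return x * 2 if x < 3 else x * 2 + 1


healthy = run(FailSilentNode(correct, correct), [1, 2, 3, 4])
failing = run(FailSilentNode(correct, faulty), [1, 2, 3, 4])
```

In this sketch `healthy` contains an output for every message, while `failing` contains outputs only up to the first disagreement and `None` afterwards: the node never emits a value on which the two processors disagree.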
Technical Report Abstract No. 581, 30 June 1997