40 Years of Computing at Newcastle

Preventing State Divergence in Replicated Distributed Systems

A. Tully

University of Newcastle upon Tyne, 1990

Abstract

N-Modular Redundancy (NMR) is a form of active replication in which each processor is replicated to form a node, and each processor replica within the node executes the same set of software component replicas. Communication between nodes, in the form of messages, passes through a voting mechanism by which processor failures are masked. When the degree of replication is three, the technique is known as Triple Modular Redundancy (TMR) and can tolerate the failure of a single processor within a node. For voting to be successful, non-faulty software component replicas must output identical messages in an identical order. If we assume that software components are deterministic, then we need only ensure that the replicas process identical input messages in an identical order. Such software components conform to the well-understood and widely researched state machine model of active replication. However, most distributed programs employ mechanisms not incorporated in the state machine model, such as timeouts and prioritized messages. These potential sources of non-determinism could lead to a divergence of state among software component replicas, which could then produce inconsistent responses to identical input messages, thereby defeating the NMR voting mechanism. The main contributions of this thesis are:
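
As an illustration of the voting step described in the abstract, the following is a minimal Python sketch of a majority voter. It is not taken from the thesis: the function name `vote` and the example message values are hypothetical, and a real NMR voter would also have to handle message ordering, timing, and authentication, which are omitted here.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value output by a strict majority of replicas, or None.

    Hypothetical helper for illustration only; `outputs` holds one
    message per replica for the same logical output slot.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority is required: more than half of the replicas.
    return value if count > len(outputs) // 2 else None


# TMR: one faulty replica is outvoted by the two that agree.
masked = vote(["commit", "commit", "abort"])

# Diverged replica state: all three outputs differ, so no majority
# exists and the voter cannot deliver a result.
diverged = vote(["commit", "abort", "retry"])
```

In the first call the voter masks the single disagreeing replica. In the second, the replicas are all non-faulty in the sketch's sense but have diverged, so voting fails, which is the situation the abstract identifies as defeating the NMR mechanism.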


Abstract - PhD: Tully, 2 July 1997
Brian Randell