40 Years of Computing at Newcastle

Department Technical Report Series No. 593

Definition and Evaluation of Adaptive Fault-Tolerant Architectures in a Distributed Computing Environment

J. Xu, F. Di Giandomenico, A. Bondavalli and S. Chiaradonna

University of Newcastle upon Tyne. 1997.

Abstract

This paper discusses the issue of providing tolerance to both hardware and software faults in a distributed computing environment. We define several hybrid-fault-tolerant architectures that can co-exist and operate simultaneously on top of the supporting environment, and introduce a systematic method for evaluating their dependability, efficiency and response time.

Most existing studies assume that a fixed amount of hardware and system resources is bound statically to a given architecture. However, in a general-purpose distributed system, multiple unrelated applications may compete for system resources such as processors, memories and communication devices, so the system exhibits highly dynamic characteristics. Our architectural solutions are directed at such systems and are therefore centrally concerned with adaptation: the adaptive execution of redundant programs so as to minimize hardware resource consumption and to shorten the response time as much as possible for a required level of fault tolerance. Analytical results under some realistic scenarios show that, in comparison with static architectures, adaptive (i.e. dynamic) designs generally make more efficient use of available resources without compromising dependability. Dynamic architectures often have a longer worst-case response time, but in some application scenarios they may have a higher probability of making a timely response than static designs.


Technical Report Abstract No. 593, 30 June 1997