fault tolerance


Also found in: Wikipedia.

fault tolerance

[′fȯlt ‚täl·ə·rəns]
(systems engineering)
The capability of a system to perform in accordance with design specifications even when undesired changes in the internal structure or external environment occur.
McGraw-Hill Dictionary of Scientific & Technical Terms, 6E, Copyright © 2003 by The McGraw-Hill Companies, Inc.

fault tolerance

(architecture)
1. The ability of a system or component to continue normal operation despite the presence of hardware or software faults. This often involves some degree of redundancy.

2. The number of faults a system or component can withstand before normal operation is impaired.
This article is provided by FOLDOC - Free Online Dictionary of Computing (foldoc.org)

fault tolerant

The ability to continue non-stop when a hardware failure occurs. A fault-tolerant system is designed from the ground up for reliability by building multiples of all critical components, such as CPUs, memories, disks and power supplies into the same computer. In the event one component fails, another takes over without skipping a beat.

Tandem and Stratus were the first two manufacturers that were dedicated to building fault-tolerant computer systems for the transaction processing (OLTP) market.

High Availability
Many systems are designed to recover from a failure by detecting the failed component and switching to another computer system. These systems, although sometimes called fault tolerant, are more widely known as "high availability" systems, requiring that the software resubmits the job when the second system is available.

Redundant Hardware
True fault tolerant systems with redundant hardware are the most costly because the additional components add to the overall system cost. However, fault tolerant systems provide the same processing capacity after a failure as before, whereas high availability systems often provide reduced capacity. See fault management.


A RAID Array
This RAID II prototype in 1992, which embodies principles of high performance and fault tolerance, was designed and built by University of Berkeley graduate students. Housing 36 320MB disk drives, its total storage was less than the disk drive in the cheapest PC only six years later. (Image courtesy of The Computer History Museum, www.computerhistory.org) See RAID.







Redundancy
Redundancy is the hallmark of fault tolerant systems. This storage server from Xtore Extreme Storage (www.xtore-es.com) contains multiple, hot-swappable power supplies to ensure continued operation.
Copyright © 1981-2019 by The Computer Language Company Inc. All Rights reserved. THIS DEFINITION IS FOR PERSONAL USE ONLY. All other reproduction is strictly prohibited without permission from the publisher.
References in periodicals archive ?
Additional errors are said to include difficult-to-process materials and demanding applications where there is zero fault tolerance. The inherent benefits of the 800 Series are said to be retained in the 800 Series Hybrid, including compact design, low residence time and a common deflector bore that eliminates tolerance stack up.
The movement and wireless features assign MANET more uses in a variety of areas; thus high reliability and fault tolerance ability are required.
This paper considers three well-known algorithms for scheduling such as HEFT, Min-Min and GA for the series of simulations and the performance of DRFT approach is compared with the dynamic group based fault tolerance (DGFT) approach.
Comparison of the storage space and data blocks repair bandwidth in the same fault tolerance capability Comparison parameters erasure codes replication Fault tolerant ability n - k n - k storage space nT/k MB k(n - k + 1)MB Block repair bandwidth k x T/k = TMB T/k MB Table 2.
When a fault is detected the system either recovers its state by luck or by the designated fault tolerance hence this response can ward off any failure.
Automated Fault Tolerance creates a live shadow Cloud server (Failover Cloud Virtual Machine), which is synchronized and always up-to-date with the primary one.
He covers logging and checkpointing, recovery-oriented computing, data and service replication, group communication systems, consensus and the Paxos algorithm, Byzantine fault tolerance, and application-aware Byzantine fault tolerance.
It would appear that fault tolerance is a necessary paradigm that must be taken into consideration for the multi-agent development environment.
With the increasing scientific importance of CubeSat missions fault tolerance is now becoming critical.
The key concept here is "fault tolerance," meaning that the support systems of the data center need to tolerate a fault without disrupting service.
[24] established k-connectivity network to improve fault tolerance by [RN.sub.s] placement; the greedy and distributed algorithms were proposed and their performance had been proven by simulations.