Fault-tolerant systems

Systems, predominantly computing and computer-based systems, which tolerate undesired changes in their internal structure or external environment. Such changes, generally referred to as faults, may occur at various times during the evolution of a system, beginning with its specification and proceeding through its utilization. Faults that occur during specification, design, implementation, or modification are called design faults; those occurring during utilization are referred to as operational faults. The use of fault tolerance techniques is based on the premise that a complex system, no matter how carefully designed and validated, is likely to contain residual design faults and to encounter unpreventable operational faults.

Generally, fault tolerance techniques attempt to prevent lower-level errors (caused by faults) from propagating into system failures. By using various types of structural and informational redundancy, such techniques either mask a fault (no errors are propagated to the faulty subsystem's output) or detect a fault (via an error) and then effect a recovery process which, if successful, prevents a system failure. In the case of a permanent internal fault, the recovery process usually includes some form of structural reconfiguration (for example, replacement of a faulty subsystem with a spare or use of an alternate program) which prevents the fault from causing further errors. Typically, a fault-tolerant system design will incorporate a mix of fault tolerance techniques which complement the techniques used for fault prevention. See Software engineering
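
The two strategies described above, masking a fault and detecting it and then recovering, can be illustrated with a minimal Python sketch. It is not taken from any particular system: the names majority_vote, recovery_block, faulty_isqrt, and acceptance_test are hypothetical, the replicas are assumed to return hashable values, and an integer square root stands in for an arbitrary subsystem. Majority voting over three replicas shows structural redundancy masking a fault; the second function shows an acceptance test detecting an error and an alternate program serving as the reconfiguration step.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def majority_vote(outputs: Sequence[int]) -> int:
    """Mask a fault: return the value agreed on by a strict majority of replicas."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        # Too many replicas disagree, so the fault cannot be masked.
        raise RuntimeError("no majority among replica outputs")
    return value


def recovery_block(
    x: int,
    primary: Callable[[int], int],
    alternate: Callable[[int], int],
    acceptance_test: Callable[[int, int], bool],
) -> int:
    """Detect an error with an acceptance test, then recover via an alternate."""
    try:
        result = primary(x)
        if acceptance_test(x, result):
            return result
    except Exception:
        pass  # an exception from the primary is treated as a detected error
    # Reconfiguration: switch to the alternate program.
    result = alternate(x)
    if not acceptance_test(x, result):
        raise RuntimeError("recovery failed; the error becomes a system failure")
    return result


def faulty_isqrt(x: int) -> int:
    """Hypothetical subsystem with a permanent fault: always off by one."""
    return math.isqrt(x) + 1


def acceptance_test(x: int, r: int) -> bool:
    """Check the defining property of an integer square root."""
    return r * r <= x < (r + 1) * (r + 1)


# Masking: one of three replicas is faulty, yet the voted output is correct.
replicas = (math.isqrt, faulty_isqrt, math.isqrt)
voted = majority_vote([replica(17) for replica in replicas])

# Detection and recovery: the faulty primary is rejected, the alternate is used.
recovered = recovery_block(17, faulty_isqrt, math.isqrt, acceptance_test)
```

In the voting case the error never reaches the output, whereas in the recovery case the error is observed first and the system continues only if the alternate passes the same test. Real designs typically combine both approaches, as the entry notes.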
