This book addresses the question of how system software should be designed to account for faults, and which fault tolerance features it should provide for the highest reliability. The authors first show how system software interacts with the hardware to tolerate faults. They analyze and further develop the theory of fault tolerance to understand the different ways of increasing the reliability of a system, with special attention to the role of system software in this process. They further develop the generalized algorithm of fault tolerance (GAFT) with its three main processes: hardware checking, preparation for recovery, and the recovery procedure. For each of the three processes, they analyze the requirements and properties theoretically and describe possible implementation scenarios and the system software support required. Based on the theoretical results, the authors derive an Oberon-based programming language with direct support for the three processes of GAFT. In the last part of the book, they introduce a simulator and use it as a proof-of-concept implementation of a novel fault-tolerant processor architecture (ERRIC) and its newly developed runtime system, in terms of both features and performance. The content applies to industries such as military, aviation, intensive health care, industrial control, and space exploration.
· Outlines potential critical faults in modern computer systems and what is required to change them
· Explains how to design and redesign system software for the next generation of computers, for wider application domains and greater efficiency and reliability
· Presents how implemented system software support makes maintenance of computer systems much easier, while reliability and performance increase