Optimizing Failure Prediction to Maximize Availability

Additional information

Authors
Kaitovic I., Malek M.
Type
Article in conference proceedings
Year
2016
Language
English
Abstract
Availability of autonomous systems can be enhanced with self-monitoring and fault-tolerance methods based on failure prediction. With each correct prediction, proactive actions may be taken to prevent or mitigate a failure. Incorrect predictions, on the other hand, introduce additional downtime through the overhead of unnecessary proactive actions, which may decrease availability. The total effect on availability depends on the quality of prediction (measured by precision and recall), the overhead of proactive actions (penalty), and the benefit of proactive actions when the prediction is correct (reward). In this paper, we quantify the impact of failure prediction and proactive actions on steady-state availability. Furthermore, we provide guidelines for optimizing failure prediction to maximize availability by selecting a proper precision and recall trade-off with respect to penalty and reward. A case study demonstrating the approach is also presented.
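The trade-off described in the abstract can be illustrated with a small sketch. This is not the model from the paper, which this record does not reproduce. It is a simplified toy model under stated assumptions: availability is MTTF / (MTTF + expected downtime per failure), each proactive action costs a fixed `penalty` of downtime, a correctly predicted failure has its repair time reduced by the fraction `reward`, and the number of proactive actions per failure is recall / precision. All function and parameter names are hypothetical.

```python
from typing import Iterable, Tuple


def steady_state_availability(mttf: float, mttr: float, precision: float,
                              recall: float, penalty: float, reward: float) -> float:
    """Toy availability model (an assumption, not the paper's formula).

    mttf, mttr : mean time to failure / to repair, in the same time unit
    precision  : fraction of predictions that are correct, in (0, 1]
    recall     : fraction of failures that are predicted, in [0, 1]
    penalty    : downtime caused by one proactive action
    reward     : fraction of repair time avoided on a correct prediction, in [0, 1]
    """
    if not 0 < precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    if not 0 <= recall <= 1 or not 0 <= reward <= 1:
        raise ValueError("recall and reward must be in [0, 1]")

    # Proactive actions per failure: true positives plus false positives.
    actions_per_failure = recall / precision

    downtime = (
        (1 - recall) * mttr               # missed failures: full repair time
        + recall * mttr * (1 - reward)    # predicted failures: reduced repair time
        + actions_per_failure * penalty   # overhead of every proactive action
    )
    return mttf / (mttf + downtime)


def best_operating_point(points: Iterable[Tuple[float, float]], mttf: float,
                         mttr: float, penalty: float,
                         reward: float) -> Tuple[float, float]:
    """Return the (precision, recall) pair with the highest toy-model availability."""
    return max(
        points,
        key=lambda pr: steady_state_availability(mttf, mttr, pr[0], pr[1],
                                                 penalty, reward),
    )
```

In this toy model, a predictor with high recall but low precision can lower availability below the no-prediction baseline when the penalty is large relative to the reward, which is the kind of effect the abstract refers to. Calling `best_operating_point` with a handful of candidate (precision, recall) pairs selects the one that maximizes availability for a given penalty and reward.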
Conference proceedings
13th IEEE International Conference on Autonomic Computing (ICAC)
Month
July
Meeting place
Würzburg, Germany