ReSpec - Characterizing and Using the Intrinsic Redundancy of Software
People
Abstract
We propose to study the redundancy of software systems. To put it simply and informally, a system is redundant when it is capable of performing the same logical computation or state transition through the execution of two or more pieces of code that are at least partially different and idependent. In particular, we are interested in redundancy that is not deliberately incorporated into a system but that is instead an intrinsic feature of the software. We have observed experimentally that this form of redundancy is indeed present to a significant extent in some systems. We now propose to study this redundancy systematically. The starting point of our research will be a formalization of the intuitive notion of redundancy. As we said informally, a software system is redundant when it can do the same thing in different ways. We will develop this notion more formally by defining, on the one hand, what it means for two executions to do the same thing, and on the other hand how two executions can be differentiated and to what extent they can be deemed independent. In particular, our model of redundancy will be based on the notion of observable equivalence of executions combined with various distance metrics over execution traces. This model will then be the basis for a dual-track research project. In the first track we propose to study the prevalence, the nature, and the origin of intrinsic redundancy in modern software systems. In other words, within this track we propose to look at intrinsic redundancy as a phenomenon. We will try to characterize it qualitatively and quantitatively and to correlate it with important features of the design of software systems as well as with other technical or organizational aspects of the software and its development process. For example, we intuitively hypothesize, and will attempt to demonstrate experimentally, that intrinsic redundancy is more prevalent in highly modular software systems, where the functionality of the system results from the integration of component frameworks and libraries. Other aspects of the system that might correlate with intrinsic redundancy are specific architectures or specific architectural components (e.g., event-driven architectures, GUI) and specific application domains (e.g., data management systems, collection libraries) and also technical aspects such as the programming language (more or less object-oriented, dynamic vs. static typing). In a second parallel track we propose to develop formalisms and techniques to capture, express, and ultimately exploit intrinsic redundancy. In other words, within this track we propose to look at intrinsic redundancy as a tool. We will first try to capture and express redundancy in the form of specifications of equivalent behaviors with different implementations. Our intuition is that we can draw from techniques developed in diverse fields such as compiler optimization, refactoring, and also some recent developments in model checking, particularly on simulation distances. Both compiler optimization and refactoring amount to rewriting code through transformations that might lead to desirable non-functional properties while preserving the program semantics. Furthermore, such transformations could be combined with quantitative measures of the difference between the original program and the transformed program so as to inform the specific strategies by which the transformations would be applied. In short, a redundancy specification will identify semantics-preserving code-level transformations that also express and somehow quantify the potential benefits of each transformation. Having developed appropriate forms of redundancy specifications, and assuming that some software systems indeed possess significant levels of intrinsic redundancy, we will study ways to exploit that redundancy. In the context of other research projects, we have used this redundancy to augment systems with a form of self-healing behavior, whereby redundant code is used to work around runtime failures. One rather straightforward extension of this work is to use redundancy to improve performance or to address other non-functional properties. However, we also see and propose to study several and diverse potential uses of intrinsic redundancy. One such use is to implement implicit automatic oracles for generated tests. Another idea that also relates to testing and debugging is to amplify generated tests through a combinatorial substitution of equivalent sequences of calls. Such amplified tests could then be used with many testing and fault-localization techniques. Yet another idea is to use redundancy specifications in model checking, for example to reuse invariants computed for specific transitions with equivalent transitions. In general, we see redundancy specifications as yet another form of behavioral specification that may therefore be useful in several forms of program analyses.