Mutate and Test Your Tests


by Benoit Baudry

I am extremely proud and happy that my talk on mutation testing got accepted as an early bird for EclipseCon Europe 2017.

We will talk a lot about software testing at the Project Quality Day. In this talk, I will focus on the qualitative evaluation of a unit test suite. Statement coverage is commonly used to quantify the quality of a test suite: it measures the ratio of source code statements that are executed at least once when the test suite runs. However, statement coverage is known to be a rather weak quality indicator. For example, a test suite that covers 100% of the statements but contains no assertions is a very bad test suite, yet statement coverage rates it as excellent.
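
To make the weakness concrete, here is a minimal sketch in plain Java (no test framework, so it stays self-contained). The class name, the clamp() method, and the check() helper are all hypothetical and chosen only for illustration. Both "tests" execute every statement of clamp(), but only the second one can ever fail.

```java
public class CoverageWithoutAssertions {

    // Hypothetical method under test: clamps x into the range [lo, hi].
    static int clamp(int x, int lo, int hi) {
        if (x < lo) { return lo; }
        if (x > hi) { return hi; }
        return x;
    }

    // Executes every statement of clamp() but verifies nothing:
    // statement coverage is 100%, yet no bug in clamp() can make it fail
    // (short of an exception).
    static void testWithoutAssertions() {
        clamp(-5, 0, 10);
        clamp(50, 0, 10);
        clamp(5, 0, 10);
    }

    // Same inputs and same coverage, but the results are checked.
    static void testWithAssertions() {
        check(clamp(-5, 0, 10) == 0);
        check(clamp(50, 0, 10) == 10);
        check(clamp(5, 0, 10) == 5);
    }

    // Tiny stand-in for a test framework assertion.
    static void check(boolean condition) {
        if (!condition) { throw new AssertionError("check failed"); }
    }

    public static void main(String[] args) {
        testWithoutAssertions();
        testWithAssertions();
        System.out.println("both tests pass");
    }
}
```

A coverage tool would report the same figure for both tests, which is exactly why coverage alone says little about how well the code is verified.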

Mutation testing is another way of evaluating the quality of a test suite, based on the following intuition: a test suite is good at testing a specific program if it can detect bugs in that program. So let's inject bugs into the program and see how many the test suite detects. Mutation testing was proposed in the late 1970s by Richard A. DeMillo, Richard J. Lipton, and Fred G. Sayward. Their idea was quite simple: create multiple mutants of the program under test, i.e., versions of the program with one seeded bug each (for example, an off-by-one error), run the test suite against each mutant, and see how many are detected. Since then, many mutation testing tools have been built, including PIT, a state-of-the-art tool for Java.
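
The loop described above can be sketched in a few lines of Java. This is only an illustration of the principle, not how PIT is implemented: the program under test (a hypothetical max function), the three hand-written mutants, and the two-case test suite are all assumptions made for the example. Real tools generate the mutants automatically.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;

public class MutationScoreSketch {

    // Hypothetical program under test: returns the larger of two ints.
    static final IntBinaryOperator ORIGINAL = (a, b) -> a > b ? a : b;

    // Hand-written mutants, each with exactly one seeded bug.
    static final List<IntBinaryOperator> MUTANTS = List.<IntBinaryOperator>of(
        (a, b) -> a > b ? b : a,  // branches swapped: returns the smaller value
        (a, b) -> a,              // always returns the first argument
        (a, b) -> b               // always returns the second argument
    );

    // Hypothetical test suite: true when every test case passes.
    static boolean testSuitePasses(IntBinaryOperator max) {
        return max.applyAsInt(3, 1) == 3
            && max.applyAsInt(2, 2) == 2;
    }

    // A mutant is detected ("killed") when at least one test fails on it.
    static long killedMutants() {
        return MUTANTS.stream().filter(m -> !testSuitePasses(m)).count();
    }

    public static void main(String[] args) {
        System.out.println(killedMutants() + " of " + MUTANTS.size() + " mutants killed");
    }
}
```

In this sketch the mutant that always returns its first argument survives, because no test case has a larger second argument. A surviving mutant points at a missing test case in this way.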

Considering the example below, PIT will generate one mutant in which the == operator at line 2 is replaced by !=, another in which the return value.hashCode() at line 11 is replaced by return 0, and so on.

public int hashCode() {
    if (value == null) {return 31;}
    if (isIntegral(this)) {
        long value = getAsNumber().longValue();
        return (int) (value ^ (value >>> 32));
    }
    if (value instanceof Number) {
        long value = Double.doubleToLongBits(getAsNumber().doubleValue());
        return (int) (value ^ (value >>> 32));
    }
    return value.hashCode();
}
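
For readers who prefer to see the two mutants spelled out, here is a source-level sketch. It is a simplification under stated assumptions: the class and method names are hypothetical, the numeric branches of the listing are omitted, and the field is passed in as a parameter. PIT itself does not produce source files like this; the sketch only shows what each mutation means.

```java
public class HashCodeMutantsSketch {

    // Lines 2 and 11 of the listing above, with the numeric branches omitted.
    static int original(Object value) {
        if (value == null) { return 31; }
        return value.hashCode();
    }

    // Mutant A: the == at line 2 is replaced by !=.
    static int negatedCondition(Object value) {
        if (value != null) { return 31; }
        return value.hashCode(); // throws NullPointerException when value is null
    }

    // Mutant B: the return at line 11 is replaced by "return 0".
    static int returnZero(Object value) {
        if (value == null) { return 31; }
        return 0;
    }

    public static void main(String[] args) {
        Object input = "abc";
        int expected = original(input);
        // A test that compares against the expected hash detects both mutants.
        System.out.println("mutant A detected: " + (negatedCondition(input) != expected));
        System.out.println("mutant B detected: " + (returnZero(input) != expected));
    }
}
```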

Mutation testing tools generate large quantities of mutants and, consequently, the analysis takes a long time to run. In this talk, I will present Descartes, a mutation testing tool that adapts PIT for extreme mutation. Descartes follows the principle of mutation testing but generates mutants by removing the body of a method completely, instead of injecting multiple small bugs. For example, Descartes generates only two mutants for the hashCode() method above:

public int hashCode() {
    return 0;
}

public int hashCode() {
    return 1;
}
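
The sketch below shows why such coarse mutants are still informative. It models the original method and the two extreme mutants as functions, then runs two hypothetical tests against them. Everything here (class name, the simplified original, both tests) is an assumption made for illustration and is not taken from Descartes itself.

```java
import java.util.function.ToIntFunction;

public class ExtremeMutationSketch {

    // Simplified stand-in for the original hashCode() logic.
    static final ToIntFunction<Object> ORIGINAL = v -> v == null ? 31 : v.hashCode();

    // The two extreme mutants: the whole body is replaced by a constant.
    static final ToIntFunction<Object> RETURN_ZERO = v -> 0;
    static final ToIntFunction<Object> RETURN_ONE = v -> 1;

    // Weak test: only checks that equal inputs produce equal hashes.
    // Any constant function satisfies it.
    static boolean weakTestPasses(ToIntFunction<Object> hash) {
        return hash.applyAsInt("abc") == hash.applyAsInt(new String("abc"));
    }

    // Stronger test: also pins the hash to the expected value for a known input.
    static boolean strongerTestPasses(ToIntFunction<Object> hash) {
        return weakTestPasses(hash) && hash.applyAsInt("abc") == "abc".hashCode();
    }

    public static void main(String[] args) {
        System.out.println("weak test detects return 0:     " + !weakTestPasses(RETURN_ZERO));
        System.out.println("weak test detects return 1:     " + !weakTestPasses(RETURN_ONE));
        System.out.println("stronger test detects return 0: " + !strongerTestPasses(RETURN_ZERO));
        System.out.println("stronger test detects return 1: " + !strongerTestPasses(RETURN_ONE));
    }
}
```

When both extreme mutants survive, as they do with the weak test here, the method's result is never really checked, even though the method is covered.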

In this talk, I will summarize the principles of mutation testing and the current state of the art, then illustrate how it works with PIT and Descartes. I will conclude with some of the lessons we learned from running Descartes on open source projects.

About the author

Benoit Baudry is a Professor at the KTH Royal Institute of Technology in Stockholm, Sweden. He currently investigates novel techniques for automatic test amplification in DevOps. His research focuses on program analysis, transformation, testing and diversification. Until August 2017 he was a research scientist at INRIA, France, where he had led the DiverSE research group (EPI) since 2013.