Collecting Test Coverage Metrics

Author: Kenneth R. Greene, Siemens Power Corp. (krg@fred.nfuel.com)

One of the most difficult aspects of testing is answering the question, "How good are the tests?" This question motivates a considerable amount of research taking place around the world today. Smalltalk testing is complicated because one must ensure that the tests cover both domain models and graphical user interfaces. There are also multiple perspectives on testing: ad hoc, acceptance, regression, white box, black box, component, system, and so on. Regardless of the intent, there is almost always a need to answer that question quantitatively.

I propose that part of the answer to this question must include dynamic software metrics based on coverage. If used properly, in conjunction with other software quality tools and good coding style, coverage can provide a reasonable quantitative indication of the quality of a defined set of tests. The value of coverage testing has been hotly debated for a number of years. Testing experts will chastise me if I fail to state that 100% coverage does not ensure that a system is completely tested or free of errors. The converse, however, does hold: if 100% coverage has not been achieved, some code has never been executed by any test, and one can say that testing is inadequate.

Acquiring Coverage Data

How does one measure coverage? Coverage metrics are acquired by instrumenting the code to be tested. This is possible in Smalltalk because methods may be instrumented by selectively modifying and recompiling the source code. The instrumented method can be substituted for the original in the method dictionary for the duration of a test.

Instrumentation takes the form of inserting a statement into each method of each class that is to be monitored. A statement of the following form will suffice:

     CoverageInstrument tally: thisContext.

This statement is non-intrusive and should not create undesirable side effects. It uses a global variable, CoverageInstrument, because it must be visible from anywhere in the system. VisualWorks(TM) provides a pseudo-variable, thisContext, that contains information about the current execution context and can be used for collecting coverage metrics. This object holds the sender (the caller), the receiver, the method, and other execution-relevant objects.
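As a rough illustration of the substitution described above, the following workspace sketch instruments a single method and then restores it. The class Account, its instance variable balance, and the method deposit: are hypothetical and are assumed to exist in the image already, as is the global CoverageInstrument. The sketch uses only classic Smalltalk-80 class protocol (compiledMethodAt:, whichCategoryIncludesSelector:, compile:classified:, addSelector:withMethod:); it is not the code of the tool described in this article.

```smalltalk
"Assumed to exist already: a class Account with an instance variable
 balance, and the global CoverageInstrument. Original source of the
 hypothetical method:

     deposit: anAmount
         balance := balance + anAmount
"
| original category instrumentedSource |

"The instrumented source adds the tally statement as the first statement."
instrumentedSource := 'deposit: anAmount
    CoverageInstrument tally: thisContext.
    balance := balance + anAmount'.

"Keep the original compiled method so that it can be put back later,
 and remember its category so that the browser organization is unchanged."
original := Account compiledMethodAt: #deposit:.
category := Account whichCategoryIncludesSelector: #deposit:.

"Compiling installs the instrumented method in Account's method dictionary."
Account compile: instrumentedSource classified: category.

"... run the tests here ..."

"Put the original compiled method back."
Account addSelector: #deposit: withMethod: original.
```

A real tool would have to generate the instrumented source for arbitrary methods, which means placing the statement after any temporary variable declaration, and it would want to restore the original methods even when a test fails. Compiling through the normal class protocol may also record the change in the image's change log, which a tool would probably want to avoid.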

This approach implicitly assumes that the method is the atomic unit of coverage assessment. One can argue about the validity of this assumption. However, if the software to be tested conforms to good object-oriented style, abiding by the Law of Demeter (reference 1), then methods tend to be quite small in size and simple in scope. Under these conditions, the assumption is reasonable and the derived metrics may be interpreted as having considerable value. Quite clearly, the assumption breaks down for large, complex methods.

The method tally:, understood by CoverageInstrument, is responsible for collecting the data from the executing context. This information is partitioned and collected by class and method. Because instances also understand inherited methods, one must take inheritance into consideration when determining the set of methods that instances of a class understand.
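A minimal sketch of such a collector is given below. The class name CoverageTally, its instance variable counts, and the selectors initialize and executedSelectorsFor: are assumptions made for illustration; only tally: and CoverageInstrument come from the text above. The sketch relies on the context answering receiver and selector, as contexts do in Smalltalk-80 derived systems.

```smalltalk
"Class definition, in the classic Smalltalk-80 message form."
Object subclass: #CoverageTally
    instanceVariableNames: 'counts'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Coverage-Sketch'

"--- Instance methods of CoverageTally, as typed into a browser ---"

initialize
    "counts maps a class to a Bag of the selectors its instances executed."
    counts := IdentityDictionary new

tally: aContext
    "Record one execution of the method running in aContext,
     filed under the class of the receiver and the selector."
    | receiverClass |
    receiverClass := aContext receiver class.
    (counts
        at: receiverClass
        ifAbsent: [counts at: receiverClass put: Bag new])
            add: aContext selector

executedSelectorsFor: aClass
    "Answer the selectors that instances of aClass have executed so far."
    ^(counts at: aClass ifAbsent: [Bag new]) asSet

"--- Workspace: install an instance as the global used by the inserted statement ---"
Smalltalk at: #CoverageInstrument put: CoverageTally new initialize.
```

Filing each tally under the class of the receiver keeps inherited methods attributed to the class whose instance actually ran them. If the defining class is also wanted, a lookup such as receiverClass whichClassIncludesSelector: aContext selector could be added, although that lookup would misattribute a method reached through super. Using a Bag keeps an execution count per selector as well as the plain fact of execution.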

Instance-Based Coverage vs. Class-Based Coverage

There are two different perspectives on coverage metrics, and the difference between them should be kept firmly in mind. For classes being monitored, the actual monitoring is performed on instances of the class. In instance-based coverage, we collect statistics on which protocols (as defined by the class of the instance and all of its superclasses) have been tested. This is different from class-based coverage, where for a given class we want to know which of the protocols it defines have been tested by instances of itself or of any of its subclasses.

In instance-based coverage, the instances of a subclass exercise some but not all of the inherited protocol, so coverage would not normally be expected to reach 100%. In class-based coverage, the protocols of abstract classes may be difficult to test because subclasses do not exercise the protocol. In both cases, the coverage figures need interpretation.
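The difference between the two perspectives can be made concrete with a small worked example. The workspace sketch below uses made-up data for two hypothetical classes, an abstract class Shape and its subclass Circle, which overrides area. The class names, selectors, and test results are all assumptions chosen for illustration, and the classes are represented only by symbols so that the expressions can be evaluated in any image.

```smalltalk
| defined executed understood instanceBased shapeRun classBased |

"Selectors defined by each class. Circle is a subclass of Shape
 and overrides #area."
defined := Dictionary new.
defined at: #Shape put: #(#area #describe #name) asSet.
defined at: #Circle put: #(#area #radius) asSet.

"Selectors executed by instances of Circle during the tests.
 No instance of the abstract class Shape was created."
executed := #(#area #radius #describe) asSet.

"Instance-based coverage for Circle: everything a Circle understands,
 inherited protocol included, against what Circle instances executed."
understood := (defined at: #Shape) copy.
understood addAll: (defined at: #Circle).
instanceBased := (executed select: [:each | understood includes: each]) size
    / understood size.

"Class-based coverage for Shape: of the methods Shape defines, those
 reached through instances of Shape or of its subclasses. Circle overrides
 #area, so a Circle running #area does not exercise Shape's version."
shapeRun := executed reject: [:each | (defined at: #Circle) includes: each].
classBased := (shapeRun select: [:each | (defined at: #Shape) includes: each]) size
    / (defined at: #Shape) size.

"instanceBased is 3/4: Circle instances never ran the inherited #name.
 classBased is 1/3: of Shape's own methods only #describe was reached."
```

The same test run therefore yields 75% coverage from the instance-based perspective for Circle and about 33% from the class-based perspective for Shape, which is why a coverage figure should always state which perspective it reports.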

Concluding Remarks

Coverage metrics are a first step in assessing the value of a suite of tests. When properly applied and interpreted, they provide a first-level assessment of the quality of that suite.

A tool that collects and reports coverage metrics for Smalltalk-based software has been developed at Siemens Power Corp. The tool is robust in the sense that it works equally well with domain modelling objects and graphical user interface objects. It has helped point out deficiencies in the regression test suites that we maintain for our Smalltalk-based software. In addition to coverage, we are also able to extract profiling information and inheritance metrics from the tool.

References

1. Lieberherr, K. and Holland, I., "Assuring Good Style for Object-Oriented Programs," IEEE Software, Sept. 1989, pp. 38-48.