Tuesday, November 01, 2011

Code coverage tool

Motivation for writing a coverage tool

In Factor, the old code-is-data adage is true and can be used to our advantage when writing runtime tools. Code is stored in what we call a quotation, which is just a sequence of functions (words) and data literals and can be called at runtime. A function (or word) in Factor is simply a named quotation, and typing that word name causes the quotation to run.

Since quotations in Factor do not terminate early unless an exception is thrown, we know that if each quotation gets run, then we have run all of the code. While this won't tell us if there are logic errors, and while we can still have code fail due to unexpected inputs, at least we can be sure that all of the code is getting exercise and there are no paths that never get run. We can use the results of this tool to write better unit tests.

Demo of the coverage tool

As a simple demo, there's a fun algorithm that produces a sequence of numbers from a start number, where the sequence always seems to end at one. For any integer greater than zero, the the Collatz conjecture states that this sequence will eventually reach the number one. The algorithm goes as follows: take any counting number (1, 2, ...) and either multiply by three and add 1, or divide by two, if the number is odd or even, respectively, and record the sequence of numbers until you reach the number one. This conjecture has been found to be true experimentally for every number up to 10^18, but no proof exists that it's true for all possible inputs.

For example, the Collatz sequence for 3 is: { 3 10 5 16 8 4 2 1 }

We can come up with a solution pretty easily (partially taken from the Project Euler #14, in extra/).

We do the command: "collatz" scaffold-work

Click on the link, edit the file ~/factor/extra/collatz/collatz.factor
USING: combinators.short-circuit kernel make math ;
IN: collatz

: next-collatz ( n -- n )
dup even? [ 2 / ] [ 3 * 1 + ] if ;

: collatz-unsafe ( n -- seq )
[ [ dup 1 > ] [ dup , next-collatz ] while , ] { } make ;

ERROR: invalid-collatz-input n ;

: collatz ( n -- seq )
dup { [ integer? ] [ 1 >= ] } 1&&
[ collatz-unsafe ]
[ invalid-collatz-input ] if ;
We're going to be extra careful here to demonstrate the code coverage tool, so I added error checking to make sure it's an integer greater than or equal to one.

Let's write some unit tests to make sure it works.

Run the command: "collatz" scaffold-tests
Now click on the link to edit the file it created, ~/factor/extra/collatz/collatz-tests.factor
USING: tools.test collatz ;
IN: collatz.tests

[ { 1 } ] [ 1 collatz ] unit-test
[ { 2 1 } ] [ 2 collatz ] unit-test
[ { 3 10 5 16 8 4 2 1 } ] [ 3 collatz ] unit-test
If we run "collatz" test, we see that all tests pass.

Now, let's see how well we did with code coverage.

We run the command:
IN: scratchpad USE: tools.coverage "collatz" test-coverage .
Loading resource:work/collatz/collatz-tests.factor
Unit Test: { [ { 1 } ] [ 1 collatz ] }
Unit Test: { [ { 2 1 } ] [ 2 collatz ] }
Unit Test: { [ { 3 10 5 16 8 4 2 1 } ] [ 3 collatz ] }
{
{ next-collatz { } }
{ collatz { [ invalid-collatz-input ] } }
{
invalid-collatz-input
{ [ \ invalid-collatz-input boa throw ] }
}
{ collatz-unsafe { } }
}
What this tells us is we had quotations in the collatz word and the invalid-collatz-input words that did not get called. Of course -- we never passed it anything other than valid inputs. How about passing it a string or a negative integer?

Now the result looks better:
IN: scratchpad "collatz" test-coverage .
Loading resource:work/collatz/collatz-tests.factor
Unit Test: { [ { 1 } ] [ 1 collatz ] }
Unit Test: { [ { 2 1 } ] [ 2 collatz ] }
Unit Test: { [ { 3 10 5 16 8 4 2 1 } ] [ 3 collatz ] }
Must Fail With: { [ "hello world" collatz ] [ invalid-collatz-input? ] }
Must Fail With: { [ -50 collatz ] [ invalid-collatz-input? ] }
{
{ next-collatz { } }
{ collatz { } }
{ invalid-collatz-input { } }
{ collatz-unsafe { } }
}
We can even get a number for how well tests cover a vocabulary:
"collatz" %coverage .
1.0


The implementation of the coverage tool

Every word in Factor stores its definition in the `def' slot. If we examine this slot, we see that it's a quotation that may contain other quotations. Using the annotation vocabulary, we can add code that executes before and after the code in the quotation. What the coverage tool does is adds a container that stores a boolean flag at the beginning of each quotation, and when the quotation gets run, the flag is set to true. This tool can be turned on and off, independent of the containers being in place, with the coverage-on and coverage-off words.

Here's a word that turns a Roman numeral into an integer:
\ roman> def>> .
[ >lower [ roman>= ] monotonic-split [ (roman>) ] map-sum ]
After annotating it, the new definition looks like:
\ roman> add-coverage \ roman> def>> .
[
T{ coverage } flag-covered >lower
[ T{ coverage } flag-covered roman>= ] monotonic-split
[ T{ coverage } flag-covered (roman>) ] map-sum
]
The flags are all set to false right now. After turning on the flag to let enable the coverage code and running the word, we see a change:
coverage-on "iii" roman> drop \ roman> def>> .[
T{ coverage { executed? t } } flag-covered >lower
[ T{ coverage { executed? t } } flag-covered roman>= ]
monotonic-split
[ T{ coverage { executed? t } } flag-covered (roman>) ]
map-sum
]
Notice that all of the coverage containers have been executed. To generate the report, we simply iterate over each word and collect all of the quotations where this flag has not been set -- these quotations never ran.

In writing this article, I realized that each quotation should have a flag on exit as well, in case an exception gets thrown in the middle of executing this quotation and control never reaches the end. Partially-executed quotations will soon be reported by the tool, after I make this fix.

I hope you can use this tool to improve your Factor code. Have fun!