« DaxterDragon Quest 4 »

Use QuickCheck for sanity checks

14/01/10

Permalink 10:25:39 am, 589 words
Categories: Code, Linguistics

Use QuickCheck for sanity checks

Today I am generating diagrams with RuG-L04, aka “the dialectology dissertation prettifier". It generates awesome maps that help to visualise dialect areas.

Anyway, the basic input for dialect distances is distances between all pairs of sites, irrespective of order (assuming that the distance between Bloomington and Martinsville is the same as the distance between Martinsville and Bloomington).

Why don’t you take a minute and write some code to do this: for a list of sites, generate all pairs of sites, irrespective of order. That means if you generate (Bloomington, Martinsville), you are not allowed to also generate (Martinsville, Bloomington).

The reason I ask you to do this is that there are many valid ways to generate all unordered pairs. My code, translated from Python*, is this:

pairwise l = [(x,y) | (i,x) <- zip [1..] l, y <- drop i l]

But the difference matrix format used by RuG-L04 uses the following definition:

FOR i = 2 TO max
    FOR j = 1 TO i - 1
        PRINT diff[i,j]
        NEWLINE

Or, translated from Pascal-esque pseudocode to Haskell:

pairwise' [] = []
pairwise' l = [(x,y) | (i,x) &lt- zip [1..] (tail l), y <- take i l]

This generates the pairs in a different order, meaning that I can’t directly feed my output files into RuG-L04. My code generates the following pairs for the list [1,2,3]: [12, 13, 23], while RuG-L04 requires the pairs [21, 31, 32].

At this point, I have 3 choices to make RuG-L04 work:

  1. Switch my whole experiment to use RuG-L04’s order.
  2. Read my distance file into an order-independent data structure like a map, then write out the distances in the order expected by RuG-L04.
  3. Directly transform the distance file to the difference matrix format, without using an intermediate data structure.

I don’t like option 1 because (1) I’ve already run the first part of my experiment and (more important) (2) my order mimics counting order better. In other words, if you run my code on a list of numbers and concatenate the pairs to a single number (ie (1,2) => 12), it looks like you started counting with the first item in the list instead of the second.

I was getting set to code option 2, but lunch time was coming up, so I took a piece of paper with me and started writing out patterns. Sure enough, 5 minutes later, I came up with a way to transform my sequence to the RuG-L04 sequence: if you reverse the output of my code, it is equal to the output of the RuG-L04 with reversed input.

In other words,

forEvery l = pairwise (reverse l) == reverse (pairwise' l)

I tried it on lists of up to length 5, but I ran out of paper (it was a notepad) and patience (my pizza was ready). I didn’t really trust that it worked for any list. But that code reminded me a whole lot of a QuickCheck property. When I got done with lunch, I sat down to ghci and typed the above definitions in, then said

> Test.QuickCheck.quickCheck forEvery
OK, passed 100 tests.

So I have much greater confidence that I can go ahead with the transform now. The moral of the story? I have not found QuickCheck super useful for complicated code but it is useful for making sure that a transformation is consistent for all kinds of input. You don’t even have to be writing a test suite. Just write up a property and test it at the command line.

*Since Haskell defaults to lists, not vectors, (and doesn’t suck at recursion) it would be more efficient as:

pairwise [] = []
pairwise (x:xs) = map ((,) x) xs ++ pairwise xs

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)
free blog