« Two weeks of HaskellGrab Bag »

Solving Cryptograms, Part 1

21/09/08

Permalink 07:52:58 pm, 386 words
Categories: Code, Linguistics, Games

Solving Cryptograms, Part 1

I’ve never met a newspaper game I didn’t want to solve. So here is a weekly series on building a cryptogram solver. This initial installment presents a naive solver that doesn’t yet worry about input/output.

Cryptograms are complex enough that solving them is helped by standard AI techniques. The paradigm I use here is AI-as-search. My area of AI, computational linguistics (or, natural language processing) does not often use this method, so I’m not an expert on it. You’ll pardon me if I refer to Norvig’s Artificial Intelligence : A Modern Approach.

The AI-as-search model requires the following things:

  1. States
  2. Initial State
  3. Successor function
  4. Goal test
  5. Path cost

I’ll won’t be using a path cost for this problem (yet).

Each state is a substitution. This is simply a list of characters: the first character maps to ‘a’, the second to ‘b’, and so on.

type Substitution = [Char]
type Cryptogram = String

Let’s start with the simplest substitution and cryptogram: identity.

initial = ['a'..'z']
cryptogram = "a clockwork orange"

Later I’ll show a smarter initial state that matches the letter frequency in the cryptogram with a table of English letter frequencies. Now all states are just the permutation of the initial state:

states = permute initial

Since this is a list, the successor function is just cdr. Pretty simple. However, I do need to define permute, which I conveniently have on hand from another project. Its complexity is factorial, so there are 26! = 403291461126605635584000000 states for this problem.
That’s a big number. It’s bigger than the number of McDonald’s in the entire universe, blah blah blah, you’ve heard these comparisons before. Let’s just say that this simple definition won’t give me the answer in my lifetime.

permute [] = [[]]
permute l = [x:ys | (x,xs) <- extractions l, ys <- permute xs]

extractions [] = []
extractions l = extract l []
    where extract [] _ = []
          extract (x:xs) prev = (x, prev++xs) : extract xs (x : prev)

The goal test is harder, since the program can’t truly know when it’s done. Not even a human technically knows when it’s done with a cryptogram, but it does have an advantage in that it can parse the resulting sentence and examine its semantics. Let’s just say that the program is done when the number of correct letters is the same as the number of letters in the cryptogram:

count f = length . filter f
bestScore = count (`elem` ['a'..'z']) cryptogram
done dict crypt sub = eval dict crypt sub == bestScore

But I haven’t defined eval yet. eval translates the cryptogram according to the current substitution and then looks up the resulting words in the dictionary. The final score is the total length of the dictionary words. eval will become important later, when the successor function becomes smarter.

eval dict crypt sub = score dict (subst crypt sub)
score dict text = sum (map (lookup dict) (words text))
    where lookup dict w = if Set.member w dict then length w else 0
subst crypt sub = map (\ c -> Map.findWithDefault c c dsub) crypt
    where dsub = Map.fromAscList (zip ['a'..'z'] sub)

Running the program as is will take forever. So let’s test the code with miniature definitions of cryptogram, states and initial. The message is “a bad cab” and the correct substitution is “dbca".

test = maximumBy (comparing (eval dict crypt)) states
    where dict = Set.fromAscList ["a", "ad", "bad", "cab", "cad"]
          crypt = "d bda cdb"
          initial = "abcd"
          states = permute initial

Now you can test that the program works by saying:

Prelude> :load /Users/zackman/src/Crypto.hs
[1 of 1] Compiling Main             ( /Users/zackman/src/Crypto.hs, interpreted )
Ok, modules loaded: Main.
*Main> test
"dbca"
*Main> subst "d bda cdb" test
"a bad cab"

Next week I will definitely show how to read a real dictionary and start organising the state search so that it is intelligent. If you are a Haskell programmer, please let me know how I can improve my code. Except for type annotations. I don’t like type annotations. Here is the complete code so far.

Edit: I wrote a second entry if you want to read that, too.

2 comments

Comment from: m [Visitor] Email
Great post, however the last link (Crypto.html) does not work.
22/09/08 @ 03:44
Comment from: sandersn [Member]
Fixed. The blog software mangles relative URLs differently between public view and admin view.
22/09/08 @ 07:43

Comments are closed for this post.

powered by b2evolution free blog software