| « Parallelising embarrassingly parallel code | Switch Subversion to Git: a list of links » |
I’ve been using Haskell for a little over a year and I’ve used Python (off and on) since 2003. Both feature heavily in my dissertation experiment. Now that I’ve used Haskell enough, I think I can write idiomatic programs in both languages. Since, as a linguist, one of my prurient interests is comparative linguistics, I enjoy writing the same thing in both Python and Haskell to see which is better.
The annoying part is that the comparison doesn’t work between Python and Haskell. In the old days I could write something in Java and Python and Python would clearly smoke Java. Or I could write a functional program in Python and Scheme and Scheme would win, except for the parts of functional programming that Guido likes and has added to Python. ("Python: As much Lisp as a C programmer can understand.") Even then I considered that Python cheated because I could write list comprehensions in Scheme if I wanted—everybody else had certainly tried.
But then a couple of things happened. Dan Friedman ruined my faith in Scheme AND Python AND any language, really. In Friedman’s class, you have to get into the head of a language designer. That’s different from the approach of a linguist or a programmer. A linguist tries to describe how the language operates. A programmer critiques how well the parts fit together. Neither have to worry about how to make the language work, or the process of building it.
Have I told this story before? I feel like I have. Maybe I should search backwards in my blog and just post a link. Or maybe not. I’m stuck at my parent’s house without internet access, so you get to suffer through it again. This is the Internet, you can leave if you want to.
Anyway, making yourself a language designer inverts everything. Everything else becomes the toy language and Scheme is the bedrock tool. All the language features that programmers rely on and linguists analyse are useless distractions because you’re writing your own language and you need to CONCENTRATE. All the effort other language implementers spent on inheritance or backtracking or whatever is wasted when it could have been better spent eliminating tail calls: making the compiler/runtime handle byzantine generated code. In the end you are basically left with Scheme and C. (Note how they’re both old, simple languages.) Since Scheme is functional, it’s a better starting place than C for building a functional language**.
The other thing that Friedman’s class did was push me ever further into writing pure functional programs. I had been moving that direction since 2002, ever since reading On Lisp, but 2005 was the year that main was the only code with read and write in it. After Friedman’s class, I switched back to Python because of Scheme’s culture and libraries* and because of the IU comp ling group’s Python culture. I wrote really functional programs in Python. And they were super ugly. Slow, too. Not many bugs, though, and once you understood them, very easy to modify.
So here I was, writing purely (?) functional programs in Python. Very dense, very fast to write, but ugly and hard to pick up after 6 months. I knew something was wrong. The two biggest pain points were the complex data structures and ugliness of advanced functional features. The functional code was dense, but hard to parse visually. But I couldn’t switch back to Scheme because it just pretends the outside world doesn’t exist. You could write an HPSG parser (for example) in either language, but a file-munging script only in Python–it would be 1/4 the size of the Scheme script, which would have to use all sorts of partly documented distro-specific functions.
The tipping point was when I wrote a type-checking wrapper. The second time I spent half a day figuring out the inputs and outputs for 6-month-old functions, I wrote down all the types in comments. The syntax was close to Haskell; I could read Haskell ever since the beginning of 2004 when I got a bunch of free books at a conference, so I knew the type annotations. Then I realised that, because classes are first-class Python objects, I could write the types in my program and have them evaluated. My code changed from
# read :: file * [str] -> [int]
def read(f, regions):
:
to
@check(file, [str], [int])
def read(f, regions):
:
Naturally, I knew this was horrifying. And I knew that I should probably just use Haskell. But I didn’t want to switch because I knew Haskell had the same problems as Scheme; no built-ins for scripting and weak libraries besides.
So I didn’t switch that summer. I growled and grumbled and wrote more dense functional Python. Finally, at the end of the summer, I was looking into an alternate measure of distance between trees. It had been a while since I had to write tree code. And you know there is really only one way to write tree code, and that’s recursively. But primitive recursion is so ugly in Python. I couldn’t bring myself to do it, so I wrote out, on paper, the Haskell to implement this measure. It was so nice. Like Scheme, except with destructuring that I could trust! I still had ghc installed on peregrin so I typed it in when I got back to the office.
At that point I decided I had to bite the bullet and switch. I had the perfect project coming up, too: I was starting on a qualifying paper in computational phonology, which is about as far from the outside world as you can get. It’s just complex list processing, which plays to Haskell’s strengths rather than its weaknesses. I had been worried about how to present the algorithm in the paper—I don’t like math notation much, and I didn’t think I could get away with Python. I didn’t want to use Python in any case, because the existing phonology code I had was the same dense, ugly stuff I had been writing for the last few years. Even if I put it in there, I wouldn’t blame my advisors for not wanting (or being able) to read it.
Not only is functional code pretty in Haskell, Haskell has type annotations. So I could annotate each snippet with its type, making it easier (in my opinion) to follow than math notation. I ported the existing algorithms to Haskell and started to work on my new algorithm.
The rest is history. (I always wanted to say that.)
Of course I still write a good bit of Python; my syntax distance code uses it for glue code, but not complex algorithms. And the CL group at IU still uses it as a lingua franca. (Markus and Mike like it and Sandra has learnt it, and is now teaching it, so all the new students know it.) For that reason, when I tried implementing the Chu-Liu-Edmonds dependency parser over Christmas weekend, I made a quick translation to Python after I finished the Haskell draft. I thought, “this will be like the old days when I back-translated code to Java to see how much better Python was".
But it wasn’t like that at all. It was very disappointing. The Python code was longer, but only by 20-30% more lines, and the byte count was nearly the same. Admittedly, the code was a Haskell translation, but I tried to be as idiomatic as I could. I even had a class! More than anything, the languages are just different: things that are short in Haskell are long in Python, and vice versa. The Python is easier to understand—from an imperative standpoint. The Haskell is easier to understand from a functional standpoint. So there’s no clear winner like I was hoping for.
I guess those days really are gone forever. And it’s all Dan Friedman’s fault. He’s the one who forced us all in 511 to see that all languages are broken and inadequate, each in their own way.
*PLT has them, but they’re low quality–everybody else is minimalist because that’s the intention of Scheme. Maybe R6RS will someday fix this. I doubt it. Scheme is perfect the way it was.
**Also it has a REPL.