« Parallelising embarrassingly parallel codeSwitch Subversion to Git: a list of links »

Python vs Haskell : An unsatisfying exercise in comparative code linguistics

05/01/10

Permalink 10:07:49 am, 1371 words
Categories: Code, Linguistics

Python vs Haskell : An unsatisfying exercise in comparative code linguistics

I’ve been using Haskell for a little over a year and I’ve used Python (off and on) since 2003. Both feature heavily in my dissertation experiment. Now that I’ve used Haskell enough, I think I can write idiomatic programs in both languages. Since, as a linguist, one of my prurient interests is comparative linguistics, I enjoy writing the same thing in both Python and Haskell to see which is better.

The annoying part is that the comparison doesn’t work between Python and Haskell. In the old days I could write something in Java and Python and Python would clearly smoke Java. Or I could write a functional program in Python and Scheme and Scheme would win, except for the parts of functional programming that Guido likes and has added to Python. ("Python: As much Lisp as a C programmer can understand.") Even then I considered that Python cheated because I could write list comprehensions in Scheme if I wanted—everybody else had certainly tried.

But then a couple of things happened. Dan Friedman ruined my faith in Scheme AND Python AND any language, really. In Friedman’s class, you have to get into the head of a language designer. That’s different from the approach of a linguist or a programmer. A linguist tries to describe how the language operates. A programmer critiques how well the parts fit together. Neither have to worry about how to make the language work, or the process of building it.

Have I told this story before? I feel like I have. Maybe I should search backwards in my blog and just post a link. Or maybe not. I’m stuck at my parent’s house without internet access, so you get to suffer through it again. This is the Internet, you can leave if you want to.

Anyway, making yourself a language designer inverts everything. Everything else becomes the toy language and Scheme is the bedrock tool. All the language features that programmers rely on and linguists analyse are useless distractions because you’re writing your own language and you need to CONCENTRATE. All the effort other language implementers spent on inheritance or backtracking or whatever is wasted when it could have been better spent eliminating tail calls: making the compiler/runtime handle byzantine generated code. In the end you are basically left with Scheme and C. (Note how they’re both old, simple languages.) Since Scheme is functional, it’s a better starting place than C for building a functional language**.

The other thing that Friedman’s class did was push me ever further into writing pure functional programs. I had been moving that direction since 2002, ever since reading On Lisp, but 2005 was the year that main was the only code with read and write in it. After Friedman’s class, I switched back to Python because of Scheme’s culture and libraries* and because of the IU comp ling group’s Python culture. I wrote really functional programs in Python. And they were super ugly. Slow, too. Not many bugs, though, and once you understood them, very easy to modify.

So here I was, writing purely (?) functional programs in Python. Very dense, very fast to write, but ugly and hard to pick up after 6 months. I knew something was wrong. The two biggest pain points were the complex data structures and ugliness of advanced functional features. The functional code was dense, but hard to parse visually. But I couldn’t switch back to Scheme because it just pretends the outside world doesn’t exist. You could write an HPSG parser (for example) in either language, but a file-munging script only in Python–it would be 1/4 the size of the Scheme script, which would have to use all sorts of partly documented distro-specific functions.

The tipping point was when I wrote a type-checking wrapper. The second time I spent half a day figuring out the inputs and outputs for 6-month-old functions, I wrote down all the types in comments. The syntax was close to Haskell; I could read Haskell ever since the beginning of 2004 when I got a bunch of free books at a conference, so I knew the type annotations. Then I realised that, because classes are first-class Python objects, I could write the types in my program and have them evaluated. My code changed from

# read :: file * [str] -> [int]
def read(f, regions):
    :

to

@check(file, [str], [int])
def read(f, regions):
    :

Naturally, I knew this was horrifying. And I knew that I should probably just use Haskell. But I didn’t want to switch because I knew Haskell had the same problems as Scheme; no built-ins for scripting and weak libraries besides.

So I didn’t switch that summer. I growled and grumbled and wrote more dense functional Python. Finally, at the end of the summer, I was looking into an alternate measure of distance between trees. It had been a while since I had to write tree code. And you know there is really only one way to write tree code, and that’s recursively. But primitive recursion is so ugly in Python. I couldn’t bring myself to do it, so I wrote out, on paper, the Haskell to implement this measure. It was so nice. Like Scheme, except with destructuring that I could trust! I still had ghc installed on peregrin so I typed it in when I got back to the office.

At that point I decided I had to bite the bullet and switch. I had the perfect project coming up, too: I was starting on a qualifying paper in computational phonology, which is about as far from the outside world as you can get. It’s just complex list processing, which plays to Haskell’s strengths rather than its weaknesses. I had been worried about how to present the algorithm in the paper—I don’t like math notation much, and I didn’t think I could get away with Python. I didn’t want to use Python in any case, because the existing phonology code I had was the same dense, ugly stuff I had been writing for the last few years. Even if I put it in there, I wouldn’t blame my advisors for not wanting (or being able) to read it.

Not only is functional code pretty in Haskell, Haskell has type annotations. So I could annotate each snippet with its type, making it easier (in my opinion) to follow than math notation. I ported the existing algorithms to Haskell and started to work on my new algorithm.

The rest is history. (I always wanted to say that.)

Of course I still write a good bit of Python; my syntax distance code uses it for glue code, but not complex algorithms. And the CL group at IU still uses it as a lingua franca. (Markus and Mike like it and Sandra has learnt it, and is now teaching it, so all the new students know it.) For that reason, when I tried implementing the Chu-Liu-Edmonds dependency parser over Christmas weekend, I made a quick translation to Python after I finished the Haskell draft. I thought, “this will be like the old days when I back-translated code to Java to see how much better Python was".

But it wasn’t like that at all. It was very disappointing. The Python code was longer, but only by 20-30% more lines, and the byte count was nearly the same. Admittedly, the code was a Haskell translation, but I tried to be as idiomatic as I could. I even had a class! More than anything, the languages are just different: things that are short in Haskell are long in Python, and vice versa. The Python is easier to understand—from an imperative standpoint. The Haskell is easier to understand from a functional standpoint. So there’s no clear winner like I was hoping for.

I guess those days really are gone forever. And it’s all Dan Friedman’s fault. He’s the one who forced us all in 511 to see that all languages are broken and inadequate, each in their own way.

*PLT has them, but they’re low quality–everybody else is minimalist because that’s the intention of Scheme. Maybe R6RS will someday fix this. I doubt it. Scheme is perfect the way it was.
**Also it has a REPL.

7 comments

Comment from: Greg Buchholz [Visitor] · http://kerneltrap.org/blog/6714
I thought linguists liked Prolog and Perl?

http://www.cs.lth.se/home/Pierre_Nugues/ilppp/

What do you think of Prolog as a foundational language compared to Scheme?
05/01/10 @ 12:57
Comment from: sandersn [Member]
In my experience, Prolog and Perl are used mostly by European linguists. Most of the code I have seen from the US is Python, Java and C++, Common Lisp for the older stuff. (Specifically, Java and C++ for finished code, Python is what I see people prototyping with.)

As a foundational language, Prolog is not all that different from Scheme except for the backtracking with unification. The problem is that backtracking w/unification is only applicable to a certain set of problems. For everything else it just gets in the way.

It's a lot like writing stateful imperative code in Haskell. It's a giant hassle, but I don't need it very much. Writing non-backtracking code looks the same way in Prolog. If this is something *you* need to do a lot, you should probably look at something that lets you right non-backtracking code easily, and provides backtracking as a less convenient library.
05/01/10 @ 16:27
Comment from: ko [Visitor]
have you tried factor?
factorcode.org
it might be the cross you're looking for
06/01/10 @ 05:02
Comment from: John [Visitor]
> Maybe R6RS will someday fix this. I doubt it. Scheme is perfect the way it was.

I believe an "R7RS" is being worked on now: http://www.scheme-reports.org/
06/01/10 @ 09:04
Comment from: Alex Rudnick [Visitor]
I can appreciate what you're saying about both Scheme (re: cross-implementation portability!) and Python.

I do most of my stuff in Python these days, but I've found myself putting in some un-Pythonic type checking. Your method decorator looks a lot cleaner than the asserts I've been using, though.
07/01/10 @ 23:16
Comment from: Mark [Visitor] Email
Clojure is essentially a "functional Python". Given your experience with Scheme, Python, and Haskell, I'd be willing to bet you'd quickly appreciate how Clojure combines many of the best elements from all those languages. I think you'd find that the things that can be written compactly in Python and in Haskell can all be written compactly in Clojure, and you'd have the clear win you were hoping for.
29/04/10 @ 23:01
Comment from: sandersn [Member]
Thanks for the recommendation, but I'm not really looking for a clear win. Dan Friedman convinced me that there is no such thing. If anything, I am looking for something that is useful for my current project. Right now that's a collection of short, complicated data manipulation programs with small amounts of file I/O, so Haskell fits the bill pretty well. My problem is parallel, not concurrent, so all I need to use multiple cores is to start multiple instances of my code.

As for Clojure, I think it would be better to characterise it as a "functional Lisp" rather than a "functional Python". The only way I think Python sneaks into the comparison is that Clojure is good with real-world tasks. But that's just in comparison to Scheme (which sets out to ignore the real world) and Common Lisp (whose conception of the real world is 25 years old).

Also, I've gone to the dark side and prefer typed languages now. Well, let's not call it dark and light. Maybe orange and blue. I've gone to the blue side. I want my language to remember what type my code is so it has built-in API documentation--for research I am too lazy to generate documentation any other way (well, except for writing a paper).
30/04/10 @ 08:39

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)
powered by b2evolution