| « Stack Overflow as social network | Monadic parsing in Python, part 4 » |
Last weekend I translated all the Python code in the previous 4 parts to C#. I wanted to test C#’s abilities in the functional programming arena. C# 3.0, among other things, adds compiler support for monads in the form of LINQ. I was not sure how deep the compiler support is, since I think nearly all of the features in LINQ can be written in C# 2.0, just much more verbosely.
I know that timing code is tricky to get right, so here’s the code if you want to critique it. I didn’t look too hard to see if something like this is built-in either, so suggestions are welcome. In any case, the differences are so great that I don’t expect that accuracy is much of an issue.
private static long TimeIt<A, B>(Func<A, B> f, A arg)
{
long start = DateTime.Now.Ticks;
for(int i = 0; i < 10000; i++)
f(arg);
long end = DateTime.Now.Ticks;
return end - start;
}
The complete C# code is available here. I’m not going to paste any of it to the blog because it is very similar to the Python, except with type annotations. The only difference is that C# still doesn’t support non-local tuples, so I had to define a few custom types up front to hold parser progress. This was a pain but made direct manipulation of parser state a lot prettier.
| Direct Parser Combinators | 781,250 | 1.66x | |
| HOF Monadic Parser Combinators | 15,781,250 | 33.72x | |
| LINQ Monadic Parser Combinators | 1,562,500 | 3.33x | |
| State Machine with Stack | 468,750 | 1.0x |
Unsurprisingly, a state machine loop is still the fastest. With a compiler, there’s not a huge hit for using parser combinators—it’s less than twice as slow. When you start passing functions to functions to functions, though, things slow down a lot. The Higher-Order-Function style is over 20 times slower than the direct style, and over 30 times slower than the state machine. Finally, when you Officially implement the LINQ monad functions, it’s almost exactly twice as slow as direct style.
I guess those numbers aren’t too surprising. Eric Lippert’s blog talks an inordinate amount about overload resolution, including a lot about higher-order functions, so I suspect it is slow. That may not be the main cause, but it’s a guess. (I have yet to find the profiling tools for C#. I suspect they are either for-pay add-ons or part of Visual Studio Chewy Bacon Edition—one of the editions not available for free to college students.)
The most surprising is LINQ, I guess. There should be almost as many higher-order functions flying around in this version as in the explicitly HOF one, but it’s only 3x as slow as the state machine. I can only guess that LINQ gets a lot of attention from the compiler when it optimises. Since the LINQ-compatible parser combinators are the prettiest of the lot, I would recommend it for small parsing tasks that don’t justify use of a yacc/AntLR tool which would generate the state machine.
Well, not MY code, but SOMEBODY’s parser combinator library. This is such a common idea there should be an industrial strength one out there somewhere. LINQ To Parsing, maybe?
All right, I can’t resist showing you a little bit of how monadic parsing looks:
class Sexp : Parser<Tree<string>>
{
public override State<Tree<string>> Parse(Position pos)
{
return new Or<Tree<string>>(
from _1 in new Chr('(')
from l in new SexpList()
from _2 in new SkipSpace()
from _3 in new Chr(')')
select new Tree(l.First.Value.Value, l.Skip(1)),
from a in new Atom() select new Tree<string>(a)).Parse(pos);
}
}
I had to name throwaway variables for some parsers, so I used _1, _2, _3, etc. Also, unlike another C# parser combinator blog post, I didn’t make Or an extension method of Parser. Oops, that would have appeared much more OOP.
PS: I don’t know what ‘ticks’ are, but I think they are microseconds. The Python/C# numbers are incomparable, though, because Python was running on a 1.3 Ghz PPC G4 and C# was running on a 2.0 Ghz Core Duo in a VM with access to only one core.
from pyparsing import Suppress, Forward, Regex, Group, ZeroOrMore
LPAREN = Suppress("(")
RPAREN = Suppress(")")
Sexp = Forward()
Atom = Regex(r"[^() \n]+")
Sexp << ( Group(LPAREN + Atom + ZeroOrMore(Sexp) + RPAREN) | Atom )
from pprint import pprint
pprint( Sexp.parseString(test2).asList() )
[['S',
['VVPS', 'POPPHH'],
['VNDD', 'HVPS'],
['VVIV', 'VVSN'],
['NP',
['CONJP',
['NN', 'PR'],
['IK++', 'PN'],
['NN', 'PR'],
['IK++', 'NNDDSS'],
['NN', 'PORP'],
['IK++', 'POOP'],
['NN', 'VVPT'],
['RO', 'PR']],
['NN__SS', 'PODP']],
['NP', ['PN____GG', 'NNDD'], ['NN', 'IP']]]]