| « Monadic parsing in C#, part 5 | Monadic parsing in Python, part 3 » |
Before I start performance testing, here is a simple state-based s-expression parser. I say state-based, but the states are implicit, although you can at least see the stack. This parser takes about half the code size and a tenth of the run time of the best combinator method, so it’s a good indication that combinators are overkill for s-expression parsing.
def runsexp(s):
stack = [[]]
a = ''
for c in s:
if c=='(':
if a:
stack[-1].append((a, []))
a = ''
stack.append([])
elif c==')':
if a:
stack[-1].append((a, []))
a = ''
l = stack.pop()
stack[-1].append((l[0][0], l[1:]))
elif c==' ':
if a:
stack[-1].append((a, []))
a = ''
else:
a += c
if a:
stack[-1].append((a, []))
return stack[0][0]
I’ll test performance using the built-in module timeit. As you can see, this is Python2.5, which is all that I have on my laptop. The slower machine and slower interpreter mean that I only have to repeat the test 1000 times to get fairly large times. I separated the code for each method into its own module and imported them before the test so that Python’s parsing and bytecode generation wouldn’t cause variation.
import timeit
if __name__=="__main__":
t = '''; test2=""" (S (VVPS POPPHH) (VNDD HVPS) (VVIV VVSN)
(NP (CONJP (NN PR) (IK++ PN) (NN PR) (IK++ NNDDSS)
(NN PORP) (IK++ POOP) (NN VVPT) (RO PR))
(NN__SS PODP)) (NP (PN____GG NNDD) (NN IP))) """'''
print timeit.Timer(setup='import sexp_cmb'+t,
stmt='sexp_cmb.runsexp(test2)').repeat(number=1000)
print timeit.Timer(setup='import sexp_monad'+t,
stmt='sexp_monad.runsexp(test2)').repeat(number=1000)
print timeit.Timer(setup='import sexp_oo'+t,
stmt='sexp_oo.runsexp(test2)').repeat(number=1000)
print timeit.Timer(setup='import sexp_state'+t,
stmt='sexp_state.runsexp(test2)').repeat(number=1000)
$python sexp.py [2.053, 2.053, 2.049] [5.597, 5.585, 5.536] [11.366, 11.315, 11.294] [0.292, 0.297, 0.290]
| Parser Combinators | 2.049 | |
| Monadic Parser Combinators | 5.536 | |
| OO Monadic Parser Combinators | 11.294 | |
| State Machine with Stack | 0.290 |
The documentation of timeit says that the minimum time is probably the correct one, but these times have so little variance it doesn’t really matter. The monadic version of the combinators is almost three times as slow as the bare combinators and the OO version is twice as slow again. The state machine is almost ten times as fast as the fastest combinator parsers.
I don’t want to speculate about what’s causing the slowdown without profiling. Which I should probably do now.
(Later) Well, profiling didn’t reveal much, just that the monadic binding is pure overhead; the basic parsers sexp and sexplist are called exactly the same number of times regardless of module.
One moral of this story may be that, even though monads make nice interfaces, you need compiler support to make them efficient enough to use in some cases. How much compiler support, I don’t know. Does Haskell, for example, perform special tricks to get rid of unncessary binds? Or does it just do its normal thing with respect to elimination of unneccessary code? I do know that C# has special compiler support to generate efficient SQL for its LINQ to SQL monad. But then, I wouldn’t expect the previous version of C# to be particularly good at emitting efficient SQL, so it was probably needed.
Hmm…actually, when I get home, it might be kind of cool to check out C#’s support for all this. Since C# has built-in syntactic support for monads, ad-hoc operator overloading is not necessary there, although I need to find out what the minimal monad implementation is in C#.
OK, so no closing moral. Maybe just some more antics, next time in C#.
Here is another take on monadic parsers in Python: funcparserlib (http://code.google.com/p/funcparserlib/).
The >>= function isn't used so much there in favor of less expressive fmap function (named the >> operator in funcpaserlib). So I guess less new function objects are created during parsing.
Parsing is also rather slow. Nevertheless funcparserlib is OK for experimenting with languages and parsing small DSLs: parsers that use the library are quite readable.