Every time I tell myself I should lighten up and learn to enjoy Python’s lightweight imperative implementation of laziness, a non-local-dependency bug bites me. Here’s the most recent one:
In a script, I changed the following fix-up code from
def fix(node):
    return '-'.join(map(str.strip, reversed(''.join(node).split(','))))
to
def fix(node):
    snode = ''.join(node)
    if snode==')':
        return 'RRB'
    elif snode=='(':
        return 'LRB'
    else:
        return '-'.join(map(str.strip, reversed(''.join(node).split(','))))
Can you see the bug? No? Well, I do a little work twice, joining a list of chars into a string: snode = ''.join(node). I wasn’t sure that this ad-hoc checking for parentheses would work, so I didn’t want to refactor the original expression, which also contains ''.join(node).
The problem is that node is not a list of chars, but a generator of chars. So it’s use-once. And I used it twice, but the second time, it was unfortunately empty. Oops. Even less fortunately, my program didn’t crash when fix returned an empty string for most inputs, so I didn’t know about this until my script had run for about 5 minutes.
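Here is a minimal sketch that reproduces the failure and shows one way around it. The helper chars and the names fix_buggy and fix_once are hypothetical, made up for this illustration; only the body of fix comes from the script above.

```python
def fix_buggy(node):
    # Same shape as the revised fix: node is consumed twice.
    snode = ''.join(node)
    if snode == ')':
        return 'RRB'
    elif snode == '(':
        return 'LRB'
    else:
        # If node is a generator, it is already exhausted at this point.
        return '-'.join(map(str.strip, reversed(''.join(node).split(','))))


def fix_once(node):
    # Consume node exactly once and reuse the joined string.
    snode = ''.join(node)
    if snode == ')':
        return 'RRB'
    elif snode == '(':
        return 'LRB'
    else:
        return '-'.join(map(str.strip, reversed(snode.split(','))))


def chars(text):
    # Hypothetical stand-in for the lazy code that produces each node.
    for ch in text:
        yield ch


print(repr(fix_buggy(list('NP, SBJ'))))   # 'SBJ-NP' (a list can be read twice)
print(repr(fix_buggy(chars('NP, SBJ'))))  # '' (the generator was already drained)
print(repr(fix_once(chars('NP, SBJ'))))   # 'SBJ-NP'
```

The second call raises no exception: joining an exhausted generator simply gives an empty string, which is why the script kept running.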
To diagnose this bug, I had to jump up two functions to the main code, then back down two functions into another module. This code is quite lazy, with yield and generators all over the place, so it doesn’t actually read the individual nodes until much later, when fix is called. I thought laziness was the New Python 3.0 Way, but apparently the New Python Way is susceptible to hairy non-local bugs. Collapsing all your nicely separated code into one giant executing mess is the essence of laziness, but running into bugs caused by that non-locality means that the implementation has holes in its abstraction layer.
The use-case is: I create a generator somewhere, then return, pass it around a lot, and finally use the generator twice somewhere else entirely. Remember, this is Python, so I am not constantly reminding myself about the type of something, and, from previous versions of Python and other languages, I expect to be able to use variables more than once.
For example, C# prevents this “use-once” bug by wrapping an additional layer of abstraction around its enumerators. When you create a generator, you don’t get an Enumerator, you get an Enumerable. Using the Enumerable creates a new Enumerator each time, so my above program would have become twice as slow, but at least it wouldn’t have become incorrect.
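The same idea can be approximated in Python by hand. This is a rough sketch, not a standard library facility: the class name Reiterable and the helper chars are hypothetical. The wrapper's __iter__ plays the role of C#'s GetEnumerator, building a fresh generator on every pass.

```python
class Reiterable:
    """Wrap a generator function so that each iteration starts afresh."""

    def __init__(self, make_iter, *args, **kwargs):
        self._make_iter = make_iter
        self._args = args
        self._kwargs = kwargs

    def __iter__(self):
        # A new generator is created on every call, so the work is redone
        # each time instead of the second pass coming up empty.
        return self._make_iter(*self._args, **self._kwargs)


def chars(text):
    # Hypothetical stand-in for the lazy code that produces each node.
    for ch in text:
        yield ch


node = Reiterable(chars, 'NP, SBJ')
first = ''.join(node)
second = ''.join(node)
print(first == second)  # True
```

The trade-off is the one described above: every pass reruns the generator, so using node twice costs twice the work, but both passes see the same data.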
Python’s use-once behavior is Bad and Wrong and I would suspect that Python 3.4 or whatever will adopt the C# model, *except* that it’s such a glaring problem that somebody surely noticed it during development of 2.4-3.0. It could be that Guido doesn’t like the use-case I present here.
import operator
def fix(node):
    """
    Perform some node fix-up.
    @param[in] node Sequence of stuff.
    @returns Sequence of fixed stuff.
    """
    # operator.isSequenceType is Python 2 only; it was removed in Python 3.
    assert operator.isSequenceType(node)
    return ''.join(do_stuff())
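Since operator.isSequenceType exists only in Python 2, here is a sketch of a comparable guard for Python 3. It assumes callers are meant to pass a list, tuple, or str. It raises TypeError instead of asserting, which is a choice made for this sketch so the check survives python -O, and it reuses the body of fix from earlier in the post in place of do_stuff().

```python
from collections.abc import Sequence


def fix(node):
    """Fix up a node, refusing use-once iterators up front."""
    # Generators and other plain iterators are not Sequences, so they fail
    # here at the call site instead of silently producing '' on a second pass.
    if not isinstance(node, Sequence):
        raise TypeError('fix() expects a sequence, got ' + type(node).__name__)
    snode = ''.join(node)
    if snode == ')':
        return 'RRB'
    elif snode == '(':
        return 'LRB'
    return '-'.join(map(str.strip, reversed(snode.split(','))))
```

With this guard, fix(list('NP, SBJ')) returns 'SBJ-NP', while passing a generator fails immediately with a TypeError that names the offending type.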