| « Hacking viper-mode for consistency | Python vs Haskell : An unsatisfying exercise in comparative code linguistics » |
After working at Bing, I understand the scientific computing term “embarrassingly parallel". It means that when you describe your dissertation code to somebody, you’re embarrassed to admit that it’s not parallelised.
My dissertation measures syntactic distances between different villages in Sweden. The distance between each pair of villages is calculated independent of all the rest, so the second question the other Microsofties asked me was always, “How do you parellelise the problem?". Well, I wasn’t. My excuse was that our experimental machine only has two processors and I wanted to leave one free for other people to use.
But it preyed on my mind all through the internship at Bing. I didn’t have time to do anything about it in November because I was busy with my proposal. Finally, running out of things I could do unwired over Christmas vacation at my parents, I looked up my offline copy of the Python documentation and wrote a function to run multiple files. Here it is.
import subprocess
def multirun(n, tasks, files):
processes = [subprocess.Popen(tasks[i], stdout=file(files[i],'w'))
for i in range(n)]
i = n
while processes != []:
subprocess.Popen(['sleep', '1']).wait()
processes, dones = partition(lambda p:p.poll() is None, processes)
if i < len(tasks):
for _ in dones:
print("Starting", ' '.join(tasks[i]))
processes.append(subprocess.Popen(tasks[i],
stdout=file(files[i], 'w')))
i += 1
def partition(f, l):
yes = []
no = []
for x in l:
if f(x):
yes.append(x)
else:
no.append(x)
return (yes,no)
You pass multirun the number of processors to use, a list of tasks, and a list of output files to redirect stdout to. Each task is a list of strings, with the first item the command and subsequent items its arguments. (I sincerely suspect that this is already built-in to Python 3, or at least a library, but I didn’t have internet access, so I just wrote multirun based on the 2.5 documentation on my laptop. If you know the standard equivalent, please tell me in the comments.)
Now that I’m back at work on my experiment, I ran this code on our new 8-core Mac Pro server ("if you have to ask how much, you can’t afford it"). It’s a late 2008 machine, 5 years newer than the 2-core Power Mac that the CL group usually runs experiments on.
The difference is amazing: what took 2 hours is down to 2 minutes. My entire program runs in 4 minutes. The new machine is 60 times faster. 10 times of that difference is the speed of each chip. The other 6 times is the fact that my code uses 6 of the 8 cores. That means, if you write parallel code, chips are still doing better than the popular version of Moore’s law: in 5 years speed doubled 6 times, for a doubling period of 10 months, not 18.
In case you are curious, here is the code that uses multirun. (swediaSites is a list of sites in the swedia corpus, imported from a module of constants.)
def pairwise(l):
return [(x,y) for i,x in enumerate(l) for y in l[i+1:]]
### runner ###
def runner(feature):
params = open('params.h','w')
params.write('#define ITERATIONS 100\n')
params.write('#define SAMPLES 1000\n')
params.write('#define R_MEASURE r')
params.close()
os.system('g++ -O2 -o ctrl.out params.h icectrl.cpp')
suffix = '-' + feature + '.dat'
ctrl = "nice -n 6 ./ctrl.out".split()
pairs = pairwise(swediaSites)
tasks = [ctrl + [sq(fro+suffix), sq(to+suffix)] for (fro,to) in pairs]
files = ['dist-%s-%s-tmp.txt' % (fro,to) for (fro, to) in pairs]
return (tasks,files)
def sq(s):
return "'" + s + "'"
def combine(feature):
"Combine the disparate output files into one"
out = 'dist-100-1000-r-%s-interview.txt' % (feature,)
pairs = pairwise(swediaSites)
files = ['dist-%s-%s-tmp.txt' % (fro,to) for (fro,to) in pairs]
outf = open(out, 'w')
for file in files:
outf.write(open(file).read())
outf.close()
def syntaxDist():
multirun(6, *runner('path'))
combine('path')
multirun(6, *runner('trigram'))
combine('trigram')
multirun(6, *runner('dep'))
combine('dep')