« Hacking viper-mode for consistencyPython vs Haskell : An unsatisfying exercise in comparative code linguistics »

Parallelising embarrassingly parallel code

05/01/10

Permalink 12:06:34 pm, 656 words
Categories: Code, Linguistics

Parallelising embarrassingly parallel code

After working at Bing, I understand the scientific computing term “embarrassingly parallel". It means that when you describe your dissertation code to somebody, you’re embarrassed to admit that it’s not parallelised.

My dissertation measures syntactic distances between different villages in Sweden. The distance between each pair of villages is calculated independent of all the rest, so the second question the other Microsofties asked me was always, “How do you parellelise the problem?". Well, I wasn’t. My excuse was that our experimental machine only has two processors and I wanted to leave one free for other people to use.

But it preyed on my mind all through the internship at Bing. I didn’t have time to do anything about it in November because I was busy with my proposal. Finally, running out of things I could do unwired over Christmas vacation at my parents, I looked up my offline copy of the Python documentation and wrote a function to run multiple files. Here it is.

import subprocess
def multirun(n, tasks, files):
    processes = [subprocess.Popen(tasks[i], stdout=file(files[i],'w'))
                 for i in range(n)]
    i = n
    while processes != []:
        subprocess.Popen(['sleep', '1']).wait()
        processes, dones = partition(lambda p:p.poll() is None, processes)
        if i < len(tasks):
            for _ in dones:
                print("Starting", ' '.join(tasks[i]))
                processes.append(subprocess.Popen(tasks[i],
                                              stdout=file(files[i], 'w')))
                i += 1
def partition(f, l):
    yes = []
    no = []
    for x in l:
        if f(x):
            yes.append(x)
        else:
            no.append(x)
    return (yes,no)

You pass multirun the number of processors to use, a list of tasks, and a list of output files to redirect stdout to. Each task is a list of strings, with the first item the command and subsequent items its arguments. (I sincerely suspect that this is already built-in to Python 3, or at least a library, but I didn’t have internet access, so I just wrote multirun based on the 2.5 documentation on my laptop. If you know the standard equivalent, please tell me in the comments.)

Now that I’m back at work on my experiment, I ran this code on our new 8-core Mac Pro server ("if you have to ask how much, you can’t afford it"). It’s a late 2008 machine, 5 years newer than the 2-core Power Mac that the CL group usually runs experiments on.

The difference is amazing: what took 2 hours is down to 2 minutes. My entire program runs in 4 minutes. The new machine is 60 times faster. 10 times of that difference is the speed of each chip. The other 6 times is the fact that my code uses 6 of the 8 cores. That means, if you write parallel code, chips are still doing better than the popular version of Moore’s law: in 5 years speed doubled 6 times, for a doubling period of 10 months, not 18.

In case you are curious, here is the code that uses multirun. (swediaSites is a list of sites in the swedia corpus, imported from a module of constants.)

def pairwise(l):
    return [(x,y) for i,x in enumerate(l) for y in l[i+1:]]
### runner ###                                                                  
def runner(feature):
    params = open('params.h','w')
    params.write('#define ITERATIONS 100\n')
    params.write('#define SAMPLES 1000\n')
    params.write('#define R_MEASURE r')
    params.close()

    os.system('g++ -O2 -o ctrl.out params.h icectrl.cpp')
    suffix = '-' + feature + '.dat'
    ctrl = "nice -n 6 ./ctrl.out".split()
    pairs = pairwise(swediaSites)
    tasks = [ctrl + [sq(fro+suffix), sq(to+suffix)] for (fro,to) in pairs]
    files = ['dist-%s-%s-tmp.txt' % (fro,to) for (fro, to) in pairs]
    return (tasks,files)
def sq(s):
    return "'" + s + "'"
def combine(feature):
    "Combine the disparate output files into one"
    out = 'dist-100-1000-r-%s-interview.txt' % (feature,)
    pairs = pairwise(swediaSites)
    files = ['dist-%s-%s-tmp.txt' % (fro,to) for (fro,to) in pairs]
    outf = open(out, 'w')
    for file in files:
        outf.write(open(file).read())
    outf.close()
def syntaxDist():
    multirun(6, *runner('path'))
    combine('path')
    multirun(6, *runner('trigram'))
    combine('trigram')
    multirun(6, *runner('dep'))
    combine('dep')

4 comments

Comment from: Bill Mill [Visitor] · http://billmill.org
> I sincerely suspect that this is already built-in to Python 3, or at least a library

Yup, looks to me like you re-invented processing: http://pypi.python.org/pypi/processing .
05/01/10 @ 12:35
Comment from: Bill Mill [Visitor] · http://billmill.org
I should add, that library's for python 2.x, it's called "multiprocessing" in python 3 and is built-in: http://docs.python.org/3.1/library/multiprocessing.html
05/01/10 @ 12:37
Comment from: sandersn [Member]
Thanks, I will look it up.
05/01/10 @ 16:34
Comment from: sandersn [Member]
It looks like the equivalent to my code is

from multiprocessing import Pool
def runctrl(pair):
....os.system("./ctrl.out '%s' '%s'" % pair)
Pool(processes=6).map(runctrl, pairwise(swediaRegions))

Or something like that. That's simplified
05/01/10 @ 16:40

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)
free blog