Thursday, 24 April 2008

Parallel Quicksort in Erlang

Jaksa is doing some posts on parallel graph algorithm in Java using fork/join, and he mentions quicksort:

"The equivalent of the hello world for parallel languages is the quicksort algorithm"

Now I'm tempted to say something like "Oh yea? You know how many lines of code quicksort is in Erlang? 3!" Yes really:


qsort([]) -> [];
qsort([Pivot|Rest]) ->
qsort([ X || X <- Rest, X < Pivot]) ++ [Pivot] ++ qsort([ Y || Y <- Rest, Y >= Pivot]).


But this would be wrong, as the terseness of the quicksort implementation says more about the power of list comprehensions than about Erlang's concurrency-supporting language features.

So then let's see how you could implement a concurrent quicksort in Erlang, which will say something about the way in which we can implement concurrency in Erlang. We'll start with the original Wikipedia quicksort from above, and alter it a little bit to make it more obvious what I'm going to do next:


qsort([]) -> [];
qsort([Pivot|Rest]) ->
Left = [X || X <- Rest, X < Pivot],
Right = [Y || Y <- Rest, Y >= Pivot],
qsort(Left) ++ [Pivot] ++ qsort(Right).


Note the two calls to qsort at the end. Since there is no shared state between these two functions, they can be executed in parallel, and the results can be concatenated to yield the sorted result. Depending on the distribution of the input, each of these calls could be split up into 2 more qsort calls etc. etc. Note that you could do a hybrid recursive-tail-recursive solution which sorts the smallest partition using normal recursion and the largest using tail-recusrion, but that is not the aim here.

Let's assume that we already have a function that executes any input function in a separate Erlang process called "peval". (Note Erlang processes are extremely light-weight and take very little time to construct. They're are not modelled as operating system processes). peval takes two arguments, a function to execute, and the arguments to the function. When called, it looks like a normal function call:


(emacs@21ccw.blogspot.com)38> pqsort:peval(fun(X) -> X*X end, 13).
169


But, in the background, it spawns a process and the function is evaluated on this new process. The result is then returned to the calling process. Using peval, we now have a parallel quicksort:


pqsort([]) -> [];
pqsort([Pivot|Rest]) ->
Left = [X || X <- Rest, X < Pivot],
Right = [Y || Y <- Rest, Y >= Pivot],
peval(fun pqsort/1, Left) ++ [Pivot] ++ peval(fun pqsort/1, Right).


Now there is a significant observation to make. The concurrent and non-concurrent versions look remarkably similar, we only had to substitute the original function evaluations with parallel versions. This is one of the reasons why I think functional languages are better suited to concurrency that OO ones.

Let's look at peval. peval creates a new process using spawn_link. spawn_link() is used instead of spawn(), so that the creator process receives exception when the new child process throws an exception.


peval(Fun, Args) ->
Pid = spawn_link(fun() -> wait() end),
Pid ! {self(), Fun, Args},
receive
{Pid, R} -> R
end.

wait() ->
receive
{From,Fun,Args} ->
From ! {self(), Fun(Args)}
end.


That's it! Parallel quicksort in about 15 lines of Erlang code. If I add a "+" output to console on every entry to peval, and "-" on every exit from peval, you get the following output:


(emacs@21ccw.blogspot.com)18> L1 = lists:map(fun(X) -> trunc(random:uniform()*100) end, lists:seq(1,100)).
[57,33,2,85,59,65,61,71,83,75,94,94,19,5,79,53,31,54,39,61,
61,79,55,6,37,35,86,31,88|...]

(emacs@21ccw.blogspot.com> pqsort:pqsort(L1).
+++-++++-+--++-+++++-+--+--+--+++-+--++-+------+++
+-++-+---+++-+--+---++-+-----+++++-+--++-+---+++++
-+--+--+++-+--+---++-+----++-++-++-+------++++-+--
++++-+--++-+---++++-+--++-+---++++-+--+++-++-+---+
++-+--+----+-----+++-+++-+--+++-++-+---+----++-+--
--
[2,4,5,6,7,9,11,13,14,17,18,19,20,21,22,24,27,31,32,33,35,
37,38,39,40,42,44,47,48|...]

(emacs@21ccw.blogspot.com)5> pqsort:integrate_output("+++-++++-+--++-+++++-+--+--+--+++-+--++-+------+++").
0123234565654565678910910989878767898987898987654345...


You can see that for the first few pevals, by integrating over the output, that number of parallel processes goes up to around 8-10. This will depend on the distribution, and the relative times it takes to initialise processes to doing the actual computation etc.

Now that we've got pqsort going, I'm looking forward to doing the parallel graph algorithms of Jaksa's fork/join graph algorithms. There will be an obstacle to ensure that no nodes are visited more than once, but this could (possibly) be solved by partitioning the graph into subgraphs first where no children have multiple parents. We'll see...

Update (7 May 2008): This implementation is wrong! (see comments). There's a follow-up post here: Parallel Quicksort in Erlang - Part II

4 comments:

jaksa said...

Hey Ben! I'm not very familiar with Erlang syntax, but is this really parallel or just executing different processes one after another? Shouldn't you be spawning each "subtask" at the same time, and then waiting for all of them to finish? Try writing a nonblocking peval() and then a wait() that takes a list of Pids. That should do the trick. ;)

Benjamin Nortier said...

You are right! It was a bit naive, thanks for pointing it out :) I've posted a follow-up...

http://21ccw.blogspot.com/2008/05/parallel-quicksort-in-erlang-part-ii.html

Curtis said...

Thanks for your posts on quicksort in Erlang. You have a small bug that causes duplicate values to be dropped. Amend the following line like so, to include elements equal to Pivot:

Right = [Y || Y <- Rest, Y >= Pivot],

Benjamin Nortier said...

Thanks!