MicaProc
A Parallel Matlab Implementation
David Schoenfeld, Jacob Schneider & Peter Lazar
Massachusetts General Hospital, Biostatistics Department
March 2004
Introduction:
“Parallel programming remains very difficult and should be avoided if at all possible.”
– Gordon Bell
Goal:
- Provide parallel processing solutions to power users of MATLAB.
  o Do so with a natural interface that doesn't require complicated programming or a steep learning curve (i.e., have it integrate nicely with the MATLAB environment).
Hardware:
We have successfully run this software on a 22-node ROCKS cluster running Red Hat Linux.
Note:
This project builds on the research of Thomas Abrahamson of Chalmers University of Technology, Sweden, and on further work done by Jacob Schneider.
Setting Up Server Nodes:
For the MATLAB servers to monitor the shared directory reliably, a monitor process must be running on each node. This is accomplished by running the runmatlab.pl script from a privileged account, for example
/usr/bin/perl runmatlab.pl &
and then logging off the node. Because of the way runmatlab.pl behaves, it is recommended to type "logout" before the MATLAB process on the node comes alive, since MATLAB tends to hang the shell on logout even when it is running in the background.
All the script does is:
#!/usr/bin/perl
use strict;
use warnings;

# Restart MATLAB on this node whenever it exits.
while (1)
{
    my $childid = fork();
    die "fork failed: $!" unless defined $childid;
    if ($childid == 0)
    {
        # Child: become a headless MATLAB that runs micaServe.
        # No trailing "&": with one, the shell would return at once and
        # the parent would respawn MATLAB in a tight loop.
        exec("/opt/matlab6p5/bin/matlab -nosplash -nodesktop -r /home/source/matlab/micaServe >/dev/null 2>&1");
        die "exec failed: $!";    # reached only if exec itself fails
    }
    # Parent: block until MATLAB exits, then loop to start a new one.
    waitpid($childid, 0);
}
It forks off the command line
/opt/matlab6p5/bin/matlab -nosplash -nodesktop -r /home/source/matlab/micaServe >/dev/null 2>&1
and waits for it to exit. The command line launches MATLAB on the worker with no display and begins executing the micaServe function. When MATLAB exits, the script immediately restarts it. The idling MATLAB process is relatively light and uses no more than a few percent of CPU time while waiting. The servers are currently hard-coded to watch the shared directory /home/source/matlab; this can be changed in the first few lines of micaServe.m. For the servers to do any work, they must be pointed at the same directory that the clients are pointed to.
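The servers find work by polling the shared directory for the processin_*.mat files that clients write (see Running micaProc). The Python sketch below models one polling pass under stated assumptions: a job is claimed by renaming its file to a per-server name. That rename-based claim is a swapped-in technique (the original servers lock the directory instead), and the helper names and the ".claimed_by_" suffix are hypothetical. On an NFS-mounted share, rename atomicity across clients should not be taken for granted.

```python
import os

def pending_jobs(shared_dir):
    """Return the unclaimed processin_*.mat files in shared_dir, sorted by name."""
    return sorted(
        name for name in os.listdir(shared_dir)
        if name.startswith("processin_") and name.endswith(".mat")
    )

def claim_next_job(shared_dir, server_id):
    """Try to claim one pending job; return the claimed path, or None if idle.

    Hypothetical claim scheme: renaming the file is the claim. If another
    server renamed it first, os.rename raises FileNotFoundError and the
    next candidate is tried.
    """
    for name in pending_jobs(shared_dir):
        src = os.path.join(shared_dir, name)
        dst = src + ".claimed_by_" + server_id  # no longer ends in .mat
        try:
            os.rename(src, dst)
        except FileNotFoundError:
            continue  # lost the race to another server
        return dst
    return None
```

A server would call claim_next_job in a loop, sleeping briefly (for example with time.sleep) between passes so that an idle process stays light.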
Checking on, Killing and Configuring the Worker Nodes:
Once the worker ring is up and running, one can check on its status by running either of the following commands:
micaProc('hosts')
or
micaParam('hosts')
This waits for 20 seconds and returns a listing of every node that has replied to the hostname request, giving each node's hostname and a benchmark. Some nodes are faster than others, so some may reply twice.
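A "hosts" request amounts to collecting whatever replies arrive within a fixed window. The Python sketch below assumes each reply arrives as a (hostname, benchmark, seconds_after_request) tuple; that tuple layout and the function name are hypothetical, and the units of the benchmark are not specified here. Late replies are dropped, and a node that replies twice keeps both benchmarks under one entry.

```python
def summarize_host_replies(replies, window=20.0):
    """Group hostname replies that arrived within `window` seconds.

    replies: iterable of (hostname, benchmark, arrival_seconds) tuples.
    Returns a dict hostname -> list of benchmarks, in order of each host's
    first reply; a host that replied twice has two entries in its list.
    """
    summary = {}
    for host, bench, arrived in replies:
        if arrived > window:
            continue  # too late: the listing has already been returned
        summary.setdefault(host, []).append(bench)
    return summary
```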
To shut down the ring, one can type
micaProc('kill')
or
micaParam('kill')
This transmits a message to each worker node telling it to exit. The workers will, of course, be immediately restarted by the monitor. This clears any initializations and restores the nodes to their base state. Please note that worker nodes in the middle of an execution will not stop to check for the terminate message. If you have a runaway computation, run the following from the operating system to order all of the nodes to shut down their MATLAB processes:
cluster-fork "killall -9 matlab"
This will not harm the MATLAB instance on the local machine, and the workers will be restarted by their monitors. To shut down the ring completely:
cluster-fork "killall -9 runmatlab"
cluster-fork "killall -9 matlab"
The monitors will have to be restarted with runmatlab.pl if this is ever done.
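Why a busy worker ignores the kill message can be seen in a polling design where the message is only checked between jobs. The Python sketch below is a hypothetical model of such a worker loop, not the micaServe code: `should_stop` stands in for "a terminate message is present in the shared directory", and jobs are plain callables.

```python
def run_worker(jobs, should_stop):
    """Run jobs in order, checking for a terminate message only between jobs.

    jobs: list of zero-argument callables (hypothetical stand-ins for tasks).
    should_stop: zero-argument callable; True once a kill message has arrived.
    Returns the results of the jobs that were started.
    """
    results = []
    for job in jobs:
        if should_stop():
            break  # checked here, never in the middle of job()
        results.append(job())
    return results
```

In this model a job that never returns never reaches the next check, which is why an operating-system signal such as killall -9 is needed for a runaway computation.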
To configure the worker nodes with environment variables, a simple interface was implemented that executes a statement (or several statements separated by semicolons) on all of the worker nodes. It has the form
micaProc('setenv', 'x=3')
or
micaParam('setenv', 'x=4; y=5')
Note that the second example has two statements inside the quotes. The following, however, will NOT work:
micaProc('setenv', 'happy = 3');
micaProc('setenv', 'y = happy+3');
It will complain about happy being undefined in the local environment.
If you define happy in the local environment, micaProc will substitute its local value for the term "happy". To export huge datasets, create the dataset in the local MATLAB environment and (assuming the data object is stored in the variable "x") run
save x
This saves your data structure to the file x.mat in your local client's current working directory. Move this file to the shared directory and run
micaProc('setenv', 'load x')
or
micaParam('setenv', 'load x')
Occasionally, one wants to unset a single environment variable or change the definition of a function in the shared directory. In that case, run
micaProc('setenv', 'clear x')
or
micaParam('setenv', 'clear x')
To keep two users from colliding with each other, there is a notification mechanism. To notify other users that you are using the system, run the command
optin
from MATLAB, and run
optout
when you are finished. If another user runs optin while you are signed in, they will see an error message showing your username and the time you signed in, but MATLAB will not exit. The system is purely cooperative and does not lock MATLAB; the timestamp lets users see whether the notice is current or was left up by accident. The message looks like this:
================system busy with following user===================
Yourusername
Fri 03/12/2004
03:35 PM
====================Try back later===========================
The Windows versions of this utility are optinwin and optoutwin.
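The cooperative notice can be pictured as a file that records who signed in and when. The Python sketch below assumes a hypothetical file name, mica_inuse.txt, in the shared directory; the real optin/optout scripts may store the notice differently. As in the description above, a busy notice only prints a warning and nothing stops the second user from proceeding.

```python
import datetime
import os

NOTICE = "mica_inuse.txt"  # hypothetical name for the notice file

def optin(shared_dir, user, now=None):
    """Post a notice that `user` is on the system.

    Returns True if the notice was posted, or False (after printing who holds
    it) if someone else's notice is already there. Purely advisory: nothing
    prevents the caller from carrying on.
    """
    path = os.path.join(shared_dir, NOTICE)
    if os.path.exists(path):
        with open(path) as f:
            print(" system busy with following user ".center(66, "="))
            print(f.read())
            print(" Try back later ".center(66, "="))
        return False
    now = now or datetime.datetime.now()
    with open(path, "w") as f:
        # Same layout as the sample message: user, date, time.
        f.write(user + "\n" + now.strftime("%a %m/%d/%Y\n%I:%M %p"))
    return True

def optout(shared_dir):
    """Remove the notice, if present."""
    path = os.path.join(shared_dir, NOTICE)
    if os.path.exists(path):
        os.remove(path)
```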
Running micaProc:
After the worker nodes are up and running, you're ready to run micaProc, one of the two workhorses of the parallel toolkit. Its parameters are
micaProc('myfunctionname', number_of_processes, parameter1, parameter2, ..., parameterx)
micaProc accepts an arbitrary number of input arguments but *must have at least one*. Even if your function works in place and needs no input, you must pass it at least one argument to keep micaProc happy. micaProc writes a series of files
processin_<yourusername><processnumber>.mat
into the shared directory. As the servers poll this directory, they see each processin file, and one of the servers locks the directory and grabs the file. If the command succeeds, that server writes
processout_processin_<yourusername><processnumber>.mat
and these outputs are read into a cell array of answers, one cell per process. The username is part of each file name to avoid collisions when more than one user is on the system. If a process fails, there will instead be a file
errormsgprocessin_<yourusername><processnumber>
A typical run looks something like this:
Given adder.m in the shared directory:
function Z = adder(Y)
Z = Y + 2;
>> X = micaProc('adder', 10, 1)
Initiating process: 1,2,3,4,5,6,7,8,9,10.
Reading data of #: 1 10 2 3 4 5 6 7 8 9
X =
    [3]    [3]    [3]    [3]    [3]    [3]    [3]    [3]    [3]    [3]
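The file-naming convention described above can be sketched in Python. The helper names are hypothetical, and the sketch assumes outputs are matched back to process numbers by parsing the file name. The "Reading data of #: 1 10 2 3 ..." order in the run above is consistent with files being read in name-sorted order, where 10 sorts before 2; placing each result by its number keeps the answers in process order regardless.

```python
import re

def input_name(user, n):
    """processin_<user><n>.mat, as written by micaProc."""
    return "processin_%s%d.mat" % (user, n)

def output_name(user, n):
    """processout_processin_<user><n>.mat, as written by a server."""
    return "processout_" + input_name(user, n)

def collect_outputs(user, filenames, nprocs):
    """Place each output file into slot n-1 of a list, by process number.

    filenames: names found in the shared directory, in any order.
    Missing outputs (e.g. failed processes) stay None. There is no separator
    between <user> and <n>, so users "bob" and "bob1" could produce the same
    name (bob + 12 versus bob1 + 2).
    """
    pattern = re.compile(r"processout_processin_%s(\d+)\.mat$" % re.escape(user))
    slots = [None] * nprocs
    for name in filenames:
        m = pattern.match(name)
        if m and 1 <= int(m.group(1)) <= nprocs:
            slots[int(m.group(1)) - 1] = name
    return slots
```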
Please note that the following will not work:
micaProc('x+3', 10, 3)
You cannot define functions inline. Functions can be presented to the workers in two ways.
1. For simpler functions, place a .m file containing the function's code in the shared directory. All the workers can then see it.
2. For functions that will be used all the time, place the .m file in the shared directory and then, from the operating system, run
cluster-fork "cp /home/source/matlab/myfunc.m /opt/matlab6p5/toolbox/local/"
then run
micaProc('kill')
to reinitialize the workers. They will now have the function loaded at startup.
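Because the workers look functions up by name, the first argument must name a function file they can see, not an expression such as 'x+3'. A hypothetical client-side pre-check is sketched below in Python. It assumes the function lives as <name>.m in the shared directory (the first option above) and does not account for functions installed under toolbox/local; the function name check_function is illustrative only.

```python
import os
import re

# A MATLAB function name: a letter, then letters, digits or underscores.
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")

def check_function(name, shared_dir):
    """Return None if `name` looks usable by the workers, else a reason string."""
    if not _NAME.match(name):
        return "%r is not a function name; inline expressions are not supported" % name
    if not os.path.isfile(os.path.join(shared_dir, name + ".m")):
        return "%s.m not found in %s" % (name, shared_dir)
    return None
```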
There is also a version, micaProcdir, that can be run from a Windows client. It assumes that /home/source/matlab is available as a Samba share and that you have mounted it as drive w:. Simply use micaProcdir from your local Windows MATLAB client, but *do not change the pwd to the network share*: MATLAB is very busy in its pwd and will lock up for the duration of any computation. micaProcdir works from your pwd into w:\, which allows MATLAB to operate over a network share.
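micaProcdir relies on the same directory being visible under two names. The Python sketch below shows that mapping, assuming the share root /home/source/matlab appears as w:\ on the Windows client; the function name to_windows is hypothetical, and the real client may build its paths differently.

```python
import ntpath
import posixpath

UNIX_ROOT = "/home/source/matlab"
WIN_ROOT = "w:\\"

def to_windows(unix_path):
    """Translate a path under the shared directory to its w:\\ equivalent."""
    rel = posixpath.relpath(unix_path, UNIX_ROOT)
    if rel == ".." or rel.startswith("../"):
        raise ValueError("%s is outside the shared directory" % unix_path)
    if rel == ".":
        return WIN_ROOT
    return ntpath.join(WIN_ROOT, *rel.split("/"))
```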
Running micaParam:
• micaParam is the alternate operating method of the parallel toolkit. It takes its parameters in the following form:
micaParam(<function name>, <list of parameters for function>)
• – The <list of parameters for function> should be a cell array of the form
    { p1, p2, p3, p4;
      p5, p6, p7, p8 }
  so that the function takes four parameters and is run twice in parallel: one process with parameters p1 to p4 and the other with parameters p5 to p8.
  – OUTPUT: a cell array of each process's output, indexed by its process number. So output_name{45} is the output of process #45.
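The row-per-process layout can be mirrored in Python to make the indexing concrete. This is a local stand-in, not micaParam itself: it assumes the rows of a list play the role of the rows of the MATLAB cell array and uses a thread pool in place of the cluster. Results are stored by process number (1-based, as in MATLAB), whatever order the processes finish in.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_rows(func, rows, workers=4):
    """Call func(*row) for each row in parallel.

    Returns a dict mapping 1-based process number -> output, so
    out[k] is the output of the process that received rows[k-1].
    """
    out = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(func, *row): k for k, row in enumerate(rows, start=1)}
        for fut in as_completed(futures):
            out[futures[fut]] = fut.result()
    return out
```

Keying the result by process number is one way to mimic output_name{45}; a list indexed by k-1 would serve equally well.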
EXAMPLE:
A =
    [.3 .3 .4]    [1000]    [100]    [5]
    [.3 .3 .4]    [ 500]    [100]    [5]
    [.3 .3 .4]    [ 500]    [100]    [5]
    [.3 .3 .4]    [ 500]    [100]    [5]
    [.3 .3 .4]    [ 500]    [100]    [5]
    [.3 .3 .4]    [ 500]    [100]    [5]
>> X = micaParam('simulate', A)
Initiating process: 1,2,3,4,5,6.
Reading data of #: 2 3 1 4 5 6
X{6} =
    0.6540    0.8860    0.7540    0.9120    0.9380    0.7760
Otherwise, micaParam operates the same way as micaProc and has the same limitations.
There is also a version, micaParamdir, that can be run from a Windows client as described above.
CONCLUSION
Given the current state of MATLAB, and the speed, simplicity and transparency of this system, I believe that it is the best suited to the Biostatistics unit's current needs. It has none of the inherent delay associated with scheduling systems, and it can be deployed with nothing more than a simple Perl script and a shared directory. In terms of performance, it is extremely fast and very resilient.