micaParalize - a Parallel Solution for Users of Matlab

MicaProc
A Parallel Matlab Implementation

David Schoenfeld & Jacob Schneider & Peter Lazar
Massachusetts General Hospital, Biostatistics Department
March 2004

Table of contents:

Introduction
Setting Up Server Nodes
Checking on / Killing and Configuring the Worker Nodes
Running micaProc
Running micaParam
Conclusion

Downloads:

Download Software

Introduction:

“Parallel programming remains very difficult and should be avoided if at all possible.”
– Gordon Bell

Goal:

- Provide parallel processing solutions to power users of MATLAB.

o Do so with a natural interface that does not require complicated programming or a steep learning curve (i.e., one that integrates cleanly with the MATLAB environment).

Hardware:

We have run this software successfully on a 22-node ROCKS cluster running Red Hat Linux.

Note:

This project builds on the research of Thomas Abrahamson, Chalmers University of Technology, Sweden, and on further work by Jacob Schneider.

Setting Up Server Nodes:

For the MATLAB servers to monitor the shared directory reliably, a monitor process must be running on each node. To start one, run the runmatlab.pl script from a privileged account, for example:

/usr/bin/perl runmatlab.pl &

Then log off the node. Because of the way runmatlab.pl interacts with the shell, it is recommended to type "logout" before the MATLAB process on the node starts. MATLAB tends to hang the shell on logout even when it is running in the background.

All this script does is:

use strict;
use warnings;

# Keep one MATLAB worker alive on this node: start micaServe,
# wait for it to exit, then start it again.
while (1)
{
    my $childid = fork();
    if (!defined($childid))
    {
        die "fork failed: $!";
    }
    elsif ($childid == 0)
    {
        # Child: replace this process with MATLAB, run in the foreground so
        # that the parent's waitpid() blocks until MATLAB actually exits.
        exec("/opt/matlab6p5/bin/matlab -nosplash -nodesktop -r /home/source/matlab/micaServe >/dev/null 2>&1");
        die "exec failed: $!";
    }
    else
    {
        waitpid($childid, 0);   # block until this MATLAB instance dies
        sleep 1;                # avoid a tight respawn loop if MATLAB fails at startup
    }
}

The script forks off the command line

/opt/matlab6p5/bin/matlab -nosplash -nodesktop -r /home/source/matlab/micaServe >/dev/null 2>&1

and waits for it to exit. This command line launches MATLAB on the worker without the desktop and starts executing the micaServe function. When MATLAB exits, the script immediately restarts it. An idle MATLAB process is relatively light and uses no more than a few percent of CPU time while waiting. The servers are currently hardcoded to watch the shared directory /home/source/matlab. This can be changed in the first few lines of micaServe.m. For the servers to do any work, they must watch the same directory that the clients write to.

Checking on / Killing and Configuring the Worker Nodes:

Once the worker ring is up and running, one can check its status by running either of the following commands:

micaProc('hosts')
or
micaParam('hosts')

This waits 20 seconds and then lists every node that replied to the hostname request, giving each node's hostname and a benchmark result. Because some nodes are faster than others, some may reply twice.

To tell every worker to exit (which resets the ring), one can type

micaProc('kill')
or
micaParam('kill')

This sends each worker node a message to exit. The monitors, of course, restart the workers immediately. This clears any initializations and restores the nodes to their base state. Note that worker nodes in the middle of an execution will not stop to check for the terminate message. If you have a runaway computation, use this command from the operating system:

cluster-fork "killall -9 matlab"

This orders all of the nodes to shut down their MATLAB processes. It does not harm the MATLAB instance on the local machine, and the monitors will restart the workers. To shut down the ring completely, run:

cluster-fork "killall -9 runmatlab"
cluster-fork "killall -9 matlab"

After this, the monitors must be restarted on every node before the ring can be used again.

To configure the worker nodes, for example by defining variables in each worker's MATLAB workspace, a simple interface lets one execute a single command string on all of the worker nodes. It is of the form

micaProc('setenv', 'x=3')
or
micaParam('setenv', 'x=4; y=5')

Note that the second example has two statements inside one quoted string. The following, however, will NOT work:

micaProc('setenv', 'happy = 3');
micaProc('setenv', 'y = happy+3');

The second call fails with a complaint that happy is undefined in the local environment. If you do define happy locally, micaProc substitutes its local value for the name happy in the statement. To export huge datasets, create the dataset in your local MATLAB session and then (assuming the data object is stored in the variable x) run

save x

This saves the variable x into the file x.mat in your local client's current working directory. Move this file to the shared directory and run

micaProc('setenv', 'load x')
or
micaParam('setenv', 'load x')

Occasionally one wants to clear a single variable on the workers, or to change the definition of a function in the shared directory and have the workers pick up the new version. In that case, run

micaProc('setenv', 'clear x')
or
micaParam('setenv', 'clear x')
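Another approach avoids relying on micaProc's substitution of local variable names. You can write a locally computed value into the statement text before sending it. The sketch below is minimal. The names v and stmt are hypothetical. The micaProc call is commented out because it needs the toolkit and a running worker ring.

```matlab
% A hypothetical value computed in the local (client) session.
v = [1 2 3];

% Write the value into the statement text itself. mat2str turns a numeric
% array into MATLAB source text, so the workers receive 'y = [1 2 3] + 3'
% and never have to resolve a local variable name.
stmt = ['y = ' mat2str(v) ' + 3'];

% Send it to every worker (needs the toolkit and a running worker ring):
% micaProc('setenv', stmt)
```

mat2str writes 15 significant digits by default. For large or full-precision data, the save/load route described above is the better choice.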

To keep two users from colliding with each other, there is a simple notification mechanism. To notify all other users that you are using the system, run the command

optin

from MATLAB, and run

optout

when you are finished. If another user runs optin while you are signed in, optin prints a notice like the one below but does not exit MATLAB. The system is purely cooperative and does not lock MATLAB. The timestamp lets users judge whether the notice is still valid or was left up by accident.

================system busy with following user===================
Yourusername
Fri 03/12/2004
03:35 PM
====================Try back later===========================

The Windows versions of this utility are optinwin and optoutwin.

Running micaProc:

After the worker nodes are up and running, you are ready to run micaProc, one of the two workhorses of the parallel toolkit. Its parameters are

micaProc('myfunctionname', number_of_processes, parameter1, parameter2, ..., parameterN)

micaProc can accept an arbitrary number of input arguments after number_of_processes but *must have at least one*. Even if your function needs no input, you must pass it at least one argument to keep micaProc happy, so give such a function a dummy input argument. micaProc writes a series of files, processin_<yourusername><processnumber>.mat, into the shared directory. The servers poll this directory. When they see a processin file, one of them locks the directory and takes the file. If the command succeeds, that server writes processout_processin_<yourusername><processnumber>.mat, and micaProc reads these files into a cell array of answers, one cell per process. Putting the username in the filenames avoids collisions when more than one user is on the system. If a process fails, the file errormsgprocessin_<yourusername><processnumber> appears instead of its output. A typical run looks something like this:

% adder.m, saved in the shared directory
function Z = adder(Y)
Z = Y + 2;

>> X = micaProc('adder', 10, 1)

Initiating process: 1,2,3,4,5,6,7,8,9,10.

Reading data of #: 1 10 2 3 4 5 6 7 8 9

X =

[3] [3] [3] [3] [3] [3] [3] [3] [3] [3]
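Because micaProc returns a cell array, scalar results usually need to be unpacked before further arithmetic. The sketch below is minimal. It assumes that cell k holds the result of process k, which is how the micaParam section describes its output. The X below is a stand-in built locally so that the snippet runs without the cluster.

```matlab
% Stand-in for the result of X = micaProc('adder', 10, 1): a 1-by-10 cell
% array with one cell per process, each holding the scalar 3.
X = num2cell(3 * ones(1, 10));

% When every process returns a scalar, [X{:}] concatenates the cells into
% an ordinary numeric row vector, in cell (process) order.
v = [X{:}];
total = sum(v);
```

If the processes return arrays of different sizes, keep the cell array as it is and index it with X{k} instead.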

Please note that the following will not work:

micaProc('x+3', 10, 3)

You cannot pass an expression or define a function inline. micaProc needs the name of a function that the workers can find. There are two ways to make a function available to the workers:

1. For simpler functions, place a .m file containing the function's code in the shared directory, where all the workers can see it.
2. For functions that will be used all the time, place the .m file in the shared directory and then, from the operating system, run:

cluster-fork "cp /home/source/matlab/myfunc.m /opt/matlab6p5/toolbox/local/"

and then run

micaProc('kill')

to reinitialize the workers. They will now have the function loaded at startup.

There is also a version, micaProcdir, that can be run from a Windows client. It assumes that /home/source/matlab is available as a Samba share and that you have mounted it as drive w:. Run micaProcdir from your local Windows MATLAB client, but *do not change the pwd to the network share*. MATLAB accesses its working directory constantly and will lock up for the duration of any computation. micaProcdir works from your local pwd and reads and writes its files on w:\, which lets MATLAB operate over the network share.
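The pwd rule above is easy to break by accident. Below is a minimal guard you might run before calling micaProcdir. It assumes the share is mounted as w:, as described. The variable names are illustrative, and the check only looks at the drive letter at the start of pwd, not at UNC paths.

```matlab
% Drive letter under which the shared directory is mounted (from this document).
share = 'w:';

% Compare the start of the current working directory with the share,
% ignoring case because Windows drive letters are case-insensitive.
onShare = strncmpi(pwd, share, numel(share));

if onShare
    error('Working directory is on %s; cd to a local folder before calling micaProcdir.', share);
end
```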

Running micaParam:

micaParam is the other way of running jobs with the parallel toolkit. It takes its parameters in the following form:

micaParam ( <function name>, <list of parameters for function> )


The <list of parameters for function> should be a cell array with one row per process and one column per function argument, for example

{ p1, p2, p3, p4; p5, p6, p7, p8 }

Here the function takes four parameters and is run twice in parallel: one process with parameters p1 through p4 and the other with parameters p5 through p8.

OUTPUT: a cell array of the processes' outputs, indexed by process number. So output_name{45} is the output of process #45.
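For parameter sweeps it is usually easier to build the parameter cell array in a loop than to type it out. The sketch below is minimal. The function name 'simulate' and the fixed arguments come from the example that follows. The sample sizes in ns are hypothetical.

```matlab
% Fixed first argument, shared by every run (from the example below).
probs = [.3 .3 .4];
% Hypothetical sample sizes: one process per value.
ns = [1000 500 250];

% One row per process, one column per argument of the function.
A = cell(numel(ns), 4);
for k = 1:numel(ns)
    A(k, :) = {probs, ns(k), 100, 5};
end

% X = micaParam('simulate', A);   % needs the toolkit; X{k} is the output for row k
```

Each row of A becomes one process, so adding a value to ns adds a process.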

EXAMPLE (this run used six processes, so A had six rows; only the first two are shown):

A =

[.3 .3 .4] [1000] [100] [5]

[.3 .3 .4] [ 500] [100] [5]

>> X = micaParam('simulate', A)

Initiating process: 1,2,3,4,5,6.

Reading data of #: 2 3 1 4 5 6

X{6} =

0.6540 0.8860 0.7540 0.9120 0.9380 0.7760

Otherwise, micaParam works the same way as micaProc and has the same limitations.

There is also a version, micaParamdir, that can be run from a Windows client in the same way as micaProcdir.
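Sometimes every process returns a row vector of the same length, as X{6} does in the example above. In that case the outputs can be stacked into one matrix. The sketch below works under that assumption. Its X is a locally generated stand-in filled with random values, not real simulation output.

```matlab
% Stand-in for a micaParam result: six processes, each returning a 1-by-6
% row vector (random values, for illustration only).
X = cell(1, 6);
for k = 1:6
    X{k} = rand(1, 6);
end

% vertcat stacks the rows in cell order, so row k of R is the output of
% process k, i.e. of row k of the parameter array.
R = vertcat(X{:});
colMeans = mean(R, 1);
```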

Conclusion:

Given the current state of MATLAB, and the speed, simplicity, and transparency of this system, I believe it is the best fit for the Biostatistics unit's current needs. It avoids the inherent delay of scheduling systems, and it can be deployed with nothing more than a simple Perl script and a shared directory. In our use, it has been extremely fast and very resilient.