Parallel For Loop, Parallel Simulate and Parallel Bootstrap for DCT


Description


The three functions pfor, pbootstrap and psimulate are wrappers for the new Distributed Computing Toolbox (DCT). They encapsulate the DCT functionality needed for these commonly used pieces of code. Here are their descriptions:

pbootstrp2: The parallel bootstrap:
theanswer = pbootstrp2(varargin)

The parallel bootstrap has exactly the same call pattern as the serial bootstrap. The serial bootstrap's first argument is nboot, the number of bootstrap samples; pbootstrap takes this number as the number of parallel runs and passes the remaining arguments on to the worker pool. Because there is some setup overhead, it is most effective when the bootstrap function takes considerable time to run.
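As a rough illustration of the shared call pattern, the sketch below places a serial bootstrp call (Statistics Toolbox) next to the parallel one. The sample y and the choice of @mean are hypothetical, and it is an assumption that pbootstrp2 accepts a function handle the same way bootstrp does.

```matlab
% Hypothetical sample data, for illustration only.
y = randn(50, 1);
nboot = 200;                                % number of bootstrap samples

% Serial bootstrap from the Statistics Toolbox.
serialstat = bootstrp(nboot, @mean, y);

% Parallel version: same arguments, run on the DCT worker pool.
% Assumption: a function handle is accepted here as it is by bootstrp.
theanswer = pbootstrp2(nboot, @mean, y);    % cell array of outputs
```

A bootstrap function this cheap will probably not repay the setup overhead; the call is only meant to show that the arguments are unchanged.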

psimulate2: The parallel simulate
theanswer = psimulate2(nruns, filedep, fxn, varargin)

psimulate performs nruns iterations of fxn. Fxn is a string containing a function name. Filedep is a cell array of strings describing the files and directories to search for user defined functions. psimulate returns a cell array of results after posting nruns executions of the simulation to the DCT cluster.
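A call might look like the sketch below. The function mysim, its file mysim.m and its two arguments are hypothetical names used only for illustration; mysim is assumed to be a user function saved in the current directory.

```matlab
% Hypothetical user function, saved as mysim.m:
%   function out = mysim(n, p)
%   out = mean(rand(n, 1) < p);

nruns   = 20;            % number of simulation runs to post to the cluster
filedep = {'mysim.m'};   % files or directories the workers need

% The arguments after the function name (here 1000 and 0.5) are passed to mysim.
theanswer = psimulate2(nruns, filedep, 'mysim', 1000, 0.5);
```

According to the description above, theanswer is a cell array of results from the nruns executions.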

pfor2: The parallel for
theanswer = pfor2(itervar, from, to, returnthis, filedep, jobdata, body)

Pfor will generate one task for each iteration value from 'from' to 'to' to execute the statements in 'body'. You place a string in itervar like "i" to designate i as the iterator. Inside body, you can then write something along the lines of

'load(''/home/source/matlab/mydir/mydata.mat''); myout=myfxn(mydata{i});'

and "i" will be set to the current iteration value. "Returnthis" is a cell array of strings containing the names of the variables to return. The return is always a cell with one or more items inside (the single return makes the DCT programming straightforward). Filedep is the same as the filedependencies variable on dfeval. It is a cell array of strings containing filenames or directory names to include for user functions. Jobdata is a single variable to be made visible to all of the workers. If used, you will need to do this inside your code:

x=getCurrentJob; mydata=get(x,'JobData');

Then reference the mydata variable to access this data.

Here is an example that returns the square of the iterator for 1 through 5. "a" is the iterator and b is returned.

d=pfor2('a',1,5,{'b'},{},'','b=a*a;');

and using the jobdata option.

d=pfor2('a',1,5,{'b'},{},myarray,'job=getCurrentJob; b=get(job,''JobData'');b{a}=b{a}*a;');

When all is done, pfor writes a file pforfxn.m from your body code and passes it to the DCT master. Please note that populating an array inside the loop has no effect, because each iteration runs in its own distributed environment. It is best to return values and have the end user populate arrays with the loop results.
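The last point can be sketched as follows. The exact layout of the cell returned by pfor2 is not specified here, so the stand-in d below assumes one element per iteration, in iteration order; adjust the indexing if the real return is nested differently.

```matlab
% Stand-in for d = pfor2('a',1,5,{'b'},{},'','b=a*a;');
% Assumption: one cell element per iteration, in iteration order.
d = {1, 4, 9, 16, 25};

% Populate the array on the client side, after the parallel run.
squares = zeros(1, numel(d));
for k = 1:numel(d)
    squares(k) = d{k};
end
```

Filling squares inside the body string would not work, since each task only sees its own copy of the variables.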



Embedded Usage Notes

%PBOOTSTRP, the parallel bootstrap function
%the syntax is: theanswer = pbootstrp(varargin)
%the call is identical to nonparallel bootstrap
%The function extracts the first argument and determines the number of tasks to create
%The remainder of the arguments are used as arguments to the function given to the bootstrap
%The function returns a cell array of outputs.

%PSIMULATE, parallel simulate
%The syntax is:
%theanswer = psimulate(nruns, thefxn, varargin)
%Where nruns is the number of times to run your simulation function.
%thefxn is a string containing the name of your function
%The remaining arguments are then used as arguments to the function in thefxn.
%This will return a cell array of outputs.

%PFOR, the parallelized for loop
%The syntax is: theanswer = pfor(itervar, from, to, returnthis, filedep, jobdata, body)
%Where itervar is a string containing the variable name you wish to use for the iterator.
%Itervar's variable can be referenced in the body of your loop statement.
%From and to are the limits of the iteration. The function will step in increments of one from 'from' to 'to'.
%Returnthis is a cell array of strings containing variable names to be returned
%FileDep is a cell array of strings describing files and directories to add to the path.
%Jobdata is a single data structure to be made visible to all workers
%Body is a single string containing ';' separated statements that make up the body of each 'for' iteration.
%Please note that it is not possible to "accumulate" a variable value using pfor due to the distributed nature of the underlying code.
%Body can accept a script if the script filename is specified (with extension) as seen relative to the calling instance of matlab.

Author(s)

Peter Lazar plazar@amber.mgh.harvard.edu and David Schoenfeld dschoenfeld@partners.org