Parallel For Loop, Parallel Simulate
and Parallel Bootstrap for DCT
Description
The three functions pfor, pbootstrap and psimulate are wrapper functions
for the MATLAB Distributed Computing Toolbox (DCT). They encapsulate the
DCT functionality needed for three commonly used pieces of code: the for
loop, the bootstrap and the simulation. Their descriptions follow.
pbootstrp2: The parallel bootstrap:
theanswer = pbootstrp2(varargin)
The parallel bootstrap has exactly the same call pattern as the serial
bootstrap. The first argument of the serial bootstrap is nboot, the
number of bootstrap samples. pbootstrap takes this value as the number
of parallel runs and submits them to the worker pool, passing the
remaining arguments through. Because there is some setup overhead, it
is most effective when the bootstrap function takes a considerable time
to run.
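The argument handling described above can be pictured with a short serial stand-in. This is only a sketch: it makes no DCT calls, the data and the use of @mean are hypothetical, and it assumes that each parallel run draws one resample, which the description implies but does not state outright.

```matlab
% Serial stand-in for the argument split described above; no DCT calls.
% The sample data and the choice of @mean are hypothetical.
args  = {200, @mean, (1:50)'};   % what the caller passes: nboot, bootfun, data
nboot = args{1};                 % first argument = number of parallel runs
rest  = args(2:end);             % remaining arguments, forwarded to every run

bootfun = rest{1};
data    = rest{2};
n       = size(data, 1);

theanswer = cell(1, nboot);
for t = 1:nboot                  % on a cluster, each pass would be one task
    idx = randi(n, n, 1);        % ASSUMPTION: one resample (with replacement) per run
    theanswer{t} = bootfun(data(idx, :));
end
```

With a cheap statistic like the mean, the setup overhead would likely dominate, so a real call is only worthwhile when the bootstrap function itself is slow.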
psimulate2: The parallel simulate
theanswer = psimulate2(nruns, filedep, fxn, varargin)
psimulate performs nruns iterations of fxn. Fxn is a string containing
a function name. Filedep is a cell array of strings naming the files
and directories to search for user-defined functions. Any remaining
arguments are passed on to fxn. psimulate posts nruns executions of the
simulation to the DCT cluster and returns a cell array of results.
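As a rough picture of what the caller gets back, the following serial stand-in runs a function by name nruns times and collects the outputs in a cell array. It makes no DCT calls; the choice of 'randn' and its arguments is hypothetical, and forwarding the extra arguments to the function follows the embedded usage notes.

```matlab
% Serial stand-in for psimulate2(nruns, filedep, fxn, varargin); no DCT calls.
nruns = 4;
fxn   = 'randn';      % function name as a string (hypothetical choice)
args  = {1, 3};       % stands in for varargin

theanswer = cell(1, nruns);
for r = 1:nruns                          % on a cluster, each pass would be one task
    theanswer{r} = feval(fxn, args{:});  % call the function by name
end
```

Each entry of theanswer holds the output of one run, here a 1-by-3 vector of random numbers.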
pfor2: The parallel for
theanswer = pfor2(itervar, from, to, returnthis, filedep, jobdata, body)
Pfor generates one task for each value of the iterator from 'from' to
'to', and each task executes the statements in 'body'. Itervar is a
string such as 'i' that names the iterator. Inside body, you can then
write something along the lines of
'load(''/home/source/matlab/mydir/mydata.mat''); myout=myfxn(mydata{i});'
and i will be set to the current iteration value. Returnthis is a
cell array of strings containing the names of the variables to return.
The return is always a cell with one or more items inside (the single
return makes the DCT programming straightforward). Filedep is the same
as the FileDependencies variable on dfeval: a cell array of strings
containing file names or directory names to include for user
functions. Jobdata is a single variable that is made available to every
task. If it is used, your body code needs to retrieve it:
x=getCurrentJob; mydata=get(x,'JobData');
Then reference the mydata variable to access this data.
Here is an example that returns the square of the iterator for the
values 1 to 5. 'a' is the iterator and b is returned.
d=pfor2('a',1,5,{'b'},{},'','b=a*a;');
And here is an example using the jobdata option:
d=pfor2('a',1,5,{'b'},{},myarray,'job=getCurrentJob; b=get(job,''JobData''); b{a}=b{a}*a;');
Internally, pfor writes your body code to a file, pforfxn.m, and passes
it to the DCT master. Please note that populating an array inside the
loop body has no effect on the caller, because each iteration runs in
its own distributed environment. It is best to return values and have
the end user populate arrays with the loop results.
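The return-then-populate pattern recommended above might look like the sketch below. It is a serial stand-in with no DCT calls. The layout of the result (one inner cell per iteration, holding the variables named in returnthis) is an assumption made for illustration, not something the description confirms.

```matlab
% Serial stand-in for d = pfor2('a',1,5,{'b'},{},'','b=a*a;'); no DCT calls.
% ASSUMPTION: the result holds one cell per iteration, each containing the
% variables named in returnthis. The real layout may differ.
from = 1;
to   = 5;
d = cell(1, to - from + 1);
for a = from:to
    b = a*a;                   % the loop body
    d{a - from + 1} = {b};     % what one task would hand back
end

% Client side: populate an ordinary array from the returned results.
squares = zeros(1, numel(d));
for k = 1:numel(d)
    squares(k) = d{k}{1};
end
total = sum(squares);          % accumulate here, not inside the body
```

Any running total or array fill is done after the results come back, since a variable updated inside one task is never seen by the others.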
Embedded Usage Notes
%PBOOTSTRP, the parallel bootstrap function
%the syntax is: theanswer = pbootstrp(varargin)
%the call is identical to nonparallel bootstrap
%The function extracts the first argument and determines the number of tasks to create
%The remainder of the arguments are used as arguments to the function given to the bootstrap
%The function returns a cell array of outputs.
%PSIMULATE, parallel simulate
%The syntax is:
%theanswer = psimulate(nruns, thefxn, varargin)
%Where nruns is the number of times to run your simulation function.
%thefxn is a string containing the name of your function
%The remaining arguments are then used as arguments to the function in thefxn.
%This will return a cell array of outputs.
%PFOR, the parallelized for loop
%The syntax is: theanswer = pfor(itervar, from, to, returnthis, filedep, jobdata, body)
%Where itervar is a string containing the variable name you wish to use for the iterator.
%Itervar's variable can be referenced in the body of your loop statement.
%From and to are the limits of the iteration. The function will step in increments of one from 'from' to 'to'.
%Returnthis is a cell array of strings containing variable names to be returned
%FileDep is a cell array of strings describing files and directories to add to the path.
%Jobdata is a single data structure to be made visible to all workers
%Body is a single string containing ';' separated statements that comprise the body of each 'for' iteration.
%Please note that it is not possible to "accumulate" a variable value using pfor due to the distributed nature of the underlying code.
%Body can accept a script if the script filename is specified (with extension) as seen relative to the calling instance of matlab.
Author(s)
Peter Lazar plazar@amber.mgh.harvard.edu
and
David Schoenfeld dschoenfeld@partners.org