Pvm
For parallel programming we use pvm (Parallel Virtual Machine).
See also:
Communication primitives for c++.
PVM homepage
See also the pvm manual available at the lab:
- Chapter 5: for explanation (all except 5.7)
- Chapter 6: example programs
- Appendix B: reference manual of all pvm functions
Run your parallel program
On our LINUX system, the computers you can use to run in parallel are parallel1, parallel2, ..., parallel9 (some might not be running).
9/7/2004: SHOULD BE OK now (mail if problems)
Open 3 consoles:
- To run your master program (like a normal program).
- To run pvm:
- type pvm: you'll get the pvm console. These are the commands you can use in pvm (see chapter 3.7 in the pvm manual):
- add machines to the parallel virtual machine: add parallel1 parallel4
- to show all the machines of the parallel virtual machine: conf
- to show all the running processes with their tids: ps -a
- to stop a running process: kill process_tid
- to stop all running processes: reset
- to show all running jobs: jobs
- to quit the console and stop pvm (terminate): halt
- to quit the console without stopping pvm: quit
- To show the log file (with the messages from the slaves: pvm errors and slave output):
- cd /tmp (go to the temporary directory)
- tail -f pvmlogfile
- with this command you always see the tail of the file, and the display is updated whenever pvm writes something to it (the -f option stands for follow).
- each user has their own pvm log file (something like pvml.19441); this file name is always the same
- for the student logins, the file is pvml.N with N = 19440 + the number of your login
- or type ls pvml* to list all log files
- use Control-c to stop this
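The naming rule for the student log files can be written down as a small helper. This is only a sketch of the arithmetic given above: the function name pvm_log_path and the choice to pass the login number as an int are assumptions made for illustration, not part of pvm.

```cpp
#include <cassert>
#include <string>

// Build the path of a student's pvm log file from the login number,
// following the rule stated above: /tmp/pvml.(19440 + login number).
// pvm_log_path is a hypothetical helper, not a pvm function.
std::string pvm_log_path(int login_number) {
    assert(login_number >= 0);
    const int base = 19440;  // offset given in the text for student logins
    return "/tmp/pvml." + std::to_string(base + login_number);
}
```

For example, login number 1 would give /tmp/pvml.19441, the file name used as an example above.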
Terminate:
Don't forget to halt pvm when you have finished experimenting!
Otherwise your pvm processes keep on running, even after you log out!
Cluster
ONLY ACCESSIBLE FROM OUR LINUX SYSTEM
9/9/2004: PROBLEMS WITH node6, node8 & node9... they are unusable for the moment
Do your final experiments (performance measurements) on the cluster (ask your assistant for permission first):
- our 8 dedicated cluster computers (333MHz) are called node2, node3, ... , node9
- note that node6 is a little slower
- our old server parallel6 (dual processor: 2 x 300MHz) is also available on the cluster as node1
- they are connected by a 100Mb/s non-blocking crossbar switch (a dynamic interconnection network)
- our server (parallel1) is also connected to this switch
- the crunch computers are only accessible from the server!! So work like this:
- run your master program on parallel1
- also start pvm from parallel1 and add the cluster computers from there
- when adding the nodes for the first time, you will be asked (RSA key prompt) whether you want to continue; answer with yes. Enter your password when asked.
- to do this: open a secure shell from your computer to parallel1 with the command ssh parallel1 (in a console)
- so run all 3 consoles used for your parallel program (including the one for the log file) on parallel1
Debugging in pvm
To debug your master program, just use gdb (see doc).
To debug your slaves, start the slave programs with the PvmTaskDebug flag (together with the normal PvmTaskDefault flag) and spawn the slaves on the same machine as the master (don't specify the machine name), as follows:
- int flags = PvmTaskDefault;
- flags |= PvmTaskDebug;
- string machine = "";
- pvm_spawn(task, 0, flags, const_cast<char*>(machine.c_str()), nbr, tids);
When the master spawns the slave, it will be in a gdb debug window,
where you can debug your program as always with gdb.
You should start the slave program yourself with r (run), after having set your breakpoints.
Quit the debug window with q, and then press ENTER to close the window.
Program Example
Download the code
The .h files, the .cc files and the project file
Download the code all at once with this compressed archive file: pvmExample.tgz. Extract the files in a new directory with the command:
tar -zxf pvmExample.tgz
Run the program
- create the makefile from the project file problem.pro with qmake and compile it (note that your executable problem is put in the pvm3/bin/LINUX directory):
- qmake problem.pro
- make all
- run your parallel program
- try to understand the program (documentation)
- Study it:
- See all program arguments: problem -h
- Give the array size: problem -s30
- Run on more computers: problem -p3
- Run in debug mode (see Debugging in pvm): problem -d
- Also run the master under gdb: gdb problem
Exercises
- Change the task of the parallel program so that it checks whether the parameter is a prime
- Do a small performance analysis:
- Count the number of computations that each slave performs
- Also measure the computation time of each slave
- Is there a load imbalance?
- Measure the communication time and the ratio of communication to computation
- Solve the load imbalance problem!
- e.g. divide the given range into a large number (> p) of subranges. Send the first subranges to the slaves, then send the next subrange whenever a slave has finished its current one.
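The subrange idea from the last exercise can be tried out without pvm first. The sketch below is not the course solution: it only simulates the master's scheduling decision in plain C++, with no pvm calls. The names is_prime, division_count, split_range and simulate, and the cost model (counting trial divisions as "computations"), are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Trial-division primality test (the task from the first exercise).
bool is_prime(long n) {
    if (n < 2) return false;
    for (long d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Number of trial divisions is_prime performs for n: a crude cost model.
long division_count(long n) {
    long count = 0;
    for (long d = 2; d * d <= n; ++d) {
        ++count;
        if (n % d == 0) break;
    }
    return count;
}

struct Subrange { long begin; long end; };  // half-open interval [begin, end)

// Split [begin, end) into `chunks` subranges of nearly equal length.
std::vector<Subrange> split_range(long begin, long end, int chunks) {
    assert(chunks > 0 && begin <= end);
    std::vector<Subrange> out;
    const long total = end - begin;
    long start = begin;
    for (int i = 0; i < chunks; ++i) {
        const long size = total / chunks + (i < total % chunks ? 1 : 0);
        out.push_back({start, start + size});
        start += size;
    }
    return out;
}

// Simulate the master: each subrange, in order, goes to the slave that
// becomes idle first, i.e. the one with the least accumulated work.
// Returns the accumulated work (trial divisions) per slave.
std::vector<long> simulate(const std::vector<Subrange>& work, int slaves) {
    assert(slaves > 0);
    std::vector<long> load(slaves, 0);
    for (const Subrange& r : work) {
        long cost = 0;
        for (long n = r.begin; n < r.end; ++n) cost += division_count(n);
        *std::min_element(load.begin(), load.end()) += cost;
    }
    return load;
}
```

Calling simulate(split_range(2, 20000, 4), 4) models the static division with one subrange per slave, while simulate(split_range(2, 20000, 64), 4) models the dynamic scheme; comparing the largest entry of the two results gives a first impression of how much the imbalance shrinks. In the real program the "slave becomes idle" event would be the arrival of a result message at the master.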
Program Example with EPPA instrumentation
On our LINUX system
EPPA is installed under the $PARDIR directory.
Extract pvmExampleWithEPPA in a new directory with the command: tar -zxf pvmExampleWithEPPA.tgz
The example problemEPPA is the same as above, but now instrumented with EPPA. Run problemEPPA with the -m option to activate the EPPA probe. View your results with the EPPA tool (the command EPPA in a console or the icon in the toolbar).
On another system
EPPA can be installed on another system (with Qt, pvm and mysql).
Compressed archive file: EPPAFiles.tgz
- download into your root directory
- extract in your root with the command: tar -zxf EPPAFiles.tgz
- go to the pvm3 directory: cd pvm3
- create everything: make all (if the makefile is incorrect, recreate it with qmake and build with make all in the following directories: pvm3/parlib/src, pvm3/eppa and pvm3/pvmEPPA)
Contents
- pvm3/bin/LINUX: put your pvm executables here
- pvm3/pvmEPPA: the above example program, now instrumented with EPPA (run with the -m option to activate the EPPA probe)
- pvm3/eppa: the performance tool EPPA to view your experiments
- pvm3/parlib: our library with general files
- pvm3/parlib/include: all header files to be included, interesting ones:
- standardlib.h: general things
- EPPAInstrumentation.h: to instrument your code with an EPPAProbe
- randlib.h: a random generator
- pvm3/parlib/lib: the library parlib.a that should be linked with your objects
- pvm3/parlib/src: the source files
More Pvm
PVM reference manual
- There are basically 3 pvm receive functions:
- pvm_recv is a blocking receive: it waits until the message has arrived
- pvm_nrecv is non-blocking: it checks for a message and returns a positive buffer id if a message has arrived
- use pvm_bufinfo to check the sender tid and tag of the message
- pvm_trecv waits at most a given time t for a message
- A pvm_send causes pvm to start sending the message in the background while your program continues. When you pack your data before sending, the data can either be copied to a buffer or be sent directly from where it is. The latter is faster, since no copy operation is needed, but then you must make sure that the data is not modified while the send operation is executing (use pvm_probe to test whether the message has arrived)! For this, use the encoding PvmDataInPlace instead of PvmDataDefault in your pvm_initsend.
- Use pvm_config to get all machines added to pvm
- Use pvm_tasks to get all running tasks
Advanced Pvm
- pvm_tidtohost: to find out on which machine a process (with a certain tid) is running. It returns the daemon tid of the machine; the name of the machine can then be retrieved with pvm_config.
- Use pvm_hostsync to see how much the clocks of the machines differ
- Check with pvm_probe whether a message has arrived
- Use pvm_catchout to write slave output to a specific file (instead of the default log file in /tmp)
Configuration
To use pvm you should configure your account.
Directories
To make PVM work, you first need to create a directory called pvm3/bin/LINUX.
Do this by typing the following commands in a console (or use the file manager):
- cd (= goes to your home directory)
- mkdir pvm3
- cd pvm3
- mkdir bin
- cd bin
- mkdir LINUX
When you want PVM to start a process (e.g. a master that starts a slave), pvm will look for the slave executable in the pvm3/bin directory! Based on the architecture of your machine, PVM chooses a subdirectory of that directory. To do so, it looks at the PVM_ARCH environment variable, which in our case is LINUX (see next paragraph).
Also add this directory to your path (see linux configuration): add the line $(HOME)/pvm3/bin/LINUX to your .path.local file.
Remark: sometimes we use a hostf file in our home directory to indicate this path (for the executables).
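The lookup rule described above (the pvm3/bin directory in your home directory, plus a subdirectory named after PVM_ARCH) can be sketched as a small helper. This only illustrates how the path is composed; it is not how pvm itself is implemented, and the name pvm_bin_dir is made up for this example.

```cpp
#include <cassert>
#include <string>

// Directory where spawned executables are expected, following the rule
// described above: <home>/pvm3/bin/<PVM_ARCH>.
// pvm_bin_dir is a hypothetical helper, not part of the pvm library.
std::string pvm_bin_dir(const std::string& home, const std::string& arch) {
    assert(!home.empty() && !arch.empty());
    return home + "/pvm3/bin/" + arch;
}
```

In a real program the two arguments would typically come from the HOME and PVM_ARCH environment variables (for instance via std::getenv), so with PVM_ARCH set to LINUX the result ends in pvm3/bin/LINUX.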
PVM variables
PVM requires 2 environment variables to be set in order to work, namely PVM_ROOT and PVM_ARCH. These are used by the PVM daemon to find where the different files are located on the system. By default they have the following values on our machines:
- PVM_ROOT = /usr/shared/pvm3
- PVM_ARCH = LINUX
At this point PVM should work for you. Try typing pvm at the prompt; you should get a prompt like this: pvm>
For further operations I refer you to the user manual.
.rhosts file
PVM uses a remote shell to start programs on the different machines that constitute a PVM. It is therefore important to ensure that you can indeed open a remote shell on another computer. Choose a computer other than the one you are logged in on and type:
ssh other_computer_name date
This line runs the program date on the remote machine and sends the output of that program back to the local machine. For this to work, you should add the name of your local machine to the .rhosts file in your home directory. Again, be aware that this is a hidden file and that you need to use ls -al to check whether it exists.
Simply put the names of all the machines that you allow to open a remote shell on the local machine in this file.
You can create the .rhosts file with: echo $HOSTNAME $LOGNAME > .rhosts
How does it work?
When you issue an rsh to a machine, that machine checks its .rhosts file to see which machines have permission to request the execution of a program. Suppose machine A wants to run a program on machine B. Then the name of machine A should be present in the .rhosts file of machine B. So the remote machine holds the list of machines that can start programs on it. The opposite would be quite dangerous: in that scenario it would suffice to create a file on your own machine, and add the name of a target machine to it, in order to run programs on any machine.
If you can do a remote shell, that is good. I recommend putting all the machines of our lab into your .rhosts file. This way you will be able to start PVM on all the machines you want.
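The check that machine B performs can be sketched as follows. This is a simplified illustration, assuming every line of the .rhosts file has the form "hostname loginname" as produced by the echo command above; the function name rhosts_allows is hypothetical and the real remote shell daemon applies more rules than this.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Return true if the .rhosts contents of machine B contain a line
// "host user" that matches the requesting machine A and its login name.
// Simplified sketch: lines with fewer than two fields are ignored.
bool rhosts_allows(const std::string& rhosts_contents,
                   const std::string& host,
                   const std::string& user) {
    assert(!host.empty() && !user.empty());
    std::istringstream lines(rhosts_contents);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string entry_host, entry_user;
        if (fields >> entry_host >> entry_user &&
            entry_host == host && entry_user == user) {
            return true;
        }
    }
    return false;
}
```

The important point is where the file lives: the contents passed in are those of the file on machine B, so only B's owner decides which machines may start programs there.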
Installation of PVM 3 on Windows 9x
IMPORTANT: Install PVM only on machines that have Visual C++ 5 or greater, because the pvm libraries in this distribution are incompatible with other compilers.
The self-installing program can be downloaded at:
http://www.epm.ornl.gov/~sscott/PVM/Software/ParallelVirtualMachine3.4.3.zip
This program will install all the necessary files.
After installation, check that the following environment variables are set; if not, put them in autoexec.bat:
set PVM_ROOT=C:\Program Files\PVM3.4
set PVM_ARCH=WIN32
Reboot the machine. Run the PVM console and you are ready to execute any pvm program.
For more install information, check out the links page...