LAM-MPI on K42
We are working to get MPI running reliably across multiple K42 hosts. For now we are focusing on LAM-MPI, because it is provided in the PERCS SDK from Austin.
Here is a trace of lamboot startup: lamboot.strace.
If you are at Watson and want to run MPI on a single node or on multiple nodes, see LamMpiSetupWatson.
Manual Setup
Set LAMHOME=<directory parent of bin where lamboot is>
- LAMRSH=ssh will force use of ssh instead of rsh
- LAMHOME does not seem to be strictly necessary. The system appears to infer it from the directory in which it found lamboot.
- Add location of lamboot (and hboot and lamd) to your PATH.
- If your default shell is /bin/bash, things work fine on Linux.
- /bin/sh does not work on Linux unless you patch LAM (the patch is in /homes/kix/cascaval/k42/lam/lam-7.1.1.patch).
- On K42, /bin/sh works, /bin/bash does not, and the patch is required.
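The setup steps above can be sketched as a few lines of shell. This is only an illustration: /opt/lam is a hypothetical install prefix, not a path taken from these notes, so substitute the parent of the bin directory that actually holds lamboot.

```shell
# Hypothetical install prefix (an assumption); keep any LAMHOME already set.
LAMHOME="${LAMHOME:-/opt/lam}"
export LAMHOME

# Force ssh instead of rsh for starting remote daemons.
LAMRSH=ssh
export LAMRSH

# Make lamboot, hboot and lamd reachable through PATH.
PATH="$LAMHOME/bin:$PATH"
export PATH

# Sanity check: print where lamboot resolves, or a warning if it does not.
command -v lamboot || echo "lamboot not found in PATH" >&2
```

Since LAMHOME appears to be inferred from the location of lamboot, the PATH line is probably the one that matters most.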
Useful LAM commands
- initialize the runtime environment with one of:
- lamboot -v hostfile
- lamboot -v -ssi boot rsh hostfile
- lamboot -v -d -ssi boot rsh <hostfile>
- LAMRSH=ssh lamboot -v -d -ssi boot rsh <hostfile>
- hostfile contains the IP address(es) of the host(s) to use, one per line. On K42, numerical addresses must be used.
- list the processors in the environment
- lamnodes
- run on all processors
- mpirun C cpi
- run on all nodes (1 process/node)
- mpirun N cpi
- run n processes on one machine
- mpirun -np <n> cpi
In the hostfile you can specify the number of CPUs on each host: <host> cpu=x
- the N option starts one MPI process on each node (i.e. each host in the hostfile)
- the C option starts one MPI process on each CPU, so several MPI processes can run on the same host
- the -np option starts that many processes, assigning them by CPU (the same counting the C option uses)
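As a rough illustration of the difference between N and C, the sketch below writes a sample hostfile and counts how many processes each option would start. The addresses and the file name are placeholders, and it assumes that a host listed without cpu=x counts as one CPU.

```shell
# Write a sample hostfile; addresses and file name are placeholders.
hostfile="${TMPDIR:-/tmp}/lamhosts.example"
cat > "$hostfile" <<'EOF'
192.168.0.1 cpu=2
192.168.0.2 cpu=4
192.168.0.3
EOF

# "mpirun N" starts one process per host, i.e. one per non-empty line.
nodes=$(grep -c . "$hostfile")

# "mpirun C" starts one process per CPU.
# Assumption: a host without cpu=x contributes a single CPU.
cpus=$(awk '{
    n = 1
    for (i = 2; i <= NF; i++)
        if ($i ~ /^cpu=/) { split($i, kv, "="); n = kv[2] }
    total += n
} END { print total }' "$hostfile")

echo "mpirun N cpi would start $nodes processes"
echo "mpirun C cpi would start $cpus processes"
```

With this sample file, N gives 3 processes and C gives 7.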
- shut down the environment
- lamhalt
- if lamhalt cannot shut it down, use
- lamwipe -v hostfile
- a trace can be obtained with
- mpirun -v -t N etc
lamboot must have been started with -d -v for this to work. The trace is written to /tmp/lam-<user>@<system>
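Put together, a complete session might look like the sketch below. The hostfile name lamhosts is hypothetical, cpi is the example program used above, and the fallback to lamwipe assumes that lamhalt returns a nonzero status when it fails. The guard simply skips everything when LAM or the hostfile is not available.

```shell
# Hypothetical hostfile name; it should list one numerical address per line.
hostfile=lamhosts

if command -v lamboot >/dev/null 2>&1 && [ -f "$hostfile" ]; then
    # Boot with -d -v so that traces can be collected later.
    LAMRSH=ssh lamboot -v -d "$hostfile"
    # Show what was booted.
    lamnodes
    # One process per node, with tracing enabled.
    mpirun -v -t N cpi
    # Shut down; fall back to lamwipe if lamhalt fails (assumed nonzero exit).
    lamhalt || lamwipe -v "$hostfile"
    session=ran
else
    echo "lamboot or $hostfile not available; skipping session" >&2
    session=skipped
fi
```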
Building LAM MPI
- By default LAM MPI uses rsh when invoked with the '-ssi boot rsh' option. To change this, configure LAM MPI with '--with-rsh="ssh -x"'.
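A build along these lines might look as follows. Only the --with-rsh option comes from these notes. The source directory and install prefix are hypothetical, and the configure, make, make install sequence is the usual autoconf routine rather than something confirmed here.

```shell
# Hypothetical locations (assumptions): adjust to your source tree and prefix.
LAM_SRC="${LAM_SRC:-$HOME/src/lam-7.1.1}"
LAM_PREFIX="${LAM_PREFIX:-$HOME/lam}"

build_status=skipped
if [ -x "$LAM_SRC/configure" ]; then
    # Build in a subshell so the caller's working directory is unchanged.
    if (cd "$LAM_SRC" &&
        ./configure --prefix="$LAM_PREFIX" --with-rsh="ssh -x" &&
        make &&
        make install); then
        build_status=built
    else
        build_status=failed
    fi
else
    echo "no configure script in $LAM_SRC; nothing built" >&2
fi
echo "build status: $build_status"
```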
