lamssi_rpi(7) | LAM SSI RPI OVERVIEW | lamssi_rpi(7) |
lamssi_rpi - overview of LAM's RPI SSI modules
The "kind" for RPI SSI modules is "rpi". Specifically, the string "rpi" (without the quotes) should be used to specify which RPI should be used on the mpirun command line with the -ssi switch. For example:
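A minimal sketch of such a command line, assuming the tcp module is wanted. The program name my_mpi_program is hypothetical, and "C" is assumed to ask mpirun for one process per available CPU. The command is echoed rather than executed so that it can be inspected first.

```shell
# Select the tcp RPI with the -ssi switch: kind "rpi", value "tcp".
# my_mpi_program is a hypothetical program name.
RPI=tcp
echo mpirun -ssi rpi "$RPI" C my_mpi_program   # drop "echo" to actually launch
```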
The "rpi" string is also used as a prefix to send parameters to specific RPI modules. For example:
LAM currently supports six different RPI SSI modules: crtcp, gm, lamd, tcp, sysv, usysv.
Only one RPI module may be selected per command execution. The selection of which module to use occurs during MPI_INIT, and that module is used for the duration of the MPI process. It is erroneous to select different RPI modules for different processes.
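A sketch of the prefix form, under stated assumptions: the parameter name rpi_tcp_short and the value 131072 are used only for illustration and are not confirmed by this page, and my_mpi_program is a hypothetical program name. The pattern is the "rpi" prefix, then the module name, then the parameter name.

```shell
# Select the tcp RPI and pass one parameter to it.
RPI=tcp
PARAM=rpi_tcp_short   # assumed parameter name: "rpi" prefix + module + parameter
VALUE=131072          # illustrative value only
echo mpirun -ssi rpi "$RPI" -ssi "$PARAM" "$VALUE" C my_mpi_program   # drop "echo" to launch
```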
The kind for selecting an RPI is "rpi". For example:
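Because a single module applies to the whole run, a wrapper script may want to check the requested name before launching anything. The sketch below is one way to do that; build_rpi_args is a hypothetical helper (not part of LAM), and the list of names is taken from the modules described on this page.

```shell
# Module names as described on this page.
KNOWN_RPIS="crtcp gm lamd tcp sysv usysv"

# build_rpi_args NAME
# Hypothetical helper: prints the -ssi arguments that select NAME,
# or fails if NAME is not one of the known module names.
build_rpi_args() {
    for m in $KNOWN_RPIS; do
        if [ "$m" = "$1" ]; then
            echo "-ssi rpi $1"
            return 0
        fi
    done
    echo "unknown RPI module: $1" >&2
    return 1
}

build_rpi_args usysv
```

The printed arguments could then be placed on the mpirun command line ahead of the process count and program name.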
As with all SSI modules, it is possible to pass parameters at run time. This section discusses the built-in LAM RPI modules, as well as the run-time parameters that they accept.
In the discussion below, the parameters are discussed in terms of kind and name. The kind and name may be specified as command line arguments to the mpirun command with the -ssi switch, or they may be set in environment variables of the form LAM_MPI_SSI_name=value. Note that using the -ssi command line switch will take precedence over any environment variables.
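The two mechanisms can be sketched as follows for a Bourne-style shell. This assumes that the kind "rpi" itself is used as the name when selecting a module, giving the variable LAM_MPI_SSI_rpi; my_mpi_program is a hypothetical program name.

```shell
# Select the tcp RPI through the environment.
LAM_MPI_SSI_rpi=tcp
export LAM_MPI_SSI_rpi

# With no -ssi switch, the environment value would apply:
#   mpirun C my_mpi_program
# An explicit -ssi switch would take precedence over the variable:
#   mpirun -ssi rpi usysv C my_mpi_program
```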
If the RPI that is selected is unable to run (e.g., attempting to use the gm RPI when gm support was not compiled into LAM, or if no gm hardware is available on the nodes), an appropriate error message will be printed and execution will abort.
The crtcp RPI is a checkpoint/restart-able version of the tcp RPI (see below). It is separate from the tcp RPI because the current implementation imposes a slight performance penalty to enable the ability to checkpoint and restart MPI jobs. Its tunable parameters are the same as the tcp RPI. This RPI probably only needs to be used when the ability to checkpoint and restart MPI jobs is required.
See the LAM/MPI User's Guide for more details on the crtcp RPI as well as the checkpoint/restart capabilities of LAM/MPI. The lamssi_cr(7) manual page also contains additional information.
The gm RPI is used with native Myrinet networks. It gives significantly better performance than TCP over Myrinet networks, but has not yet been optimized, properly tuned, or instrumented in LAM.
That being said, there are several tunable parameters in the gm RPI:
The lamd RPI uses LAM's "out-of-band" communication mechanism for passing MPI messages. Specifically, MPI messages are sent from the user process to the local LAM daemon, then to the remote LAM daemon (if the destination process is on a different node), and then to the destination process.
While this adds latency to message passing because of the extra hops that each message must travel, it allows for true asynchronous message passing. Since the LAM daemon is running in its own execution space, it can make progress on message passing regardless of the state / status of the user's program. This can be an overall net savings in performance and execution time for some classes of MPI programs.
It is expected that this RPI will someday become obsolete when LAM becomes multi-threaded and allows progress to be made on message passing in separate threads rather than in separate processes.
The lamd RPI has no tunable parameters.
The tcp RPI uses pure TCP for all MPI message passing. TCP sockets are opened between MPI processes and are used for all MPI traffic.
The tcp RPI has one tunable parameter:
The sysv RPI uses shared memory for communication between MPI processes on the same node, and TCP sockets for communication between MPI processes on different nodes. System V semaphores are used to lock the shared memory pools. This RPI is best used when running multiple MPI processes on uniprocessors (or oversubscribed SMPs) because of the blocking / yielding nature of semaphores.
The sysv RPI has the following tunable parameters:
The configure script will try to determine a default size for the pool if none is explicitly specified (you should always check this to see if it is reasonable). Larger values should improve performance especially when an application passes large messages, but will also increase the system resources used by each task.
The configure script will try to determine a default size for the maximum atomic transfer size if none is explicitly specified (you should always check this to see if it is reasonable). Larger values should improve performance especially when an application passes large messages, but will also increase the system resources used by each task.
The usysv RPI uses shared memory for communication between MPI processes on the same node, and TCP sockets for communication between MPI processes on different nodes. Spin locks are used to lock the shared memory pools. This RPI is best used when the number of MPI processes on a single node is less than or equal to the number of processors, because it allows LAM to fully occupy the processor while waiting for a message and never be swapped out.
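The guidance for choosing between the two shared memory RPIs can be sketched as a small shell function. choose_shm_rpi is a hypothetical helper, not part of LAM; it only encodes the rule of thumb stated here and takes the per-node process and processor counts as arguments.

```shell
# choose_shm_rpi PROCS_PER_NODE CPUS_PER_NODE
# Hypothetical helper: prints the shared memory RPI suggested by the
# rule of thumb above.
choose_shm_rpi() {
    if [ "$1" -le "$2" ]; then
        echo usysv    # spin locks: each process can keep a processor busy
    else
        echo sysv     # semaphores: processes block and yield when oversubscribed
    fi
}

choose_shm_rpi 4 4    # as many processes as processors
choose_shm_rpi 8 4    # oversubscribed node
```

The printed name can be passed to mpirun after "-ssi rpi". Treat the result as a starting point and confirm it by measurement on the target system.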
The usysv RPI has many of the same tunable parameters as the sysv RPI:
lamssi(7), lamssi_cr(7), mpirun(1), LAM User's Guide
July, 2007 | LAM 7.1.4 |