pdsh(1) | General Commands Manual | pdsh(1) |
pdcp - copy files to groups of hosts in parallel
rpdcp - (reverse pdcp) copy files from a group of hosts in parallel
pdcp [options]... src [src2...] dest
rpdcp [options]... src [src2...] dir
pdcp is a variant of the rcp(1) command. Unlike rcp(1), which copies files to a single remote host, pdcp can copy files to multiple remote hosts in parallel. However, pdcp does not recognize files in the format "rname@rhost:path"; therefore, all source files must be on the local host machine. Destination nodes must be listed on the pdcp command line using a suitable target nodelist option (see the OPTIONS section below). Each destination node listed must have pdcp installed for the copy to succeed.
When pdcp receives SIGINT (ctrl-C), it lists the status of current threads. A second SIGINT within one second terminates the program. Pending threads may be canceled by issuing ctrl-Z within one second of ctrl-C. Pending threads are those that have not yet been initiated, or are still in the process of connecting to the remote host.
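The timing rules above can be modeled as a small state machine. The following is a hypothetical Python sketch, not pdcp's implementation: the class name, the returned strings, and the "default" result for a ctrl-Z that does not follow a recent ctrl-C are all assumptions. Event times are passed in as plain seconds, so no real signals are needed.

```python
class InterruptModel:
    """Hypothetical model of pdcp's ctrl-C / ctrl-Z timing rules."""

    WINDOW = 1.0  # both rules use a one-second window

    def __init__(self):
        self.last_sigint = None  # time of the most recent "first" ctrl-C

    def _within_window(self, now):
        return self.last_sigint is not None and now - self.last_sigint <= self.WINDOW

    def on_sigint(self, now):
        """A second ctrl-C inside the window terminates; otherwise print thread status."""
        if self._within_window(now):
            return "terminate"
        self.last_sigint = now
        return "status"

    def on_sigtstp(self, now):
        """ctrl-Z cancels pending threads only within one second of a ctrl-C."""
        if self._within_window(now):
            return "cancel_pending"
        return "default"  # assumption: behavior outside the window is not modeled
```

For example, a ctrl-C at t=0.0 followed by another at t=0.5 yields "status" and then "terminate", while a ctrl-Z at t=2.4 after a ctrl-C at t=2.0 yields "cancel_pending".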
Like pdsh(1), the functionality of pdcp may be supplemented by dynamically loadable modules. In pdcp, the modules may provide a new connect protocol (replacing the standard rsh(1) protocol), filtering options (e.g. excluding hosts that are down), and/or host selection options (e.g. -a selects all nodes from a local config file). By default, pdcp requires at least one "rcmd" module to be loaded (to provide the channel for remote copy).
rpdcp performs a reverse parallel copy. Rather than copying files to remote hosts, it retrieves files from remote hosts and stores them locally. Each retrieved file or directory is stored with its remote hostname appended to its name. The destination must be a directory.
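The local naming rule can be sketched as follows. This is an illustrative Python model under one stated assumption: the hostname is appended after a "." separator (e.g. messages.foo01); the text above only says the hostname is appended. The function names are hypothetical.

```python
import os.path

def rpdcp_local_path(dest_dir, remote_path, hostname):
    """Local path for `remote_path` retrieved from `hostname`.

    Assumption: the hostname is appended after a '.' separator.
    """
    # Strip trailing slashes so a directory like /etc/ssh/ still has a basename.
    name = os.path.basename(remote_path.rstrip("/"))
    return os.path.join(dest_dir, f"{name}.{hostname}")

def rpdcp_plan(dest_dir, remote_path, hosts):
    """Return one local destination per host; dest_dir must be a directory."""
    if not os.path.isdir(dest_dir):
        raise NotADirectoryError(dest_dir)
    return [rpdcp_local_path(dest_dir, remote_path, h) for h in hosts]
```

Because each host contributes a distinct suffix, files with the same remote name from different hosts land side by side in the one destination directory.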
In other respects, rpdcp is exactly like pdcp, and further statements regarding pdcp in this manual also apply to rpdcp.
The method by which pdcp connects to remote hosts may be selected at runtime using the -R option (See OPTIONS below). This functionality is ultimately implemented via dynamically loadable modules, and so the list of available options may be different from installation to installation. A list of currently available rcmd modules is printed when using any of the -h, -V, or -L options. The default rcmd module will also be displayed with the -h and -V options.
A list of rcmd modules currently distributed with pdcp follows.
The list of available pdcp options is determined at runtime by supplementing the list of standard pdcp options with any options provided by loaded rcmd and misc modules. In some cases, options provided by modules may conflict with each other. In these cases, the modules are incompatible and the first module loaded wins.
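The "first module loaded wins" rule can be illustrated with a hypothetical Python model. The function name, the module names, and the choice to set a conflicting module aside as a whole are assumptions made for illustration, not a description of pdsh's module loader.

```python
def merge_module_options(standard, modules):
    """Merge module option letters onto the standard pdcp options.

    `standard` is a set of option letters; `modules` is a list of
    (module_name, set_of_option_letters) in load order. Returns
    (owner, incompatible): owner maps each option to "pdcp" or the module
    providing it; incompatible lists modules that clashed with an earlier one.
    """
    owner = {opt: "pdcp" for opt in standard}
    incompatible = []
    for name, opts in modules:
        if opts & set(owner):
            # First module loaded wins: this later module is set aside (assumption).
            incompatible.append(name)
            continue
        for opt in opts:
            owner[opt] = name
    return owner, incompatible
```

With modules loaded in the order mod_one, mod_two, mod_three, where mod_two and mod_three both provide -a, the model keeps mod_two's -a and reports mod_three as incompatible.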
If a host or hostlist is preceded by a `-' character, those hosts are explicitly excluded. If the argument is preceded by a single `^' character, it is taken to be the path to a file containing a list of hosts, one per line. If the item begins with a `/' character, it is taken as a regular expression on which to filter the list of hosts (a regex argument may also optionally be followed by another `/', e.g. /node.*/). A regex or file name argument may also be preceded by a minus `-' to exclude those hosts instead of including them.
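The item prefixes above can be sketched as a small filter. This is a hypothetical Python model, not pdsh's parser. It assumes items are applied in order to a starting list, handles plain hostnames only (no ranges), and treats a regex as an unanchored search.

```python
import re

def read_host_file(path):
    """Read hostnames from a file, one per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def select_hosts(base, items, read_file=read_host_file):
    """Apply -host, ^file, -^file, /regex/ and -/regex/ items to `base` in order."""
    hosts = list(base)
    for item in items:
        exclude = item.startswith("-")
        body = item[1:] if exclude else item
        if body.startswith("^"):  # file of hostnames
            names = read_file(body[1:])
            if exclude:
                hosts = [h for h in hosts if h not in names]
            else:
                hosts += [n for n in names if n not in hosts]
        elif body.startswith("/"):  # regex filter; trailing '/' is optional
            pattern = body[1:-1] if body.endswith("/") and len(body) > 1 else body[1:]
            rx = re.compile(pattern)
            hosts = [h for h in hosts if bool(rx.search(h)) != exclude]
        elif exclude:
            hosts = [h for h in hosts if h != body]
        elif body not in hosts:
            hosts.append(body)
    return hosts
```

For example, with a starting list of foo1, foo2, foo3, bar1, the item /foo.*/ keeps only the foo hosts, and -/foo[12]/ then leaves foo3.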
A list of hosts may also be preceded by "user@" to specify a remote username other than the default, or "rcmd_type:" to specify an alternate rcmd connection type for these hosts. When used together, the rcmd type must be specified first, e.g. "ssh:user1@host0" would use ssh to connect to host0 as user "user1."
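The ordering rule (rcmd type first, then user) can be shown with a small parser. This hypothetical Python sketch splits a "[rcmd_type:][user@]hosts" string into its three parts; it assumes the hosts part contains no ':' or '@' of its own.

```python
def parse_target(spec):
    """Split '[rcmd_type:][user@]hosts' into (rcmd_type, user, hosts).

    Missing parts are returned as None. The ':' that ends the rcmd type is
    looked for before the '@' that ends the user name.
    """
    rcmd_type = user = None
    rest = spec
    if ":" in rest:
        rcmd_type, rest = rest.split(":", 1)
    if "@" in rest:
        user, rest = rest.split("@", 1)
    return rcmd_type, user, rest
```

Applied to the example above, parse_target("ssh:user1@host0") gives ("ssh", "user1", "host0").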
As noted in sections above, pdcp accepts ranges of hostnames in the general form: prefix[n-m,l-k,...], where n < m and l < k, etc., as an alternative to explicit lists of hosts. This form should not be confused with regular expression character classes (also denoted by "[]"). For example, foo[19] does not represent foo1 or foo9, but rather represents a degenerate range: foo19.
This range syntax is meant only as a convenience on clusters with a prefixNN naming convention; specifying ranges is never required. The list foo1,foo9 could be given as such, or by the range foo[1,9].
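The range form can be modeled by a short expander. This is a simplified Python sketch, not pdsh's hostlist library: it assumes at most one bracket group per comma-separated term and keeps zero padding from the width of each range's lower bound.

```python
import re

# prefix[body]suffix with a single bracket group (simplifying assumption)
_TERM = re.compile(r"^([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)$")

def _split_top_level(spec):
    """Split on commas that are not inside [...]."""
    parts, depth, cur = [], 0, ""
    for ch in spec:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(cur)
            cur = ""
        else:
            cur += ch
    if cur:
        parts.append(cur)
    return parts

def expand_hostlist(spec):
    """Expand e.g. 'foo[01-05]' or 'foo[7,9-10]' into a list of hostnames."""
    hosts = []
    for term in _split_top_level(spec):
        m = _TERM.match(term)
        if not m:
            hosts.append(term)  # plain hostname, no range
            continue
        prefix, body, suffix = m.groups()
        for piece in body.split(","):
            lo, _, hi = piece.partition("-")
            hi = hi or lo  # "19" alone is a degenerate range: lo == hi
            width = len(lo)  # "01-05" keeps two digits: 01, 02, ...
            for n in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{str(n).zfill(width)}{suffix}")
    return hosts
```

Exclusion as in -x can then be modeled by removing one expansion from another: expanding foo[0-5] and dropping the hosts in foo[1-3] leaves foo0, foo4, foo5.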
Some examples of range usage follow:
Copy /etc/hosts to foo01,foo02,...,foo05
pdcp -w foo[01-05] /etc/hosts /etc
Copy /etc/hosts to foo7,foo9,foo10
pdcp -w foo[7,9-10] /etc/hosts /etc
Copy /etc/hosts to foo0,foo4,foo5
pdcp -w foo[0-5] -x foo[1-3] /etc/hosts /etc
As a reminder to the reader, some shells will interpret brackets ('[' and ']') for pattern matching. Depending on your shell, it may be necessary to enclose ranged lists within quotes. For example, in tcsh, the first example above should be executed as:
pdcp -w "foo[01-05]" /etc/hosts /etc
Pdsh/pdcp was originally a rewrite of IBM dsh(1) by Jim Garlick <garlick@llnl.gov> on LLNL's ASCI Blue-Pacific IBM SP system. It is now also used on Linux clusters at LLNL.
When using ssh for remote execution, expect the stderr of ssh to be folded in with that of the remote command. When invoked by pdcp, ssh cannot prompt for confirmation if a host key changes, prompt for passwords if RSA keys are not configured properly, etc. Finally, the connect timeout is only adjustable with ssh when the underlying ssh implementation supports it and pdsh has been built to use the correct option.
linux-gnu |