ch-run - Run a command in a Charliecloud container
$ ch-run [OPTION...] NEWROOT CMD [ARG...]
Run command CMD in a Charliecloud container using the
flattened and unpacked image directory located at NEWROOT.
-b, --bind=SRC[:DST]
    mount SRC at guest DST (default /mnt/0, /mnt/1, etc.)
-c, --cd=DIR
    initial working directory in container
-g, --gid=GID
    run as group GID within container
-j, --join
    use the same container (namespaces) as peer ch-run invocations
--join-pid=PID
    join the namespaces of an existing process
--join-ct=N
    number of ch-run peers (implies --join; default: see below)
--join-tag=TAG
    label for ch-run peer group (implies --join; default: see below)
--no-home
    do not bind-mount your home directory (by default, your home directory is
    mounted at /home/$USER in the container)
-t, --private-tmp
    use container-private /tmp (by default, /tmp is shared with the host)
-u, --uid=UID
    run as user UID within container
-v, --verbose
    be more verbose (debug if repeated)
-w, --write
    mount image read-write (by default, the image is mounted read-only)
-?, --help
    print help and exit
--usage
    print a short usage message and exit
-V, --version
    print version and exit
In addition to any directories specified by the user with
--bind, ch-run also bind-mounts a standard set of host files
and directories.
The following host files and directories are bind-mounted at the
same location in the container. These cannot be disabled.
- /dev
- /etc/passwd
- /etc/group
- /etc/hosts
- /etc/resolv.conf
- /proc
- /sys
Three additional bind mounts can be disabled by the user:
- Your home directory (i.e., $HOME) is mounted at guest
/home/$USER by default. This is accomplished by mounting a new
tmpfs at /home, which hides any image content under that
path. If --no-home is specified, neither of these things happens
and the image’s /home is exposed unaltered.
- /tmp is shared with the host by default. If --private-tmp is
specified, a new tmpfs is mounted on the guest’s /tmp
instead.
- If file /usr/bin/ch-ssh is present in the image, it is over-mounted
with the ch-ssh binary in the same directory as ch-run.
By default, different ch-run invocations use different user
and mount namespaces (i.e., different containers). While this has no impact
on sharing most resources between invocations, there are a few important
exceptions. These include:
1. ptrace(2), used by debuggers and related tools. One can attach a
   debugger to processes in descendant namespaces, but not sibling
   namespaces. The practical effect of this is that (without --join),
   you can’t run a command with ch-run and then attach to it
   with a debugger also run with ch-run.
2. Cross-memory attach (CMA) is used by cooperating processes to
   communicate by simply reading and writing one another’s memory.
   This is also not permitted between sibling namespaces. This affects
   various MPI implementations that use CMA to pass messages between ranks on
   the same node, because it’s faster than traditional shared
   memory.
--join is designed to address this by placing related
ch-run commands (the “peer group”) in the same
container. This is done by one of the peers creating the namespaces with
unshare(2) and the others joining with setns(2).
To do so, we need to know the number of peers and a name for the
group. These are specified by additional arguments that can (hopefully) be
left at default values in most cases:
- --join-ct sets the number of peers. The default is the value of the
first of the following environment variables that is defined:
OMPI_COMM_WORLD_LOCAL_SIZE, SLURM_STEP_TASKS_PER_NODE,
SLURM_CPUS_ON_NODE.
- --join-tag sets the tag that names the peer group. The default is
environment variable SLURM_STEP_ID, if defined; otherwise, the PID
of ch-run’s parent. A tag can be re-used by peer groups that
start at different times: once all peer ch-run instances have replaced
themselves with the user command, the tag is available again.
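The default lookup described above can be sketched in shell. This is only an illustration of the documented order, not ch-run’s own code: the function names default_join_ct and default_join_tag are hypothetical, and the sketch does not parse or validate the values in any way.

```shell
#!/bin/sh
# Illustrative sketch only: mirrors the documented default lookup for
# --join-ct and --join-tag. The function names are hypothetical and are
# not part of Charliecloud.

# Print the value of the first defined variable, in the documented order.
# Return 1 if none of them is defined.
default_join_ct() {
    if [ -n "${OMPI_COMM_WORLD_LOCAL_SIZE+set}" ]; then
        printf '%s\n' "$OMPI_COMM_WORLD_LOCAL_SIZE"
    elif [ -n "${SLURM_STEP_TASKS_PER_NODE+set}" ]; then
        printf '%s\n' "$SLURM_STEP_TASKS_PER_NODE"
    elif [ -n "${SLURM_CPUS_ON_NODE+set}" ]; then
        printf '%s\n' "$SLURM_CPUS_ON_NODE"
    else
        return 1
    fi
}

# Print SLURM_STEP_ID if defined. Otherwise print the PID of this shell,
# which is the parent of a ch-run launched directly from it (assumption:
# no intermediate launcher sits between this shell and ch-run).
default_join_tag() {
    if [ -n "${SLURM_STEP_ID+set}" ]; then
        printf '%s\n' "$SLURM_STEP_ID"
    else
        printf '%s\n' "$$"
    fi
}
```

Calling default_join_ct outside a job launcher returns a non-zero status, which corresponds to the case where --join-ct has to be given explicitly.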
Caveats:
- One cannot currently add peers after the peer group has started, for
  example, if one decides later to start a debugger. (This is only required
  for code with bugs and is thus an unusual use case.)
- ch-run instances race. The winner of this race sets up the
namespaces, and the other peers use the winner to find the namespaces to
join. Therefore, if the user command of the winner exits, any remaining
peers will not be able to join the namespaces, even if they are still
active. There is currently no general way to specify which ch-run
should be the winner.
- If --join-ct is too high, the winning ch-run’s user
command exits before all peers join, or ch-run itself crashes, IPC
resources such as semaphores and shared memory segments will be leaked.
These appear as files in /dev/shm/ and can be removed with
rm(1).
- Many of the arguments given to the race losers, such as the image path and
--bind, will be ignored in favor of what was given to the
winner.
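Before removing leaked IPC resources, it may help to list candidates first. The following is a minimal sketch: the helper name list_my_shm is hypothetical, and because this page does not state how ch-run names its files, the sketch simply lists every regular file you own in the directory. Check the output by hand before passing anything to rm(1).

```shell
#!/bin/sh
# Illustrative sketch only; list_my_shm is a hypothetical helper, not part
# of Charliecloud. It lists regular files owned by the current user
# directly inside a directory (default: /dev/shm). It does not filter by
# name, so the result may include files unrelated to ch-run.
# Note: -maxdepth is not in POSIX but is supported by GNU, BSD, and
# BusyBox find.
list_my_shm() {
    dir=${1:-/dev/shm}
    find "$dir" -maxdepth 1 -type f -user "$(id -u)" -print
}
```

Running list_my_shm with no argument inspects /dev/shm; nothing is deleted.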
ch-run generally tries to leave environment variables
unchanged, but in some cases, guests can be significantly broken unless
environment variables are tweaked. This section lists those changes.
- $HOME: If the path to your home directory is not /home/$USER
on the host, then an inherited $HOME will be incorrect inside the
guest. This confuses some software, such as Spack.
Thus, we change $HOME to /home/$USER, unless
--no-home is specified, in which case it is left unchanged.
- $PATH: Newer Linux distributions replace some root-level
directories, such as /bin, with symlinks to their counterparts in
/usr.
Some of these distributions (e.g., Fedora 24) have also
dropped /bin from the default $PATH. This is a problem
when the guest OS does not have a merged /usr (e.g.,
Debian 8 “Jessie”). Thus, we add /bin to
$PATH if it’s not already present.
Further reading:
- The case for the /usr Merge
- Fedora
- Debian
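The two environment adjustments described above can be approximated in shell as follows. This is a sketch, not ch-run’s own code: the function names are hypothetical, and appending /bin (rather than prepending it) is an assumption that this page does not confirm.

```shell
#!/bin/sh
# Illustrative sketch only; both function names are hypothetical.

# Print a PATH value with /bin added if it is not already a component.
# Assumption: /bin goes at the end of the list.
path_with_bin() {
    case ":$1:" in
        *:/bin:*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "${1:+$1:}/bin" ;;
    esac
}

# Print the HOME value seen in the guest.
#   $1: host HOME
#   $2: user name
#   $3: the word "no-home" if --no-home was given (optional)
guest_home() {
    if [ "${3:-}" = "no-home" ]; then
        printf '%s\n' "$1"        # left unchanged
    else
        printf '%s\n' "/home/$2"  # matches the default bind mount
    fi
}
```

For example, path_with_bin "$PATH" shows what the guest’s $PATH would look like under these assumptions without starting a container.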
Run the command echo hello inside a Charliecloud container
using the unpacked image at /data/foo:
$ ch-run /data/foo -- echo hello
hello
Run an MPI job that can use CMA to communicate:
$ srun ch-run --join /data/foo -- bar
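Two further invocations, sketched under assumptions: /data/foo is the image used above, while /scratch/input, the tag demo, and the program bar are placeholders. The commands are wrapped in hypothetical shell functions so that nothing runs until one of them is called.

```shell
#!/bin/sh
# Illustrative sketch only; the function names, /scratch/input, the tag
# "demo", and the program "bar" are placeholders.

# Bind-mount a host directory at /mnt/0, start in it, and use a
# container-private /tmp.
example_bind() {
    ch-run --bind=/scratch/input:/mnt/0 --cd=/mnt/0 --private-tmp \
        /data/foo -- ls
}

# Start two peers in the same container without a job launcher. The peer
# count is given explicitly because none of the default count variables
# is set here; the tag is given explicitly for clarity.
example_join() {
    ch-run --join-ct=2 --join-tag=demo /data/foo -- bar &
    ch-run --join-ct=2 --join-tag=demo /data/foo -- bar &
    wait
}
```

Because --join-ct and --join-tag each imply --join, the second example does not need to pass --join itself.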
If Charliecloud was obtained from your Linux distribution, use
your distribution’s bug reporting procedures.
Otherwise, report bugs to:
<https://github.com/hpc/charliecloud/issues>
charliecloud(1)
Full documentation at:
<https://hpc.github.io/charliecloud>
Reid Priedhorsky, Tim Randles, and others
2014–2018, Los Alamos National Security, LLC