smap - graphically view information about Slurm jobs, partitions,
and set configuration parameters.
smap is used to graphically view job, partition and node
information for a system running Slurm. Note that information about nodes
and partitions to which you lack access will always be displayed to avoid
obvious gaps in the output. This is equivalent to the --all option of
the sinfo and squeue commands.
- -c,
--commandline
- Print output to the command line without using curses.
- -D <option>,
--display=<option>
- Sets the display mode for smap, showing relevant information about the
selected view and displaying a corresponding node chart. Note that
unallocated nodes are indicated by a '.' and nodes in the DOWN, DRAINED or
FAIL state by a '#'. When the --iterate=<seconds> option is
also selected, you can switch displays by typing a different letter from
the list below.
- j
- Displays information about jobs running on the system.
- r
- Displays information about advanced reservations. While all current and
future reservations will be listed, only currently active reservations
will appear on the node map.
- s
- Displays information about Slurm partitions on the system.
- -h,
--noheader
- Do not print a header on the output.
- -H,
--show_hidden
- Display hidden partitions and their jobs.
- --help
- Print a message describing all smap options.
- -i <seconds>,
--iterate=<seconds>
- Print the state on a periodic basis. Sleep for the indicated number of
seconds between reports. The user can exit at any time by typing 'q' or
pressing the return key. In configure mode, type 'exit' to exit the
program or 'quit' to exit configure mode.
- -M,
--clusters=<string>
- Clusters to issue commands to. Note that the SlurmDBD must be up for this
option to work properly.
- -n, --nodes
- Only show objects with these nodes.
- -Q, --quiet
- Avoid printing error messages.
- --usage
- Print a brief message listing the smap options.
- -V,
--version
- Print version information and exit.
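As an illustration of how the options above combine, the following Python
sketch assembles an smap argument list. It only builds the list and does not
run smap. The function name build_smap_argv and the rule that the iterate
value must be positive are assumptions made for this example, not part of
smap itself.

```python
from typing import List, Optional

# Display letters documented for the -D/--display option.
DISPLAY_MODES = {"j", "r", "s"}


def build_smap_argv(display: Optional[str] = None,
                    iterate: Optional[int] = None,
                    commandline: bool = False,
                    noheader: bool = False) -> List[str]:
    """Assemble an smap argument list (hypothetical helper)."""
    argv = ["smap"]
    if commandline:
        argv.append("--commandline")
    if noheader:
        argv.append("--noheader")
    if display is not None:
        if display not in DISPLAY_MODES:
            raise ValueError(f"unknown display mode: {display!r}")
        argv.append(f"--display={display}")
    if iterate is not None:
        # Assumption for this sketch: a refresh interval must be positive.
        if iterate <= 0:
            raise ValueError("iterate must be a positive number of seconds")
        argv.append(f"--iterate={iterate}")
    return argv


if __name__ == "__main__":
    # Job view, refreshed every 5 seconds.
    print(build_smap_argv(display="j", iterate=5))
```

The resulting list could be handed to subprocess.run() on a system where
smap is installed.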
When using smap in curses mode and when the
--iterate=<seconds> option is also selected, you can scroll
through the different windows using the arrow keys. The up and
down arrow keys scroll the window containing the grid, and the
left and right arrow keys scroll the window containing the
text information.
With the iterate option selected, you can use any of the options
available to the -D option listed above (except 'c') to change
screens. You can also hide or make visible hidden partitions by pressing 'h'
at any moment.
- ACCESS_CONTROL
- Identifies the users or bank accounts which can use this advanced
reservation. A prefix of "A:" indicates that the following
account names may use this reservation. A prefix of "U:"
indicates that the following user names may use this reservation.
- AVAIL
- Partition state: up or down.
- END_TIME
- The time when an advanced reservation ended.
- ID
- Key to identify the nodes associated with this entity in the node
chart.
- NAME
- Name of the job or advanced reservation.
- NODELIST
- Names of nodes associated with this configuration, partition or
reservation.
- NODES
- Count of nodes with this particular configuration.
- PARTITION
- Name of a partition. Note that the suffix "*" identifies the
default partition.
- ST
- State of a job in compact form. Possible states include: PD (pending), R
(running), S (suspended), CD (completed), CF (configuring), CG
(completing), F (failed), TO (timeout), and NF (node failure). See the JOB
STATE CODES section below for more information.
- START_TIME
- The time when an advanced reservation started.
- STATE
- State of the nodes. Possible states include: allocated, completing, down,
drained, draining, fail, failing, idle, and unknown plus their abbreviated
forms: alloc, comp, down, drain, drng, fail, failg, idle, and unk
respectively. Note that the suffix "*" identifies nodes that are
presently not responding. See the NODE STATE CODES section below for
more information.
- TIMELIMIT
- Maximum time limit for any user job in days-hours:minutes:seconds.
The value infinite identifies jobs or partitions without a job time
limit.
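Scripts that post-process the TIMELIMIT field may want it as a number of
seconds. The Python sketch below converts the days-hours:minutes:seconds
form described above and maps infinite to None. The function name
parse_timelimit is hypothetical, and accepting a value without the leading
"days-" part is an assumption of this example.

```python
from typing import Optional


def parse_timelimit(text: str) -> Optional[int]:
    """Convert a TIMELIMIT value to seconds; return None for 'infinite'."""
    text = text.strip()
    if text.lower() == "infinite":
        return None
    days = 0
    if "-" in text:
        # Split off the day count that precedes the dash.
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected hours:minutes:seconds, got {text!r}")
    hours, minutes, seconds = (int(p) for p in parts)
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


if __name__ == "__main__":
    print(parse_timelimit("1-02:03:04"))
    print(parse_timelimit("infinite"))
```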
The node chart is designed to indicate relative locations of the
nodes. On most Linux clusters this will represent a one-dimensional array of
nodes. Larger clusters will utilize multiple lines as needed, with the right
side of one line being logically followed by the left side of the next line.
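The wrapping described above can be pictured with a small Python sketch. It
is not smap's rendering code: the function render_node_chart, its arguments
and the chart width are hypothetical. It follows the documented conventions
that unallocated nodes appear as '.', nodes in the DOWN, DRAINED or FAIL
state appear as '#', and other nodes show the ID of the entity using them.

```python
from typing import Dict, List

# States drawn as '#' in the node chart, per the --display description.
UNAVAILABLE = {"DOWN", "DRAINED", "FAIL"}


def render_node_chart(states: List[str], owners: Dict[int, str],
                      width: int) -> List[str]:
    """Lay out a one-dimensional node array as wrapped chart lines."""
    if width <= 0:
        raise ValueError("width must be positive")
    symbols = []
    for index, state in enumerate(states):
        if state.upper() in UNAVAILABLE:
            symbols.append("#")
        else:
            # Nodes without an owning entity are unallocated.
            symbols.append(owners.get(index, "."))
    # The right side of one line is followed by the left side of the next.
    return ["".join(symbols[i:i + width])
            for i in range(0, len(symbols), width)]


if __name__ == "__main__":
    states = ["IDLE", "ALLOCATED", "ALLOCATED", "DOWN", "IDLE", "ALLOCATED"]
    owners = {1: "A", 2: "A", 5: "B"}
    for line in render_node_chart(states, owners, width=4):
        print(line)
```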
Node state codes are shortened as required for the field size.
These node states may be followed by a special character to identify state
flags associated with the node. The following node suffixes and states are
used:
- *
- The node is presently not responding and will not be allocated any new
work. If the node remains non-responsive, it will be placed in the
DOWN state (except in the case of COMPLETING,
DRAINED, DRAINING, FAIL, FAILING nodes).
- ~
- The node is presently in a power saving mode (typically running at reduced
frequency).
- #
- The node is presently being powered up or configured.
- $
- The node is currently in a reservation with a flag value of
"maintenance".
- @
- The node is pending reboot.
- ALLOCATED
- The node has been allocated to one or more jobs.
- ALLOCATED+
- The node is allocated to one or more active jobs plus one or more jobs are
in the process of COMPLETING.
- COMPLETING
- All jobs associated with this node are in the process of COMPLETING. This
node state will be removed when all of the job's processes have terminated
and the Slurm epilog program (if any) has terminated. See the
Epilog parameter description in the slurm.conf man page for
more information.
- DOWN
- The node is unavailable for use. Slurm can automatically place nodes in
this state if some failure occurs. System administrators may also
explicitly place nodes in this state. If a node resumes normal operation,
Slurm can automatically return it to service. See the
ReturnToService and SlurmdTimeout parameter descriptions in
the slurm.conf(5) man page for more information.
- DRAINED
- The node is unavailable for use per system administrator request. See the
update node command in the scontrol(1) man page or the
slurm.conf(5) man page for more information.
- DRAINING
- The node is currently executing a job, but will not be allocated to
additional jobs. The node state will be changed to state DRAINED
when the last job on it completes. Nodes enter this state per system
administrator request. See the update node command in the
scontrol(1) man page or the slurm.conf(5) man page for more
information.
- FAIL
- The node is expected to fail soon and is unavailable for use per system
administrator request. See the update node command in the
scontrol(1) man page or the slurm.conf(5) man page for more
information.
- FAILING
- The node is currently executing a job, but is expected to fail soon and is
unavailable for use per system administrator request. See the update
node command in the scontrol(1) man page or the
slurm.conf(5) man page for more information.
- IDLE
- The node is not allocated to any jobs and is available for use.
- MAINT
- The node is currently in a reservation with a flag value of
"maintenance".
- REBOOT
- The node is currently scheduled to be rebooted.
- UNKNOWN
- The Slurm controller has just started and the node's state has not yet
been determined.
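When parsing the STATE column in a script, it can help to separate the base
state from its suffix flag. The Python sketch below does this for the
suffixes and abbreviations listed in this page. The function name
split_node_state is hypothetical, and the sketch assumes at most one
trailing flag character. States truncated in other ways to fit the field
are returned unchanged.

```python
from typing import Optional, Tuple

# Suffix characters and their meanings, as listed above.
SUFFIX_FLAGS = {
    "*": "not responding",
    "~": "power saving mode",
    "#": "powering up or being configured",
    "$": "in a maintenance reservation",
    "@": "pending reboot",
}

# Abbreviated forms from the STATE field description.
ABBREVIATIONS = {
    "alloc": "allocated", "comp": "completing", "down": "down",
    "drain": "drained", "drng": "draining", "fail": "fail",
    "failg": "failing", "idle": "idle", "unk": "unknown",
}


def split_node_state(field: str) -> Tuple[str, Optional[str]]:
    """Return (state, flag description) for one STATE value."""
    field = field.strip()
    flag = None
    if field and field[-1] in SUFFIX_FLAGS:
        flag = SUFFIX_FLAGS[field[-1]]
        field = field[:-1]
    state = field.lower()
    # Expand a known abbreviation; leave anything else as it is.
    return ABBREVIATIONS.get(state, state), flag


if __name__ == "__main__":
    print(split_node_state("idle*"))
    print(split_node_state("drng"))
```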
Jobs typically pass through several states in the course of their
execution. The typical states are PENDING, RUNNING,
SUSPENDED, COMPLETING, and COMPLETED. An explanation of
each state follows.
- BF BOOT_FAIL
- Job terminated due to launch failure, typically due to a hardware failure
(e.g. unable to boot the node or block and the job can not be
requeued).
- CA CANCELLED
- Job was explicitly cancelled by the user or system administrator. The job
may or may not have been initiated.
- CD COMPLETED
- Job has terminated all processes on all nodes with an exit code of
zero.
- CG COMPLETING
- Job is in the process of completing. Some processes on some nodes may
still be active.
- CF CONFIGURING
- Job has been allocated resources, but is waiting for them to become ready
for use (e.g. booting).
- F FAILED
- Job terminated with non-zero exit code or other failure condition.
- NF NODE_FAIL
- Job terminated due to failure of one or more allocated nodes.
- PD PENDING
- Job is awaiting resource allocation.
- PR PREEMPTED
- Job terminated due to preemption.
- RV REVOKED
- Sibling job (in federation) revoked.
- R RUNNING
- Job currently has an allocation.
- SI SIGNALING
- Signal of job currently in progress.
- SO STAGE_OUT
- Staging out data after job completion.
- SE SPECIAL_EXIT
- The job was requeued in a special state. This state can be set by users,
typically in EpilogSlurmctld, if the job has terminated with a particular
exit value.
- ST STOPPED
- Job has an allocation, but execution has been stopped with a SIGSTOP
signal. CPUs have been retained by this job.
- S SUSPENDED
- Job has an allocation, but execution has been suspended and CPUs have been
released for other jobs.
- TO TIMEOUT
- Job terminated upon reaching its time limit.
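For scripts that read the compact ST column, the codes above can be kept in
a lookup table. The Python sketch below maps each compact code to its full
state name as listed in this section. The helper name expand_job_state is
hypothetical.

```python
# Compact job state codes and their full names, as listed above.
JOB_STATE_CODES = {
    "BF": "BOOT_FAIL", "CA": "CANCELLED", "CD": "COMPLETED",
    "CG": "COMPLETING", "CF": "CONFIGURING", "F": "FAILED",
    "NF": "NODE_FAIL", "PD": "PENDING", "PR": "PREEMPTED",
    "RV": "REVOKED", "R": "RUNNING", "SI": "SIGNALING",
    "SO": "STAGE_OUT", "SE": "SPECIAL_EXIT", "ST": "STOPPED",
    "S": "SUSPENDED", "TO": "TIMEOUT",
}


def expand_job_state(code: str) -> str:
    """Return the full state name for a compact job state code."""
    key = code.strip().upper()
    if key not in JOB_STATE_CODES:
        raise ValueError(f"unknown job state code: {code!r}")
    return JOB_STATE_CODES[key]


if __name__ == "__main__":
    print(expand_job_state("PD"))
    print(expand_job_state("CG"))
```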
The following environment variables can be used to override
settings compiled into smap.
- SLURM_CONF
- The location of the Slurm configuration file.
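A wrapper script might set SLURM_CONF for a single invocation rather than
for the whole session. The Python sketch below prepares such an environment
without modifying the caller's own. The function name smap_environment and
the example path are hypothetical.

```python
import os
from typing import Dict, Optional


def smap_environment(conf_path: Optional[str] = None) -> Dict[str, str]:
    """Copy the current environment, optionally overriding SLURM_CONF."""
    env = dict(os.environ)
    if conf_path is not None:
        env["SLURM_CONF"] = conf_path
    return env


if __name__ == "__main__":
    # Hypothetical configuration file location, for illustration only.
    env = smap_environment("/tmp/example-slurm.conf")
    print(env["SLURM_CONF"])
```

The returned dictionary can be passed as the env argument of
subprocess.run() when launching smap.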
Copyright (C) 2004-2007 The Regents of the University of
California. Produced at Lawrence Livermore National Laboratory (cf,
DISCLAIMER).
Copyright (C) 2008-2009 Lawrence Livermore National Security.
Copyright (C) 2010-2013 SchedMD LLC.
This file is part of Slurm, a resource management program. For
details, see <https://slurm.schedmd.com/>.
Slurm is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option)
any later version.
Slurm is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
scontrol(1), sinfo(1), squeue(1),
slurm_load_ctl_conf(3), slurm_load_jobs(3),
slurm_load_node(3), slurm_load_partitions(3),
slurm_reconfigure(3), slurm_shutdown(3),
slurm_update_job(3), slurm_update_node(3),
slurm_update_partition(3), slurm.conf(5)