slurmd - The compute node daemon for Slurm.
slurmd is the compute node daemon of Slurm. It monitors all
tasks running on the compute node, accepts work (tasks), launches tasks,
and kills running tasks upon request.
- -b
- Report that the node rebooted when the daemon restarts. Used for testing
purposes.
-
- -c
- Clear system locks as needed. This may be required if slurmd
terminated abnormally.
-
- -C
- Print the actual hardware configuration (not the configuration from the
slurm.conf file) and exit. The output uses the same format as a node
definition in slurm.conf, plus the node's uptime.
-
- --conf <node parameters>
- Used in conjunction with the -Z option to override or define
additional parameters of a dynamic node, using the same syntax and
parameters used to define nodes in slurm.conf. Specifying any of
CPUs, Boards, SocketsPerBoard, CoresPerSocket
or ThreadsPerCore will override the defaults defined by the
-C option. NodeName and Port are not supported.
For example, if slurmd -C reports
NodeName=node1 CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=31848
the following --conf specifications will generate the
corresponding node definitions:
--conf "Gres=gpu:2"
NodeName=node1 CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=31848 Gres=gpu:2
--conf "RealMemory=30000"
NodeName=node1 CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=30000
--conf "CPUs=16"
NodeName=node1 CPUs=16 RealMemory=31848
--conf "CPUs=16 RealMemory=30000 Gres=gpu:2"
NodeName=node1 CPUs=16 RealMemory=30000 Gres=gpu:2
-
- --conf-server <host>[:<port>]
- Comma-separated list of controllers, the first being the primary
slurmctld. A port can (optionally) be specified for each controller. When
running in "configless" mode, slurmd fetches its configuration from these
hosts.
-
- -d <file>
- Specify the fully qualified pathname to the slurmstepd program to
be used for shepherding user job steps. This can be useful for testing
purposes.
-
- -D
- Run slurmd in the foreground. Error and debug messages will be copied to
stderr.
-
- -f <file>
- Read configuration from the specified file. See NOTES below.
-
- -F[feature]
- Start this node as a Dynamic Future node. It will try to match a node
definition with a state of FUTURE, optionally using the specified
feature to match the node definition.
-
- -G
- Print Generic RESource (GRES) configuration (based upon slurm.conf GRES
merged with gres.conf contents for this node) and exit.
-
- -h
- Help; print a brief summary of command options.
-
- -L <file>
- Write log messages to the specified file.
-
- -M
- Lock slurmd pages into system memory using mlockall(2) to disable paging
of the slurmd process. This may help in cases where nodes are marked DOWN
during periods of heavy swap activity. If the mlockall(2) system call is
not available, an error will be printed to the log and slurmd will
continue as normal.
Setting LaunchParameters=slurmstepd_memlock in slurm.conf(5) is also
recommended when using -M.
-
- -n <value>
- Set the daemon's nice value to the specified value, typically a negative
number. See also the PropagatePrioProcess configuration parameter in
slurm.conf(5).
-
- -N <nodename>
- Run the daemon with the given nodename. Used to emulate a larger system
with more than one slurmd daemon per node. Requires that Slurm be built
using the --enable-multiple-slurmd configure option.
-
- -s
- Change the working directory of slurmd to the directory containing
SlurmdLogFile if possible, or to SlurmdSpoolDir otherwise. If both fail,
it falls back to /var/tmp.
-
- -v
- Verbose operation. Multiple -v's increase verbosity.
-
- -V, --version
- Print version information and exit.
-
- -Z
- Start this node as a Dynamic Normal node. If --conf is not
specified, slurmd registers with the hardware configuration
reported by the -C option.
-
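To make the --conf override rule concrete, the Python sketch below follows the example node definitions given for --conf. It illustrates the documented behavior only and is not slurmd's implementation: the helper names (parse_node_line, merge_dynamic_node) are hypothetical, values are assumed to contain no spaces, and it assumes that naming any topology key in --conf discards all topology values detected by -C, as the "CPUs=16" example suggests.

```python
from typing import Dict

# Keys whose -C defaults are overridden as a group when any of them is given.
TOPOLOGY_KEYS = ("CPUs", "Boards", "SocketsPerBoard", "CoresPerSocket", "ThreadsPerCore")
# Keys the man page says --conf does not support.
UNSUPPORTED_KEYS = ("NodeName", "Port")


def parse_node_line(line: str) -> Dict[str, str]:
    """Split 'Key=Value Key=Value ...' into an insertion-ordered dict."""
    return dict(item.split("=", 1) for item in line.split())


def merge_dynamic_node(detected: str, conf: str) -> str:
    """Combine a `slurmd -C` node line with a --conf string (hypothetical helper)."""
    base = parse_node_line(detected)
    overrides = parse_node_line(conf)
    for key in UNSUPPORTED_KEYS:
        if key in overrides:
            raise ValueError(f"{key} is not supported in --conf")

    result = {"NodeName": base["NodeName"]}
    # Assumption: if --conf names any topology key, only the topology keys it
    # gives are kept; otherwise the detected topology is used unchanged.
    topo_source = overrides if any(k in overrides for k in TOPOLOGY_KEYS) else base
    for key in TOPOLOGY_KEYS:
        if key in topo_source:
            result[key] = topo_source[key]
    # Other detected parameters (e.g. RealMemory), replaced where --conf sets them.
    for key, value in base.items():
        if key not in result and key not in TOPOLOGY_KEYS:
            result[key] = overrides.get(key, value)
    # Parameters that -C does not report (e.g. Gres) are appended at the end.
    for key, value in overrides.items():
        if key not in result:
            result[key] = value
    return " ".join(f"{k}={v}" for k, v in result.items())
```

With the detected line from the example, merge_dynamic_node(detected, "CPUs=16") returns "NodeName=node1 CPUs=16 RealMemory=31848", matching the definition listed for that --conf value.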
The following environment variables can be used to override
settings compiled into slurmd.
- SLURM_CONF
- The location of the Slurm configuration file. This is overridden by
explicitly naming a configuration file on the command line.
-
- SLURM_DEBUG_FLAGS
- Specify debug flags for slurmd to use. See DebugFlags in the
slurm.conf(5) man page for a full list of flags. The environment
variable takes precedence over the DebugFlags setting in slurm.conf.
-
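The precedence rules for these two variables can be sketched in Python as follows. The function names and the compiled-in default path are hypothetical (the real default depends on how Slurm was configured at build time), and the sketch models only the precedence described above, not how slurmd actually reads its settings.

```python
import os
from typing import Mapping, Optional

# Hypothetical compiled-in default; the real path is fixed when Slurm is built.
COMPILED_DEFAULT_CONF = "/etc/slurm/slurm.conf"


def resolve_conf_path(cli_conf: Optional[str],
                      env: Mapping[str, str] = os.environ,
                      default: str = COMPILED_DEFAULT_CONF) -> str:
    """-f <file> beats SLURM_CONF, which beats the compiled-in default."""
    if cli_conf:
        return cli_conf
    return env.get("SLURM_CONF", default)


def resolve_debug_flags(conf_debug_flags: Optional[str],
                        env: Mapping[str, str] = os.environ) -> Optional[str]:
    """SLURM_DEBUG_FLAGS, when set, takes precedence over DebugFlags in slurm.conf."""
    return env.get("SLURM_DEBUG_FLAGS", conf_debug_flags)
```

Passing the environment as a mapping keeps the precedence logic testable without modifying the real process environment.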
- SIGTERM
SIGINT
- slurmd will shut down cleanly, waiting for in-progress rollups to
finish.
-
- SIGHUP
- Reload the Slurm configuration files, similar to 'scontrol
reconfigure'.
-
- SIGUSR2
- Reread the log level from the configs, and then reopen the log file. This
should be used when setting up logrotate(8).
-
- SIGPIPE
- This signal is explicitly ignored.
-
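As an illustration of how an administrator might send these signals, for example from a logrotate(8) postrotate step, the Python sketch below reads the slurmd PID from a PID file and sends SIGHUP or SIGUSR2. The helper names are hypothetical, and the PID file path is an assumption supplied by the caller; its actual location is set by SlurmdPidFile in slurm.conf(5).

```python
import os
import signal


def signal_slurmd(pid_file: str, sig: signal.Signals) -> int:
    """Send sig to the process whose PID is stored in pid_file; return that PID."""
    with open(pid_file) as f:
        pid = int(f.read().strip())
    os.kill(pid, sig)
    return pid


def reconfigure(pid_file: str) -> int:
    # SIGHUP: reload the Slurm configuration files, like 'scontrol reconfigure'.
    return signal_slurmd(pid_file, signal.SIGHUP)


def reopen_logs(pid_file: str) -> int:
    # SIGUSR2: reread the log level and reopen the log file (e.g. after rotation).
    return signal_slurmd(pid_file, signal.SIGUSR2)
```

Only SIGHUP and SIGUSR2 are wrapped here because they change slurmd's behavior without stopping it; SIGTERM or SIGINT would shut the daemon down.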
If slurmd is started with the -D option, the core file
will be written to the current working directory. Otherwise, if
SlurmdLogFile is a fully qualified path name (starting with a slash),
the core file will be written to the same directory as the log file.
Otherwise, the core file will be written to the SlurmdSpoolDir
directory, or to "/var/tmp/" as a last resort. If none of these
directories can be written, no core file will be produced.
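The search order above can be expressed as a short Python function. This is a sketch of one reading of the text, assuming that each location falls through to the next one when it is not writable; the function name and its is_writable parameter are hypothetical, and slurmd's own logic may differ in details.

```python
import os
from typing import Callable, Optional


def core_file_dir(
    foreground: bool,
    cwd: str,
    slurmd_log_file: Optional[str],
    slurmd_spool_dir: Optional[str],
    is_writable: Callable[[str], bool] = lambda d: os.access(d, os.W_OK),
) -> Optional[str]:
    """Return the directory a slurmd core file would land in, or None."""
    candidates = []
    if foreground:
        # Started with -D: the current working directory comes first.
        candidates.append(cwd)
    if slurmd_log_file and os.path.isabs(slurmd_log_file):
        # Fully qualified SlurmdLogFile: use the log file's directory.
        candidates.append(os.path.dirname(slurmd_log_file))
    if slurmd_spool_dir:
        candidates.append(slurmd_spool_dir)
    candidates.append("/var/tmp")  # last resort
    # Assumption: an unwritable candidate falls through to the next one.
    for directory in candidates:
        if is_writable(directory):
            return directory
    return None  # no core file is produced
```

Injecting is_writable lets the ordering be checked without depending on the permissions of the machine running the sketch.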
It may be useful to experiment with different slurmd-specific
configuration parameters (e.g. timeouts) using a distinct configuration
file. However, this special configuration file will not be used by the
slurmctld daemon or the other Slurm programs unless you explicitly tell
each of them to use it. If you want to change communication ports, the
location of the temporary file system, or other parameters used by other
Slurm components, change the common configuration file,
slurm.conf.
If you are using configless mode with a login node that runs many
client commands, consider running slurmd on that machine
so it can manage a cached copy of the configuration files. Otherwise,
each client command will use the DNS record to contact the controller and
fetch the configuration, which can place additional load on the
controller.
Copyright (C) 2002-2007 The Regents of the University of
California. Copyright (C) 2008-2010 Lawrence Livermore National Security.
Copyright (C) 2010-2022 SchedMD LLC. Produced at Lawrence Livermore National
Laboratory (cf, DISCLAIMER).
This file is part of Slurm, a resource management program. For
details, see <https://slurm.schedmd.com/>.
Slurm is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option)
any later version.
Slurm is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.