irqbalance - distribute hardware interrupts across processors on a
multiprocessor system
The purpose of irqbalance is to distribute hardware
interrupts across processors on a multiprocessor system in order to increase
performance.
- -o, --oneshot
- Causes irqbalance to be run once, after which the daemon exits.
- -d, --debug
- Causes irqbalance to print extra debug information. Implies --foreground.
- -f, --foreground
- Causes irqbalance to run in the foreground (without --debug).
- -j, --journal
- Enables log output optimized for systemd-journal.
- -p, --powerthresh=<threshold>
- Set the threshold at which we attempt to move a CPU into powersave mode. If
more than <threshold> CPUs are more than 1 standard deviation below
the average CPU softirq workload, and no CPUs are more than 1 standard
deviation above (and have more than 1 IRQ assigned to them), attempt to
place 1 CPU in powersave mode. In powersave mode, a CPU will not have any
IRQs balanced to it, in an effort to prevent that CPU from waking up
without need.
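The powersave rule above can be sketched in Python as a small predicate. This is only an illustration of the rule as worded here, not irqbalance's code: the function name, its arguments and the use of the population standard deviation are all assumptions.

```python
import statistics


def should_enter_powersave(loads, irq_counts, threshold):
    """Hypothetical helper: apply the --powerthresh rule to per-CPU data.

    loads       -- per-CPU softirq workload samples (any consistent unit)
    irq_counts  -- number of IRQs currently assigned to each CPU
    threshold   -- the <threshold> given to --powerthresh
    """
    mean = statistics.mean(loads)
    # Assumption: population standard deviation; the text does not say which.
    dev = statistics.pstdev(loads)

    # CPUs more than 1 standard deviation below the average workload.
    underloaded = sum(1 for load in loads if load < mean - dev)

    # CPUs more than 1 standard deviation above the average that also
    # have more than 1 IRQ assigned to them.
    overloaded = any(
        load > mean + dev and count > 1
        for load, count in zip(loads, irq_counts)
    )

    return underloaded > threshold and not overloaded
```

With two idle CPUs out of six and no overloaded CPU, the predicate holds for a threshold of 1 but not for a threshold of 2, since the count of underloaded CPUs must exceed the threshold.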
- -i, --banirq=<irqnum>
- Add the specified IRQ to the set of banned IRQs. irqbalance will not
affect the affinity of any IRQs on the banned list, allowing them to be
specified manually. This option is additive and can be specified multiple
times. For example, to ban IRQs 43 and 44 from balancing, use the following
command line: irqbalance --banirq=43 --banirq=44
- -m, --banmod=<module_name>
- Add the specified module to the set of banned modules, similar to
--banirq. irqbalance will not affect the affinity of any IRQs of the given
modules, allowing them to be specified manually. This option is additive
and can be specified multiple times. For example, to ban all IRQs of module
foo and module bar from balancing, use the following command line:
irqbalance --banmod=foo --banmod=bar
- -c, --deepestcache=<integer>
- This allows a user to specify the cache level at which irqbalance
partitions cache domains. Specifying a deeper cache may allow a greater
degree of flexibility for irqbalance to assign IRQ affinity to achieve
greater performance increases, but setting a cache depth too large on some
systems (specifically where all CPUs on a system share the deepest cache
level) will cause irqbalance to see balancing as unnecessary. For example:
irqbalance --deepestcache=2
The default value for deepestcache is 2.
- -l, --policyscript=<script>
- When specified, the referenced script or directory will execute once for
each discovered IRQ, with the sysfs device path and IRQ number passed as
arguments. Note that the device path argument will point to the parent
directory from which the IRQ attributes directory may be directly opened.
Policy scripts must be owned and executable by the user running the
irqbalance process. If a directory is specified, non-executable files will
be skipped. The script may specify zero or more key=value pairs that will
guide irqbalance in the management of that IRQ. Key=value pairs are
printed by the script on stdout and will be captured and interpreted by
irqbalance. Irqbalance expects a zero exit code from the provided utility.
Recognized key=value pairs are:
- ban=[true | false]
- Directs irqbalance to exclude the passed in IRQ from balancing.
- balance_level=[none | package | cache | core]
- This allows a user to override the balance level of a given IRQ. By
default the balance level is determined automatically based on the PCI
device class of the device that owns the IRQ.
- numa_node=<integer>
- This allows a user to override the NUMA node that sysfs indicates a given
device IRQ is local to. Often, systems will not specify this information
in ACPI, and as a result devices are considered equidistant from all NUMA
nodes in a system. This option allows for that hardware provided
information to be overridden, so that irqbalance can bias IRQ affinity for
these devices toward its most local node. Note that specifying a -1 here
forces irqbalance to consider an interrupt from a device to be equidistant
from all nodes.
- Note that, if a directory is specified rather than a regular file, all files
in the directory will be considered policy scripts and executed when an IRQ
is added to the database. If such a directory is specified, scripts in the
directory must additionally exit with one of the following exit codes:
- 0
- This indicates the script has a policy for the referenced IRQ, and that
further script processing should stop.
- 1
- This indicates that the script has no policy for the referenced IRQ, and
that script processing should continue.
- 2
- This indicates that an error has occurred in the script, and it should be
skipped (further processing continues).
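A policy script for use in a policy directory might look like the following minimal sketch. It is written in Python for illustration; any executable that prints key=value pairs on stdout and uses the exit codes above should serve. The names BANNED_IRQS, LOCAL_NODE_BY_PATH and policy_for, and the IRQ numbers and sysfs path prefix, are hypothetical and not part of irqbalance.

```python
#!/usr/bin/env python3
"""Hypothetical irqbalance policy script, meant to live in a policy directory."""
import sys

# Illustrative values only; adjust for the actual system.
BANNED_IRQS = {43, 44}
LOCAL_NODE_BY_PATH = {"/sys/devices/pci0000:00": 0}  # path prefix -> NUMA node


def policy_for(device_path, irq):
    """Return (lines_to_print, exit_code) for one discovered IRQ."""
    if irq in BANNED_IRQS:
        return ["ban=true"], 0              # policy found, stop processing
    for prefix, node in LOCAL_NODE_BY_PATH.items():
        if device_path.startswith(prefix):
            return ["numa_node=%d" % node, "balance_level=core"], 0
    return [], 1                            # no policy, try the next script


def main(argv):
    """argv[1] is the sysfs device path, argv[2] is the IRQ number."""
    try:
        device_path, irq = argv[1], int(argv[2])
    except (IndexError, ValueError):
        return 2                            # error, skip this script
    lines, code = policy_for(device_path, irq)
    for line in lines:
        print(line)                         # key=value pairs go to stdout
    return code


# Only act when called with arguments, as irqbalance would call it.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

When a single script is given instead of a directory, irqbalance expects a zero exit code, so the "no policy" branch would need to return 0 with no output in that case.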
- --migrateval, -e <val>
- Specify a minimum migration ratio to trigger a rebalancing. Normally any
improvement in load distribution will trigger the migration of an IRQ, as
long as performing the migration will not simply move the load to a new
CPU. By specifying a migration value, the load balance improvement is
subject to a hysteresis threshold that is inversely proportional to this
value. For example, a value of 2 in this option tells
irqbalance that the improvement in load distribution must be at least 50%,
a value of 4 indicates the load distribution improvement must be at least
25%, etc.
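The relationship between the migration value and the required improvement can be written out as a short calculation. The formula 1/<val> is extrapolated from the two examples given (2 gives 50%, 4 gives 25%); the function name is hypothetical and the exact comparison irqbalance performs internally is not shown here.

```python
def min_improvement(migrateval):
    """Smallest fractional load-distribution improvement that would
    trigger a migration, assuming the threshold is exactly 1/migrateval."""
    if migrateval <= 0:
        raise ValueError("migrateval must be positive")
    return 1.0 / migrateval


for val in (2, 4, 10):
    print("migrateval=%d -> at least %.0f%% improvement"
          % (val, 100 * min_improvement(val)))
```

Larger values therefore make irqbalance more willing to migrate IRQs, and smaller values make it more conservative.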
- -s, --pid=<file>
- Have irqbalance write its process ID to the specified file. By default no
pidfile is written. The written pidfile is automatically unlinked when
irqbalance exits. This option is ignored when used with --debug or
--foreground.
- -t, --interval=<time>
- Set the measurement time for irqbalance. irqbalance will sleep for
<time> seconds between samples of the irq load on the system cpus.
Defaults to 10.
- IRQBALANCE_ONESHOT
- Same as --oneshot.
- IRQBALANCE_DEBUG
- Same as --debug.
- IRQBALANCE_BANNED_CPUS
- Provides a mask of CPUs which irqbalance should ignore and never assign
interrupts to. If not specified, irqbalance uses the mask of isolated and
adaptive-ticks CPUs on the system as the default value. The
"isolcpus=" boot parameter specifies the isolated CPUs. The
"nohz_full=" boot parameter specifies the adaptive-ticks CPUs.
By default, no CPU will be an isolated or adaptive-ticks CPU. The value is a
hexmask without the leading ’0x’. On systems with large
numbers of processors, each group of eight hex digits is separated by a
comma ’,’. For example, ‘export
IRQBALANCE_BANNED_CPUS=fc0’ would prevent irqbalance from assigning
IRQs to the 7th-12th CPUs (cpu6-cpu11), and ‘export
IRQBALANCE_BANNED_CPUS=ff000000,00000001’ would prevent irqbalance
from assigning IRQs to the 1st (cpu0) and 57th-64th CPUs (cpu56-cpu63).
Note: This environment variable is deprecated and will be removed after a
deprecation period kept for compatibility. Please use
IRQBALANCE_BANNED_CPULIST instead.
- IRQBALANCE_BANNED_CPULIST
- Provides a cpulist which irqbalance should ignore and never assign
interrupts to. If not specified, irqbalance uses the mask of isolated and
adaptive-ticks CPUs on the system as the default value.
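The hexmask format used by IRQBALANCE_BANNED_CPUS (no leading 0x, comma-separated groups of eight hex digits on large systems) can be computed from CPU numbers with a few lines of Python. The helper names below are hypothetical; the sketch only reproduces the format described for that variable.

```python
def cpus_to_hexmask(cpus):
    """Build a hexmask string from an iterable of CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu                    # bit N stands for cpuN
    digits = "%x" % mask
    if len(digits) <= 8:
        return digits
    # Pad to a multiple of eight digits, then split into comma-separated groups.
    width = -(-len(digits) // 8) * 8
    digits = digits.zfill(width)
    return ",".join(digits[i:i + 8] for i in range(0, width, 8))


def hexmask_to_cpus(mask):
    """Inverse helper: list the CPU numbers set in a hexmask string."""
    value = int(mask.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


print(cpus_to_hexmask(range(6, 12)))              # cpu6-cpu11
print(cpus_to_hexmask([0] + list(range(56, 64)))) # cpu0 and cpu56-cpu63
```

The two calls reproduce the masks fc0 and ff000000,00000001 used as examples for IRQBALANCE_BANNED_CPUS.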
- SIGHUP
- Forces a rescan of the available IRQs and system topology.
irqbalance is able to communicate via socket and return its
current assignment tree and setup, as well as set new settings based on sent
values. The socket is abstract, with a name of the form
irqbalance<PID>.sock, where <PID> is the process ID of the
irqbalance instance to communicate with. Possible values to send:
- stats
- Retrieve the assignment tree of IRQs to CPUs, in a recursive manner. For each
CPU node in the tree, its type, number, load and whether powersave mode is
active are sent. For each assigned IRQ, its type, number, load, number of
IRQs since last rebalancing and its class are sent. Refer to the types.h file
for an explanation of the defines.
- setup
- Get the current sleep interval, the mask of banned CPUs and the list of
banned IRQs.
- settings sleep <s>
- Set new value of sleep interval, <s> >= 1.
- settings cpus <cpu_number1> <cpu_number2> ...
- Ban the listed CPUs from IRQ handling. All previously banned CPUs are
forgotten.
- settings ban irqs <irq1> <irq2> ...
- Ban the listed IRQs from being balanced. All previously banned IRQs are
forgotten.
irqbalance checks the SCM_CREDENTIALS of the sender (only the root user is
allowed to interact). Depending on the tool used, an ancillary message with
the credentials needs to be sent with the request.
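A client for this socket could be sketched in Python as follows. This is a rough illustration under several assumptions: that the socket is a stream socket, that the abstract name is exactly irqbalance<PID>.sock with no padding, and that a single recv() returns the whole reply. The function names are hypothetical, and the request must be made as root.

```python
import os
import socket
import struct

# Linux value of SCM_CREDENTIALS; Python exposes the constant on Linux builds.
SCM_CREDENTIALS = getattr(socket, "SCM_CREDENTIALS", 2)


def abstract_address(pid):
    """Abstract socket address: a leading NUL byte, then irqbalance<PID>.sock."""
    return b"\0irqbalance%d.sock" % pid


def credentials_cmsg():
    """Ancillary message carrying a struct ucred (pid, uid, gid) for this process."""
    ucred = struct.pack("iII", os.getpid(), os.getuid(), os.getgid())
    return (socket.SOL_SOCKET, SCM_CREDENTIALS, ucred)


def request(pid, command, bufsize=8192):
    """Send one command (for example "stats" or "setup") and return the raw reply.

    Assumes a SOCK_STREAM socket and a reply that fits in one recv() call.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(abstract_address(pid))
        sock.sendmsg([command.encode()], [credentials_cmsg()])
        return sock.recv(bufsize).decode(errors="replace")
```

For example, request(pid, "settings sleep 5") would ask the instance with the given process ID to change its sleep interval to 5 seconds, using the command syntax listed above.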
https://github.com/Irqbalance/irqbalance