HWLOC-CALC(1) | hwloc | HWLOC-CALC(1) |
hwloc-calc - Operate on cpu mask strings and objects
hwloc-calc [topology options] [options] <location1> [<location2> [...] ]
Note that hwloc(7) provides a detailed explanation of the hwloc system and of valid <location> formats; it should be read before reading this man page.
All topology options must be given before all other options.
When specified by index, it corresponds to the hwloc ranking of CPU kinds, which lists energy-efficient cores first and high-performance, power-hungry cores last. The full list of CPU kinds may be seen with lstopo --cpukinds.
If <path> is a file and XML support has been compiled into hwloc, it may be an XML file exported by a previous hwloc program. If <path> is "-", the standard input is read as an XML file.
On Linux, <path> may be a directory containing the topology files gathered from another machine topology with hwloc-gather-topology.
On x86, <path> may be a directory containing a cpuid dump gathered with hwloc-gather-cpuid.
When the archivemount program is available, <path> may also be a tarball containing such Linux or x86 topology files.
All these options must be given after all topology options above.
When combined with --nodeset or --nodeset-output, the nodeset is considered instead of the CPU set for finding matching objects. This is useful when reporting the output as a number or set of NUMA nodes.
If an OS device subtype such as gpu is given instead of osdev, only the os devices of that subtype will be counted.
When combined with --physical, the list is convenient to pass to external tools such as taskset or numactl --physcpubind or --membind. This is different from --largest since the latter requires that all reported objects are strictly included inside the input objects.
When combined with --nodeset or --nodeset-output, the nodeset is considered instead of the CPU set for finding matching objects. This is useful when reporting the output as a number or set of NUMA nodes.
If an OS device subtype such as gpu is given instead of osdev, only the os devices of that subtype will be returned.
Only normal CPU-side object types should be used.
NUMA nodes may be used, but they may cause redundancy in the output on heterogeneous memory platforms. For instance, on a platform with both DRAM and HBM memory on a package, the first core will be considered both as the first core of the first NUMA node (DRAM) and as the first core of the second NUMA node (HBM).
This option is similar to -I numa but the way nodes are selected is different: The selection performed by --local-memory may be precisely configured with --local-memory-flags, while -I numa just selects all nodes that are somehow local to any of the input objects.
This option enables --local-memory.
If the memory attribute values depend on the initiator, the hwloc-calc input objects are used as the initiator.
Standard attribute names are Capacity, Locality, Bandwidth, and Latency. All existing attributes in the current topology may be listed with
$ lstopo --memattrs
hwloc-calc generates and manipulates CPU mask strings or objects. Both input and output may be either objects (with physical or logical indexes), CPU lists (with physical or logical indexes), or CPU mask strings (always physically indexed). Input location specification is described in hwloc(7).
If objects or CPU mask strings are given on the command line, they are combined and a single output is printed. If no objects or CPU mask strings are given on the command line, the program reads the standard input. It combines multiple objects or CPU mask strings given on the same line of standard input, separated by spaces. Different input lines are processed separately.
Command-line arguments and options are processed in order. Topology configuration options should be given first. Then, for instance, changing the type of input indexes with --li or changing the input topology with -i only affects the processing of the following arguments.
NOTE: It is highly recommended that you read the hwloc(7) overview page before reading this man page. Most of the concepts described in hwloc(7) directly apply to the hwloc-calc utility.
hwloc-calc's operation is best described through several examples.
To display the (physical) CPU mask corresponding to the second package:
$ hwloc-calc package:1
0x000000f0
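CPU mask strings are physically indexed, so bit i of the mask stands for the PU with physical index i. The set bits can be decoded in POSIX shell, as in the rough sketch below. It assumes a mask small enough to fit in one shell integer, with no comma-separated 32-bit words. mask_to_list is a hypothetical helper, not part of hwloc:

```shell
# Hypothetical helper: print the physical PU indexes whose bits are set
# in a CPU mask. Assumes a non-negative mask that fits in one shell integer.
mask_to_list() {
    mask=$(( $1 ))              # accepts a hexadecimal string such as 0x000000f0
    i=0
    list=""
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            list="${list:+$list,}$i"   # append index, comma-separated
        fi
        mask=$(( mask >> 1 ))
        i=$(( i + 1 ))
    done
    echo "$list"
}

mask_to_list 0x000000f0         # prints 4,5,6,7
```

On a machine where those PUs exist, hwloc-calc --physical-output --intersect PU gives the same answer for this mask, using the real topology.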
To display the (physical) CPU mask corresponding to the third package, excluding its even-numbered logical processors:
$ hwloc-calc package:2 ~PU:even
0x00000c00
To convert a cpu mask to human-readable output, the -H option can be used to emit a space-delimited list of locations:
$ echo 0x000000f0 | hwloc-calc -H package.core
Package:1.Core:0 Package:1.Core:1 Package:1.Core:2 Package:1.Core:3
To use some other character (e.g., a comma) instead of spaces in output, use the --sep option:
$ echo 0x000000f0 | hwloc-calc -H package.core --sep ,
Package:1.Core:0,Package:1.Core:1,Package:1.Core:2,Package:1.Core:3
To combine two (physical) CPU masks:
$ hwloc-calc 0x0000ffff 0xff000000
0xff00ffff
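hwloc-calc combines its inputs by union. For masks that fit in a single shell integer, the result above can be cross-checked with plain shell arithmetic. This is only a sanity check: the arithmetic knows nothing about the topology.

```shell
# Bitwise OR of the two input masks, formatted like hwloc-calc output.
printf '0x%08x\n' $(( 0x0000ffff | 0xff000000 ))   # prints 0xff00ffff
```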
To display the list of logical numbers of processors included in the second package:
$ hwloc-calc --intersect PU package:1
4,5,6,7
To bind GNU OpenMP threads logically over the whole machine, we need to use physical number output instead:
$ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU all`
$ echo $GOMP_CPU_AFFINITY
0,4,1,5,2,6,3,7
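A list of physical indexes like the one above and a CPU mask describe the same set of PUs. The conversion matters when a tool expects a mask rather than a list. hwloc-calc prints masks by default, so the sketch below only illustrates how the two forms relate. list_to_mask is a hypothetical helper, and it assumes indexes small enough (below 63) for the shift to fit in a shell integer:

```shell
# Hypothetical helper: build a hexadecimal CPU mask from a comma-separated
# list of physical PU indexes. The order of the list does not matter.
list_to_mask() {
    mask=0
    old_ifs=$IFS
    IFS=,                       # split the argument on commas
    for i in $1; do
        mask=$(( mask | (1 << i) ))
    done
    IFS=$old_ifs
    printf '0x%08x\n' "$mask"
}

list_to_mask 0,4,1,5,2,6,3,7    # prints 0x000000ff
```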
To display the list of NUMA nodes, by physical indexes, that intersect a given (physical) CPU mask:
$ hwloc-calc --physical --intersect NUMAnode 0xf0f0f0f0
0,2
To find how many cores are in the second CPU kind (those cores are likely higher-performance and more power-hungry than cores of the first kind):
$ hwloc-calc --cpukind 1 -N core all
4
To display the list of NUMA nodes, by physical indexes, whose locality is exactly equal to the second package:
$ hwloc-calc --local-memory-flags 0 pack:1
4,7
To display the best-capacity NUMA node, by physical index, whose locality is exactly equal to the second package:
$ hwloc-calc --local-memory-flags 0 --best-memattr capacity pack:1
4
Converting object logical indexes (default) from/to physical/OS indexes may be performed with --intersect combined with either --physical-output (logical to physical conversion) or --physical-input (physical to logical):
$ hwloc-calc --physical-output PU:2 --intersect PU
3
$ hwloc-calc --physical-input PU:3 --intersect PU
2
One should add --nodeset when converting indexes of memory objects to make sure a single NUMA node index is returned on platforms with heterogeneous memory:
$ hwloc-calc --nodeset --physical-output node:2 --intersect node
3
$ hwloc-calc --nodeset --physical-input node:3 --intersect node
2
To display the set of CPUs near network interface eth0:
$ hwloc-calc os=eth0
0x00005555
To display the indexes of packages near PCI device whose bus ID is 0000:01:02.0:
$ hwloc-calc pci=0000:01:02.0 --intersect Package
1
To display the list of per-package cores that intersect the input:
$ hwloc-calc 0x00003c00 --hierarchical package.core
Package:2.Core:1 Package:3.Core:0
To display the (physical) CPU mask of the entire topology except the third package:
$ hwloc-calc all ~package:2
0x0000f0ff
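The ~ prefix removes the given object's PUs from the set accumulated so far, which amounts to a bitwise AND-NOT. The same mask follows from shell arithmetic under two assumptions taken from the output: the machine has 16 PUs, and the excluded package owns physical PUs 8 to 11.

```shell
# Assumption: "all" is 0x0000ffff (16 PUs) and the excluded package
# covers physical PUs 8-11 (mask 0x00000f00).
printf '0x%08x\n' $(( 0x0000ffff & ~0x00000f00 ))  # prints 0x0000f0ff
```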
To combine both physical and logical indexes as input:
$ hwloc-calc PU:2 --physical-input PU:3
0x0000000c
To synthesize a set of cores into the largest objects on a machine with 2 NUMA nodes, 2 packages per node, and 2 cores per package:
$ hwloc-calc core:0 --largest
Core:0
$ hwloc-calc core:0-1 --largest
Package:0
$ hwloc-calc core:4-7 --largest
NUMANode:1
$ hwloc-calc core:2-6 --largest
Package:1 Package:2 Core:6
$ hwloc-calc pack:2 --largest
Package:2
$ hwloc-calc package:2-3 --largest
NUMANode:1
To get the set of first threads of all cores (either command may be used):
$ hwloc-calc core:all.pu:0
$ hwloc-calc --no-smt all
This is also useful for making GNU OpenMP use exactly one thread per core, in logical core order:
$ export OMP_NUM_THREADS=`hwloc-calc --number-of core all`
$ echo $OMP_NUM_THREADS
4
$ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU --no-smt all`
$ echo $GOMP_CPU_AFFINITY
0,2,1,3
To export a bitmask in a format accepted by the resctrl Linux subsystem (for configuring cache partitioning, etc.), apply sed substitutions to the output of hwloc-calc:
$ hwloc-calc pack:all.core:7-9.pu:0
0x00000380,,0x00000380 <this format cannot be given to resctrl>
$ hwloc-calc pack:all.core:7-9.pu:0 | sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
00000380,0,00000380
# echo 00000380,0,00000380 > /sys/fs/resctrl/test/cpus
# cat /sys/fs/resctrl/test/cpus
00000000,00000380,00000000,00000380 <the modified bitmask was correctly parsed by resctrl>
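The substitutions above can be wrapped in a small shell function. to_resctrl is a hypothetical name. The sketch assumes, as in the example above, that an all-zero 32-bit word shows up as an empty field between two commas. The second ,,-to-,0, pass handles runs of several empty words. It is needed because sed resumes scanning after each replacement, so a single global pass turns ,,, into only ,0,,.

```shell
# Hypothetical helper: rewrite an hwloc-calc mask string for resctrl.
# 1. strip the "0x" prefix of every 32-bit word;
# 2. fill empty (all-zero) words with an explicit 0, twice, so that runs
#    of consecutive empty words are fully filled.
to_resctrl() {
    echo "$1" | sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
}

to_resctrl 0x00000380,,0x00000380    # prints 00000380,0,00000380
to_resctrl 0x00000001,,,0x00000002   # prints 00000001,0,0,00000002
```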
OS devices may also be filtered by subtype. In this example, there are 9 OS devices in the system, 4 of them are near NUMA node #1, and only 2 of these are co-processors:
$ hwloc-calc -I osdev all
0,1,2,3,4,5,6,7,8
$ hwloc-calc -I osdev node:1
5,6,7,8
$ hwloc-calc -I coproc node:1
7,8
Upon successful execution, hwloc-calc displays the (physical) CPU mask string, (physical or logical) object list, or (physical or logical) object number list. The return value is 0.
hwloc-calc will return nonzero if any kind of error occurs, such as (but not limited to): failure to parse the command line.
December 14, 2022 | 2.9.0 |