ROCM-SMI(1) | User Commands | ROCM-SMI(1) |
rocm-smi - a tool to monitor AMD accelerators and GPUs
rocm-smi [-h] [-d DEVICE [DEVICE ...]] [--alldevices] [--showhw] [-a] [-i] [-v] [-e [EVENT ...]]
Radeon Open Compute Platform (ROCm) - System Management Interface (SMI) - Command Line Interface (CLI).
rocm-smi is the Python reference implementation of a CLI, from AMD, over its C system management library.
This tool acts as a command line interface for manipulating and monitoring the amdgpu kernel driver, and is intended to replace and deprecate the existing rocm_smi.py CLI tool. It uses ctypes to call the rocm_smi_lib API.
Recommended: At least one AMD GPU with the ROCm driver installed
Required: ROCm SMI library installed (librocm_smi64)
--alldevices
The above four flags display the different "bad pages" as reported by the kernel. The three types of pages are:
Retired pages (reserved pages) - These pages are reserved and cannot be used
Pending pages - These pages are pending reservation, and will be reserved/retired
Unreservable pages - These pages cannot be reserved for some reason
This is used to indicate how busy the respective blocks are. For example, for --showuse (gpu_busy_percent sysfs file), the SMU samples every ms or so to see if any GPU block (RLC, MEC, PFP, CP) is busy. If so, that's 1 (or high). If not, that's 0 (low). If we have 5 high and 5 low samples, that means 50% utilization (50% GPU busy, or 50% GPU use). The windows and sampling vary from generation to generation, but that is how GPU and VRAM use is calculated in a generic sense.
--showmeminfo (and VRAM% in concise output) will show the amount of memory used (vram, vis_vram, gtt), as well as the total available for each of those partitions. The percentage shown there indicates the amount of used memory in terms of current allocations.
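The sampling arithmetic above can be sketched in a few lines of Python. This is only an illustration of the generic calculation, not the SMU firmware or rocm-smi code; the function name busy_percent and the sample list are hypothetical.

```python
def busy_percent(samples):
    """Return the share of busy samples as a percentage.

    `samples` holds one value per sampling tick:
    1 if any GPU block was busy, 0 if all were idle.
    """
    samples = list(samples)
    if not samples:
        return 0.0  # no samples in the window: report idle
    return 100.0 * sum(samples) / len(samples)


# Five busy and five idle samples, as in the example above.
window = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
print(busy_percent(window))  # 50.0
```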
This shows an approximation of the number of bytes received and sent by the GPU over the last second through the PCIe bus. Note that this will not work for APUs, since data for the GPU portion of the APU goes through the memory fabric and does not 'enter/exit' the chip via the PCIe interface. No PCIe accesses are generated, so the performance counters have nothing to count.
NOTE: It is not possible to easily grab the size of every packet that is transmitted in real time, so the kernel estimates the bandwidth by taking the maximum payload size (mps), which is the largest size a PCIe packet can have, and multiplying it by the number of packets received and sent. This means that the SMI reports the maximum estimated bandwidth; the actual usage could (and likely will) be less.
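As a rough sketch of the estimate described above, assuming the packet counts and the maximum payload size are already known (the function and parameter names are hypothetical, and this is not the kernel's code):

```python
def estimate_pcie_bytes(packets_sent, packets_received, max_payload_size):
    """Upper-bound estimate of bytes moved over PCIe in one interval.

    Every packet is assumed to carry a full payload of
    `max_payload_size` bytes, so real traffic is at most this much.
    """
    sent = packets_sent * max_payload_size
    received = packets_received * max_payload_size
    return sent, received


# Illustrative numbers only: 1000 packets sent, 500 received, 256-byte mps.
sent, received = estimate_pcie_bytes(1000, 500, 256)
print(sent, received)  # 256000 128000
```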
Clock type | Description |
DCEFCLK | DCE (Display) |
FCLK | Data fabric (VG20 and later) - Data flow from XGMI, Memory, PCIe |
SCLK | GFXCLK (Graphics core) |
Note | SOCCLK split from SCLK as of Vega10. Pre-Vega10 they were both controlled by SCLK |
MCLK | GPU Memory (VRAM) |
PCLK | PCIe bus |
Note | This gives 2 speeds, PCIe Gen1 x1 and the highest available based on the hardware |
SOCCLK | System clock (VG10 and later) - DF, MM HUB, AT HUB, SYSTEM HUB, OSS, DFD |
Note | DF split from SOCCLK as of Vega20. Pre-Vega20 they were both controlled by SOCCLK |
This allows the user to see the amount of used and total memory for a given block (vram, vis_vram, gtt). It returns the number of bytes used and the total number of bytes for each block. 'all' can be passed as a field to return all blocks; otherwise a quoted string is used for multiple values (e.g. "vram vis_vram").
vram - Video RAM, or graphics memory, on the specified device
vis_vram - Visible VRAM, which is the CPU-accessible video memory on the device
gtt - Graphics Translation Table
This shows the RAS information for a given block. This includes enablement of the block (currently GFX, SDMA and UMC are the only supported blocks) and the number of errors:
ue - Uncorrectable errors
ce - Correctable errors
The two above options allow you to set a mask for the levels. For example, if a GPU has 8 clock levels, you can set a mask to use levels 0, 5, 6 and 7 with --setsclk 0 5 6 7 . This will only use the base level, and the top 3 clock levels. This will allow you to keep the GPU at base level when there is no GPU load, and the top 3 levels when the GPU load increases.
NOTES:
The clock levels will change dynamically with GPU load, based on the default Compute and Graphics profiles. The thresholds and delays for a custom mask cannot be controlled through the SMI tool.
This flag automatically sets the Performance Level to "manual", as the mask is not applied when the Performance Level is set to "auto".
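One way to picture a level mask is as a bitmask with one bit per clock level. The Python sketch below is an illustration under that assumption; the helper name levels_to_mask is hypothetical and is not part of rocm-smi.

```python
def levels_to_mask(levels, num_levels):
    """Build a bitmask with one bit set per allowed clock level."""
    mask = 0
    for level in levels:
        if not 0 <= level < num_levels:
            raise ValueError(f"level {level} out of range 0..{num_levels - 1}")
        mask |= 1 << level
    return mask


# Levels 0, 5, 6 and 7 on a GPU with 8 clock levels.
mask = levels_to_mask([0, 5, 6, 7], 8)
print(f"{mask:08b}")  # 11100001
```

Read right to left, the set bits are the base level (bit 0) and the top three levels (bits 5 to 7).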
This sets the fan speed to a value ranging from 0 to maxlevel, or from 0% to 100%. If the level ends with a %, the fan speed is calculated as pct*maxlevel/100 (maxlevel is usually 255, but is determined by the ASIC).
NOTE: While the hardware is usually capable of overriding this value when required, it is recommended not to set the fan level lower than the default value for extended periods of time.
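The percentage conversion can be sketched as follows. This is a minimal illustration, assuming maxlevel is 255 and that the result is truncated to an integer; the function name fan_level is hypothetical and the actual rounding in rocm-smi may differ.

```python
def fan_level(value, maxlevel=255):
    """Convert a fan setting ("128" or "50%") to a raw level."""
    text = str(value).strip()
    if text.endswith("%"):
        pct = int(text[:-1])
        if not 0 <= pct <= 100:
            raise ValueError("percentage must be between 0 and 100")
        # pct*maxlevel/100, truncated to an integer (assumption).
        return pct * maxlevel // 100
    level = int(text)
    if not 0 <= level <= maxlevel:
        raise ValueError(f"level must be between 0 and {maxlevel}")
    return level


print(fan_level("50%"))  # 127
print(fan_level("128"))  # 128
```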
This lets you use the pre-defined Performance Level values for clocks and power profile, which can include:
auto - Automatically change values based on GPU workload
low - Keep values low, regardless of workload
high - Keep values high, regardless of workload
manual - Only use values defined by --setsclk and --setmclk
The above two options are DEPRECATED IN NEWER KERNEL VERSIONS (use --setslevel/--setmlevel instead). They set the percentage above maximum for the max Performance Level. For example, --setoverdrive 20 will increase the top sclk level by 20%; similarly, --setmemoverdrive 20 will increase the top mclk level by 20%. Thus, if the maximum clock level is 1000MHz, then --setoverdrive 20 will increase the maximum clock to 1200MHz.
NOTES:
This option can be used in conjunction with the --setsclk/--setmclk mask.
Operating the GPU outside of specifications can cause irreparable damage to your hardware. Please observe the warning displayed when using this option.
This flag automatically sets the clock to the highest level, as only the highest level is increased by the OverDrive value.
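The OverDrive arithmetic is a simple percentage increase on the top clock level. A small Python sketch of that calculation (the function name is hypothetical and only illustrates the formula):

```python
def overdriven_clock_mhz(max_clock_mhz, overdrive_pct):
    """Return the top clock after applying an OverDrive percentage."""
    return max_clock_mhz * (100 + overdrive_pct) / 100


# A 1000MHz top level with a 20% OverDrive value.
print(overdriven_clock_mhz(1000, 20))  # 1200.0
```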
This allows users to change the maximum power available to a GPU package. The input value is in Watts. This limit is enforced by the hardware, and some cards allow users to set it to a higher value than the default that ships with the GPU. This Power OverDrive mode allows the GPU to run at higher frequencies for longer periods of time, though this may mean the GPU uses more power than it is allowed to use per power supply specifications. Each GPU has a model-specific maximum Power OverDrive that it will accept; attempting to set a higher limit than that will cause this command to fail.
NOTES:
Operating the GPU outside of specifications can cause irreparable damage to your hardware. Please observe the warning displayed when using this option.
The Compute Profile accepts 1 or n parameters, either the Profile to select (see --showprofile for a list of preset Power Profiles) or a quoted string of values for the CUSTOM profile. NOTE: These values can vary based on the ASIC, and may include:
SCLK_PROFILE_ENABLE - Whether or not to apply the 3 following SCLK settings (0=disable,1=enable) NOTE: This is a hidden field. If set to 0, the following 3 values are displayed as '-'.
Setting | Description |
SCLK_UP_HYST | Delay before sclk is increased (in milliseconds) |
SCLK_DOWN_HYST | Delay before sclk is decreased (in milliseconds) |
SCLK_ACTIVE_LEVEL | Workload required before sclk levels change (in %) |
MCLK_PROFILE_ENABLE - Whether or not to apply the 3 following MCLK settings (0=disable,1=enable) NOTE: This is a hidden field. If set to 0, the following 3 values are displayed as '-'.
Setting | Description |
MCLK_UP_HYST | Delay before mclk is increased (in milliseconds) |
MCLK_DOWN_HYST | Delay before mclk is decreased (in milliseconds) |
MCLK_ACTIVE_LEVEL | Workload required before mclk levels change (in %) |
Other settings:
Setting | Description |
BUSY_SET_POINT | Threshold for raw activity level before levels change |
FPS | Frames Per Second |
USE_RLC_BUSY | When set to 1, DPM is switched up as long as RLC busy message is received |
MIN_ACTIVE_LEVEL | Workload required before levels change (in %) |
NOTES:
When a compute queue is detected, the COMPUTE Power Profile values will be automatically applied to the system, provided that the Performance Level is set to "auto".
The CUSTOM Power Profile is only applied when the Performance Level is set to "manual", so using this flag will automatically set the Performance Level to "manual".
It is not possible to modify the non-CUSTOM Profiles. These are hard-coded by the kernel.
Enabling OverDrive requires both a card that supports OverDrive and a driver parameter that enables its use. Because OverDrive features can damage your card, most workstation and server GPUs cannot use OverDrive. Consumer GPUs that can use OverDrive must enable this feature by setting bit 14 in the amdgpu driver's ppfeaturemask module parameter.
For OverDrive functionality, the OverDrive bit (bit 14) must be enabled (by default, the OverDrive bit is disabled on the ROCK and upstream kernels). This can be done by setting amdgpu.ppfeaturemask accordingly in the kernel parameters, or by changing the default value inside amdgpu_drv.c (if building your own kernel).
As an example, if the ppfeaturemask is set to 0xffffbfff (11111111111111111011111111111111), then enabling the OverDrive bit would make it 0xffffffff (11111111111111111111111111111111).
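The mask arithmetic in this example can be checked with a short Python sketch. The helper names are hypothetical; the only facts taken from the text are that OverDrive is bit 14 and that 0xffffbfff has that bit cleared.

```python
OVERDRIVE_BIT = 14  # bit position of OverDrive in ppfeaturemask


def overdrive_enabled(mask):
    """Return True if the OverDrive bit is set in `mask`."""
    return bool(mask & (1 << OVERDRIVE_BIT))


def enable_overdrive(mask):
    """Return `mask` with the OverDrive bit set."""
    return mask | (1 << OVERDRIVE_BIT)


current = 0xFFFFBFFF
print(overdrive_enabled(current))      # False
print(hex(enable_overdrive(current)))  # 0xffffffff
```

The resulting value is what would then be passed as amdgpu.ppfeaturemask in the kernel parameters.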
These are the flags that require OverDrive functionality to be enabled for the flag to work: --showclkvolt --showvoltagerange --showvc --showsclkrange --showmclkrange --setslevel --setmlevel --setoverdrive --setpoweroverdrive --resetpoweroverdrive --setvc --setsrange --setmrange
The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein.
Copyright (c) 2014-2022 Advanced Micro Devices, Inc. All rights reserved.
The present manpage has been aggregated from the help output of rocm-smi and the readme github page, by Maxime Chambonnet. This work is made available under the Expat license.
1.4.1
The SMI will report a "version", which is the version of the installed kernel or kernel module. For ROCk installations, this will be the AMDGPU module version (e.g. 5.0.71). For non-ROCk or monolithic ROCk installations, this will be the kernel version, which will be equivalent to the output of the following bash command: uname -a | cut -d ' ' -f 3
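For the non-ROCk case, the same kernel release string can be read from Python. This is a minimal sketch, assuming a Linux system where platform.release() matches the third field of uname -a; the helper name kernel_release is hypothetical and this is not how rocm-smi itself obtains the value.

```python
import platform


def kernel_release():
    """Return the running kernel's release string (as in `uname -r`)."""
    return platform.release()


print(kernel_release())
```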
Please report bugs to rocm.smi.lib@amd.com, and as a last resort to debian-ai@lists.debian.org.
AMD Research and AMD HSA Software Development
Advanced Micro Devices, Inc.
www.amd.com
The full local documentation for the C rocm-smi library is available with the binary deb package librocm-smi-dev, and is installed at: /usr/share/doc/librocm-smi-dev/ROCm_SMI_Manual.pdf .
The documentation for rocm-smi is maintained as a README markdown file at https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/master/python_smi_tools/README.md .
2022-01-30 | rocm-smi 1.4.1 |