PT-DISKSTATS(1p) | User Contributed Perl Documentation | PT-DISKSTATS(1p) |
pt-diskstats - An interactive I/O monitoring tool for GNU/Linux.
Usage: pt-diskstats [OPTIONS] [FILES]
pt-diskstats prints disk I/O statistics for GNU/Linux. It is somewhat similar to iostat, but it is interactive and more detailed. It can analyze samples gathered from another machine.
Percona Toolkit is mature, proven in the real world, and well tested, but all database tools can pose a risk to the system and the database server. Before using this tool, please:
The pt-diskstats tool is similar to iostat, but has some advantages. It prints read and write statistics separately, and has more columns. It is menu-driven and interactive, with several different ways to aggregate the data. It integrates well with the pt-stalk tool. It also does the "right thing" by default, such as hiding disks that are idle. These properties make it very convenient for quickly drilling down into I/O performance and inspecting disk behavior.
This program works in two modes. The default is to collect samples of /proc/diskstats and print out the formatted statistics at intervals. The other mode is to process a file that contains saved samples of /proc/diskstats; there is a shell script later in this documentation that shows how to collect such a file.
In both cases, the tool is interactively controlled by keystrokes, so you can redisplay and slice the data flexibly and easily. It loops forever, until you exit with the 'q' key. If you press the '?' key, you will bring up the interactive help menu that shows which keys control the program.
When the program is gathering samples of /proc/diskstats and refreshing its display, it prints information about the newest sample each time it refreshes. When it is operating on a file of saved samples, it redraws the entire file's contents every time you change an option.
The program doesn't print information about every block device on the system. It hides devices that it has never observed to have any activity. You can enable and disable this by pressing the 'i' key.
In the rest of this documentation, we will try to clarify the distinction between block devices (/dev/sda1, for example), which the kernel presents to the application via a filesystem, versus the (usually) physical device underneath the block device, which could be a disk, a RAID controller, and so on. We will sometimes refer to logical I/O operations, which occur at the block device, versus physical I/Os which are performed on the underlying device. When we refer to the queue, we are speaking of the queue associated with the block device, which holds requests until they're issued to the physical device.
The program's output looks like the following sample, which is too wide for this manual page, so we have formatted it as several samples with line breaks:
  #ts device    rd_s rd_avkb rd_mb_s rd_mrg rd_cnc   rd_rt
  {6} sda        0.9     4.2     0.0     0%    0.0    17.9
  {6} sdb        0.4     4.0     0.0     0%    0.0    26.1
  {6} dm-0       0.0     4.0     0.0     0%    0.0    13.5
  {6} dm-1       0.8     4.0     0.0     0%    0.0    16.0

  ...    wr_s wr_avkb wr_mb_s wr_mrg wr_cnc   wr_rt
  ...    99.7     6.2     0.6    35%    3.7    23.7
  ...    14.5    15.8     0.2    75%    0.5     9.2
  ...     1.0     4.0     0.0     0%    0.0     2.3
  ...   117.7     4.0     0.5     0%    4.1    35.1

  ...    busy in_prg    io_s  qtime stime
  ...      6%      0   100.6   23.3   0.4
  ...      4%      0    14.9    8.6   0.6
  ...      0%      0     1.1    1.5   1.2
  ...      5%      0   118.5   34.5   0.4
The columns are as follows:
In the "all" group-by mode, this column shows timestamp offsets, relative to the time the tool began aggregating or the timestamp of the previous lines printed, depending on the mode. The output can be confusing to explain, but it's rather intuitive when you see the lines appearing on your screen periodically.
Similarly, in "sample" group-by mode, the number indicates the total time span that is grouped into each sample.
If you specify "--show-timestamps", this field instead shows the timestamp at which the sample was taken; if multiple timestamps are present in a single line of output, then the first timestamp is used.
The read columns and the busy column are computed from the contents of /proc/diskstats as follows. See "KERNEL DOCUMENTATION" below for the meaning of the field numbers:
rd_s
    delta[field1] / delta[time]
rd_avkb
    2 * delta[field3] / delta[field1]
rd_mb_s
    2 * delta[field3] / delta[time]
rd_mrg
    100 * delta[field2] / (delta[field2] + delta[field1])
rd_cnc
    delta[field4] / delta[time] / 1000 / devices-in-group
rd_rt
    delta[field4] / (delta[field1] + delta[field2])
busy
    100 * delta[field10] / (1000 * delta[time])
This field cannot exceed 100% unless there is a rounding error, but it is a common mistake to think that a device that's busy all the time is saturated. A device such as a RAID volume should support concurrency higher than 1, and solid-state drives can support very high concurrency. Concurrency can grow without bound, and is a more reliable indicator of how loaded the device really is.
in_prg
    field9
The queue time (the qtime column) is computed in a slightly complex way: it is the average response time seen by the application, minus the average service time (see the description of the next column). This is derived from the queueing theory formula for response time, R = W + S: response time = queue time + service time. Solving for W gives W = R - S. The computation follows:
delta[field11] / (delta[field1, 2, 5, 6] + delta[field9]) - delta[field10] / delta[field1, 2, 5, 6]
See the description for "stime" for more details and cautions.
stime
    delta[field10] / (delta[field1, 2, 5, 6])
Note, however, that there can be some kernel bugs that cause field 9 in /proc/diskstats to become negative, and this can cause field 10 to be wrong, thus making the service time computation not wholly trustworthy.
Note that in the above formula we use utilization very specifically. It is a duration, not a percentage.
You can compare the stime and qtime columns to see whether the response time for reads and writes is spent in the queue or on the physical device. However, you cannot see the difference between reads and writes. Changing the block device scheduler algorithm might improve queue time greatly. The default algorithm, cfq, is very bad for servers, and should only be used on laptops and workstations that perform tasks such as working with spreadsheets and surfing the Internet.
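As a rough illustration of the formulas above, the following sketch derives a few of the columns from two /proc/diskstats lines for the same device. It is not the tool's own code. The function name diskstats_delta and the sample numbers are made up for this example, and reading delta[field1, 2, 5, 6] as the sum of the deltas of those four fields is an assumption.

```shell
# Hypothetical helper, not part of pt-diskstats.
# Usage: diskstats_delta "PREVIOUS_LINE" "CURRENT_LINE" ELAPSED_SECONDS
diskstats_delta() {
    printf '%s\n%s\n' "$1" "$2" | awk -v t="$3" '
        # Avoid division by zero when a device was idle.
        function div(a, b) { return (b != 0) ? a / b : 0 }

        # Statistics field N is awk column N + 3 (after major, minor, name).
        NR == 1 { for (i = 1; i <= 11; i++) prev[i] = $(i + 3) }
        NR == 2 {
            for (i = 1; i <= 11; i++) d[i] = $(i + 3) - prev[i]

            # Assumption: delta[field1, 2, 5, 6] is the sum of these deltas.
            ios   = d[1] + d[2] + d[5] + d[6]
            stime = div(d[10], ios)
            qtime = div(d[11], ios + d[9]) - stime

            printf "rd_s=%.1f rd_mrg=%.0f%% rd_rt=%.1f busy=%.0f%% qtime=%.1f stime=%.1f\n",
                div(d[1], t),
                100 * div(d[2], d[2] + d[1]),
                div(d[4], d[1] + d[2]),
                100 * div(d[10], 1000 * t),
                qtime, stime
        }'
}

# Made-up samples taken 10 seconds apart.
prev='8 0 sda 100 50 8000 400 200 100 16000 1600 0 900 2000'
curr='8 0 sda 200 100 16000 1000 300 150 24000 2600 0 1400 3600'
diskstats_delta "$prev" "$curr" 10
```

Only the first 11 statistics fields are read, so extra columns on the line are ignored.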
If you are used to using iostat, you might wonder where you can find the same information in pt-diskstats. Here are two samples of output from both tools on the same machine at the same time, for /dev/sda, wrapped to fit:
  #ts      dev rd_s rd_avkb rd_mb_s rd_mrg rd_cnc   rd_rt
  08:50:10 sda  0.0     0.0     0.0     0%    0.0     0.0
  08:50:20 sda  0.4     4.0     0.0     0%    0.0    15.5
  08:50:30 sda  2.1     4.4     0.0     0%    0.0    21.1
  08:50:40 sda  2.4     4.0     0.0     0%    0.0    15.4
  08:50:50 sda  0.1     4.0     0.0     0%    0.0    33.0

     wr_s wr_avkb wr_mb_s wr_mrg wr_cnc   wr_rt
      7.7    25.5     0.2    84%    0.0     0.3
     49.6     6.8     0.3    41%    2.4    28.8
    210.1     5.6     1.1    28%    7.4    25.2
    297.1     5.4     1.6    26%   11.4    28.3
     11.9    11.7     0.1    66%    0.2     4.9

     busy in_prg   io_s  qtime stime
       1%      0    7.7    0.1   0.2
       6%      0   50.0   28.1   0.7
      12%      0  212.2   24.8   0.4
      16%      0  299.5   27.8   0.4
       1%      0   12.0    4.7   0.3

           Dev rrqm/s  wrqm/s  r/s    w/s rMB/s wMB/s
  08:50:10 sda   0.00   41.40 0.00   7.70  0.00  0.19
  08:50:20 sda   0.00   34.70 0.40  49.60  0.00  0.33
  08:50:30 sda   0.00   83.30 2.10 210.10  0.01  1.15
  08:50:40 sda   0.00  105.10 2.40 297.90  0.01  1.58
  08:50:50 sda   0.00   22.50 0.10  11.10  0.00  0.13

  avgrq-sz avgqu-sz  await svctm  %util
     51.01     0.02   2.04  1.25   0.96
     13.55     2.44  48.76  1.16   5.79
     11.15     7.45  35.10  0.55  11.76
     10.81    11.40  37.96  0.53  15.97
     24.07     0.17  15.60  0.87   0.97
The correspondence between the columns is not one-to-one. In particular:
It is straightforward to gather a sample of data for this tool. Files should have this format, with a timestamp line preceding each sample of statistics:
  TS <timestamp>
  <contents of /proc/diskstats>
  TS <timestamp>
  <contents of /proc/diskstats>
  ... et cetera
You can simply use pt-diskstats with "--save-samples" to collect this data for you. If you wish to capture samples as part of some other tool, and use pt-diskstats to analyze them, you can include a snippet of shell script such as the following:
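  INTERVAL=1
  while true; do
     # Sleep until the next even multiple of INTERVAL seconds.
     sleep=$(date +%s.%N | awk "{print $INTERVAL - (\$1 % $INTERVAL)}")
     sleep "$sleep"
     # Write the timestamp line, then the raw statistics for this sample.
     date +"TS %s.%N %F %T" >> diskstats-samples.txt
     cat /proc/diskstats >> diskstats-samples.txt
  done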
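Before handing such a file to pt-diskstats, it can help to check that it has the expected shape. The following is a minimal sketch, assuming the file format shown above. The helper name summarize_samples and the sample data are made up for this example.

```shell
# Hypothetical helper, not part of pt-diskstats.
# Prints the number of samples, the number of device lines in the first
# sample, and the seconds between the first and last timestamps.
summarize_samples() {
    awk '
        # A timestamp line opens a new sample; column 2 is the epoch time.
        $1 == "TS" {
            samples++
            if (samples == 1) first = $2
            last = $2
            next
        }
        # Every other line inside the first sample is one block device.
        samples == 1 { devices++ }
        END {
            printf "samples=%d devices_in_first=%d span=%.1f\n",
                samples, devices, last - first
        }
    ' "$1"
}

# Made-up two-sample file for demonstration.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
TS 1598776210.000000000 2020-08-30 08:30:10
   8       0 sda 426 243 3386 2056 3 0 18 87 0 2135 2142
   8       1 sda1 100 10 800 500 1 0 8 20 0 510 520
TS 1598776211.000000000 2020-08-30 08:30:11
   8       0 sda 430 243 3418 2070 5 1 42 95 0 2150 2165
   8       1 sda1 100 10 800 500 1 0 8 20 0 510 520
EOF
summarize_samples "$tmp"
rm -f "$tmp"
```

A span that is much shorter or longer than (samples - 1) times the collection interval suggests that some samples were skipped or delayed.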
This documentation supplements the official documentation <http://www.kernel.org/doc/Documentation/iostats.txt> on the contents of /proc/diskstats. That documentation can sometimes be difficult to understand for those who are not familiar with Linux kernel internals. The contents of /proc/diskstats are generated by the "diskstats_show()" function in the kernel source file block/genhd.c.
Here is a sample of /proc/diskstats on a recent kernel.
8 1 sda1 426 243 3386 2056 3 0 18 87 0 2135 2142
The fields in this sample are as follows. The first three fields are the major and minor device numbers (8, 1), and the device name (sda1). They are followed by 11 fields of statistics:
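As a rough illustration, the sample line can be split into its device columns and its 11 numbered statistics fields. The short labels in this sketch are shorthand based on the kernel's iostats documentation, not names used by pt-diskstats, and the function name label_diskstats_line is made up for this example.

```shell
# Hypothetical helper, not part of pt-diskstats.
# Prints the device columns, then one "fieldN label=value" line per field.
label_diskstats_line() {
    printf '%s\n' "$1" | awk '
        BEGIN {
            # Assumed labels, in kernel field order 1 through 11.
            labels = "reads reads_merged sectors_read ms_reading"
            labels = labels " writes writes_merged sectors_written ms_writing"
            labels = labels " ios_in_progress ms_doing_io weighted_ms_doing_io"
            n = split(labels, name, " ")
        }
        {
            printf "device=%s major=%s minor=%s\n", $3, $1, $2
            # Statistics field N is awk column N + 3.
            for (i = 1; i <= n; i++)
                printf "field%d %s=%s\n", i, name[i], $(i + 3)
        }'
}

label_diskstats_line '8 1 sda1 426 243 3386 2056 3 0 18 87 0 2135 2142'
```

The field numbers printed here are the ones that the column formulas in this documentation refer to.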
This tool accepts additional command-line arguments. Refer to the "SYNOPSIS" and usage information for details.
--columns-regex
    Print columns that match this Perl regex.
--config
    Read this comma-separated list of config files; if specified, this must be the first option on the command line.
--devices-regex
    Print devices that match this Perl regex.
--group-by
    Group-by mode: disk, sample, or all. In disk mode, each line of output shows one disk device, with the statistics computed since the tool started. In sample mode, each line of output shows one sample of statistics, with all disks averaged together. In all mode, each line of output shows one sample and one disk device.
--headers
    If "group" is present, each sample will be separated by a blank line, unless the sample is only one line. If "scroll" is present, the tool will print the headers as often as needed to prevent them from scrolling out of view. Note that you can press the space bar, or the enter key, to reprint headers at will.
--interval
    When in interactive mode, wait N seconds before printing to the screen. Also, how often the tool should sample /proc/diskstats.
The tool attempts to gather statistics exactly on even intervals of clock time. That is, if you specify a 5-second interval, it will try to capture samples at 12:00:00, 12:00:05, and so on; it will not gather at 12:00:01, 12:00:06 and so forth.
This can lead to slightly odd delays in some circumstances, because the tool waits one full cycle before printing out the first set of lines. (Unlike iostat and vmstat, pt-diskstats does not start with a line representing the averages since the computer was booted.) Therefore, the rule has an exception to avoid very long delays. Suppose you specify a 10-second interval, but you start the tool at 12:00:00.01. The tool might wait until 12:00:20 to print its first lines of output, and in the intervening 19.99 seconds, it would appear to do nothing.
To alleviate this, the tool waits until the next even interval of time to gather, unless more than 20% of that interval remains. This means the tool will never wait more than 120% of the sampling interval to produce output. For example, if you start the tool at 12:00:53 with a 10-second sampling interval, then the first sample will be only 7 seconds long, not 10 seconds.
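The rule can be sketched as a small calculation. This is one reading of the description above, not the tool's actual code, and the function name first_output_delay is made up for this example. It takes a start time in seconds and the sampling interval, and prints how long the tool would take to produce its first output.

```shell
# Hypothetical helper, not part of pt-diskstats.
# Usage: first_output_delay START_SECONDS INTERVAL_SECONDS
first_output_delay() {
    awk -v start="$1" -v interval="$2" 'BEGIN {
        # Time left until the next even multiple of the interval.
        remaining = interval - (start % interval)

        if (remaining > 0.2 * interval)
            delay = remaining             # gather now; the first sample is short
        else
            delay = remaining + interval  # wait for the boundary, then a full interval

        printf "%.2f\n", delay
    }'
}

first_output_delay 53 10     # started at 12:00:53
first_output_delay 0.01 10   # started at 12:00:00.01
first_output_delay 59 10     # started at 12:00:59
```

Under this reading, the longest possible delay occurs when exactly 20% of the interval remains, which gives 120% of the sampling interval.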
--iterations
    When in interactive mode, stop after N samples. Run forever by default.
--sample-time
    In --group-by sample mode, include N seconds of samples per group.
--save-samples
    File to save diskstats samples in; these can be used for later analysis.
--[no]version-check
    Check for the latest version of Percona Toolkit, MySQL, and other programs.
This is a standard "check for updates automatically" feature, with two additional features. First, the tool checks its own version and also the versions of the following software: operating system, Percona Monitoring and Management (PMM), MySQL, Perl, MySQL driver for Perl (DBD::mysql), and Percona Toolkit. Second, it checks for and warns about versions with known problems. For example, MySQL 5.5.25 had a critical bug and was re-released as 5.5.25a.
The tool makes a secure connection to Percona’s Version Check database server to perform these checks. Each request is logged by the server, including software version numbers and the unique ID of the checked system. The ID is generated by the Percona Toolkit installation script or when the Version Check database call is made for the first time.
Any updates or known problems are printed to STDOUT before the tool's normal output. This feature should never interfere with the normal operation of the tool.
For more information, visit <https://www.percona.com/doc/percona-toolkit/LATEST/version-check.html>.
The environment variable "PTDEBUG" enables verbose debugging output to STDERR. To enable debugging and capture all output to a file, run the tool like:
PTDEBUG=1 pt-diskstats ... > FILE 2>&1
Be careful: debugging output is voluminous and can generate several megabytes of output.
This tool requires Perl v5.8.0 or newer and the /proc filesystem, unless reading from files.
For a list of known bugs, see <http://www.percona.com/bugs/pt-diskstats>.
Please report bugs at <https://jira.percona.com/projects/PT>. Include the following information in your bug report:
If possible, include debugging output by running the tool with "PTDEBUG"; see "ENVIRONMENT".
Visit <http://www.percona.com/software/percona-toolkit/> to download the latest release of Percona Toolkit. Or, get the latest release from the command line:
  wget percona.com/get/percona-toolkit.tar.gz
  wget percona.com/get/percona-toolkit.rpm
  wget percona.com/get/percona-toolkit.deb
You can also get individual tools from the latest release:
wget percona.com/get/TOOL
Replace "TOOL" with the name of any tool.
Baron Schwartz, Brian Fraser, and Daniel Nichter
This tool is part of Percona Toolkit, a collection of advanced command-line tools for MySQL developed by Percona. Percona Toolkit was forked from two projects in June, 2011: Maatkit and Aspersa. Those projects were created by Baron Schwartz and primarily developed by him and Daniel Nichter. Visit <http://www.percona.com/software/> to learn about other free, open-source software from Percona.
This program is copyright 2011-2018 Percona LLC and/or its affiliates, 2010-2011 Baron Schwartz.
THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2; OR the Perl Artistic License. On UNIX and similar systems, you can issue `man perlgpl' or `man perlartistic' to read these licenses.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
pt-diskstats 3.2.1
2020-08-30 | perl v5.30.3 |