SQLHIST(1) | libtracefs Manual | SQLHIST(1) |
sqlhist - Tool that uses SQL language to create / show creation of tracefs histograms and synthetic events.
sqlhist [OPTIONS] [SQL-select-command]
The sqlhist(1) will take an SQL like statement to create tracefs histograms and synthetic events that can perform various actions for various handling of the data.
The tracefs file system interfaces with the Linux tracing infrastructure that has various dynamic and static events through out the kernel. Each of these events can have a "histogram" attached to it, where the fields of the event will define the buckets of the histogram.
A synthetic event is a way to attach two separate events and use the fields and time stamps of those events to create a new dynamic event. This new dynamic event is call a synthetic event. The fields of each event can have simple calculations done on them where, for example, the delta between a field of one event to a field of the other event can be taken. This also works for the time stamps of the events where the time delta between the two events can also be extracted and placed into the synthetic event.
Other actions can be done from the fields of the events. A snapshot can be taken of the kernel ring buffer a variable used in the synthetic event creating hits a max, or simply changes.
The commands to create histograms and synthetic events are complex and not easy to remember. sqlhist is used to convert SQL syntax into the commands needed to create the histogram or synthetic event.
The SQL-select-command is a SQL string defined by tracefs_sql(3).
Note, this must be run as root (or sudo) as interacting with the tracefs directory requires root privilege, unless the -t option is given with a copy of the tracefs directory and its events.
The sqlhist is a simple program where its code actual exists in the tracefs_sql(3) man page.
-n name
-t tracefs-dir
# mkdir /tmp/tracing # cp -r /sys/kernel/tracing/events /tmp/tracing # exit $ ./sqlhist -t /tmp/tracing ...
-e
-f file
-m var
-c var
-s
-T
-S fields[,fields]
-B instance
Create the sqlhist executable:
man tracefs_sql | sed -ne '/^EXAMPLE/,/FILES/ { /EXAMPLE/d ; /FILES/d ; p}' > sqlhist.c
gcc -o sqlhist sqlhist.c `pkg-config --cflags --libs libtracefs`
As described above, for testing purposes, make a copy of the event directory:
$ mkdir /tmp/tracing
$ sudo cp -r /sys/kernel/tracing/events /tmp/tracing/
$ sudo chmod -R 0644 /tmp/tracing/
For an example of simple histogram output using the copy of the tracefs directory.
$ ./sqlhist -t /tmp/tracing/ 'SELECT CAST(call_site as SYM-OFFSET), bytes_req, CAST(bytes_alloc AS _COUNTER_) FROM kmalloc'
Produces the output:
echo 'hist:keys=call_site.sym-offset,bytes_req:vals=bytes_alloc' > /sys/kernel/tracing/events/kmem/kmalloc/trigger
Which could be used by root:
# echo 'hist:keys=call_site.sym-offset,bytes_req:vals=bytes_alloc' > /sys/kernel/tracing/events/kmem/kmalloc/trigger
# cat /sys/kernel/tracing/events/kmem/kmalloc/hist # event histogram # # trigger info: hist:keys=call_site.sym-offset,bytes_req:vals=hitcount,bytes_alloc:sort=hitcount:size=2048 [active] # { call_site: [ffffffff813f8d8a] load_elf_phdrs+0x4a/0xb0 , bytes_req: 728 } hitcount: 1 bytes_alloc: 1024 { call_site: [ffffffffc0c69e74] nf_ct_ext_add+0xd4/0x1d0 [nf_conntrack] , bytes_req: 128 } hitcount: 1 bytes_alloc: 128 { call_site: [ffffffff818355e6] dma_resv_get_fences+0xf6/0x440 , bytes_req: 8 } hitcount: 1 bytes_alloc: 8 { call_site: [ffffffffc06dc73f] intel_gt_get_buffer_pool+0x15f/0x290 [i915] , bytes_req: 424 } hitcount: 1 bytes_alloc: 512 { call_site: [ffffffff813f8d8a] load_elf_phdrs+0x4a/0xb0 , bytes_req: 616 } hitcount: 1 bytes_alloc: 1024 { call_site: [ffffffff8161a44c] __sg_alloc_table+0x11c/0x180 , bytes_req: 32 } hitcount: 1 bytes_alloc: 32 { call_site: [ffffffffc070749d] shmem_get_pages+0xad/0x5d0 [i915] , bytes_req: 16 } hitcount: 1 bytes_alloc: 16 { call_site: [ffffffffc07507f5] intel_framebuffer_create+0x25/0x60 [i915] , bytes_req: 408 } hitcount: 1 bytes_alloc: 512 { call_site: [ffffffffc06fc20f] eb_parse+0x34f/0x910 [i915] , bytes_req: 408 } hitcount: 1 bytes_alloc: 512 { call_site: [ffffffffc0700ebd] i915_gem_object_get_pages_internal+0x5d/0x270 [i915] , bytes_req: 16 } hitcount: 1 bytes_alloc: 16 { call_site: [ffffffffc0771188] intel_frontbuffer_get+0x38/0x220 [i915] , bytes_req: 400 } hitcount: 1 bytes_alloc: 512 { call_site: [ffffffff8161a44c] __sg_alloc_table+0x11c/0x180 , bytes_req: 128 } hitcount: 1 bytes_alloc: 128 { call_site: [ffffffff813f8f45] load_elf_binary+0x155/0x1680 , bytes_req: 28 } hitcount: 1 bytes_alloc: 32 { call_site: [ffffffffc07038c8] __assign_mmap_offset+0x208/0x3d0 [i915] , bytes_req: 288 } hitcount: 1 bytes_alloc: 512 { call_site: [ffffffff813737b2] alloc_bprm+0x32/0x2f0 , bytes_req: 416 } hitcount: 1 bytes_alloc: 512 { call_site: [ffffffff813f9027] load_elf_binary+0x237/0x1680 , bytes_req: 64 } hitcount: 1 bytes_alloc: 64 { call_site: [ffffffff8161a44c] __sg_alloc_table+0x11c/0x180 , bytes_req: 64 } hitcount: 1 bytes_alloc: 64 { call_site: [ffffffffc040ffe7] drm_vma_node_allow+0x27/0xe0 [drm] , bytes_req: 40 } hitcount: 2 bytes_alloc: 128 { call_site: [ffffffff813cda98] __do_sys_timerfd_create+0x58/0x1c0 , bytes_req: 336 } hitcount: 2 bytes_alloc: 1024 { call_site: [ffffffff818355e6] dma_resv_get_fences+0xf6/0x440 , bytes_req: 40 } hitcount: 2 bytes_alloc: 128 { call_site: [ffffffff8139b75a] single_open+0x2a/0xa0 , bytes_req: 32 } hitcount: 2 bytes_alloc: 64 { call_site: [ffffffff815df715] bio_kmalloc+0x25/0x80 , bytes_req: 136 } hitcount: 2 bytes_alloc: 384 { call_site: [ffffffffc071e5cd] i915_vma_work+0x1d/0x50 [i915] , bytes_req: 416 } hitcount: 3 bytes_alloc: 1536 { call_site: [ffffffff81390d0d] alloc_fdtable+0x4d/0x100 , bytes_req: 56 } hitcount: 3 bytes_alloc: 192 { call_site: [ffffffffc06ff65f] i915_gem_do_execbuffer+0x158f/0x2440 [i915] , bytes_req: 16 } hitcount: 4 bytes_alloc: 64 { call_site: [ffffffff8137713c] alloc_pipe_info+0x5c/0x230 , bytes_req: 384 } hitcount: 5 bytes_alloc: 2560 { call_site: [ffffffff813771b4] alloc_pipe_info+0xd4/0x230 , bytes_req: 640 } hitcount: 5 bytes_alloc: 5120 { call_site: [ffffffff81834cdb] dma_resv_list_alloc+0x1b/0x40 , bytes_req: 40 } hitcount: 6 bytes_alloc: 384 { call_site: [ffffffff81834cdb] dma_resv_list_alloc+0x1b/0x40 , bytes_req: 56 } hitcount: 9 bytes_alloc: 576 { call_site: [ffffffff8120086e] tracing_map_sort_entries+0x9e/0x3e0 , bytes_req: 24 } hitcount: 60 bytes_alloc: 1920 Totals:
Hits: 122
Entries: 30
Dropped: 0
Note, although the examples use uppercase for the SQL keywords, they do not have to be. SELECT could also be select or even sElEcT.
By using the full SQL language, synthetic events can be made and processed. For example, using sqlhist along with trace-cmd(1), wake up latency can be recorded by creating a synthetic event by attaching the sched_waking and the sched_switch events.
# sqlhist -n wakeup_lat -e -T -m lat 'SELECT end.next_comm AS comm, (end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) AS lat FROM ' \
'sched_waking AS start JOIN sched_switch AS end ON start.pid = end.next_pid WHERE end.next_prio < 100 && end.next_comm == "cyclictest"'
# trace-cmd start -e all -e wakeup_lat -R stacktrace
# cyclictest -l 1000 -p80 -i250 -a -t -q -m -d 0 -b 1000 --tracemark
# trace-cmd show -s | tail -30
<idle>-0 [002] dNh4 23454.902246: sched_wakeup: comm=cyclictest pid=12272 prio=120 target_cpu=002
<idle>-0 [005] ...1 23454.902246: cpu_idle: state=4294967295 cpu_id=5
<idle>-0 [007] d..1 23454.902246: cpu_idle: state=0 cpu_id=7
<idle>-0 [002] dNh1 23454.902247: hrtimer_expire_exit: hrtimer=0000000037956dc2
<idle>-0 [005] d..1 23454.902248: cpu_idle: state=0 cpu_id=5
<idle>-0 [002] dNh1 23454.902248: write_msr: 6e0, value 4866ce957272
<idle>-0 [006] ...1 23454.902248: cpu_idle: state=4294967295 cpu_id=6
<idle>-0 [002] dNh1 23454.902249: local_timer_exit: vector=236
<idle>-0 [006] d..1 23454.902250: cpu_idle: state=0 cpu_id=6
<idle>-0 [002] .N.1 23454.902250: cpu_idle: state=4294967295 cpu_id=2
<idle>-0 [002] dN.1 23454.902251: rcu_utilization: Start context switch
<idle>-0 [002] dN.1 23454.902252: rcu_utilization: End context switch
<idle>-0 [001] ...1 23454.902252: cpu_idle: state=4294967295 cpu_id=1
<idle>-0 [002] dN.3 23454.902253: prandom_u32: ret=3692516021
<idle>-0 [001] d..1 23454.902254: cpu_idle: state=0 cpu_id=1
<idle>-0 [002] d..2 23454.902254: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=cyclictest next_pid=12275 next_prio=19
<idle>-0 [002] d..4 23454.902256: wakeup_lat: next_comm=cyclictest lat=17
<idle>-0 [002] d..5 23454.902258: <stack trace>
=> trace_event_raw_event_synth
=> action_trace
=> event_hist_trigger
=> event_triggers_call
=> trace_event_buffer_commit
=> trace_event_raw_event_sched_switch
=> __traceiter_sched_switch
=> __schedule
=> schedule_idle
=> do_idle
=> cpu_startup_entry
=> secondary_startup_64_no_verify
Here’s the options for sqlhist explained:
-n wakeup_lat
-e
-T
-m lat
Now a breakdown of the SQL statement:
'SELECT end.next_comm AS comm, (end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) AS lat FROM ' \
'sched_waking AS start JOIN sched_switch AS end ON start.pid = end.next_pid WHERE end.next_prio < 100 && end.next_comm == "cyclictest"'
end.next_comm AS comm
(end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) AS lat
FROM sched_waking AS start JOIN sched_switch AS end ON start.pid = end.next_pid
WHERE end.next_prio < 100 && end.next_comm == "cyclictest"
For the trace-cmd(3) command:
trace-cmd start -e all -e wakeup_lat -R stacktrace
trace-cmd start
-e all
-e wakeup_lat -R stacktrace
After calling cyclictest (a real-time tool to measure wakeup latency), read the snapshot buffer.
trace-cmd show -s
<idle>-0 [002] d..4 23454.902256: wakeup_lat: next_comm=cyclictest lat=17
We see on the "wakeup_lat" event happened on CPU 2, with a wake up latency 17 microseconds.
This can be extracted into a trace.dat file that trace-cmd(3) can read and do further analysis, as well as kernelshark.
# trace-cmd extract -s
# trace-cmd report --cpu 2 | tail -30
<idle>-0 [002] 23454.902238: prandom_u32: ret=1633425088
<idle>-0 [002] 23454.902239: sched_wakeup: cyclictest:12275 [19] CPU:002
<idle>-0 [002] 23454.902241: hrtimer_expire_exit: hrtimer=0xffffbbd68286fe60
<idle>-0 [002] 23454.902241: hrtimer_cancel: hrtimer=0xffffbbd6826efe70
<idle>-0 [002] 23454.902242: hrtimer_expire_entry: hrtimer=0xffffbbd6826efe70 now=23455294430750 function=hrtimer_wakeup/0x0
<idle>-0 [002] 23454.902243: sched_waking: comm=cyclictest pid=12272 prio=120 target_cpu=002
<idle>-0 [002] 23454.902244: prandom_u32: ret=1102749734
<idle>-0 [002] 23454.902246: sched_wakeup: cyclictest:12272 [120] CPU:002
<idle>-0 [002] 23454.902247: hrtimer_expire_exit: hrtimer=0xffffbbd6826efe70
<idle>-0 [002] 23454.902248: write_msr: 6e0, value 4866ce957272
<idle>-0 [002] 23454.902249: local_timer_exit: vector=236
<idle>-0 [002] 23454.902250: cpu_idle: state=4294967295 cpu_id=2
<idle>-0 [002] 23454.902251: rcu_utilization: Start context switch
<idle>-0 [002] 23454.902252: rcu_utilization: End context switch
<idle>-0 [002] 23454.902253: prandom_u32: ret=3692516021
<idle>-0 [002] 23454.902254: sched_switch: swapper/2:0 [120] R ==> cyclictest:12275 [19]
<idle>-0 [002] 23454.902256: wakeup_lat: next_comm=cyclictest lat=17
<idle>-0 [002] 23454.902258: kernel_stack: <stack trace > => trace_event_raw_event_synth (ffffffff8121a0db) => action_trace (ffffffff8121e9fb) => event_hist_trigger (ffffffff8121ca8d) => event_triggers_call (ffffffff81216c72) => trace_event_buffer_commit (ffffffff811f7618) => trace_event_raw_event_sched_switch (ffffffff8110fda4) => __traceiter_sched_switch (ffffffff8110d449) => __schedule (ffffffff81c02002) => schedule_idle (ffffffff81c02c86) => do_idle (ffffffff8111e898) => cpu_startup_entry (ffffffff8111eba9) => secondary_startup_64_no_verify (ffffffff81000107)
As sqlhist is just example code from a man page, it is guaranteed to contain lots of bugs. For one thing, not all error paths are covered properly.
Written by Steven Rostedt, <rostedt@goodmis.org[1]>
https://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git/
Copyright (C) 2021 , Inc. Free use of this software is granted under the terms of the GNU Public License (GPL).
01/17/2023 | libtracefs 1.6.4 |