LIKWID-BENCH(1) | General Commands Manual | LIKWID-BENCH(1) |
likwid-bench - low-level benchmark suite and microbenchmarking framework
likwid-bench [-hap] [-t <testname>] [-s <min_time>] [-w <workgroup_expression>] [-l <testname>] [-d <delimiter>] [-i <iterations>]
likwid-bench is a benchmark suite for low-level (assembly) benchmarks that measure bandwidth and instruction throughput of specific instruction sequences on x86 systems. The included benchmark codes cover common data access patterns such as load and store as well as calculations such as vector triad and sum. likwid-bench includes architecture-specific benchmarks for x86, x86_64 and x86 for Intel Xeon Phi coprocessors. The performance values can either be calculated by likwid-bench itself or measured with performance counters by using likwid-perfctr as a wrapper to likwid-bench. The latter requires building likwid-bench with instrumentation enabled in config.mk.
<thread_domain>:<size> [:<num_threads>[:<chunk_size>:<stride>]] [-<streamId>:<domain_id>] with size in kB, MB or GB. The <thread_domain> defines where the threads are placed. <size> is the total data set size for the benchmark; the vectors allocated in memory sum up to this size. <num_threads> specifies how many threads are used in the <thread_domain>. Threads are always placed using a compact policy in likwid-bench, which means that by default all SMT threads are used. Optionally, similar to the expression-based syntax in likwid-pin, a <chunk_size> and <stride> can be provided. The placement can also be controlled for every stream (array, vector). By default all arrays are placed in the same <thread_domain> the threads are running in. To place the data in a different domain, specify the target domain for every stream of a benchmark case (the total number of streams can be obtained with the -l option). Multiple streams are comma separated. Either no placement is provided, or all streams have to be placed explicitly. Please refer to the Wiki pages on http://code.google.com/p/likwid/wiki/LikwidBench for further details and usage examples.
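The workgroup expression syntax described above can be illustrated with a small parser. This is a minimal sketch in Python, not the parser used by likwid-bench: the names `Workgroup` and `parse_workgroup` are hypothetical, the sketch assumes that a domain name is a single letter followed by an optional number (for example N or S0), and it keeps the size as a number plus unit without converting it to bytes.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumption: a domain name is one letter plus an optional index (N, S0, M1, ...).
_DOMAIN = r"[A-Za-z]\d*"

_WORKGROUP_RE = re.compile(
    rf"^(?P<domain>{_DOMAIN})"
    r":(?P<size>\d+)(?P<unit>kB|MB|GB)"
    r"(?::(?P<threads>\d+)(?::(?P<chunk>\d+):(?P<stride>\d+))?)?"
    rf"(?:-(?P<streams>\d+:{_DOMAIN}(?:,\d+:{_DOMAIN})*))?$"
)


@dataclass
class Workgroup:
    domain: str                        # <thread_domain>
    size: int                          # numeric part of <size>
    unit: str                          # kB, MB or GB
    num_threads: Optional[int] = None  # None: not given in the expression
    chunk_size: Optional[int] = None
    stride: Optional[int] = None
    # Maps <streamId> to <domain_id>; empty if no placement was given.
    streams: Dict[int, str] = field(default_factory=dict)


def parse_workgroup(expr: str) -> Workgroup:
    """Split a workgroup expression into its fields (illustrative only)."""
    m = _WORKGROUP_RE.match(expr)
    if m is None:
        raise ValueError(f"not a valid workgroup expression: {expr!r}")

    streams: Dict[int, str] = {}
    if m.group("streams"):
        # Stream placements are comma separated, each one <streamId>:<domain_id>.
        for item in m.group("streams").split(","):
            stream_id, domain_id = item.split(":")
            streams[int(stream_id)] = domain_id

    def as_int(name: str) -> Optional[int]:
        value = m.group(name)
        return int(value) if value is not None else None

    return Workgroup(
        domain=m.group("domain"),
        size=int(m.group("size")),
        unit=m.group("unit"),
        num_threads=as_int("threads"),
        chunk_size=as_int("chunk"),
        stride=as_int("stride"),
        streams=streams,
    )
```

With this sketch, `parse_workgroup("S0:1GB")` yields a workgroup without an explicit thread count, while `parse_workgroup("S0:1GB:2:1:2-0:S1,1:S1")` additionally carries two threads, a chunk size of 1, a stride of 2 and both streams placed in S1. The sketch does not check that every stream of a benchmark is placed; that would require the stream count of the selected test.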
Since no <num_threads> is given in the workload expression, each core of socket 0 gets one thread. The workload is split up between all threads and the number of iterations is determined automatically.
Assuming socket 0 ( S0 ) has 2 physical cores with SMT enabled, hence in total 4 hardware threads, one thread is assigned to each physical core of socket 0.
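The effect of <chunk_size> and <stride> in this scenario can be sketched as a selection over the hardware threads of the domain. This is an illustration under assumptions, not the likwid-bench implementation: the function name `select_hwthreads` and the labels c0t0 to c1t1 are hypothetical, the domain is assumed to list SMT siblings next to each other (compact order), and the sketch raises an error instead of wrapping around when the domain is exhausted.

```python
from typing import List, Sequence


def select_hwthreads(
    domain: Sequence[str],
    num_threads: int,
    chunk_size: int = 1,
    stride: int = 1,
) -> List[str]:
    """Take chunk_size consecutive entries, then advance by stride, and repeat."""
    if chunk_size < 1 or stride < chunk_size:
        raise ValueError("need chunk_size >= 1 and stride >= chunk_size")

    selected: List[str] = []
    start = 0
    while len(selected) < num_threads and start < len(domain):
        for hwthread in domain[start:start + chunk_size]:
            if len(selected) < num_threads:
                selected.append(hwthread)
        start += stride

    if len(selected) < num_threads:
        raise ValueError("domain has too few hardware threads for this selection")
    return selected


# Assumed layout of S0: 2 physical cores with 2 SMT threads each, compact order.
s0 = ["c0t0", "c0t1", "c1t0", "c1t1"]

compact = select_hwthreads(s0, 2)        # default compact placement
spread = select_hwthreads(s0, 2, 1, 2)   # chunk size 1, stride 2
```

Under these assumptions the compact default fills both SMT threads of the first core, whereas a chunk size of 1 with a stride of 2 picks the first hardware thread of each physical core, which matches the placement described above.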
The results of both workgroups are combined for the output. Hence the workload in each workgroup expression should have the same size.
likwid-perfctr will configure and start the performance counters on socket 0 ( S0 ) with 4 threads prior to the execution of likwid-bench. The performance counters are read right before and after running the benchmarking code to minimize interference from the measurement.
Streams 0 and 1 are placed in thread domain S1, which is socket 1. This can be verified because the initialization threads print where they are running.
Written by Thomas Roehl <thomas.roehl@googlemail.com>.
Report Bugs on <https://github.com/RRZE-HPC/likwid/issues>.
likwid-perfctr(1), likwid-pin(1), likwid-topology(1), likwid-setFrequencies(1)
26.11.2018 | likwid-4 |