gmx-tune_pme - Time mdrun as a function of PME ranks to optimize settings
gmx tune_pme [-s [<.tpr>]] [-cpi [<.cpt>]] [-table [<.xvg>]]
[-tablep [<.xvg>]] [-tableb [<.xvg>]]
[-rerun [<.xtc/.trr/...>]] [-ei [<.edi>]] [-p [<.out>]]
[-err [<.log>]] [-so [<.tpr>]] [-o [<.trr/.cpt/...>]]
[-x [<.xtc/.tng>]] [-cpo [<.cpt>]]
[-c [<.gro/.g96/...>]] [-e [<.edr>]] [-g [<.log>]]
[-dhdl [<.xvg>]] [-field [<.xvg>]] [-tpi [<.xvg>]]
[-tpid [<.xvg>]] [-eo [<.xvg>]] [-px [<.xvg>]]
[-pf [<.xvg>]] [-ro [<.xvg>]] [-ra [<.log>]]
[-rs [<.log>]] [-rt [<.log>]] [-mtx [<.mtx>]]
[-swap [<.xvg>]] [-bo [<.trr/.cpt/...>]] [-bx [<.xtc>]]
[-bcpo [<.cpt>]] [-bc [<.gro/.g96/...>]] [-be [<.edr>]]
[-bg [<.log>]] [-beo [<.xvg>]] [-bdhdl [<.xvg>]]
[-bfield [<.xvg>]] [-btpi [<.xvg>]] [-btpid [<.xvg>]]
[-bdevout [<.xvg>]] [-brunav [<.xvg>]] [-bpx [<.xvg>]]
[-bpf [<.xvg>]] [-bro [<.xvg>]] [-bra [<.log>]]
[-brs [<.log>]] [-brt [<.log>]] [-bmtx [<.mtx>]]
[-bdn [<.ndx>]] [-bswap [<.xvg>]] [-xvg <enum>]
[-mdrun <string>] [-np <int>] [-npstring <enum>]
[-ntmpi <int>] [-r <int>] [-max <real>] [-min <real>]
[-npme <enum>] [-fix <int>] [-rmax <real>]
[-rmin <real>] [-[no]scalevdw] [-ntpr <int>]
[-steps <int>] [-resetstep <int>] [-nsteps <int>]
[-[no]launch] [-[no]bench] [-[no]check]
[-gpu_id <string>] [-[no]append] [-[no]cpnum]
[-deffnm <string>]
For a given number -np or -ntmpi of ranks, gmx tune_pme systematically times gmx mdrun with various numbers of PME-only ranks and determines which setting is fastest. It will also test whether performance can be enhanced by shifting load from the reciprocal to the real space part of the Ewald sum. Simply pass your .tpr file to gmx tune_pme together with other options for gmx mdrun as needed.
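For example, a minimal invocation for a thread-MPI build of GROMACS might look like the following; topol.tpr is only a placeholder for your own run input file:
gmx tune_pme -ntmpi 8 -s topol.tpr    # topol.tpr: placeholder input file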
gmx tune_pme needs to call gmx mdrun and so requires that you specify how to call mdrun with the argument to the -mdrun parameter. Depending on how you have built GROMACS, values such as 'gmx mdrun', 'gmx_d mdrun', or 'gmx_mpi mdrun' might be needed.
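For example, if the benchmarks should be run with an MPI-enabled binary, the mdrun call could be specified as follows; the binary name depends on your installation, and topol.tpr is again a placeholder:
gmx tune_pme -np 64 -s topol.tpr -mdrun 'gmx_mpi mdrun'    # binary name and file name are examples only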
The program that runs MPI programs can be set in the environment variable MPIRUN (defaults to 'mpirun'). Note that for certain MPI frameworks, you need to provide a machine- or hostfile. This can also be passed via the MPIRUN variable, e.g.
export MPIRUN="/usr/local/mpirun -machinefile hosts"

Note that in such cases it is normally necessary to compile and/or run gmx tune_pme without MPI support, so that it can call the MPIRUN program.
Before doing the actual benchmark runs, gmx tune_pme will do a quick check whether gmx mdrun works as expected with the provided parallel settings if the -check option is activated (the default). Please call gmx tune_pme with the normal options you would pass to gmx mdrun and add -np for the number of ranks to perform the tests on, or -ntmpi for the number of threads. You can also add -r to repeat each test several times to get better statistics.
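For example, to repeat each benchmark twice for better statistics, with the consistency check left at its default (enabled), a call could look like this; the file name is a placeholder:
gmx tune_pme -np 32 -r 2 -s topol.tpr    # -r 2 repeats each test twice; topol.tpr is a placeholder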
gmx tune_pme can test various real space / reciprocal space workloads for you. With -ntpr you control how many extra .tpr files will be written with enlarged cutoffs and correspondingly smaller Fourier grids. Typically, the first test (number 0) will be with the settings from the input .tpr file; the last test (number ntpr) will have the Coulomb cutoff specified by -rmax with a somewhat smaller PME grid at the same time. In this last test, the Fourier spacing is multiplied by rmax/rcoulomb. The remaining .tpr files will have equally spaced Coulomb radii (and Fourier spacings) between these extremes. Note that you can set -ntpr to 1 if you just seek the optimal number of PME-only ranks; in that case your input .tpr file will remain unchanged.
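For example, to benchmark a few additional cutoff/grid combinations with Coulomb radii between 0.9 nm and 1.2 nm, a call could look like this (the cutoff values and file name are only illustrative):
gmx tune_pme -np 64 -s topol.tpr -ntpr 4 -rmin 0.9 -rmax 1.2    # illustrative cutoffs and file name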
For the benchmark runs, the default of 1000 time steps should suffice for most MD systems. The dynamic load balancing needs about 100 time steps to adapt to local load imbalances; therefore, the time step counters are by default reset after 100 steps. For large systems (>1M atoms), as well as for higher accuracy of the measurements, you should set -resetstep to a higher value. From the 'DD' load imbalance entries in the md.log output file you can tell after how many steps the load is sufficiently balanced. Example call:
gmx tune_pme -np 64 -s protein.tpr -launch
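For a large system, you might additionally time more steps and reset the counters later, for example (the step counts here are only illustrative):
gmx tune_pme -np 64 -s protein.tpr -steps 3000 -resetstep 1500    # illustrative step counts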
After gmx mdrun has been called several times, detailed performance information is available in the output file perf.out. Note that during the benchmarks a couple of temporary files are written (options -b*); these are automatically deleted after each test.
If you want the simulation to be started automatically with the optimized parameters, use the command line option -launch.
Basic support for GPU-enabled mdrun exists. Pass a string containing the IDs of the GPUs that you wish to use in the optimization to the -gpu_id command-line argument. This works exactly like mdrun -gpu_id: it does not imply a mapping, but merely declares the eligible set of GPU devices. gmx tune_pme will construct calls to mdrun that use this set appropriately. gmx tune_pme does not support -gputasks.
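For example, to restrict the benchmarks to the GPUs with IDs 0 and 1 (assuming two such devices are present on the node; the file name is a placeholder):
gmx tune_pme -np 8 -s topol.tpr -gpu_id 01    # GPU IDs and file name are examples only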
Options to specify input files:
Options to specify output files:
Other options:
More information about GROMACS is available at <http://www.gromacs.org/>.
2023, GROMACS development team
February 3, 2023 | GROMACS 2022.5