SGE_PE(5) | Grid Engine File Formats | SGE_PE(5) |
sge_pe - Grid Engine parallel environment configuration file format
Parallel environments are parallel programming and runtime environments supporting the execution of shared memory or distributed memory parallelized applications. Parallel environments usually require some kind of setup to be operational before starting parallel applications. Examples of common parallel environments are OpenMP on shared memory multiprocessor systems, and Message Passing Interface (MPI) on shared memory or distributed systems.
sge_pe allows for the definition of interfaces to arbitrary parallel environments. Once a parallel environment is defined or modified with the -ap or -mp options to qconf(1), and linked with one or more queues via pe_list in queue_conf(5), the environment can be requested for a job via the -pe switch to qsub(1), together with a request for a numeric range of parallel processes to be allocated by the job. Additional -l options may be used to specify more detailed job requirements.
Note that Grid Engine allows a backslash (\) to be used to escape a newline character. The backslash and the newline are replaced with a space character before any interpretation.
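The continuation rule can be illustrated with a minimal Python sketch. It is not Grid Engine's own parser; the function name join_continuation_lines and the access list names acl_a and acl_b are made up for the example.

```python
def join_continuation_lines(text: str) -> str:
    """Replace every backslash-newline pair with a single space."""
    return text.replace("\\\n", " ")


# Hypothetical two-line entry; "acl_a" and "acl_b" are made-up names.
raw = "user_lists acl_a,\\\nacl_b\n"
print(join_continuation_lines(raw), end="")
```

After the replacement, the entry is a single logical line that can be interpreted as usual.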
The format of a sge_pe file is defined as follows:
The name of the parallel environment in the format for pe_name in sge_types(1). To be used in the qsub(1) -pe switch.
The total number of slots (normally one per parallel process or thread) allowed to be filled concurrently under the parallel environment. Type is integer, valid values are 0 to 9999999.
A comma-separated list of user access list names (see access_list(5)).
Each user contained in at least one of the user_lists access lists has access to the parallel environment. If the user_lists parameter is set to NONE (the default) any user has access if not explicitly excluded via the xuser_lists parameter.
Each user contained in at least one of the xuser_lists access lists is not allowed to access the parallel environment. If the xuser_lists parameter is set to NONE (the default) any user has access.
If a user is contained both in an access list in xuser_lists and in one in user_lists, the user is denied access to the parallel environment.
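The interaction of user_lists and xuser_lists can be sketched as follows. This is an illustration of the rules above, not the qmaster implementation: the function name pe_access_allowed is hypothetical, NONE is modeled as an empty set, and the sketch assumes that a user who is in none of the lists of a non-empty user_lists is denied access.

```python
def pe_access_allowed(user_acls: set, user_lists: set, xuser_lists: set) -> bool:
    """Decide PE access for one user.

    user_acls   -- names of the access lists that contain the user
    user_lists  -- the PE's user_lists (empty set stands for NONE)
    xuser_lists -- the PE's xuser_lists (empty set stands for NONE)
    """
    # Exclusion wins, even if the user is also in a user_lists entry.
    if user_acls & xuser_lists:
        return False
    # user_lists NONE: everyone who is not excluded has access.
    if not user_lists:
        return True
    # Assumption: otherwise the user must be in at least one listed ACL.
    return bool(user_acls & user_lists)


# Hypothetical access list names, for illustration only.
print(pe_access_allowed({"staff"}, {"staff"}, set()))
print(pe_access_allowed({"staff", "guests"}, {"staff"}, {"guests"}))
```

The first call grants access; the second denies it because the exclusion takes precedence.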
The command line of the startup or shutdown procedure, respectively (an executable command, plus possible arguments), for the parallel environment, or "none" for no procedure (typically for tightly integrated PEs). The command line is started directly, not in a shell. An optional prefix "user@" specifies the username under which the procedure is to be started. In that case, see the SECURITY section below concerning the security issues of running as a privileged user.
The startup procedure is invoked by sge_shepherd(8) on the master node of the job prior to executing the job script. Its purpose is to set up the parallel environment according to its needs. The shutdown procedure is invoked by sge_shepherd(8) after the job script has finished. Its purpose is to stop the parallel environment and to remove it from all participating systems. The standard output of the procedure is redirected to the file REQUEST.poJID in the job's working directory (see qsub(1)), with REQUEST being the name of the job as displayed by qstat(1), and JID being the job's identification number. Likewise, the standard error output is redirected to REQUEST.peJID. If the -e or -o options are given on job submission, the PE error and standard output are merged into the paths specified.
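As a small illustration of the default naming scheme for the PE output files, the following Python sketch builds both names from a job name and job ID. The function name pe_output_files and the sample job are hypothetical, and the sketch ignores the -e and -o options.

```python
def pe_output_files(job_name: str, job_id: int) -> tuple:
    """Return the default (stdout, stderr) file names of the PE procedures."""
    return f"{job_name}.po{job_id}", f"{job_name}.pe{job_id}"


# Hypothetical job called "mpi_test" with job ID 4711.
stdout_file, stderr_file = pe_output_files("mpi_test", 4711)
print(stdout_file)  # mpi_test.po4711
print(stderr_file)  # mpi_test.pe4711
```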
The following special variables, expanded at runtime, can be used (besides any other strings which have to be interpreted by the start and stop procedures) to constitute a command line:
The start and stop commands are run with the same environment setting as that of the job to be started afterwards (see qsub(1)).
The allocation rule is interpreted by the scheduler thread and helps the scheduler to decide how to distribute parallel processes among the available machines. If, for instance, a parallel environment is built for shared memory applications only, all parallel processes have to be assigned to a single machine, no matter how many suitable machines are available. If, however, the parallel environment follows the distributed memory paradigm, an even distribution of processes among machines may be favorable, as may packing processes onto the minimum number of machines.
The current version of the scheduler only understands the following allocation rules:
This parameter can be set to TRUE or FALSE (the default). It indicates whether Grid Engine is the creator of the slave tasks of a parallel application via sge_execd(8) and sge_shepherd(8) and thus has full control over all processes in a parallel application ("tight integration"). This enables:
To gain control over the slave tasks of a parallel application, a sophisticated PE interface is required, which works closely together with Grid Engine facilities, typically interpreting the Grid Engine hostfile and starting remote tasks with qrsh(1) and its -inherit option. See, for instance, the $SGE_ROOT/mpi directory and the howto pages ⟨URL: http://arc.liv.ac.uk/SGE/howto/#Tight%20Integration%20of%20Parallel%20Libraries ⟩.
Please set the control_slaves parameter to false for all other PE interfaces.
The job_is_first_task parameter can be set to TRUE or FALSE. A value of TRUE indicates that the Grid Engine job script already contains one of the tasks of the parallel application (and the number of slots reserved for the job is the number of slots requested with the -pe switch). FALSE indicates that the job script (and its child processes) is not part of the parallel program, just being used to kick off the tasks that do the work; then the number of slots reserved for the job in the master queue is increased by 1, as indicated by qstat/qhost.
This should be TRUE for the common modern MPI implementations with tight integration. Consider if the allocation rule is $fill_up, and a job is allocated only a single slot on the master host; then one of the MPI processes actually runs in that slot, and should be accounted as such, so the job is the first task.
If wallclock accounting is used (execd_params ACCT_RESERVED_USAGE and/or SHARETREE_RESERVED_USAGE is TRUE) and control_slaves is set to FALSE, the job_is_first_task parameter influences the accounting for the job: a value of TRUE means that accounting for CPU and requested memory is multiplied by the number of slots requested with the -pe switch. FALSE means the accounting information is multiplied by the number of slots + 1. Otherwise, the only significant effect of the parameter is on the display of the job.
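The multiplier used for wallclock accounting can be sketched in a few lines of Python. The function name reserved_usage_factor is made up, and the sketch only covers the case described above (reserved usage enabled and control_slaves set to FALSE).

```python
def reserved_usage_factor(requested_slots: int, job_is_first_task: bool) -> int:
    """Factor applied to CPU and requested-memory accounting.

    Applies only when reserved usage is enabled and control_slaves is FALSE.
    """
    if job_is_first_task:
        return requested_slots
    # The job script is not one of the parallel tasks: one extra slot.
    return requested_slots + 1


print(reserved_usage_factor(8, True))   # 8
print(reserved_usage_factor(8, False))  # 9
```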
For pending jobs with a slot range PE request with different minimum and maximum, the number of slots they will actually use is not determined. This setting specifies the method to be used by Grid Engine to assess the number of slots such jobs might finally get.
The assumed slot allocation is relevant when determining the resource-request-based priority contribution for numeric resources, as described in sge_priority(5), and is displayed when qstat(1) is run without the -g t option.
The following methods are supported:
This parameter is only checked if control_slaves (see above) is set to TRUE and thus Grid Engine is the creator of the slave tasks of a parallel application via sge_execd(8) and sge_shepherd(8). In this case, accounting information is available for every single slave task started by Grid Engine.
The accounting_summary parameter can be set to TRUE or FALSE. A value of TRUE indicates that only a single accounting record is written to the accounting(5) file, containing the accounting summary of the whole job, including all slave tasks, while a value of FALSE indicates an individual accounting(5) record is written for every slave task, as well as for the master task.
Note: When running tightly integrated jobs with SHARETREE_RESERVED_USAGE set, and accounting_summary enabled in the parallel environment, reserved usage will only be reported by the master task of the parallel job. No per-parallel task usage records will be sent from execd to qmaster, which can significantly reduce load on the qmaster when running large, tightly integrated parallel jobs. However, this removes the only post-hoc information about which hosts a job used.
Specifies how the queues/hosts, and the order in which they are considered, are chosen when scheduling a parallel job. For details, and the API, consult the header file $SGE_ROOT/include/sge_pqs_api.h. library is the path to the qsort dynamic library, qsort-function is the name of the qsort function implemented by the library, and the args are arguments passed to qsort. Substitutions from the hard requested resource list for the job are made for any strings of the form $resource, where resource is the full name of the resource as defined in the complex(5) list. If resource is not requested in the job, a null string is substituted.
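The $resource substitution can be approximated with the Python sketch below. It is only an illustration: the function name substitute_resources, the sample argument string, and the regular expression used to recognize a resource name are assumptions, not taken from the Grid Engine sources.

```python
import re

# Assumption: a resource name consists of letters, digits and underscores.
_RESOURCE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def substitute_resources(args: str, hard_requests: dict) -> str:
    """Replace each $resource with its hard-requested value.

    Resources that the job did not request become an empty string.
    """
    return _RESOURCE.sub(lambda m: hard_requests.get(m.group(1), ""), args)


# Hypothetical argument string and hard resource request.
print(substitute_resources("-m $mem_free -a $arch", {"mem_free": "4G"}))
```

Here $mem_free is replaced with the requested value, while $arch, which was not requested, is replaced with a null string.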
Note that the functionality of the start and stop procedures remains the full responsibility of the administrator configuring the parallel environment. Grid Engine will invoke these procedures and evaluate their exit status. A non-zero exit status will put the queue into an error state. If the start procedure has a non-zero exit status, the job will be re-queued.
If start_proc_args or stop_proc_args is specified with a user@ prefix, the same considerations apply as for the prolog and epilog, as described in the SECURITY section of sge_conf(5).
sge_intro(1), sge_types(1), qconf(1), qdel(1), qmod(1), qrsh(1), qsub(1), access_list(5), sge_conf(5), sge_qmaster(8), sge_shepherd(8).
$SGE_ROOT/include/sge_pqs_api.h
See sge_intro(1) for a full statement of rights and permissions.
2012-09-11 | SGE 8.1.3pre |