SGE_SHEPHERD(8) | Grid Engine Administrative Commands | SGE_SHEPHERD(8) |
sge_shepherd - Grid Engine single job-controlling agent
sge_shepherd
sge_shepherd provides the parent process functionality for a single Grid Engine job. The parent functionality is necessary on UNIX systems to retrieve resource usage information (see getrusage(2)) after a job has finished. In addition, sge_shepherd forwards signals to the job, such as those for suspension, enabling, and termination, and the Grid Engine checkpointing signal (see sge_ckpt(1) and queue_conf(5) for details).
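The parent role described above can be illustrated with a minimal Python sketch. It is not the sge_shepherd implementation: the function name run_with_usage and the choice of forwarded signals are assumptions made for illustration only. The sketch starts one child, relays selected signals to it, and reads the accumulated child resource usage once the child has been waited for.

```python
import resource
import signal
import subprocess


def run_with_usage(argv, forwarded=(signal.SIGTERM, signal.SIGUSR1)):
    """Run argv as a child process, forward selected signals to it, and
    return (exit_status, user_cpu_seconds, system_cpu_seconds).

    Hypothetical helper; the forwarded signal set is an assumption.
    Must be called from the main thread, since it installs signal handlers.
    """
    # Usage of already-reaped children, so only this child is reported.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    child = subprocess.Popen(argv)

    def forward(signum, _frame):
        # Relay a signal received by the parent to the child.
        child.send_signal(signum)

    previous = {sig: signal.signal(sig, forward) for sig in forwarded}
    try:
        status = child.wait()
    finally:
        # Restore the handlers that were active before the call.
        for sig, handler in previous.items():
            signal.signal(sig, handler)

    # Usage is only accounted to the parent after the child was waited for.
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (status,
            after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)
```

Usage is read only after the wait because the kernel accounts a child's usage to its parent when the child is reaped, which is why a dedicated parent process per job is needed.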
sge_shepherd receives information about the job to be started from sge_execd(8). During the execution of the job, it starts up to five child processes. First, a prolog script is run if this feature is enabled by the prolog parameter in the cluster configuration (see sge_conf(5)). Next, a parallel environment startup procedure is run if the job is a parallel job (see sge_pe(5) for more information). After that, the job itself is run, followed by a parallel environment shutdown procedure for parallel jobs, and finally an epilog script if requested by the epilog parameter in the cluster configuration. The prolog and epilog scripts, as well as the parallel environment startup and shutdown procedures, are to be provided by the Grid Engine administrator and are intended for site-specific actions to be taken before and after execution of the actual user job.
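The order of the five child processes can be sketched as follows. This is an illustration under stated assumptions, not the actual sge_shepherd logic: the stage names, the commands dictionary, and the run_stages function are hypothetical, and the sketch does not model what happens when a stage fails.

```python
import subprocess

# Stage order as described above; the identifiers are illustrative only.
STAGE_ORDER = ("prolog", "pe_start", "job", "pe_stop", "epilog")


def run_stages(commands, is_parallel):
    """Run the configured stages in order.

    commands maps a stage name to an argv list; stages without an entry
    are treated as not configured. Returns a list of (stage, exit_status).
    """
    results = []
    for stage in STAGE_ORDER:
        # Parallel environment procedures only apply to parallel jobs.
        if stage in ("pe_start", "pe_stop") and not is_parallel:
            continue
        argv = commands.get(stage)
        if argv is None:
            # For example, no prolog or epilog set in the configuration.
            continue
        results.append((stage, subprocess.call(argv)))
    return results
```

For a serial job with only a prolog and an epilog configured, three child processes are started; a parallel job with every stage configured reaches the maximum of five.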
After the job has finished and the epilog script is processed, sge_shepherd retrieves resource usage statistics about the job, places them in a job-specific subdirectory of the sge_execd(8) spool directory for reporting through sge_execd(8), and finishes.
sge_shepherd also places an exit status file in the spool directory. This exit status can be viewed with qacct -j JobId (see qacct(1)); it is not the exit status of sge_shepherd itself but of one of the methods executed by sge_shepherd. This exit status can have several meanings, depending on the method in which an error occurred (if any). The possible methods are: prolog, parallel start, job, parallel stop, epilog, suspend, restart, terminate, clean, migrate, and checkpoint.
The following exit values are returned:
For the meaning of the return codes of the shepherd itself (which are interpreted by qacct(1)) see sge_status(5).
sge_shepherd should not be invoked manually, but only by sge_execd(8).
The name of the default cell, i.e. default.
sgepasswd contains a list of user names and their corresponding encrypted passwords. If available, the password file will be used by sge_shepherd. To change the contents of this file please use the sgepasswd command. It is not advised to change that file manually.
<execd_spool>/job_dir/<job_id>        job specific directory
<sge_root>/<cell>/common/sgepasswd    Password information used on Microsoft Windows hosts. See sgepasswd(5).
sge_intro(1), sge_conf(5), sge_status(5), remote_startup(5), sgepasswd(5), sge_execd(8).
See sge_intro(1) for a full statement of rights and permissions.
$Date: 2007-07-19 09:04:33 $ | SGE 8.1.3pre |