SGE_CONF(5)    Grid Engine File Formats    SGE_CONF(5)
sge_conf - Grid Engine configuration files
sge_conf defines the global and local Grid Engine configurations and can be shown/modified by qconf(1) using the -sconf/-mconf options. Only root or the cluster administrator may modify sge_conf.
At its initial start-up, sge_qmaster(8) checks to see if a valid Grid Engine configuration is available at a well-known location in the Grid Engine internal directory hierarchy. If so, it loads that configuration information and proceeds. If not, sge_qmaster(8) writes a generic configuration containing default values to that same location. The Grid Engine execution daemons sge_execd(8) retrieve their configuration from sge_qmaster(8) upon start-up.
The actual configuration for both sge_qmaster(8) and sge_execd(8) is a superposition of a global configuration and a local configuration pertinent to the host on which a master or execution daemon resides. If a local configuration is available, its entries overwrite the corresponding entries of the global configuration. Note: The local configuration does not have to contain all valid configuration entries, but only those which need to differ from the global entries.
Note: Grid Engine allows backslashes (\) to be used to escape newline characters. The backslash and the newline are replaced with a space (" ") character before any interpretation.
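The superposition rule and the backslash continuation described above can be sketched in shell. This is an illustration only, not how sge_qmaster(8) implements either step. It assumes each configuration is a text file holding one "name value" entry per logical line, with optional "#" comment lines; the function names join_continuations and overlay_conf are hypothetical.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names); not part of Grid Engine.

# Replace each backslash-newline pair on stdin with a single space.
join_continuations() {
    awk '{ if (sub(/\\$/, " ")) printf "%s", $0; else print }'
}

# Print the effective configuration: entries of the local file ($2)
# replace entries of the global file ($1) that have the same name.
overlay_conf() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }             # skip comments and blank lines
        is_local { local_entry[$1] = $0; next }       # remember every local entry
        ($1 in local_entry) {                         # global entry overridden locally
            print local_entry[$1]; delete local_entry[$1]; next
        }
        { print }                                     # global entry kept as is
        END { for (name in local_entry) print local_entry[name] }  # local-only entries
    ' is_local=1 "$2" is_local=0 "$1"
}
```

The two input files could, for instance, be produced with qconf -sconf global and qconf -sconf followed by a host name, each piped through join_continuations first.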
The paragraphs that follow provide brief descriptions of the individual parameters that compose the global and local configurations for a Grid Engine cluster:
execd_spool_dir is the execution daemon spool directory path. A feasible spool directory requires read/write access permission for root. The entry in the global configuration for this parameter can be overwritten by execution host local configurations, i.e. each sge_execd(8) may have a private spool directory with a different path, in which case it needs to provide read/write permission for the root account of the corresponding execution host only.
Under execd_spool_dir, a directory named after the unqualified hostname of the execution host is created and contains all information spooled to disk. Thus, it is possible for the execd_spool_dirs of all execution hosts to physically reference the same directory path (the root access restrictions mentioned above need to be met, however).
Changing the global execd_spool_dir parameter set at installation time is not supported in a running system. If the change is made anyway, all affected execution daemons must be restarted. Make sure running jobs have finished before doing so; otherwise they will be lost.
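As a small illustration of the naming rule above, the per-host directory can be derived by stripping the domain part from the host name. The function name execd_spool_path and the example paths are hypothetical.

```shell
#!/bin/sh
# Print the per-host spool directory for a given execd_spool_dir ($1)
# and host name ($2); only the unqualified host name is used.
execd_spool_path() {
    printf '%s/%s\n' "$1" "${2%%.*}"
}
```

For example, execd_spool_path /opt/sge/default/spool node01.example.com prints /opt/sge/default/spool/node01.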
The default location for the execution daemon spool directory is $SGE_ROOT/$SGE_CELL/spool.
The global configuration entry for this value may be overwritten by the execution host local configuration.
mailer is the absolute pathname to the electronic mail delivery agent on your system. An optional prefix "user@" specifies the user under which this procedure is to be started; the default is root. The mailer must accept the following syntax:
Each sge_execd(8) may use a private mail agent. Changing mailer will take immediate effect.
The default for mailer depends on the operating system of the host on which the Grid Engine master installation was run. Common values are /bin/mail or /usr/bin/Mail. Note that since the mail is sent by compute hosts, not the master, it may be necessary to take steps to route it appropriately, e.g. by using a cluster head node as a "smart host" for the private network.
The global configuration entry for this value may be overwritten by the execution host local configuration.
xterm is the absolute pathname to the X Window System terminal emulator, xterm(1).
Changing xterm will take immediate effect.
The default for xterm is system-dependent.
The global configuration entry for this value may be overwritten by the execution host local configuration.
load_sensor is a comma-separated list of paths to executable shell scripts or programs that are started by sge_execd(8) and used to retrieve site-configurable load information (e.g. free space on a certain disk partition).
Each sge_execd(8) may use a set of private load_sensor programs or scripts. Changing load_sensor will take effect after two load report intervals (see load_report_time). A load sensor will be restarted automatically if the file modification time of the load sensor executable changes.
The global configuration entry for this value may be overwritten by the execution host local configuration.
In addition to the load sensors configured via load_sensor, sge_execd(8) searches for an executable file named qloadsensor in the execution host's Grid Engine binary directory path. If such a file is found, it is treated like the configurable load sensors defined in load_sensor. This facility is intended for pre-installing a default load sensor. See sge_execd(8) for information on writing load sensors.
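A minimal load sensor sketch follows. The authoritative description of the interface is in sge_execd(8); the sketch assumes the commonly described protocol in which the sensor waits for a line on standard input, exits when that line is "quit", and otherwise prints one report framed by "begin" and "end" with entries of the form host:name:value. The load value name scratch_free_kb and all function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical load sensor reporting free space on a partition.

# Free kilobytes on the file system holding path $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Print one load report: $1 = host, $2 = load value name, $3 = value.
report_once() {
    printf 'begin\n%s:%s:%s\nend\n' "$1" "$2" "$3"
}

# Answer every input line with a report until "quit" is read.
# $1 = host name, $2 = path on the partition to watch.
sensor_loop() {
    while read -r line; do
        [ "$line" = "quit" ] && break
        report_once "$1" scratch_free_kb "$(free_kb "$2")"
    done
}

# A real sensor script would end with something like:
#   sensor_loop "$(hostname)" /scratch
```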
The path of an executable, with optional arguments, that is started before execution of Grid Engine jobs with the same environment setting as that for the Grid Engine jobs to be started afterwards (see qsub(1)). The prolog command is started directly, not in a shell. An optional prefix "user@" specifies the user under which this procedure is to be started. In that case see the SECURITY section below concerning security issues running as a privileged user. The procedure's standard output and the error output stream are written to the same file as used for the standard output and error output of each job.
This procedure is intended as a means for the Grid Engine administrator to automate the execution of general site-specific tasks, like the preparation of temporary file systems, with a need for the same context information as the job. For a parallel job, only a single instance of the prolog is run, on the master node. Each sge_execd(8) may use a private prolog. Correspondingly, the global or execution host local configuration can be overwritten by the queue configuration (see queue_conf(5)). Changing prolog will take immediate effect.
The default for prolog is the special value NONE, which prevents execution of a prolog.
The following special variables, expanded at runtime, can be used (besides any other strings which have to be interpreted by the procedure) to compose a command line:
If the prolog is written as a shell script, the usual care must be exercised, e.g. when expanding user-supplied values from the command line or the environment. In particular, note that the job name could be of the form "; evil doings;". Also, use absolute path names for commands if inheriting the user's environment.
The global configuration entry for this value may be overwritten by the execution host local configuration.
See sge_shepherd(8) for the significance of exit codes returned by the prolog.
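The caution about user-supplied values can be made concrete with a defensive prolog fragment. This is a sketch under stated assumptions, not a recommended standard prolog: the function names and the /scratch layout are hypothetical, and the job ID and job name are assumed to be passed in by the caller.

```shell
#!/bin/sh
# Hypothetical prolog helpers; not taken from the Grid Engine distribution.

# Succeed only if $1 is non-empty and consists solely of
# letters, digits, '.', '_' and '-'.
is_safe_token() {
    case $1 in
        '' | *[!A-Za-z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print a per-job scratch path, refusing unsafe input.
# $1 = base directory, $2 = job ID, $3 = job name.
scratch_dir_for() {
    is_safe_token "$2" && is_safe_token "$3" || return 1
    printf '%s/%s.%s\n' "$1" "$2" "$3"
}
```

A job name such as "; evil doings;" is rejected before it reaches any command line, and every expansion is quoted so that it cannot be split or re-interpreted by the shell.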
The path of an executable, with optional arguments, that is started after execution of Grid Engine jobs with the same environment setting as that for the Grid Engine job that has just completed (see qsub(1)), with the addition of the variable named SGE_JOBEXIT_STAT which holds the exit status of the job. The epilog command is started directly, not in a shell. An optional prefix "user@" specifies the user under which this procedure is to be started. In that case see the SECURITY section below concerning security issues running as a privileged user. The procedure's standard output and the error output stream are written to the same file used for the standard output and error output of each job.
The same special variables can be used to compose a command line as for the prolog.
This procedure is intended as a means for the Grid Engine administrator to automate the execution of general site-specific tasks, like the cleaning up of temporary file systems with the need for the same context information as the job. For a parallel job, only a single instance of the epilog is run, on the master node. Each sge_execd(8) may use a private epilog. Correspondingly, the global or execution host local configurations can be overwritten by the queue configuration (see queue_conf(5)). Changing epilog will take immediate effect.
The default for epilog is the special value NONE, which prevents execution of an epilog.
The same considerations (above) apply as for a prolog when an epilog is written in shell script.
See sge_shepherd(8) for the significance of exit codes returned by the epilog.
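An epilog that cleans up a temporary directory and records the job's exit status might look like the following sketch. SGE_JOBEXIT_STAT is the variable named above; the function name epilog_cleanup and the idea of a scratch directory prepared by a matching prolog are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical epilog helper.
# $1 = scratch directory created earlier for this job (may not exist).
epilog_cleanup() {
    status=${SGE_JOBEXIT_STAT:-unknown}
    echo "epilog: job exit status was $status"
    # Remove the directory only if a non-empty path to an existing directory was given.
    if [ -n "$1" ] && [ -d "$1" ]; then
        rm -rf -- "$1"
    fi
    return 0
}
```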
Note: Deprecated, may be removed in future release.
This parameter defines the mechanisms which are used to actually invoke the job scripts on the execution hosts. The following values are recognized:
Changes to shell_start_mode will take immediate effect. The default for shell_start_mode is posix_compliant.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
UNIX command interpreters like the Bourne-Shell (see sh(1)) or the C-Shell (see csh(1)) can be used by Grid Engine to start job scripts. The command interpreters can either be started as login-shells (i.e. all system and user default resource files like .login or .profile will be executed when the command interpreter is started, and the environment for the job will be set up as if the user has just logged in) or just for command execution (i.e. only shell-specific resource files like .cshrc will be executed and a minimal default environment is set up by Grid Engine - see qsub(1)). The parameter login_shells contains a comma-separated list of the executable names of the command interpreters to be started as login shells. Shells in this list are only started as login shells if the parameter shell_start_mode (see above) is set to posix_compliant.
Changes to login_shells will take immediate effect. The default for login_shells is sh,bash,csh,tcsh,ksh.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
min_uid places a lower bound on user IDs that may use the cluster. Users whose user ID (as returned by getpwnam(3)) is less than min_uid will not be allowed to run jobs on the cluster.
Changes to min_uid will take immediate effect. The default is 0 but, if CSP or MUNGE security is not in use, the installation script sets it to 100 to prevent unauthorized access by root or system accounts.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
This parameter sets the lower bound on group IDs that may use the cluster. Users whose default group ID (as returned by getpwnam(3)) is less than min_gid will not be allowed to run jobs on the cluster.
Changes to min_gid will take immediate effect. The default is 0 but, if CSP security is not in use, the installation script sets it to 100 to prevent unauthorized access by root or system accounts.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
The user_lists parameter contains a comma-separated list of user access lists as described in access_list(5). Each user contained in at least one of the access lists has access to the cluster. If the user_lists parameter is set to NONE (the default) any user has access if not explicitly excluded via the xuser_lists parameter described below. If a user is contained both in an access list xuser_lists and user_lists, the user is denied access to the cluster.
Changes to user_lists will take immediate effect.
This value is a global configuration parameter insofar as it restricts access to the whole cluster, but the execution host local configuration may define a value to restrict access to that host further.
The xuser_lists parameter contains a comma-separated list of user access lists as described in access_list(5). Each user contained in at least one of the access lists is denied access to the cluster. If the xuser_lists parameter is set to NONE (the default) any user has access. If a user is contained both in an access list in xuser_lists and user_lists (see above) the user is denied access to the cluster.
Changes to xuser_lists will take immediate effect.
This value is a global configuration parameter insofar as it restricts access to the whole cluster, but the execution host local configuration may define a value to restrict access to that host further.
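The interplay of user_lists and xuser_lists can be summarized as a small decision function. This is a sketch of the rules as stated above, not the qmaster implementation; for simplicity it takes the already expanded members of the access lists as space-separated strings, and the function names are hypothetical.

```shell
#!/bin/sh
# Succeed if user $1 appears in the space-separated member list $2.
in_list() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print "allowed" or "denied" for user $1.
# $2 = members of the user_lists access lists, or NONE
# $3 = members of the xuser_lists access lists, or NONE
cluster_access() {
    # Exclusion always wins, even if the user is also in user_lists.
    if [ "$3" != NONE ] && in_list "$1" "$3"; then
        echo denied
        return 0
    fi
    # With user_lists set to NONE, everybody not excluded has access.
    if [ "$2" = NONE ] || in_list "$1" "$2"; then
        echo allowed
    else
        echo denied
    fi
}
```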
administrator_mail specifies a comma-separated list of the electronic mail address(es) of the cluster administrator(s) to whom internally-generated problem reports are sent. The mail address format depends on your electronic mail system and how it is configured; consult your system's configuration guide for more information.
Changing administrator_mail takes immediate effect. The default for administrator_mail is an empty mail list.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
The projects list contains all projects which are granted access to Grid Engine. Users not belonging to one of these projects cannot submit jobs. If users belong to projects in the projects list and the xprojects list (see below), they also cannot submit jobs.
Changing projects takes immediate effect. The default for projects is none.
While globally-configured projects affect job submission, projects configured for queues or hosts affect job execution in the appropriate context.
The xprojects list contains all projects that are denied access to Grid Engine. Users belonging to one of these projects cannot use Grid Engine. If users belong to projects in the projects list (see above) and the xprojects list, they also cannot use the system.
Changing xprojects takes immediate effect. The default for xprojects is none.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
System load is reported periodically by the execution daemons to sge_qmaster(8). The parameter load_report_time defines the time interval between load reports.
Each sge_execd(8) may use a different load report time. Changing load_report_time will take immediate effect.
Note: Be careful when modifying load_report_time. Reporting load too frequently might block sge_qmaster(8) especially if the number of execution hosts is large. Moreover, since the system load typically increases and decreases smoothly, frequent load reports hardly offer any benefit.
The default for load_report_time is 40 seconds.
The global configuration entry for this value may be overwritten by the execution host local configuration.
Determines whether jobs on hosts in an unknown state are rescheduled, and thus sent to other hosts. Hosts are registered as unknown if sge_qmaster(8) cannot establish contact with the sge_execd(8) on those hosts (see max_unheard). Likely reasons are a breakdown of the host or of the network connection in between, but sge_execd(8) may also simply not be running on such hosts.
In any case, Grid Engine can reschedule jobs running on such hosts to another system. reschedule_unknown controls the time which Grid Engine will wait before jobs are rescheduled after a host became unknown. The time format specification is hh:mm:ss. If the special value 00:00:00 is set, then jobs will not be rescheduled from this host.
Rescheduling is only initiated for jobs which have activated the rerun flag (see the -r y option of qsub(1) and the rerun option of queue_conf(5)). Parallel jobs are only rescheduled if the host on which their master task executes is in unknown state. The behavior of reschedule_unknown for parallel jobs and for jobs without the rerun flag set can be adjusted using the qmaster_params settings ENABLE_RESCHEDULE_KILL and ENABLE_RESCHEDULE_SLAVE.
Checkpointing jobs will only be rescheduled when the when option of the corresponding checkpointing environment contains an appropriate flag (see checkpoint(5)). Interactive jobs (see qsh(1), qrsh(1), qtcsh(1)) are not rescheduled.
The default for reschedule_unknown is 00:00:00.
The global configuration entry for this value may be overwritten by the execution host local configuration.
If sge_qmaster(8) could not contact, or was not contacted by, the execution daemon of a host for max_unheard seconds, all queues residing on that particular host are set to status unknown. sge_qmaster(8), at least, should be contacted by the execution daemons in order to get the load reports. Thus, max_unheard should be greater than the load_report_time (see above).
Changing max_unheard takes immediate effect. The default for max_unheard is 5 minutes.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
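The requirement that max_unheard exceed load_report_time can be checked mechanically once both values are converted to seconds. The sketch below assumes both are written in the hh:mm:ss form used for time values elsewhere in this page; the function names are hypothetical.

```shell
#!/bin/sh
# Convert a time given as hh:mm:ss to seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed if max_unheard ($1) is greater than load_report_time ($2).
unheard_ok() {
    [ "$(hms_to_seconds "$1")" -gt "$(hms_to_seconds "$2")" ]
}
```

With the defaults quoted in this page, unheard_ok 00:05:00 00:00:40 succeeds, since 300 seconds is greater than 40 seconds.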
This parameter specifies the level of detail that Grid Engine components such as sge_qmaster(8) or sge_execd(8) use to produce informative, warning or error messages which are logged to the messages files in the master and execution daemon spool directories (see the description of the execd_spool_dir parameter above). The following message levels are available:
Changing loglevel will take immediate effect.
The default for loglevel is log_warning.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
This parameter defines the maximum number of array tasks to be scheduled to run simultaneously per array job. An instance of an array task will be created within the master daemon when it gets a start order from the scheduler. The instance will be destroyed when the array task finishes. Thus the parameter provides control mainly over the memory consumption of array jobs in the master daemon. It is most useful for very large clusters and very large array jobs. The default for this parameter is 2000. The value 0 will deactivate this limit and will allow the scheduler to start as many array job tasks as suitable resources are available in the cluster.
Changing max_aj_instances will take immediate effect.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
This parameter defines the maximum number of array job tasks within an array job. sge_qmaster(8) will reject all array job submissions which request more than max_aj_tasks array job tasks. The default for this parameter is 75000. The value 0 will deactivate this limit.
Changing max_aj_tasks will take immediate effect.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
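To see how a task range relates to max_aj_tasks, the number of tasks requested by a range can be computed before submission. The sketch assumes the n[-m[:s]] range form accepted by the -t option of qsub(1); the function names are hypothetical, and the check only mirrors the rule stated above.

```shell
#!/bin/sh
# Print the number of tasks in a range of the form n, n-m or n-m:s.
aj_task_count() {
    range=$1
    step=1
    case $range in
        *:*) step=${range##*:}; range=${range%%:*} ;;
    esac
    first=${range%%-*}
    last=${range##*-}
    echo $(( (last - first) / step + 1 ))
}

# Succeed if a task count ($1) is within max_aj_tasks ($2); 0 means no limit.
aj_tasks_allowed() {
    [ "$2" -eq 0 ] || [ "$1" -le "$2" ]
}
```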
The number of active (not finished) jobs which each Grid Engine user can have in the system simultaneously is controlled by this parameter. A value greater than 0 defines the limit. The default value 0 means "unlimited". If the max_u_jobs limit is exceeded by a job submission then the submission command exits with exit status 25 and an appropriate error message.
Changing max_u_jobs will take immediate effect.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
The number of active (not finished) jobs simultaneously allowed in Grid Engine is controlled by this parameter. A value greater than 0 defines the limit. The default value 0 means "unlimited". If the max_jobs limit is exceeded by a job submission then the submission command exits with exit status 25 and an appropriate error message.
Changing max_jobs will take immediate effect.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
The number of active (not finished) Advance Reservations simultaneously allowed in Grid Engine is controlled by this parameter. A value greater than 0 defines the limit. The default value 0 means "unlimited". If the max_advance_reservations limit is exceeded by an Advance Reservation request then the submission command exits with exit status 25 and an appropriate error message.
Changing max_advance_reservations will take immediate effect.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
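Because the limits above make the submission command exit with status 25, a submitting script can tell "limit reached" apart from other failures and retry later. The wrapper below is a sketch; the function name and the retry policy are hypothetical, and it relies only on the exit status described above.

```shell
#!/bin/sh
# Run a submit command, retrying while it exits with status 25.
# $1 = maximum number of attempts, $2 = delay in seconds between attempts,
# remaining arguments = the command to run (e.g. qsub with its options).
submit_with_retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        if "$@"; then rc=0; else rc=$?; fi
        # Any status other than 25 (success or a different error) ends the loop.
        if [ "$rc" -ne 25 ]; then return "$rc"; fi
        if [ "$n" -ge "$attempts" ]; then return 25; fi
        n=$((n + 1))
        sleep "$delay"
    done
}
```

A call such as submit_with_retry 10 60 qsub job.sh would try up to ten times, one minute apart, where job.sh stands for any job script.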
If set to true, users are required to request a project whenever submitting a job. See the -P option to qsub(1) for details.
Changing enforce_project will take immediate effect. The default for enforce_project is false.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
If set to true, a user(5) must exist to allow for job submission. Jobs are rejected if no corresponding user exists.
If set to auto, a user(5) object for the submitting user will automatically be created during job submission, if one does not already exist. The auto_user_oticket, auto_user_fshare, auto_user_default_project, and auto_user_delete_time configuration parameters will be used as default attributes of the new user(5) object.
Changing enforce_user will take immediate effect. The default for enforce_user is auto.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
The number of override tickets to assign to automatically created user(5) objects. User objects are created automatically if the enforce_user attribute is set to auto.
Changing auto_user_oticket will affect any newly created user objects, but will not change user objects created in the past.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
The number of functional shares to assign to automatically created user(5) objects. User objects are created automatically if the enforce_user attribute is set to auto.
Changing auto_user_fshare will affect any newly created user objects, but will not change user objects created in the past.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
The default project to assign to automatically created user(5) objects. User objects are created automatically if the enforce_user attribute is set to auto.
Changing auto_user_default_project will affect any newly created user objects, but will not change user objects created in the past.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
The number of seconds of inactivity after which automatically created user(5) objects will be deleted. User objects are created automatically if the enforce_user attribute is set to auto. If the user has no active or pending jobs for the specified amount of time, the object will automatically be deleted. A value of 0 can be used to indicate that the automatically created user object is permanent and should not be automatically deleted.
Changing auto_user_delete_time will affect the deletion time for all users with active jobs.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
NB. If the qmaster spool area is world-readable for non-admin users, you must take steps to encrypt the credentials, since they are stored there after job submission.
set_token_cmd points to a command which sets and extends AFS tokens for Grid Engine jobs. It is run by sge_coshepherd(8). It expects two command line parameters:
<set_token_cmd> <user> <token_extend_after_seconds>
As a shell script, the command calls the programs
- SetToken
- forge
which are provided by your distributor as source code. The script looks as follows:
--------------------------------
#!/bin/sh
# set_token_cmd
forge -u $1 -t $2 | SetToken
--------------------------------
Since it is necessary for forge to read the secret AFS server key, a site might wish to replace the set_token_cmd script by a command, which connects to a custom daemon at the AFS server. The token must be forged at the AFS server and returned to the local machine, where SetToken is executed.
Changing set_token_cmd will take immediate effect. The default for set_token_cmd is none.
The global configuration entry for this value may be overwritten by the execution host local configuration.
The path to your pagsh is specified via this parameter. The sge_shepherd(8) process and the job run in a pagsh. Please ask your AFS administrator for details.
Changing pag_cmd will take immediate effect. The default for pag_cmd is none.
The global configuration entry for this value may be overwritten by the execution host local configuration.
The token_extend_time is the time period for which AFS tokens are periodically extended. Grid Engine will call the token extension 30 minutes before the tokens expire until jobs have finished and the corresponding tokens are no longer required.
Changing token_extend_time will take immediate effect. The default for token_extend_time is 24:0:0, i.e. 24 hours.
The global configuration entry for this value may be overwritten by the execution host local configuration.
Alternative path to the shepherd binary. Typically used to call the shepherd binary via a wrapper script or command. If used in production, the wrapper must handle signals the way the shepherd would; otherwise, for instance, jobs will not be killed correctly.
Changing shepherd_cmd will take immediate effect. The default for shepherd_cmd is none.
The global configuration entry for this value may be overwritten by the execution host local configuration.
The gid_range is a comma-separated list of range expressions of the form m-n, where m and n are integer numbers greater than 99, and m is an abbreviation for m-m. These numbers are used in sge_execd(8) to identify processes belonging to the same job.
Each sge_execd(8) may use a separate set of group ids for this purpose. All numbers in the group id range have to be unused supplementary group ids on the system where the sge_execd(8) is started.
Changing gid_range will take immediate effect. There is no default for gid_range. The administrator will have to assign a value for gid_range during installation of Grid Engine.
The global configuration entry for this value may be overwritten by the execution host local configuration.
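A candidate gid_range value can be sanity-checked against the syntax rules above before it is entered. The sketch below validates the form and counts the group ids covered; it does not check whether the ids are actually unused on the host. The function name is hypothetical.

```shell
#!/bin/sh
# Validate a gid_range expression ($1) such as "20000-20100,30000".
# Prints the number of group ids covered and succeeds if every element is
# of the form m or m-n with m > 99 and n >= m; fails otherwise.
gid_range_count() {
    echo "$1" | tr ',' '\n' | awk -F- '
        {
            lo = $1
            hi = (NF > 1 ? $2 : $1)            # a single m stands for m-m
            if (NF > 2 || lo !~ /^[0-9]+$/ || hi !~ /^[0-9]+$/ ||
                lo + 0 < 100 || hi + 0 < lo + 0)
                bad = 1
            total += hi - lo + 1
        }
        END { if (bad || NR == 0) exit 1; print total }'
}
```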
A list of additional parameters can be passed to the Grid Engine qmaster. The following values are recognized:
After the s_rt or h_rt limit of a job is expired, the master daemon will wait additional time defined by DURATION_OFFSET (see sched_conf(5)). If the execution daemon still cannot be contacted when this additional time is elapsed, then the master daemon will force the deletion of the job (see -f of qdel(1)).
For jobs which will be deleted that way, an accounting record will be created. For usage, the record will contain the last reported online value when the execution daemon could contact qmaster. The failed state in the record will be set to 37 to indicate that the job was terminated by a limit enforced by the master daemon.
After a restart of sge_qmaster(8), limit enforcement is triggered only after twice the largest load_report_time interval defined in sge_conf(5) has elapsed. This gives the execution daemons enough time to re-register with the master daemon.
Note: Forced deletion for jobs is executed differently, depending on whether users are Grid Engine administrators or not. In the case of administrative users, the jobs are removed from the internal database of Grid Engine immediately. For regular users, the equivalent of a normal qdel(1) is executed first, and deletion is forced only if the normal cancellation was unsuccessful.
Profiling allows the user to obtain system measurements. This can be useful for debugging or optimization of the system. The profiling output is written to the messages file.
Please note that the CPU utime and stime values contained in the profiling output are not per-thread CPU times. These CPU usage statistics are per-process statistics. So the printed profiling values for CPU mean "CPU time consumed by sge_qmaster (all threads) while the reported profiling level was active".
In this condition, job deletion works, but at least interactive jobs, tightly-integrated parallel ones, and job suspension don't. The execution hosts configured need not exist, but must have resolvable network names.
qsub -b y sleep 10
Changing qmaster_params will take immediate effect, except that gdi_timeout, gdi_retries, and cl_ping will take effect only for new connections. The default for qmaster_params is none.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
This is used for passing additional parameters to the Grid Engine execution daemon. The following values are recognized:
Changing execd_params will take effect after it is propagated to the execution daemons. The propagation is done in one load report interval. The default for execd_params is none.
The global configuration entry for this value may be overwritten by the execution host local configuration.
Used to define the behavior of reporting modules in the Grid Engine qmaster. Changes to the reporting_params take immediate effect. The following values are recognized:
Note: Deprecated, may be removed in a future release.
Grid Engine stores a certain number of just finished jobs to provide post mortem status information via qstat -s z. The finished_jobs parameter defines the number of finished ("zombie") jobs stored. If this maximum number is reached, the oldest finished job will be discarded for every new job added to the finished job list. (The zombie list is not spooled, and so will be lost by a qmaster re-start.)
Changing finished_jobs will take immediate effect. The default for finished_jobs is 100.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
These three pairs of entries are responsible for defining a remote startup method for either interactive jobs by qlogin(1) or qrsh(1) without a command, or an interactive qrsh(1) request with a command. The last startup method is also used to startup tasks on a slave exechost of a tightly integrated parallel job. Each pair for one startup method must contain matching communication methods. All entries can contain the value builtin (which is the default) or a full path to a binary which should be used, and additional arguments to this command if necessary.
The entries for the three ..._command definitions can, in addition, contain the value NONE in case a particular startup method should be disabled.
Changing any of these entries will take immediate effect.
The global configuration entries for these values may be overwritten by an execution host local configuration.
See remote_startup(5) for a detailed explanation of these settings.
This flag must be set to "true" when the prolog and epilog are ready for delegated file staging, so that the DRMAA attribute 'drmaa_transfer_files' is supported. To establish delegated file staging, use the variables beginning with "$fs_..." in prolog and epilog to move the input, output and error files from one host to the other. When this flag is set to "false", no file staging is available for the DRMAA interface. File staging is currently implemented only via the DRMAA interface. When an error occurs while moving the input, output and error files, return error code 100 so that the error handling mechanism can handle the error correctly. (See also FORBID_APPERROR.)
Note: Deprecated, may be removed in future release.
This flag enables or disables the reprioritization of jobs based on their ticket amount. The reprioritize_interval in sched_conf(5) takes effect only if reprioritize is set to true. To turn off job reprioritization, the reprioritize flag must be set to false and the reprioritize_interval to 0, which is the default.
This value is a global configuration parameter only. It cannot be overridden by the execution host local configuration.
This setting defines a server JSV instance which will be started and triggered by the sge_qmaster(8) process. This JSV instance will be used to verify job specifications of jobs before they are accepted and stored in the internal master database. The global configuration entry for this value cannot be overwritten by execution host local configurations.
Find more details concerning JSV in jsv(1) and sge_request(1).
The syntax of the jsv_url is specified in sge_types(1).
If a server JSV script is defined with the jsv_url parameter, all qalter(1) or qmon(1) modification requests for jobs are rejected by qmaster. With the jsv_allowed_mod parameter an administrator can allow a set of switches which clients may then use to modify certain job attributes. The value of this parameter has to be a comma-separated list of JSV job parameter names as documented in qsub(1), or the value none to indicate that no modification should be allowed. Please note that even if none is specified, the switches -w and -t are allowed for qalter.
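The rule can be illustrated with a short check that decides whether a given qalter switch would be accepted under a particular jsv_allowed_mod value. The function name mod_allowed is hypothetical, and the sketch assumes that the JSV parameter name is the switch name without its leading dash, which holds for many but not necessarily all parameters; it models the rule and does not query qmaster.

```shell
# mod_allowed LIST SWITCH: succeed if SWITCH (without the leading dash)
# is permitted under the jsv_allowed_mod value LIST.
mod_allowed() {
    case "$2" in
        w|t) return 0 ;;    # -w and -t are always allowed for qalter
    esac
    if [ "$1" = none ]; then
        return 1
    fi
    case ",$1," in
        *",$2,"*) return 0 ;;   # exact match against one list element
    esac
    return 1
}
```

With the illustrative value "ac,h,N", `mod_allowed "ac,h,N" N` succeeds and `mod_allowed "ac,h,N" p` fails, while `mod_allowed none w` still succeeds.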
libjvm_path is usually set during qmaster installation and points to the absolute path of libjvm.so (or the corresponding library depending on your architecture - e.g. /usr/java/jre/lib/i386/server/libjvm.so). The referenced libjvm version must be at least 1.5. It is needed by the JVM qmaster thread only. If the Java VM needs additional starting parameters they can be set in additional_jvm_args. Whether the JVM thread is started at all can be defined in the bootstrap(5) file. If libjvm_path is empty, or an incorrect path, the JVM thread fails to start.
The global configuration entry for this value may be overwritten by the execution host local configuration.
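Because an empty or incorrect libjvm_path prevents the JVM thread from starting, a candidate value can be checked before it is written to the configuration. The helper below is a minimal sketch with a hypothetical name; it only verifies that the value is an absolute path to a readable regular file and does not check the library version.

```shell
# libjvm_path_ok PATH: succeed when PATH is an absolute path
# to a readable regular file.
libjvm_path_ok() {
    case "$1" in
        /*) ;;              # absolute path, continue with the file checks
        *) return 1 ;;      # empty or relative value
    esac
    [ -f "$1" ] && [ -r "$1" ]
}
```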
additional_jvm_args is usually set during qmaster installation. Details about possible values of additional_jvm_args can be found in the help output of the accompanying Java command. This setting is normally not needed.
The global configuration entry for this value may be overwritten by the execution host local configuration.
If prolog or epilog is specified with a user@ prefix, security considerations apply. The methods are run in a user-supplied environment (via -V or -v) which provides a mechanism to run arbitrary code as user (which might well be root) by setting variables such as LD_LIBRARY_PATH and LD_PRELOAD to affect the running of the dynamically linked programs, such as shells, which are used to implement the methods.
To combat this, known problematic variables are removed from the environment before starting methods that run as a user other than the job owner, but this may not be foolproof on arbitrary systems with obscure variables. The environment can be safely controlled by running the methods under a statically linked version of env(1), such as the one typically available through busybox(1). Use its -u option
to unset sensitive variables, or its -i option
to set only specific variables. On some systems, such as recent
Solaris, it is essentially impossible to build static binaries. In that case
it is typically possible to use a setuid wrapper, relying on the dynamic
linker to do the right thing. An example is the safe_exec wrapper
which is available from
⟨URL: http://arc.liv.ac.uk/downloads/SGE/support/ ⟩ at the time
of writing. When using a non-shell scripting language wrapper for the method
daemon, try to use options which avoid interpreter-specific environmental
damage, such as Perl's -T and Python's -E. Privileged shell
script wrappers should be avoided if possible, and should be written
carefully if they are used - e.g. invoke programs with full file names - but
if bash(1) is used, it should be run with the -p option.
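The two ways of controlling the environment with env(1) can be sketched as follows. The wrapper names are hypothetical, and the sketch calls the system env for illustration only; as the text explains, a statically linked env is preferable for privileged methods. The minimal PATH value is likewise an assumption.

```shell
# Run a command with one sensitive variable removed from the environment.
run_without_ld_path() {
    env -u LD_LIBRARY_PATH "$@"
}

# Run a command with an empty environment plus an explicit, minimal PATH.
run_clean() {
    env -i PATH=/usr/bin:/bin "$@"
}
```

The first form keeps the rest of the caller's environment, so every variable to be dropped has to be named; the second form passes on nothing except what is listed explicitly, which is the more conservative choice.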
It is not currently possible to specify the variables to be unset, e.g. as a host-dependent execd parameter, but certain system-dependent ones are selected. The list of sensitive variables is taken mostly from GNU libc and sudo(1). It includes known system-dependent dynamic linker ones, sensitive locale ones and others, like TMPDIR, but does not attempt to deal with interpreter-specific variables such as PYTHONPATH. The locale specification is also sanitized. See the source file source/libs/uti2/sge_execvlp.c for details. Note that TMPDIR is one of the variables affected, and may need to be recreated (as /tmp/$JOB_ID.$TASK_ID.$QUEUE).
sge_intro(1), csh(1), qconf(1), qsub(1), jsv(1), rsh(1), sh(1), getpwnam(3), drmaa_attributes(3), queue_conf(5), sched_conf(5), sge_types(1), sge_execd(8), sge_qmaster(8), sge_shepherd(8), cron(8), remote_startup(5)
See sge_intro(1) for a full statement of rights and permissions.
2011-11-27 | SGE 8.1.3pre |