QSTAT(1) | Grid Engine User Commands | QSTAT(1) |
qstat - show the status of Grid Engine jobs and queues
qstat [-ext] [-f] [-F [resource_name,...]] [-g c|d|t[+]] [-help] [-j [job_list]] [-l resource=val,...] [-ne] [-pe pe_name,...] [-ncb] [-pri] [-q wc_queue_list] [-qs a|c|d|o|s|u|A|C|D|E|S] [-r] [-s {r|p|s|z|hu|ho|hs|hd|hj|ha|h|a}[+]] [-t] [-U user,...] [-u user,...] [-urg] [-xml]
qstat shows the current status of the available Grid Engine queues and the jobs associated with the queues. Selection options allow you to get information about specific jobs, queues or users. If multiple selections are done, a queue is only displayed if all selection criteria for a queue instance are met. Without any option qstat will display only a list of jobs, with no queue status information.
The administrator and the user may define files (see sge_qstat(5)), which can contain any of the options described below. A cluster-wide sge_qstat file may be placed under $SGE_ROOT/$SGE_CELL/common/sge_qstat. The user private file is searched for at the location $HOME/.sge_qstat. The home directory request file has higher precedence than the cluster global file. The command line can be used to override the flags contained in the files.
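The lookup order described above can be sketched as follows. This is only an illustration under stated assumptions, not the way qstat itself is implemented: the helper name qstat_default_files is hypothetical, and falling back to the cell name "default" when SGE_CELL is unset is an assumption based on the default cell name.

```python
import posixpath


def qstat_default_files(env):
    """Return candidate default-option files, lowest precedence first.

    Hypothetical helper for illustration; qstat performs this lookup itself.
    """
    files = []
    sge_root = env.get("SGE_ROOT")
    if sge_root:
        # Assumed fallback: use the default cell name when SGE_CELL is unset.
        cell = env.get("SGE_CELL", "default")
        files.append(posixpath.join(sge_root, cell, "common", "sge_qstat"))
    home = env.get("HOME")
    if home:
        # The user's private file has higher precedence, so it comes later.
        files.append(posixpath.join(home, ".sge_qstat"))
    return files


paths = qstat_default_files(
    {"SGE_ROOT": "/opt/sge", "SGE_CELL": "default", "HOME": "/home/alice"}
)
```

Options given on the command line would be applied after both files, since they override the flags contained in the files.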
The output format for the alarm reasons is one line per reason, containing the resource value and threshold. For details about the resource value please refer to the description of the Full Format in section OUTPUT FORMATS below.
With -g c a cluster queue summary is displayed. Find more information in the section OUTPUT FORMATS.
With -g d array jobs are displayed verbosely in a one line per job task fashion. By default, array jobs are grouped and all tasks with the same status (for pending tasks only) are displayed in a single line. The array job task id range field in the output (see section OUTPUT FORMATS) specifies the corresponding set of tasks.
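When array tasks are grouped, a script that post-processes the output may need to expand the task id range field into individual task ids. The sketch below assumes that the field uses the n[-m[:s]] notation known from the -t option of qsub(1), with several ranges separated by commas; the function name expand_task_ranges is hypothetical.

```python
def expand_task_ranges(field):
    """Expand a task id field such as "1-9:2,12" into a list of task ids.

    Assumes the n[-m[:s]] notation with comma-separated ranges.
    """
    tasks = []
    for part in field.split(","):
        part = part.strip()
        if not part:
            continue
        span, _, step = part.partition(":")
        first, _, last = span.partition("-")
        start = int(first)
        stop = int(last) if last else start   # a single id is a range of one
        stride = int(step) if step else 1     # a missing step defaults to 1
        tasks.extend(range(start, stop + 1, stride))
    return tasks


example = expand_task_ranges("1-9:2,12")
```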
With -g t parallel jobs are displayed verbosely in a one line per parallel job task fashion. By default, parallel job tasks are displayed in a single line. Also, with the -g t option the function of each parallel task is displayed rather than the job's slot amount (see section OUTPUT FORMATS).
For jobs in E(rror) state the error reason is displayed. For jobs that could not be dispatched during the last scheduling interval the obstacles are shown, if schedd_job_info in sched_conf(5) is configured accordingly.
For running jobs, the available information on resource utilization is shown for each task (see accounting(5)): consumed cpu time in seconds, integral memory usage in Gbytes seconds, amount of data transferred in io operations in Gbytes, current virtual memory utilization in Mbytes, and maximum virtual memory utilization in Mbytes. This information is not available if resource utilization retrieval is not supported for the OS platform where the job is hosted. It is also not available immediately after a job has started, before a load report is received.
The resource usage reported is affected if ACCT_RESERVED_USAGE or SHARETREE_RESERVED_USAGE is specified in the sge_conf(5) configuration. Then the requested values are reported, not the actual usage (not multiplied by the slot count). If there is no memory request, 'mem' is reported as zero, and the vmem values as 'N/A'.
Unless -ncb is specified, the output contains information about a requested binding (see the -binding option of qsub(1)) and the changes that have been applied to the topology string (real binding) for the host where this job is running. The topology string will contain capital letters for all those cores that were not bound to the displayed job. Bound cores will be shown in lowercase (e.g. "SCCcCSCCcC" means that core 2 on each of the two available sockets was bound to this job).
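The reading of the topology string can be illustrated with a short sketch. It assumes that "S" starts a new socket, that "C" and "c" denote cores, and that cores are counted from 0 within each socket, which matches the "core 2" reading of the example above. Any other character is ignored here, and the function name bound_cores is hypothetical.

```python
def bound_cores(topology):
    """Map socket index -> list of bound core indices (both counted from 0).

    Sketch only: handles socket markers (S) and cores (C = unbound,
    c = bound); every other character is skipped.
    """
    result = {}
    socket = -1
    core = -1
    for ch in topology:
        if ch.upper() == "S":
            socket += 1
            core = -1          # core numbering restarts on each socket
            result[socket] = []
        elif ch.upper() == "C":
            if socket < 0:     # tolerate a string without a socket marker
                socket = 0
                result[socket] = []
            core += 1
            if ch == "c":      # lowercase means bound to the displayed job
                result[socket].append(core)
    return result


example = bound_cores("SCCcCSCCcC")
```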
See load_parameters(5) for detailed information on the standard set of load values.
Note that a version n field in the output indicates n changes with qalter(1).
Please note that this command line switch is intended to provide backward compatibility and will be removed in the next major release.
Status information is displayed for jobs which are executing in one of the selected queues.
Please refer to the OUTPUT FORMATS sub-section Expanded Format below for detailed information.
The string $user is a placeholder for the current username. An asterisk "*" can be used as username wildcard to request any users' jobs be displayed. The default value for this switch is -u $user.
If the -xml parameter is combined with -ncb then the XML output does not contain tags with information about job to core binding.
The following two debugging options are available only when the environment variable MORE_INFO is defined.
Depending on the presence or absence of the -explain, -f, -F, -qs, -r and -t options, three output formats need to be differentiated.
The -ext and -urg options may be used to display additional information for each job.
Following the header line a section for each cluster queue is provided. When queue instance selections are applied (-l, -pe, -q, -U) the cluster format contains only cluster queues of the corresponding queue instances.
Following the header line a line is printed for each job consisting of
The state d(eletion) indicates that qdel(1) has been used to initiate job deletion. The states t(ransferring) and r(unning) indicate that a job is about to be executed or is already executing, whereas the states s(uspended), S(uspended) and T(hreshold) show that an already running job has been suspended. The s(uspended) state is caused by suspending the job via the qmod(1) command. The S(uspended) state indicates that the queue containing the job is suspended and therefore the job is also suspended. The T(hreshold) state shows that at least one suspend threshold of the corresponding queue was exceeded (see queue_conf(5)), and that the job has been suspended as a consequence. The state R(estarted) indicates that the job was restarted. This can be caused by a job migration or by one of the reasons described in the -r section of qsub(1).
The states q(ueued)/w(aiting) and h(old) only appear for pending jobs. Pending, unheld jobs are displayed as qw. The h(old) state indicates that a job currently is not eligible for execution due to a hold state assigned to it via qhold(1), qalter(1) or the qsub(1) -h option, or that the job is waiting for completion of the jobs for which job dependencies have been assigned to it via the -hold_jid or -hold_jid_ad options of qsub(1) or qalter(1).
The state z(ombie) appears for finished jobs when the -s z option is used.
The state E(rror) appears for pending jobs that couldn't be started due to job properties. The reason for the job error is shown by the -j job_list option.
See also sge_status(5).
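The job state letters described above are combined into a short string such as qw. A script reading qstat output could decode such a string roughly as sketched below. The table contains only the letters named in this section; the function names are hypothetical, and treating any state containing q or w as pending follows the statement that these letters only appear for pending jobs.

```python
# State letters as described in the text above; other letters are not covered.
JOB_STATE_LETTERS = {
    "d": "deletion",
    "t": "transferring",
    "r": "running",
    "s": "suspended (via qmod)",
    "S": "suspended (queue suspended)",
    "T": "suspended (suspend threshold exceeded)",
    "R": "restarted",
    "q": "queued",
    "w": "waiting",
    "h": "hold",
    "z": "zombie",
    "E": "error",
}


def describe_job_state(state):
    """Return one description per letter of a job state string."""
    return [JOB_STATE_LETTERS.get(ch, "unknown (%s)" % ch) for ch in state]


def is_pending(state):
    """True if the state string carries the q(ueued) or w(aiting) letter."""
    return "q" in state or "w" in state


example = describe_job_state("hqw")
```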
Without -g t option the total number of slots occupied or requested by the job is displayed. For pending parallel jobs with a PE slot range request, the assumed future slot allocation is displayed. With -g t option the function of the running jobs (MASTER or SLAVE - the latter for parallel jobs only) is displayed.
If the -t option is supplied, each status line always contains parallel job task information as if -g t were specified and each line contains the following parallel job subtask information:
Following the header line a section for each queue separated by a horizontal line is provided. For each queue the information printed consists of
If the state is u, the corresponding sge_execd(8) cannot be contacted.
If the state is a(larm), at least one of the load thresholds defined in the load_thresholds list of the queue configuration (see queue_conf(5)) is currently exceeded, which prevents scheduling further jobs to that queue. The state A(larm) indicates that at least one of the suspend thresholds of the queue (see queue_conf(5)) is currently exceeded. This will result in jobs running in that queue being successively suspended until no threshold is violated.
The states s(uspended) and d(isabled) can be assigned to queues and released via the qmod(1) command. Suspending a queue will cause all jobs executing in that queue to be suspended.
The states D(isabled) and C(alendar suspended) indicate that the queue has been disabled, or suspended automatically via the Grid Engine calendar facility (see calendar_conf(5)), while the S(ubordinate) state indicates that the queue has been suspended via subordination to another queue (see queue_conf(5) for details). When suspending a queue (regardless of the cause) all jobs executing in that queue are suspended too.
The state P(reempted) indicates that the queue has been disabled via slotwise subordination to another queue, preventing it from getting jobs which would simply be suspended.
An E(rror) state is displayed for a queue for various reasons such as failing to find executables or directories. Please check the error logfile of that sge_execd(8) for the reason, indicating how to resolve the problem. Please enable the queue afterwards via the -c option of the qmod(1) command manually.
If the c(onfiguration ambiguous) state is displayed for a queue instance, the configuration specified for this queue instance in sge_conf(5) is ambiguous. This state is cleared when the configuration becomes unambiguous again. This state prevents further jobs from being scheduled to that queue instance. Detailed reasons why a queue instance entered the c state can be found in the sge_qmaster(8) messages file and are shown by the qstat(1) -explain switch. For queue instances in this state the cluster queue's default settings are used for the ambiguous attribute.
If an o(rphaned) state is displayed for a queue instance, it indicates that the queue instance is no longer demanded by the current cluster queue configuration or the host group configuration. The queue instance is kept because jobs which have not yet finished are still associated with it, and it will vanish from qstat output when these jobs have finished. To quicken vanishing of an orphaned queue instance, associated job(s) can be deleted using qdel(1). A queue instance in the orphaned state can be revived by changing the cluster queue configuration to cover that queue instance. This state prevents scheduling further jobs to that queue instance.
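The queue state letters can be decoded in the same way as job states. The sketch below is restricted to the letters described in this section. The set NO_NEW_JOBS lists only those states for which the text above explicitly says that no further jobs are scheduled or accepted; it is deliberately conservative and not a complete scheduling rule. All names are hypothetical.

```python
# Queue state letters as described in the text above.
QUEUE_STATE_LETTERS = {
    "u": "sge_execd cannot be contacted",
    "a": "load alarm (load threshold exceeded)",
    "A": "suspend alarm (suspend threshold exceeded)",
    "s": "suspended",
    "d": "disabled",
    "D": "disabled via calendar",
    "C": "suspended via calendar",
    "S": "suspended via subordination",
    "P": "preempted (slotwise subordination)",
    "E": "error",
    "c": "configuration ambiguous",
    "o": "orphaned",
}

# Only states the text explicitly describes as preventing further jobs.
NO_NEW_JOBS = set("acoP")


def explain_queue_state(state):
    """Return one description per letter of a queue state string."""
    return [QUEUE_STATE_LETTERS.get(ch, "unknown (%s)" % ch) for ch in state]


def blocks_scheduling(state):
    """True if any letter is explicitly documented as blocking new jobs."""
    return any(ch in NO_NEW_JOBS for ch in state)


example = explain_queue_state("au")
```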
If the -F option was used, resource availability information is printed following the queue status line. For each resource (as selected in an option argument to -F, or for all resources if the option argument was omitted) a single line is displayed with the following format:
The displayed availability values and the sources from which they derive are always the minimum values of all possible combinations. Hence, for example, a line of the form "qf:h_vmem=4G" indicates that a queue currently has a maximum availability in virtual memory of 4 Gigabyte, where this value is a fixed value (e.g. a resource limit in the queue configuration) and it is queue dominated, i.e. the host in total may have more virtual memory available than this, but the queue doesn't allow for more. Contrarily a line "hl:h_vmem=4G" would also indicate an upper bound of 4 Gigabyte virtual memory availability, but the limit would be derived from a load value currently reported for the host. So while the queue might allow for jobs with higher virtual memory requirements, the host on which this particular queue resides currently only has 4 Gigabyte available.
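The two example lines above can be taken apart mechanically. The following sketch splits a line such as "qf:h_vmem=4G" into its dominance letter, source letter, resource name and value. It names only the letters explained in the paragraph above (q, h, f, l) and passes any other letter through unchanged. The function name parse_availability is hypothetical.

```python
# Only the letters explained in the text above are named here.
DOMINANCE = {"q": "queue", "h": "host"}
SOURCE = {"f": "fixed value", "l": "load value"}


def parse_availability(line):
    """Split a resource availability line like "qf:h_vmem=4G"."""
    prefix, _, rest = line.strip().partition(":")
    name, _, value = rest.partition("=")
    if len(prefix) != 2 or not name or not value:
        raise ValueError("unexpected availability line: %r" % line)
    return {
        "dominance": DOMINANCE.get(prefix[0], prefix[0]),
        "source": SOURCE.get(prefix[1], prefix[1]),
        "name": name,
        "value": value,
    }


example = parse_availability("hl:h_vmem=4G")
```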
If the -explain option was used with the character 'a' or 'A', information about resources that violate load or suspend thresholds is displayed.
The same format as with the -F option is used, with the following extensions:
After the queue status line (in case of -f) or the resource availability information (in case of -F) a single line is printed for each job running currently in this queue. Each job status line contains
Without -g t option the number of slots occupied per queue or requested by the job is displayed. For pending parallel jobs with a PE slot range request, the assumed future slot allocation is displayed. With -g t option the function of the running jobs (MASTER or SLAVE - the latter for parallel jobs only) is displayed.
If the -t option is supplied, each job status line also contains
Following the list of queue sections a PENDING JOBS list may be printed in case jobs are waiting to be assigned to a queue. A status line for each waiting job is displayed similar to the one for the running jobs. The differences are that the status for the jobs is w(aiting) or h(old), that the submit time and date is shown instead of the start time and that no function is displayed for the jobs.
In very rare cases, e.g. if sge_qmaster(8) starts up from an inconsistent state in the job or queue spool files or if the clean queue (-cq) option of qconf(1) is used, qstat cannot assign jobs to either the running or pending jobs section of the output. In this case a job status inconsistency (e.g. a job has a running status but is not assigned to a queue) has been detected. Such jobs are printed in an ERROR JOBS section at the very end of the output. The ERROR JOBS section should disappear upon restart of sge_qmaster(8). Please contact your Grid Engine support representative if you feel uncertain about the cause or effects of such jobs.
If the -r option was specified together with qstat, the following information for each displayed job is printed (a single line for each of the following job characteristics):
For each job the following additional items are displayed:
For each job the following additional urgency policy related items are displayed (see also sge_priority(5)):
For each job, the following additional job priority related items are displayed (see also sge_priority(5)):
The name of the default cell, i.e. default.
<sge_root>/<cell>/common/act_qmaster    Grid Engine master host file
<sge_root>/<cell>/common/sge_qstat      cluster qstat default options
$HOME/.sge_qstat                        user qstat default options
sge_intro(1), accounting(5), load_parameters(5), qalter(1), qconf(1), qhold(1), qhost(1), qmod(1), qsub(1), queue_conf(5), sge_execd(8), sge_qmaster(8), sge_status(5), sge_shepherd(8).
See sge_intro(1) for a full statement of rights and permissions.
2012-09-17 | SGE 8.1.3pre |