condor_q(1) | General Commands Manual | condor_q(1) |
condor_q - Display information about jobs in queue
condor_q [-help [Universe | State]]
condor_q [-debug] [general options] [restriction list] [output options] [analyze options]
condor_q displays information about jobs in the HTCondor job queue. By default, condor_q queries the local job queue, but this behavior may be modified by specifying one of the general options.
As of version 8.5.2, condor_q defaults to querying only the current user's jobs. This default is overridden when the restriction list has usernames and/or job ids, when the -submitter or -allusers arguments are specified, or when the current user is a queue superuser. It can also be overridden by setting the CONDOR_Q_ONLY_MY_JOBS configuration macro to False.
As of version 8.5.6, condor_q defaults to batch-mode output (see -batch in the Options section below). The old behavior can be obtained by specifying -nobatch on the command line. To change the default back to its pre-8.5.6 value, set the new configuration variable CONDOR_Q_DASH_BATCH_IS_DEFAULT to False.
As of version 8.5.6, condor_q defaults to displaying information about batches of jobs, rather than individual jobs. The intention is that this will be a more useful, and user-friendly, format for users with large numbers of jobs in the queue. Ideally, users will specify meaningful batch names for their jobs, to make it easier to keep track of related jobs.
(For information about specifying batch names for your jobs, see the condor_submit(1) and condor_submit_dag(1) man pages.)
A batch of jobs is defined as follows:
There are many output options that modify the output generated by condor_q. The effects of these options, and the meanings of the various output data, are described below.
If the -long option is specified, condor_q displays a long description of the queried jobs by printing the entire job ClassAd for all jobs matching the restrictions, if any. Individual attributes of the job ClassAd can be displayed by means of the -format option, which displays attributes with a printf(3) format, or with the -autoformat option. Multiple -format options may be specified in the option list to display several attributes of the job.
For most output options (except as specified), the last line of condor_q output contains a summary of the queue: the total number of jobs, and the number of jobs in the completed, removed, idle, running, held and suspended states.
If no output options are specified, condor_q now defaults to batch mode, and displays the following columns of information, with one line of output per batch of jobs:
OWNER, BATCH_NAME, SUBMITTED, DONE, RUN, IDLE, [HOLD,] TOTAL, JOB_IDS
Note that the HOLD column is only shown if there are held jobs in the output or if there are no jobs in the output.
If the -nobatch option is specified, condor_q displays the following columns of information, with one line of output per job:
ID, OWNER, SUBMITTED, RUN_TIME, ST, PRI, SIZE, CMD
If the -dag option is specified (in conjunction with -nobatch), condor_q displays the following columns of information, with one line of output per job; the owner is shown only for top-level jobs, and for all other jobs (including sub-DAGs) the node name is shown:
ID, OWNER/NODENAME, SUBMITTED, RUN_TIME, ST, PRI, SIZE, CMD
If the -run option is specified (in conjunction with -nobatch), condor_q displays the following columns of information, with one line of output per running job:
ID, OWNER, SUBMITTED, RUN_TIME, HOST(S)
Also note that the -run option disables output of the totals line.
If the -grid option is specified, condor_q displays the following columns of information, with one line of output per job:
ID, OWNER, STATUS, GRID->MANAGER, HOST, GRID_JOB_ID
If the -goodput option is specified, condor_q displays the following columns of information, with one line of output per job:
ID, OWNER, SUBMITTED, RUN_TIME, GOODPUT, CPU_UTIL, Mb/s
If the -io option is specified, condor_q displays the following columns of information, with one line of output per job:
ID, OWNER, RUNS, ST, INPUT, OUTPUT, RATE, MISC
If the -cputime option is specified (in conjunction with -nobatch), condor_q displays the following columns of information, with one line of output per job:
ID, OWNER, SUBMITTED, CPU_TIME, ST, PRI, SIZE, CMD
If the -hold option is specified, condor_q displays the following columns of information, with one line of output per job:
ID, OWNER, HELD_SINCE, HOLD_REASON
If the -totals option is specified, condor_q displays only one line of output no matter how many jobs and batches of jobs are in the queue. That line of output contains the total number of jobs, and the number of jobs in the completed, removed, idle, running, held and suspended states.
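The totals line has a regular shape, so its fields can be pulled apart as in the following minimal Python sketch. The function name parse_totals is hypothetical, and the sample line is copied from an example later on this page. The human-readable layout may change between releases, so this is an illustration of the fields rather than a recommended scripting interface.

```python
import re

# Shape of the totals line, for example:
# "14 jobs; 0 completed, 0 removed, 1 idle, 13 running, 0 held, 0 suspended"
_TOTALS_RE = re.compile(r"^(\d+) jobs; (.+)$")


def parse_totals(line):
    """Return a dict such as {'jobs': 14, 'idle': 1, ...}, or None if no match."""
    match = _TOTALS_RE.match(line.strip())
    if match is None:
        return None
    totals = {"jobs": int(match.group(1))}
    # The remainder is a comma-separated list of "<count> <state>" pairs.
    for part in match.group(2).split(","):
        count, state = part.split()
        totals[state] = int(count)
    return totals


sample = "14 jobs; 0 completed, 0 removed, 1 idle, 13 running, 0 held, 0 suspended"
totals = parse_totals(sample)
print(totals["jobs"], totals["running"], totals["idle"])
```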
The available output data are as follows:
ID
OWNER
OWNER/NODENAME
BATCH_NAME
SUBMITTED
DONE
RUN
IDLE
HOLD
TOTAL
JOB_IDS
RUN_TIME
ST
PRI
SIZE
CMD
HOST(S)
STATUS
GRID->MANAGER
HOST
GRID_JOB_ID
GOODPUT
CPU_UTIL
Mb/s
INPUT
OUTPUT
RATE
MISC
CPU_TIME
HELD_SINCE
HOLD_REASON
The -analyze or -better-analyze options can be used to determine why certain jobs are not running by performing an analysis on a per machine basis for each machine in the pool. The reasons can vary among failed constraints, insufficient priority, resource owner preferences and prevention of preemption by the PREEMPTION_REQUIREMENTS expression. If the analyze option -verbose is specified along with the -analyze option, the reason for failure is displayed on a per machine basis. -better-analyze differs from -analyze in that it will do matchmaking analysis on jobs even if they are currently running, or if the reason they are not running is not due to matchmaking. -better-analyze also produces more thorough analysis of complex Requirements and shows the values of relevant job ClassAd attributes. When only a single machine is being analyzed via -machine or -mconstraint, the values of relevant attributes of the machine ClassAd are also displayed.
To restrict the display to jobs of interest, a list of zero or more restriction options may be supplied. Each restriction may be one of:
If cluster or cluster.process is specified, and the job matching that restriction is a condor_dagman job, information for all jobs of that DAG is displayed in batch mode (in non-batch mode, only the condor_dagman job itself is displayed).
If no owner restrictions are present, the job matches the restriction list if it matches at least one restriction in the list. If owner restrictions are present, the job matches the list if it matches one of the owner restrictions and at least one non-owner restriction.
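The matching rule above can be sketched in Python as follows. This is only a model of the rule as worded on this page, not condor_q's actual implementation: jobs are plain dictionaries, restrictions are predicate functions, and all names are hypothetical. Two cases the text does not spell out are treated as assumptions: an empty restriction list matches every job, and owner restrictions with no non-owner restrictions match on owner alone.

```python
def matches_restriction_list(job, owner_tests, other_tests):
    """Model of the restriction-list rule described in the text."""
    # Assumption: a group with no restrictions imposes no requirement.
    owner_ok = any(test(job) for test in owner_tests) if owner_tests else True
    other_ok = any(test(job) for test in other_tests) if other_tests else True
    # With owner restrictions present, both groups must be satisfied;
    # without them, matching any one non-owner restriction is enough.
    return owner_ok and other_ok


def owner_is(name):
    return lambda job: job["Owner"] == name


def cluster_is(cluster_id):
    return lambda job: job["ClusterId"] == cluster_id


jobs = [
    {"Owner": "jdoe", "ClusterId": 27},
    {"Owner": "jdoe", "ClusterId": 30},
    {"Owner": "smith", "ClusterId": 27},
]

# Owner restriction "jdoe" combined with cluster restriction 27.
selected = [
    job for job in jobs
    if matches_restriction_list(job, [owner_is("jdoe")], [cluster_is(27)])
]
print(selected)
```

Without the owner restriction, both jobs in cluster 27 would be selected, regardless of owner.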
-debug
-batch
-nobatch
-global
-submitter submitter
-name name
-pool centralmanagerhostname[:portnumber]
-jobads file
-userlog file
-autocluster
-cputime
-currentrun
-dag
-expert
-grid
-goodput
-help [Universe | State]
-hold
-limit Number
-io
-long
-run
-stream-results
-totals
-version
-wide
-xml
-json
-attributes Attr1[,Attr2 ...]
-format fmt attr
-autoformat[:jlhVr,tng] attr1 [attr2 ...] or -af[:jlhVr,tng] attr1 [attr2 ...]
-analyze[:<qual>]
-better-analyze[:<qual>]
-machine name
-mconstraint expression
-slotads file
-userprios file
-nouserprios
-reverse-analyze
-verbose
The default output from condor_q is formatted to be human readable, not script readable. In an effort to make the output fit within 80 characters, values in some fields might be truncated. Furthermore, the HTCondor Project can (and does) change the formatting of this default output as we see fit. Therefore, any script that is attempting to parse data from condor_q is strongly encouraged to use the -format option (described above, examples given below).
Although -analyze provides a very good first approximation, the analyzer cannot diagnose all possible situations, because the analysis is based on instantaneous and local information. Therefore, some situations, such as when several submitters are contending for resources or when the pool is rapidly changing state, cannot be accurately diagnosed.
Options -goodput, -cputime, and -io are most useful for standard universe jobs, since they rely on values computed when a job produces a checkpoint.
It is possible to hold jobs that are in the X state. To avoid this, construct a -constraint expression that contains JobStatus != 3.
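As a minimal sketch of the constraint mentioned above, the following Python builds an argument list that excludes removed jobs (JobStatus 3, shown as X in the ST column). The helper name exclude_removed_args is hypothetical, and the status table lists the standard JobStatus codes for reference; the sketch only constructs the arguments and does not run condor_q.

```python
# Standard JobStatus codes (3 is the removed state, displayed as X).
JOB_STATUS = {
    1: "Idle",
    2: "Running",
    3: "Removed",
    4: "Completed",
    5: "Held",
    6: "Transferring Output",
    7: "Suspended",
}


def exclude_removed_args(extra_constraint=None):
    """Build condor_q arguments whose constraint skips removed jobs."""
    expr = "JobStatus != 3"
    if extra_constraint:
        # Parenthesize both sides so operator precedence cannot change meaning.
        expr = f"({extra_constraint}) && ({expr})"
    return ["condor_q", "-constraint", expr]


print(exclude_removed_args())
print(exclude_removed_args('Owner == "jdoe"'))
```

Passing the arguments as a list (for example to subprocess.run) avoids shell quoting problems with the quotes inside the expression.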
The -format option provides a way to specify both the job attributes and formatting of those attributes. There must be only one conversion specification per -format option. As an example, to list only Jane Doe's jobs in the queue, choosing to print and format only the owner of the job, the command line arguments for the job, and the process ID of the job:
$ condor_q -submitter jdoe -format "%s" Owner -format " %s " Args -format " ProcId = %d\n" ProcId
jdoe 16386 2800 ProcId = 0
jdoe 16386 3000 ProcId = 1
jdoe 16386 3200 ProcId = 2
jdoe 16386 3400 ProcId = 3
jdoe 16386 3600 ProcId = 4
jdoe 16386 4200 ProcId = 7
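A script that issues such a query might assemble the argument list programmatically. The following Python sketch does that for the command above. The function name build_format_command is hypothetical, and the check simply enforces the one-conversion-per-option rule stated in the text; nothing is executed here.

```python
def build_format_command(submitter, specs):
    """Build a condor_q argument list from (format, attribute) pairs."""
    argv = ["condor_q", "-submitter", submitter]
    for fmt, attr in specs:
        # Ignore escaped percent signs ("%%") when counting conversions.
        if fmt.replace("%%", "").count("%") > 1:
            raise ValueError(f"more than one conversion in {fmt!r}")
        argv += ["-format", fmt, attr]
    return argv


# The raw string keeps the backslash and "n" as two characters, which is
# what the shell passes to condor_q in the example command line.
argv = build_format_command(
    "jdoe",
    [("%s", "Owner"), (" %s ", "Args"), (r" ProcId = %d\n", "ProcId")],
)
print(argv)
```

The resulting list could be handed to subprocess.run(argv, capture_output=True, text=True) on a machine where condor_q is installed.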
To display only the job IDs of Jane Doe's jobs:
$ condor_q -submitter jdoe -format "%d." ClusterId -format "%d\n" ProcId
27.0
27.1
27.2
27.3
27.4
27.7
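Output in this cluster.process form is easy to consume from a script. The following Python sketch splits each line into a pair of integers; the function name parse_job_ids is hypothetical, and the sample text is the output shown above.

```python
def parse_job_ids(text):
    """Turn lines such as "27.0" into (cluster, proc) integer tuples."""
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        cluster, proc = line.split(".")
        ids.append((int(cluster), int(proc)))
    return ids


output = "27.0\n27.1\n27.2\n27.3\n27.4\n27.7\n"
ids = parse_job_ids(output)
print(ids)
```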
An example that shows the analysis in summary format:
$ condor_q -analyze:summary
-- Submitter: submit-1.chtc.wisc.edu : <192.168.100.43:9618?sock=11794_95bb_3> : submit-1.chtc.wisc.edu
Analyzing matches for 5979 slots
           Autocluster  Matches  Machine      Running    Serving
JobId      Members/Idle Reqmnts  Rejects Job  Users Job  Other User Avail Owner
---------- ------------ -------- ------------ ---------- ---------- ----- -----
25764522.0 7/0          5910     820          7/10       5046       34    smith
25764682.0 9/0          2172     603          9/9        1531       29    smith
25765082.0 18/0         2172     603          18/9       1531       29    smith
25765900.0 1/0          2172     603          1/9        1531       29    smith
An example that shows summary information by machine:
$ condor_q -ana:sum,rev
-- Submitter: s-1.chtc.wisc.edu : <192.168.100.43:9618?sock=11794_95bb_3> : s-1.chtc.wisc.edu
Analyzing matches for 2885 jobs
                             Slot Slot's Req   Job's Req    Both
Name                         Type Matches Job  Matches Slot Match %
---------------------------- ---- ------------ ------------ ----------
slot1@INFO.wisc.edu          Stat         2729            0       0.00
slot2@INFO.wisc.edu          Stat         2729            0       0.00
slot1@aci-001.chtc.wisc.edu  Part            0         2793       0.00
slot1_1@a-001.chtc.wisc.edu  Dyn          2644         2792      91.37
slot1_2@a-001.chtc.wisc.edu  Dyn          2623         2601      85.10
slot1_3@a-001.chtc.wisc.edu  Dyn          2644         2632      85.82
slot1_4@a-001.chtc.wisc.edu  Dyn          2644         2792      91.37
slot1@a-002.chtc.wisc.edu    Part            0         2633       0.00
slot1_10@a-002.chtc.wisc.edu Den          2623         2601      85.10
An example with two independent DAGs in the queue:
$ condor_q
-- Schedd: wenger@manta.cs.wisc.edu : <128.105.14.228:35169?...
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
wenger DAG: 3696 2/12 11:55 _ 10 _ 10 3698.0 ... 3707.0
wenger DAG: 3697 2/12 11:55 1 1 1 10 3709.0 ... 3710.0
14 jobs; 0 completed, 0 removed, 1 idle, 13 running, 0 held, 0 suspended
Note that the "13 running" in the last line is two more than the total of the RUN column, because the two condor_dagman jobs themselves are counted in the last line but not the RUN column.
Also note that the "completed" value in the last line does not correspond to the total of the DONE column, because the "completed" value in the last line only counts jobs that are completed but still in the queue, whereas the DONE column counts jobs that are no longer in the queue.
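The arithmetic behind the "13 running" figure can be checked with a short Python sketch. It reads the two batch rows from the example, sums the RUN column (treating "_" as zero), and adds the two condor_dagman jobs. The regular expression is an assumption that fits this particular output, which has no HOLD column; it is not a general parser for batch-mode output.

```python
import re

# After the SUBMITTED date and time come DONE, RUN, IDLE and TOTAL.
ROW_RE = re.compile(r"\d+/\d+\s+\d+:\d+\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s")


def cell(value):
    """Batch mode prints "_" for an empty count."""
    return 0 if value == "_" else int(value)


rows = [
    "wenger DAG: 3696 2/12 11:55 _ 10 _ 10 3698.0 ... 3707.0",
    "wenger DAG: 3697 2/12 11:55 1 1 1 10 3709.0 ... 3710.0",
]

run_total = 0
for row in rows:
    done, run, idle, total = ROW_RE.search(row).groups()
    run_total += cell(run)

dagman_jobs = 2  # one condor_dagman job per DAG, counted only in the totals line
print(run_total, run_total + dagman_jobs)
```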
Here's an example with a held job, illustrating the addition of the HOLD column to the output:
$ condor_q
-- Schedd: wenger@manta.cs.wisc.edu : <128.105.14.228:9619?...
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE HOLD TOTAL JOB_IDS
wenger CMD: /bin/slee 9/13 16:25 _ 3 _ 1 4 599.0 ...
4 jobs; 0 completed, 0 removed, 0 idle, 3 running, 1 held, 0 suspended
Here are some examples with a nested-DAG workflow in the queue, which is one of the most complicated cases. The workflow consists of a top-level DAG with nodes NodeA and NodeB, each with two two-proc clusters; and a sub-DAG SubZ with nodes NodeSA and NodeSB, each with two two-proc clusters.
First of all, non-batch mode with all of the node jobs in the queue:
$ condor_q -nobatch
-- Schedd: wenger@manta.cs.wisc.edu : <128.105.14.228:9619?...
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
591.0 wenger 9/13 16:05 0+00:00:13 R 0 2.4 condor_dagman -p 0
592.0 wenger 9/13 16:05 0+00:00:07 R 0 0.0 sleep 60
592.1 wenger 9/13 16:05 0+00:00:07 R 0 0.0 sleep 300
593.0 wenger 9/13 16:05 0+00:00:07 R 0 0.0 sleep 60
593.1 wenger 9/13 16:05 0+00:00:07 R 0 0.0 sleep 300
594.0 wenger 9/13 16:05 0+00:00:07 R 0 2.4 condor_dagman -p 0
595.0 wenger 9/13 16:05 0+00:00:01 R 0 0.0 sleep 60
595.1 wenger 9/13 16:05 0+00:00:01 R 0 0.0 sleep 300
596.0 wenger 9/13 16:05 0+00:00:01 R 0 0.0 sleep 60
596.1 wenger 9/13 16:05 0+00:00:01 R 0 0.0 sleep 300
10 jobs; 0 completed, 0 removed, 0 idle, 10 running, 0 held, 0 suspended
Now non-batch mode with the -dag option (unfortunately, condor_q doesn't do a good job of grouping procs in the same cluster together):
$ condor_q -nobatch -dag
-- Schedd: wenger@manta.cs.wisc.edu : <128.105.14.228:9619?...
ID OWNER/NODENAME SUBMITTED RUN_TIME ST PRI SIZE CMD
591.0 wenger 9/13 16:05 0+00:00:27 R 0 2.4 condor_dagman -
592.0 |-NodeA 9/13 16:05 0+00:00:21 R 0 0.0 sleep 60
593.0 |-NodeB 9/13 16:05 0+00:00:21 R 0 0.0 sleep 60
594.0 |-SubZ 9/13 16:05 0+00:00:21 R 0 2.4 condor_dagman -
595.0 |-NodeSA 9/13 16:05 0+00:00:15 R 0 0.0 sleep 60
596.0 |-NodeSB 9/13 16:05 0+00:00:15 R 0 0.0 sleep 60
592.1 |-NodeA 9/13 16:05 0+00:00:21 R 0 0.0 sleep 300
593.1 |-NodeB 9/13 16:05 0+00:00:21 R 0 0.0 sleep 300
595.1 |-NodeSA 9/13 16:05 0+00:00:15 R 0 0.0 sleep 300
596.1 |-NodeSB 9/13 16:05 0+00:00:15 R 0 0.0 sleep 300
10 jobs; 0 completed, 0 removed, 0 idle, 10 running, 0 held, 0 suspended
Now, finally, the batch (default) mode:
$ condor_q
-- Schedd: wenger@manta.cs.wisc.edu : <128.105.14.228:9619?...
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
wenger ex1.dag+591 9/13 16:05 _ 8 _ 5 592.0 ... 596.1
10 jobs; 0 completed, 0 removed, 0 idle, 10 running, 0 held, 0 suspended
There are several things about this output that may be slightly confusing:
Now here is non-batch mode after proc 0 of each node job has finished:
$ condor_q -nobatch
-- Schedd: wenger@manta.cs.wisc.edu : <128.105.14.228:9619?...
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
591.0 wenger 9/13 16:05 0+00:01:19 R 0 2.4 condor_dagman -p 0
592.1 wenger 9/13 16:05 0+00:01:13 R 0 0.0 sleep 300
593.1 wenger 9/13 16:05 0+00:01:13 R 0 0.0 sleep 300
594.0 wenger 9/13 16:05 0+00:01:13 R 0 2.4 condor_dagman -p 0
595.1 wenger 9/13 16:05 0+00:01:07 R 0 0.0 sleep 300
596.1 wenger 9/13 16:05 0+00:01:07 R 0 0.0 sleep 300
6 jobs; 0 completed, 0 removed, 0 idle, 6 running, 0 held, 0 suspended
The same state also with the -dag option:
$ condor_q -nobatch -dag
-- Schedd: wenger@manta.cs.wisc.edu : <128.105.14.228:9619?...
ID OWNER/NODENAME SUBMITTED RUN_TIME ST PRI SIZE CMD
591.0 wenger 9/13 16:05 0+00:01:30 R 0 2.4 condor_dagman -
592.1 |-NodeA 9/13 16:05 0+00:01:24 R 0 0.0 sleep 300
593.1 |-NodeB 9/13 16:05 0+00:01:24 R 0 0.0 sleep 300
594.0 |-SubZ 9/13 16:05 0+00:01:24 R 0 2.4 condor_dagman -
595.1 |-NodeSA 9/13 16:05 0+00:01:18 R 0 0.0 sleep 300
596.1 |-NodeSB 9/13 16:05 0+00:01:18 R 0 0.0 sleep 300
6 jobs; 0 completed, 0 removed, 0 idle, 6 running, 0 held, 0 suspended
And, finally, that state in batch (default) mode:
$ condor_q
-- Schedd: wenger@manta.cs.wisc.edu : <128.105.14.228:9619?...
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
wenger ex1.dag+591 9/13 16:05 _ 4 _ 5 592.1 ... 596.1
6 jobs; 0 completed, 0 removed, 0 idle, 6 running, 0 held, 0 suspended
condor_q will exit with a status value of 0 (zero) upon success, and it will exit with the value 1 (one) upon failure.
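A calling script can act on this exit status. The following Python sketch shows one way to do so with the standard subprocess module. The helper name command_succeeded is hypothetical, and the two demonstration calls use the Python interpreter as a stand-in so that the sketch runs on machines without HTCondor installed.

```python
import subprocess
import sys


def command_succeeded(argv):
    """Run a command quietly and report whether its exit status was 0."""
    completed = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return completed.returncode == 0


# Stand-in commands that exit with status 0 and 1 respectively.
ok = command_succeeded([sys.executable, "-c", "raise SystemExit(0)"])
failed = command_succeeded([sys.executable, "-c", "raise SystemExit(1)"])
print(ok, failed)
```

On a submit machine, the same helper could be called as command_succeeded(["condor_q", "-totals"]).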
Center for High Throughput Computing, University of Wisconsin-Madison
Copyright (C) 1990-2016 Center for High Throughput Computing, Computer Sciences Department, University of Wisconsin-Madison, Madison, WI. All Rights Reserved. Licensed under the Apache License, Version 2.0.
May 2022 |