NAME

sge_status - Grid Engine job status values

DESCRIPTION

Job state

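The following table lists the job states shown by qstat(1) and the corresponding states reported through the DRMAA job-control interface (see drmaa_jobcontrol(3)). The DRMAA state column gives the suffix of the DRMAA_PS_* value that may be returned by drmaa_job_ps(3); for example, QUEUED_ACTIVE stands for DRMAA_PS_QUEUED_ACTIVE.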

Category	State	qstat code	DRMAA state
Pending	pending	qw, Rq	QUEUED_ACTIVE
	pending, user hold	hqw	USER_ON_HOLD
	pending, system hold	hqw	SYSTEM_ON_HOLD
	pending, user and system hold	hqw	USER_SYSTEM_ON_HOLD
	pending, user hold, re-queue	hRwq	USER_ON_HOLD
	pending, system hold, re-queue	hRwq	SYSTEM_ON_HOLD
	pending, user and system hold, re-queue	hRwq	USER_SYSTEM_ON_HOLD
Running / transferring	running, transferring	r, hr, t	RUNNING
	running, re-run / transferring	Rr, Rt	RUNNING
Suspended	job suspended	s, ts	USER_SUSPENDED
	queue suspended	S, tS	SYSTEM_SUSPENDED
	queue suspended by alarm	T, tT	SYSTEM_SUSPENDED
	all suspended states with re-run	Rs, Rts, RS, RtS, RT, RtT	SYSTEM_SUSPENDED
Error	all pending states with error	Eqw, Ehqw, EhRqw	FAILED
Deleting	all running and suspended states with deletion	dr, dt, dRr, dRt, ds, dS, dT, dRs, dRS, dRT	same as equivalent DRMAA states without the "d"
Finished	job finished normally	z	DONE
Unknown	status cannot be determined		UNDETERMINED
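The table above can be transcribed into a lookup, for example to translate the letter codes in qstat(1) output into DRMAA states. The following is a minimal Python sketch that covers only the codes listed in the table; the names QSTAT_TO_DRMAA, DELETING and drmaa_states are hypothetical and are not part of Grid Engine or of any DRMAA binding. Because hqw and hRwq each appear with three different DRMAA states, the sketch returns a tuple of candidate states rather than a single value.

```python
from typing import Dict, Tuple

# Hold codes are ambiguous: the table lists user, system and combined holds
# under the same qstat letters, so all three are returned as candidates.
HOLD_STATES: Tuple[str, ...] = ("USER_ON_HOLD", "SYSTEM_ON_HOLD", "USER_SYSTEM_ON_HOLD")

# (qstat codes, candidate DRMAA states), transcribed from the table.
_TABLE = [
    (("qw", "Rq"), ("QUEUED_ACTIVE",)),
    (("hqw", "hRwq"), HOLD_STATES),
    (("r", "hr", "t", "Rr", "Rt"), ("RUNNING",)),
    (("s", "ts"), ("USER_SUSPENDED",)),
    (("S", "tS", "T", "tT", "Rs", "Rts", "RS", "RtS", "RT", "RtT"),
     ("SYSTEM_SUSPENDED",)),
    (("Eqw", "Ehqw", "EhRqw"), ("FAILED",)),
    (("z",), ("DONE",)),
]

# Hypothetical lookup: qstat code -> tuple of candidate DRMAA states.
QSTAT_TO_DRMAA: Dict[str, Tuple[str, ...]] = {
    code: states for codes, states in _TABLE for code in codes
}

# Deleting codes from the table; each maps like the same code without "d".
DELETING = {"dr", "dt", "dRr", "dRt", "ds", "dS", "dT", "dRs", "dRS", "dRT"}


def drmaa_states(code: str) -> Tuple[str, ...]:
    """Return the candidate DRMAA states for one qstat state code."""
    if code in DELETING:
        code = code[1:]  # strip the leading "d"
    # Any code not listed in the table is reported as UNDETERMINED.
    return QSTAT_TO_DRMAA.get(code, ("UNDETERMINED",))


if __name__ == "__main__":
    for c in ("qw", "hqw", "dRs", "Eqw", "zz"):
        print(c, drmaa_states(c))
```

For example, drmaa_states("dRs") yields ("SYSTEM_SUSPENDED",) and an unlisted code yields ("UNDETERMINED",). Since the qstat letters alone cannot separate the hold variants, a program that needs the exact hold type has to ask DRMAA (drmaa_job_ps(3)) for it.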

The following table lists the "failed" values reported by qacct(1) (see accounting(5)), the description that qacct reports for each, whether the resource usage accounting data for the job are valid ("OK"), and an explanation. More information about an error may be found in the host's messages file or in the shepherd trace file, which is preserved when KEEP_ACTIVE is set in execd_params (see sge_conf(5)).

Code	Description	OK	Explanation
0	no failure	Y	ran and exited normally
1	assumedly before job	N	failed early in execd
3	before writing config	N	failed before execd set up local spool
4	before writing PID	N	shepherd failed to record its PID (possibly a filesystem problem)
6	setting processor set	N	failed setting up processor set (obsolete)
7	before prolog	N	failed before prolog
8	in prolog	N	failed in prolog
9	before pestart	N	failed before starting PE
10	in pestart	N	failed in PE starter
11	before job	N	failed in shepherd before starting job
12	before pestop	Y	ran, but failed before calling PE stop procedure
13	in pestop	Y	ran, but PE stop procedure failed
14	before epilog	Y	ran, but failed before calling epilog
15	in epilog	Y	ran, but failed in epilog
16	releasing processor set	Y	ran, but processor set could not be released (obsolete)
17	through signal	Y	job killed by signal (possibly qdel)
18	shepherd returned error	N	shepherd died somehow
19	before writing exit_status	N	shepherd did not write its reports correctly (probably a program or machine crash)
20	found unexpected error file	?	shepherd encountered a problem
21	in recognizing job	N	qmaster asked about an unknown job (not in accounting?)
24	migrating (checkpointing jobs)	Y	ran, will be migrated
25	rescheduling	Y	ran, will be rescheduled
26	opening output file	N	failed opening stderr/stdout file
27	searching requested shell	N	failed finding specified shell
28	changing to working directory	N	failed changing to start directory
29	AFS setup	N	failed setting up AFS security
30	application error returned	Y	ran and exited with status 100; may be rescheduled
31	accessing sgepasswd file	N	failed because sgepasswd not readable (MS Windows)
32	entry is missing in password file	N	failed because user not in sgepasswd (MS Windows)
33	wrong password	N	failed because of wrong password against sgepasswd (MS Windows)
34	communicating with Grid Engine Helper Service	N	failed because of failure of helper service (MS Windows)
35	before job in Grid Engine Helper Service	N	failed because of failure running helper service (MS Windows)
36	checking configured daemons	N	failed because of configured remote startup daemon
37	qmaster enforced h_rt, h_cpu, or h_vmem limit	Y	ran, but killed for exceeding one of these limits
38	adding supplementary group	N	failed adding supplementary gid to job
100	assumedly after job	Y	ran, but killed by a signal (perhaps due to exceeding resources), task died, shepherd died (e.g. node crash), etc.
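When post-processing accounting records, the OK column decides whether a job's resource usage figures can be trusted. The following minimal Python sketch encodes that column. It assumes that the failed line of qacct -j output starts with the word failed followed by the numeric code and, optionally, ": description"; that layout is an assumption, as are the helper names parse_failed_line and usage_valid.

```python
from typing import Optional

# "failed" codes whose OK column is Y (usage accounting data valid).
USAGE_VALID = {0, 12, 13, 14, 15, 16, 17, 24, 25, 30, 37, 100}

# "failed" codes whose OK column is N (usage accounting data not valid).
USAGE_INVALID = {1, 3, 4, 6, 7, 8, 9, 10, 11, 18, 19, 21,
                 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 38}


def parse_failed_line(line: str) -> int:
    """Extract the numeric code from a qacct 'failed' line.

    Assumed layout: "failed", then the code, then an optional
    ": description" part.
    """
    fields = line.split()
    if len(fields) < 2 or fields[0] != "failed":
        raise ValueError(f"not a qacct 'failed' line: {line!r}")
    # Tolerate a colon glued to the number, e.g. "100:".
    return int(fields[1].rstrip(":"))


def usage_valid(failed: int) -> Optional[bool]:
    """True or False per the OK column; None for "?" (code 20) or unlisted codes."""
    if failed in USAGE_VALID:
        return True
    if failed in USAGE_INVALID:
        return False
    return None


if __name__ == "__main__":
    code = parse_failed_line("failed       100 : assumedly after job")
    print(code, usage_valid(code))
```

Returning None instead of guessing keeps code 20, whose OK value is "?", and any code missing from the table distinguishable from a definite N.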

See sge_shepherd(8) for the effect of non-zero return codes from the various methods (prolog etc.) executed by the shepherd.
