sortgrcd(1) General Commands Manual sortgrcd(1)

sortgrcd - postprocess the output of spaln run with the -O12 option, Version 2

sortgrcd [options] xxx1.grd(.gz) [xxx2.grd(.gz) ...]

sortgrcd recovers the output of spaln run with the -O12 option, applies optional filtering, and can also rearrange the outputs of multiple Spaln runs.

Minimum cover rate = % nucleotides in predicted exons / length of query (x 3 if query is protein) (0-100)
Filter level #=0: no; #=1: mild; #=2: medium; #=3: stringent (0)
Minimum alignment score
Maximum total number of mismatches near boundaries
Maximum number of non-canonical boundaries
Output format. 0: Gff3, 4: Native, 5: Intron, 15: unique intron
Minimum overall % sequence identity (0-100)
Sort order of chromosomes/contigs. a: alphabetical, b: abundance, c: input order, r: reverse for minus strand
Maximum total number of unpaired bases in gaps
Maximum internal memory size used for core sort. Suffix k (or K) or m (or M) may be attached to specify kilo or mega bytes.
Maximum number of mismatches within 10bp from the nearest exon-intron boundary
Allow non-canonical (other than GT..AG, GC..AG, AT..AC) intron ends (0: no)
Maximum number of unpaired (gap) sites within 10bp from the nearest exon-intron boundary
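
The cover rate formula and the list of canonical intron ends given in the option descriptions above can be illustrated with a short Python sketch. This is not sortgrcd's code: the function names cover_rate and is_canonical_intron and their parameters are hypothetical, and the sketch assumes that the query length is counted in residues and that an intron is given as an uppercase nucleotide string.

```python
# Illustrative sketch only; names and parameters are hypothetical,
# not part of sortgrcd or spaln.

# Canonical (donor, acceptor) dinucleotide pairs named in the option text.
CANONICAL_ENDS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def cover_rate(exon_nt: int, query_len: int, query_is_protein: bool) -> float:
    """Percentage (0-100) of the query covered by predicted exons.

    exon_nt   : number of nucleotides in predicted exons
    query_len : query length in residues (assumption: amino acids for a
                protein query, nucleotides for a cDNA query)
    """
    # A protein residue corresponds to three nucleotides.
    denom = query_len * 3 if query_is_protein else query_len
    if denom <= 0:
        raise ValueError("query length must be positive")
    return 100.0 * exon_nt / denom


def is_canonical_intron(intron: str) -> bool:
    """True if the intron starts and ends with a canonical dinucleotide pair."""
    if len(intron) < 4:
        return False
    seq = intron.upper()
    return (seq[:2], seq[-2:]) in CANONICAL_ENDS


if __name__ == "__main__":
    print(cover_rate(270, 100, query_is_protein=True))
    print(is_canonical_intron("GTAAGTTTTCAG"))
```

A record would pass a minimum cover rate filter when the value returned by cover_rate is at least the chosen threshold.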

The output format of spaln -O12 has changed since version 2: in addition to the *.grd and *.erd files, a *.qrd file is now generated. This change removes the limitations on the lengths of the identifiers of both target (genomic) and query sequences. The database files that were specified with the -d option of spaln must not be changed before running sortgrcd.

(1) "A Space-Efficient and Accurate Method for Mapping and Aligning cDNA Sequences onto Genomic Sequence", O. Gotoh, Nucleic Acids Res., 36 (8), 2630-2638 (2008).
(2) "Direct Mapping and Alignment of Protein Sequences onto Genomic Sequence", O. Gotoh, Bioinformatics, 24 (21), 2438-2444 (2008).

Osamu Gotoh <o.gotoh@aist.go.jp>

2018-09-06