6.3.3.4. psl2bed
The psl2bed script converts 0-based, half-open [start-1, end) Pattern Space Layout (PSL) to sorted, 0-based, half-open [start-1, end) extended BED-formatted data.
For convenience, we also offer psl2starch, which performs the extra step of creating a Starch-formatted archive.
6.3.3.4.1. Dependencies
The psl2bed script requires Python, version 2.5 or greater.
This script also depends on input that follows the PSL specification.
Tip
Conversion of data that are PSL-like but do not follow the specification can cause IOError and other runtime exceptions. If you run into problems, please check that your input follows the PSL specification.
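One way to act on this tip is to check records before conversion. The following is a minimal sketch, not part of BEDOPS: it assumes the standard 21-column, tab-delimited PSL layout, and the names psl_record_problems, INT_COLUMNS and LIST_COLUMNS are hypothetical.

```python
import re

# 0-based indices of PSL columns that hold non-negative integers
# (match counts, gap counts, sizes, coordinates, blockCount).
INT_COLUMNS = (0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 14, 15, 16, 17)
# 0-based indices of the comma-separated list columns
# (blockSizes, qStarts, tStarts).
LIST_COLUMNS = (18, 19, 20)

_UINT = re.compile(r"[0-9]+")


def psl_record_problems(line):
    """Return a list of human-readable problems found in one PSL record.

    An empty list means the record passed these basic checks; it does not
    prove that the record is fully valid PSL.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 21:
        return ["expected 21 tab-separated fields, found %d" % len(fields)]

    problems = []
    for i in INT_COLUMNS:
        if not _UINT.fullmatch(fields[i]):
            problems.append(
                "column %d is not a non-negative integer: %r" % (i + 1, fields[i])
            )

    # Each list column should contain exactly blockCount entries.
    if _UINT.fullmatch(fields[17]):
        block_count = int(fields[17])
        for i in LIST_COLUMNS:
            entries = [x for x in fields[i].split(",") if x]
            if len(entries) != block_count:
                problems.append(
                    "column %d has %d entries, blockCount is %d"
                    % (i + 1, len(entries), block_count)
                )
    return problems
```

Running such a check over the input and printing any non-empty result can help locate the offending line before it reaches psl2bed.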
6.3.3.4.2. Source
The psl2bed and psl2starch conversion scripts are part of the binary and source downloads of BEDOPS. See the Installation documentation for more details.
6.3.3.4.3. Usage
The psl2bed script parses PSL from standard input and prints sorted BED to standard output. The psl2starch script uses an extra step to parse PSL to a compressed BEDOPS Starch-formatted archive, which is also directed to standard output.
Tip
By default, all conversion scripts now output sorted BED data ready for use with BEDOPS utilities. If you do not want to sort converted output, use the --do-not-sort option. Run the script with the --help option for more details.
Tip
If you are sorting data larger than system memory, use the --max-mem option to limit sort memory usage to a reasonable fraction of available memory, e.g., --max-mem 2G or similar. See --help for more details.
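When psl2bed is driven from another program, the options described on this page can be assembled programmatically. The sketch below is one possible wrapper, assuming psl2bed is on the PATH and accepts the options exactly as written on this page; build_psl2bed_command and run_psl2bed are hypothetical helper names, not part of BEDOPS.

```python
import shutil
import subprocess


def build_psl2bed_command(headered=False, sort=True, max_mem=None):
    """Build the psl2bed argument list from the options described here."""
    if not sort and max_mem is not None:
        # --max-mem only limits sort memory, so it has no use without sorting.
        raise ValueError("max_mem has no effect when sorting is disabled")
    cmd = ["psl2bed"]
    if headered:
        cmd.append("--headered")
    if not sort:
        cmd.append("--do-not-sort")
    if max_mem is not None:
        cmd.extend(["--max-mem", max_mem])
    return cmd


def run_psl2bed(psl_path, bed_path, **options):
    """Feed a PSL file to psl2bed on stdin and write its stdout to bed_path."""
    cmd = build_psl2bed_command(**options)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("psl2bed was not found on PATH")
    with open(psl_path, "rb") as src, open(bed_path, "wb") as dst:
        # check=True raises CalledProcessError if psl2bed exits non-zero.
        subprocess.run(cmd, stdin=src, stdout=dst, check=True)
```

For example, run_psl2bed("foo.psl", "foo.bed", headered=True, max_mem="2G") would run the script on headered input with the sort memory limit suggested above.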
6.3.3.4.4. Example
To demonstrate these scripts, we use a sample PSL input called foo.psl (see the Downloads section to grab this file).
psLayout version 3
match mis- rep. N's Q gap Q gap T gap T gap strand Q Q Q Q T T T T block blockSizes qStarts tStarts
match match count bases count bases name size start end name size start end count
---------------------------------------------------------------------------------------------------------------------------------------------------------------
35 0 0 0 0 0 0 0 + foo 50 15 50 chrX 155270560 40535836 40535871 1 35, 15, 40535836,
34 2 0 0 0 0 0 0 + foo 50 14 50 chrX 155270560 68019028 68019064 1 36, 14, 68019028,
33 2 0 0 0 0 0 0 + foo 50 14 49 chrX 155270560 43068135 43068170 1 35, 14, 43068135,
35 2 0 0 0 0 0 0 + foo 50 13 50 chr8 146364022 131572122 131572159 1 37, 13, 131572122,
30 0 0 0 0 0 0 0 + foo 50 14 44 chr6 171115067 127685756 127685786 1 30, 14, 127685756,
30 0 0 0 0 0 0 0 + foo 50 14 44 chr6 171115067 93161871 93161901 1 30, 14, 93161871,
31 0 0 0 0 0 0 0 + foo 50 13 44 chr5 180915260 119897315 119897346 1 31, 13, 119897315,
30 0 0 0 0 0 0 0 + foo 50 14 44 chr5 180915260 123254725 123254755 1 30, 14, 123254725,
...
We can convert it to sorted BED data in the following manner:
$ psl2bed --headered < foo.psl
chr1 30571100 30571135 foo 50 - 35 0 0 0 0 0 0 0 15 50 249250621 1 35, 0, 30571100,
chr1 69592160 69592195 foo 50 - 34 1 0 0 0 0 0 0 15 50 249250621 1 35, 0, 69592160,
chr1 107200050 107200100 foo 50 + 50 0 0 0 0 0 0 0 0 50 249250621 1 50, 0, 107200050,
chr11 12618347 12618389 foo 50 + 39 3 0 0 0 0 0 0 8 50 135006516 1 42, 8, 12618347,
chr11 32933028 32933063 foo 50 + 35 0 0 0 1 1 0 0 8 44 135006516 2 4,31, 8,13, 32933028,32933032,
chr11 80116421 80116457 foo 50 + 35 1 0 0 0 0 0 0 14 50 135006516 1 36, 14, 80116421,
chr11 133952291 133952327 foo 50 + 34 2 0 0 0 0 0 0 14 50 135006516 1 36, 14, 133952291,
chr13 99729482 99729523 foo 50 + 39 2 0 0 0 0 0 0 8 49 115169878 1 41, 8, 99729482,
chr13 111391852 111391888 foo 50 + 34 2 0 0 0 0 0 0 14 50 115169878 1 36, 14, 111391852,
chr16 8149657 8149694 foo 50 + 36 1 0 0 0 0 0 0 13 50 90354753 1 37, 13, 8149657,
...
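Judging from the example output above, each BED line appears to carry all 21 PSL fields in a rearranged order: target name, start and end come first, then query name, query size (in the score column) and strand, followed by the remaining fields. The sketch below illustrates that reordering for a single headerless record. The function name psl_record_to_bed is hypothetical, the mapping is inferred from the example rather than taken from the psl2bed source, and no sorting is performed.

```python
def psl_record_to_bed(line):
    """Reorder one tab-delimited PSL record into the extended BED layout
    suggested by the example output (an inferred mapping, not the script's code).
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 21:
        raise ValueError("expected 21 PSL fields, found %d" % len(fields))

    (matches, mismatches, rep_matches, n_count,
     q_num_insert, q_base_insert, t_num_insert, t_base_insert,
     strand, q_name, q_size, q_start, q_end,
     t_name, t_size, t_start, t_end,
     block_count, block_sizes, q_starts, t_starts) = fields

    bed_fields = [
        t_name, t_start, t_end,      # BED chromosome, start, end
        q_name, q_size, strand,      # BED id, score, strand
        matches, mismatches, rep_matches, n_count,
        q_num_insert, q_base_insert, t_num_insert, t_base_insert,
        q_start, q_end, t_size,
        block_count, block_sizes, q_starts, t_starts,
    ]
    return "\t".join(bed_fields)
```

Because PSL target coordinates are already 0-based and half-open, the start and end values are copied unchanged in this sketch.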
Note
By default, the psl2bed and psl2starch scripts work with headerless PSL data. If you have headered PSL output, use the --headered option with either conversion script, as shown in the example above.
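To make the distinction between headered and headerless input concrete, the following sketch drops a PSL header when one is present. It assumes the header begins with a "psLayout" line and ends with a line of dashes, as in the sample input above; strip_psl_header is a hypothetical helper and does not necessarily reflect how the --headered option is implemented.

```python
def strip_psl_header(lines):
    """Yield PSL records, skipping a leading header block if there is one.

    `lines` is any iterable of text lines, such as an open file.
    """
    it = iter(lines)
    first = next(it, None)
    if first is None:
        return  # empty input
    if not first.startswith("psLayout"):
        # Headerless input: pass every line through unchanged.
        yield first
        yield from it
        return
    # Headered input: discard lines up to and including the dashed rule.
    for line in it:
        if line.startswith("-----"):
            break
    yield from it
```

With a filter like this, headered and headerless files can be handled by the same downstream code.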