6.3.3.4. psl2bed

The psl2bed script converts 0-based, half-open [start-1, end) Pattern Space Layout (PSL) data to sorted, 0-based, half-open [start-1, end) extended BED-formatted data.

For convenience, we also offer psl2starch, which performs the extra step of creating a Starch-formatted archive.

6.3.3.4.1. Dependencies

The psl2bed script requires Python, version 2.5 or greater.

The script also expects input that follows the PSL specification.

Tip

Converting data that are PSL-like but do not follow the specification can raise IOError and other runtime exceptions. If you run into problems, check that your input follows the PSL specification.

6.3.3.4.2. Source

The psl2bed and psl2starch conversion scripts are part of the binary and source downloads of BEDOPS. See the Installation documentation for more details.

6.3.3.4.3. Usage

The psl2bed script parses PSL from standard input and prints sorted BED to standard output. The psl2starch script takes the extra step of compressing the converted BED into a BEDOPS Starch-formatted archive, which is also written to standard output.

Tip

By default, all conversion scripts now output sorted BED data ready for use with BEDOPS utilities. If you do not want to sort converted output, use the --do-not-sort option. Run the script with the --help option for more details.
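For reference, BEDOPS sort order is lexicographic on the chromosome name, then numeric on the start and end coordinates, which is why chr1 precedes chr11, chr13 and chr16 in the example output below. The following sketch approximates that order with the standard sort utility in the C locale. The function name sort_bed_approx is hypothetical, and this is only an illustration of the ordering, not what the conversion scripts run internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT what the conversion scripts run: approximate
# BEDOPS sort order for tab-delimited BED using standard sort.
sort_bed_approx() {
  # LC_ALL=C : byte-wise comparison, so chr1 < chr11 < chr13 < chr2
  # -t TAB   : split fields on tab characters only
  # -k1,1    : first key is the chromosome name (lexicographic)
  # -k2,2n   : then the start coordinate, compared numerically
  # -k3,3n   : then the end coordinate, compared numerically
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n -k3,3n
}
```

The n modifier on the coordinate keys matters: compared as plain text, 107200050 would sort before 30571100.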

Tip

If you are sorting data larger than system memory, use the --max-mem option to limit sort memory usage to a reasonable fraction of available memory, e.g., --max-mem 2G or similar. See --help for more details.

6.3.3.4.4. Example

To demonstrate these scripts, we use a sample PSL input called foo.psl (see the Downloads section to obtain this file).

psLayout version 3

match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap   strand  Q               Q       Q       Q       T               T       T       T       block   blockSizes      qStarts  tStarts
        match   match           count   bases   count   bases           name            size    start   end     name            size    start   end     count
---------------------------------------------------------------------------------------------------------------------------------------------------------------
35      0       0       0       0       0       0       0       +       foo     50      15      50      chrX    155270560       40535836        40535871        1       35,     15,     40535836,
34      2       0       0       0       0       0       0       +       foo     50      14      50      chrX    155270560       68019028        68019064        1       36,     14,     68019028,
33      2       0       0       0       0       0       0       +       foo     50      14      49      chrX    155270560       43068135        43068170        1       35,     14,     43068135,
35      2       0       0       0       0       0       0       +       foo     50      13      50      chr8    146364022       131572122       131572159       1       37,     13,     131572122,
30      0       0       0       0       0       0       0       +       foo     50      14      44      chr6    171115067       127685756       127685786       1       30,     14,     127685756,
30      0       0       0       0       0       0       0       +       foo     50      14      44      chr6    171115067       93161871        93161901        1       30,     14,     93161871,
31      0       0       0       0       0       0       0       +       foo     50      13      44      chr5    180915260       119897315       119897346       1       31,     13,     119897315,
30      0       0       0       0       0       0       0       +       foo     50      14      44      chr5    180915260       123254725       123254755       1       30,     14,     123254725,
...

We can convert it to sorted BED data in the following manner:

$ psl2bed --headered < foo.psl
chr1    30571100        30571135        foo     50      -       35      0       0       0       0       0       0       0       15      50      249250621       1       35,     0,      30571100,
chr1    69592160        69592195        foo     50      -       34      1       0       0       0       0       0       0       15      50      249250621       1       35,     0,      69592160,
chr1    107200050       107200100       foo     50      +       50      0       0       0       0       0       0       0       0       50      249250621       1       50,     0,      107200050,
chr11   12618347        12618389        foo     50      +       39      3       0       0       0       0       0       0       8       50      135006516       1       42,     8,      12618347,
chr11   32933028        32933063        foo     50      +       35      0       0       0       1       1       0       0       8       44      135006516       2       4,31,   8,13,   32933028,32933032,
chr11   80116421        80116457        foo     50      +       35      1       0       0       0       0       0       0       14      50      135006516       1       36,     14,     80116421,
chr11   133952291       133952327       foo     50      +       34      2       0       0       0       0       0       0       14      50      135006516       1       36,     14,     133952291,
chr13   99729482        99729523        foo     50      +       39      2       0       0       0       0       0       0       8       49      115169878       1       41,     8,      99729482,
chr13   111391852       111391888       foo     50      +       34      2       0       0       0       0       0       0       14      50      115169878       1       36,     14,     111391852,
chr16   8149657 8149694 foo     50      +       36      1       0       0       0       0       0       0       13      50      90354753        1       37,     13,     8149657,
...
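The column layout of this output can be reproduced by reordering PSL fields. Below is a minimal shell sketch, assuming headerless, well-formed, tab- or space-delimited PSL. The function name psl_to_bed_sketch is hypothetical, and the column order is inferred from the sample output above rather than taken from the psl2bed source. It only reorders fields: the target coordinates (tStart, tEnd) are copied unchanged because both formats are 0-based and half-open, and nothing is sorted.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the psl2bed implementation: reorder the 21
# fields of each headerless PSL record into the column layout shown in
# the psl2bed example output. Input may be tab- or space-delimited.
psl_to_bed_sketch() {
  awk 'BEGIN { OFS = "\t" }
  {
    # Columns 1-3: tName, tStart, tEnd (PSL target coordinates are
    # already 0-based, half-open, so they are copied unchanged).
    # Columns 4-6: qName as the ID, qSize as the score, then strand.
    # Columns 7-14: matches, misMatches, repMatches, nCount,
    #   qNumInsert, qBaseInsert, tNumInsert, tBaseInsert.
    # Columns 15-21: qStart, qEnd, tSize, blockCount, blockSizes,
    #   qStarts, tStarts.
    print $14, $16, $17, $10, $11, $9,
          $1, $2, $3, $4, $5, $6, $7, $8,
          $12, $13, $15, $18, $19, $20, $21
  }'
}
```

Each input record yields one 21-column, tab-delimited line. Because the sketch does not sort, its output would still need sorting (for example with the BEDOPS sort-bed utility) before use with other BEDOPS tools.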

Note

By default, the psl2bed and psl2starch scripts expect headerless PSL data. If your PSL input has a header, as foo.psl does, use the --headered option with either conversion script, as shown in the example above.
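A headered PSL file begins with a short preamble that is not alignment data: the psLayout version line, a blank line, two lines of column names, and a dashed rule, as in the foo.psl sample. As a hypothetical illustration of what skipping that preamble involves (not how psl2bed implements --headered), the awk filter below drops everything up to and including the dashed rule when the first line starts with psLayout. The function name strip_psl_header is an assumption for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of BEDOPS: drop a psLayout header block
# so that only alignment records remain.
strip_psl_header() {
  awk '
    # A headered file begins with a "psLayout version" line
    NR == 1 && /^psLayout/ { in_header = 1 }
    # While in the header, discard each line; the dashed rule ends it
    in_header { if ($0 ~ /^-/) in_header = 0; next }
    # Alignment records (and all lines of headerless input) pass through
    { print }
  '
}
```

Headerless input passes through unchanged. The sketch assumes the header ends at the first line beginning with a dash, which holds for the sample file because PSL records always begin with a non-negative match count.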

6.3.3.4.5. Downloads