RNAhybrid - calculate secondary structure hybridisations of
RNAs
RNAhybrid [-h] [-b hit_number] [-e
energy_cutoff] [-p p-value_cutoff] [-c]
[-d xi,theta] [-s set_name] [-f
from,to] [-m max_target_length] [-n
max_query_length] [-u iloop_upper_limit]
[-v bloop_upper_limit] [-g (ps|png|jpg|all)] [-t
target_file] [-q query_file]
[target] [query]
RNAhybrid is a tool for finding minimum free energy (mfe)
hybridisations of a long (target) and a short (query) RNA. The hybridisation
is performed in a kind of domain mode, ie. the short sequence is hybridised
to the best fitting parts of the long one. The tool is primarily meant as a
means for microRNA target prediction. In addition to mfes, the program
calculates p-values based on extreme value distributions of length
normalised energies.
- -h
- Give a short summary of command line options.
- -b hit_number
- Maximal number of hits to show. hit_number hits with increasing
minimum free energy (reminder: larger energies are worse) are shown,
unless the -e option is used and the energy cut-off has been exceeded (see
-e option below) or there are no more hits. Hits may only overlap at
dangling bases (5' or 3' unpaired end of target).
- -c
- Produce compact output. For each target/query pair one line of output is
generated. Each line is a colon (:) separated list of the following
fields: target name, query name, minimum free energy, position in target,
alignment line 1, line 2, line 3, line 4. If a target or a query is given
on command line (ie. no -t or -q respectively), its name in the output
will be "command line".
- -d xi,theta
- xi and theta are the position and shape parameters, respectively, of an
extreme value distribution (evd). p-values of duplex energies are assumed
to be distributed according to such an evd. For a length normalised energy
en, we have P[X <= en] = 1 - exp(-exp(-(-en-xi)/theta)), where en =
e/log(m*n) with m and n being the lengths of the target and the query,
respectively. If the -d option is omitted, xi and theta are estimated from
the maximal duplex energy of the query, assuming a linear dependence. The
parameters of this linear dependence are coded into the program, but the
option -s has to be given to choose from the appropriate set. Note that
the evd is mirrored, since good mfes are large negative values.
- -s set_name
- Used for quick estimate of extreme value distribution parameters (see -d
option above). Tells RNAhybrid which target dataset to assume. Valid
parameters are 3utr_fly, 3utr_worm and 3utr_human.
- -e
energy_cutoff
- Hits with increasing minimum free energy (reminder: larger energies are
worse) less than or equal to energy_cutoff are shown, unless the -b
option is used and the number of already reported hits has reached the
maximal hit_number (see -b option above). Hits may only overlap at
dangling bases (5' or 3' unpaired end of target).
- -p
p-value_cutoff
- Only hits with p-values not larger than p-value_cutoff are
reported. See also options -d and -s.
- -f from,to
- Forces all structures to have a helix from position from to
position to with respect to the query. The first base has position
1.
- -m
max_target_length
- The maximum allowed length of a target sequence. The default value is
2000. This option only has an effect if a target file is given with the -t
option (see below).
- -n
max_query_length
- The maximum allowed length of a query sequence. The default value is 30.
This option only has an effect if a query file is given with the -q option
(see below).
- -u
iloop_upper_limit
- The maximally allowed number of unpaired nucleotides in either side of an
internal loop.
- -v
bloop_upper_limit
- The maximally allowed number of unpaired nucleotides in a bulge loop.
- -g (ps|png|jpg|all)
- Produce a plot of the hybridisation, either in ps, png or jpg format, or
for all formats together. The plots are saved in files whose names are
created from the target and query names ("command_line" if given
on the command line). This option only works, if the appropriate graphics
libraries are present.
- -t
target_file
- Each of the target sequences in target_file is subject to
hybridisation with each of the queries (which either are from the
query_file or is the one query given on command line; see -q
below). The sequences in the target_file have to be in FASTA
format, ie. one line starting with a > and directly followed by a name,
then one or more following lines with the sequence itself. Each individual
sequence line must not have more than 1000 characters. If no -t is given,
either the last argument (if a -q is given) or the second last argument
(if no -q is given) to RNAhybrid is taken as a target.
- -q query_file
- See -t option above.
The energy parameters are taken from:
Mathews DH, Sabina J, Zuker M, Turner DH. "Expanded sequence
dependence of thermodynamic parameters improves prediction of RNA secondary
structure" J Mol Biol., 288 (5), pp 911-940, 1999
The graphical output uses code from the Vienna RNA package:
Hofacker IL. "Vienna RNA secondary structure server."
Nucleic Acids Research, 31 (13), pp 3429-3431, 2003
This man page documents version 2.0 of RNAhybrid.
Marc Rehmsmeier, Peter Steffen, Matthias Hoechsmann.
Character dependent energy values are only defined for
[acgtuACGTU]. All other characters lead to values of zero in these
cases.
In suboptimal hits, dangling ends appear as Ns if they were in the
first or last hybridising position of a previous hit.
RNAcalibrate, RNAeffective