RNAcalibrate - calibrate statistics of secondary structure
hybridisations of RNAs
RNAcalibrate [-h] [-d frequency_file] [-f
from,to] [-k sample_size] [-l
mean,std] [-m max_target_length] [-n
max_query_length] [-u iloop_upper_limit]
[-v bloop_upper_limit] [-s] [-t
target_file] [-q query_file]
[target] [query]
RNAcalibrate is a tool for calibrating minimum free energy
(mfe) hybridisations performed with RNAhybrid. It searches a random database
that can be given on the command line or otherwise generates random
sequences according to given sample size, length distribution parameters and
dinucleotide frequencies. Parameters of an extreme value distribution (evd)
are then fitted to the empirical distribution of length-normalised minimum
free energies. For each miRNA, the output gives its name (or
"command_line" if it was submitted on the command line), the
number of data points used for the evd fit, and the location and scale
parameters. The location and scale parameters of the evd can then be given to
RNAhybrid for the calculation of mfe p-values.
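The fitting step described above can be illustrated with a minimal Python
sketch. It is not RNAcalibrate's implementation: the normalisation (dividing
the energy by the logarithm of target length times query length), the
method-of-moments estimator, and all function names are assumptions made for
illustration. The sketch treats negated normalised energies as scores, so
that larger values mean stronger hybridisations, and fits a Gumbel-type evd
to them.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def normalise_energy(mfe, target_len, query_len):
    """Length-normalise an energy (assumed form: mfe / log(m * n))."""
    return mfe / math.log(target_len * query_len)


def fit_gumbel_moments(scores):
    """Method-of-moments estimates (location, scale) of a Gumbel evd.

    Assumption: the real tool may use a different estimator.
    """
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)
    scale = std * math.sqrt(6) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def gumbel_pvalue(score, location, scale):
    """Probability of a score at least this large under the fitted evd."""
    return 1.0 - math.exp(-math.exp(-(score - location) / scale))


def sample_gumbel(rng, location, scale):
    """Draw one Gumbel variate by inverse transform (for the demo only)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return location - scale * math.log(-math.log(u))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic scores stand in for negated, normalised mfe values.
    scores = [sample_gumbel(rng, 2.0, 0.5) for _ in range(5000)]
    location, scale = fit_gumbel_moments(scores)
    print("data points:", len(scores))
    print("location:", location, "scale:", scale)
    print("p-value of score 4.0:", gumbel_pvalue(4.0, location, scale))
```

The synthetic scores are drawn from a known distribution only so that the
fit can be checked by eye; real input would be the energies obtained from
hybridising a query against the random sequences.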
- -h
- Give a short summary of command line options.
- -d
frequency_file
- Generate random sequences according to dinucleotide frequencies given in
frequency_file. See example directory for example files.
- -f from,to
- Forces all structures to have a helix from position from to
position to with respect to the query. The first base has position
1.
- -k
sample_size
- Generate sample_size random sequences. Default value is 5000.
- -l mean,std
- Generate random sequences with a normal length distribution of mean
mean and standard deviation std. Default values are 500 and
300, respectively.
- -m
max_target_length
- The maximum allowed length of a target sequence. The default value is
2000. This option only has an effect if a target file is given with the -t
option (see below).
- -n
max_query_length
- The maximum allowed length of a query sequence. The default value is 30.
This option only has an effect if a query file is given with the -q option
(see below).
- -u
iloop_upper_limit
- The maximum allowed number of unpaired nucleotides on either side of an
internal loop.
- -v
bloop_upper_limit
- The maximum allowed number of unpaired nucleotides in a bulge loop.
- -s
- Generate random sequences according to the dinucleotide distribution of
the given targets (supplied either with the -t option or on the command
line). If no -t is given, the target is taken from the command line: it is
the last argument if a -q is given, or the second-to-last argument if no -q
is given. See -t option.
- -t
target_file
- Without the -s option, each of the target sequences in target_file
is subject to hybridisation with each of the queries (which are either
taken from the query_file or consist of the single query given on the
command line; see -q below). The sequences in the target_file have to be
in FASTA format, i.e. one line starting with a > directly followed by a
name, then one or more lines with the sequence itself. No individual
sequence line may have more than 1000 characters.
With the -s option, the target (or target file) dinucleotide
distribution is counted, and random sequences are generated according to
this distribution.
If no -t is given, random sequences are generated as described
above (see -d option).
- -q query_file
- See -t option above. If no -q is given, the last argument to RNAcalibrate
is taken as a query.
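The random sequence generation controlled by the -d, -k, -l and -s options
can be pictured with the following Python sketch. It is an illustration
under stated assumptions, not the tool's code: sequences are generated as a
first-order Markov chain over dinucleotide counts, lengths below 1 are
redrawn, and every function name is hypothetical. The format of
frequency_file is not described here, so the sketch takes counts from
example sequences instead, as the -s option does.

```python
import random
from collections import Counter

ALPHABET = "ACGU"


def count_dinucleotides(sequences):
    """Count overlapping dinucleotides; T is treated as U (assumption)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("T", "U")
        for a, b in zip(seq, seq[1:]):
            if a in ALPHABET and b in ALPHABET:
                counts[a + b] += 1
    return counts


def draw_length(rng, mean, std):
    """Draw a length from a normal distribution, redrawing values below 1."""
    while True:
        length = round(rng.gauss(mean, std))
        if length >= 1:
            return length


def random_sequence(rng, counts, length):
    """Generate one sequence whose neighbouring bases follow the counts."""
    # Weight of each base as the first letter of any dinucleotide.
    first_weights = [sum(counts[a + b] for b in ALPHABET) for a in ALPHABET]
    if sum(first_weights) == 0:
        raise ValueError("no dinucleotide counts available")
    seq = rng.choices(ALPHABET, weights=first_weights)
    while len(seq) < length:
        weights = [counts[seq[-1] + b] for b in ALPHABET]
        if sum(weights) == 0:
            # The previous base never starts a dinucleotide; restart the chain.
            weights = first_weights
        seq.append(rng.choices(ALPHABET, weights=weights)[0])
    return "".join(seq)


if __name__ == "__main__":
    rng = random.Random(1)
    counts = count_dinucleotides(["ACGUUGCAAGGCUUACG"])
    # Sample size 5 instead of the default 5000, to keep the demo short.
    for _ in range(5):
        print(random_sequence(rng, counts, draw_length(rng, 500, 300))[:60])
```

The mean of 500 and standard deviation of 300 in the demo are the defaults
documented for the -l option.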
The energy parameters are taken from:
Mathews DH, Sabina J, Zuker M, Turner DH. "Expanded sequence
dependence of thermodynamic parameters improves prediction of RNA secondary
structure" J Mol Biol., 288 (5), pp 911-940, 1999
This man page documents version 2.0 of RNAcalibrate.
Marc Rehmsmeier, Peter Steffen, Matthias Hoechsmann.
Character-dependent energy values are defined only for
[acgtuACGTU]. Any other character leads to an energy value of zero in these
cases.
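Because other characters are silently scored as zero, and because of the
FASTA constraints stated for the -t option, it can be useful to check input
files before running a calibration. The following Python sketch shows one
possible check. It is not part of RNAcalibrate, and the function name and
warning texts are hypothetical.

```python
VALID_CHARS = set("acgtuACGTU")
MAX_LINE_LENGTH = 1000  # limit stated for sequence lines in target files


def check_fasta(lines):
    """Return a list of (line_number, message) warnings for FASTA input."""
    warnings = []
    seen_header = False
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith(">"):
            seen_header = True
            # The name has to follow the '>' directly.
            if len(line) == 1 or line[1].isspace():
                warnings.append((number, "no name directly after '>'"))
            continue
        if not seen_header:
            warnings.append((number, "sequence data before first '>' line"))
        if len(line) > MAX_LINE_LENGTH:
            warnings.append((number, "sequence line longer than 1000 characters"))
        odd = sorted(set(line) - VALID_CHARS)
        if odd:
            warnings.append(
                (number, "characters without defined energy values: " + "".join(odd))
            )
    return warnings


if __name__ == "__main__":
    example = [">target_1\n", "ACGUNNACGU\n", "acgtacgt\n"]
    for number, message in check_fasta(example):
        print(f"line {number}: {message}")
```

The function accepts any iterable of lines, so an open file object can be
passed to it directly.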