BaitFilter-v1.0.6 - manual page for BaitFilter-v1.0.6
Welcome to Bait-Filter, version 1.0.6.
USAGE:
./BaitFilter-v1.0.6 -i <string> [-o <string>] [-c <string>] [-m <string>]
[--blast-second-hit-evalue <floating point number>]
[--blast-first-hit-evalue <floating point number>]
[--blast-min-hit-coverage-of-baits-in-tiling-stack <floating point number>]
[--ref-blast-db <string>] [--blast-extra-commandline <string>]
[--blast-evalue-cutoff <floating point number>] [-B <string>]
[-t <positive integer>] [--ID-prefix <string>] [-S]
[--verbosity <unsigned integer>] [-b <string>] [--] [--version] [-h]
Where:
-i <string>, --input-bait-file-name <string>
- (required)
- Name of the input bait locus file. This is the bait file obtained from the
BaitFisher program or from a previous filter run with BaitFilter.
-o <string>, --output-bait-file-name <string>
- Name of the output bait file. All modes, except the conversion mode,
produce files in the BaitFisher format.
-c <string>, --convert <string>
- Produces the final output file that can be uploaded to a bait producing
company. In this mode, BaitFilter reads the input bait file and, instead of
performing a filtering step, writes a custom bait file in the requested
upload format. To avoid confusion, filtering and conversion cannot be done
in the same run. To filter a bait file and convert the result, call this
program twice: first to do the filtering and then to convert the filtered
file. Currently allowed conversion parameters: "four-column-upload".
- New output formats can be added upon request. Please contact the author:
Christoph Mayer, Email: <c.mayer.zfmk@uni-bonn.de>
-m <string>, --mode <string>
- Apart from the input file option, the mode option is the most important
option. It specifies which filter mode BaitFilter uses (see the user manual
for more details):
- "ab": Retain only the best bait locus for each alignment file, using the
optimality criterion of minimizing the total number of required baits.
- "as": Retain only the best bait locus for each alignment file, using the
optimality criterion of maximizing the number of sequences the result is
based on.
- "fb": Retain only the best bait locus for each feature (e.g. CDS), using
the optimality criterion of minimizing the total number of required baits.
Only applicable if alignment cutting has been used in BaitFisher.
- "fs": Retain only the best bait locus for each feature (e.g. CDS), using
the optimality criterion of maximizing the number of sequences the result is
based on. Only applicable if alignment cutting has been used in BaitFisher.
- "blast-a": Remove all bait regions of all ALIGNMENTs for which
one or more baits have at least two good hits to a reference genome. (Not
recommended.)
- "blast-f": Remove all bait regions of all FEATUREs for which one
or more baits have at least two good hits to a reference genome. (Not
recommended.)
- "blast-l": Remove only the bait REGIONs that contain a bait that
has multiple good hits to a reference genome. (Recommended over blast-f
and blast-a.)
- "blast-c": Conduct a coverage filter run without a search for
multiple hits. Requires the
blast-min-hit-coverage-of-baits-in-tiling-stack option to be
specified.
- "thin-b": Thin out a bait file to every Nth bait region, choosing the
start position that minimizes the number of baits.
- "thin-s": Thin out a bait file to every Nth bait region, choosing the
start position that maximizes the number of sequences.
- "thin-b-old": Similar to thin-b, but treats all loci as if they come from
one alignment file. Identical to the behaviour of thin-b in version 1.0.5 or
earlier.
- "thin-s-old": Similar to thin-s, but treats all loci as if they come from
one alignment file. Identical to the behaviour of thin-s in version 1.0.5 or
earlier.
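The per-alignment modes ab and as can be pictured as grouping the bait loci by alignment file and keeping the minimum or maximum of each group. The following Python sketch illustrates that idea only; it is not BaitFilter's source code. The `Locus` record, its field names, the function name and the tie-breaking rule (the locus seen first wins) are hypothetical assumptions. The feature modes fb and fs would group by feature instead of by alignment file.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Locus:
    # Hypothetical record for one bait region (locus) read from a bait file.
    alignment: str      # alignment file the locus belongs to
    start: int          # start coordinate of the bait region
    num_baits: int      # number of baits required for this bait region
    num_sequences: int  # number of sequences the bait design is based on


def best_locus_per_alignment(loci: List[Locus], mode: str) -> Dict[str, Locus]:
    """Keep one locus per alignment file (illustrative sketch).

    mode "ab": minimize the number of required baits.
    mode "as": maximize the number of sequences.
    Ties keep the locus seen first (an assumption of this sketch).
    """
    if mode not in ("ab", "as"):
        raise ValueError("mode must be 'ab' or 'as'")
    best: Dict[str, Locus] = {}
    for locus in loci:
        current = best.get(locus.alignment)
        if current is None:
            best[locus.alignment] = locus
        elif mode == "ab" and locus.num_baits < current.num_baits:
            best[locus.alignment] = locus
        elif mode == "as" and locus.num_sequences > current.num_sequences:
            best[locus.alignment] = locus
    return best


loci = [
    Locus("gene1.fas", 0, 12, 8),
    Locus("gene1.fas", 30, 10, 7),
    Locus("gene2.fas", 5, 9, 11),
]
print({name: locus.start for name, locus in best_locus_per_alignment(loci, "ab").items()})
# {'gene1.fas': 30, 'gene2.fas': 5}
```

With "as" instead of "ab", the locus at start 0 of gene1.fas would be kept, since it is based on more sequences.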
--blast-second-hit-evalue <floating point number>
- Maximum E-value for the second-best hit of the bait against the genome. A
bait is characterized as binding ambiguously if it has at least two good hits
to different loci of the genome. This option is the E-value threshold for the
second-best hit. Default: 0.000001
--blast-first-hit-evalue <floating point number>
- Maximum E-value for the first (best) hit of the bait against the genome. A
bait is characterized as binding ambiguously if it has at least two good hits
to different loci of the genome. This option is the E-value threshold for the
first (best) hit. Default: 0.000001
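The two E-value thresholds combine into a simple rule: a bait is ambiguous if its best hit passes the first threshold and its second-best hit passes the second. A minimal Python sketch, assuming the E-values of a bait's hits have already been extracted from the blast output and that each hit lies at a different locus of the genome (how BaitFilter decides whether two hits are at different loci is not described here). The function name is hypothetical.

```python
from typing import Iterable


def binds_ambiguously(evalues: Iterable[float],
                      first_hit_evalue: float = 1e-6,
                      second_hit_evalue: float = 1e-6) -> bool:
    """Return True if the bait has at least two good hits.

    Assumption: each E-value belongs to a hit at a different genomic locus.
    Defaults mirror --blast-first-hit-evalue and --blast-second-hit-evalue.
    """
    ranked = sorted(evalues)  # best (smallest) E-value first
    if len(ranked) < 2:
        return False  # a single hit cannot make a bait ambiguous
    return ranked[0] <= first_hit_evalue and ranked[1] <= second_hit_evalue


print(binds_ambiguously([1e-30, 1e-10]))  # True
print(binds_ambiguously([1e-30, 1e-3]))   # False
```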
--blast-min-hit-coverage-of-baits-in-tiling-stack <floating point number>
- Can be specified together with the following modes (-m option): blast-a,
blast-f, blast-l, blast-c. In all these modes, a blast analysis of all baits
against a reference genome is conducted. This option specifies a minimum
query hit coverage that at least one bait in each tiling stack (i.e. each
column of the tiling design) must reach; otherwise the bait region is
discarded. If not specified, no hit coverage is checked. The coverage of a
bait is the length of its best hit against the specified genome divided by
the length of the bait. The highest coverage is then determined for each bait
stack of the tiling design. If this option is used together with another
filter, the order in which the two are applied matters for the final result:
for the modes blast-a, blast-f and blast-l, the hit coverage is checked after
filtering for baits with multiple good hits to the reference genome.
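The coverage rule described above can be written out in a few lines. This Python sketch assumes that the length of each bait's best blast hit is already known (0 if the bait has no hit) and that the baits of a bait region are already grouped by tiling stack; the data layout and function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout: one bait = (bait_length, length_of_its_best_hit),
# with a best-hit length of 0 if the bait has no hit at all.
Bait = Tuple[int, int]


def bait_coverage(bait: Bait) -> float:
    """Coverage of one bait: best hit length divided by bait length."""
    bait_length, best_hit_length = bait
    return best_hit_length / bait_length


def region_passes_coverage(tiling_stacks: List[List[Bait]], min_coverage: float) -> bool:
    """True if, in every (non-empty) tiling stack, the highest bait coverage
    reaches min_coverage; otherwise the bait region would be discarded."""
    return all(
        max(bait_coverage(bait) for bait in stack) >= min_coverage
        for stack in tiling_stacks
    )


# Two tiling stacks: their highest coverages are 1.0 and 0.75.
region = [[(120, 120), (120, 60)], [(120, 90), (120, 30)]]
print(region_passes_coverage(region, 0.7))  # True
print(region_passes_coverage(region, 0.8))  # False
```

Taking the maximum per stack means a single well-covered bait is enough to keep its stack, which matches the "at least one bait" wording above.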
--ref-blast-db <string>
- Base name of a blast database. This name is passed to the blast command
and is the name of the fasta file of your reference genome. IMPORTANT: The
makeblastdb program has to be called before starting the Bait-Filter
program; makeblastdb takes the fasta file and creates the database files
from it. Cannot be specified together with the --blast-result-file option.
--blast-extra-commandline <string>
- Passes extra command line parameters to the blast program when it is
invoked. For example, this option can be used to specify the number of
threads the blast program should use:
--blast-extra-commandline "-num_threads 20" sets the number of threads to 20.
--blast-evalue-cutoff <floating point number>
- When calling the blast program, a maximum E-value can be specified, with
the effect that hits with a higher E-value are not reported. BaitFilter
always passes such an E-value to the blast program. By default, it is twice
the value of --blast-second-hit-evalue. If a coverage filter is requested
and twice the value of --blast-second-hit-evalue is smaller than 0.001, the
default is set to 0.001. This should guarantee that all hits necessary for
the blast and/or coverage filter are found. A different E-value threshold
can be specified with this option. As of version 1.0.6, the value is
automatically changed to be greater than or equal to 0.001 if the coverage
filter is used. This makes this option unnecessary in most cases.
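The rule for the default cutoff can be summarized as follows. This Python sketch restates the behaviour documented above for version 1.0.6; it is not taken from BaitFilter's source code, and the function name is hypothetical.

```python
def default_blast_evalue_cutoff(second_hit_evalue: float, coverage_filter: bool) -> float:
    """E-value cutoff passed to blast when --blast-evalue-cutoff is not given."""
    cutoff = 2 * second_hit_evalue
    if coverage_filter and cutoff < 0.001:
        # The coverage filter also needs weaker hits, so raise the cutoff to 0.001.
        cutoff = 0.001
    return cutoff


print(default_blast_evalue_cutoff(1e-6, False))  # 2e-06
print(default_blast_evalue_cutoff(1e-6, True))   # 0.001
```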
-B <string>, --blast-executable <string>
- Name of, or path to, the blast executable. Default: blastn. Minimum blast
version: BLAST+ 2.2.x. Cannot be specified together with the
--blast-result-file option.
-t <positive integer>, --thinning-step-width <positive integer>
- Thin out the bait file by retaining only every Nth bait region. The
integer after the option specifies the step width N. This option is required
if one of the modes thin-b, thin-b-old, thin-s or thin-s-old is active;
otherwise it must not be set.
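The rule that -t is mandatory in the thinning modes and forbidden in all others amounts to a simple consistency check. A hypothetical Python sketch of such a check (the function and constant names are illustrative, not BaitFilter's own code):

```python
from typing import Optional

# Modes that require -t/--thinning-step-width, as listed above.
THINNING_MODES = {"thin-b", "thin-s", "thin-b-old", "thin-s-old"}


def check_thinning_step_width(mode: Optional[str], step_width: Optional[int]) -> None:
    """Raise ValueError if -t/--thinning-step-width is inconsistent with the mode."""
    if mode in THINNING_MODES:
        if step_width is None:
            raise ValueError(f"mode {mode} requires -t/--thinning-step-width")
        if step_width < 1:
            raise ValueError("the thinning step width must be a positive integer")
    elif step_width is not None:
        raise ValueError("-t/--thinning-step-width is only allowed in the thin-* modes")
```

For example, `check_thinning_step_width("thin-b", 3)` passes silently, while `check_thinning_step_width("ab", 3)` raises a `ValueError`.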
--ID-prefix <string>
- In the conversion mode to the four-column-upload file format, each
converted file should get a unique ProbeID prefix, since ProbeIDs must not be
identical, even across multiple files. With this option the user can specify
a prefix string for all probe IDs in the four-column-upload file created by
BaitFilter.
-S, --stats
- Compute bait file characteristics (statistics) for the input file and
report them. These statistics are computed automatically in all modes
specified with the -m option and in the conversion mode specified with the
-c option. The purpose of the -S option is to compute statistics without
filtering or converting the input file. In particular, the -S mode does not
require specifying an output file.
- This option has no effect if combined with the -m or -c
modes.
--verbosity <unsigned integer>
- Controls the amount of information Bait-Filter writes to the console while
running. 0: print only the welcome message and essential error messages that
lead to exiting the program; 1: also report warnings; 2: also report
progress; 3: report more detailed progress; >10: debug output; maximum 10000:
write all possible diagnostic output. A value of at least 2 is required if
the startup parameters should be reported.
-b <string>, --blast-result-file <string>
- Conducting a blast analysis of all baits against a reference genome can
take a long time. If different filtering parameters, e.g. different coverage
thresholds, are to be compared, the same blast search would have to be
repeated multiple times. With this option, the blast search is skipped and
the specified blast result file is used instead. Use this option with
caution! No checks are done (so far) to ensure that the blast result file
corresponds to the specified bait file. BaitFilter does not delete the blast
result file after a run that conducted a blast search: the result file
blast_result.txt remains in the working directory. It can be moved or
renamed and then specified with this option as the input file for further
BaitFilter runs. If you have the slightest doubt whether you are using the
correct blast result file, do not use this option. This option is only
allowed in modes that would normally conduct a blast search. It cannot be
specified together with the --blast-executable, --blast-evalue-cutoff,
--blast-extra-commandline or --ref-blast-db options, since these options are
specific to runs in which a blast search is conducted.
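The restrictions on -b can likewise be expressed as a small validation step. In this hypothetical Python sketch, the given options are assumed to be normalized to their long names (so -B appears as --blast-executable), and the blast modes are taken to be blast-a, blast-f, blast-l and blast-c, the modes described above as conducting a blast analysis. The names are illustrative only.

```python
from typing import Optional, Set

# Modes that normally conduct a blast search.
BLAST_MODES = {"blast-a", "blast-f", "blast-l", "blast-c"}
# Options that only make sense when BaitFilter runs blast itself.
BLAST_RUN_OPTIONS = {"--blast-executable", "--blast-evalue-cutoff",
                     "--blast-extra-commandline", "--ref-blast-db"}


def check_blast_result_file(mode: Optional[str], given_options: Set[str]) -> None:
    """Raise ValueError if --blast-result-file is combined with an invalid mode
    or with options that only apply to runs that call blast themselves."""
    if "--blast-result-file" not in given_options:
        return
    if mode not in BLAST_MODES:
        raise ValueError("--blast-result-file is only allowed in the blast-* modes")
    conflicts = sorted(given_options & BLAST_RUN_OPTIONS)
    if conflicts:
        raise ValueError("--blast-result-file cannot be combined with " + ", ".join(conflicts))
```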
--, --ignore_rest
- Ignores the rest of the labeled arguments following this flag.
--version
- Displays version information and exits.
-h, --help
- Displays usage information and exits.
- The Bait-Filter program has been designed to post-process the output of the
BaitFisher program in order to select appropriate bait regions and to create
the final bait set. BaitFilter offers several filtering and conversion modes.
If multiple filtering steps and a final conversion are required, BaitFilter
has to be run multiple times, with the output of each run used as the input
of the next step.
- The BaitFisher program designs baits for every locus at which a full bait
region can be designed. A bait region can start at every nucleotide as long
as the remaining sequence is long enough. This output has to be reduced: the
purpose of BaitFilter is to find, for each feature, gene or alignment, the
optimal locus or loci for the bait regions. Before determining the locus
with the fewest baits or the largest sequence coverage, one might want to
determine which baits are expected to bind specifically in a given reference
genome. This is achieved by conducting a blast search of the baits against a
genome. Baits that are highly similar to at least two loci of the genome can
be identified and their bait regions removed. The blast search result can
also be used to require a minimum hit coverage of the baits in a bait region
against the reference genome. After removing bait regions at inferior loci,
the optimal starting locus (start coordinate) of the bait region can be
inferred with the aid of different criteria in a subsequent run of
BaitFilter. As input, BaitFilter requires a bait file generated by the
BaitFisher program or by a previous filtering run of BaitFilter. This bait
file is specified with the -i command line parameter. Furthermore, the user
has to specify an output file name with the -o parameter and a filter mode
with the -m parameter.
- To convert a file to the final, uploadable output format, see the -c
option.
- To compute statistics for an input bait file, see the -S option.
- The different filter modes provided by BaitFilter are the following:
- 1a) Retain only the best bait locus per alignment file (mode ab).
Criterion: Minimize the number of required baits.
- 1b) Retain only the best bait locus per alignment file (mode as).
Criterion: Maximize the number of sequences.
- 2a) Retain only the best bait locus per feature (mode fb; requires that
features were selected in BaitFisher). Criterion: Minimize the number of
required baits.
- 2b) Retain only the best bait locus per feature (mode fs; requires that
features were selected in BaitFisher). Criterion: Maximize the number of
sequences.
- 3) Use a blast search of the bait sequences against a reference genome to
detect putative non-unique target loci. Non-unique target sites have multiple
good hits against the reference genome. Furthermore, a minimum coverage of
the best blast hit of each bait sequence against the genome can be specified.
Note that all blast modes require additional command line parameters! These
modes remove bait regions for which multiple good blast hits were found or
whose baits have insufficiently long hits. Different versions of this mode
are available:
- 3a) If a single bait is not unique, remove all bait regions from the
current gene (mode blast-a).
- 3b) If a single bait is not unique, remove all bait regions from the
current feature, if applicable (mode blast-f).
- 3c) If a single bait is not unique, remove only the bait region that
contains this bait (mode blast-l).
- 4) Thin out the given bait file: Retain only every Nth bait region, where
N has to be specified by the user. Two submodes are available:
- 4a) Thin out bait regions by retaining only every Nth bait region in a
bait file (mode thin-b). The starting offset is chosen such that the number
of required baits is minimized.
- 4b) Thin out bait regions by retaining only every Nth bait region in a
bait file (mode thin-s). The starting offset is chosen such that the number
of sequences the result is based on is maximized.
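Both thinning submodes can be read as trying every possible starting offset from 0 to N-1 and keeping the best one. A minimal Python sketch, assuming each bait region is represented by a hypothetical (number_of_baits, number_of_sequences) pair in file order; scoring by the sum over the retained regions and resolving ties in favour of the smallest offset are assumptions of this sketch, not documented behaviour. In thin-b and thin-s the function would be applied to the regions of each alignment file separately, whereas thin-b-old and thin-s-old treat the whole file as one list.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: one bait region = (number_of_baits, number_of_sequences).
Region = Tuple[int, int]


def thin_regions(regions: List[Region], step: int, criterion: str) -> List[Region]:
    """Retain every step-th region, choosing the start offset by the criterion.

    criterion "b": minimize the total number of baits (thin-b).
    criterion "s": maximize the total number of sequences (thin-s).
    Totals are sums over the retained regions (an assumption of this sketch).
    """
    if step < 1:
        raise ValueError("the step width must be a positive integer")
    if criterion not in ("b", "s"):
        raise ValueError("criterion must be 'b' or 's'")
    best: List[Region] = []
    best_score: Optional[int] = None
    for offset in range(min(step, len(regions))):
        kept = regions[offset::step]
        if criterion == "b":
            score = sum(baits for baits, _ in kept)   # lower is better
        else:
            score = -sum(seqs for _, seqs in kept)    # negated so lower is better
        if best_score is None or score < best_score:
            best, best_score = kept, score
    return best


regions = [(2, 5), (9, 9), (3, 2), (8, 7)]
print(thin_regions(regions, 2, "b"))  # [(2, 5), (3, 2)]
print(thin_regions(regions, 2, "s"))  # [(9, 9), (8, 7)]
```

Note that offsets near N-1 may retain one region fewer than offset 0, which by itself lowers the bait total; the real program may weigh this differently.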
./BaitFilter-v1.0.6 version: 1.0.6
The full documentation for BaitFilter-v1.0.6 is maintained
as a Texinfo manual. If the info and BaitFilter-v1.0.6
programs are properly installed at your site, the command
info BaitFilter-v1.0.6
should give you access to the complete manual.