lambda2_mkindexp - the Local Aligner for Massive Biological
DatA
lambda2 mkindexp [OPTIONS] -d DATABASE.fasta [-i
INDEX.lambda]
Lambda is a local aligner optimized for many query sequences and
searches in protein space. It is compatible with BLAST, but much faster than
BLAST and many other comparable tools.
Detailed information is available in the wiki:
<https://github.com/seqan/lambda/wiki>
This is the indexer binary for creating lambda-compatible
databases.
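A minimal invocation, following the synopsis above, might look like the sketch below. The file names proteins.fasta and proteins.lambda are hypothetical placeholders, and the script only prints the command (a dry run) rather than executing it.

```shell
# Hypothetical input and output names; replace with your own files.
db="proteins.fasta"
index="proteins.lambda"

# Assemble the command as positional parameters so quoting stays intact.
set -- lambda2 mkindexp -d "$db" -i "$index"

# Dry run: print the command. Replace this line with "$@" to execute it.
printf '%s\n' "$*"
```

Passing -i explicitly avoids relying on the default index name.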
- -h, --help
- Display the help message.
- -hh, --full-help
- Display the help message with advanced options.
- --version
- Display version information.
- --copyright
- Display long copyright information.
- -v, --verbosity
INTEGER
- Display more/less diagnostic output during operation: 0 [only errors]; 1
[default]; 2 [+run-time, options and statistics]. In range [0..2].
Default: 1.
- -d, --database
INPUT_FILE
- Database sequences. Valid filetypes are: .sam[.*], .raw[.*],
.gbk[.*], .frn[.*], .fq[.*], .fna[.*],
.ffn[.*], .fastq[.*], .fasta[.*], .faa[.*],
.fa[.*], .embl[.*], and .bam, where * is any of the
following extensions: gz, bz2, and bgzf for
transparent (de)compression.
- -m, --acc-tax-map
INPUT_FILE
- An NCBI or UniProt accession-to-taxid mapping file. Download from
ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/ or
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/
. Valid filetypes are: .dat[.*] and .accession2taxid[.*],
where * is any of the following extensions: gz, bz2, and
bgzf for transparent (de)compression.
- -x, --tax-dump-dir
INPUT_DIRECTORY
- A directory that contains nodes.dmp and names.dmp; unzipped from
ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
- -i, --index
OUTPUT_DIRECTORY
- The output directory for the index files (defaults to
"DATABASE.lambda"). Valid filetype is: .lambda.
- --db-index-type
STRING
- FM-index (full-text minute space) or bidirectional FM-index. One of fm and bifm.
Default: fm.
- --truncate-ids
BOOL
- Truncate IDs at first whitespace. This saves a lot of space and is
irrelevant for all LAMBDA output formats other than BLAST Pairwise (.m0).
One of 1, ON, TRUE, T, YES, 0,
OFF, FALSE, F, and NO. Default:
on.
- -a, --input-alphabet
STRING
- Alphabet of the database sequences (specify to override auto-detection);
if input is DNA, it will be translated. One of auto, dna5,
and aminoacid. Default: auto.
- -g, --genetic-code
INTEGER
- The translation table to use if input is DNA. See
https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c for ids
(default is generic). Default: 1.
- -r, --alphabet-reduction
STRING
- Alphabet reduction for the seeding phase. One of none and
murphy10. Default: murphy10.
- --algorithm
STRING
- Algorithm for SA construction (also used for FM; see Memory Requirements
below!). One of mergesort, quicksortbuckets,
quicksort, radixsort, and skew7ext. Default:
radixsort.
- -t, --threads
INTEGER
- Number of threads to run concurrently. Default: autodetected.
- --tmp-dir
OUTPUT_DIRECTORY
- Temporary directory used by the skew algorithm (skew7ext); defaults to the working directory.
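As an illustration of how several of the options above combine, the following sketch assembles a taxonomy-aware indexing command. All file and directory names are hypothetical, the thread count of 8 is an arbitrary choice, and the script prints the command instead of running it.

```shell
# Hypothetical inputs; the extensions are among those listed for -d and -m.
db="uniprot_sprot.fasta.gz"      # compressed FASTA database
map="idmapping.dat.gz"           # accession-to-taxid mapping file
taxdir="taxdump"                 # directory holding nodes.dmp and names.dmp
index="uniprot_sprot.lambda"     # output index

# -t 8 requests eight threads; -v 2 adds run-time, options and statistics.
set -- lambda2 mkindexp -d "$db" -m "$map" -x "$taxdir" -i "$index" -t 8 -v 2

# Dry run: print the command. Replace this line with "$@" to execute it.
printf '%s\n' "$*"
```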
Please see the wiki (<https://github.com/seqan/lambda/wiki>)
for more information on which indexes to choose and which algorithms to
pick.
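Once an index type and construction algorithm have been chosen, they can be passed as in the sketch below. The particular combination shown (bifm with skew7ext) is only an example of the syntax, not a recommendation, and the paths are hypothetical. The script prints the command rather than executing it.

```shell
# Hypothetical paths; replace with your own.
db="proteins.fasta"
index="proteins.lambda"
tmp="/tmp/lambda-tmp"   # scratch space, passed via --tmp-dir

# Select the index type and the construction algorithm explicitly.
set -- lambda2 mkindexp -d "$db" -i "$index" \
    --db-index-type bifm --algorithm skew7ext --tmp-dir "$tmp"

# Dry run: print the command. Replace this line with "$@" to execute it.
printf '%s\n' "$*"
```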
Note that the indexes created are binary and not compatible
between different CPU endiannesses. Also, the on-disk format is still subject
to change between Lambda versions.
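Because an index built on one byte order cannot be used on the other, it can help to note the byte order of the build host when sharing index files between machines. The helper below is a generic POSIX shell sketch for that purpose; host_endianness is a hypothetical name and is not part of Lambda.

```shell
# Report the byte order of the current host: little, big, or unknown.
host_endianness() {
    # od reads the bytes 0x01 0x00 as one 16-bit word in host byte order.
    word=$(printf '\001\000' | od -An -tx2 | tr -d ' \n')
    case "$word" in
        0001) echo little ;;   # low-order byte stored first
        0100) echo big ;;      # high-order byte stored first
        *)    echo unknown ;;
    esac
}

echo "This host is $(host_endianness)-endian"
```

Running the same helper on the machine that will perform the searches shows whether an index can be copied over or has to be rebuilt there.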
lambda2 mkindexp Copyright: 2013-2019 Hannes Hauswedell,
released under the GNU AGPL v3 (or later); 2016-2019 Knut Reinert and Freie
Universität Berlin, released under the 3-clause BSDL.
SeqAn Copyright: 2006-2015 Knut Reinert, FU-Berlin; released under the
3-clause BSDL.
In your academic works please cite: Hauswedell et al. (2014); doi:
10.1093/bioinformatics/btu439
For full copyright and/or warranty information see --copyright.