gmap_build - Tool for genome database creation for GMAP or
GSNAP
gmap_build [options...] -d <genome>
[-c <transcriptome> -T <transcript_fasta>]
<genome_fasta_files>
gmap_build: Builds a gmap database for a genome to be used by GMAP
or GSNAP. Part of GMAP package, version 2021-12-17.
You are free to name <genome> and <transcriptome> as
you wish. You will use the same names when performing alignments
subsequently using GMAP or GSNAP.
Note: If adding a transcriptome to an existing genome, then there
is no need to specify the genome_fasta_files. This way you can add
transcriptome information to an existing genome database.
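The two-step workflow in the note above can be sketched as follows. The names mygenome and mytranscripts and all FASTA file names are placeholders chosen for illustration, not defaults of gmap_build. The commands are only printed, so the snippet runs without GMAP installed.

```shell
#!/bin/sh
# Placeholder names; pick your own and reuse them later with GMAP or GSNAP.
genome=mygenome
transcriptome=mytranscripts

# Step 1: build the genome database from FASTA files (hypothetical file names).
build_cmd="gmap_build -d $genome chr1.fa chr2.fa"

# Step 2: add a transcriptome to the existing genome database.
# No genome FASTA files are given in this step.
add_cmd="gmap_build -d $genome -c $transcriptome -T transcripts.fa"

# Dry run: print the commands; paste them into a shell to execute them.
echo "$build_cmd"
echo "$add_cmd"
```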
- -D,
--dir=STRING
- Destination directory for installation (defaults to gmapdb directory
specified at configure time)
- -d,
--genomedb=STRING
- Genome name (required)
- -n,
--names=STRING
- Substitute names for contigs, provided in a file.
- The file can have two formats:
- 1.
- A file with one column per line, with each line corresponding to a FASTA
file, in the order given to gmap_build. The chromosome name for each FASTA
file will be replaced with the desired chromosome name in the file. Every
chromosome in the FASTA must have a corresponding line in the file. This
is useful if you want to rename chromosomes with a systematic numbering
pattern.
- 2.
- A file with two columns per line, separated by white space. In each line,
the original FASTA chromosome name should be in column 1 and the desired
chromosome name will be in column 2.
- The meaning of file format 2 depends on whether --limit-to-names is
specified. If so, the genome build will be limited to those chromosomes in
this file. Otherwise, all chromosomes in the FASTA file will be included,
but only those chromosomes in this file will be re-named, which provides
an easy way to change just a few chromosome names.
- This file can be combined with the --sort=names option, in which case
the chromosomes are ordered as given in the file. Every chromosome must
then be listed in the file, and for chromosome names that should not be
changed, column 2 can be blank (or the same as column 1). A blank column 2
is allowed only with --sort=names, because otherwise the program cannot
distinguish between a 1-column and a 2-column names file.
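As a rough illustration of the names file formats described above, the following sketch writes one file of each kind into a scratch directory. The contig names (contigA, contigB) and the output file names are hypothetical.

```shell
#!/bin/sh
# Work in a scratch directory so that nothing is overwritten.
workdir=$(mktemp -d)

# Format 1: one desired name per line, in the order of the FASTA files.
printf '%s\n' chr1 chr2 chrM > "$workdir/names_1col.txt"

# Format 2: original name in column 1, desired name in column 2.
printf '%s\t%s\n' \
    contigA chr1 \
    contigB chr2 > "$workdir/names_2col.txt"

# For use with --sort=names: every chromosome is listed in the desired
# order, and column 2 is left blank for a name that stays unchanged (chrM).
printf '%s\t%s\n' contigA chr1 contigB chr2 > "$workdir/names_sorted.txt"
printf '%s\n' chrM >> "$workdir/names_sorted.txt"

cat "$workdir/names_2col.txt"
```

Such a file would then be passed with -n, for example gmap_build -d mygenome -n names_2col.txt genome.fa, where mygenome and genome.fa are again placeholders.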
- -L,
--limit-to-names
- Limits the genome build to the chromosomes listed in the --names file.
Use this option together with a --names file that either renames the
desired chromosomes or lists the same name in both columns for each of
them.
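One way to prepare a names file that keeps a subset of chromosomes without renaming them is to copy each name into both columns. This is a minimal sketch; the chromosome list, mygenome, and genome.fa are assumptions made for the example.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Chromosomes to keep in the build (placeholder list).
printf '%s\n' chr1 chr2 chrX > "$workdir/keep.txt"

# Write each name into both columns: the chromosome is kept, not renamed.
awk '{ print $1 "\t" $1 }' "$workdir/keep.txt" > "$workdir/names_keep.txt"

# Dry run: print the command that would restrict the build to these names.
limit_cmd="gmap_build -d mygenome -n $workdir/names_keep.txt -L genome.fa"
echo "$limit_cmd"
```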
- -k,
--kmer=INT
- k-mer value for genomic index (allowed: 15 or less, default is 15)
- -q INT
- sampling interval for genome (allowed: 1-3, default 3)
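A wrapper script might check -k and -q against the documented ranges before starting a long build. The helper check_index_params below is hypothetical and not part of GMAP; the lower bound of 1 for the k-mer value is an assumption, since only the upper bound of 15 is documented here.

```shell
#!/bin/sh
# check_index_params K Q
# Succeeds only if K is at most 15 (and at least 1, an assumed lower bound)
# and Q is between 1 and 3.
check_index_params() {
    k=$1
    q=$2
    [ "$k" -ge 1 ] && [ "$k" -le 15 ] && [ "$q" -ge 1 ] && [ "$q" -le 3 ]
}

# Dry run: print the build command only when both values are in range.
if check_index_params 12 2; then
    echo "gmap_build -d mygenome -k 12 -q 2 genome.fa"
else
    echo "k-mer value or sampling interval out of range" >&2
fi
```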
- -s,
--sort=STRING
- Sort chromosomes using given method:
none - use chromosomes as found in FASTA file(s) (default)
alpha - sort chromosomes alphabetically (chr10 before chr2)
numeric-alpha - chr1, chr1U, chr2, chrM, chrU, chrX, chrY
chrom - chr1, chr2, chrM, chrX, chrY, chr1U, chrU
names - sort chromosomes based on file provided to --names flag
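To see what a plain alphabetical ordering does to numbered chromosome names, the standard sort utility can serve as a stand-in. This only illustrates lexicographic ordering in general; that gmap_build's alpha method orders names the same way is an assumption.

```shell
#!/bin/sh
# Byte-wise alphabetical order: "chr10" sorts before "chr2".
alpha_order=$(printf '%s\n' chr2 chr10 chr1 chrX | LC_ALL=C sort | paste -s -d ' ' -)
echo "$alpha_order"
# prints: chr1 chr10 chr2 chrX
```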
- -g, --gunzip
- Files are gzipped, so need to gunzip each file first
- -E,
--fasta-pipe=STRING
- Interpret argument as a command, instead of a list of FASTA files
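The two ways of feeding compressed input can be sketched as follows. The snippet builds a tiny gzipped FASTA file, confirms that the pipe command emits FASTA on standard output, and prints (without running) the corresponding gmap_build commands. Quoting the command as a single argument to -E is an assumption based on the option description, and mygenome is a placeholder.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Create a tiny gzipped FASTA file for the demonstration.
printf '>chr1\nACGT\n' > "$workdir/tiny.fa"
gzip "$workdir/tiny.fa"

# A command that writes the FASTA contents to standard output.
pipe_cmd="gunzip -c $workdir/tiny.fa.gz"
first_line=$($pipe_cmd | head -n 1)
echo "$first_line"

# Dry run: the same input passed as a pipe command (-E) or as a gzipped file (-g).
echo "gmap_build -d mygenome -E '$pipe_cmd'"
echo "gmap_build -d mygenome -g $workdir/tiny.fa.gz"
```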
- -Q, --fastq
- Files are in FASTQ format
- -R, --revcomp
- Reverse complement all contigs
- -w INT
- Wait (sleep) this many seconds after each step (default 2)
- -o,
--circular=STRING
- Circular chromosomes (either a list of chromosomes separated by a comma,
or a filename containing circular chromosomes, one per line). If you use
the --names feature, then you should use the substitute name of the
chromosome, not the original name, for this option. (NOTE: This behavior
is different from previous versions, and starts with version
2020-10-20.)
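When --names and --circular are used together, the circular list has to carry the substitute names. The sketch below derives the list from column 2 of a names file; the original name MT, the substitute chrM, and the other names are hypothetical.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Names file: original name in column 1, substitute name in column 2.
printf '%s\t%s\n' MT chrM > "$workdir/names.txt"

# The circular list uses the substitute name (column 2), not the original.
awk '$1 == "MT" { print $2 }' "$workdir/names.txt" > "$workdir/circular.txt"

# Dry run: print the command that combines both files.
circ_cmd="gmap_build -d mygenome -n $workdir/names.txt -o $workdir/circular.txt genome.fa"
echo "$circ_cmd"
```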
- -2, --altscaffold=STRING
- File with alt scaffold info, listing alternate scaffolds, one per line,
tab-delimited, with the following fields: (1) alt_scaf_acc, (2)
parent_name, (3) orientation, (4) alt_scaf_start, (5) alt_scaf_stop, (6)
parent_start, (7) parent_end.
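A file in the seven-field layout listed above could be written and checked like this. All values are invented for illustration, and the use of "+" to encode orientation is an assumption, since the encoding is not specified here.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# One alternate scaffold per line, seven tab-delimited fields:
# alt_scaf_acc, parent_name, orientation, alt_scaf_start, alt_scaf_stop,
# parent_start, parent_end (all values below are made up).
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    alt_scaffold_1 chr1 + 1 5000 100001 105000 > "$workdir/altscaffold.txt"

# Sanity check: every line must have exactly seven tab-delimited fields.
if awk -F '\t' 'NF != 7 { bad = 1 } END { exit bad + 0 }' "$workdir/altscaffold.txt"; then
    echo "gmap_build -d mygenome -2 $workdir/altscaffold.txt genome.fa"
else
    echo "altscaffold file is malformed" >&2
fi
```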
- -e,
--nmessages=INT
- Maximum number of messages (warnings, contig reports) to report (default
50)
- -M,
--mdflag=STRING
- Use MD file from NCBI for mapping contigs to chromosomal coordinates
- -C,
--contigs-are-mapped
- Find a chromosomal region in each FASTA header line. Useful for contigs
that have been mapped to chromosomal coordinates. Ignored if the
--mdflag is provided.