metaphlan2_strainer - METAgenomic PHyLogenetic ANalysis for
metagenomic taxonomic profiling (strainer)
metaphlan2_strainer.py [-h] --ifn_samples
IFN_SAMPLES [IFN_SAMPLES ...] --mpa_pkl MPA_PKL --output_dir
OUTPUT_DIR [--ifn_markers IFN_MARKERS] [--nprocs_main NPROCS_MAIN]
[--nprocs_load_samples NPROCS_LOAD_SAMPLES] [--nprocs_align_clean
NPROCS_ALIGN_CLEAN] [--nprocs_raxml NPROCS_RAXML] [--bootstrap_raxml
BOOTSTRAP_RAXML] [--ifn_ref_genomes IFN_REF_GENOMES [IFN_REF_GENOMES ...]]
[--N_in_marker N_IN_MARKER] [--marker_strip_length MARKER_STRIP_LENGTH]
[--marker_in_clade MARKER_IN_CLADE] [--sample_in_clade SAMPLE_IN_CLADE]
[--sample_in_marker SAMPLE_IN_MARKER] [--gap_in_trailing_col
GAP_IN_TRAILING_COL] [--gap_trailing_col_limit GAP_TRAILING_COL_LIMIT]
[--gap_in_internal_col GAP_IN_INTERNAL_COL] [--gap_in_sample GAP_IN_SAMPLE]
[--N_col N_COL] [--N_count N_COUNT] [--long_gap_length LONG_GAP_LENGTH]
[--long_gap_percentage LONG_GAP_PERCENTAGE] [--p_value P_VALUE] [--clades
CLADES [CLADES ...]] [--marker_list_fn MARKER_LIST_FN] [--print_clades_only]
[--alignment_program {muscle,mafft}] [--relaxed_parameters]
[--relaxed_parameters2] [--keep_alignment_files]
[--keep_full_alignment_files] [--save_sample2fullfreq] [--use_threads]
Metaphlan2_strainer is a computational tool for tracking
individual strains across a large set of samples. The input of
metaphlan2_strainer is a set of metagenomic samples and the output is a set
of phylogenetic trees. For each sample, metaphlan2_strainer extracts the
strain of a specific species by merging and concatenating all reads mapped
against that species' markers in the MetaPhlAn2 database.
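As an illustration of how the required options fit together, the following
Python sketch assembles a command line for one run without executing it. The
helper name build_strainer_command and all file and clade names are
hypothetical placeholders, not values shipped with the program; only the
option names are taken from this manpage.

```python
import shlex


def build_strainer_command(samples, mpa_pkl, output_dir,
                           clades=None, nprocs_main=1):
    """Return the argument list for one metaphlan2_strainer.py run.

    Nothing is executed here; the caller decides how to run the command.
    """
    cmd = [
        "metaphlan2_strainer.py",
        "--ifn_samples", *samples,       # one or more sample files
        "--mpa_pkl", mpa_pkl,            # MetaPhlAn2 database pickle
        "--output_dir", output_dir,
        "--nprocs_main", str(nprocs_main),
    ]
    if clades:
        # Restrict the run to the given clades instead of auto-detection.
        cmd += ["--clades", *clades]
    return cmd


# All names below are hypothetical placeholders.
cmd = build_strainer_command(
    samples=["sample1.markers", "sample2.markers"],
    mpa_pkl="mpa_database.pkl",
    output_dir="strainer_output",
    clades=["s__Bacteroides_caccae"],
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the
placeholder paths point at real files.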
- -h, --help
- show this help message and exit
- --ifn_samples
IFN_SAMPLES [IFN_SAMPLES ...]
- The list of sample files (space separated). Wildcards can also be
used.
- --mpa_pkl
MPA_PKL
- The database of metaphlan2.py.
- --output_dir
OUTPUT_DIR
- The output directory.
- --ifn_markers
IFN_MARKERS
- The marker file in fasta format.
- --nprocs_main
NPROCS_MAIN
- The number of processors used for the main threads. Default 1.
- --nprocs_load_samples
NPROCS_LOAD_SAMPLES
- The number of processors used for loading samples. Default
nprocs_main.
- --nprocs_align_clean
NPROCS_ALIGN_CLEAN
- The number of processors used for aligning and cleaning markers.
Default nprocs_main.
- --nprocs_raxml
NPROCS_RAXML
- The number of processors used for running raxml. Default
nprocs_main.
- --bootstrap_raxml
BOOTSTRAP_RAXML
- The number of bootstrap runs when building the tree. Default
0.
- --ifn_ref_genomes
IFN_REF_GENOMES [IFN_REF_GENOMES ...]
- The reference genome file names. They are separated by spaces.
- --N_in_marker
N_IN_MARKER
- Consensus markers whose rate of N nucleotides is greater than this
threshold are removed. Default 0.2.
- --marker_strip_length
MARKER_STRIP_LENGTH
- The number of nucleotides to delete from each of the two ends of a
marker. Default 50.
- --marker_in_clade
MARKER_IN_CLADE
- In each sample, clades whose rate of present markers is less than this
threshold are removed. Default 0.8.
- --sample_in_clade
SAMPLE_IN_CLADE
- Only clades present in at least sample_in_clade samples are kept. Default
2.
- --sample_in_marker
SAMPLE_IN_MARKER
- If the percentage of samples in which a marker is present is less than this
threshold, that marker is removed. Default 0.8.
- --gap_in_trailing_col
GAP_IN_TRAILING_COL
- If the number of trailing nucleotide columns in aligned markers whose
percentage of gaps is greater than gap_in_trailing_col is less than
gap_trailing_col_limit, these columns will be removed. Default 0.2.
- --gap_trailing_col_limit
GAP_TRAILING_COL_LIMIT
- If the number of trailing nucleotide columns in aligned markers whose
percentage of gaps is greater than gap_in_trailing_col is less than
gap_trailing_col_limit, these columns will be removed. Default 101.
- --gap_in_internal_col
GAP_IN_INTERNAL_COL
- The internal nucleotide columns in aligned markers with the percentage of
gaps greater than gap_in_internal_col will be removed. Default 0.3.
- --gap_in_sample
GAP_IN_SAMPLE
- Samples whose full sequences from all markers have a percentage of gaps
greater than this threshold will be removed. Default 0.2.
- --N_col
N_COL
- In aligned markers, if the percentage of nucleotide columns containing
more than N_count Ns is less than this threshold, these columns will be
removed. Default 0.8.
- --N_count
N_COUNT
- In aligned markers, if the percentage of nucleotide columns containing
more than N_count Ns is less than the N_col threshold, these columns will be
removed. Default 0.
- --long_gap_length
LONG_GAP_LENGTH
- In each concatenated sequence of a sample, a run of sequential gap
positions is a gap group. A gap group with length greater than this
threshold is considered a long gap group. If the ratio between the number
of unique positions in all long gap groups and the concatenated sequence
length is less than long_gap_percentage, these positions will be removed
from all concatenated sequences. Default 2.
- --long_gap_percentage
LONG_GAP_PERCENTAGE
- This threshold is combined with long_gap_length to remove long gaps.
Default 0.8.
- --p_value
P_VALUE
- The p_value to reject a non-polymorphic site. Default 0.05.
- --clades CLADES
[CLADES ...]
- The clades (space separated) for which the script will compute the marker
alignments in fasta format and the phylogenetic trees. If a file name is
specified, the clade list is read from that file, with one clade name per
line. Default "automatically identify all clades".
- --marker_list_fn
MARKER_LIST_FN
- The file name containing the list of considered markers. The other markers
will be discarded. Default "None".
- --print_clades_only
- Only print the potential clades and stop without building any tree. This
option is useful when you want to check quickly all possible clades and
rerun only for some specific ones. Default "False".
- --alignment_program
{muscle,mafft}
- The alignment program. Default "muscle".
- --relaxed_parameters
- Set marker_in_clade=0.5, sample_in_marker=0.5, N_in_marker=0.5,
gap_in_sample=0.5. Default "False".
- --relaxed_parameters2
- Set marker_in_clade=0.2, sample_in_marker=0.2, N_in_marker=0.8,
gap_in_sample=0.8. Default "False".
- --keep_alignment_files
- Keep the alignment files of all markers before the cleaning step.
- --keep_full_alignment_files
- Keep the alignment files of all markers before truncating the starting and
ending parts, and before the cleaning step. This is equivalent to
--keep_alignment_files --marker_strip_length 0
- --save_sample2fullfreq
- Save sample2fullfreq to a msgpack file sample2fullfreq.msgpack.
- --use_threads
- Use multithreading. Default "Use multiprocessing".
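The long gap rule described under --long_gap_length and --long_gap_percentage
can be sketched in Python as follows. This is a minimal illustration of the
rule as worded in this manpage, not the program's actual code; the function
names are hypothetical, and it assumes that all concatenated sequences have
the same length and that gaps are written as "-".

```python
import re


def long_gap_positions(seqs, long_gap_length=2):
    """Collect the unique positions covered by long gap groups.

    A gap group is a run of consecutive '-' characters; it counts as
    long when its length is greater than long_gap_length.
    """
    positions = set()
    for seq in seqs:
        for match in re.finditer(r"-+", seq):
            if match.end() - match.start() > long_gap_length:
                positions.update(range(match.start(), match.end()))
    return positions


def remove_long_gaps(seqs, long_gap_length=2, long_gap_percentage=0.8):
    """Drop long-gap positions from every sequence when the rule applies.

    The positions are removed only if their share of the sequence length
    is less than long_gap_percentage; otherwise the input is returned
    unchanged.
    """
    if not seqs:
        return []
    positions = long_gap_positions(seqs, long_gap_length)
    ratio = len(positions) / len(seqs[0])
    if not positions or ratio >= long_gap_percentage:
        return list(seqs)
    return [
        "".join(base for i, base in enumerate(seq) if i not in positions)
        for seq in seqs
    ]


# Toy alignment: only the first sequence has a gap group longer than 2.
aligned = ["ACGT---ACG", "ACGTAC-ACG", "AC--ACGACG"]
cleaned = remove_long_gaps(aligned)
print(cleaned)
```

In the toy alignment, positions 4 to 6 form the only long gap group, so those
three columns are removed from every sequence, including the ones that carry
nucleotides there.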
This manpage was written by Andreas Tille for the Debian
distribution and can be used for any other usage of the program.