macsyfinder - detection of macromolecular systems in protein
datasets
macsyfinder [-h] [--sequence-db SEQUENCE_DB] [--db-type
{unordered_replicon,ordered_replicon,gembase,unordered}]
[--replicon-topology {linear,circular}] [--topology-file TOPOLOGY_FILE]
[--idx] [--inter-gene-max-space INTER_GENE_MAX_SPACE INTER_GENE_MAX_SPACE]
[--min-mandatory-genes-required MIN_MANDATORY_GENES_REQUIRED
MIN_MANDATORY_GENES_REQUIRED] [--min-genes-required MIN_GENES_REQUIRED
MIN_GENES_REQUIRED] [--max-nb-genes MAX_NB_GENES MAX_NB_GENES] [--multi-loci
MULTI_LOCI] [--hmmer HMMER_EXE] [--index-db INDEX_DB_EXE] [--e-value-search
E_VALUE_RES] [--i-evalue-select I_EVALUE_SEL] [--coverage-profile
COVERAGE_PROFILE] [-d DEF_DIR] [-o OUT_DIR] [-r RES_SEARCH_DIR]
[--res-search-suffix RES_SEARCH_SUFFIX] [--res-extract-suffix
RES_EXTRACT_SUFFIX] [-p PROFILE_DIR] [--profile-suffix PROFILE_SUFFIX] [-w
WORKER_NB] [-v] [--log LOG_FILE] [--config CFG_FILE] [--previous-run
PREVIOUS_RUN] systems [systems ...]
MacSyFinder is a program to model and detect macromolecular
systems, genetic pathways, etc. in protein datasets. In prokaryotes, these
systems often have evolutionarily conserved properties: they are made of
conserved components, and are encoded in compact loci (conserved genetic
architecture). The user models these systems with MacSyFinder to reflect
these conserved features, and to allow their efficient detection.
- systems
- The systems to detect. This is a mandatory positional argument (no option
keyword is associated with it). To detect all the protein secretion systems
and related appendages: set to "all" (case insensitive). Otherwise, one
or more systems can be specified. For example: "T2SS
T4P".
- --sequence-db
SEQUENCE_DB
- Path to the sequence dataset in fasta format.
- --db-type
{unordered_replicon,ordered_replicon,gembase,unordered}
- The type of dataset to deal with. "unordered_replicon"
corresponds to a non-assembled genome, "unordered" to a
metagenomic dataset, "ordered_replicon" to an assembled genome,
and "gembase" to a set of replicons where sequence identifiers
follow this convention: ">RepliconName SequenceID".
- --replicon-topology
{linear,circular}
- The topology of the replicons (this option is meaningful only if the
db_type is 'ordered_replicon' or 'gembase').
- --topology-file
TOPOLOGY_FILE
- Topology file path. The topology file allows one to specify a topology
(linear or circular) for each replicon (this option is meaningful only if
the db_type is 'ordered_replicon' or 'gembase'). A topology file is a
tabular file with two columns: the 1st is the replicon name, and the 2nd
the corresponding topology, for example: "RepliconA linear".
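The two-column layout described above can be read with a few lines of Python. This is only a minimal sketch, not MacSyFinder's own parser: it assumes the columns are separated by whitespace, and the function name `parse_topology` and the replicon `RepliconB` are hypothetical names used for illustration.

```python
def parse_topology(text):
    """Parse topology-file content into a {replicon_name: topology} dict.

    Assumption: columns are whitespace-separated (not confirmed by the source).
    """
    topologies = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(fields)}")
        name, topology = fields
        if topology not in ("linear", "circular"):
            raise ValueError(f"line {line_no}: unknown topology {topology!r}")
        topologies[name] = topology
    return topologies


# "RepliconB" is a made-up second entry for the example.
example = "RepliconA linear\nRepliconB circular\n"
print(parse_topology(example))
```

Rejecting any value other than "linear" or "circular" mirrors the choices accepted by --replicon-topology.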
- --idx
- Forces the indexes for the sequence dataset to be built even if they were
previously computed and are present at the dataset location (default =
False).
- --inter-gene-max-space
INTER_GENE_MAX_SPACE INTER_GENE_MAX_SPACE
- Co-localization criterion: maximum number of components not matched by a
profile allowed between two matched components for them to be considered
contiguous. This option is meaningful only for 'ordered' datasets. The first
value must be a system name, the second a number of components. This option
can be repeated several times: "--inter-gene-max-space T2SS 12
--inter-gene-max-space Flagellum 20"
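To make the co-localization criterion concrete, the sketch below groups matched gene positions into clusters. It illustrates the rule as worded above and is not MacSyFinder's actual implementation: the function name `cluster_hits` is hypothetical, positions are assumed to be gene ranks along a linear replicon, and circular topology is ignored.

```python
def cluster_hits(positions, max_space):
    """Group gene positions into clusters of contiguous hits.

    Two consecutive hits belong to the same cluster when at most
    `max_space` unmatched genes lie between them.
    """
    clusters = []
    for pos in sorted(positions):
        # Number of unmatched genes between this hit and the previous one.
        if clusters and pos - clusters[-1][-1] - 1 <= max_space:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


hits = [3, 5, 6, 20, 22]  # made-up ranks of genes matched by a profile
print(cluster_hits(hits, max_space=2))
```

With these made-up positions, raising `max_space` to 13 would merge everything into a single cluster, since 13 unmatched genes separate positions 6 and 20.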
- --min-mandatory-genes-required
MIN_MANDATORY_GENES_REQUIRED MIN_MANDATORY_GENES_REQUIRED
- The minimal number of mandatory genes required for system assessment. The
first value must correspond to a system name, the second value to an
integer. This option can be repeated several times:
"--min-mandatory-genes-required T2SS 15
--min-mandatory-genes-required Flagellum 10"
- --min-genes-required
MIN_GENES_REQUIRED MIN_GENES_REQUIRED
- The minimal number of genes required for system assessment (includes both
'mandatory' and 'accessory' components). The first value must correspond
to a system name, the second value to an integer. This option can be
repeated several times: "--min-genes-required T2SS 15
--min-genes-required Flagellum 10"
- --max-nb-genes
MAX_NB_GENES MAX_NB_GENES
- The maximal number of genes required for system assessment. The first
value must correspond to a system name, the second value to an integer.
This option can be repeated several times: "--max-nb-genes T2SS 5
--max-nb-genes Flagellum 10"
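The repeatable "system name plus integer" options above follow a pattern that Python's standard argparse module can express with `nargs=2` and `action="append"`. The sketch below shows that pattern only; it is an assumption about how such options might be parsed, not a copy of MacSyFinder's code, and the helper `per_system` is a hypothetical name.

```python
import argparse

parser = argparse.ArgumentParser(prog="sketch")
for flag in (
    "--inter-gene-max-space",
    "--min-mandatory-genes-required",
    "--min-genes-required",
    "--max-nb-genes",
):
    # Each occurrence takes two values; repeated occurrences accumulate.
    parser.add_argument(flag, nargs=2, action="append", default=[],
                        metavar=("SYSTEM", "VALUE"))


def per_system(pairs):
    """Turn [[system, value], ...] into {system: int(value)}."""
    return {system: int(value) for system, value in pairs}


args = parser.parse_args(
    ["--max-nb-genes", "T2SS", "5", "--max-nb-genes", "Flagellum", "10"]
)
limits = per_system(args.max_nb_genes)
print(limits)
```

Converting the second value with `int` makes a non-numeric value fail early with a `ValueError`, which matches the requirement that the second value be an integer.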
- --multi-loci
MULTI_LOCI
- Allow the storage of multi-loci systems for the specified systems. The
systems are specified as a comma-separated list (--multi-loci
sys1,sys2). Default is False.
- -w WORKER_NB, --worker
WORKER_NB
- Number of workers to be used by MacSyFinder, for users who want to run
it in multithreaded mode (0 means all cores will be used;
default: 1).
- -v,
--verbosity
- Increases the verbosity level. There are 4 levels: Error messages
(default), Warning (-v), Info (-vv) and
Debug (-vvv).
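The four verbosity levels map naturally onto Python's standard logging levels. The following is a sketch of that mapping using argparse's `count` action; it is an illustration under that assumption rather than MacSyFinder's implementation, and `level_for` is a hypothetical helper.

```python
import argparse
import logging

# Index = number of -v flags given on the command line.
LEVELS = [logging.ERROR, logging.WARNING, logging.INFO, logging.DEBUG]


def level_for(count):
    """Map a -v count to a logging level, capping at Debug."""
    return LEVELS[min(count, len(LEVELS) - 1)]


parser = argparse.ArgumentParser(prog="sketch")
parser.add_argument("-v", "--verbosity", action="count", default=0)

args = parser.parse_args(["-vv"])
print(logging.getLevelName(level_for(args.verbosity)))
```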
- --log LOG_FILE
- Path to the directory where to store the 'macsyfinder.log' log file.
- --config
CFG_FILE
- Path to a putative MacSyFinder configuration file to be used.
- --previous-run
PREVIOUS_RUN
- Path to a previous MacSyFinder run directory. It allows one to skip the
Hmmer search step on the same dataset, as it uses the previous run results
and thus the parameters regarding Hmmer detection. The configuration file
from this previous run will be used. (Conflicts with the options --config,
--sequence-db, --profile-suffix, --res-extract-suffix,
--e-value-res, --db-type, --hmmer)
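Putting several of the options above together, a command line can be assembled programmatically before being handed to a process launcher. The sketch below only builds and prints the argument list; it does not run MacSyFinder. The function `build_command` and the file name `genome.fasta` are hypothetical, and only flags listed in the synopsis are used.

```python
import shlex


def build_command(sequence_db, db_type, systems, out_dir=None, workers=None):
    """Assemble a macsyfinder argument list from a few common options."""
    cmd = ["macsyfinder", "--sequence-db", sequence_db, "--db-type", db_type]
    if out_dir is not None:
        cmd += ["-o", out_dir]
    if workers is not None:
        cmd += ["-w", str(workers)]
    # The systems to detect are positional and come last.
    cmd += list(systems)
    return cmd


cmd = build_command("genome.fasta", "ordered_replicon", ["T2SS", "T4P"], workers=4)
print(shlex.join(cmd))
```

Keeping the command as a list (rather than one string) avoids quoting problems if a path contains spaces; `shlex.join` is used here only to display it.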
For more details, visit the MacSyFinder website and see the
MacSyFinder documentation.