scoary - pangenome-wide association studies
scoary [-h] [-t TRAITS] [-g GENES] [-n NEWICKTREE] [-s START_COL]
[--delimiter DELIMITER] [-r RESTRICT_TO] [-o OUTDIR] [-u] [-p P_VALUE_CUTOFF
[P_VALUE_CUTOFF ...]] [-c [{I,B,BH,PW,EPW,P} [{I,B,BH,PW,EPW,P} ...]]] [-m
MAX_HITS] [--include_input_columns GRABCOLS] [-w] [--no-time] [-e PERMUTE]
[--no_pairwise] [--collapse] [--threads THREADS] [--test] [--citation]
[--version]
- -t TRAITS, --traits TRAITS
- Input trait table (comma-separated values). Trait presence is indicated by
1, trait absence by 0. Assumes strain names in the first column and trait
names in the first row.
- -g GENES, --genes GENES
- Input gene presence/absence table (comma-separated values) from Roary.
Strain names must match those in the trait table.
- -n NEWICKTREE, --newicktree NEWICKTREE
- Supply a custom tree (Newick format) for phylogenetic analyses instead of
calculating it internally.
- -s START_COL, --start_col START_COL
- The column in the gene presence/absence file at which the individual strain
information starts (1-based indexing). Default = 15.
- --delimiter DELIMITER
- The delimiter between cells in the gene presence/absence and trait files,
as well as in the output file.
- -r RESTRICT_TO, --restrict_to RESTRICT_TO
- Use if you only want to analyze a subset of your strains. Scoary will read
the provided comma-separated table of strains and restrict analyses to
these.
- -o OUTDIR, --outdir OUTDIR
- Directory to place output files. Default = . (the current directory)
- -u, --upgma_tree
- This flag will cause Scoary to write the calculated UPGMA tree to a Newick
file.
- -p P_VALUE_CUTOFF [P_VALUE_CUTOFF ...], --p_value_cutoff P_VALUE_CUTOFF
[P_VALUE_CUTOFF ...]
- P-value cut-off / alpha level. For Fisher's test and for Bonferroni- and
Benjamini-Hochberg-adjusted p-values, Scoary will not report genes with
p-values higher than this. For empirical p-values, this is treated as an
alpha level instead, i.e. 0.02 will filter out all genes except those in
the lowest and highest percentile (1% at each tail). Run with "-p 1.0" to
report all genes. Accepts standard form (e.g. 1E-8). Provide either a single
value (applied to all correction criteria) or exactly as many values as
correction criteria, in corresponding order (see the example under
-c/--correction). Default = 0.05.
- -c [{I,B,BH,PW,EPW,P} [{I,B,BH,PW,EPW,P} ...]], --correction
[{I,B,BH,PW,EPW,P} [{I,B,BH,PW,EPW,P} ...]]
- Apply the indicated filtration measure. Allowed values are I, B, BH, PW,
EPW, P. I = Individual (naive) p-value. B = Bonferroni-adjusted p-value.
BH = Benjamini-Hochberg-adjusted p-value. PW = Best (lowest) pairwise
comparison. EPW = Entire range of pairwise comparison p-values.
P = Empirical p-value from permutations. You can enter as many correction
criteria as you like. These will be associated with the p-value cutoffs you
enter with -p. For example, "-c I EPW -p 0.1 0.05" will apply the following
cutoffs: the naive p-value must be lower than 0.1 AND the entire range of
pairwise comparison p-values must be below 0.05 for this gene. Note that the
empirical p-values should be interpreted at both tails. Therefore, running
"-c P -p 0.05" will apply an alpha of 0.05 to the empirical (permuted)
p-values, i.e. it will filter out everything except the upper and lower
2.5 percent of the distribution. Default = I (individual p-value).
- -m MAX_HITS, --max_hits MAX_HITS
- Maximum number of hits to report. Scoary will only report the top MAX_HITS
results per trait.
- --include_input_columns GRABCOLS
- Copy columns from the input Roary file into the output. Accepts
comma-separated column numbers and ranges, e.g. --include_input_columns
4,6,8,16-23. The special keyword ALL will include all relevant input
columns in the output.
- -w, --write_reduced
- Use with -r if you want Scoary to create a new gene presence/absence file
from your reduced set of isolates. Note: Columns 1-14 (No. sequences, Avg
group size nuc, etc.) in this file do not reflect the reduced dataset; they
are taken from the full dataset.
- --no-time
- Name output files in the form TRAIT.results.csv instead of
TRAIT_TIMESTAMP.csv. When used with the -w argument, the reduced gene matrix
is written as gene_presence_absence_reduced.csv rather than
gene_presence_absence_reduced_TIMESTAMP.csv.
- -e PERMUTE, --permute PERMUTE
- Perform N (= PERMUTE) permutations of the significant results
post-analysis. Each permutation switches the phenotype labels among the
strains, and a new p-value is calculated for this new dataset. After all N
permutations are completed, the permuted p-values are sorted in ascending
order, and the percentile of the original result in the permuted p-value
distribution is reported.
- --no_pairwise
- Do not perform pairwise comparisons. In this mode, Scoary will perform
population-structure-naive calculations only (Fisher's test, odds ratios,
etc.). Useful for summary operations and exploring sets (genes unique to
groups, intersections, etc.), but not for causal analyses.
- --collapse
- Add this to collapse correlated genes (genes that have identical
distribution patterns in the sample) into merged units.
- --threads THREADS
- Number of threads to use. Default = 1.
- --test
- Run Scoary on the test set in exampledata, overriding all other
parameters.
- --citation
- Show citation information, and exit.
- --version
- Display Scoary version, and exit.
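How -s/--start_col and -r/--restrict_to select the strain columns of a gene presence/absence table can be sketched in Python as below. This is a hypothetical illustration, not Scoary's implementation: the function name read_presence_absence is invented, the gene name is assumed to sit in the first column, and any non-empty cell (such as the locus tag Roary writes) is assumed to mean the gene is present.

```python
import csv
import io
from typing import Dict, Optional, Set


def read_presence_absence(
    text: str,
    start_col: int = 15,
    delimiter: str = ",",
    restrict_to: Optional[Set[str]] = None,
) -> Dict[str, Dict[str, int]]:
    """Hypothetical sketch: return {gene: {strain: 0/1}} from a Roary-style table.

    Assumes the gene name is in the first column and that any non-empty cell
    in a strain column means the gene is present in that strain.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    first = start_col - 1  # --start_col is 1-based; Python lists are 0-based
    strains = header[first:]
    # Optionally keep only the strains listed for --restrict_to.
    keep = [i for i, s in enumerate(strains) if restrict_to is None or s in restrict_to]
    table: Dict[str, Dict[str, int]] = {}
    for row in reader:
        cells = row[first:]
        table[row[0]] = {strains[i]: int(cells[i].strip() != "") for i in keep}
    return table


example = "Gene,Annotation,s1,s2,s3\ngeneA,x,locus1,,locus3\ngeneB,y,,locus2,\n"
print(read_presence_absence(example, start_col=3))
print(read_presence_absence(example, start_col=3, restrict_to={"s1", "s3"}))
```

The toy table uses start_col=3 only to stay short; a real Roary table, whose strain columns begin after 14 summary columns, would use the default of 15.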
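The column specification accepted by --include_input_columns (comma-separated column numbers and ranges, or the keyword ALL) can be illustrated with the small parser below. It is a hypothetical sketch rather than Scoary's own code; parse_column_spec is an invented name, ranges are assumed to be inclusive at both ends as in the 4,6,8,16-23 example, and ALL is represented by returning None.

```python
from typing import List, Optional


def parse_column_spec(spec: str) -> Optional[List[int]]:
    """Expand e.g. "4,6,8,16-23" into sorted 1-based column numbers.

    Hypothetical helper: returns None for the keyword ALL, which stands for
    "all relevant input columns".
    """
    if spec.strip() == "ALL":
        return None
    columns = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "4,,6"
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            columns.update(range(start, end + 1))  # ranges include both ends
        else:
            columns.add(int(part))
    return sorted(columns)


print(parse_column_spec("4,6,8,16-23"))  # [4, 6, 8, 16, 17, 18, 19, 20, 21, 22, 23]
```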
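The way -c correction criteria are paired with -p cutoffs, and combined with a logical AND, can be sketched as follows. This is an illustration under stated assumptions, not Scoary's code: the per-gene statistic names (naive_p, worst_pairwise_p, etc.) are hypothetical, EPW is assumed to pass when the highest pairwise p-value is below the cutoff, the empirical criterion P is read at both tails as described above, and the boundary handling (dropping only values strictly above the cutoff) is an assumption.

```python
from typing import Dict, Sequence

# Hypothetical mapping from -c criteria to per-gene statistics (names are illustrative).
STAT_FOR_CRITERION = {
    "I": "naive_p",
    "B": "bonferroni_p",
    "BH": "bh_p",
    "PW": "best_pairwise_p",    # best (lowest) pairwise comparison
    "EPW": "worst_pairwise_p",  # whole range must pass, so test the highest value
    "P": "empirical_p",
}


def pair_cutoffs(criteria: Sequence[str], cutoffs: Sequence[float]) -> Dict[str, float]:
    """Pair each -c criterion with its -p cutoff (one shared value, or one per criterion)."""
    if len(cutoffs) == 1:
        return {c: cutoffs[0] for c in criteria}
    if len(cutoffs) != len(criteria):
        raise ValueError("give one cutoff, or exactly one per correction criterion")
    return dict(zip(criteria, cutoffs))


def passes(gene: Dict[str, float], rules: Dict[str, float]) -> bool:
    """Return True only if the gene satisfies every criterion (logical AND)."""
    for criterion, alpha in rules.items():
        value = gene[STAT_FOR_CRITERION[criterion]]
        if criterion == "P":
            # Empirical p-values are read at both tails: keep only the lower and upper alpha/2.
            if alpha / 2 < value < 1 - alpha / 2:
                return False
        elif value > alpha:  # genes with higher p-values than the cutoff are dropped
            return False
    return True


rules = pair_cutoffs(["I", "EPW"], [0.1, 0.05])  # like "-c I EPW -p 0.1 0.05"
gene = {"naive_p": 0.03, "best_pairwise_p": 0.01, "worst_pairwise_p": 0.04}
print(rules, passes(gene, rules))
```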
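The label-switching procedure described under -e/--permute can be sketched as below. This sketch swaps in a two-sided Fisher's exact test as the statistic recomputed for each permutation, purely for simplicity; which p-value Scoary actually recomputes is not stated here. The function names are hypothetical, and reporting the percentile as (rank + 1) / (N + 1) is an assumed convention.

```python
import bisect
import math
import random
from typing import List, Sequence


def fisher_two_sided(gene: Sequence[int], trait: Sequence[int]) -> float:
    """Two-sided Fisher's exact test on the 2x2 table of gene vs. trait (0/1 per strain)."""
    n = len(gene)
    both = sum(1 for g, t in zip(gene, trait) if g and t)  # gene and trait present
    with_gene = sum(gene)
    with_trait = sum(trait)
    without_gene = n - with_gene
    total = math.comb(n, with_trait)

    def prob(x: int) -> float:
        # Hypergeometric probability of x strains having both gene and trait.
        return math.comb(with_gene, x) * math.comb(without_gene, with_trait - x) / total

    observed = prob(both)
    low, high = max(0, with_trait - without_gene), min(with_gene, with_trait)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(low, high + 1) if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)


def empirical_percentile(gene: Sequence[int], trait: Sequence[int], n_perm: int, seed: int = 0) -> float:
    """Percentile of the original p-value among p-values from label-switched traits."""
    rng = random.Random(seed)
    original = fisher_two_sided(gene, trait)
    labels: List[int] = list(trait)
    permuted: List[float] = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # label switching: reassign the phenotype to random strains
        permuted.append(fisher_two_sided(gene, labels))
    permuted.sort()  # ascending order
    rank = bisect.bisect_right(permuted, original)  # permuted p-values <= original
    return (rank + 1) / (n_perm + 1)  # +1 keeps the estimate away from 0 (assumed convention)


gene = [1, 1, 1, 1, 0, 0, 0, 0]
trait = [1, 1, 1, 1, 0, 0, 0, 0]
print(fisher_two_sided(gene, trait), empirical_percentile(gene, trait, n_perm=200))
```

A small percentile means the original association is stronger than almost all permuted ones, which is the lower tail that "-c P" keeps.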
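What --collapse means by merging genes with identical distribution patterns can be illustrated with a minimal grouping sketch. The function collapse_genes is hypothetical and not Scoary's implementation; it assumes each gene is given as a sequence of 0/1 values in a fixed strain order, and it merges only exact pattern matches.

```python
from typing import Dict, List, Sequence, Tuple


def collapse_genes(matrix: Dict[str, Sequence[int]]) -> List[List[str]]:
    """Group genes whose presence/absence pattern across strains is identical.

    Hypothetical sketch: only exact pattern matches are merged; the input maps
    gene name -> 0/1 values in a fixed strain order.
    """
    groups: Dict[Tuple[int, ...], List[str]] = {}
    for gene, pattern in matrix.items():
        # Genes sharing a pattern land in the same bucket (insertion order is kept).
        groups.setdefault(tuple(pattern), []).append(gene)
    return list(groups.values())


matrix = {"geneA": [1, 0, 1], "geneB": [1, 0, 1], "geneC": [0, 1, 1]}
print(collapse_genes(matrix))  # [['geneA', 'geneB'], ['geneC']]
```

Genes with identical patterns necessarily yield identical association statistics, so testing each group once as a merged unit loses no information.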
by Ola Brynildsrud (olbb@fhi.no)
This manpage was written by Andreas Tille for the Debian
distribution and can be used for any other usage of the program.