NAME

hhsearch - search a database of HMMs with a query alignment or query HMM

SYNOPSIS

hhsearch -i query -d database [options]

DESCRIPTION

HHsearch 3.3.0
Search a database of HMMs with a query alignment or query HMM
(c) The HH-suite development team
Steinegger M, Meier M, Mirdita M, Vöhringer H, Haunsberger S J, and Söding J (2019) HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics, doi:10.1186/s12859-019-3019-7

-i <file>: input/query multiple sequence alignment (a2m, a3m, FASTA) or HMM

<file> may be 'stdin' or 'stdout' throughout.

Options:

-d <name>: database name (e.g. uniprot20_29Feb2012). Multiple databases may be specified with '-d <db1> -d <db2> ...'
-e [0,1]: E-value cutoff for inclusion in result alignment (def=0.001)

Input alignment format:

-M a2m: use A2M/A3M (default): upper case = Match; lower case = Insert;
        '-' = Delete; '.' = gaps aligned to inserts (may be omitted)

-M first: use FASTA: columns with residue in 1st sequence are match states
-M [0,100]: use FASTA: columns with fewer than X% gaps are match states
-tags/-notags: do NOT / do neutralize His-, C-myc-, and FLAG-tags and the trypsin recognition sequence to the background distribution (def=-notags)
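The two FASTA modes and the A2M conventions above can be illustrated with a short Python sketch. The helper names `match_columns` and `to_a2m` are hypothetical. The sketch assumes an aligned FASTA MSA whose rows have equal length and treats both '-' and '.' as gaps. It mirrors the rules as stated in this option list and is not HH-suite's own code.

```python
from typing import List, Union

GAP_CHARS = "-."  # assumption: both '-' and '.' count as gaps in the FASTA input


def match_columns(msa: List[str], mode: Union[str, int, float]) -> List[int]:
    """Indices of columns treated as match states.

    mode == "first": columns where the first sequence has a residue (-M first).
    numeric mode X:  columns with fewer than X% gaps (-M [0,100]).
    """
    ncols = len(msa[0])
    if any(len(row) != ncols for row in msa):
        raise ValueError("rows of an aligned FASTA MSA must have equal length")
    if mode == "first":
        return [j for j in range(ncols) if msa[0][j] not in GAP_CHARS]
    threshold = float(mode)
    matches = []
    for j in range(ncols):
        gaps = sum(1 for row in msa if row[j] in GAP_CHARS)
        if 100.0 * gaps / len(msa) < threshold:
            matches.append(j)
    return matches


def to_a2m(msa: List[str], matches: List[int]) -> List[str]:
    """Rewrite rows in A2M notation: upper case = match, lower case = insert,
    '-' = delete (gap in a match column), '.' = gap aligned to an insert."""
    match_set = set(matches)
    rows = []
    for row in msa:
        out = []
        for j, ch in enumerate(row):
            gap = ch in GAP_CHARS
            if j in match_set:
                out.append("-" if gap else ch.upper())
            else:
                out.append("." if gap else ch.lower())
        rows.append("".join(out))
    return rows


msa = ["AC-D", "A--D", "-CGD"]
print(match_columns(msa, 50))       # [0, 1, 3]
print(match_columns(msa, "first"))  # [0, 1, 3]
print(to_a2m(msa, match_columns(msa, 50)))  # ['AC.D', 'A-.D', '-CgD']
```

Column 3 has gaps in two of three rows (67%), so with a 50% threshold it becomes an insert column. In A2M, its residue G is therefore written in lower case, and the gaps in that column become '.'.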

Output options:

-o <file>: write results in standard format to file (default=<infile.hhr>)
-oa3m <file>: write result MSA with significant matches in a3m format

-blasttab <name>: write result in tabular BLAST format (compatible with -m 8 or -outfmt 6 output); the 12 columns are:
    1      2       3            4       5          6         7       8     9       10    11    12
    query  target  #match/tLen  alnLen  #mismatch  #gapOpen  qstart  qend  tstart  tend  eval  score

-opsi <file>: write result MSA of significant matches in PSI-BLAST format
-ohhm <file>: write HHM file for result MSA of significant matches
-add_cons: generate consensus sequence as master sequence of query MSA (default=don't)
-hide_cons: don't show consensus sequence in alignments (default=show)
-hide_pred: don't show predicted secondary structure in alignments (default=show)
-hide_dssp: don't show DSSP secondary structure in alignments (default=show)
-show_ssconf: show confidences for predicted secondary structure in alignments
-Ofas <file>: write pairwise alignments in FASTA xor A2M (-Oa2m) xor A3M (-Oa3m) format
-seq <int>: max. number of query/template sequences displayed (default=1)
-aliw <int>: number of columns per line in alignment list (default=80)
-p [0,100]: minimum probability in summary and alignment list (default=20)
-E [0,inf[: maximum E-value in summary and alignment list (default=1E+06)
-Z <int>: maximum number of lines in summary hit list (default=500)
-z <int>: minimum number of lines in summary hit list (default=10)
-B <int>: maximum number of alignments in alignment list (default=500)
-b <int>: minimum number of alignments in alignment list (default=10)
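A minimal Python sketch for reading the -blasttab output, using the 12 columns in the order listed under -blasttab above. The class and function names are hypothetical. The sketch makes three assumptions: columns are separated by tabs or other whitespace, names contain no spaces, and columns 4 to 10 are integers. The example line is made up for illustration and is not real hhsearch output.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BlastTabHit:
    query: str
    target: str
    match_per_tlen: float  # column 3, "#match/tLen"
    aln_len: int
    mismatches: int
    gap_opens: int
    qstart: int
    qend: int
    tstart: int
    tend: int
    evalue: float
    score: float


def parse_blasttab(lines: Iterable[str]) -> List[BlastTabHit]:
    """Parse 12-column -blasttab output; blank lines and '#' lines are skipped."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split()  # assumption: whitespace/tab-separated, no spaces in names
        if len(f) != 12:
            raise ValueError(f"expected 12 columns, got {len(f)}: {line!r}")
        hits.append(BlastTabHit(
            f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
            int(f[6]), int(f[7]), int(f[8]), int(f[9]), float(f[10]), float(f[11]),
        ))
    return hits


# Hypothetical example line; real values come from hhsearch -blasttab <name>.
example = "query1\ttemplateA\t0.350\t120\t78\t3\t5\t124\t2\t121\t1.2e-20\t95.3"
significant = [h for h in parse_blasttab([example]) if h.evalue <= 0.001]
print(significant[0].target, significant[0].qstart, significant[0].qend)  # templateA 5 124
```

A file object can be passed directly, e.g. `parse_blasttab(open(path))`, because iterating over it yields one line at a time.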

Filter options applied to query MSA, database MSAs, and result MSA:

-all: show all sequences in result MSA; do not filter result MSA
-id [0,100]: maximum pairwise sequence identity (def=90)
-diff [0,inf[: filter MSAs by selecting the most diverse set of sequences, keeping at least this many sequences in each MSA block of length 50. Zero and non-numerical values turn off the filtering. (def=100)
-cov [0,100]: minimum coverage with master sequence (%) (def=0)
-qid [0,100]: minimum sequence identity with master sequence (%) (def=0)
-qsc [0,100]: minimum score per column with master sequence (default=-20.0)
-neff [1,inf]: target diversity of multiple sequence alignment (default=off)
-mark: do not filter out sequences marked by ">@" in their name line
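The -qid and -cov thresholds compare each sequence with the master (first) sequence of the MSA. The Python sketch below assumes aligned rows of equal length. Its definitions of identity and coverage are stated in the docstring and are illustrative assumptions; HH-suite's exact definitions may differ. The function names are hypothetical.

```python
from typing import List, Tuple

GAPS = set("-.")


def identity_and_coverage(master: str, seq: str) -> Tuple[float, float]:
    """Percent identity and coverage of `seq` with respect to `master`.

    Illustrative definitions (assumptions, not HH-suite's code):
      identity = identical residues / columns where both rows have a residue
      coverage = master residues aligned to a residue of `seq` / all master residues
    """
    if len(master) != len(seq):
        raise ValueError("rows must be aligned to equal length")
    paired = [(a, b) for a, b in zip(master, seq) if a not in GAPS and b not in GAPS]
    n_master = sum(1 for a in master if a not in GAPS)
    if not paired or n_master == 0:
        return 0.0, 0.0
    identical = sum(1 for a, b in paired if a == b)
    return 100.0 * identical / len(paired), 100.0 * len(paired) / n_master


def filter_by_master(msa: List[str], qid: float = 0.0, cov: float = 0.0) -> List[str]:
    """Keep the master plus every sequence meeting both thresholds (cf. -qid, -cov)."""
    master = msa[0]
    kept = [master]
    for seq in msa[1:]:
        ident, coverage = identity_and_coverage(master, seq)
        if ident >= qid and coverage >= cov:
            kept.append(seq)
    return kept


print(identity_and_coverage("ACDEF", "AC-EG"))  # (75.0, 80.0)
print(filter_by_master(["ACDEF", "AC-EG", "----F"], qid=50, cov=50))  # ['ACDEF', 'AC-EG']
```

With the default thresholds of 0, every sequence passes, which matches the documented defaults of def=0 for both options.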

HMM-HMM alignment options:

-norealign: do NOT realign displayed hits with MAC algorithm (def=realign)
-ovlp <int>: banded alignment: forbid <ovlp> largest diagonals |i-j| of DP matrix (def=0)
-mact [0,1[: posterior prob threshold for MAC realignment controlling greediness at alignment ends: 0:global >0.1:local (default=0.35)
-glob/-loc: use global/local alignment mode for searching/ranking (def=local)
-realign: realign displayed hits with max. accuracy (MAC) algorithm
-excl <range>: exclude query positions from the alignment, e.g. '1-33,97-168'
-realign_max <int>: realign max. <int> hits (default=500)
-alt <int>: show up to this many alternative alignments with raw score > smin (def=4)
-smin <float>: minimum raw score for alternative alignments (def=20.0)
-shift [-1,1]: profile-profile score offset (def=-0.03)
-corr [0,1]: weight of term for pair correlations (def=0.10)
-sc <int>: amino acid score (tja: template HMM at column j) (def=1)
0: = log2 Sum(tja*qia/pa) (pa: aa background frequencies)
1: = log2 Sum(tja*qia/pqa) (pqa = 1/2*(pa+ta) )
2: = log2 Sum(tja*qia/ta) (ta: av. aa freqs in template)
3: = log2 Sum(tja*qia/qa) (qa: av. aa freqs in query)
5: local amino acid composition correction
-ssm {0,..,4}: 0: no ss scoring
1,2: ss scoring after or during alignment [default=2]
3,4: ss scoring after or during alignment, predicted vs. predicted

-ssw [0,1]: weight of ss score (def=0.11)
-ssa [0,1]: SS substitution matrix = (1-ssa)*I + ssa*full-SS-substitution-matrix (def=1.00)
-wg: use global sequence weighting for realignment
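The -sc modes 0 to 3 share one form: log2 of Sum_a tja*qia/fa, where only the denominator fa changes between modes. The Python sketch below transcribes that form as written in the option list. It assumes that tja and qia are the amino acid probabilities of the template HMM at column j and the query HMM at column i, given as 20-element vectors. The function names are hypothetical, and mode 5 (local composition correction) is not covered.

```python
import math
from typing import Optional, Sequence


def column_score(t_j: Sequence[float], q_i: Sequence[float], f: Sequence[float]) -> float:
    """log2 Sum_a t_j[a] * q_i[a] / f[a], the form shared by -sc modes 0 to 3."""
    return math.log2(sum(ta * qa / fa for ta, qa, fa in zip(t_j, q_i, f)))


def sc_score(mode: int, t_j: Sequence[float], q_i: Sequence[float], p: Sequence[float],
             t_avg: Optional[Sequence[float]] = None,
             q_avg: Optional[Sequence[float]] = None) -> float:
    """Pick the denominator for each -sc mode as written in the option list."""
    if mode in (1, 2) and t_avg is None:
        raise ValueError("modes 1 and 2 need t_avg (ta)")
    if mode == 3 and q_avg is None:
        raise ValueError("mode 3 needs q_avg (qa)")
    if mode == 0:
        f = p                                              # pa: background frequencies
    elif mode == 1:
        f = [0.5 * (pa + ta) for pa, ta in zip(p, t_avg)]  # pqa = 1/2*(pa+ta)
    elif mode == 2:
        f = t_avg                                          # ta: av. aa freqs in template
    elif mode == 3:
        f = q_avg                                          # qa: av. aa freqs in query
    else:
        raise ValueError("only modes 0-3 are sketched here")
    return column_score(t_j, q_i, f)


uniform = [0.05] * 20
one_hot = [1.0] + [0.0] * 19
print(sc_score(0, one_hot, one_hot, uniform))  # log2(20), about 4.32
```

Two columns that both put all their probability on an amino acid with background frequency 0.05 score log2(20) bits. Uniform columns scored against a uniform background score about 0.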

Gap cost options:

-gapb [0,inf[: Transition pseudocount admixture (def=1.00)
-gapd [0,inf[: Transition pseudocount admixture for open gap (default=0.15)
-gape [0,1.5]: Transition pseudocount admixture for extend gap (def=1.00)
-gapf ]0,inf]: factor to increase/reduce gap open penalty for deletes (def=0.60)
-gapg ]0,inf]: factor to increase/reduce gap open penalty for inserts (def=0.60)
-gaph ]0,inf]: factor to increase/reduce gap extend penalty for deletes (def=0.60)
-gapi ]0,inf]: factor to increase/reduce gap extend penalty for inserts (def=0.60)
-egq [0,inf[: penalty (bits) for end gaps aligned to query residues (def=0.00)
-egt [0,inf[: penalty (bits) for end gaps aligned to template residues (def=0.00)

Pseudocount (pc) options:

Context-specific hhm pseudocounts:

-pc_hhm_contxt_mode {0,..,3}: position dependence of pc admixture 'tau' (pc mode, default=2)
0: no pseudocounts: tau = 0
1: constant: tau = a
2: diversity-dependent: tau = a/(1+((Neff[i]-1)/b)^c)
3: CSBlast admixture: tau = a(1+b)/(Neff[i]+b)
(Neff[i]: number of effective seqs in local MSA around column i)

-pc_hhm_contxt_a [0,1]: overall pseudocount admixture (def=0.9)
-pc_hhm_contxt_b [1,inf[: Neff threshold value for mode 2 (def=4.0)
-pc_hhm_contxt_c [0,3]: extinction exponent c for mode 2 (def=1.0)

Context-independent hhm pseudocounts (used for templates; used for query if contxt file is not available):

-pc_hhm_nocontxt_mode {0,..,3}: position dependence of pc admixture 'tau' (pc mode, default=2)
0: no pseudocounts: tau = 0
1: constant: tau = a
2: diversity-dependent: tau = a/(1+((Neff[i]-1)/b)^c)
(Neff[i]: number of effective seqs in local MSA around column i)

-pc_hhm_nocontxt_a [0,1]: overall pseudocount admixture (def=1.0)
-pc_hhm_nocontxt_b [1,inf[: Neff threshold value for mode 2 (def=1.5)
-pc_hhm_nocontxt_c [0,3]: extinction exponent c for mode 2 (def=1.0)

Context-specific pseudocounts:

-nocontxt: use substitution matrix instead of context-specific pseudocounts
-contxt <file>: context file for computing context-specific pseudocounts (default=)
-csw [0,inf]: weight of central position in cs pseudocount mode (def=1.6)
-csb [0,1]: weight decay parameter for positions in cs pc mode (def=0.9)
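The pc modes listed for -pc_hhm_contxt_mode and -pc_hhm_nocontxt_mode translate directly into a function of Neff[i] and the a, b, c parameters. The Python sketch below transcribes those formulas; mode 3 is listed only for the context-specific mode. The `admix` helper assumes that tau mixes observed amino acid frequencies f with pseudocount frequencies g linearly, as (1-tau)*f + tau*g. That mixing rule is an assumption not stated on this page, and both function names are hypothetical.

```python
def pc_admixture(mode: int, neff_i: float, a: float, b: float, c: float = 1.0) -> float:
    """Pseudocount admixture tau at column i for pc modes 0 to 3.

    neff_i is Neff[i], the number of effective sequences around column i (assumed >= 1).
    Mode 3 (CSBlast admixture) is listed only for -pc_hhm_contxt_mode.
    """
    if neff_i < 1.0:
        raise ValueError("Neff[i] is assumed to be >= 1")
    if mode == 0:
        return 0.0                                    # no pseudocounts
    if mode == 1:
        return a                                      # constant
    if mode == 2:
        return a / (1.0 + ((neff_i - 1.0) / b) ** c)  # diversity-dependent
    if mode == 3:
        return a * (1.0 + b) / (neff_i + b)           # CSBlast admixture
    raise ValueError("pc mode must be 0, 1, 2 or 3")


def admix(f, g, tau):
    """Assumed linear mixing of observed frequencies f with pseudocounts g."""
    return [(1.0 - tau) * fa + tau * ga for fa, ga in zip(f, g)]


# Defaults for context-specific pseudocounts: a=0.9, b=4.0, c=1.0
for neff in (1.0, 5.0, 20.0):
    print(neff, round(pc_admixture(2, neff, 0.9, 4.0, 1.0), 3))
```

With the context-specific defaults, tau falls from 0.9 at Neff[i] = 1 to 0.45 at Neff[i] = 5. Columns backed by many diverse sequences therefore receive a smaller pseudocount share.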

Other options:

-v <int>: verbose mode: 0: no screen output, 1: only warnings, 2: verbose (def=2)
-cpu <int>: number of CPUs to use (for shared memory SMPs) (default=2)
-scores <file>: write scores for all pairwise comparisons to file
-atab <file>: write all alignments in tabular layout to file
-maxseq <int>: max number of input rows (def=65535)
-maxres <int>: max number of HMM columns (def=20001)
-maxmem [1,inf[: limit memory for realignment (in GB) (def=3.0)

Example: hhsearch -i a.1.1.1.a3m -d scop70_1.71
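When hhsearch is driven from a script, the command line can be built as an argument list, which avoids shell quoting issues and makes it easy to repeat -d for several databases. This Python sketch uses only options documented on this page (-i, -d, -o, -blasttab, -e, -cpu). The function name `hhsearch_argv` is hypothetical, and running the command assumes that hhsearch and the named databases are installed.

```python
import shlex
from typing import List, Optional


def hhsearch_argv(query: str, databases: List[str], out: Optional[str] = None,
                  blasttab: Optional[str] = None, evalue: Optional[float] = None,
                  cpu: Optional[int] = None) -> List[str]:
    """Build an hhsearch argument list; -d is repeated once per database."""
    if not databases:
        raise ValueError("at least one database (-d) is required")
    argv = ["hhsearch", "-i", query]
    for db in databases:
        argv += ["-d", db]
    if out is not None:
        argv += ["-o", out]
    if blasttab is not None:
        argv += ["-blasttab", blasttab]
    if evalue is not None:
        argv += ["-e", str(evalue)]
    if cpu is not None:
        argv += ["-cpu", str(cpu)]
    return argv


argv = hhsearch_argv("a.1.1.1.a3m", ["scop70_1.71"], cpu=4)
print(shlex.join(argv))  # hhsearch -i a.1.1.1.a3m -d scop70_1.71 -cpu 4
# To run it (requires hhsearch and the database):
# import subprocess; subprocess.run(argv, check=True)
```

Passing the list to `subprocess.run` without `shell=True` hands each argument to hhsearch unchanged.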

Download databases from <http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/>.

February 2023

hhsearch 3.3.0+ds