cd-hit-2d-para.pl - divide a big clustering job into pieces to run
cd-hit-2d or cd-hit-est-2d jobs
cd-hit-2d-para.pl options
- This script divides a big clustering job into pieces and submits the jobs
to remote computers over a network so that they run in parallel. After all
the jobs have finished, the script merges the clustering results as if you
had run a single cd-hit-2d or cd-hit-est-2d job.
- You can also use it to divide big jobs on a single computer if your
computer does not have enough RAM (with the --L option).
- 1 When you run this script over a network, the directory where you
run the script and the input files must be available on all the remote
hosts with identical paths.
- 2 If you choose "ssh" to submit jobs, you must have
passwordless ssh to every remote host; see the ssh manual for how to set up
passwordless ssh.
- 3 I suggest using a queuing system instead of ssh;
I currently support PBS and SGE.
- 4 cd-hit-2d, cd-hit-est-2d, cd-hit-div, and cd-hit-div.pl must be
in the same directory as this script.
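Requirements 2 and 4 above can be checked before a long run is started. The following is a minimal sketch, not part of the script itself: the function names `check_programs` and `check_ssh` are hypothetical, the program check only tests that the files exist, and the ssh check relies on the standard OpenSSH `BatchMode` option, which makes ssh fail instead of prompting for a password.

```shell
#!/bin/sh
# check_programs DIR
# Report any of the four required programs missing from DIR.
# Only existence is tested here; permissions are not checked.
check_programs() {
    dir=$1
    missing=0
    for prog in cd-hit-2d cd-hit-est-2d cd-hit-div cd-hit-div.pl; do
        if [ ! -f "$dir/$prog" ]; then
            echo "missing: $dir/$prog"
            missing=1
        fi
    done
    return "$missing"
}

# check_ssh HOST
# Succeeds only if ssh can log in to HOST without asking for a password.
check_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Example use (directory and host name are placeholders):
#   check_programs /path/to/cd-hit || echo "fix the installation first"
#   check_ssh node01 || echo "passwordless ssh to node01 is not set up"
```

Running `check_ssh` once per host you plan to use would catch a host that still prompts for a password before any job is submitted to it.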
Options
- -i
- input filename for 1st db in fasta format, required
- -i2
- input filename for 2nd db in fasta format, required
- -o
- output filename, required
- --P
- program, "cd-hit-2d" or "cd-hit-est-2d", default
"cd-hit-2d"
- --B
- filename of list of hosts, required unless the --Q or --L option is
supplied
- --L
- number of CPUs on the local computer, default 0. When you are not running
on a cluster, you can use this option to divide a big clustering job
into small pieces. I suggest you just use "--L 1" unless you
have enough RAM for each CPU.
- --S
- Number of segments to split 1st db into, default 2
- --S2
- Number of segments to split 2nd db into, default 8
- --Q
- number of jobs to submit to the queuing system, default 0. By default,
the program uses ssh mode to submit remote jobs.
- --T
- type of queuing system, "PBS", "SGE" are supported,
default PBS
- --R
- restart file, used to resume a run after a crash
- -h
- print this help
More cd-hit-2d/cd-hit-est-2d options can be specified on the command
line.
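The options above combine into three typical modes: local, ssh, and queue. The sketch below illustrates them under a few assumptions: all file names (`db1.fasta`, `db2.fasta`, `hosts.txt`, the output names) are placeholders, the `run` wrapper is a hypothetical helper that only prints each command unless `DRY_RUN=0` is set, and the trailing `-c 0.9` is a cd-hit-2d option (sequence identity threshold) that is not listed in this help text and appears only to show extra options being passed through.

```shell
#!/bin/sh
# Print each command instead of executing it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Placeholder input files; substitute your own.
DB1=db1.fasta
DB2=db2.fasta

# Local mode: one CPU, with the default segment counts written out.
run ./cd-hit-2d-para.pl -i "$DB1" -i2 "$DB2" -o out_local --L 1 --S 2 --S2 8

# ssh mode (the default when --Q is 0): hosts.txt is the host list for --B.
run ./cd-hit-2d-para.pl -i "$DB1" -i2 "$DB2" -o out_ssh --B hosts.txt

# Queue mode: submit 16 jobs to SGE, and pass an extra cd-hit-2d option.
run ./cd-hit-2d-para.pl -i "$DB1" -i2 "$DB2" -o out_sge --Q 16 --T SGE -c 0.9
```

Reviewing the printed commands first and then rerunning with `DRY_RUN=0` avoids submitting a large batch of jobs with a mistyped option.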
- Questions, bugs, contact Weizhong Li at liwz@sdsc.edu