NAME

rockhopper - system for analyzing bacterial RNA-seq data (command line tool)

SYNOPSIS

rockhopper [options]

DESCRIPTION

rockhopper is a comprehensive and user-friendly system for computational analysis of bacterial RNA-seq data. As input, it takes RNA sequencing reads output by high-throughput sequencing technology (FASTQ, QSEQ, FASTA, SAM, or BAM files).

REQUIRED ARGUMENTS

exp1A.fastq,exp1B.fastq,exp1C.fastq exp2A.fastq,exp2B.fastq: one or more comma-separated lists of sequencing files (in FASTQ, QSEQ, FASTA, SAM, or BAM format), one list per experimental condition, where each list names the files for that condition's replicate experiments (for paired-end reads, the two mate-pair files of a replicate should be delimited by '%')
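
The delimiter rules above can be sketched in shell. This is only an illustration of how one condition's argument is put together for paired-end data; the helper join_with and all file names are hypothetical, and only the ',' and '%' delimiters come from the description above.

```shell
# join_with SEP ITEM...  prints the items joined by SEP (hypothetical helper).
join_with() {
    sep=$1
    shift
    out=$1
    shift
    for item in "$@"; do
        out="$out$sep$item"
    done
    printf '%s\n' "$out"
}

# One paired-end replicate: its two mate-pair files joined by '%'.
rep1=$(join_with '%' cond1_rep1_R1.fastq cond1_rep1_R2.fastq)
rep2=$(join_with '%' cond1_rep2_R1.fastq cond1_rep2_R2.fastq)

# One experimental condition: its replicates joined by ','.
cond1=$(join_with ',' "$rep1" "$rep2")

printf '%s\n' "$cond1"
```

The resulting string is passed to rockhopper as a single argument; a second condition would be built the same way and passed as a separate, space-delimited argument.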

REFERENCE BASED ASSEMBLY VS. DE NOVO ASSEMBLY

If the -g option is used, rockhopper aligns reads to one or more reference genomes; otherwise, rockhopper performs de novo transcript assembly.

-g <DIR1,DIR2>: a comma-separated list of directories, each containing a genome file (*.fna), gene file (*.ptt), and RNA file (*.rnt)

OPTIONAL ARGUMENTS FOR EITHER REFERENCE BASED ASSEMBLY OR DE NOVO ASSEMBLY

-c <boolean>: reverse complement single-end reads (default is false)
-ff, -fr, -rf, -rr: orientation of the two mate reads for paired-end reads, f=forward and r=reverse_complement (default is fr)
-d <integer>: maximum number of bases between mate pairs for paired-end reads (default is 500)
-a <boolean>: identify 1 alignment (true) or identify all optimal alignments (false) (default is true)
-p <integer>: number of processors (default is self-identification of processors)
-e <boolean>: compute differential expression for transcripts in pairs of experimental conditions (default is true)
-s <boolean>: RNA-seq experiments are strand specific (true) or strand ambiguous (false) (default is true)
-L <comma separated list>: labels for each condition
-o <DIR>: directory where output files are written (default is Rockhopper_Results/)
-v <boolean>: verbose output including raw/normalized counts aligning to each gene (default is false)
-SAM: output a SAM format file
-TIME: output time taken to execute program

OPTIONAL ARGUMENTS FOR REFERENCE BASED ASSEMBLY ONLY

-m <number>: allowed mismatches as percent of read length (default is 0.15)
-l <number>: minimum seed as percent of read length (default is 0.33)
-y <boolean>: compute operons (default is true)
-t <boolean>: identify transcript boundaries including UTRs and ncRNAs (default is true)
-z <number>: minimum expression of UTRs and ncRNAs, a number in range [0.0, 1.0] (default is 0.5)

OPTIONAL ARGUMENTS FOR DE NOVO ASSEMBLY ONLY

-k <integer>: size of k-mer, range of values is 15 to 31 (default is 25)
-j <integer>: minimum length required to use a sequencing read after trimming/processing (default is 35)
-n <integer>: size of k-mer hashtable is ~ 2^n (default is 25). HINT: should normally be 25 or, if more memory is available, 26. WARNING: if increased above 25 then more than 1.2M of memory must be allocated
-b <integer>: minimum number of full length reads required to map to a de novo assembled transcript (default is 20)
-u <integer>: minimum length of de novo assembled transcripts (default is 2*k)
-w <integer>: minimum count of k-mer to use it to seed a new de novo assembled transcript (default is 50)
-x <integer>: minimum count of k-mer to use it to extend an existing de novo assembled transcript (default is 5)
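
The two computed defaults above (2*k for -u and ~2^n for -n) can be worked out with shell arithmetic. This is a small sketch of the arithmetic only; the variable names are illustrative, and the hashtable size is approximate, as the '~' in the description of -n indicates.

```shell
k=25    # default for -k
n=25    # default for -n

# Default minimum transcript length for -u is 2*k.
min_len=$((2 * k))

# Approximate k-mer hashtable size for -n is 2^n entries.
slots=$((1 << n))
slots_26=$((1 << 26))

echo "default -u for k=$k: $min_len"
echo "approximate hashtable entries for n=$n: $slots"
echo "approximate hashtable entries for n=26: $slots_26"
```

Raising n from 25 to 26 doubles the approximate number of entries, which is why the HINT ties the larger value to having more memory available.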

EXAMPLES

reference based assembly with single-end reads
% rockhopper <options> -g genome_DIR1,genome_DIR2 aerobic_replicate1.fastq,aerobic_replicate2.fastq anaerobic_replicate1.fastq,anaerobic_replicate2.fastq

de novo assembly with single-end reads
% rockhopper <options> aerobic_replicate1.fastq,aerobic_replicate2.fastq anaerobic_replicate1.fastq,anaerobic_replicate2.fastq
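
reference based assembly with paired-end reads and several options (a sketch, not an official example)

The following dry run shows how the pieces described above might be combined. The file names, directory names, and option values are hypothetical; the flags themselves (-g, -L, -o, -p) are the ones listed in this document. It is assumed here that the labels given to -L follow the same order as the conditions. The script only prints the command so that it can be inspected before being run.

```shell
# Each condition: replicates separated by ',', mate-pair files separated by '%'.
aerobic='aerobic_rep1_R1.fastq%aerobic_rep1_R2.fastq,aerobic_rep2_R1.fastq%aerobic_rep2_R2.fastq'
anaerobic='anaerobic_rep1_R1.fastq%anaerobic_rep1_R2.fastq,anaerobic_rep2_R1.fastq%anaerobic_rep2_R2.fastq'

# Options first, then one argument per experimental condition.
# Assumption: labels are listed in the same order as the conditions.
cmd="rockhopper -g genome_DIR1 -L aerobic,anaerobic -o results_DIR/ -p 4 $aerobic $anaerobic"

# Dry run: print the command instead of executing it.
echo "$cmd"
```

The '%' characters are kept inside single quotes so that the shell passes them through unchanged.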