TOPPIC(1) | TopPIC suite | TOPPIC(1) |
toppic - Top-down mass spectrometry based Proteoform Identification and Characterization
TopPIC (Top-down mass spectrometry based Proteoform Identification and Characterization) identifies and characterizes proteoforms at the proteome level by searching top-down tandem mass spectra against a protein sequence database. TopPIC is a successor to MS-Align+. It efficiently identifies proteoforms with unexpected alterations, such as mutations and post-translational modifications (PTMs), accurately estimates the statistical significance of identifications, and characterizes reported proteoforms with unknown mass shifts. It uses several techniques, such as indexes, spectral alignment, generating function methods, and the modification identification score (MIScore), to increase speed, sensitivity, and accuracy.
TopPIC outputs two comma-separated values (csv) files, an xml file, and a collection of html files for identified proteoforms. For example, when the input data file is spectra_ms2.msalign, the output includes:
To browse identified proteins, proteoforms, and PrSMs, open the file spectrum_html/topview/index.html in a web browser. Google Chrome is recommended (Firefox and IE are not recommended).
When the input contains two or more data files, TopPIC outputs two csv files, an xml file, and a collection of html files for each input file. When a file name is specified for combined identifications, it combines spectra and proteoforms identified from all the input files, removes redundant proteoform identifications, and reports two csv files, an xml file, and a collection of html files for the combined results. For example, when the input is spectra1_ms2.msalign and spectra2_ms2.msalign and the combined output file name is "combined," the output files are:
-h [ --help ] Print the help message.
-a [ --activation ] <CID|HCD|ETD|UVPD|FILE> Set the fragmentation method(s) of MS/MS spectra. When "FILE" is selected, the fragmentation methods of spectra are given in the input spectrum data file. Default value: FILE.
-f [ --fixed-mod ] <C57|C58|a fixed modification file> Set fixed modifications. Three available options: C57, C58, or the name of a text file containing the information of fixed modifications (see an example file). When C57 is selected, carbamidomethylation on cysteine is the only fixed modification. When C58 is selected, carboxymethylation on cysteine is the only fixed modification.
-n [ --n-terminal-form ] <a list of allowed N-terminal forms> Set N-terminal forms of proteins. Four N-terminal forms can be selected: NONE, NME, NME_ACETYLATION, and M_ACETYLATION. NONE stands for no modifications, NME for N-terminal methionine excision, NME_ACETYLATION for N-terminal acetylation after the initiator methionine is removed, and M_ACETYLATION for N-terminal methionine acetylation. When multiple forms are allowed, they are separated by commas. Default value: NONE,M_ACETYLATION,NME,NME_ACETYLATION.
-d [ --decoy ] Use a shuffled decoy protein database to estimate spectrum and proteoform level FDRs. When -d is chosen, a shuffled decoy database is automatically generated and appended to the target database before the database search, and FDRs are estimated using the target-decoy approach.
-e [ --mass-error-tolerance ] <a positive integer> Set the error tolerance for precursor and fragment masses in parts per million (ppm). Default value: 15. When the lookup table approach (-l) is used for E-value estimation, valid error tolerance values are 5, 10, and 15 ppm.
-p [ --proteoform-error-tolerance ] <a positive number> Set the error tolerance for identifying PrSM clusters (in Dalton). Default value: 1.2 Dalton.
-M [ --max-shift ] <a number> Set the maximum value for unexpected mass shifts (in Dalton). Default value: 500.
-m [ --min-shift ] <a number> Set the minimum value for unexpected mass shifts (in Dalton). Default value: -500.
-s [ --num-shift ] <0|1|2> Set the maximum number of unexpected mass shifts in a PrSM. Default value: 1.
-t [ --spectrum-cutoff-type ] <EVALUE|FDR> Set the spectrum level cutoff type for filtering PrSMs. Default value: EVALUE.
-v [ --spectrum-cutoff-value ] <a positive number> Set the spectrum level cutoff value for filtering PrSMs. Default value: 0.01.
-T [ --proteoform-cutoff-type ] <EVALUE|FDR> Set the proteoform level cutoff type for filtering proteoforms and PrSMs. Default value: EVALUE.
-V [ --proteoform-cutoff-value ] <a positive number> Set the proteoform level cutoff value for filtering proteoforms and PrSMs. Default value: 0.01.
-l [ --lookup-table ] Use a lookup table method for computing p-values and E-values. It is faster than the default generating function approach, but it may reduce the number of identifications.
-r [ --num-combined-spectra ] <a positive integer> Set the number of combined spectra. The parameter is set to 2 (or 3) for combining spectral pairs (or triplets) generated by the alternating fragmentation mode. Default value: 1.
-i [ --mod-file-name ] <a common modification file> Specify a text file containing a list of common PTMs for proteoform characterization. The PTMs are used to identify and localize PTMs in reported PrSMs with unknown mass shifts. See an example file.
-H [ --miscore-threshold ] <a number between 0 and 1> Set the score threshold (MIScore) for filtering results of PTM characterization. Default value: 0.45.
-u [ --thread-number ] <a positive number> Set the number of threads used in the computation. Default value: 1. The maximum number of threads is determined by the CPU and memory of the computer used for the computation. About 4 GB of memory is required for each thread. For example, if the computer has 16 GB of memory and a CPU with 8 cores, the maximum number of threads is 4, because 4 threads already need about 16 GB of memory.
-x [ --no-topfd-feature ] Specify that there are no TopFD feature files for proteoform identification.
-c [ --combined-file-name ] <a filename> Specify an output file name for combined identifications when the input consists of multiple spectrum files.
-k [ --keep ] Keep intermediate files generated by TopPIC.
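The tolerance given to -e is relative, so the absolute mass window it allows grows with the mass being matched. The following is a minimal sketch of that conversion, assuming the usual definition of ppm (mass multiplied by ppm, divided by one million). The helper name ppm_to_dalton is hypothetical and is not part of TopPIC.

```shell
# ppm_to_dalton is a hypothetical helper, not part of TopPIC.
# Usage: ppm_to_dalton <mass in Dalton> <tolerance in ppm>
# Prints the absolute tolerance in Dalton with four decimals.
ppm_to_dalton() {
    mass="$1"   # precursor or fragment mass in Dalton
    ppm="$2"    # relative tolerance, as it would be passed to -e
    # LC_ALL=C keeps the decimal point independent of the user's locale.
    LC_ALL=C awk -v m="$mass" -v p="$ppm" \
        'BEGIN { printf "%.4f\n", m * p / 1000000 }'
}

# A 10000 Da mass at the default tolerance (15 ppm) and at 5 ppm.
ppm_to_dalton 10000 15
ppm_to_dalton 10000 5
```

Under this assumption, the default of 15 ppm corresponds to a window of 0.15 Dalton for a 10000 Dalton mass, and 5 ppm to 0.05 Dalton.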
toppic proteins.fasta spectra_ms2.msalign
toppic -c combined proteins.fasta spectra1_ms2.msalign spectra2_ms2.msalign
toppic proteins.fasta *_ms2.msalign
toppic -x proteins.fasta spectra_ms2.msalign
toppic -f C57 proteins.fasta spectra_ms2.msalign
toppic -s 2 -M 10000 proteins.fasta spectra_ms2.msalign
toppic -e 5 proteins.fasta spectra_ms2.msalign
toppic -d -t FDR -v 0.05 -T FDR -V 0.05 proteins.fasta spectra_ms2.msalign
toppic -r 3 proteins.fasta spectra_ms2.msalign
toppic -i common_mods.txt -H 0.3 proteins.fasta spectra_ms2.msalign
toppic -u 6 proteins.fasta spectra_ms2.msalign
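The value passed to -u in the last example has to respect the memory rule given for that option (about 4 GB per thread). The following is a minimal sketch of how a thread count could be derived from the available memory and the number of CPU cores. The helper name max_threads is hypothetical and is not part of TopPIC, and the memory and core counts are supplied by hand rather than read from the system.

```shell
# max_threads is a hypothetical helper, not part of TopPIC.
# Usage: max_threads <memory in GB> <CPU cores>
# Prints the smaller of (memory / 4 GB) and the core count, at least 1.
max_threads() {
    mem_gb="$1"
    cores="$2"
    n=$(( mem_gb / 4 ))          # about 4 GB of memory per thread
    if [ "$n" -gt "$cores" ]; then
        n="$cores"               # never more threads than cores
    fi
    if [ "$n" -lt 1 ]; then
        n=1                      # -u expects a positive value
    fi
    echo "$n"
}

# Print (do not run) the command for a machine with 16 GB and 8 cores.
echo "toppic -u $(max_threads 16 8) proteins.fasta spectra_ms2.msalign"
```

With 16 GB and 8 cores the sketch yields 4 threads, matching the example in the -u description. By the same rule, -u 6 would call for roughly 24 GB of memory.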
This man page was written by Filippo Rusconi <lopippo@debian.org>. Material was taken from http://proteomics.informatics.iupui.edu/software/toppic/manual.html.
Filippo Rusconi <lopippo@debian.org> and upstream authors (Dr. Xiaowen Liu's Lab at Indiana University-Purdue University Indianapolis and others)
Filippo Rusconi and Indiana University-Purdue University Indianapolis
20200521 | 1 |