PSCAN(1) | General Commands Manual | PSCAN(1) |
pscan - detection of transcription factor binding sites in DNA sequences
pscan -q multifastafile -p multifastafile [options]
pscan -p multifastafile [options]
pscan -q multifastafile -M matrixfile [options]
Pscan inspects the upstream non-coding regions of many genes to derive subsequences that are characteristic of the binding of proteins, i.e. transcription factors, that control the tissue- and situation-dependent expression of a gene. The tool is supported by the JASPAR database and other data that can be downloaded from the tool's home page.
The command line tool pscan is meant for bulk submission. The tool is also offered with a web interface that has all auxiliary data updated.
pscan options start with a single dash (`-') and, with notable exceptions, consist of a single letter. Options are case-sensitive. A summary of options is included below.
The sequences to be used with Pscan have to be promoter sequences. To obtain meaningful results it is critical that the background and the foreground sequences are consistent with each other in both size and position (with respect to the transcription start site). For optimal results the foreground set should be a subset of the background set.
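The size requirement above can be checked before running pscan. The following is a minimal sketch, not part of Pscan: the function names are hypothetical, and it only compares sequence lengths. The position relative to the transcription start site cannot be verified from the sequences alone.

```python
from collections import Counter


def read_fasta(text):
    """Parse multi-FASTA text into a dict mapping header -> sequence.

    A repeated header overwrites the earlier record.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def length_report(records):
    """Count how many sequences have each length."""
    return Counter(len(seq) for seq in records.values())


def same_size(foreground, background):
    """True when every sequence in both sets shares one common length."""
    lengths = set(length_report(foreground)) | set(length_report(background))
    return len(lengths) == 1
```

Reading both files with read_fasta() and calling same_size() gives a quick yes/no answer; length_report() shows which lengths occur if the answer is no.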
If the "-l" option is not used, Pscan will try to find Jaspar/Transfac matrix files in the current folder. Jaspar files have the ".pfm" extension, while Transfac ones have the ".pro" extension. If Jaspar matrix files are used, then a file called "matrix_list.txt" must be present in the same folder. That file contains required information about the matrices in the ".pfm" files.
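The folder layout described above can be verified with a few lines of Python. This is a sketch under the stated rules only (".pfm" and ".pro" extensions, "matrix_list.txt" required alongside ".pfm" files); check_matrix_folder is a hypothetical helper, not something shipped with Pscan, and it does not validate the contents of any file.

```python
from pathlib import Path


def check_matrix_folder(folder="."):
    """List matrix files in a folder and report missing prerequisites."""
    folder = Path(folder)
    pfm = sorted(p.name for p in folder.glob("*.pfm"))  # Jaspar matrices
    pro = sorted(p.name for p in folder.glob("*.pro"))  # Transfac matrices
    problems = []
    if not pfm and not pro:
        problems.append("no .pfm or .pro matrix files found")
    if pfm and not (folder / "matrix_list.txt").is_file():
        problems.append("matrix_list.txt is missing but .pfm files are present")
    return {"jaspar": pfm, "transfac": pro, "problems": problems}
```

An empty "problems" list means the folder matches the layout described here.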
1) pscan -p human_450_50.fasta -bi
This command will scan the file "human_450_50.fasta" using the matrices in the current folder. It is handy to run this command the first time a set of matrices is used with a given background sequence file. A file called "human_450_50.short_matrix" will be written; it can be used from then on whenever the same background sequences are used with the same set of matrices. A file called "human_450_50.index" will be written too; it is useful whenever the same background file is used.
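A wrapper script can decide whether this preprocessing step still has to be run. The sketch below assumes the naming rule suggested by the example (the background file's extension is replaced by ".short_matrix" and ".index"); that rule is inferred from this single example, and both function names are hypothetical. The sketch does not detect a changed matrix set, in which case the step has to be repeated.

```python
from pathlib import Path


def preprocessed_paths(background):
    """Derive the expected .short_matrix and .index names (inferred rule)."""
    bg = Path(background)
    return bg.with_suffix(".short_matrix"), bg.with_suffix(".index")


def preprocess_command(background):
    """Return the example 1 command line, or None if both files exist."""
    short_matrix, index = preprocessed_paths(background)
    if short_matrix.is_file() and index.is_file():
        return None
    # Same invocation as example 1: scan the background and write both files.
    return ["pscan", "-p", str(background), "-bi"]
```

The returned list can be passed to subprocess.run() once pscan is installed and on the PATH.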
2) pscan -q human_nfy_targets.fasta -m human_450_50.short_matrix -ui human_450_50.index
This command will scan the file "human_nfy_targets.fasta" searching for over-represented binding sites (with respect to the preprocessed background contained in the "human_450_50.short_matrix" file) using the matrices in the current folder. Please note that the query file "human_nfy_targets.fasta" must be a subset of the sequences contained in the background file "human_450_50.fasta" in order to use the index file with the "-ui" option. This means that both the sequences and their FASTA headers used in the query file must appear in the background file as well. Using the "-ui" option when the sequences in the query file are not a subset of the background file leads to undefined results. The output will be a file called "human_nfy_targets.fasta.res" that lists all the matrices used, sorted by ascending P-value. The lower the P-value obtained by a matrix, the higher the chance that the transcription factor associated with that matrix is a regulator of the input promoter sequences. The fields of the output are the following: "Transcription Factor Name", "Matrix ID", "Z Score", "Pvalue", "Foreground Average", "Background Average".
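Because a query that is not a subset of the background gives undefined results with "-ui", it may be worth checking the subset condition first. The following sketch is not part of Pscan. It assumes that a header is the whole line after ">" and must match exactly, and that sequences may be compared case-insensitively; both are assumptions, and the function names are hypothetical.

```python
def read_fasta(text):
    """Parse multi-FASTA text into a dict mapping header -> uppercase sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def not_in_background(query, background):
    """Return query headers absent from the background or with a different sequence."""
    return sorted(h for h, seq in query.items() if background.get(h) != seq)
```

An empty list means that every query record, header and sequence, also appears in the background, so the index file can be used.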
3) pscan -q human_nfy_targets.fasta -M MA0108.pfm
This command will scan the sequence file "human_nfy_targets.fasta" using the matrix contained in "MA0108.pfm". The result will be written to a file called "human_nfy_targets.fasta.ris" that lists the input sequences sorted by descending score (between 0 and 1). The higher the score, the better the oligo matches the matrix used. The fields of the output are the following: "Sequence Header", "Score", "Position from the end of sequence", "Oligo that obtained the score", "Strand where the oligo was found".
4) pscan -p human_450_50.fasta -bi -l matrixfile.wil
This command is like Example #1, with the difference that the set of matrices to be used is the one contained in the "matrixfile.wil" file. Please look at the "example_matrix_file.wil" file included in this Pscan distribution to see the correct format for a matrix file.
5) pscan -q human_nfy_targets.fasta -l matrixfile.wil -N MATRIX1
This command is like Example #3, but it will use the matrix called "MATRIX1" contained in the "matrixfile.wil" file.
For information on how Pscan works please refer to the paper.
May 3 2018 |