BP_GENBANK_REF_EXTRACTOR(1p) | User Contributed Perl Documentation | BP_GENBANK_REF_EXTRACTOR(1p) |
bp_genbank_ref_extractor - Retrieves all related sequences for a list of searches on Entrez Gene
version 1.77
bp_genbank_ref_extractor [options] [Entrez Gene Queries]
This script searches the Entrez Gene database and retrieves not only the gene sequence but also the related transcript and protein sequences.
The gene UIDs from all searches are collected before any sequence is retrieved, so each gene is analyzed only once even if it appears in the results of more than one search.
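The effect of collecting UIDs before retrieval can be pictured with a small shell sketch. The function name merge_uids and the UID values are made up for illustration; the script does this internally and its actual implementation may differ.

```shell
#!/bin/sh
# Hypothetical helper: merge the UID lists of several searches,
# keeping each UID once, in order of first appearance.
merge_uids() {
    # Each argument is a whitespace-separated list of UIDs from one search.
    # $@ is left unquoted on purpose so the lists split into single UIDs.
    printf '%s\n' $@ | awk '!seen[$0]++'
}

# Two made-up result lists that share the UID 101.
merge_uids "101 102" "101 103"
```

The shared UID is printed a single time, so the corresponding gene would be fetched only once.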
Note that by default no sequences are saved (see options and examples).
Several options can be used to fine tune the script behaviour. It is possible to obtain extra base pairs upstream and downstream of the gene, to control the naming of files, and to choose which genome assembly to use.
See the section bugs for problems when using default values of options.
Note that if 'accession' is not used, files may be overwritten: the same gene can encode more than one protein, and different proteins can have the same description.
Currently only CSV is supported.
Saving the data structure as a CSV file requires the installation of the Text::CSV module.
Note that if 'accession' is not used, files may be overwritten: the same gene can have more than one transcript, and different transcripts can have the same description. Also, non-coding transcripts will create problems if 'protein' is used.
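To see why names that are not accessions can collide, consider this minimal sketch. The helper find_collisions and the descriptions fed to it are hypothetical; it only shows that two records sharing a description would map to the same filename.

```shell
#!/bin/sh
# Hypothetical helper: print every candidate filename that occurs more than once.
# Input: one candidate filename per line on stdin.
find_collisions() {
    sort | uniq -d
}

# Made-up descriptions: the first and third are identical, so the
# second file written under that name would replace the first.
printf '%s\n' 'histone variant 1' 'histone variant 2' 'histone variant 1' |
    find_collisions
```

Accession numbers are unique per record, which is why naming files by 'accession' avoids this situation.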
bp_genbank_ref_extractor \
    --transcripts=accession \
    '"homo sapiens"[organism] AND H2B'
Search Entrez Gene with the query '"homo sapiens"[organism] AND H2B' and save only the transcript sequences of the hits. Note that with the default value of --limit, only some of the hits may be extracted.
bp_genbank_ref_extractor \
    --transcripts=accession --proteins=accession \
    --format=fasta \
    '"homo sapiens"[organism] AND H2B' \
    '"homo sapiens"[organism] AND MCPH1'
Save both transcript and protein sequences in the fasta format, for two queries, '"homo sapiens"[organism] AND H2B' and '"homo sapiens"[organism] AND MCPH1'.
bp_genbank_ref_extractor \
    --genes --down=500 --up=100 \
    '"homo sapiens"[organism] AND H2B'
Download genomic sequences, including 500 bp downstream and 100 bp upstream of each gene.
bp_genbank_ref_extractor \
    --genes --assembly='Alternate HuRef' \
    '"homo sapiens"[organism] AND H2B'
Download genomic sequences from the Alternate HuRef genome assembly.
bp_genbank_ref_extractor --save-data=CSV \
    '"homo sapiens"[organism] AND H2B'
Do not save any sequence, only save the results in a CSV file.
bp_genbank_ref_extractor --save='search-results' \
    --genes=name --downstream=500 --upstream=200 \
    --nopseudo --nonon-coding --transcripts --proteins \
    --format=fasta --save-data=CSV \
    '"homo sapiens"[organism] AND H2B' \
    '"homo sapiens"[organism] AND MCPH1'
Ignoring non-coding and pseudo genes, download: genomic sequences with 500 bp downstream and 200 bp upstream, using the gene name as the filename; transcript and protein sequences, using their accession numbers as filenames; everything in fasta format, plus a CSV file with the search results; all saved in a directory named search-results.
bp_genbank_ref_extractor --transcripts \
    'H2A AND homo sapiens'
With the command above, we mean to search for 'H2A AND homo sapiens', saving only the transcripts and using the default as the base for the filename. However, the search terms will be interpreted as the base for the filenames (and since that is not a valid identifier, the script will return an error). To prevent this, you can either specify the value explicitly:
bp_genbank_ref_extractor --transcripts='accession' \
    'H2A AND homo sapiens'
or you can use the double dash (--) to stop option processing. Note that this should only be used after the last option. All arguments supplied after the double dash will be interpreted as search terms:
bp_genbank_ref_extractor --transcripts \
    -- 'H2A AND homo sapiens'
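The double-dash convention can be illustrated with a short shell sketch. The function split_args is hypothetical and is not the script's own option parser; it only shows how everything after the first "--" ends up being treated as a search term rather than as an option or an option value.

```shell
#!/bin/sh
# Illustrative only: label each argument as an option or a search term,
# switching to search terms after the first "--".
split_args() {
    past_dashes=no
    for arg in "$@"; do
        if [ "$past_dashes" = no ] && [ "$arg" = "--" ]; then
            # The separator itself is consumed and not reported.
            past_dashes=yes
        elif [ "$past_dashes" = yes ]; then
            printf 'query: %s\n' "$arg"
        else
            printf 'option: %s\n' "$arg"
        fi
    done
}

split_args --transcripts -- 'H2A AND homo sapiens'
```

Because the separator ends option processing, --transcripts keeps its default value and the quoted string is left alone as a query.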
a-z 0-9 - + . , () {} []'
User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to the Bioperl mailing list. Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
https://bioperl.org/Support.html - About the mailing lists
Please direct usage questions or support issues to the mailing list: bioperl-l@bioperl.org rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.
Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web:
https://github.com/bioperl/bio-eutilities/issues
Carnë Draug <carandraug+dev@gmail.com>
This software is copyright (c) 2011-2015 by Carnë Draug.
This software is available under the GNU General Public License, Version 3, June 2007.
2020-03-13 | perl v5.30.0 |