sfetch retrieves the sequence named seqname from a
sequence database.
The database to search is selected with the -d and -D options, which
refer to "little databases" and "big databases" respectively. The
directory location of "big databases" can be
specified by environment variables, such as $SWDIR for Swissprot, and $GBDIR
for Genbank (see -D for complete list). A complete file path must be
specified for "little databases". By default, if neither option is
specified and the name looks like a Swissprot identifier (e.g. it has a _
character), the $SWDIR environment variable is used to attempt to retrieve
the sequence seqname from Swissprot.
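The selection rules above can be sketched in Python. This is only an illustration of the documented behavior, not sfetch's own code: choose_source is a hypothetical helper, checking -d before -D is an assumption (the text does not say what happens when both are given), and raising an error when nothing matches is likewise assumed.

```python
import os

# Database codes and their environment variables, as listed under -D.
DB_ENV = {
    "sw": "SWDIR",
    "pir": "PIRDIR",
    "em": "EMBLDIR",
    "gb": "GBDIR",
    "wp": "WORMDIR",
    "owl": "OWLDIR",
}


def choose_source(seqname, seqfile=None, dbcode=None, env=None):
    """Return ("file", path) or ("db", directory) for a lookup.

    Hypothetical helper that mirrors the documented rules only.
    """
    env = os.environ if env is None else env
    if seqfile is not None:
        # -d: a "little database", given as a complete file path.
        return ("file", seqfile)
    if dbcode is not None:
        # -D: a "big database", located through its environment variable.
        return ("db", env[DB_ENV[dbcode]])
    if "_" in seqname:
        # Neither option given and the name looks like a Swissprot identifier.
        return ("db", env["SWDIR"])
    # Assumed behavior: the source text does not describe this case.
    raise ValueError("no database given and name is not Swissprot-like")
```

Passing env explicitly, for example env={"SWDIR": "/data/sw"}, makes the lookup easy to try without changing the real environment.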
A variety of other options are available which allow retrieval of
subsequences (-f,-t); retrieval by accession number instead of by
name (-a); reformatting the extracted sequence into a variety of
other formats (-F); etc.
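When sfetch is driven from a script, the options above might be assembled into an argument list as in the following sketch. sfetch_argv is a hypothetical helper; it uses only flags described in this page, and it assumes that seqname is given after the options.

```python
def sfetch_argv(seqname, seqfile=None, start=None, end=None,
                by_accession=False, out_format=None):
    """Build an argument list for sfetch from the options in this page."""
    argv = ["sfetch"]
    if seqfile is not None:
        argv += ["-d", seqfile]        # retrieve from a sequence file
    if by_accession:
        argv.append("-a")              # seqname is an accession number
    if start is not None:
        argv += ["-f", str(start)]     # subsequence start
    if end is not None:
        argv += ["-t", str(end)]       # subsequence end
    if out_format is not None:
        argv += ["-F", out_format]     # reformat the output
    argv.append(seqname)               # assumed: the name follows the options
    return argv
```

The resulting list can be handed to subprocess.run(argv, check=True) on a system where sfetch is installed.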
If the database has been SSI indexed, sequence retrieval is
extremely efficient; otherwise, retrieval may be painfully slow (the entire
database may have to be read into memory to find seqname). SSI
indexing is recommended for all large or permanent databases. The program
sindex creates SSI indexes for any sequence file.
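The benefit of indexing can be illustrated with a toy example. The sketch below is not the SSI format and not how sindex works internally; it only shows the general idea of recording each record's byte offset once, so that a later lookup can seek straight to the record instead of scanning the file. build_index and fetch are hypothetical names, and FASTA input is assumed.

```python
import io


def build_index(handle):
    """Map each FASTA record name to the byte offset of its header line."""
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:
                index[fields[0].decode()] = offset
    return index


def fetch(handle, index, name):
    """Seek directly to a record and return its sequence as one string."""
    handle.seek(index[name])
    handle.readline()                  # skip the header line
    parts = []
    for line in handle:
        if line.startswith(b">"):      # next record begins
            break
        parts.append(line.strip())
    return b"".join(parts).decode()


# A small in-memory "database" stands in for a real sequence file.
demo = io.BytesIO(b">seq1 first\nACGT\nAC\n>seq2\nGGCC\n")
demo_index = build_index(demo)
demo_seq = fetch(demo, demo_index, "seq2")
```

Without the index, finding seq2 would mean reading every line before it; with it, the cost of a lookup does not grow with the size of the file.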
sfetch was originally named getseq, and was renamed
because it clashed with a GCG program of the same name.
- -a
- Interpret seqname as an accession number, not an identifier.
- -d <seqfile>
- Retrieve the sequence from a sequence file named <seqfile>.
If an SSI index <seqfile>.ssi exists, it is used to speed up
the retrieval.
- -f <from>
- Extract a subsequence starting from position <from>, rather
than from 1. See -t. If <from> is greater than
<to> (as specified by the -t option), then the
sequence is extracted as its reverse complement (it is assumed to be
a nucleic acid sequence).
- -h
- Print brief help; includes version number and summary of all options,
including expert options.
- -o <outfile>
- Direct the output to a file named <outfile>. By default,
output goes to stdout.
- -r <newname>
- Rename the sequence to <newname> in the output after extraction.
By default, the original sequence identifier is retained. Useful,
for instance, if retrieving a sequence fragment; the coordinates of the
fragment might be added to the name (this is what Pfam does).
- -t <to>
- Extract a subsequence that ends at position <to>, rather than
at the end of the sequence. See -f. If <to> is less
than <from> (as specified by the -f option), then the
sequence is extracted as its reverse complement (it is assumed to be
a nucleic acid sequence).
- -D <database>
- Retrieve the sequence from the main sequence database coded
<database>. For each code, there is an environment variable
that specifies the directory path to that database. Recognized codes and
their corresponding environment variables are -Dsw (Swissprot,
$SWDIR); -Dpir (PIR, $PIRDIR); -Dem (EMBL, $EMBLDIR);
-Dgb (Genbank, $GBDIR); -Dwp (Wormpep, $WORMDIR); and
-Dowl (OWL, $OWLDIR). Each database is read in its native flatfile
format.
- -F <format>
- Reformat the extracted sequence into a different format. (By default, the
sequence is extracted from the database in the same format as the
database.) Available formats are embl, fasta, genbank, gcg, strider,
zuker, ig, pir, squid, and raw.
- --informat <s>
- Specify that the sequence file is in format <s>, rather than
the default FASTA format. Common examples include Genbank, EMBL, GCG, PIR,
Stockholm, Clustal, MSF, or PHYLIP; see the printed documentation for a
complete list of accepted format names. This option overrides the default
format (FASTA) and the -B Babelfish autodetection option.
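The coordinate rule described for -f and -t can be sketched as follows. This is a minimal illustration, not sfetch's implementation: extract is a hypothetical function, coordinates are assumed to be 1-based and inclusive, and only unambiguous DNA letters (A, C, G, T in either case) are complemented.

```python
# Complement table for unambiguous DNA letters only (an assumption).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract(seq, start=None, end=None):
    """Return seq[start..end], 1-based and inclusive.

    If start is greater than end, the region end..start is returned
    as its reverse complement, as described for -f and -t.
    """
    start = 1 if start is None else start          # default: from position 1
    end = len(seq) if end is None else end         # default: to the last residue
    if start <= end:
        return seq[start - 1:end]
    forward = seq[end - 1:start]
    return forward.translate(_COMPLEMENT)[::-1]
```

For example, extract("AACCGGTT", 2, 5) takes positions 2 through 5 on the forward strand, while extract("AACCGGTT", 5, 2) covers the same region on the opposite strand.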
afetch(1), alistat(1), compalign(1),
compstruct(1), revcomp(1), seqsplit(1),
seqstat(1), shuffle(1), sindex(1), sreformat(1),
stranslate(1), weight(1).
Biosquid and its documentation are Copyright (C) 1992-2003
HHMI/Washington University School of Medicine. Freely distributed under the
GNU General Public License (GPL). See COPYING in the source code distribution
for more details, or contact me.
Sean Eddy
HHMI/Department of Genetics
Washington University School of Medicine
4444 Forest Park Blvd., Box 8510
St Louis, MO 63108 USA
Phone: 1-314-362-7666
FAX : 1-314-362-2157
Email: eddy@genetics.wustl.edu