
NCBI Entrez Direct User's Manual

NAME

transmute - transform data, particularly within NCBI Entrez Direct

SYNOPSIS

transmute -x2p|-j2p|-f2p

transmute -align [-a codes] [-g N] [-h N] [-w N]

transmute -a2x [-set tag] [-rec tag]

transmute -t2x|-c2x|-s2x (tbl2xml / csv2xml / scn2xml) [-set tag] [-rec tag] [-skip N] [-header] [-lower|-upper] [-indent|-flush] columnName1 ...

transmute -i2x (ini2xml)

transmute -m2x (toml2xml)

transmute -y2x (yaml2xml)

transmute -txf (filter-record) [-pattern str] [-exclude str] [-require str] [-min N] [-max N]

transmute -f2x (fsa2xml)

transmute -g2x (gbf2xml)

transmute -g2r (gbf2ref)

transmute -r2p (ref2pmid) [-options confirm|verbose|fast|slow|exact ...]

transmute -gbf (filter-genbank) [-accession acc|-accessions file] [-taxid id|-taxids file] [-organism name] [-exclude str] [-require str] [-truncate]

transmute -revcomp

transmute -remove [-first N] [-last N]

transmute -retain [-leading N] [-trailing N]

transmute -replace -offset N|-column N [-delete N] [-insert seq] [-lower]

transmute -extract [-1-based|-0-based] [-lower] feat_loc

transmute -cds2prot [-gcode N] [-frame N|-all] [-stop] [-trim] [-part5] [-part3] [-every] [-between str] [-circular] [-orf] [-max N]

transmute -molwt [-met|-fmet]

transmute -hgvs

transmute -counts

transmute -diff

transmute -codons -nuc seq -prot seq [-frame N] [-three]

transmute -search [-protein] [-circular] [-top] pattern ...

transmute -find [-relaxed] [-sensitive] [-whole] pattern ...

transmute -encodeXML|-decodeXML|-plainXML

transmute -encodeURL|-decodeURL

transmute -encode64|-decode64

transmute -plain

transmute -upper|-lower

transmute -aa1to3|-aa3to1

transmute -relax

transmute -format [fmt] [-xml declaration] [-doctype declaration] [-comment] [-cdata] [-combine] [-self] [-unicode style] [-script style] [-mathml terse]

transmute -filter element action target

transmute -normalize database

DESCRIPTION

transmute reads data from standard input, transforms it according to the specified mode, and writes the transformed data to standard output.

OPTIONS

Pretty-Printing

-x2p: Reformat XML.
-j2p: Reformat JSON.
-f2p: Reformat FASTA.
-align: Table column alignment.

-a codes: Column alignment codes:

l: Left.
c: Center.
r: Right.
n: Numeric align on decimal point.
N: Trailing zero-pad decimals.
z: Leading zero-pad integers.
m: Commas to group by 3 digits.
M: Commas plus zero-pad decimals.
w: Just print column widths.

-g N: Spacing between columns.
-h N: Indentation before columns.
-w N: Minimum column width.
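The interaction of -a, -g, -h, and -w can be pictured with a small Python sketch. This is not transmute's code. The function align_table, its defaults, and the rule that the last code is reused for any extra columns are hypothetical, and only the l, c, and r codes are modeled.

```python
def align_table(lines, codes="l", gap=2, indent=0, min_width=0):
    """Pad tab-delimited rows into aligned columns (hypothetical sketch of -align)."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    ncols = max(len(r) for r in rows)
    # Pad short rows so every row has the same number of cells.
    rows = [r + [""] * (ncols - len(r)) for r in rows]
    # Each column is as wide as its longest cell, but never narrower than min_width (-w).
    widths = [max(min_width, max(len(r[i]) for r in rows)) for i in range(ncols)]
    # Assumption: the last code is reused for columns beyond len(codes).
    col_codes = [codes[min(i, len(codes) - 1)] for i in range(ncols)]
    out = []
    for r in rows:
        cells = []
        for cell, width, code in zip(r, widths, col_codes):
            if code == "r":
                cells.append(cell.rjust(width))
            elif code == "c":
                cells.append(cell.center(width))
            else:
                cells.append(cell.ljust(width))
        # -h indentation before the first column, -g spaces between columns.
        out.append(" " * indent + (" " * gap).join(cells).rstrip())
    return out

for line in align_table(["Gene\tLength", "BRCA2\t10254", "TP53\t393"],
                        codes="lr", gap=2, indent=2):
    print(line)
```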

Data Conversion

-j2x: Convert JSON stream to XML suitable for -path navigation.

-set tag: Replace set wrapper tag.
-rec tag: Replace record wrapper tag.
-nest flat|recurse|plural|singular|depth|element: Nested array naming policy.

-a2x: Convert text ASN.1 stream to XML suitable for -path navigation.

-set tag: Replace set wrapper tag.
-rec tag: Replace record wrapper tag.

-t2x, -c2x, -s2x: Convert tab-delimited table, comma-separated values file, or semicolon-delimited table, respectively, to XML.

-set tag: Replace set wrapper tag.
-rec tag: Replace record wrapper tag.
-skip N: Skip the first N lines.
-header: Use fields from first row for column names.
-lower: Convert text to lowercase.
-upper: Convert text to uppercase.
-indent: Indent XML output.
-flush: Do not indent XML output.
columnName1 ...: XML object names per column.
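A rough Python model of -t2x with -skip and -header follows. The function table_to_xml and the default wrapper names Set and Rec are placeholders chosen for illustration, not transmute's actual defaults. Passing sep="," approximates -c2x and sep=";" approximates -s2x. Cell text is XML-escaped, but column names are assumed to already be valid element names.

```python
import csv
from xml.sax.saxutils import escape

def table_to_xml(lines, columns=None, set_tag="Set", rec_tag="Rec",
                 skip=0, header=False, sep="\t"):
    """Wrap delimited rows in XML elements (hypothetical sketch of -t2x)."""
    rows = list(csv.reader(lines[skip:], delimiter=sep))
    if header:
        # The first remaining row supplies the element names (-header).
        columns, rows = rows[0], rows[1:]
    if not columns:
        raise ValueError("column names are required unless header=True")
    out = [f"<{set_tag}>"]
    for row in rows:
        out.append(f"  <{rec_tag}>")
        for name, value in zip(columns, row):
            out.append(f"    <{name}>{escape(value)}</{name}>")
        out.append(f"  </{rec_tag}>")
    out.append(f"</{set_tag}>")
    return "\n".join(out)

print(table_to_xml(["# comment", "Gene\tLength", "TP53\t393"],
                   skip=1, header=True, set_tag="Genes", rec_tag="Entry"))
```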

-i2x: Convert .ini configuration file format to XML.
-m2x: Convert TOML configuration file format to XML.
-y2x: Convert YAML configuration file format to XML.
-txf: Text file filtering.

-pattern str: Pattern at start of record.
-exclude str: Reject if string is present.
-require str: Require presence of string.
-min N: Minimum record number.
-max N: Maximum record number.

-f2x: Convert a FASTA stream to corresponding XML.
-g2x: Convert GenBank/GenPept flatfile format to INSDSeq XML.
-g2r: Convert GenBank/GenPept flatfile format to Reference XML.
-r2p [-options option ...]: Reference Index XML lookup to find PMIDs. Supported option values:

confirm: Recheck existing PMID claims.
verbose: Add NOTE nodes explaining reasoning.
fast: Prefilter candidates relatively heavily (default).
slow: Prefilter candidates less heavily.
exact: Require exact, unique title matches.

-gbf: GenBank/GenPept filtering.

-accession acc: Single accession.
-accessions file: File of accessions.
-taxid id: Single taxon identifier.
-taxids file: File of taxon identifiers.
-organism name: Organism scientific name.
-exclude str: Reject if string is present.
-require str: Require presence of string.
-truncate: Remove features and sequence.

Sequence Editing

-revcomp: Reverse complement nucleotide sequence.
-remove: Trim at ends of sequence.

-first N: Delete first N bases or residues.
-last N: Delete last N bases or residues.

-retain: Save either end of sequence.

-leading N: Keep first N bases or residues.
-trailing N: Keep last N bases or residues.
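The three editing modes above reduce to string operations. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the complement table covers the IUPAC ambiguity codes and preserves case, and retain gives -leading precedence if both lengths are supplied, which transmute may not do.

```python
# IUPAC nucleotide complements, including ambiguity codes; case is preserved.
_COMP = str.maketrans("ACGTUMRWSYKVHDBNacgtumrwsykvhdbn",
                      "TGCAAKYWSRMBDHVNtgcaakywsrmbdhvn")

def revcomp(seq):
    """Reverse complement (like -revcomp)."""
    return seq.translate(_COMP)[::-1]

def remove(seq, first=0, last=0):
    """Delete `first` leading and `last` trailing characters (like -remove)."""
    return seq[first:max(0, len(seq) - last)]

def retain(seq, leading=0, trailing=0):
    """Keep only one end of the sequence (like -retain); leading wins if both given."""
    if leading:
        return seq[:leading]
    if trailing:
        return seq[-trailing:]
    return seq

print(revcomp("ATGCRN"))
print(remove("ACGTACGT", first=2, last=1))
print(retain("ACGTACGT", trailing=2))
```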

-replace: Apply base or residue substitution.

-offset N: Skip ahead by 0-based count (SPDI), or
-column N: Move just before 1-based position (HGVS).
-delete N: Delete N bases or residues.
-insert seq: Insert given sequence.
-lower: Lower-case original sequence.
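The two positioning styles are related by offset = column - 1: skipping N bases (SPDI) places the edit just before 1-based position N + 1 (HGVS). The hypothetical replace function below sketches the edit itself. Treating -lower as lowercasing only the original bases, so the inserted bases stand out, is an assumption.

```python
def replace(seq, offset=None, column=None, delete=0, insert="", lower=False):
    """Delete and/or insert bases at one position (hypothetical sketch of -replace).

    offset: 0-based number of bases to skip first (SPDI style).
    column: 1-based position to stand just before (HGVS style).
    """
    if offset is None:
        if column is None:
            raise ValueError("give either offset or column")
        offset = column - 1  # HGVS column N is the same place as SPDI offset N - 1
    if lower:
        seq = seq.lower()  # assumption: only the original sequence is lowercased
    return seq[:offset] + insert + seq[offset + delete:]

# Substitute the T at 1-based position 4 with G, expressed both ways.
print(replace("ACGTACGT", offset=3, delete=1, insert="G"))
print(replace("ACGTACGT", column=4, delete=1, insert="G"))
```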

-extract: Use xtract -insd ... feat_location instructions.

-1-based: GenBank feat_location convention.
-0-based: Alignment, or -insd feat_intervals.
-lower: Lower-case extracted sequence.
feat_loc: Feature location.
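For simple locations, -extract amounts to slicing and concatenating intervals. The Python sketch below is hypothetical. It handles a single interval, a single base, join() of plain intervals, and complement() wrapping either of those. It assumes that 0-based coordinates are inclusive at both ends (each one less than its 1-based value), which may not match transmute's -0-based handling. The partial-location markers < and > are accepted and ignored.

```python
import re

_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def extract(seq, loc, one_based=True, lower=False):
    """Return the bases named by a simple GenBank-style location (hypothetical sketch)."""
    base = 1 if one_based else 0
    loc = loc.replace(" ", "")
    if loc.startswith("complement(") and loc.endswith(")"):
        inner = extract(seq, loc[len("complement("):-1], one_based, lower)
        return inner.translate(_COMP)[::-1]  # reverse complement of the inner feature
    if loc.startswith("join(") and loc.endswith(")"):
        loc = loc[len("join("):-1]
    parts = []
    for span in loc.split(","):
        # "start..end" (optionally with < or >) or a single position.
        m = re.fullmatch(r"<?(\d+)\.\.>?(\d+)|(\d+)", span)
        if not m:
            raise ValueError(f"unsupported location: {span}")
        start = int(m.group(1) or m.group(3))
        end = int(m.group(2) or m.group(3))
        parts.append(seq[start - base:end - base + 1])  # inclusive interval
    result = "".join(parts)
    return result.lower() if lower else result

print(extract("AAACCCGGGTTT", "join(1..3,7..9)"))
print(extract("AAACCCGGGTTT", "complement(4..6)"))
```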

Sequence Processing

-cds2prot: Translate coding region into protein.

-gcode N: Genetic code (1 by default).
-frame N: Offset in sequence (0-based).
-stop: Include stop residue.
-trim: Remove trailing Xs and *s.
-part5: CDS partial at 5' end.
-part3: CDS extends past 3' end.
-every: Translate all codons.
-between str: Optional string between residues.
-all: Simultaneous six-frame translations.
-circular: Reprocess first two priming bases at end.
-orf: Only capitalize residues at start states.
-max N: Number of residues per line.
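The core of -cds2prot is a codon lookup. This Python sketch builds NCBI translation table 1 (the standard code, which the text gives as the -gcode default) from its compact 64-character form and models -frame, -stop, -trim, and -every. How -stop and -every interact here (stop at the first terminator unless every codon is requested) is an assumption; -part5, -part3, alternative start codons, and the other genetic codes are not modeled.

```python
BASES = "TCAG"
# Standard code (table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AAS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def cds2prot(seq, frame=0, stop=False, trim=False, every=False):
    """Translate a coding region (hypothetical sketch of -cds2prot)."""
    seq = seq.upper().replace("U", "T")
    protein = []
    for pos in range(frame, len(seq) - 2, 3):
        aa = CODE.get(seq[pos:pos + 3], "X")  # codons with ambiguity codes become X
        if aa == "*" and not every:
            if stop:
                protein.append(aa)  # -stop: keep the terminator
            break
        protein.append(aa)
    result = "".join(protein)
    if trim:
        result = result.rstrip("X*")  # -trim: drop trailing Xs and *s
    return result

print(cds2prot("ATGGCCTAAGGG"))
print(cds2prot("ATGGCCTAAGGG", stop=True))
```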

-molwt: Calculate molecular weight of peptide.

-met: Do not cleave leading methionine.
-fmet: Retain leading formyl-methionine.
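A peptide's average molecular weight is the sum of its residue masses plus one water molecule (18.01524 Da) for the free termini. The sketch below uses commonly published average residue masses in daltons; the constants and the function name are choices made for illustration and may differ slightly from the values transmute uses. Following the option descriptions, a leading methionine is cleaved by default. Adding +CO (28.0101 Da) for -fmet is an assumption about how the formyl group is counted.

```python
# Average residue masses in daltons (free amino acid minus H2O).
AVG_RESIDUE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524   # added once for the free N- and C-termini
FORMYL = 28.0101   # assumed net gain (+CO) for an N-formyl group

def molwt(peptide, met=False, fmet=False):
    """Average molecular weight of a peptide (hypothetical sketch of -molwt)."""
    peptide = peptide.upper()
    extra = 0.0
    if peptide.startswith("M"):
        if fmet:
            extra = FORMYL          # -fmet: keep the formyl-methionine
        elif not met:
            peptide = peptide[1:]   # default: cleave the initiator methionine
    return sum(AVG_RESIDUE[aa] for aa in peptide) + WATER + extra

print(round(molwt("MGA"), 2))            # Met cleaved, so this is Gly-Ala
print(round(molwt("MGA", met=True), 2))  # Met-Gly-Ala
```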

Variation Processing

-hgvs: Convert Human Genome Variation Society variation format to XML.

Sequence Comparison

-counts: Print summary of base or residue counts.
-diff: Compare two aligned files for point differences.
-codons: Display nucleotide codons above amino acid residues.

-nuc seq: Nucleotide sequence.
-prot seq: Protein sequence.
-frame N: Offset in nucleotide sequence.
-three: Use three-letter residue abbreviations.
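The comparison modes are easy to approximate. The hypothetical Python functions below tally characters (like -counts), list point differences between two already-aligned sequences using 1-based positions (like -diff), and stack codons above residues (like -codons). The output layouts are invented here and do not reproduce transmute's.

```python
from collections import Counter

def counts(seq):
    """Sorted (character, count) pairs, ignoring case."""
    return sorted(Counter(seq.upper()).items())

def point_diffs(a, b):
    """(1-based position, char in a, char in b) wherever aligned sequences differ."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

def codons(nuc, prot, frame=0):
    """Two lines: codons from `frame` onward, each residue centered below its codon."""
    triplets = [nuc[i:i + 3] for i in range(frame, len(nuc) - 2, 3)]
    top = " ".join(triplets)
    bottom = " ".join(aa.center(3) for aa in prot[:len(triplets)])
    return top, bottom

print(counts("ACGTAA"))
print(point_diffs("ACGTACGT", "ACCTACGA"))
print("\n".join(codons("ATGGCCTAA", "MA*")))
```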

Sequence Searching

-search: Search for one or more patterns in a sequence, skipping any FASTA definition line (with a leading >). Each pattern can have an optional alias, e.g., GGATCC:BamHI.

-protein: Do not expand nucleotide ambiguity characters.
-circular: Match patterns spanning the origin of a circular molecule.
-top: Do not search reverse complements of non-palindromic patterns.
pattern: Pattern to search for.
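Nucleotide pattern search can be sketched with regular expressions: each IUPAC ambiguity code becomes a character class, and a non-palindromic pattern is also searched as its reverse complement. In this Python sketch the function name, the (position, name) output, and the decision to treat -protein patterns literally with no reverse complement are assumptions; -circular is not modeled. A lookahead is used so that overlapping matches are all reported.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]",
         "M": "[AC]", "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
         "N": "[ACGT]"}
_COMP = str.maketrans("ACGTUMRWSYKVHDBN", "TGCAAKYWSRMBDHVN")

def search(fasta_text, patterns, protein=False, top=False):
    """Return sorted (1-based start, name) hits (hypothetical sketch of -search)."""
    # Skip FASTA definition lines and join the rest into one sequence.
    seq = "".join(line.strip() for line in fasta_text.splitlines()
                  if not line.startswith(">")).upper()
    hits = []
    for item in patterns:
        pat, _, alias = item.partition(":")  # e.g. GGATCC:BamHI
        pat = pat.upper()
        name = alias or pat
        variants = [pat]
        if not protein and not top:
            rc = pat.translate(_COMP)[::-1]
            if rc != pat:  # palindromic sites are searched only once
                variants.append(rc)
        for v in variants:
            if protein:
                regex = re.escape(v)
            else:
                regex = "".join(IUPAC.get(ch, re.escape(ch)) for ch in v)
            for m in re.finditer(f"(?=({regex}))", seq):
                hits.append((m.start() + 1, name))
    return sorted(hits)

print(search(">seq1\nAAGGATCCTT\nGAATTC", ["GGATCC:BamHI", "GAATTC:EcoRI"]))
```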

Text Searching

-find: Find one or more patterns in text, allowing digits, spaces, punctuation, and phrases, e.g., "double, double toil and trouble".

-relaxed: Match on words with letters and digits, ignoring spacing and punctuation.
-sensitive: Case-sensitive match, distinguishing upper-case and lower-case letters.
-whole: Match on whole words or multi-word phrases; implies -relaxed.
pattern: Pattern to search for.
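The difference between literal, -relaxed, -sensitive, and -whole matching can be illustrated with regular expressions. In the hypothetical Python sketch below, relaxed mode keeps only the letter-and-digit words of the pattern and lets any run of other characters separate them, and whole-word mode adds letter-or-digit boundaries on both sides; these are assumptions about the behavior, not transmute's implementation.

```python
import re

def find(text, pattern, relaxed=False, sensitive=False, whole=False):
    """Return 0-based start offsets of matches (hypothetical sketch of -find)."""
    if whole:
        relaxed = True  # -whole implies -relaxed
    flags = 0 if sensitive else re.IGNORECASE
    if relaxed:
        words = re.findall(r"[A-Za-z0-9]+", pattern)
        # Any run of spaces or punctuation may separate consecutive words.
        regex = r"[^A-Za-z0-9]+".join(re.escape(w) for w in words)
        if whole:
            regex = r"(?<![A-Za-z0-9])" + regex + r"(?![A-Za-z0-9])"
    else:
        regex = re.escape(pattern)  # literal match, punctuation included
    return [m.start() for m in re.finditer(regex, text, flags)]

text = "Double, double toil and trouble; Fire burn and cauldron bubble."
print(find(text, "double double toil"))                # literal: no match
print(find(text, "double double toil", relaxed=True))  # punctuation ignored
```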

String Transformations

XML

-encodeXML: XML-encode <, >, &, ", and ' characters.
-decodeXML: Decode XML entity references.
-plainXML: Remove embedded mixed-content tags and compress runs of spaces.
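These three transformations map closely onto Python standard-library operations. In the sketch below the function names are hypothetical. Note that html.unescape also decodes HTML named entities beyond the five predefined XML ones, and the tag-stripping regular expression is a simplification that assumes no < or > appears inside attribute values.

```python
import html
import re

def encode_xml(s):
    """Escape the five XML special characters (& first, so it is not double-escaped)."""
    return (s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
             .replace('"', "&quot;").replace("'", "&apos;"))

def decode_xml(s):
    """Decode entity references."""
    return html.unescape(s)

def plain_xml(s):
    """Drop embedded mixed-content tags, then compress runs of whitespace."""
    s = re.sub(r"</?[A-Za-z][^<>]*>", "", s)
    return re.sub(r"\s+", " ", s).strip()

print(encode_xml('a<b & "c"'))
print(plain_xml("The <i>lac</i>  operon\nis <b>on</b>"))
```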

URL

-encodeURL: Compress runs of spaces, and URI-escape the result.
-decodeURL: URI-unescape the input.
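A comparable Python sketch for the URL modes uses urllib.parse. Whether transmute also trims leading and trailing spaces, and whether it escapes a space as %20 or +, are assumptions; this version trims and uses %20.

```python
import re
from urllib.parse import quote, unquote

def encode_url(s):
    """Compress whitespace runs to one space, trim, then percent-escape everything."""
    s = re.sub(r"\s+", " ", s).strip()
    return quote(s, safe="")

def decode_url(s):
    """Reverse percent-escaping."""
    return unquote(s)

print(encode_url("  lung   cancer[ti] "))
```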

Base64

-encode64: Base64-encode the input.
-decode64: Base64-decode the input.

Accent

-plain: Strip accents from the input.
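One common way to strip accents, shown here as an assumption rather than transmute's method, is Unicode canonical decomposition (NFD) followed by dropping combining marks. Characters with no decomposition, such as ø or ß, pass through unchanged.

```python
import unicodedata

def strip_accents(text):
    """Decompose accented letters, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("Müller, Ångström"))
```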

Case

-upper: Convert the input to uppercase.
-lower: Convert the input to lowercase.

Protein

-aa1to3: Convert amino acids from 1-character to 3-character format.
-aa3to1: Convert amino acids from 3-character to 1-character format.
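A table-driven Python sketch of the two protein conversions follows. The 20 standard residues use their IUPAC three-letter codes; the entries for B, Z, X, U, O, and the stop symbol * (Ter) are assumptions about how transmute handles non-standard characters.

```python
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    # Assumed handling of ambiguity, rare, and stop symbols:
    "B": "Asx", "Z": "Glx", "X": "Xaa", "U": "Sec", "O": "Pyl", "*": "Ter",
}
THREE_TO_ONE = {three.upper(): one for one, three in ONE_TO_THREE.items()}

def aa1to3(seq):
    """One-letter residues to concatenated three-letter codes."""
    return "".join(ONE_TO_THREE[c] for c in seq.upper())

def aa3to1(seq):
    """Concatenated three-letter codes (any case) back to one-letter residues."""
    s = seq.upper()
    if len(s) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(THREE_TO_ONE[s[i:i + 3]] for i in range(0, len(s), 3))

print(aa1to3("MAS*"))
print(aa3to1("MetAlaSer"))
```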

Letters plus Digits

-relax: Remove all punctuation and compress whitespace.

Customized XML Reformatting

-format [fmt]: Formats:

compact: Compress runs of spaces.
flush: Suppress line indentation.
indent: Indent according to nesting depth.
expand: Place each attribute on a separate line.

-xml declaration: Use the given XML declaration.
-doctype declaration: Use the given document type declaration.
-comment: Preserve comments.
-cdata: Preserve cdata blocks.
-combine: If the input contains multiple top-level documents, combine them.
-self: Keep empty self-closing tags.
-unicode style: How to handle Unicode superscript and subscript digits (first converted to ASCII form in all cases).

fuse: Run them all together, with no additional markup.
space: Add spaces between digits in different positions.
period: Add periods between digits in different positions.
brackets: Surround superscripts by square brackets and subscripts by parentheses.
markdown: Surround superscripts with carets and subscripts with tildes.
slash: Add backslashes when going up in height and forward slashes when going down.
tag: Put superscripts in XML sup elements and subscripts in sub elements.

-script style: How to handle XML sup and sub elements (denoting superscripts and subscripts, respectively).

brackets: Surround superscripts by square brackets and subscripts by parentheses.
markdown: Surround superscripts with carets and subscripts with tildes.

-mathml terse: Flatten MathML markup tersely.

XML Modification

-filter element action target: Actions:

retain: Keep matching elements (no-op).
remove: Remove matching elements.
encode: HTML-escape special characters.
decode: Decode HTML escapes.
shrink: Compress runs of spaces.
expand: Place each attribute on a separate line.
accent: Strip off Unicode accents.

Targets:

content: Plain-text content.
cdata: CDATA blocks.
comment: Comments.
object: The whole object.
attributes: Attributes.
container: Start and end tags.

EFetch XML Normalization

-normalize database: Adjust XML fields to conform to common conventions.

2025-05-26

NCBI