xtract - convert XML into a table of data values
xtract [-help] [-strict] [-mixed]
[-accent] [-ascii] [-compress] [-stops]
[-input filename]
[-transform filename]
[-pattern expr] [-group expr]
[-block expr] [-subset expr]
[-if expr [constraint]]
[-unless expr [constraint]]
[-and condition] [-or condition]
[-else] [-position pos]
[-select condition] [-equals str]
[-contains str] [-is-within str]
[-starts-with str]
[-ends-with str] [-is-not str]
[-gt N] [-ge N]
[-lt N] [-le N]
[-eq N] [-ne N]
[-ret str] [-tab str]
[-sep str] [-pfx str]
[-sfx str] [-plg str]
[-elg str] [-rst] [-clr]
[-pfc str] [-deq str]
[-wrp tag] [-def str]
[-lbl str] [-element element]
[-first element] [-last element]
[-NAME] [-num element]
[-len element] [-sum element]
[-min element] [-max element]
[-inc element] [-dec element]
[-sub element] [-avg element]
[-dev element] [-med element]
[-bin element] [-bit element]
[-encode element] [-upper element]
[-lower element] [-title element]
[-year element]
[-translate element]
[-terms element] [-words element]
[-pairs element]
[-reverse element]
[-letters element]
[-clauses element]
[-indices element] [-e2index] [-revcomp]
[-nucleic] [-0-based element]
[-1-based element]
[-ucsc-based element]
[-insd arg ...] [-head str]
[-tail str] [-hd str]
[-tl str] [-format fmt]
[-unicode style] [-script style]
[-mathml terse]
[-filter element action target]
[-verify] [-outline]
[-synopsis] [-skip filename] [-examples]
[-version]
xtract converts an XML document into a table of data values
according to user-specified rules.
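As a rough illustration of what "a table of data values" means here, the following Python sketch emits one row per record element and one column per requested child element. It is a simplified model and not xtract's implementation. The function name xml_to_table and the sample Set/Rec/Id/Name document are hypothetical, and the tab, ret, and default parameters only loosely mirror the roles of -tab, -ret, and -def.

```python
import xml.etree.ElementTree as ET


def xml_to_table(xml_text, pattern, elements, tab="\t", ret="\n", default="-"):
    """Return one row per `pattern` element, one column per name in `elements`.

    Hypothetical helper: a simplified model of the XML-to-table idea only.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(pattern):
        fields = []
        for name in elements:
            # Gather the text of every matching descendant of this record.
            values = [(node.text or "").strip() for node in record.iter(name)]
            values = [v for v in values if v]
            # Use the placeholder when the record has no such element.
            fields.append(tab.join(values) if values else default)
        rows.append(tab.join(fields))
    return ret.join(rows) + ret if rows else ""


# Made-up sample document; the second record lacks a Name element.
sample = """<Set>
  <Rec><Id>1</Id><Name>alpha</Name></Rec>
  <Rec><Id>2</Id></Rec>
</Set>"""

print(xml_to_table(sample, "Rec", ["Id", "Name"]), end="")
```

With the sample input this prints two tab-separated rows, the second using the placeholder for the missing Name value.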
- -strict
- Remove HTML and MathML tags.
- -mixed
- Allow mixed content XML.
- -accent
- Delete Unicode accents and diacritical marks.
- -ascii
- Convert Unicode to numeric HTML character entities.
- -compress
- Compress runs of spaces.
- -stops
- Retain stop words in selected phrases.
- -ret str
- Override line break between patterns.
- -tab str
- Replace tab character between fields.
- -sep str
- Separator between group members.
- -pfx str
- Prefix to print before group.
- -sfx str
- Suffix to print after group.
- -plg str
- Prologue to print once before elements.
- -elg str
- Epilogue to print once after elements.
- -rst
- Reset -sep through -elg.
- -clr
- Clear queued tab separator.
- -pfc str
- Preface combines -clr and -pfx.
- -deq str
- Delete and replace queued tab separator.
- -wrp tag
- Wrap elements in XML object.
- -def str
- Default placeholder for missing fields.
- -lbl str
- Insert arbitrary text.
- -revcomp
- Reverse-complement nucleotide sequence.
- -nucleic
- Subrange determines forward or revcomp.
- -insd arg ...
- Generate INSDSeq extraction commands. Print them if invoked standalone;
run them if invoked as part of a pipeline. Requires one or more arguments,
which may appear in the following order:
- -format fmt
- clean
- copy
- Fast block copy (still applies processing flags).
- compact
- Compress runs of spaces.
- flush
- Suppress line indentation.
- indent
- Indent according to nesting depth.
- expand
- Place each attribute on a separate line.
- -unicode style
- How to handle Unicode superscript and subscript digits (first converted to
ASCII form in all cases).
- fuse
- Run them all together, with no additional markup.
- space
- Add spaces between digits in different positions.
- period
- Add periods between digits in different positions.
- brackets
- Surround superscripts by square brackets and subscripts by
parentheses.
- markdown
- Surround superscripts with carets and subscripts with tildes.
- slash
- Add backslashes when going up in height and forward slashes when going
down.
- tag
- Put superscripts in XML sup elements and subscripts in sub
elements.
- -script style
- How to handle XML sup and sub elements (denoting
superscripts and subscripts, respectively).
- brackets
- Surround superscripts by square brackets and subscripts by
parentheses.
- markdown
- Surround superscripts with carets and subscripts with tildes.
- -mathml terse
- Flatten MathML markup tersely.
- -outline
- Display outline of XML structure.
- -synopsis
- Display count of unique XML paths.
- -help
- Print usage information and some example argument combinations.
- -examples
- Complete examples of edirect(1) and xtract usage.
- -version
- Print version number.
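The two -script styles listed above can be pictured with a short Python sketch. It only models the delimiters named in the option descriptions (square brackets and parentheses for brackets, carets and tildes for markdown) and is not xtract's own code. The function name render_script is hypothetical, and nested sup or sub elements are not handled.

```python
import re

# Delimiters per style, taken from the option descriptions above.
STYLES = {
    "brackets": {"sup": ("[", "]"), "sub": ("(", ")")},
    "markdown": {"sup": ("^", "^"), "sub": ("~", "~")},
}


def render_script(text, style):
    """Replace <sup>/<sub> elements with the delimiters of the given style."""
    marks = STYLES[style]

    def swap(match):
        open_mark, close_mark = marks[match.group(1)]
        return open_mark + match.group(2) + close_mark

    # Handles simple, non-nested <sup>...</sup> and <sub>...</sub> spans only.
    return re.sub(r"<(sup|sub)>(.*?)</\1>", swap, text)


print(render_script("H<sub>2</sub>O and x<sup>2</sup>", "brackets"))  # H(2)O and x[2]
print(render_script("H<sub>2</sub>O and x<sup>2</sup>", "markdown"))  # H~2~O and x^2^
```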
String constraints use case-insensitive comparisons.
Numeric constraints and selection arguments use integer
values.
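The practical effect of these two rules can be sketched in Python. The helper names below are hypothetical and only model the stated behavior: string tests ignore case, and numeric tests compare integers rather than text. How xtract treats non-integer text is not stated here, so the sketch simply assumes such values never match.

```python
def equals(value, target):
    """Case-insensitive equality, in the spirit of -equals."""
    return value.casefold() == target.casefold()


def contains(value, target):
    """Case-insensitive substring test, in the spirit of -contains."""
    return target.casefold() in value.casefold()


def _as_int(text):
    # Assumption: text that is not an integer is treated as non-matching.
    try:
        return int(text.strip())
    except ValueError:
        return None


def gt(value, n):
    """Integer comparison, in the spirit of -gt."""
    number = _as_int(value)
    return number is not None and number > n


print(equals("PubMed", "pubmed"))           # True
print(contains("Homo sapiens", "SAPIENS"))  # True
print(gt("10", 9))                          # True, although "10" < "9" as text
print(gt("ten", 9))                         # False
```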
-num and -len selections are synonyms for Object
Count (#) and Item Length (%).
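A minimal Python sketch of the two quantities, assuming that Object Count means the number of matching elements and Item Length means the character length of an element's text. The function names and the sample record are hypothetical, and this is an illustration rather than xtract's implementation.

```python
import xml.etree.ElementTree as ET

# Made-up record used only for illustration.
record = ET.fromstring(
    "<Rec><Author>Smith</Author><Author>Jones</Author>"
    "<Title>XML tables</Title></Rec>"
)


def object_count(parent, tag):
    """Number of descendant elements with this tag (the '#' idea)."""
    return len(parent.findall(f".//{tag}"))


def item_length(parent, tag):
    """Text length of each descendant element with this tag (the '%' idea)."""
    return [len(node.text or "") for node in parent.findall(f".//{tag}")]


print(object_count(record, "Author"))  # 2
print(item_length(record, "Title"))    # [10]
```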
-words, -pairs, and -indices convert to lower
case.