YAZ-ICU(1) | Commands | YAZ-ICU(1) |
yaz-icu - YAZ ICU utility
yaz-icu [-c config] [-p type] [-s] [-x] [infile]
yaz-icu is a utility which demonstrates the ICU chain module of YAZ (yaz/icu.h).
The utility can be used in two ways. It may read some text, process it with an ICU chain defined in an XML configuration, and show the resulting text analysis. This mode is triggered by option -c, which specifies the configuration to be used. Input is read from the file infile if one is specified, otherwise from standard input.
The utility may also show ICU information. This is triggered by option -p.
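-c config
    Specifies the file containing the ICU chain configuration, which is XML based.
-p type
    Specifies extra information to be printed about the ICU system. If type is c, ICU converters are printed. If type is l, available locales are printed. If type is t, available transliterators are printed.
-s
    Specifies that output should include the sort key as well. Note that the sort key differs between ICU versions.
-x
    Specifies that output should be XML based rather than "text" based.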
The ICU chain configuration specifies one or more rules to convert text data into tokens. The configuration format is XML based.
The top-level element must be named icu_chain. The icu_chain element has one required attribute, locale, which specifies the ICU locale to be used in the conversion steps.
The icu_chain element contains child elements, each of which specifies a conversion step. The steps are performed in the order in which they are specified. A conversion element takes one attribute, rule, which serves as the argument to the conversion step.
The following conversion elements are available:
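casemap
    Converts case; the rule specifies how:
    l   Lower case, using ICU function u_strToLower.
    u   Upper case, using ICU function u_strToUpper.
    t   Title case, using ICU function u_strToTitle.
    f   Fold case, using ICU function u_strFoldCase.
display
    A meta step which specifies that a term/token is to be displayed. An application retrieves this term with function icu_chain_token_display (yaz/icu.h).
transform
    Specifies an ICU transform using a transliterator identifier. The rule attribute is the transliterator identifier. See ICU Transforms[1].
transliterate
    Specifies a rule-based transliterator. The rule attribute is the custom transformation rule to be used. See ICU Transforms[1].
tokenize
    Breaks a string into tokens using the ICU break iterator functions (ubrk_open, ubrk_setText, ...). The rule is one of:
    l   Line break
    s   Sentence break
    w   Word break
    c   Character break
    t   Title break
join
    Joins tokens into one string. The rule attribute is the joining string, which may be empty.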
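The structural rules above (top-level icu_chain, required locale, known conversion elements, single-letter rules for casemap and tokenize) can be checked before a configuration is handed to yaz-icu. The following Python sketch is only an illustration and is not part of YAZ; the function name check_chain is hypothetical, and the checks cover only what this page states, not everything YAZ itself may accept or reject.

```python
import xml.etree.ElementTree as ET

# Element names and rule letters as listed in this manual page.
STEPS = {"casemap", "display", "transform", "transliterate", "tokenize", "join"}
RULE_LETTERS = {"casemap": set("lutf"), "tokenize": set("lswct")}


def check_chain(xml_text):
    """Return the (element, rule) steps of a chain, or raise ValueError.

    Hypothetical helper: a pre-check only, not how YAZ parses the file.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "icu_chain":
        raise ValueError("top-level element must be icu_chain")
    if "locale" not in root.attrib:
        raise ValueError("icu_chain requires a locale attribute")
    steps = []
    for child in root:  # document order is the conversion order
        if child.tag not in STEPS:
            raise ValueError(f"unknown conversion element: {child.tag}")
        rule = child.get("rule")
        allowed = RULE_LETTERS.get(child.tag)
        if allowed is not None and rule not in allowed:
            raise ValueError(f"bad rule {rule!r} for {child.tag}")
        steps.append((child.tag, rule))
    return steps
```

Calling check_chain on the text of a configuration file returns the steps in the order they would be applied, which makes it easy to confirm that, for example, tokenize comes before casemap.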
The following command analyzes text in file text using ICU chain configuration chain.xml:
cat text | yaz-icu -c chain.xml
The chain.xml might look as follows:
<icu_chain locale="en">
  <transform rule="[:Control:] Any-Remove"/>
  <tokenize rule="w"/>
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <transliterate rule="xy > z;"/>
  <display/>
  <casemap rule="l"/>
</icu_chain>
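The same analysis can be driven from a script instead of a shell pipeline. The Python sketch below assumes yaz-icu is installed and on the PATH; build_command and analyze are hypothetical helper names, and only the options shown in the synopsis are used.

```python
import subprocess


def build_command(config, infile=None, sort_key=False, xml_output=False):
    """Assemble a yaz-icu argument list from the options in the synopsis."""
    cmd = ["yaz-icu", "-c", config]
    if sort_key:
        cmd.append("-s")  # include sort keys in the output
    if xml_output:
        cmd.append("-x")  # XML output instead of text output
    if infile is not None:
        cmd.append(infile)  # without infile, yaz-icu reads standard input
    return cmd


def analyze(text, config):
    """Feed text to yaz-icu on standard input and return what it prints."""
    result = subprocess.run(build_command(config), input=text,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For instance, build_command("chain.xml", infile="text") yields the argument list for yaz-icu -c chain.xml text, which reads the file directly rather than through cat.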
ICU Home[2]
ICU Transforms[1]
Index Data
01/14/2019 | YAZ 5.27.1 |