hfst-tokenize - perform matching/lookup on text streams
hfst-tokenize [--segment | --xerox | --cg | --giella-cg] [OPTIONS...] RULESET
perform matching/lookup on text streams
- -h, --help
- Print help message
- -V, --version
- Print version info
- -v, --verbose
- Print verbosely while processing
- -q, --quiet
- Only print fatal errors and requested output
- -s, --silent
- Alias of --quiet
- -n, --newline
- Newline as input separator (default is blank line)
- -a, --print-all
- Print nonmatching text
- -w, --print-weight
- Print weights (overrides earlier -W option)
- -W, --no-weights
- Don't print weights (default; overrides an earlier -w option, including the -w implied by -g)
- -m, --tokenize-multichar
- Tokenize multicharacter symbols (by default, only one UTF-8 character is tokenized at a time, regardless of what is present in the alphabet)
- -b, --beam=B
- Output only analyses whose weight is within B of the best result
- -tS, --time-cutoff=S
- Stop searching once S seconds have been spent on an input
- -lN, --weight-classes=N
- Output no more than the N best weight classes (analyses with equal weight constitute a class)
- -u, --unique
- Remove duplicate analyses
- -z, --segment
- Segmenting / tokenization mode (default)
- -i, --space-separated
- Tokenization with one sentence per line, space-separated tokens
- -x, --xerox
- Xerox output
- -c, --cg
- Constraint Grammar output
- -S, --superblanks
- Ignore contents of unescaped [] (cf. apertium-destxt); flush on NUL
- -g, --giella-cg
- CG format used in the Giella infrastructure (implies -w and -l2, treats @PMATCH_INPUT_MARK@ as the subreading separator, expects tags to be Multichar_symbols, flush on NUL)
- -C, --conllu
- CoNLL-U format
- -f, --finnpos
- FinnPos output
- -L, --visl
- VISL input and output (implies -W; handles <s> as blocks and <STYLE> inline)
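Several options above interact: -w and -W override each other, -g implies -w and -l2, and -L implies -W. A minimal sketch of one plausible reading, assuming these flags are simply replayed left to right so the last one wins. The helper name resolve_weight_options is hypothetical, it is not hfst-tokenize's own option parser, and it only handles the attached -lN form.

```python
from typing import List, Optional, Tuple


def resolve_weight_options(args: List[str]) -> Tuple[bool, Optional[int]]:
    """Hypothetical: replay weight-related flags left to right (last one wins)."""
    print_weights = False                  # default, same as -W
    weight_classes: Optional[int] = None   # no limit unless -l or -g is given
    for arg in args:
        if arg in ("-w", "--print-weight"):
            print_weights = True
        elif arg in ("-W", "--no-weights", "-L", "--visl"):
            # -W turns weights off; -L implies -W.
            print_weights = False
        elif arg in ("-g", "--giella-cg"):
            # -g implies -w and -l2.
            print_weights = True
            weight_classes = 2
        elif arg.startswith("--weight-classes="):
            weight_classes = int(arg.split("=", 1)[1])
        elif arg.startswith("-l") and len(arg) > 2:
            # Attached form only, e.g. -l3.
            weight_classes = int(arg[2:])
    return print_weights, weight_classes
```

Under this assumption, `-g -W` keeps the two weight classes implied by -g but suppresses printed weights, which matches the -W description ("overrides ... -w implied by -g").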
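The -b, -l and -u options all prune the list of analyses for an input. The following is a minimal sketch of what each option does. It assumes that lower weight is better (the tropical-weight convention HFST uses), that -u deduplicates on the analysis string, and that the filters run in the order shown. filter_analyses and the example data are hypothetical, not taken from hfst-tokenize.

```python
from typing import List, Optional, Tuple

# (analysis string, weight); assumes lower weight = better.
Analysis = Tuple[str, float]


def filter_analyses(analyses: List[Analysis],
                    beam: Optional[float] = None,
                    weight_classes: Optional[int] = None,
                    unique: bool = False) -> List[Analysis]:
    """Hypothetical model of -b, -l and -u; not hfst-tokenize's own code."""
    # Best (lowest-weight) analyses first; Python's sort is stable for ties.
    result = sorted(analyses, key=lambda a: a[1])
    if unique:
        # -u: drop repeated analysis strings, keeping the best-weighted copy.
        seen = set()
        deduped = []
        for text, weight in result:
            if text not in seen:
                seen.add(text)
                deduped.append((text, weight))
        result = deduped
    if beam is not None and result:
        # -b B: keep analyses whose weight is within B of the best one.
        best = result[0][1]
        result = [a for a in result if a[1] - best <= beam]
    if weight_classes is not None:
        # -l N: equal weights form one class; keep the N best classes.
        kept = set(sorted({w for _, w in result})[:weight_classes])
        result = [a for a in result if a[1] in kept]
    return result
```

For example, with analyses weighted 1, 1, 2, 3 and 5, `beam=2.0` keeps everything up to weight 3. `weight_classes=2` keeps only the analyses weighted 1 and 2, because the two analyses at weight 1 count as a single class.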
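The -n option changes what counts as one input: by default, inputs are separated by a blank line, and with -n every line is a separate input. The following is a rough model of that split, assuming that whitespace-only lines count as blank and that empty inputs are skipped. split_inputs is a hypothetical name, and this code does not reproduce how hfst-tokenize actually buffers its input.

```python
import re
from typing import List


def split_inputs(text: str, newline_separated: bool = False) -> List[str]:
    """Hypothetical model of how a text stream is divided into inputs."""
    if newline_separated:
        # -n: every line is its own input.
        chunks = text.split("\n")
    else:
        # Default: inputs are separated by blank (or whitespace-only) lines.
        chunks = re.split(r"\n[ \t]*\n", text)
    # Trim surrounding newlines and skip chunks that are empty.
    return [c.strip("\n") for c in chunks if c.strip()]
```

With the default, `"a b\nc d\n\ne f\n"` yields two inputs, `"a b\nc d"` and `"e f"`. With `newline_separated=True`, it yields three.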
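With -S, text inside unescaped square brackets (Apertium-style "superblanks", such as formatting markup left by apertium-destxt) is passed through rather than tokenized. The following is a minimal sketch of separating superblanks from ordinary text. It assumes that a backslash escapes the next character, as in the Apertium stream format, and that brackets do not nest. split_superblanks is hypothetical and does not implement the NUL flushing.

```python
from typing import List, Tuple


def split_superblanks(text: str) -> List[Tuple[bool, str]]:
    """Hypothetical: return (is_superblank, segment) pairs in input order."""
    segments: List[Tuple[bool, str]] = []
    buf: List[str] = []
    in_blank = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Escaped character: keep it verbatim; it never opens or closes a blank.
            buf.append(text[i:i + 2])
            i += 2
            continue
        if ch == "[" and not in_blank:
            # Start of a superblank: flush any pending ordinary text first.
            if buf:
                segments.append((False, "".join(buf)))
            buf = ["["]
            in_blank = True
        elif ch == "]" and in_blank:
            # End of a superblank: emit it as one untouched segment.
            buf.append("]")
            segments.append((True, "".join(buf)))
            buf = []
            in_blank = False
        else:
            buf.append(ch)
        i += 1
    if buf:
        # Leftover text; an unterminated "[" is still reported as a superblank.
        segments.append((in_blank, "".join(buf)))
    return segments
```

Only segments flagged `False` would be handed to the tokenizer. The bracketed segments would be copied to the output unchanged.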
Use standard streams for input and output (for now).
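Because hfst-tokenize reads from standard input and writes to standard output, another program can drive it through a pipe. The following is a sketch of a Python wrapper that uses only the mode flags and argument order from the synopsis. It assumes hfst-tokenize is on the PATH, and it treats "tokeniser.pmhfst" as a hypothetical ruleset file name. The helpers build_command and run_tokenizer are also hypothetical.

```python
import subprocess
from typing import List, Sequence

# Output modes from the synopsis; --segment is the default mode.
MODE_FLAGS = {
    "segment": "--segment",
    "xerox": "--xerox",
    "cg": "--cg",
    "giella-cg": "--giella-cg",
}


def build_command(ruleset: str, mode: str = "segment",
                  extra: Sequence[str] = ()) -> List[str]:
    """Assemble: hfst-tokenize MODE [OPTIONS...] RULESET."""
    return ["hfst-tokenize", MODE_FLAGS[mode], *extra, ruleset]


def run_tokenizer(text: str, ruleset: str, mode: str = "segment",
                  extra: Sequence[str] = ()) -> str:
    # Send text on stdin and collect stdout; raises CalledProcessError on failure.
    completed = subprocess.run(build_command(ruleset, mode, extra),
                               input=text, capture_output=True,
                               text=True, check=True)
    return completed.stdout
```

For example, `run_tokenizer("Hello world.\n", "tokeniser.pmhfst", "cg", ["-n"])` would run `hfst-tokenize --cg -n tokeniser.pmhfst` with one input per line.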
Report bugs to <hfst-bugs@helsinki.fi> or directly to our
bug tracker at: <https://github.com/hfst/hfst/issues>
hfst-tokenize home page:
<https://kitwiki.csc.fi/twiki/bin/view/KitWiki//HfstTokenize>
General help using HFST software:
<https://kitwiki.csc.fi/twiki/bin/view/KitWiki//HfstHome>
Copyright © 2017 University of Helsinki, License GPLv3: GNU
GPL version 3 <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO
WARRANTY, to the extent permitted by law.