MMORPH(1) General Commands Manual MMORPH(1)
mmorph - MULTEXT morphology tool
information:
mmorph [ -vh ]
parse only:
mmorph -y | -z [ -a addfile ]
-m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]
generate:
mmorph -c | -n [ -t trace_level ] [ -s trace_level ] [ -a addfile ]
-m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]
simple lookup:
mmorph [ -fi ] [ -b | -k ] [ -r rejectfile ]
-m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]
record/field lookup:
mmorph -C classes [ -fU ] [ -E | -O ] [ -b | [ -k ] [ -B class ]]
-m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]
dump database:
mmorph -p | -q
-m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]
In the simplest mode of operation, with just the -m morphfile option, mmorph operates in lookup mode: it opens an existing database called morphfile.db and looks up all the string segments (usually corresponding to words) in the input.
To create the database from the lexical entries specified in morphfile, use -c -m morphfile. The file morphfile.db should not already exist. When the database is complete, mmorph looks up the segments in the input. If used interactively (input and output are both a terminal), a prompt is printed when the program expects the user to type a segment string. No prompting occurs in record/field mode.
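The requirement that morphfile.db must not exist can be checked before the build starts. The following is a minimal sketch, not part of mmorph itself: the function name build_db and the MMORPH variable (an override for the binary to run) are assumptions introduced for illustration. Only the -c and -m options come from the synopsis.

```shell
# Hypothetical helper: refuse to build when the database file is already there.
# MMORPH is an assumed override for the binary; it defaults to "mmorph" on PATH.
build_db() {
    morphfile=$1
    shift
    if [ -e "$morphfile.db" ]; then
        echo "build_db: $morphfile.db already exists; remove it first" >&2
        return 1
    fi
    # -c creates the database, then looks up the segments in the input.
    # Remaining arguments (for example infile and outfile) are passed through.
    "${MMORPH:-mmorph}" -c -m "$morphfile" "$@"
}
```

A call such as build_db morphfile infile outfile would then build the database and look up the segments of infile, provided morphfile.db is absent.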
To test the rule applications on the lexical entries specified in morphfile, without creating a database and without looking up segments, use -n -m morphfile. This automatically sets the trace level to 1 if it was not specified.
In order to do the same operations as above, but on the alternate set of lexical entries in addfile, use the extra option -a addfile. The lexical entries in morphfile will be ignored. This is useful when making additions to a standard morphological description. Be aware that entries added to the database morphfile.db do not replace existing ones.
Use the -n option. In the Grammar section, specify goal rules that will match the desired results. In the Lexicon section, specify the lexical items you want to test. When mmorph runs, all rules are applied (recursively) to the lexical items; whenever the applied rule is a goal, the result of the application is printed on the output.
Suggestion: Put the two parts mentioned above (goal rules and Lexicon section) in separate files and reference these files with an #include directive where they should occur in the main input file.
If you are using an existing description and want to test only new lexical entries, use the options -n -a addfile, and put the lexical entries in addfile.
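As an illustration of that last case, the sketch below wraps the option combination in a small function. The name test_new_entries and the MMORPH override variable are hypothetical; the options -n, -a and -m are the ones listed in the synopsis.

```shell
# Hypothetical wrapper for testing new lexical entries against an existing description.
# MMORPH is an assumed override for the binary; it defaults to "mmorph" on PATH.
test_new_entries() {
    morphfile=$1
    addfile=$2
    if [ ! -r "$addfile" ]; then
        echo "test_new_entries: cannot read $addfile" >&2
        return 1
    fi
    # -n applies the rules without creating a database or looking up segments;
    # -a takes the lexical entries from addfile instead of those in morphfile.
    "${MMORPH:-mmorph}" -n -a "$addfile" -m "$morphfile"
}
```

Because -n sets the trace level to 1 when none is given, the results of the goal rules should appear on the output without further options.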
bit       decimal  hexadecimal  purpose
no bits   0        0x0          no debug option (default)
1         1        0x1          debug initialisation
2         2        0x2          debug yacc parsing
3         4        0x4          debug rule combination
4         8        0x8          debug spelling application
5         16       0x10         print statistics with -p or -q options
all bits  -1       0xffff       all debug options whatever they are
To combine options add the decimal or hexadecimal values together. Example: -d 0x5 specifies bits (options) 1 and 3, since 0x5 = 0x1 + 0x4.
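The combination can also be computed rather than added by hand. This is a small sketch using shell arithmetic; the variable names are illustrative only, and the values are copied from the table of debug bits.

```shell
# Values from the debug bit table; the variable names are illustrative.
DEBUG_INIT=0x1       # bit 1: debug initialisation
DEBUG_YACC=0x2       # bit 2: debug yacc parsing
DEBUG_COMBINE=0x4    # bit 3: debug rule combination
DEBUG_SPELL=0x8      # bit 4: debug spelling application
DEBUG_STAT=0x10      # bit 5: print statistics with -p or -q

# Combine bits 1 and 3. For distinct bits, bitwise OR equals plain addition.
debug_map=$(printf '0x%x' $(( $DEBUG_INIT | $DEBUG_COMBINE )))
echo "$debug_map"    # prints 0x5
```

The resulting value would be passed as the debug_map argument of -d, for example -d "$debug_map" together with -m morphfile.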
For a detailed account of the principles and mechanisms used in mmorph, please refer to the documents cited in the SEE ALSO section below.
Briefly sketched, morphosyntactic descriptions written for mmorph describe how words are constructed by the concatenation of morphemes, and how this concatenation process changes the spelling of these morphemes. The first part, the word structure grammar, is specified by restricted context-free rewrite rules whose formalism is inspired by unification-based systems (cf. Shieber 1986). The second part, the spelling changes, is specified by spelling rules in a formalism based on the two-level model of morphology. This approach to morphology is described in Ritchie, Russell et al. 1992 and more concisely in Pulman and Hepple 1993.
To decide which characters are displayable on the output, mmorph uses the language specific description that setlocale(3) sets according to the environment variable LC_CTYPE. For the languages that are dealt with in MULTEXT it is a good idea to have that variable set to iso_8859_1.
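One way to apply this setting to a single run, without changing the rest of the session, is sketched below. The function name run_latin1 and the MMORPH override variable are assumptions made for the example. The locale name iso_8859_1 is the value suggested above; the names actually installed differ between systems, so it may need adjusting.

```shell
# Hypothetical wrapper: run mmorph with LC_CTYPE set for ISO 8859-1 text.
# MMORPH is an assumed override for the binary; it defaults to "mmorph" on PATH.
run_latin1() {
    (
        # The subshell keeps the locale change local to this one invocation.
        LC_CTYPE=iso_8859_1
        export LC_CTYPE
        "${MMORPH:-mmorph}" "$@"
    )
}
```

For instance, run_latin1 -m morphfile would perform a simple lookup with that locale in effect.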
Here is a summary of the common usage of mmorph options:
mmorph -n -m morphfile
mmorph -c -m morphfile
mmorph -m morphfile
mmorph -m morphfile -a addfile
Error messages should be self-explanatory. Please refer to mmorph(5) for a formal description of the syntax.
G. Russell and D. Petitpierre, MMORPH - The Multext Morphology Program, Version 2.3, October 1995, MULTEXT deliverable report for task 2.3.1.
Ritchie, G. D., G.J. Russell, A.W. Black and S.G. Pulman (1992), Computational Morphology: Practical Mechanisms for the English Lexicon, Cambridge Mass., MIT Press.
Pulman, S.G. and M.R. Hepple (1993), "A feature-based formalism for two level phonology: a description and implementation", Computer Speech and Language 7, pp. 333-358.
Shieber, S.M. (1986), An Introduction to Unification-Based Approaches to Grammar, CSLI Lecture Notes Number 4, Stanford University.
Dominique Petitpierre, ISSCO, <petitp@divsun.unige.ch>
The parser for the morphology description formalism was written using yacc(1) and flex(1). Flex was written by Vern Paxson, <vern@ee.lbl.gov>, and is distributed in the framework of the GNU project under the conditions of the GNU General Public License.
The database module in the current version uses the db library package developed at the University of California, Berkeley by Margo Seltzer, Keith Bostic <bostic@cs.berkeley.edu> and Ozan Yigit.
The crc procedures used for taking a signature of the typed feature structure declarations are taken from the fingerprint package by Daniel J. Bernstein and use code written by Gary S. Brown.
Version 2.3, October 1995