STATES(1) | STATES | STATES(1) |
states - awk-alike text processing tool
states [-hvV] [-D var=val] [-f file] [-o outputfile] [-p path] [-s startstate] [-W level] [filename ...]
States is an awk-alike text processing tool with some state machine extensions. It is designed for program source code highlighting and for similar tasks where state information helps input processing.
At any point in time, states is in exactly one state. Each state is quite similar to awk's work environment: it has regular expressions which are matched against the input and actions which are executed when a match is found. From the action blocks, states can perform state transitions: it can move to another state, from which the processing is continued. State transitions are recorded, so states can return to the calling state once the current state has finished.
The biggest difference between states and awk, besides state machine extensions, is that states is not line-oriented. It matches regular expression tokens from the input and once a match is processed, it continues processing from the current position, not from the beginning of the next input line.
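The difference from line-oriented processing can be illustrated with a small Python model. This is only a sketch and not states code or its implementation: the function name scan, the rule list and the markup strings are hypothetical, and it assumes that input not matched by any rule is copied to the output unchanged. After each match, the scan resumes at the end of the matched token, so a token may span several input lines.

```python
import re

def scan(text, rules):
    """Apply the earliest-matching rule repeatedly, resuming right after each match."""
    out = []
    pos = 0
    while pos < len(text):
        # Pick the rule whose match starts earliest at or after pos;
        # on a tie, the rule listed first wins.
        best = None
        for pattern, action in rules:
            m = pattern.search(text, pos)
            if m and m.end() > m.start() and (best is None or m.start() < best[0].start()):
                best = (m, action)
        if best is None:
            out.append(text[pos:])        # no more matches: copy the rest through
            break
        m, action = best
        out.append(text[pos:m.start()])   # assumption: unmatched input is copied unchanged
        out.append(action(m.group(0)))
        pos = m.end()                     # continue from the current position, not the next line
    return "".join(out)

# Hypothetical rules: a comment token that may cross line boundaries, and a keyword.
rules = [
    (re.compile(r"/\*.*?\*/", re.S), lambda s: "<comment>" + s + "</comment>"),
    (re.compile(r"\bint\b"), lambda s: "<kw>" + s + "</kw>"),
]

result = scan("int a; /* spans\ntwo lines */ int b;", rules)
```

In this model the comment is handled as a single token even though it contains a newline, and the second int on the same line is still found because scanning continues immediately after the comment.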
States program files can contain one start block, startrules and namerules blocks to specify the initial state, state definitions, and expressions.
The start block is the main() of the states program. It is executed on script startup for each input file and it can perform any initialization the script needs. It normally also calls the check_startrules() and check_namerules() primitives, which resolve the initial state from the input file name or from the data found at the beginning of the input file. Here is a sample start block which initializes two variables and does the standard start state resolution:
start {
  a = 1;
  msg = "Hello, world!";
  check_startrules ();
  check_namerules ();
}
Once the start block is processed, the input processing is continued from the initial state.
The initial state is resolved from the information found in the startrules and namerules blocks. Both blocks contain pairs of a regular expression and a symbol. When the regular expression matches the name of the input file (namerules) or the beginning of the input file (startrules), the initial state is named by the corresponding symbol. For example, the following start and name rules can distinguish C and Fortran files:
namerules {
  /\.(c|h)$/ c;
  /\.[fF]$/ fortran;
}

startrules {
  /-\*- [cC] -\*-/ c;
  /-\*- fortran -\*-/ fortran;
}
If these rules are used with the previously shown start block, states first checks the beginning of the input file. If it contains the string -*- c -*- (or -*- C -*-), the file is assumed to contain C code and the processing is started from the state called c. If the beginning of the input file contains the string -*- fortran -*-, the initial state is fortran. If none of the start rules match, the name of the input file is matched against the namerules. If the name ends with the suffix .c or .h, we go to state c. If the suffix is .f or .F, the initial state is fortran.
If both the start and name rules fail to resolve the start state, states just copies its input to output unmodified.
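The resolution order described above can be modelled in Python as follows. This is a sketch under stated assumptions, not the states implementation: the function name resolve_start_state and its parameters are hypothetical, head stands for whatever portion of the file beginning is examined, and None stands for the case where no state is resolved and the input is copied through unmodified. The regular expressions are the ones from the sample rules.

```python
import re

# The sample startrules and namerules, as (pattern, state name) pairs.
STARTRULES = [
    (re.compile(r"-\*- [cC] -\*-"), "c"),
    (re.compile(r"-\*- fortran -\*-"), "fortran"),
]
NAMERULES = [
    (re.compile(r"\.(c|h)$"), "c"),
    (re.compile(r"\.[fF]$"), "fortran"),
]

def resolve_start_state(filename, head):
    """Return the initial state name, or None if no rule matches.

    Mirrors a start block that calls check_startrules() first and
    check_namerules() second: file contents take priority over the name.
    """
    for pattern, state in STARTRULES:
        if pattern.search(head):
            return state
    for pattern, state in NAMERULES:
        if pattern.search(filename):
            return state
    return None  # assumption: models "copy input to output unmodified"
```

For example, resolve_start_state("notes.txt", "/* -*- c -*- */") yields c because the start rule matches even though the file name does not, while resolve_start_state("solver.F", "") falls back to the name rules and yields fortran.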
The start state can also be specified from the command line with option -s, --state.
State definitions have the following syntax:
state name { expr {statements} ... }
where name is the symbol naming the state, expr is a regular expression, special expression or symbol, and statements is a list of statements. When the expression expr is matched from the input, the statement block is executed. The statement block can call states' primitives, user-defined subroutines, other states, etc. Once the block is executed, the input processing is continued from the current input position (which might have been changed if the statement block called other states).
Special expressions BEGIN and END can be used in the place of expr. Expression BEGIN matches the beginning of the state; its block is executed when the state is entered. Expression END matches the end of the state; its block is executed when states leaves the state.
If expr is a symbol, its value is looked up from the global environment. If the value is a regular expression, it is matched against the input; otherwise the rule is ignored.
The states program file can also have top-level expressions. They are evaluated after the program file is parsed but before any input files are processed or the start block is evaluated.
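How calling a state, returning from it, and the BEGIN and END blocks fit together can be sketched with a small Python model. It is an illustration only, not states code or its implementation: the class Machine, the helper names and the markup strings are hypothetical, actions signal a transition through their return value (a simplification), and unmatched input is assumed to be copied through unchanged. The Python call stack plays the role of the recorded state transitions.

```python
import re

class Machine:
    def __init__(self, states):
        self.states = states   # name -> {"BEGIN": fn, "END": fn, "rules": [(regex, fn)]}
        self.out = []

    def run(self, name, text, pos=0):
        """Process text in state `name` from pos; return the position where it finished."""
        state = self.states[name]
        state.get("BEGIN", lambda m: None)(self)      # executed when the state is entered
        while pos < len(text):
            best = None
            for pattern, action in state["rules"]:
                m = pattern.search(text, pos)
                if m and m.end() > m.start() and (best is None or m.start() < best[0].start()):
                    best = (m, action)
            if best is None:
                self.out.append(text[pos:])           # assumption: copy unmatched input
                pos = len(text)
                break
            m, action = best
            self.out.append(text[pos:m.start()])
            pos = m.end()
            request = action(self, m.group(0))
            if request == "return":                   # go back to the calling state
                break
            if request is not None:                   # ("call", state_name)
                pos = self.run(request[1], text, pos) # caller resumes where callee stopped
        state.get("END", lambda m: None)(self)        # executed when the state is left
        return pos

def emit(s):
    return lambda m: m.out.append(s)

def open_comment(m, token):
    m.out.append(token)
    return ("call", "comment")

def close_comment(m, token):
    m.out.append(token)
    return "return"

STATES = {
    "c": {"rules": [(re.compile(r"/\*"), open_comment)]},
    "comment": {
        "BEGIN": emit("<i>"),
        "END": emit("</i>"),
        "rules": [(re.compile(r"\*/"), close_comment)],
    },
}

machine = Machine(STATES)
machine.run("c", "x = 1; /* note */ y = 2;")
result = "".join(machine.out)
```

In this model the markers from BEGIN and END bracket everything processed while the comment state is active, and the c state continues from the input position at which the comment state returned.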
/usr/share/enscript/hl/*.st enscript's states definitions
Markku Rossi <mtr@iki.fi> <http://www.iki.fi/~mtr/>
GNU Enscript WWW home page: <http://www.iki.fi/~mtr/genscript/>
October 23, 1998 | STATES |