NAME

flexc++ - Generate a C++ scanner class and parsing function

SYNOPSIS

flexc++ [options] rules-file

DESCRIPTION

Flexc++(1) was designed after flex(1) and flex++(1). Like these latter two programs flexc++ generates code performing pattern-matching on text, possibly executing actions when certain regular expressions are recognized.

Flexc++, contrary to flex and flex++, generates code that is explicitly intended for use by C++ programs. The well-known flex(1) program generates C source-code and flex++(1) merely offers a C++-like shell around the yylex function generated by flex(1) and hardly supports present-day ideas about C++ software development.

Contrary to this, flexc++ creates a C++ class offering a predefined member function lex matching input against regular expressions and possibly executing C++ code once regular expressions were matched. The code generated by flexc++ is pure C++, allowing its users to apply all of the features offered by that language.

Not every aspect of flexc++ is covered by the man-pages. In addition to what’s summarized by the man-pages the flexc++ manual offers a chapter covering pre-loading of input lines (allowing you to, e.g., display lines in which errors are observed even though not all of the line’s tokens have already been scanned), as well as a chapter covering technical documentation about the inner workings of flexc++.

Before version 2.08.00 the lexical scanner’s specification file (e.g., lexer) could be split into several files using //include directives, but //include directives required that all files were specified relative to the location of the lexer file itself. E.g., if inc/part1 had to include part2, available in the same (inc) directory as part1, then part1 had to specify //include inc/part2 instead of merely //include part2.

Starting with version 2.08.00 //include directives use the directories of the files containing these directives as the current directory. In the provided example part1 should simply contain //include part2. See the flexc++api(3) and flexc++input(7) man-pages for details.
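The path resolution described above can be sketched in a few lines of C++. This is only an illustration of the rule, not code taken from flexc++: the function name resolveInclude is hypothetical, and the sketch assumes relative paths using / as the directory separator.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of flexc++): resolve the argument of an
// //include directive relative to the directory of the including file,
// which is the behavior described for version 2.08.00 and later.
std::string resolveInclude(std::string const &includingFile,
                           std::string const &includeArg)
{
    std::string::size_type const slash = includingFile.rfind('/');

    if (slash == std::string::npos)     // including file is in the
        return includeArg;              // current directory

    // keep the including file's directory (with its trailing '/')
    return includingFile.substr(0, slash + 1) + includeArg;
}
```

With this rule, inc/part1 containing //include part2 refers to inc/part2, while the top-level lexer file containing //include inc/part1 refers to inc/part1.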

Flexc++ offers several man-pages. These man-pages contain the following main sections:

This man-page

This man-page offers the following sections:

o

1. QUICK START: a quick start overview about how to use flexc++;

o

2. QUICK START: FLEXC++ and BISONC++: a quick start overview about how to use flexc++ in combination with bisonc++(1);

o

3. GENERATED FILES: files generated by flexc++ and their purposes

o

4. OPTIONS: options available for flexc++.

The flexc++api(3) man-page:

This man-page describes the classes generated by flexc++, describing flexc++’s actions from the programmer’s point of view.

o

1. INTERACTIVE SCANNERS: how to create an interactive scanner

o

2. THE CLASS INTERFACE: SCANNER.H: Constructors and members of the scanner class generated by flexc++

o

3. NAMING CONVENTION: symbols defined by flexc++ in the scanner class.

o

4. CONSTRUCTORS: constructors defined in the scanner class.

o

5. PUBLIC MEMBER FUNCTION: public members declared in the scanner class.

o

6. PRIVATE MEMBER FUNCTIONS: private members declared in the scanner class.

o

7. SCANNER CLASS HEADER EXAMPLE: an example of a generated scanner class header

o

8. THE SCANNER BASE CLASS: the scanner class is derived from a base class. The base class is described in this section

o

9. PUBLIC ENUMS AND -TYPES: enums and types declared by the base class

o

10. PROTECTED ENUMS AND -TYPES: enumerations and types used by the scanner and scanner base classes

o

11. NO PUBLIC CONSTRUCTORS: the scanner base class does not offer public constructors.

o

12. PUBLIC MEMBER FUNCTIONS: several members defined by the scanner base class have public access rights.

o

13. PROTECTED CONSTRUCTORS: the base class can be constructed by a derived class. Usually this is the scanner class generated by flexc++.

o

14. PROTECTED MEMBER FUNCTIONS: this section covers the base class member functions that can only be used by scanner class or scanner base class members

o

15. PROTECTED DATA MEMBERS: this section covers the base class data members that can only be used by scanner class or scanner base class members

o

16. FLEX++ TO FLEXC++ MEMBERS: a short overview of frequently used flex(1) members that received different names in flexc++.

o

17. THE CLASS INPUT: the scanner’s job is completely decoupled from the actual input stream. The class Input, nested within the scanner base class, handles the communication with the input streams. The class Input is described in this section.

o

18. INPUT CONSTRUCTORS: the class Input can easily be replaced by another class. The constructor-requirements are described in this section.

o

19. REQUIRED PUBLIC MEMBER FUNCTIONS: this section covers the required public members of a self-made Input class

The flexc++input(7) man-page:

This man-page describes how flexc++’s input files should be organized. It contains the following sections:

o

1. SPECIFICATION FILE(S): the format and contents of flexc++ input files, specifying the Scanner’s characteristics

o

2. FILE SWITCHING: how to switch to another input specification file

o

3. DIRECTIVES: directives that can be used in input specification files

o

4. MINI SCANNERS: how to declare mini-scanners

o

5. DEFINITIONS: how to define symbolic names for regular expressions

o

6. %% SEPARATOR: the separator between the input specification sections

o

7. REGULAR EXPRESSIONS: regular expressions supported by flexc++

o

8. SPECIFICATION EXAMPLE: an example of a specification file

1. QUICK START

A bare-bones, no-frills scanner is generated as follows:

o

First define a subdirectory scanner, and change-dir to scanner. This directory is going to contain all scanner-related files, created next.

o

Create a file lexer defining the regular expressions to recognize, and the tokens to return. Use token values exceeding 0xff when plain ascii character values could also be used as token values. Example (assume capitalized words are token-symbols defined in an enum defined by the scanner class):
    %%
    [ \t\n]+                            // skip white space chars.
    [0-9]+                              return NUMBER;
    [[:alpha:]_][[:alpha:][:digit:]_]*  return IDENTIFIER;
    .                                   return matched()[0];

o

Execute:

flexc++ lexer

This generates four files: Scanner.h, Scanner.ih, Scannerbase.h, and lex.cc.

o

Edit Scanner.h to add the enum defining the token-symbols in (usually) the public section of the class Scanner. E.g.,
    class Scanner: public ScannerBase
    {
        public:
            enum Tokens
            {
                IDENTIFIER = 0x100,
                NUMBER
            };
        // ... (etc, as generated by flexc++)
    };

o

Change-dir to scanner’s base directory, and there create a file main.cc defining int main:
    #include <iostream>
    #include "scanner/Scanner.h"

    using namespace std;

    int main()
    {
        Scanner scanner;                    // define a Scanner object

        while (int token = scanner.lex())   // get all tokens
        {
            string const &text = scanner.matched();
            switch (token)
            {
                case Scanner::IDENTIFIER:
                    cout << "identifier: " << text << '\n';
                break;

                case Scanner::NUMBER:
                    cout << "number: " << text << '\n';
                break;

                default:
                    cout << "char. token: `" << text << "'\n";
                break;
            }
        }
    }

o

Compile all .cc files, creating a.out:

g++ *.cc scanner/*.cc

o

To `tokenize’ main.cc, execute:

a.out < main.cc
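To get a feel for the token stream that the four rules in lexer describe, the following hand-written stand-in mimics them without running flexc++. It is a sketch only: the function tokenize and the Token alias are hypothetical names, the code is not what flexc++ generates, and it assumes plain ASCII input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum Tokens { IDENTIFIER = 0x100, NUMBER };     // as in the Scanner.h example

using Token = std::pair<int, std::string>;      // (token value, matched text)

// Hand-written imitation of the lexer rules: skip white space, return
// NUMBER for digit sequences, IDENTIFIER for names, and the character
// itself for anything else.
std::vector<Token> tokenize(std::string const &input)
{
    std::vector<Token> tokens;
    std::size_t pos = 0;

    while (pos < input.size())
    {
        std::size_t const begin = pos;
        unsigned char const ch = input[pos++];

        if (ch == ' ' || ch == '\t' || ch == '\n')  // rule: [ \t\n]+
            continue;

        if (std::isdigit(ch))                       // rule: [0-9]+
        {
            while (pos < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[pos])))
                ++pos;
            tokens.emplace_back(NUMBER, input.substr(begin, pos - begin));
        }
        else if (std::isalpha(ch) || ch == '_')     // rule: identifiers
        {
            while (pos < input.size() &&
                   (std::isalnum(static_cast<unsigned char>(input[pos])) ||
                    input[pos] == '_'))
                ++pos;
            tokens.emplace_back(IDENTIFIER, input.substr(begin, pos - begin));
        }
        else                                        // rule: .
            tokens.emplace_back(ch, std::string(1, input[begin]));
    }
    return tokens;
}
```

Because IDENTIFIER and NUMBER start at 0x100, they cannot collide with the character values returned by the last rule, which is why the quick start advises token values exceeding 0xff.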

2. QUICK START: FLEXC++ and BISONC++

To interface flexc++ to the bisonc++(1) parser generator proceed as follows:

o

Start from the directory containing main.cc used in the previous section; the lexical scanner developed there is also used here.

o

Create a directory parser and change-dir to that directory.

o

Define the following grammar in the file grammar:
    %scanner ../scanner/Scanner.h
    %token-path ../scanner/tokens.h

    %token IDENTIFIER NUMBER CHAR

    %%

    startrule:
        startrule tokenshow
    |
        tokenshow
    ;

    tokenshow:
        token
        {
            std::cout << "matched: " << d_scanner.matched() << '\n';
        }
    ;

    token:
        IDENTIFIER
    |
        NUMBER
    |
        CHAR
    ;

o

Create the parser by executing:
bisonc++ grammar

This generates five files: parse.cc, Parserbase.h, Parser.h, Parser.ih and ../scanner/tokens.h, where the last file contains the class Tokens defining the enumeration Tokens_ specifying the symbolic token names.

o

Now that the parser has been defined, edit the (three) lines in the file ../scanner/lexer containing return statements. Change these lines as follows (the first two lines of the file lexer remain as-is):
    [0-9]+                              return Tokens::NUMBER;
    [[:alpha:]_][[:alpha:][:digit:]_]*  return Tokens::IDENTIFIER;
    .                                   return Tokens::CHAR;

This allows the scanner to return Parser tokens to the generated parser.

o

Modify the scanner so that it returns these Parser tokens by executing:

flexc++ lexer

o

Next, add the line

#include "tokens.h"

to the file scanner/Scanner.ih, informing the scanner about the existence of the tokens expected by the parser.

If ever you have to use members from the parser’s base class generated by bisonc++(1), then

#include "../parser/Parserbase.h"

should be added to the file scanner/Scanner.ih. In that case including the file tokens.h in scanner/Scanner.ih is optional.

o

Change-dir to the scanner’s parent directory and rewrite the main.cc file defined in the previous section to contain:
    #include "parser/Parser.h"

    int main(int argc, char **argv)
    {
        Parser parser;
        parser.parse();
    }

o

Compile all sources:

g++ *.cc */*.cc

o

Execute the program, providing it with some source file to be processed:

a.out < main.cc

3. GENERATED FILES

Flexc++ generates four files from a well-formed input file:

o

A file containing the implementation of the lex member function and its support functions. By default this file is named lex.cc.

o

A file containing the scanner’s class interface. By default this file is named Scanner.h. The scanner class itself is generated once and is thereafter `owned’ by the programmer, who may change it ad-lib. Newly added members (data members, function members) will survive future flexc++ runs as flexc++ will never rewrite an existing scanner class interface file, unless explicitly ordered to do so.

o

A file containing the interface of the scanner class’s base class. The scanner class is publicly derived from this base class. It is used to minimize the size of the scanner interface itself. The scanner base class is `owned’ by flexc++ and should never be hand-modified. By default the scanner’s base class is provided in the file Scannerbase.h. At each new flexc++ run this file is rewritten unless flexc++ is explicitly ordered not to do so.

o

A file containing the implementation header. This file should contain includes and declarations that are only required when compiling the members of the scanner class. By default this file is named Scanner.ih. This file, like the file containing the scanner class’s interface is never rewritten by flexc++ unless flexc++ is explicitly ordered to do so.
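The write policy for the four files can be summarized as a small decision function. This is a sketch of the rules stated above, not flexc++'s own code: the names Owner and shouldWrite are hypothetical, and the single overrideDefault flag stands for whichever option reverses the default for the file in question.

```cpp
#include <cassert>

// Who `owns' a generated file (hypothetical names, for illustration only).
enum class Owner
{
    Programmer,     // class header, implementation header
    Flexcpp         // base class header, lex source
};

// Returns true if a flexc++ run would (re)write the file.
//  - flexc++-owned files are rewritten at each run, unless the user
//    explicitly orders flexc++ not to write them;
//  - programmer-owned files are written only if they do not yet exist,
//    unless the user explicitly orders flexc++ to rewrite them.
bool shouldWrite(Owner owner, bool fileExists, bool overrideDefault)
{
    if (owner == Owner::Flexcpp)
        return !overrideDefault;

    return !fileExists || overrideDefault;
}
```

Under these rules members added by hand to an existing Scanner.h survive a default run, while Scannerbase.h and lex.cc are regenerated.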

4. OPTIONS

Where available, single letter options are listed between parentheses following their associated long-option variants. Single letter options require arguments if their associated long options require arguments as well. Options affecting the class header or implementation header file are ignored if these files already exist. Options accepting a `filename’ do not accept path names, i.e., they cannot contain directory separators (/); options accepting a ’pathname’ may contain directory separators.

Some options may generate errors. This happens when an option conflicts with the contents of an existing file which flexc++ cannot modify (e.g., a scanner class header file exists, but doesn’t define a name space, but a --namespace option was provided). To solve the error the offending option could be omitted, the existing file could be removed, or the existing file could be hand-edited according to the option’s specification. Note that flexc++ currently does not handle the opposite error condition: if a previously used option is omitted, then flexc++ does not detect the inconsistency. In those cases you may encounter compilation errors.

o

--baseclass-header=filename (-b)
Use filename as the name of the file to contain the scanner class’s base class. Defaults to the name of the scanner class plus the suffix base.h.

It is an error if this option is used and an already existing scanner-class header file does not include `filename’.

o

--baseclass-skeleton=pathname (-C)
Use pathname as the path to the file containing the skeleton of the scanner class’s base class. Its filename defaults to flexc++base.h.

o

--case-insensitive
Use this option to generate a scanner case insensitively matching regular expressions. All regular expressions specified in flexc++’s input file are interpreted case insensitively and the resulting scanner object will case insensitively interpret its input.

When this option is specified the resulting scanner does not distinguish between the following rules:

First // initial F is transformed to f
first
FIRST // all capitals are transformed to lower case chars

With a case-insensitive scanner only the first rule can be matched, and flexc++ will issue warnings for the second and third rule about rules that cannot be matched.

Input processed by a case-insensitive scanner is also handled case insensitively. The above mentioned First rule is matched for all of the following input words: first First FIRST firST.

Although the matching process proceeds case insensitively, the matched text (as returned by the scanner’s matched() member) always contains the original, unmodified text. So, with the above input matched() returns, respectively first, First, FIRST and firST, while matching the rule First.
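The difference between how text is compared and what is reported as matched can be illustrated as follows. The sketch is not the generated scanner: matchWord is a hypothetical function that compares a single word against a single literal rule, assuming ASCII letters.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Compare `word' against the literal `rule', ignoring letter case.
// On success the ORIGINAL, unmodified word is stored in `matched',
// mirroring what matched() is described to return.
bool matchWord(std::string const &rule, std::string const &word,
               std::string &matched)
{
    if (rule.size() != word.size())
        return false;

    for (std::size_t idx = 0; idx != rule.size(); ++idx)
    {
        int const lhs = std::tolower(static_cast<unsigned char>(rule[idx]));
        int const rhs = std::tolower(static_cast<unsigned char>(word[idx]));
        if (lhs != rhs)
            return false;
    }

    matched = word;         // keep the input's own capitalization
    return true;
}
```

For the rule First, the inputs first, First, FIRST and firST all succeed, and in each case the stored text keeps the capitalization of the input.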

o

--class-header=filename (-c)
Use filename as the name of the file to contain the scanner class. Defaults to the name of the scanner class plus the suffix .h

o

--class-name=className
Use className (rather than Scanner) as the name of the scanner class. Unless overridden by other options generated files will be given the (transformed to lower case) className* name instead of scanner*.

It is an error if this option is used and an already existing scanner-class header file does not define class `className’

o

--class-skeleton=pathname (-C)
Use pathname as the path to the file containing the skeleton of the scanner class. Its filename defaults to flexc++.h.

o

--construction (-K)
Write details about the lexical scanner to the file `rules-file’.output. Details cover the used character ranges, information about the regexes, the raw NFA states, and the final DFAs.

o

--debug (-d)
Provide lex and its support functions with debugging code, showing the actual parsing process on the standard output stream. When included, the debugging output is active by default, but its activity may be controlled using the setDebug(bool on-off) member. Note that #ifdef DEBUG macros are not used anymore. By rerunning flexc++ without the --debug option an equivalent scanner is generated not containing the debugging code. This option does not provide debug information about flexc++ itself. For that use the options --own-parser and/or --own-tokens (see below).

o

--filenames=genericName (-f)
Generic name of generated files (header files, not the lex-function source file, see the --lex-source option for that). By default the header file names will be equal to the name of the generated class.

o

--help (-h)
Write basic usage information to the standard output stream and terminate.

o

--implementation-header=filename (-i)
Use filename as the name of the file to contain the implementation header. Defaults to the name of the generated scanner class plus the suffix .ih. The implementation header should contain all directives and declarations only used by the implementations of the scanner’s member functions. It is the only header file that is included by the source file containing lex()’s implementation. User defined implementation of other class members may use the same convention, thus concentrating all directives and declarations that are required for the compilation of other source files belonging to the scanner class in one header file.

It is an error if this option is used and an already existing ’filename’ file does not include the scanner class header file.

o

--implementation-skeleton=pathname (-I)
Use pathname as the path to the file containing the skeleton of the implementation header. Its filename defaults to flexc++.ih.

o

--lex-skeleton=pathname (-L)
Use pathname as the path to the file containing the lex() member function’s skeleton. Its filename defaults to flexc++.cc.

o

--lex-function-name=funname
Use funname rather than lex as the name of the member function performing the lexical scanning.

o

--lex-source=filename (-l)
Define filename as the name of the source file to contain the scanner member function lex. Defaults to lex.cc.

o

--matched-rules (-R)
The generated scanner will write the numbers of matched rules to the standard output. It is implied by the --debug option. Displaying the matched rules can be suppressed by calling the generated scanner’s member setDebug(false) (or, of course, by re-generating the scanner without specifying --matched-rules).

o

--max-depth=depth (-m)
Set the maximum inclusion depth of the lexical scanner’s specification files to depth. By default the maximum depth is set to 10. When more than depth specification files are used the scanner throws a Max stream stack size exceeded std::length_error exception.
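The depth check can be pictured as a bounded stack of open files. The class below is a sketch under that assumption: StreamStack and its members are hypothetical names, not flexc++'s implementation. Only the default of 10, the exception type and the message text come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical bounded stack of nested specification files.
class StreamStack
{
    std::vector<std::string> d_files;
    std::size_t d_maxDepth;

    public:
        explicit StreamStack(std::size_t maxDepth = 10)
        :
            d_maxDepth(maxDepth)
        {}

        // Enter a (nested) file; refuse once the maximum depth is in use.
        void push(std::string const &filename)
        {
            if (d_files.size() == d_maxDepth)
                throw std::length_error("Max stream stack size exceeded");
            d_files.push_back(filename);
        }

        // Leave the current file, if any.
        void pop()
        {
            if (!d_files.empty())
                d_files.pop_back();
        }

        std::size_t depth() const
        {
            return d_files.size();
        }
};
```

With a maximum depth of 2, a third nested push throws and leaves the stack unchanged. After a pop, a new file can be entered again.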

o

--namespace=identifier
Define the scanner class in the namespace identifier. By default no namespace is used. If this option is used the implementation header is provided with a commented out using namespace declaration for the requested namespace. In addition, the scanner and scanner base class header files also use the specified namespace to define their include guard directives.

It is an error if this option is used and an already existing scanner-class header file does not define namespace identifier.

o

--no-baseclass-header
Do not write the file containing the scanner’s base class interface even if it doesn’t yet exist. By default the file containing the scanner’s base class interface is (re)written each time flexc++ is called.

o

--no-lines
Do not put #line preprocessor directives in the file containing the scanner’s lex function. By default #line directives are entered at the beginning of the action statements in the generated lex.cc file, allowing the compiler and debuggers to associate errors with lines in your grammar specification file, rather than with the source file containing the lex function itself.

o

--no-lex-source
Do not write the file containing the scanner’s predefined scanner member functions, even if that file doesn’t yet exist. By default the file containing the scanner’s lex member function is (re)written each time flexc++ is called. This option should normally be avoided, as this file contains parsing tables which are altered whenever the grammar definition is modified.

o

--own-parser (-P)
The actions performed by flexc++’s own parser are written to the standard output stream.

This option does not result in the generated program optionally displaying the actions of its lex function. If that is what you want, use the --debug option.

o

--own-tokens (-T)
The tokens returned as well as the text matched by flexc++ are written to the standard output stream when this option is used.

This option does not result in the generated program displaying returned tokens and matched text. If that is what you want, use the --print-tokens option.

o

--print-tokens (-t)
The tokens returned as well as the text matched by the generated lex function are displayed on the standard output stream, just before returning the token to lex’s caller. Displaying tokens and matched text is suppressed again when the lex.cc file is generated without using this option. The function showing the tokens (ScannerBase::print_) is called from Scanner::printTokens, which is defined in-line in Scanner.h. Calling ScannerBase::print_, therefore, can also easily be controlled by an option controlled by the program using the scanner object.

This option does not show the tokens returned and text matched by flexc++ itself when reading its input files. If that is what you want, use the --own-tokens option.

o

--regex-calls
Show the function call order when parsing regular expressions (this option is normally not required. Its main purpose is to help developers understand what happens when regular expressions are parsed).

o

--show-filenames (-F)
Write the names of the files that are generated to the standard error stream.

o

--skeleton-directory=pathname (-S)
Defines the directory containing the skeleton files. This option can be overridden by the specific skeleton-specifying options (-B, -C, -H, and -I).

o

--target-directory=pathname
Specifies the directory where generated files should be written. By default this is the directory where flexc++ is called.

o

--usage (-h)
Write basic usage information to the standard output stream and terminate.

o

--verbose (-V)
The verbose option generates on the standard output stream various pieces of additional information, not covered by the --construction and --show-filenames options.

o

--version (-v)
Display flexc++’s version number and terminate.

FILES

Flexc++’s default skeleton files are in /usr/share/flexc++.
By default, flexc++ generates the following files:

o

Scanner.h: the header file containing the scanner class’s interface.

o

Scannerbase.h: the header file containing the interface of the scanner class’s base class.

o

Scanner.ih: the internal header file that is meant to be included by the scanner class’s source files (e.g., it is included by lex.cc, see the next item’s file), and that should contain all declarations required for compiling the scanner class’s sources.

o

lex.cc: the source file implementing the scanner class member function lex (and support functions), performing the lexical scan.
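The default names listed above follow from the class name, as stated for the individual options under OPTIONS. A sketch of that derivation is given below. FileNames and defaultFileNames are hypothetical names, and the sketch covers only the defaults as they are stated for --baseclass-header, --class-header, --implementation-header and --lex-source, without any other option in effect.

```cpp
#include <cassert>
#include <string>

// Default names of the four generated files (illustrative only).
struct FileNames
{
    std::string classHeader;        // scanner class interface
    std::string baseclassHeader;    // scanner base class interface
    std::string implementationHeader;
    std::string lexSource;          // lex() and its support functions
};

FileNames defaultFileNames(std::string const &className)
{
    return FileNames
    {
        className + ".h",
        className + "base.h",
        className + ".ih",
        "lex.cc"                    // does not depend on the class name
    };
}
```

For the default class name Scanner this yields Scanner.h, Scannerbase.h, Scanner.ih and lex.cc, matching the list above.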

SEE ALSO

bisonc++(1), flexc++api(3), flexc++input(7)

BUGS

None reported

ABOUT flexc++

Flexc++ was originally started as a programming project by Jean-Paul van Oosten and Richard Berendsen in the 2007-2008 academic year. After graduating, Richard left the project and moved to Amsterdam. Jean-Paul remained in Groningen, and after on-and-off activities on the project, in close cooperation with Frank B. Brokken, Frank undertook a rewrite of the project’s code around 2010. During the development of flexc++, the lookahead-operator handling continuously threatened the completion of the project. But in version 2.00.00 the lookahead operator received a completely new implementation (with a bug fix in version 2.04.00), which solved previously encountered problems with the lookahead-operator.

COPYRIGHT

This is free software, distributed under the terms of the GNU General Public License (GPL).

AUTHOR

Frank B. Brokken (f.b.brokken@rug.nl),
Jean-Paul van Oosten (j.p.van.oosten@rug.nl),
Richard Berendsen (richardberendsen@xs4all.nl) (until 2010).

2008-2022 flexc++.2.11.02