triehash - Generate a perfect hash function derived from a
trie.
triehash [option] [input file]
triehash takes a list of words in input file and generates a
function and an enumeration to describe the words.
The file consists of multiple lines of the form:
[label ~ ] word [= value]
This maps word to value, and generates an enumeration with entries
of the form:
label = value
If label is undefined, the word is used as the label, with each minus
character replaced by an underscore. If value is undefined, it is
counted upwards from the last value.
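As an illustration of the label and value rules above, here is a hypothetical input file and the enumeration entries those rules imply. The words, the label Other, and the enum name ExampleKey are invented for this sketch. It assumes that no --label-prefix or --label-uppercase option is given, and the exact text triehash emits may differ.

```c
/*
 * Hypothetical input file (all names invented for illustration):
 *
 *     Alpha = 1
 *     Beta-gamma
 *     Other ~ Delta = 10
 *     Epsilon
 */

/* Entries implied by the rules above. The enum name ExampleKey is an
 * assumption; the real name is controlled by --enum-name. */
enum ExampleKey {
    Alpha = 1,       /* explicit value */
    Beta_gamma = 2,  /* no value: last value + 1; '-' replaced by '_' */
    Other = 10,      /* explicit label "Other" for the word "Delta" */
    Epsilon = 11     /* no value: counted upwards from 10 */
};
```

Because a C enumerator without an initializer is one more than the previous one, the "counted upwards" rule matches what a C compiler would do with the same entries.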
There may also be one line of the form:
[label ~ ] = value
This line defines the value to be used for non-existing keys. Note
that, as with a normal entry, it also changes the default value of the
keys that follow it.
So if you place
= 0
at the beginning of the file, unknown strings map to 0, and the
other strings map to values starting with 1. If label is not specified, the
default is Unknown.
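The following sketch shows how a leading "= 0" line plays out for a small hypothetical word list, together with a hand-written lookup that branches like a trie (first on the length, then byte by byte). The words, the enum name ExampleKey, and the function example_lookup are assumptions made for illustration; this is not the code triehash actually generates.

```c
#include <stddef.h>

/*
 * Hypothetical input file:
 *
 *     = 0
 *     Tea
 *     Ten
 *     Toast
 */
enum ExampleKey {
    Unknown = 0,  /* from the "= 0" line, using the default label */
    Tea = 1,      /* remaining words count upwards from 1 */
    Ten = 2,
    Toast = 3
};

/* Hand-written, trie-shaped lookup (illustrative only): words that share
 * a prefix share the comparisons for that prefix. */
enum ExampleKey example_lookup(const char *s, size_t len)
{
    switch (len) {
    case 3:
        /* "Tea" and "Ten" share the prefix "Te". */
        if (s[0] != 'T' || s[1] != 'e')
            return Unknown;
        switch (s[2]) {
        case 'a': return Tea;
        case 'n': return Ten;
        }
        return Unknown;
    case 5:
        if (s[0] == 'T' && s[1] == 'o' && s[2] == 'a' &&
            s[3] == 's' && s[4] == 't')
            return Toast;
        return Unknown;
    }
    /* No word of this length exists. */
    return Unknown;
}
```

Every string that is not in the list, including the empty string, falls through to Unknown, which is the role of the value given on the "= value" line.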
- -C .c file
--code=.c file
- Generate code in the given file.
- -H header file
--header=header file
- Generate a header in the given file, containing a declaration of the hash
function and an enumeration.
- --enum-name=word
- The name of the enumeration.
- --function-name=word
- The name of the function.
- --label-prefix=word
- The prefix to use for labels.
- --label-uppercase
- Uppercase label names when normalizing them.
- --namespace=name
- Put the function and enum into a namespace (C++).
- --class=name
- Put the function and enum into a class (C++).
- --enum-class
- Generate an enum class instead of an enum (C++).
- --counter-name=name
- Use name for a counter that is set to the value of the last entry in the
enumeration plus 1. This can be useful for defining array sizes.
- --ignore-case
- Ignore case for words.
- --multi-byte=value
- Generate code reading multiple bytes at once. The value is a string of
digits, each one an exponent of two giving a read width in bytes to enable.
The default value is 320, meaning that 8-byte (2^3), 4-byte (2^2), and
single-byte (2^0) reads are enabled. Specify 0 to disable multi-byte reads
completely, or add 1 (2^1) if you also want to allow 2-byte reads. 2-byte
reads are disabled by default because they negatively affect performance on
older Intel architectures.
This generates code for both multi-byte and single-byte
reads, but enables the multi-byte reads only on GNU C compatible
compilers, because the following extensions are used:
- Byte-aligned integers
- We must be able to generate integers that are aligned to a single byte
using:
typedef uint64_t __attribute__((aligned (1))) triehash_uu64;
- Byte-order
- The macros __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__ must be
defined.
Multi-byte reads are forcibly disabled on platforms where the
macro __ARM_ARCH is defined and __ARM_FEATURE_UNALIGNED is
not defined, as there is a measurable overhead from emulating the unaligned
reads on ARM.
- --language=language
- Generate a file in the specified language. Currently known are 'C' and
'tree', the latter generating a tree.
- --include=header
- Add the header to the include statements of the header file. The value
must be surrounded by quotes or angle brackets for C code. May be
specified multiple times.
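To make the --multi-byte conditions concrete, here is a minimal sketch of an 8-byte comparison guarded in the way described above, with a byte-wise fallback. The names EXAMPLE_MULTI_BYTE, EXAMPLE_PACK8, example_uu64, and example_is_filename are hypothetical, the word "Filename" is an arbitrary 8-byte example, and the guard is a paraphrase of the documented conditions rather than the text triehash emits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Enable the multi-byte path only under the documented conditions:
 * a GNU C compatible compiler, the byte-order macros defined, and no
 * ARM target that lacks unaligned access. */
#if defined(__GNUC__) && defined(__BYTE_ORDER__) && \
    defined(__ORDER_LITTLE_ENDIAN__) && \
    (!defined(__ARM_ARCH) || defined(__ARM_FEATURE_UNALIGNED))
#define EXAMPLE_MULTI_BYTE 1
#else
#define EXAMPLE_MULTI_BYTE 0
#endif

#if EXAMPLE_MULTI_BYTE
/* Same shape as triehash_uu64, under a hypothetical name. */
typedef uint64_t __attribute__((aligned (1))) example_uu64;

/* Build the integer that an 8-byte load of the given bytes yields. */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define EXAMPLE_PACK8(a, b, c, d, e, f, g, h) \
    (((uint64_t)(a))       | ((uint64_t)(b) << 8)  | \
     ((uint64_t)(c) << 16) | ((uint64_t)(d) << 24) | \
     ((uint64_t)(e) << 32) | ((uint64_t)(f) << 40) | \
     ((uint64_t)(g) << 48) | ((uint64_t)(h) << 56))
#else /* assumed big-endian */
#define EXAMPLE_PACK8(a, b, c, d, e, f, g, h) \
    (((uint64_t)(a) << 56) | ((uint64_t)(b) << 48) | \
     ((uint64_t)(c) << 40) | ((uint64_t)(d) << 32) | \
     ((uint64_t)(e) << 24) | ((uint64_t)(f) << 16) | \
     ((uint64_t)(g) << 8)  | ((uint64_t)(h)))
#endif
#endif

/* Returns 1 if the len bytes at s spell "Filename", otherwise 0. */
int example_is_filename(const char *s, size_t len)
{
    if (len != 8)
        return 0;
#if EXAMPLE_MULTI_BYTE
    /* One unaligned 8-byte read replaces eight single-byte compares. */
    return *(const example_uu64 *)s ==
           EXAMPLE_PACK8('F', 'i', 'l', 'e', 'n', 'a', 'm', 'e');
#else
    /* Portable fallback for other compilers and platforms. */
    return memcmp(s, "Filename", 8) == 0;
#endif
}
```

The byte-aligned typedef is what allows the read to start at any address, and the byte-order macros decide how the eight characters are packed into the constant being compared.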
triehash is available under the MIT/Expat license; see the source
code for more information.
Julian Andres Klode <jak@jak-linux.org>