dictd - a dictionary database server
dictd is a server for the Dictionary Server Protocol
(DICT), a TCP transaction based query/response protocol that allows a client
to access dictionary definitions from a set of natural language dictionary
databases.
For security reasons, dictd drops root permissions after startup.
If user dictd exists on the system, the daemon runs as that user and
group dictd; otherwise it runs as user nobody, group
nobody or nogroup (depending on the operating system
distribution).
Since startup time is significant, the server is designed to run
continuously, and should not be run from inetd(8). (However,
with a fast processor, it is feasible to do so.)
Databases are distributed separately from the server.
By default, dictd assumes that the index files are sorted
alphabetically, and only alphanumeric characters from the 7-bit ASCII
character set are used for search. This default may be overridden by a
header in the data file. The only such headers implemented at this time are
"00-database-allchars", which tells dictd that non-alphanumeric
characters may also be used for search; "00-database-utf8", which
indicates that the database uses UTF-8 encoding; and
"00-database-8bit-new", which indicates that the database is encoded and
sorted according to a locale that uses an 8-bit encoding.
For many years, the Internet community has relied on the
"webster" protocol for access to natural language definitions. The
webster protocol supports access to a single dictionary and (optionally) to
a single thesaurus. In recent years, the number of publicly available
webster servers on the Internet has dramatically decreased.
Fortunately, several freely-distributable dictionaries and
lexicons have recently become available on the Internet. However, these
freely-distributable databases are not accessible via a uniform interface,
and are not accessible from a single site. They are often small and
incomplete individually, but would collectively provide an interesting and
useful database of English words. Examples include the Jargon file, the
WordNet database, MICRA's version of the 1913 Webster's Revised Unabridged
Dictionary, and the Free Online Dictionary of Computing. (See the DICT
protocol specification (RFC) for references.) Translating and non-English
dictionaries are also becoming available (for example, the FOLDOC dictionary
is being translated into Spanish).
The webster protocol is not suitable for providing access to a
large number of separate dictionary databases, and extensions to the current
webster protocol were not felt to be a clean solution to the dictionary
database problem.
The DICT protocol is designed to provide access to multiple
databases. Word definitions can be requested, the word index can be searched
(using an easily extended set of algorithms), information about the server
can be provided (e.g., which index search strategies are supported, or which
databases are available), and information about a database can be provided
(e.g., copyright, citation, or distribution information). Further, the DICT
protocol has hooks that can be used to restrict access to some or all of the
databases.
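As a rough illustration of the request side of the protocol, the sketch below formats DEFINE and MATCH command lines in the shape given by RFC 2229. The helper names are hypothetical and the quoting is simplified; this is not taken from any dictd client.

```python
def quote_word(word: str) -> str:
    """Wrap a word in double quotes, escaping backslashes and quotes (simplified)."""
    return '"' + word.replace("\\", "\\\\").replace('"', '\\"') + '"'


def define_command(database: str, word: str) -> str:
    """Build a DEFINE request line; DICT commands end with CRLF."""
    return f"DEFINE {database} {quote_word(word)}\r\n"


def match_command(database: str, strategy: str, word: str) -> str:
    """Build a MATCH request line for the given search strategy."""
    return f"MATCH {database} {strategy} {quote_word(word)}\r\n"


if __name__ == "__main__":
    # "wn" is the example database name used elsewhere in this page.
    print(repr(define_command("wn", "apple")))
    print(repr(match_command("*", "prefix", "app")))
```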
dictd(8) is a server that implements the DICT protocol.
Bret Martin implemented another server, and several people (including Bret
and myself) have implemented clients in a variety of languages.
- -V or
--version
- Display version information.
- --license
- Display copyright and license information.
- -h or --help
- Display help information.
- -v or --verbose or
-dverbose
- Be verbose.
- -c file or
--config file
- Specify configuration file. The default is /etc/dictd/dictd.conf ,
but may be changed in the defs.h file at compile time
(DICTD_CONFIG_FILE).
- -p port or
--port port
- Overrides the keyword port in Global Settings Specification
section of configuration file.
- -i or --inetd
- Communicate on standard input/output, suitable for use from inetd.
Although, due to its rather large startup time, this daemon was not
intended to run from inetd, with a fast processor it is feasible to do so.
This option also implies --fast-start.
- --pp prog
- Sets a preprocessor for the configuration file, such as m4 or cpp.
See the examples/dictd_complex.conf file from the distribution. By default,
the configuration file is parsed without a preprocessor.
- --depth
length
- Overrides the keyword depth in Global Settings Specification
section of configuration file.
- --delay
seconds
- Overrides the keyword delay in Global Settings Specification
section of configuration file.
- --facility
facility
- The same as syslog_facility keyword in Global Settings
Specification of configuration files.
- -f or --force
- Force the daemon to start even if an instance of the daemon is already
running. (This is of little value unless a non-default port is specified
with -p, since, if one instance is bound to a port, the second one
fails when it cannot bind to the port.)
- --limit
children
- Overrides the keyword limit in Global Settings Specification
section of configuration file.
- --listen-to
host
- Overrides the keyword listen_to in Global Settings
Specification section of configuration file.
- --address-family
family
- Overrides the keyword address_family in Global Settings
Specification section of configuration file.
- --locale
locale
- Overrides the keyword locale in Global Settings
Specification section of configuration file.
- -s
- The same as syslog keyword in Global Settings Specification
of configuration files.
- -L file or
--logfile file
- The same as log_file keyword in Global Settings
Specification of configuration files.
- --pid-file
file
- The same as pid_file keyword in Global Settings
Specification of configuration files.
- -m minutes or
--mark minutes
- Overrides the keyword timestamp in Global Settings
Specification section of configuration file.
- --default-strategy
strategy
- Overrides the keyword default_strategy in Global Settings
Specification section of configuration file.
- --without-strategy
strat1,strat2,...
- The same as without_strategy keyword in Global Settings
Specification of configuration files.
- --add-strategy
strategy_name:description
- The same as add_strategy keyword in Global Settings
Specification of configuration files.
- --fast-start
- The same as fast_start keyword in Global Settings
Specification of configuration files.
- --without-mmap
- The same as without_mmap keyword in Global Settings
Specification of configuration files.
- --stdin2stdout
- When applied with --inetd, each command obtained from stdin is output to
stdout. This option is useful for debugging.
- -l option or
--log option
- The same as log_option keyword in Global Settings
Specification of configuration files.
- -d option
- The same as debug_option keyword in Global Settings
Specification of configuration files.
- Introduction
- The configuration file defaults to /etc/dictd/dictd.conf but can be
specified on the command line with the -c option (see above).
The configuration file is read into memory at startup, and is not
referenced again by dictd unless a signal 1 (SIGHUP) is
received, which will cause dictd to reread the configuration
file.
The file is divided into sections. The Access Section should come
first, followed by the Database Section, and the User Section. The Database
Section is required; the others are optional, but they must be in the order
listed here.
- Syntax
- The following keywords are valid in a configuration file: access, allow,
deny, group, database, data, index, filter, prefilter, postfilter, name,
include, user, authonly, site. Keywords are case sensitive. String
arguments that contain spaces should be surrounded by double quotes.
Without quoting, strings may contain alphanumeric characters and _, -, .,
and *, but not spaces. Strings can be continued across lines. Within a
string, \", \\, \n, and \<NL> (a backslash followed by a newline) are
treated as a double quote, a backslash, a newline, and nothing,
respectively. Comments start with # and extend to the end of the line.
- Global Settings Section
- Access Section
- access { access
specification }
- This section contains access restrictions for the server and all of the
databases collectively. Per-database control is specified in the Database
Section.
- EXAMPLE:
- See examples/dictd3.conf file from the distribution.
- Database Section
- database
string { database specification }
- The string specifies the name of the database (e.g., wn or web1913). (This
is an arbitrary name selected by the administrator, and is not necessarily
related to the file name or any name listed in the data file. A short,
easy to type name is often selected for easy use with dict
-d.)
EXAMPLE: See examples/dictd*.conf files
from the distribution.
NOTE: If the files specified in the database specification
do not exist on the system, dictd may silently fail.
- database_virtual
string { virtual database specification }
- This section specifies the virtual database. The string specifies
the name of the database (e.g., en-ru or fren).
EXAMPLE: See examples/dictd_virtual.conf or
examples/dictd_complex.conf files from the distribution.
- database_plugin
string { plugin specification }
- This section specifies the plugin. The string specifies the name of
the database.
EXAMPLE: See examples/dictd_plugin_dbi.conf
or examples/dictd_complex.conf files from the distribution.
- database_mime
string { mime specification }
- Traditionally, databases created for dictd contained plain text
only, because dictd releases before 1.10.0 did not fully support the
OPTION MIME command (see RFC 2229). This section describes a special
database that behaves differently depending on whether the OPTION MIME
command was received from the client. That is, the database created by
this section returns either plain text or specially formatted content,
depending on whether the DICT client supports (or wants to receive)
MIME content. The string specifies the name of the database.
NOTE: This applies to the DEFINE command only. For the
MATCH, SHOW DB, SHOW STRAT, SHOW INFO, SHOW SERVER and HELP commands, the
returned text is merely preceded by an empty line.
EXAMPLE: See examples/dictd_mime.conf file
from the distribution.
- database_exit
- Excludes the databases that follow it from the '*' database. By default,
'*' means all available databases. See the 'examples/dictd_virtual.conf'
file for an example configuration.
NOTE: If you use 'virtual' dictionaries, you should use
this directive; otherwise the same dictionary will be searched twice.
- User Section
- user string
string
- The first string specifies the username, and the second string specifies
the shared secret for this username. When the AUTH command is used, the
client will provide the username and a hashed version of the shared
secret. If the shared secret matches, the user is said to have
authenticated, and will have access to databases whose access
specifications allow that user (by name, or by wildcard). If present, this
section must appear last in the configuration file. There may be many user
entries. The shared secret should be kept secret, as anyone who has access
to it can access the shared databases (assuming access is not denied by
domain name).
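The "hashed version of the shared secret" can be sketched as follows. This page does not spell out the hash; the sketch assumes the scheme described in RFC 2229, where the authentication string is the MD5 checksum of the msg-id from the server banner concatenated with the shared secret (as in APOP, RFC 1939). The function names are hypothetical.

```python
import hashlib


def auth_string(msg_id: str, shared_secret: str) -> str:
    """Hex MD5 digest of msg-id + shared secret (APOP-style, per RFC 2229)."""
    return hashlib.md5((msg_id + shared_secret).encode("utf-8")).hexdigest()


def auth_command(username: str, msg_id: str, shared_secret: str) -> str:
    """Build the AUTH request line sent by a client."""
    return f"AUTH {username} {auth_string(msg_id, shared_secret)}\r\n"


if __name__ == "__main__":
    # The msg-id and secret below are placeholders, not real credentials.
    print(repr(auth_command("u1", "<123.456@dict.example.org>", "s3cret")))
```

Because only the digest crosses the wire, the secret itself is never sent, but anyone who learns the secret can still compute a valid digest for any banner.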
- Access Specification
Access specifications may occur in the Access Section or
in the Database Section. The access specification will be described here.
For allow, deny, and authonly, a star (*) may be used as a wild
card that matches any number of characters. A question mark (?) may be used
as a wildcard that matches a single character. For example, 10.0.0.* and
*.edu are valid strings.
Further, a range of IP addresses and an IP address followed by a
netmask may be specified. For example, 10.0.0.0:10.0.0.255, 10.0.0.0/24, and
10.0.0.* all specify the same range of IP numbers. Notation cannot be
combined on the same line. If the notation does not make sense, access will
be denied by default. Use the --debug auth option to debug related
problems.
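The claim that the three notations describe the same addresses can be checked with a short sketch. It uses Python's ipaddress and fnmatch modules as stand-ins; dictd's own matcher is not shown in this page, and fnmatch additionally treats [...] as special, which the wildcard syntax described here does not mention. The helper name range_to_networks is hypothetical.

```python
import ipaddress
from fnmatch import fnmatchcase


def range_to_networks(spec: str):
    """Turn a 'first:last' range into the list of networks that cover it."""
    first, last = (ipaddress.ip_address(part) for part in spec.split(":"))
    return list(ipaddress.summarize_address_range(first, last))


cidr = ipaddress.ip_network("10.0.0.0/24")
ranged = range_to_networks("10.0.0.0:10.0.0.255")
# Every address in the /24 should also match the wildcard form.
wildcard_hits = [str(addr) for addr in cidr if fnmatchcase(str(addr), "10.0.0.*")]

if __name__ == "__main__":
    print(ranged == [cidr])
    print(len(wildcard_hits) == cidr.num_addresses)
```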
Note that these specifications take only one string per
specification line. However, you can have multiple lines of each type.
The syntax is as follows:
- allow
string
- The string specifies a domain name or IP address which is allowed access
to the server (in the Access Section) or to a database (in the Database
Section). Note that more than one string is not permitted on a single
"allow" line, but more than one "allow" line is
permitted in the configuration file.
- deny
string
- The string specifies a domain name or IP address which is denied access to
the server (in the Access Section) or to a database (in the Database
Section). Note that if reverse DNS is not working, then only the IP number
will be checked. Therefore, it is essential to deny networks based on IP
number, since a denial based on domain name may not always be
checked.
- authonly
string
- This form is only useful in the Access Section. The string specifies a
domain name or IP address which is allowed access to the server but not to
any of the databases. All commands are valid except DEFINE, MATCH, and
SHOW DB. More specifically AUTH is a valid command, and commands which
access the databases are not allowed.
- user
string
- This form is only useful in the Database Section. The string specifies a
username that is allowed to access this database after a successful AUTH
command is executed.
- Global Settings Specification
This section describes the following parameters:
- port
string_or_number
- Specifies the port or service name (e.g., 2628). The default is 2628, as
specified in the DICT Protocol RFC, but may be changed in the
defs.h file at compile time (DICT_DEFAULT_SERVICE).
- site
string
- Used to specify the filename for the site information file, a flat text
file which will be displayed in response to the SHOW SERVER command.
EXAMPLE: See examples/dictd4.conf file from the
distribution.
- site_no_banner
boolean
- By default, the SHOW SERVER command outputs the dictd
version and the operating system type. This option disables this.
- site_no_uptime
boolean
- By default, the SHOW SERVER command outputs the uptime of
dictd, the number of forks since startup, and the number of forks per
hour. This option disables this.
- site_no_dblist
boolean
- By default, the SHOW SERVER command outputs internal information about
databases, such as the number of headwords, the index size, and so on. This
option disables this.
- delay
number
- Specifies the number of seconds a client may be idle before the server
will close the connection. Idle time is defined to be the time the server
is waiting for input and does not include the time the server spends
searching the database. The default is 0 seconds (no limit), but may be
changed in the defs.h file at compile time
(DICT_DEFAULT_DELAY).
NOTE: Setting the delay option disables the
limit_time option. Only one of them (the last one specified in
dictd.conf) is in effect.
NOTE: Connections are closed without warning since no
provision for premature connection termination is specified in the DICT
protocol RFC.
- depth
number
- Specifies the queue length for listen(2), that is, the number of
pending socket connections which are queued by the operating system. Some
operating systems may silently limit this value to 5 (older BSD systems)
or 128 (Linux). The default is 10 but may be changed in the defs.h
file at compile time (DICT_QUEUE_DEPTH).
- limit_childs
number
- Specifies the number of daemons that may be running simultaneously. Each
daemon services a single connection. If the limit is exceeded, a
(serialized) connection will be made by the server process, and a response
code 420 (server temporarily unavailable) will be sent to the client. This
parameter should be adjusted to prevent the server machine from being
overloaded by dict clients, but should not be set so low that many clients
are denied useful connections. The default is 100, but may be changed in
the defs.h file at compile time (DICT_DAEMON_LIMIT_CHILDS).
- limit
number
- Synonym for limit_childs. For backward compatibility only.
- limit_matches
number
- Specifies the maximum number of matches that can be returned by a MATCH
query. Zero means no limit. The default is 2000.
- limit_definitions
number
- Specifies the maximum number of definitions that can be returned by a
DEFINE query. Zero means no limit. The default is 200.
- limit_time
number
- Specifies the number of seconds a client may talk to the server before the
server will close the connection. The default is 600 seconds (10 minutes),
but may be changed in the defs.h file at compile time
(DICT_DEFAULT_LIMIT_TIME).
NOTE: Setting the limit_time option disables the
delay option. Only one of them (the last one specified in dictd.conf)
is in effect.
NOTE: Connections are closed without warning since no
provision for premature connection termination is specified in the DICT
protocol RFC.
- limit_queries
number
- Specifies the number of queries (MATCH, DEFINE, SHOW DB, etc.) that a
client may send to the server before the server will close the connection.
Zero means no limit. The default is 2000, but may be changed in the
defs.h file at compile time (DICT_DEFAULT_LIMIT_QUERIES).
- timestamp
number
- How often a timestamp should be logged (in minutes). (This is effective
only if logging has been enabled with the -s or -L option, or with a
debugging option.)
- log_option
option
- Specify a logging option. This is effective only if logging has been
enabled with the -s or -L option or in the configuration file,
or logging to the console has been activated with a debugging option
(e.g., --debug nodetach). Only one option may be set with each
invocation of this option; however, multiple invocations of this option
may be made in the configuration file or on the dictd command line. For
instance:
dictd -s --log stats --log found --log notfound
is a valid command line, and sets three logging options.
Some of the more verbose logging options are used primarily for
debugging the server code, and are not practical for normal use.
- server
- Log server diagnostics. This is extremely verbose.
- connect
- Log all connections.
- stats
- Log all children terminations.
- command
- Log all commands. This is extremely verbose.
- client
- Log results of CLIENT command.
- found
- Log all words found in the databases.
- notfound
- Log all words not found in the databases.
- timestamp
- When logging to a file, use a full timestamp like that which syslog would
produce. Otherwise, no timestamp is made, making the files shorter.
- host
- Log name of foreign host.
- auth
- Log authentication failures.
- min
- Set a minimal number of options. If logging is activated (to a file, or
via syslog), and no options are set, then the minimal set of options will
be used. If options are set, then only those options specified will be
used.
- all
- Set all of the options.
- none
- Clear all of the options.
To facilitate location of interesting information in the log file,
entries are marked with initial letters indicating the class of the line
being logged:
- I
- Information about the server, connections, or termination statistics.
These lines are generally not designed to be parsed automatically.
- E
- Error messages.
- C
- CLIENT command information.
- D
- Definitions found in the databases searched.
- M
- Matches found in the database searched.
- N
- Matches which were not found in the databases searched.
- T
- Trace of exact line sent by client.
- A
- Authentication information.
To preserve anonymity of the client, do not use the
connect or host options. Clients may or may not send host
information using the CLIENT command, but this should be an option that is
selectable on the client side.
- debug_option
string
- Activate a debugging option. There are several, all of which are only
useful to developers. They are documented here for completeness. A list
can be obtained interactively by using -d with an illegal
option.
- verbose
- The same as -v or --verbose. Adds verbosity to other
options.
- scan
- Debug the scanner for the configuration file.
- parse
- Debug the parser for the configuration file.
- search
- Debug the character folding and binary search routines.
- init
- Report database initialization.
- port
- Log client-side port number to the log file.
- lev
- Debug Levenshtein search algorithm.
- auth
- Debug the authorization routines.
- nodetach
- Do not detach as a background process. Implies that a copy of the log file
will appear on the standard output.
- nofork
- Do not fork daemons to service requests. Be a single-threaded server. This
option implies nodetach, and is most useful for using a debugger to
find the point at which daemon processes are dumping core.
- alt
- Debugs altcompare in index.c.
- locale
string
- Specifies the locale used for searching. If no locale is specified, the
"C" locale is used. The locale used for the server should be the
same as that used for dictfmt when the database was built (specifically,
the locale under which the index was sorted). The locale should be
specified for both 8-bit and UTF-8 formats. If the locale name contains the
substring utf8 or utf-8, UTF-8 format is expected. Note that if your
database is not in 7-bit ASCII or UTF-8 format, then the dictd server will
not be compliant with RFC 2229.
NOTE: If utf-8 or 8-bit dictionaries are included in the
configuration file, and the appropriate --locale has not been specified,
dictd will fail to start. This implies that dictd will not run
with both utf-8 and 8-bit dictionaries in the configuration file.
- add_strategy
strategy_name description
- Adds the strategy strategy_name with the description
description. This new search strategy may be implemented with the
help of plugins. Both strategy_name and description are
strings.
- default_strategy
string
- Sets the server's default search strategy for the MATCH search type. The
compiled-in default is 'lev'. It is also possible to set the default
strategy per database. See the default_strategy keyword in the
Database Specification section.
- disable_strategy
string
- Disable specified strategies. By default all implemented search strategies
are enabled. It is also possible to disable strategies per database. See
disable_strategy keyword in Database specification
section.
- listen_to
host
- Local host name or IP address for bind. If unspecified or *, dictd
will bind to all interfaces. Otherwise, dictd will bind to this address
only.
- address_family
family
- If 4, address family is IPv4 (the default), if 6, address
family is IPv6.
- syslog
string
- Log using the syslog(3) facility.
- syslog_facility
string
- Specifies the syslog facility to use. The use of this option implies the
-s option to turn on logging via syslog. When the operating system
libraries support SYSLOG_NAMES, the names used for this option should be
those listed in syslog.conf(5). Otherwise, the following names are
used (assuming the particular facility is defined in the header files):
auth, authpriv, cron, daemon, ftp, kern, lpr, mail, news, syslog, user,
uucp, local0, local1, local2, local3, local4, local5, local6, and
local7.
- log_file
string
- Specify the file for logging. The filename specified is recomputed on each
use using the strftime(3) call. For example, a filename ending in
".%Y%m%d" will write to log files ending in the year, month, and
date that the log entry was written.
NOTE: If dictd does not have write
permission for this file, it will silently fail.
- pid_file
string
- The specified filename will be created to contain the process id of the
main dictd process. The default is /var/run/dictd.pid.
- fast_start
- By default, dictd creates an additional index (in memory) to make searches
faster. This option disables this behaviour and makes startup faster.
- without_mmap
- Do not use the mmap(2) function; read entire files into memory instead.
Use this option only if you know exactly what you are doing.
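The strftime(3) expansion described for the log_file keyword above can be previewed with a short sketch. Python's datetime.strftime stands in for the C call here, and the log path is a hypothetical example rather than a dictd default.

```python
from datetime import datetime


def expand_log_name(template: str, when: datetime) -> str:
    """Expand strftime-style escapes in a log file name for a given moment."""
    return when.strftime(template)


if __name__ == "__main__":
    # Hypothetical template ending in ".%Y%m%d", as in the log_file description.
    print(expand_log_name("/var/log/dictd.log.%Y%m%d", datetime(2024, 3, 5, 14, 30)))
```

With a template like this, entries written on different days land in different files, which gives a simple form of daily log rotation.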
- Database Specification
The database specification describes the database:
- data
string
- Specifies the filename for the flat text database. If the filename does
not begin with '.' or '/', it is prefixed with $datadir/. The value of
$datadir is set at compile time; you can change it by editing the Makefile
or running ./configure --datadir=...
- index
string
- Specifies the filename for the index file. Relative paths are resolved as
described above for the "data" option.
- index_suffix
string
- An optional index file that makes the 'suffix' search strategy faster
(binary search). It is generated by 'dictfmt_index2suffix'. Run
"dictfmt_index2suffix --help" for more information. Relative paths
are resolved as described above for the "data" option.
- index_word
string
- An optional index file that makes the 'word' search strategy faster
(binary search). It is generated by 'dictfmt_index2word'. Run
"dictfmt_index2word --help" for more information. Relative paths
are resolved as described above for the "data" option.
- prefilter
string
- Specifies the prefilter command. When a chunk of the compressed database
is read, it will be filtered with this filter before being decompressed.
This may be used to provide some additional compression that knows about
the data and can provide better compression than the LZ77 algorithm used
by zlib.
- postfilter
string
- Specifies the postfilter command. When a chunk of the compressed database
is read, it will be filtered with this filter before the offset and length
for the entry are used to access data. This is provided for symmetry with
the prefilter command, and may also be useful for providing additional
database compression.
- filter
string
- Specifies the filter command. After the entry is extracted from the
database, it will be filtered with this filter. This may be used to
provide formatting for the entry (e.g., for html).
- name
string
- Specifies the short name of the database (e.g., "1913
Webster's"). If the string begins with @, then it specifies the
headword to look up in the dictionary to find the short name of the
database. The default is "@00-database-short", but this may be
changed in the defs.h file at compile time
(DICT_SHORT_ENTRY_NAME).
- info
string
- Specifies information about the database. If the string begins with @,
then it specifies the headword to look up in the dictionary to find
information. The default is "@00-database-info", but this may be
changed in the defs.h file at compile time
(DICT_INFO_ENTRY_NAME).
- invisible
- Makes the dictionary invisible to clients, i.e., this dictionary will not
be recognized or shown by the DEFINE, MATCH, SHOW INFO, SHOW SERVER and
SHOW DB commands. If definitions or matches are found in an invisible
dictionary, the name of the enclosing visible virtual dictionary is
returned. Dictionaries '*' and '!' don't include invisible ones. NOTE:
Invisible dictionaries are completely inaccessible (and invisible) to the
client unless they are included in a virtual or MIME dictionary (see the
database_virtual or database_mime database sections).
- disable_strategy
string
- Disables the specified strategy for the database. This may be useful for
slow dictionaries (plugins) or for dictionaries included in virtual ones.
For an example, see the file examples/dictd_complex.conf.
- default_strategy
string
- Specifies the strategy which will be used if the database is accessed
using the strategy '.', i.e., this directive sets the preferred search
strategy per database. For example, instead of the strategy
lev, the strategy word may be preferred for databases
containing mainly multiword phrases rather than single words.
- Virtual Database Specification
The virtual database specification describes the virtual
database:
- database_list
string
- Specifies a list of databases which are included in the virtual
database. Database names are given in the string, separated by
commas.
- name
string
- Specifies the short name of the database. See database
specification
- info
string
- Specifies the information about database. See database
specification
- invisible
- Makes dictionary invisible to the clients. See database
specification
- disable_strategy
string
- Disables the specified strategy for database. See database
specification
- Plugin Specification
- plugin
string
- Specifies a filename of the plugin.
- data
string
- Specifies data for initializing plugin.
- name
string
- Specifies the short name of the database. See Database
Specification for more information.
- info
string
- Specifies the information about database. See Database
Specification for more information.
- invisible
- Makes dictionary invisible to the clients. See Database
Specification for more information.
- disable_strategy
string
- Disables the specified strategy for database. See Database
Specification for more information.
- default_strategy
string
- Sets the default search strategy for database. See Database
Specification for more information.
- Mime Specification
- dbname_nomime
string
- Specifies the real database name which is used in case OPTION MIME
command was NOT received from a client.
- dbname_mime
string
- Specifies the real database name which is used in case OPTION MIME
command WAS received from a client. A necessary MIME header is set while
creating a database. See dictfmt(1) for option
--mime-header.
- name
string
- Specifies the short name of the database. See Database
Specification for more information.
- info
string
- Specifies the information about database. See Database
Specification for more information.
- invisible
- Makes dictionary invisible to the clients. See Database
Specification for more information.
- disable_strategy
string
- Disables the specified strategy for database. See Database
Specification for more information.
- default_strategy
string
- Sets the default search strategy for database. See Database
Specification for more information.
- include
string
- The text of the file "string" (usually a database specification)
will be read as if it appeared at this location in the configuration file.
Nested includes are not permitted.
When a client connects, the global access specification is
scanned, in order, until a specification matches. If no access specification
exists, all access is allowed (e.g., the action is the same as if
"allow *" was the only item in the specification). For each item,
both the hostname and IP are checked. For example, consider the following
access specification:
allow 10.42.*
authonly *.edu
deny *
With this specification, all clients in the 10.42 network will be allowed access
to unrestricted databases; all clients from *.edu sites will be allowed to
authenticate, but will be denied access to all databases, even those which are
otherwise unrestricted; and all other clients will have their connection
terminated immediately. The 10.42 network clients can send an AUTH command and
gain access to restricted databases. The *.edu clients must send an AUTH
command to gain access to any databases, restricted or unrestricted.
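The first-match scan described above can be sketched in a few lines. This is an illustration of the rule order only, not dictd's implementation: fnmatch stands in for dictd's wildcard matcher, the function name is hypothetical, and the page does not say what happens when a non-empty list has no matching entry, so the sketch returns None in that case.

```python
from fnmatch import fnmatchcase

# The example access specification from the text, in configuration order.
RULES = [("allow", "10.42.*"), ("authonly", "*.edu"), ("deny", "*")]


def check_global_access(rules, hostname, ip):
    """Return the keyword of the first rule matching the hostname or the IP."""
    if not rules:
        return "allow"  # no access specification: same as "allow *"
    for keyword, pattern in rules:
        if fnmatchcase(ip, pattern):
            return keyword
        if hostname and fnmatchcase(hostname.lower(), pattern):
            return keyword
    return None  # behaviour for "nothing matched" is not stated in this page


if __name__ == "__main__":
    print(check_global_access(RULES, "pc.example.com", "10.42.1.5"))
    print(check_global_access(RULES, "host.example.edu", "192.0.2.7"))
    print(check_global_access(RULES, None, "198.51.100.9"))
```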
When the AUTH command is sent, the access list for each database
is scanned, in order, just as the global access list is scanned. However,
after authentication, the client has an associated username. For example,
consider the following access specification:
user u1
deny *.com
user u2
allow *
If the client authenticated as u1, then the client will have access to this
database, even if the client comes from a *.com site. In contrast, if the
client authenticated as u2, the client will only have access if it does not
come from a *.com site. In this case, the "user u2" is redundant,
since that client would also match "allow *".
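The per-database scan with usernames can be sketched the same way. Again this is a hedged illustration: the function name is hypothetical, fnmatch stands in for dictd's wildcard matcher, and the result for a list with no matching entry is left as None because this page does not define it.

```python
from fnmatch import fnmatchcase

# The example database access specification from the text, in order.
DB_RULES = [("user", "u1"), ("deny", "*.com"), ("user", "u2"), ("allow", "*")]


def check_database_access(rules, hostname, ip, username=None):
    """Scan rules in order; the first matching entry decides the outcome."""
    for keyword, arg in rules:
        if keyword == "user":
            # A user entry matches only an authenticated client.
            if username is not None and fnmatchcase(username, arg):
                return True
        elif fnmatchcase(ip, arg) or (
            hostname and fnmatchcase(hostname.lower(), arg)
        ):
            return keyword == "allow"
    return None  # no entry matched; outcome not stated in this page


if __name__ == "__main__":
    print(check_database_access(DB_RULES, "shop.example.com", "203.0.113.5", "u1"))
    print(check_database_access(DB_RULES, "shop.example.com", "203.0.113.5", "u2"))
    print(check_database_access(DB_RULES, "lib.example.org", "203.0.113.9", "u2"))
```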
Warning: Checks are performed for domain names and for IP
addresses. However, if reverse DNS for a specific site is not working, it is
possible that a domain name may not be available for checking. Make sure
that all denials use IP addresses. (And consider a future enhancement: if a
domain name is not available, should denials that depend on a domain name
match anything? This is the more conservative viewpoint, but it is not
currently implemented.)
The DICT standard specifies a few search algorithms that must be
implemented, and permits others to be supported on a server-dependent basis.
The following search strategies are supported by this server. Note that
all strategies are case insensitive. Most ignore non-alphanumeric,
non-whitespace characters.
- exact
- An exact match. This algorithm uses a binary search and is one of the
fastest search algorithms available.
- lev
- The Levenshtein algorithm (string edit distance of one). This algorithm
searches for all words which are within an edit distance of one from the
target word. An "edit" means an insertion, deletion, or
transposition. This is a rapid algorithm for correcting spelling errors,
since many spelling errors are within a Levenshtein distance of one from
the original word.
- prefix
- Prefix match. This algorithm also uses a binary search and is very
fast.
- nprefix
- Like prefix, but returns only the specified range of matches. For
example, when the prefix strategy would return 1000 matches, you can request
only 100 of them, skipping the first 800. The limits are given
in the query itself, like this: 800#100#app, where 800 is the skip count,
100 is the number of matches you want, and "app" is your
query. This strategy makes it possible to implement a DICT client with fast
autocompletion (although it is not trivial), as many standalone
dictionary programs do.
NOTE: If you access the dictionary "*"
(or a virtual one) with the nprefix strategy, the same range is applied to each
database in it separately, not globally to the matches found in all databases.
NOTE: If you access a non-English dictionary,
the returned matches may not be (and usually will not be) in alphabetical
order.
- re
- POSIX 1003.2 (modern) regular expression search. Modern regular
expressions are the ones used by egrep(1). These regular
expressions allow predefined character classes (e.g., [[:alnum:]],
[[:alpha:]], [[:digit:]], and [[:xdigit:]] are useful for this
application); use * to match a sequence of 0 or more matches of the previous
atom; use + to match a sequence of 1 or more matches of the previous
atom; use ? to match a sequence of 0 or 1 matches of the previous atom;
use ^ to match the beginning of a word and $ to match the end of a
word; and allow nested subexpressions and alternation with () and |. For
example, "(foo|bar)" matches all words that contain either
"foo" or "bar". To match these special characters literally,
they must be quoted with two backslashes (due to the quoting
characteristics of the server). Warning: Regular expression matches
can take 10 to 300 times longer than substring matches. On a busy server
with many databases, this can require more than 5 minutes of waiting
time, depending on the complexity of the regular expression.
- regexp
- Old (basic) regular expressions. These regular expressions don't support
|, +, or ?. Groups use escaped parentheses. While modern regular
expressions are generally easier to use, basic regular expressions have a
back reference feature. This can be used to match a second occurrence of
something that was already matched. For example, the following expression
finds all words that begin and end with the same three letters:
^\\(...\\).*\\1$
Note the use of the double backslashes to escape the special
characters. This is required by the DICT protocol string specification (a
single backslash quotes the next character -- we use two to get a single
backslash through to the regular expression engine). Warning: Note
that the use of backtracking is even slower than the use of general regular
expressions.
- soundex
- The Soundex algorithm, a classic algorithm for finding words that sound
similar to each other. The algorithm encodes each word using the first
letter of the word and up to three digits. Since the first letter is
known, this search is relatively fast, and it is sometimes good for
correcting spelling errors when the Levenshtein algorithm doesn't
help.
- substring
- Match a substring anywhere in the headword. This search strategy uses a
modified Boyer-Moore-Horspool algorithm. Since it must search the whole
index file, it is not as fast as the exact and prefix matches.
- suffix
- Suffix match. This search strategy also uses a modified
Boyer-Moore-Horspool algorithm, and is as fast as the substring search. If
the optional index_suffix string file is listed in the configuration file
this search is much faster.
- word
- Match any single word, even if part of a multi-word entry. If the optional
index_word string file is listed in the configuration file this search
strategy works much faster.
- first
- Match the first word that begins a multi-word entry.
- last
- Match the last word that ends a multi-word entry. If the optional
index_suffix string file is listed in the configuration file this search
strategy works much faster.
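The exact, prefix, and nprefix strategies listed above can be illustrated with a small Python sketch over a sorted, in-memory list of headwords. It is not the server's implementation: INDEX and the function names are hypothetical, case insensitivity is approximated with lower(), and the sketch assumes the headwords are plain lowercase ASCII already in sorted order.

```python
from bisect import bisect_left

# Hypothetical sorted headword list standing in for an index file.
INDEX = ["app", "apple", "applet", "apply", "apt", "banana"]


def exact(index, word):
    """Binary search for a headword equal to word (case-insensitive)."""
    word = word.lower()
    i = bisect_left(index, word)
    return [index[i]] if i < len(index) and index[i] == word else []


def prefix(index, pre):
    """Binary search for the first candidate, then walk while it matches."""
    pre = pre.lower()
    matches = []
    for headword in index[bisect_left(index, pre):]:
        if not headword.startswith(pre):
            break
        matches.append(headword)
    return matches


def nprefix(index, query):
    """Handle a query of the form skip#count#prefix, e.g. 800#100#app."""
    skip, count, pre = query.split("#", 2)
    start = int(skip)
    return prefix(index, pre)[start:start + int(count)]
```

For example, prefix(INDEX, "app") yields four headwords, and nprefix(INDEX, "1#2#app") skips the first of them and returns the next two. The sketch accepts only the full skip#count#prefix form shown in the text.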
Databases for dictd are distributed separately. A database
consists of two files. One is a flat text file, the other is the index.
The flat text file contains dictionary entries (or any other
suitable data), and the index contains tab-delimited tuples consisting of
the headword, the byte offset at which this entry begins in the flat text
file, and the length of the entry in bytes. The offset and length are
encoded in base 64, using the 64-character subset of
International Alphabet IA5 discussed in RFC 1421 (printable encoding) and
RFC 1522 (base64 MIME). Encoding the offsets in base 64 saves considerable
space when compared with the usual base 10 encoding, while still permitting
tab characters (ASCII 9) to be used for delimiting fields in a record. Each
record ends with a newline (ASCII 10), so the index file is human
readable.
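As an illustration of the index format described above, the following Python sketch decodes one index record. It assumes that each character of an encoded number is a single base-64 digit, most significant digit first, using the RFC 1421 alphabet; that reading is an assumption, since the text above does not spell out the digit order. The names b64_number and parse_index_line are hypothetical.

```python
# The 64-character alphabet from RFC 1421, in value order 0..63.
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)
B64_VALUE = {ch: i for i, ch in enumerate(B64_ALPHABET)}


def b64_number(text):
    """Decode a number written as base-64 digits, most significant first."""
    n = 0
    for ch in text:
        n = n * 64 + B64_VALUE[ch]
    return n


def parse_index_line(line):
    """Split a tab-delimited record into (headword, offset, length)."""
    headword, offset, length = line.rstrip("\n").split("\t")[:3]
    return headword, b64_number(offset), b64_number(length)
```

Under these assumptions, the record "apple<TAB>BA<TAB>Q" describes an entry that starts at byte offset 64 of the flat text file and is 16 bytes long.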
Some headwords have a special meaning for dictd:
- 00-database-info
- Contains the information about the database that is returned by the SHOW INFO
command, unless it is specified in the configuration file.
- 00-database-short
- Contains the short name of the database that is returned by the SHOW DB
command, unless it is specified in the configuration file. See dictfmt -s.
- 00-database-url
- The URL from which the original dictionary sources were obtained. See
dictfmt -u. This headword is not used by dictd.
- 00-database-utf8
- Present if the dictionary is encoded using UTF-8. See dictfmt --utf8.
- 00-database-8bit-new
- Present if the dictionary is encoded using an 8-bit character set (neither
ASCII nor UTF-8). See dictfmt --locale.
The flat text file may be compressed using gzip(1) (not
recommended) or dictzip(1) (highly recommended). Optimal speed will
be obtained using an uncompressed file. However, the gzip compression
algorithm works very well on plain text, and can result in space savings
typically between 60 and 80%. Using a file compressed with gzip(1) is
not recommended, however, because random access on the file can only be
accomplished by serially decompressing the whole file, a process which is
prohibitively slow. dictzip(1) uses the same compression algorithm
and file format as does gzip(1), but provides a table that can be
used to randomly access compressed blocks in the file. The use of 50-64kB
blocks for compression typically degrades compression by less than 10%,
while maintaining acceptable random access capabilities for all data in the
file. As an added benefit, files compressed with dictzip(1) can be
decompressed with gzip(1) or zcat(1). (Note: recompressing a
dictzip'd file using, for example, znew(1) will destroy the
random access characteristics of the file. Always compress data files using
dictzip(1).)
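To make the cost of plain gzip(1) access concrete, here is a small Python sketch that fetches one entry by offset and length from a gzip-compressed file. It uses only the standard gzip module and does not use the dictzip random-access table, so the seek has to decompress and discard everything before the requested offset. The file name demo.dict.gz and the function read_entry are hypothetical.

```python
import gzip
import os
import tempfile


def read_entry(path, offset, length):
    """Return length bytes starting at offset of the uncompressed data."""
    with gzip.open(path, "rb") as f:
        # A forward seek in a gzip stream decompresses and discards all
        # data before the offset, which is what makes large files slow.
        f.seek(offset)
        return f.read(length)


# Build a tiny compressed "flat text file" with two 12-byte lines.
data = b"first entry\nsecond entry\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.dict.gz")
    with gzip.open(path, "wb") as f:
        f.write(data)
    entry = read_entry(path, 12, 12)
```

For a 25-byte file the serial scan is invisible, but the work grows with the offset of the entry, which is the reason given above for preferring dictzip(1).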
SIGHUP causes dictd to reread the configuration file and
reinitialize the databases.
SIGUSR1 causes dictd to unload the databases. dictd then
returns status 420 (instead of 220). To load the databases again, send the
SIGHUP signal. Because database files are mapped with mmap(2), it is
impossible to update them while dictd is running. So, if you need to
update database files and reread the configuration file, first send the
SIGUSR1 signal to dictd to unload the databases, then update the files, and
then send the SIGHUP signal to load them again.
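The update sequence above (unload, update files, reload) might be scripted roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the pid file path is the one listed under the files for this daemon, and the sketch does not wait for the server to finish unloading, which a real script would probably need to do.

```python
import os
import signal


def read_pid(path="/var/run/dictd.pid"):
    """Read the daemon's process id from its pid file."""
    with open(path) as f:
        return int(f.read().strip())


def update_databases(pid, update_files, send_signal=os.kill):
    """Unload the databases, run update_files(), then load them again.

    send_signal is a parameter only so the sequence can be exercised
    without a running daemon; by default it is os.kill.
    """
    send_signal(pid, signal.SIGUSR1)  # unload databases
    update_files()                    # replace the database files on disk
    send_signal(pid, signal.SIGHUP)   # reread configuration, load databases
```

A typical call would be update_databases(read_pid(), copy_new_files), where copy_new_files is whatever routine replaces the database files.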
The main source files for the dictd server and the
dictzip compression program were written by Rik Faith
(faith@dict.org) and are distributed under the terms of the GNU General
Public License. If you need to distribute under other terms, write to the
author.
The main libraries used by these programs (zlib, regex, libmaa)
are distributed under different terms, so you may be able to use the
libraries for applications which are incompatible with the GPL -- please see
the copyright notices and license information that come with the libraries
for more information, and consult with your attorney to resolve these
issues.
The regular expression searches do not ignore non-whitespace,
non-alphanumeric characters as do the other searches. In practice, this
isn't much of a problem.
Conformance of regular expressions (used by the 're' and 'regexp'
search strategies) to ERE and BRE depends on the library you build dictd with.
Whether the 're' and 'regexp' strategies support UTF-8 also depends on the
library you build dictd with.
- /etc/dictd/dictd.conf
- dictd configuration file
- /usr/sbin/dictd
- dictd daemon itself
- /var/run/dictd.pid
- File for storing pid of dictd daemon
- /usr/share/dictd
- The default directory for dictd databases (.index and .dict[.dz]
files)