GNU Dico Reference

NAME

dicod.conf - GNU dictionary server configuration file.

DESCRIPTION

The file /etc/dicod.conf contains configuration settings and database definitions for the GNU dictionary server dicod(8). The server reads this file once, upon startup, and uses the settings until it is shut down or the HUP signal is delivered, in which case previous configuration settings are discarded and the file is re-read.

NOTE

This manpage is a short description of the dicod.conf configuration file. For a detailed discussion, including examples and usage recommendations, refer to the GNU Dico Manual available in texinfo format. If the info reader and GNU Dico documentation are properly installed on your system, the command

info dico

should give you access to the complete manual.

You can also view the manual using the info mode in emacs(1), or find it in various formats online at

http://www.gnu.org.ua/software/dico/manual

If any discrepancies occur between this manpage and the GNU Dico Manual, the latter shall be considered the authoritative source.

LEXICAL STRUCTURE

There are three classes of lexical tokens: words, quoted strings, and separators. Blanks, tabs, newlines and comments, collectively called white space, are ignored except as they serve to separate tokens. Some white space is required to separate otherwise adjacent keywords and values.

Words

A word is a sequence of letters, digits, and any of the following characters: _, -, ., /, @, *, :, [, ].

Strings

A quoted string is any sequence of characters enclosed in double-quotes ("). A backslash appearing within a quoted string introduces an escape sequence, which is replaced with a single character according to the following rules:

	Sequence	Expansion	ASCII
	\\	\	134
	\"	"	042
	\a	audible bell	007	
	\b	backspace	010
	\f	form-feed	014
	\n	new line	012
	\r	carriage return	015
	\t	horizontal tabulation	011
	\v	vertical tabulation	013

In addition, the sequence \newline is removed from the string. This makes it possible to split long strings over several physical lines, e.g.:

"a long string may be\
 split over several lines"

If the character following a backslash is not one of those specified above, the backslash is ignored and a warning is issued.

Two or more adjacent quoted strings are concatenated, which gives another way to split long strings over several lines to improve readability. The following fragment produces the same result as the example above:

"a long string may be"
" split over several lines"

A here-document is a special construct that makes it possible to introduce strings of text containing embedded newlines.

The <<word construct instructs the parser to read all the following lines up to the line containing only word, with possible trailing blanks. Any lines thus read are concatenated together into a single string. For example:

<<EOT
A multiline
string
EOT

The body of a here-document is interpreted the same way as a double-quoted string, unless word is preceded by a backslash (e.g. <<\EOT) or enclosed in double-quotes, in which case the text is read as is, without interpretation of escape sequences.

If word is prefixed with - (a dash), then all leading tab characters are stripped from input lines and the line containing word. Furthermore, if - is followed by a single space, all leading whitespace is stripped from them. This makes it possible to indent here-documents in a natural fashion. For example:

<<- TEXT
    The leading whitespace will be
    ignored when reading these lines.
TEXT

It is important that the terminating delimiter be the only token on its line. The only exception to this rule is allowed if a here-document appears as the last element of a statement. In this case a semicolon can be placed on the same line with its terminating delimiter, as in:

help-text <<-EOT
    A sample help text.
EOT;

Comments

The usual comment styles are supported:

C style: /* */

C++ style: // to end of line

Unix style: # to end of line

Pragmatic comments are similar to the usual single-line comments, except that they cause some changes in the way the configuration is parsed. Pragmatic comments begin with a # sign and end with the next physical newline character.

#include <FILE>
#include FILE: Include the contents of the file FILE. Both forms are equivalent. The FILE must be an absolute file name.
#include_once <FILE>
#include_once FILE: Same as #include, except that, if the FILE has already been included, it will not be included again.
#line num
#line num "FILE": This line causes the parser to believe, for purposes of error diagnostics, that the line number of the next source line is given by num and the current input file is named by FILE. If the latter is absent, the remembered file name does not change.
# num "FILE": This is a special form of the #line statement, understood for compatibility with the C preprocessor.
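As a rough sketch of the pragmatic comments above, a main configuration file might pull in a shared fragment as shown below. The file name is hypothetical and is not defined anywhere in this page:

/* Hypothetical file name; it must be absolute. */
#include_once </etc/dicod.d/databases.conf>

Using #include_once rather than #include here means that a second reference to the same file from another fragment would not include it again.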

STATEMENTS

A simple statement consists of a keyword and value separated by any amount of whitespace. Some statements take more than one value. A simple statement is terminated with a semicolon (;).

The following is a simple statement:

pidfile /var/run/dicod.pid;

See below for a list of valid simple statements.

A value can be one of the following:

number: A number is a sequence of decimal digits.
boolean: A boolean value is one of the following: yes, true, t or 1, meaning true, and no, false, nil, 0 meaning false.
word
quoted string
list: A comma-separated list of values, enclosed in parentheses.
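The value types listed above might appear in statements such as the following sketch. The statements themselves are described later in this page; the particular values are illustrative assumptions only:

max-children 32;                  # number
transcript no;                    # boolean
log-tag dicod;                    # word
initial-banner-text "Welcome";    # quoted string
capability (mime, xversion);      # list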

Block Statement

A block statement introduces a logical group of statements. It consists of a keyword, followed by an optional value, called a tag, and a sequence of statements enclosed in curly braces, as shown in the example below:

acl global {
   allow all from 198.51.100.0/24;
   deny all;
}

The closing curly brace may be followed by a semicolon, although this is not required.

SERVER SETTINGS

user NAME: Run with the privileges of this user. The argument is either a user name, or UID prefixed with a plus sign.
group LIST: If the user statement is present, dicod will drop all supplementary groups and switch to the principal group of that user. Sometimes, however, it may be necessary to retain one or more supplementary groups. For example, this might be necessary to access dictionary databases. The group statement retains the supplementary groups listed in LIST. Each group can be specified either by its name or by its GID number, prefixed with a plus sign (+), e.g.:

user nobody;
group (man, dict, +88);

This statement is ignored if no user statement is present or if dicod is running in inetd mode.

mode daemon|inetd: Sets server operation mode.
listen LIST: Specify the IP addresses and ports to listen on in daemon mode. By default, dicod will listen on port 2628 on all existing interfaces.
Elements of LIST can have the following forms:

HOST:PORT

Specifies an IP (version 4 or 6) socket to listen on. The HOST part is either an IPv4 address in "dotted-quad" notation, or an IPv6 address in square brackets, or a host name. In the latter case, dicod will listen on all IP addresses corresponding to its A and AAAA DNS records.

The PORT part is either a numeric port number or a symbolic service name from the /etc/services file.

Either of the two parts may be omitted. If HOST is omitted, the server will listen on all interfaces. If PORT is omitted, the default port 2628 will be used.

inet://HOST:PORT, inet4://HOST:PORT

Listen on IPv4 socket. HOST is either an IP address or a host name. In the latter case, dicod will start listening on all IP addresses from the A records for this host.

Either HOST or PORT (but not both) can be omitted. Missing HOST defaults to IPv4 addresses on all available network interfaces, and missing PORT defaults to 2628.

inet6://HOST:PORT

Listen on IPv6 socket. HOST is either an IPv6 address in square brackets, or a host name. In the latter case, dicod will start listening on all IP addresses from the AAAA records for this host.

Either HOST or PORT (but not both) can be omitted. Missing HOST defaults to IPv6 addresses on all available network interfaces, and missing PORT defaults to 2628.

FILENAME, unix://FILENAME

Specifies the name of a UNIX socket to listen on. FILENAME must be an absolute file name of the socket.

pidfile STRING: Store PID of the master process in this file. Default is /var/run/dicod.pid.
max-children NUMBER: Sets maximum number of subprocesses that can run simultaneously. This is equivalent to the number of clients that can simultaneously use the server. The default is 64.
inactivity-timeout NUMBER: Sets inactivity timeout to the NUMBER of seconds. The server disconnects automatically if the remote client has not sent any command within this number of seconds. Setting timeout to 0 disables inactivity timeout (the default).
This statement along with max-children allows you to control the server load.
shutdown-timeout NUMBER: When the master server is shutting down, wait this number of seconds for all children to terminate. Default is 5 seconds.
identity-check BOOLEAN: Enable identification check using AUTH protocol (RFC 1413). The received user name or UID can be shown in access log using the %l conversion (see below).
ident-keyfile STRING: Use encryption keys from the named file to decrypt AUTH replies encrypted using DES.
ident-timeout NUMBER: Set timeout for AUTH input/output operation to NUMBER of seconds. Default timeout is 3 seconds.
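Putting several of the server settings together, a daemon-mode configuration could look roughly like the sketch below. The addresses, socket file name and numeric limits are hypothetical values chosen for illustration, not recommendations:

mode daemon;
# One IPv4 socket, one IPv6 socket and one UNIX socket (all hypothetical).
listen (198.51.100.7:2628,
        inet6://[2001:db8::1]:2628,
        unix:///var/run/dicod.sock);
pidfile /var/run/dicod.pid;
# At most 16 simultaneous clients, each dropped after 5 idle minutes.
max-children 16;
inactivity-timeout 300;
shutdown-timeout 10;

With these values, the server load is bounded in two ways: no more than 16 subprocesses run at once, and an idle client cannot hold a slot for longer than 300 seconds.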

AUTHENTICATION SETTINGS

The authentication database is defined as:

user-db URL {
    # Additional configuration options.
    options STRING;
    # Name of the password resource.
    password-resource RESOURCE;
    # Name of the resource returning user group information.
    group-resource RESOURCE;
}

The URL consists of the following parts (square brackets denoting optional ones):

TYPE://[[USER[:PASSWORD]@]HOST]/PATH[PARAMS]

where:

TYPE: Database type. Two types are supported: text and ldap.
USER: User name, if necessary to access the database.
PASSWORD: User password, if necessary to access the database.
HOST: Domain name or IP address of a machine running the database.
PATH: A path to the database. The exact meaning of this element depends on the database protocol. See the texinfo documentation.
PARAMS: A list of protocol-dependent parameters. Each parameter is of the form KEYWORD=NAME. Multiple parameters are separated with semicolons.

The following statements can appear within the user-db block:

options STRING: Pass additional options to the underlying mechanism. The argument is treated as an opaque string and passed to the authentication open procedure verbatim. Its exact meaning depends on the type of the database.
password-resource ARG: A database resource which returns the user's password.
group-resource ARG: A database resource which returns the list of groups this user is a member of.

The exact semantics of the database resource depends on the type of database being used. For flat text databases, it means the name of a text file that contains these data; for LDAP databases, the resource is the filter string, etc. Please refer to the GNU Dico Manual, subsection 4.3.3 Authentication for a detailed discussion.
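A minimal sketch of a flat text user database follows. It assumes that the PATH part of the URL names a directory and that each resource names a text file inside it; this page does not spell that out, so check the GNU Dico Manual before relying on it. The directory and file names are hypothetical:

# Assumption: /etc/dicod.db is a directory holding the two files below.
user-db text:///etc/dicod.db {
    password-resource passwd;
    group-resource group;
}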

SASL AUTHENTICATION

The SASL authentication is available if the server was compiled with GNU SASL. It is configured using the following statement:

sasl {
    # Disable SASL mechanisms listed in MECH.
    disable-mechanism MECH;
    # Enable SASL mechanisms listed in MECH.
    enable-mechanism MECH;
    # Set service name for GSSAPI and Kerberos.
    service NAME;
    # Set realm name for GSSAPI and Kerberos.
    realm NAME;
    # Define groups for anonymous users.
    anon-group GROUPS;
}

disable-mechanism MECH: Disable SASL mechanisms listed in MECH, which is a list of names.
enable-mechanism MECH: Enable SASL mechanisms listed in MECH, which is a list of names.
service NAME: Sets the service name for GSSAPI and Kerberos mechanisms.
realm NAME: Sets the realm name.
anon-group LIST: Declares the list of user groups considered anonymous.
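As an illustration only, a sasl block might look like the sketch below. PLAIN and LOGIN are standard SASL mechanism names, but whether a given GNU SASL build offers them, and the exact spelling dicod expects, are assumptions here. The service, realm and group names are hypothetical:

sasl {
    # Assumed mechanism names; verify against your GNU SASL build.
    disable-mechanism (PLAIN, LOGIN);
    service dico;
    realm example.org;
    anon-group (guests);
}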

ACCESS CONTROL LISTS

Define an ACL:

acl NAME {
    DEFINITION...
}

The parameter NAME assigns a unique name to that ACL. This name will be used by other configuration statements to refer to that ACL (see SECURITY SETTINGS, and Database Visibility).

Each DEFINITION is:

allow|deny [all|authenticated|group GROUPLIST] [acl NAME] [from ADDRLIST]

A definition starting with allow allows access to the resource, and the one starting with deny denies it.

The next part controls what users have access to the resource:

all: All users (the default).
authenticated: Only authenticated users.
group GROUPLIST: Authenticated users which are members of at least one of the groups listed in GROUPLIST.

The acl part refers to an already defined ACL.

The from keyword declares that the client IP must be within the ADDRLIST in order for the definition to apply. Elements of ADDRLIST are:

any: Matches any client address.
IP address: Matches if the request comes from the given IP (both IPv4 and IPv6 are allowed).
ADDR/NETLEN: Matches if the first NETLEN bits of the client IP address are equal to ADDR. The network mask length, NETLEN, must be an integer number between 0 and 32 for IPv4, and between 0 and 128 for IPv6. The address part, ADDR, is as described above.
ADDR/NETMASK: The specifier matches if the result of logical AND between the client IP address and NETMASK equals ADDR. The network mask must be specified in an IP address (either IPv4 or IPv6) notation.
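The definition syntax above can be sketched with the following ACL. The ACL name, group names and address ranges are hypothetical:

acl trusted {
    # Netmask form: 198.51.100.0/255.255.255.0 covers the same
    # addresses as 198.51.100.0/24.
    allow authenticated from 198.51.100.0/255.255.255.0;
    # Prefix-length form with an IPv6 network.
    allow group (staff, editors) from 2001:db8::/32;
    deny all from any;
}

For instance, a client at 198.51.100.42 matches the first definition's address part, because 198.51.100.42 AND 255.255.255.0 gives 198.51.100.0.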

SECURITY SETTINGS

connection-acl NAME: Use ACL NAME to control incoming connections. The ACL itself must be defined before this statement. Using the group clause in this ACL makes no sense, because the authentication itself is performed only after the connection has been established.
show-sys-info NAME: Controls whether to show system information in reply to SHOW SERVER command. The information will be shown only if ACL NAME allows it.
visibility-acl NAME: Sets name of the ACL that controls visibility of all databases.
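A short sketch of how these statements could be combined is given below. The ACL name and network are hypothetical; note that the ACL is defined before it is referenced and uses no group clause:

acl local-net {
    allow all from 198.51.100.0/24;
    deny all;
}
connection-acl local-net;
show-sys-info local-net;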

LOGGING AND DEBUGGING

log-tag STRING: Prefix syslog messages with this string. By default, the program name is used.
log-facility STRING: Sets the syslog facility to use. Allowed values are: user, daemon, auth, authpriv, mail, cron, local0 through local7 (case-insensitive), or a decimal facility number.
log-print-severity BOOLEAN: Prefix diagnostics messages with a string identifying their severity.
transcript BOOLEAN: Controls the transcript of user sessions.

ACCESS LOG

GNU Dico provides a feature similar to Apache's CustomLog, which keeps a log of MATCH and DEFINE requests.

access-log-file STRING: Sets access log file name.
access-log-format STRING: Defines the format string. Its argument can contain literal characters, which are copied into the log file verbatim, and format specifiers, i.e. special sequences beginning with %, which are replaced in the log file as shown in the table below:

%%

The percent sign.

%a

Remote IP address.

%A

Local IP address.

%B

Size of response in bytes.

%b

Size of response in bytes in CLF format, i.e. a dash rather than a 0 when no bytes are sent.

%C

Remote client (from the CLIENT command).

%D

The time taken to serve the request, in microseconds.

%d

Request command verb in abbreviated form, suitable for use in URLs, i.e. d for DEFINE, and m for MATCH.

%h

Remote host.

%H

Request command verb (DEFINE or MATCH).

%l

Remote logname (from identd(1), if supplied). This will return a dash unless identity-check statement is set to true.

%m

The search strategy.

%p

The canonical port of the server serving the request.

%P

The PID of the child that served the request.

%q

The database from the request.

%r

Full request.

%{N}R

The Nth token from the request (N is 0-based).

%s

Reply status. For multiple replies, the form %s returns the status of the first reply, while %>s returns that of the last reply.

%t

Time the request was received in the standard Apache format, e.g.:

  [04/Jun/2008:11:05:22 +0300]

%{FORMAT}t

The time, in the form given by FORMAT, which should be a valid strftime(3) format string. The standard %t format is equivalent to

  [%d/%b/%Y:%H:%M:%S %z]

%T

The time taken to serve the request, in seconds.

%u

Remote user from AUTH command.

%v

The host name of the server serving the request.

%V

Actual host name of the server (in case it was overridden in configuration).

%W

The word from the request.

The absence of access-log-format statement is equivalent to the following:

  access-log-format "%h %l %u %t \"%r\" %>s %b";
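A custom format can be assembled from the specifiers listed above. The following sketch logs the client address, user, time, command verb, database, strategy, search word, final status and the time taken in microseconds. The log file name is hypothetical:

  access-log-file /var/log/dicod/access.log;
  access-log-format "%a %u %t %H %q %m \"%W\" %>s %D";

As in the default format, double quotes inside the format string are escaped with a backslash, because the format is itself a quoted string.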

GENERAL SETTINGS

initial-banner-text TEXT

Display TEXT in the textual part of the initial server reply.

hostname STRING

Sets the hostname. By default it is determined automatically.

The server hostname is used, among other places, in the initial reply after the 220 code, and may also be displayed in the access log file using the %v escape (see ACCESS LOG).

server-info TEXT

Sets the server description to be shown in reply to the SHOW SERVER command.

It is common for TEXT to use the here-document syntax, e.g.:

  server-info <<EOT
    Welcome to the FOO dictionary service.
    Contact <dict@foo.example.org> if you have questions or
    suggestions.
  EOT;

help-text TEXT

Sets the text to be displayed in reply to the HELP command.

The default reply displays a list of commands understood by the server with a short description of each.

If TEXT begins with a plus sign, it will be appended to the default reply.

default-strategy NAME

Sets the name of the default matching strategy. By default, Levenshtein matching is used, which is equivalent to the statement default-strategy lev;
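A few of the general settings might be combined as in the sketch below. The host name and banner text are hypothetical, and the choice of prefix as the default strategy is only an illustration:

hostname dict.example.org;
initial-banner-text "Example dictionary service";
default-strategy prefix;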

CAPABILITIES

capability LIST: Requests additional capabilities from the LIST.

Capabilities are certain server features that can be enabled or disabled at the system administrator's will. The following capabilities are defined:

auth: The AUTH command is supported. See the section AUTHENTICATION SETTINGS for its configuration.
mime: The OPTION MIME command is supported. Notice that RFC 2229 requires all servers to support that command, so you should always specify this capability.
xversion: The XVERSION command is supported. It is a GNU extension that displays the dicod implementation and version number.
xlev: The XLEV command is supported. This command allows the remote party to set and query maximal Levenshtein distance for the lev matching strategy.

The capabilities set using this directive are displayed in the initial server reply, and their descriptions are added to the HELP command output (unless specified otherwise by the help-text statement).
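For instance, a server that enables every capability listed above would use a statement along these lines:

capability (auth, mime, xversion, xlev);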

DATABASE MODULES

A database module is an external piece of software designed to handle a particular format of dictionary databases. This piece of software is built as a shared library that dicod loads at run time.

A handler is an instance of the database module loaded by dicod and configured for a specific database or a set of databases.

Database handlers are defined using the following block statement:

load-module NAME {
    command CMD;
}

The load-module statement creates an instance of a database module. The NAME argument specifies a unique name which will be used by subsequent parts of the configuration to refer to this handler. The command line for this handler is supplied with the command statement. It must begin with the name of the module (without the library suffix) and can contain any additional arguments. If the module name is not an absolute file name, the module will be searched in the module load path.

For example:

load-module dict {
   command "dictorg dbdir=/var/dicodb";
}

A simplified form of this statement:

    load-module NAME;

is equivalent to:

    load-module NAME {
        command NAME;
    }

A module load path is an internal list of directories which dicod scans in order to find a loadable file name specified in the command statement. By default the search order is as follows:

1. Optional prefix search directories specified in the prepend-load-path statement (see below);
2. GNU Dico module directory /usr/lib/x86_64-linux-gnu/dico;
3. Additional search directories specified in the module-load-path statement (see below);
4. The value of the environment variable LTDL_LIBRARY_PATH;
5. The system dependent library search path (e.g. on GNU/Linux it is defined by the file /etc/ld.so.conf and the environment variable LD_LIBRARY_PATH).

The values of LTDL_LIBRARY_PATH and LD_LIBRARY_PATH must be colon-separated lists of absolute directory names.

In each of these directories, dicod first attempts to find and load the given filename. If this fails, it tries to append the following suffixes to it:

1. the libtool archive suffix .la;
2. the suffix used for native dynamic libraries on the host platform, e.g., .so, .sl, etc.

module-load-path LIST: Add directories from LIST to the end of the module load path.
prepend-load-path LIST: Add directories from LIST to the beginning of the module load path.
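As a sketch, the two statements could be used as follows. The directory names are hypothetical:

# Searched before the GNU Dico module directory.
prepend-load-path (/usr/local/lib/dico);
# Searched after the GNU Dico module directory.
module-load-path (/opt/dico/modules, /srv/dico/modules);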

DATABASES

database {
    name WORD;
    description STRING;
    info TEXT;
    languages-from LANGLIST;
    languages-to LANGLIST;
    handler NAME;
    visibility-acl NAME;
    mime-headers TEXT;
}

name STRING: Sets the name of this database (a single word). This name will be used to identify this database in DICT commands.
handler STRING: Specifies the handler name for this database and optional arguments for it. This handler must be previously defined using the load-module statement (see above).
description STRING: Supplies a short description, to be shown in reply to the SHOW DB command. The STRING may not contain newlines.
info STRING: Defines a full description of the database. This description is shown in reply to the SHOW INFO command. It is usually a multi-line text, so it is common to use here-document syntax.
content-type STRING: Sets the content type of the reply (for use in MIME headers).
content-transfer-encoding VALUE: Sets transfer encoding to use when sending MIME replies for this database. VALUE is one of: base64, quoted-printable.
visibility-acl NAME: Sets the name of the ACL that controls the visibility of that database.
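The sketch below ties a database to a handler defined with load-module. The database name and texts are hypothetical. The database= argument passed to the handler is an assumption about the dictorg module and is not documented in this page; consult the module documentation for the arguments it actually accepts:

load-module dict {
    command "dictorg dbdir=/var/dicodb";
}

database {
    name eng-swa;
    # Assumed dictorg argument: database=NAME.
    handler "dict database=eng-swa";
    description "English-Swahili dictionary";
    info <<EOT
A hypothetical sample database.
EOT;
}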

STRATEGIES AND SEARCHES

A default search is a MATCH request with * or ! as the database argument. The former means search in all available databases, and the latter means search in all databases until a match is found.

Default searches can be quite expensive and can cause considerable strain on the server. For example, the command MATCH * prefix "" returns all entries from all available databases, which would consume a lot of resources both on the server and on the client side.

To minimize harmful effects from such potentially dangerous requests, the following statement makes it possible to limit the use of certain strategies in default searches:

strategy NAME {
    deny-all BOOL;
    deny-word CONDLIST;
    deny-length-lt NUMBER;
    deny-length-le NUMBER;
    deny-length-gt NUMBER;
    deny-length-ge NUMBER;
    deny-length-eq NUMBER;
    deny-length-ne NUMBER;
}

deny-all BOOL: Unconditionally deny the use of this strategy in default searches.
deny-word LIST: Deny this strategy if the search word matches one of the words from LIST.
deny-length-lt NUMBER: Deny if length of the search word is less than NUMBER.
deny-length-le NUMBER: Deny if length of the search word is less than or equal to NUMBER.
deny-length-gt NUMBER: Deny if length of the search word is greater than NUMBER.
deny-length-ge NUMBER: Deny if length of the search word is greater than or equal to NUMBER.
deny-length-eq NUMBER: Deny if length of the search word is equal to NUMBER.
deny-length-ne NUMBER: Deny if length of the search word is not equal to NUMBER.

For example, the following statement denies the use of prefix strategy in default searches if its argument is an empty string:

strategy prefix {
    deny-length-eq 0;
}
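A stricter policy can be sketched in the same way. The threshold of 3 below is an arbitrary illustrative value:

strategy prefix {
    # Refuse very short prefixes in default searches.
    deny-length-lt 3;
}

Under this setting, MATCH * prefix ab would be denied, since the search word has length 2, while MATCH * prefix abc would be allowed. Because the restriction applies to default searches only, a request that names a specific database is not affected by it.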

TUNING

While tuning your server, it is often necessary to get timing information which shows how much time is spent serving certain requests. This can be achieved using the following configuration directive:

timing BOOLEAN: Provide timing information after successful completion of an operation.

This information is displayed after replies to the following requests: MATCH, DEFINE, and QUIT. The format is:

[d/m/c = ND/NM/NC RTr UTu STs]

where:

ND: Number of processed define requests.
NM: Number of processed match requests.
NC: Number of comparisons made. This value may be inaccurate if the underlying database module does not provide such information.
RT: Real time spent serving the request.
UT: Time in user space spent serving the request.
ST: Time in kernel space spent serving the request.

You can also add timing information to your access log files. See the %T conversion in section ACCESS LOG.
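Both approaches can be sketched together as follows. The access log format shown is just one possible layout built from the documented specifiers:

timing yes;
access-log-format "%h %t \"%r\" %>s %T";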

COMMAND ALIASES

Aliases allow a string to be substituted for a word when it is used as the first word of a command. The daemon maintains a list of aliases that are created using the alias configuration file statement:

alias WORD COMMAND: Creates a new alias.

Aliases may be recursive, i.e. the first word of COMMAND may refer to another alias. To prevent endless loops, recursive expansion is stopped if the first word of the replacement text is identical to an alias expanded earlier.

Aliases are useful to facilitate manual interaction with the server, as they allow the administrator to create abbreviations for some frequently typed commands. For example, the following alias creates a new command d which is equivalent to DEFINE *:

alias d DEFINE "*";
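Recursive aliases, as described above, could be set up along the following lines. The alias names are arbitrary choices for illustration:

alias d DEFINE;
alias da d "*";
alias df d "!";

With these definitions, the first word of da is replaced by d "*", whose first word is in turn replaced by DEFINE, so a command starting with da should end up as a DEFINE on the * database, and one starting with df as a DEFINE on the ! database.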

AUTHORS

Sergey Poznyakoff

BUG REPORTS

Report bugs to <bug-dico@gnu.org.ua>.

COPYRIGHT

Copyright © 2008-2018 Sergey Poznyakoff
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.

August 20, 2018

GNU DICO