rbldnsd(8) | System Manager's Manual | rbldnsd(8) |
rbldnsd - DNS daemon suitable for running DNS-based blocklists
rbldnsd options zone:dataset...
rbldnsd is a small DNS-protocol daemon designed to handle queries to DNS-based IP-listing or NAME-listing services. Such services are a simple way to share/publish a list of IP addresses or (domain) names which are "listed" for some reason, for example in order to be able to refuse a service to a client which is "listed" in some blocklist.
rbldnsd is not a general-purpose nameserver. It will answer to A and TXT (and SOA and NS if such RRs are specified) queries, and has limited ability to answer to some other types of queries.
rbldnsd tries to handle data from two different perspectives: given a set (or several) of "listed entries" (e.g. IP address ranges or domain names), it builds and serves a DNS zone. Note the two are not the same: a list of spammers' IPs is NOT a DNS zone, but may be represented and used as such, provided that some additional information necessary to build a complete DNS zone (e.g. NS and SOA records, maybe A records necessary for http to work) is available. In this respect, rbldnsd is very different from general-purpose nameservers such as BIND or NSD: rbldnsd operates with datasets (sets of entries - IP addresses or domain names, logically grouped together), while general-purpose nameservers operate with zones. The way rbldnsd operates may be somewhat confusing to BIND experts.
For rbldnsd, the building block is a dataset: e.g., a set of insecure/abusable hosts (IP addresses), a set of network ranges that belong to various spam operations (IP ranges), domain names that belong to spammers (RHSBL) and so on. Usually, different kinds of information are placed into separate files, for easy maintenance. From a number of such datasets, rbldnsd constructs a number of DNS zones as specified on the command line. A single dataset may be used for several zones, and a single zone may be constructed from several datasets.
rbldnsd will answer queries to DNS zones specified on the command line as a set of zone specifications. Each zone specification consists of a zone basename, a dataset type and a comma-separated list of files that form a given dataset: zone:type:file,file,...
Several zones may be specified on the command line, so that rbldnsd will answer queries to any of them. Also, a single zone may be specified several times with different datasets, so it is possible to form a zone from a combination of several different datasets. The same dataset may be reused for several zones too (and in this case, it will be read into memory only once).
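The zone specification syntax can be illustrated with a short Python sketch. It is not part of rbldnsd; the function name parse_zone_specs and the idea of treating identical type:file,file,... parts as one shared dataset are assumptions made for illustration only.

```python
from collections import defaultdict


def parse_zone_specs(specs):
    """Group zone:type:file,file,... arguments by zone name."""
    zones = defaultdict(list)
    for spec in specs:
        # Split on the first two colons only: zone, dataset type, file list.
        zone, dataset_type, files = spec.split(":", 2)
        zones[zone].append((dataset_type, tuple(files.split(","))))
    return dict(zones)


specs = [
    "dialups.bl.ex.com:ip4set:dialups",
    "spam.bl.ex.com:ip4set:spammers",
    "bl.ex.com:ip4set:dialups",
    "bl.ex.com:ip4set:spammers",
]
zones = parse_zone_specs(specs)

# Assumption: identical (type, files) pairs stand for one shared dataset.
unique_datasets = {ds for datasets in zones.values() for ds in datasets}
print(len(zones), "zones,", len(unique_datasets), "datasets")
```

Here bl.ex.com is specified twice with different datasets, and each dataset appears in two zones.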
There are several dataset formats available, each suitable and optimized (in terms of memory, speed and ease of use) for a specific task. Available dataset types may be grouped into the following categories:
The following options may be specified:
Each line written to statsfile has the form:
timestamp zone:qtot:qok:qnxd:bin:bout zone:...
where timestamp is unix time (seconds since epoch), zone is the name of the base zone, qtot is the total number of queries received, qok is the number of positive replies, qnxd is the number of NXDOMAIN replies, bin is the total number of bytes read from the network (excluding IP/UDP overhead and dropped packets), and bout is the total number of bytes written to the network. There are as many such tuples as there are zones, plus one extra total tuple at the end, with zone being "*", like:
1234 bl1.ex:10:5:4:311:432 bl2.ex:24:13:7:248:375 *:98:35:12:820:987
Note the total values may be larger than the sum of per-zone values, due to queries made against unlisted zones, or bad/broken packets.
Rbldnsd will write a bare timestamp to statsfile when it is starting up, shutting down, or when statistic counters are reset after receiving a SIGUSR2 signal (see below), to indicate the points where the counters start again from zero.
By default, rbldnsd writes absolute counter values into statsfile (number of packets (bytes) since startup or last reset). statsfile may be prefixed with plus sign (+), in which case rbldnsd will write delta values, that is, number of packets or bytes since last write, or number of packets (bytes) per unit of time ("incremental" mode, hence the "+" sign).
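A statsfile line in the format described above could be read back with a few lines of Python. This is a minimal sketch, not a tool shipped with rbldnsd; the names ZoneStats and parse_stats_line are hypothetical.

```python
from typing import NamedTuple


class ZoneStats(NamedTuple):
    zone: str
    qtot: int
    qok: int
    qnxd: int
    bin: int
    bout: int


def parse_stats_line(line):
    """Split one statsfile line into (timestamp, [ZoneStats, ...])."""
    fields = line.split()
    timestamp = int(fields[0])
    stats = []
    for field in fields[1:]:
        # Peel the five counters off the right-hand side of each tuple.
        zone, *counters = field.rsplit(":", 5)
        stats.append(ZoneStats(zone, *map(int, counters)))
    return timestamp, stats


timestamp, stats = parse_stats_line(
    "1234 bl1.ex:10:5:4:311:432 bl2.ex:24:13:7:248:375 *:98:35:12:820:987"
)
total = stats[-1]  # the "*" tuple
print(timestamp, total.qtot, sum(s.qtot for s in stats[:-1]))
```

A bare timestamp line (written at startup, shutdown or counter reset) yields an empty list of tuples.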
Dataset files are text files which are interpreted depending on type specified in command line. Empty lines and lines starting with hash character (#) or semicolon (;) are ignored, except for a special case outlined below in section titled "Special Entries".
A (comma-separated) list of files in a dataset specification (in type:file,file,...) is interpreted as if all files were logically combined into one single file.
When compiled with zlib support, rbldnsd is able to read gzip-compressed data files. So, every file in dataset specification can be compressed with gzip(1), and rbldnsd will read such a file decompressing it on-the-fly. This feature may be turned off by specifying -C option.
rbldnsd is designed to service a DNSBL, where each entry has a single A record and an optional TXT record associated with it. rbldnsd allows specifying the A value and TXT template either for each entry individually, or as a default A value and TXT template pair for a group of entries. See section "Resulting A values and TXT templates" below for a way to specify them.
If a line starts with a dollar sign ($), hash character and a dollar sign (#$), semicolon and a dollar sign (;$) or colon and a dollar sign (:$), it is interpreted in a special way, regardless of dataset type (this is the one exception where a line starting with a hash character is not ignored - to be able to use zone files for both rbldnsd and for DJB's rbldns). The following keywords, following a dollar sign, are recognized:
This constraint is active for the dataset it is specified in, and can be overwritten (by a subsequent $MAXRANGE statement) with a smaller value, but cannot be increased.
$MAXRANGE4 /24
$MAXRANGE4 256
A set of IP addresses or CIDR address ranges, together with A and TXT resulting values. IP addresses are specified one per line, by an IP address prefix (initial octets), complete IP address, CIDR range, or IP prefix range (two IP prefixes or complete addresses delimited by a dash, inclusive). Examples, to specify 127.0.0.0/24:
127.0.0.0/24
127.0.0
127/24
127-127.0.0
127.0.0.0-127.0.0.255
127.0.0.1-255
to specify 127.16.0.0-127.31.255.255:
127.16.0.0-127.31.255.255
127.16.0-127.31.255
127.16-127.31
127.16-31
127.16.0.0/12
127.16.0/12
127.16/12
Note that in a prefix range, the last boundary is completed with all-ones (255), not with all-zeros as is done for the first boundary and for a prefix alone. In prefix ranges, if the last boundary is only one octet (127.16-31), it is treated as a "suffix", that is, as the value of the last specified octet of the first boundary prefix (127.16.0-31 is treated as 127.16.0.0-127.16.31.255, i.e. 127.16.0.0/19).
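The completion rules for prefixes, CIDR ranges and prefix ranges can be sketched in Python as follows. This is only an illustration of the rules as described here, not rbldnsd's parser; the function names are hypothetical, and masking off host bits in a CIDR entry is an assumption (the daemon may treat such entries differently).

```python
import ipaddress


def _octets(text):
    parts = [int(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"bad IPv4 prefix: {text!r}")
    return parts


def _to_int(octets, fill):
    # Complete a prefix to four octets using the given fill value.
    value = 0
    for octet in octets + [fill] * (4 - len(octets)):
        value = (value << 8) | octet
    return value


def parse_range(entry):
    """Return (first, last) addresses as integers for one range entry."""
    if "/" in entry:
        prefix, bits = entry.split("/")
        mask = (0xFFFFFFFF << (32 - int(bits))) & 0xFFFFFFFF
        first = _to_int(_octets(prefix), 0) & mask  # assumption: drop host bits
        return first, first | (~mask & 0xFFFFFFFF)
    if "-" in entry:
        low_text, high_text = entry.split("-")
        low, high = _octets(low_text), _octets(high_text)
        if len(high) == 1:
            # A one-octet last boundary replaces the last octet of the first.
            high = low[:-1] + high
        return _to_int(low, 0), _to_int(high, 255)
    octets = _octets(entry)
    return _to_int(octets, 0), _to_int(octets, 255)


def show(entry):
    first, last = parse_range(entry)
    return f"{ipaddress.IPv4Address(first)}-{ipaddress.IPv4Address(last)}"


for entry in ("127.0.0", "127/24", "127-127.0.0", "127.16-31", "127.16.0-31"):
    print(entry, "=>", show(entry))
```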
After an IP address range, A and TXT values for a given entry may be specified. If none are given, the default values in the current scope (see below) apply. If a value starts with a colon, it is interpreted as a pair of A record and TXT template, delimited by a colon (:127.0.0.2:This entry is listed). If a value does not start with a colon, it is interpreted as a TXT template only, with the A record defaulting to the default A value in the current scope.
An IP address range may be followed by a comment character (either hash character (#) or semicolon (;)), e.g.:
127/8 ; loopback network
In this case all characters up to the end of line are ignored, and default A and TXT values will be used for this IP range.
Every IP address that fits within any of specified ranges is "listed", and rbldnsd will respond to reverse queries against it within specified zone with positive results. In contrast, if an entry starts with an exclamation sign (!), this is an exclusion entry, i.e. corresponding address range is excluded from being listed (and any value for this record is ignored). This may be used to specify large range except some individual addresses, in a compact form.
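As an illustration, the following Python sketch shows how a listed address maps to a query name under the usual DNSBL convention (octets reversed and prepended to the zone name), and how exclusion entries take precedence over ranges. The function names and the zone bl.ex.com are assumptions for the example; the sketch accepts only CIDR or single-address notation, not the full range syntax.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the name a client queries to check ip against zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, ranges, exclusions):
    """An address is listed if it is in some range and in no exclusion."""
    addr = ipaddress.IPv4Address(ip)
    if any(addr in ipaddress.IPv4Network(x) for x in exclusions):
        return False
    return any(addr in ipaddress.IPv4Network(r) for r in ranges)


ranges = ["127.0.0.0/8"]   # entry: 127/8
exclusions = ["127.0.0.1"]  # entry: !127.0.0.1
print(dnsbl_query_name("127.0.0.2", "bl.ex.com"))
print(is_listed("127.0.0.2", ranges, exclusions))
print(is_listed("127.0.0.1", ranges, exclusions))
```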
If a line starts with a colon (:), this line specifies the default A value and TXT template to return (see below) for all subsequent entries up to the end of the current file. If no default entry is specified, and no value is specified for a given record, rbldnsd will return 127.0.0.2 for matching A queries and no record for matching TXT queries. If a TXT record template is specified and contains occurrences of the dollar sign ($), every such occurrence is replaced with the IP address in question, so a single TXT template may be used to e.g. refer to a webpage with additional information about a specific IP address.
Set of IP4 CIDR ranges with corresponding (A, TXT) values. This dataset is similar to ip4set, but uses a different internal representation. It accepts CIDR ranges only (not a.b.c.d-e.f.g.h), and allows for the specification of A/TXT values on a per CIDR range basis. (If multiple CIDR ranges match a query, the value for longest matching prefix is returned.) Exclusions are supported too.
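The longest-matching-prefix rule can be shown with a small Python sketch. It uses a linear scan over a dictionary rather than a trie, so it illustrates only the lookup result, not the internal representation; the function name and the sample table are hypothetical.

```python
import ipaddress


def longest_match(ip, table):
    """Return the value of the most specific CIDR range containing ip."""
    addr = ipaddress.ip_address(ip)
    best = None
    for cidr, value in table.items():
        net = ipaddress.ip_network(cidr)
        if addr.version == net.version and addr in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, value)
    return None if best is None else best[1]


table = {
    "10.0.0.0/8": "wide range",
    "10.1.2.0/23": "narrow range",
}
print(longest_match("10.1.3.7", table))
print(longest_match("10.9.9.9", table))
print(longest_match("192.0.2.1", table))
```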
This dataset is not particularly memory-efficient for storing many single IP addresses: it uses about 50% more memory than the ip4set dataset in that case. The ip4trie dataset is better adapted, however, for listing CIDR ranges (whose lengths are not a multiple of 8 bits).
"trivial" ip4set: a set of single IP addresses (one per line), with the same A+TXT template. This dataset type is more efficient than ip4set (in both memory usage and access times), but has an obvious limitation. It is intended for DNSBLs like DSBL.org, ORDB.org and similar, where each entry uses the same default A+TXT template. This dataset uses only half the memory for the same list of IP addresses compared to ip4set.
Set of IP6 CIDR ranges. This is the IP6 equivalent of the ip4trie dataset. It allows the specification of individual A/TXT values for each CIDR range and supports exclusions. Compressed ("::") ip6 notation is supported.
Example zone data:
# Default A and TXT template values
:127.0.1.2: Listed, see http://example.com/lookup?$
# A listing, note that trailing :0s can be omitted
2001:21ab:c000/36
# /64 range with non-default A and TXT values
2001:21ab:def7:4242 :127.0.1.3: This one smells funny
# compressed notation
2605:6001:42::/52
::1 # localhost
!2605:6001:42::bead # exclusion
"Trivial" ip6 dataset: a set of /64 IP6 CIDR ranges (one per line), all sharing a single A+TXT template. Exclusions of single IP6 (/128) addresses are also supported. This dataset type is quite memory-efficient — it uses about 40% of the memory that the ip6trie dataset would use — but has obvious limitations.
This dataset wants the /64s listed as four ip6 words, for example:
2001:20fe:23:41ed
abac:adab:ad00:42f
Exclusions are denoted with a leading exclamation mark. You may also use compressed "::" notation for excluded addresses. E.g.:
!abac:adab:ad00:42f:face:0f:a:beef
!abac:adab:ad00:42f::2
Set of (possibly wildcarded) domain names with associated A and TXT values. Similar to ip4set, but instead of IP addresses, the data consists of domain names (not in reverse form). One domain name per line, possibly starting with a wildcard (either star-dot (*.) or just a dot). An entry starting with an exclamation sign is an exclusion. The default value for all subsequent lines may be specified by a line starting with a colon.
Wildcards are interpreted as follows:
This dataset type may be used instead of ip4set, provided all CIDR ranges are expanded and reversed (but in this case, TXT template will be expanded differently).
Generic type, simplified bind-style format. Every record should be on one line (line continuations are not supported), and should be specified completely (i.e. all domain names in values should be fully-qualified, entry name may not be omitted). No wildcards are accepted. Only A, TXT, and MX records are recognized. TTL value may be specified before record type. Examples:
# bl.ex.com
# specify some values for current zone
$NS 0 ns1.ex.com ns2.ex.com
# record with TTL
www 3000 A 127.0.0.1
about TXT "ex.com combined blocklist"
This is a special dataset that stores no data by itself but acts as a container for several other datasets of any type except the combined type itself. The data file contains an optional common section, where various specials such as $NS, $SOA and $TTL (see above) are recognized, and a series of sections, each of which defines one (nested) dataset and several subzones of the base zone for which this dataset should be consulted. A new (nested) dataset starts with a line
$DATASET type[:name] subzone subzone...
and all subsequent lines up to the end of the current file or to the next $DATASET line are interpreted as part of a dataset of type type, with optional name (name is used for logging purposes only, and the whole ":name" (without quotes or square brackets) part is optional). Note that combined datasets cannot be nested. Every subzone is always relative to the base zone name specified on the command line. If a subzone is specified as the single character "@", the dataset will be connected to the base zone itself.
This dataset type aims to simplify subzone maintenance, in order to be able to include several subzones in one file for easy data transfer, atomic operations and to be able to modify list of subzones on remote secondary nameservers.
Example of a complete dataset that contains subzone `proxies' with a list of open proxies, subzone `relays' with a list of open relays, subzone `multihop' with output IPs of multihop open relays, and the base zone itself includes proxies and relays but not multihops:
# common section
$NS 1w ns1.ex.com ns2.ex.com
$SOA 1w ns1.ex.com admin.ex.com 0 2h 2h 1w 1h
# list of open proxies,
# in `proxies' subzone and in base zone
$DATASET ip4set:proxy proxies @
:2:Open proxy, see http://bl.ex.com/proxy/$
127.0.0.2
127.0.0.10
# list of open relays,
# in `relays' subzone and in base zone
$DATASET ip4set:relay relays @
:3:Open relay, see http://bl.ex.com/relay/$
127.0.0.2
127.0.2.10
# list of outputs of multistage relays,
# in `multihop' subzone only
$DATASET ip4set:multihop-relay multihop
:4:Multihop open relay, see http://bl.ex.com/relay/$
127.0.0.2
127.0.9.12
# for the base zone and all subzones,
# include several additional records
$DATASET generic:common proxies relays multihop @
@ A 127.0.0.8
www A 127.0.0.8
@ MX 10 mx.ex.com
# the above results in having the following records
# (provided that the base zone specified is bl.ex.com):
# proxies.bl.ex.com A 127.0.0.8
# www.proxies.bl.ex.com A 127.0.0.8
# relays.bl.ex.com A 127.0.0.8
# www.relays.bl.ex.com A 127.0.0.8
# multihop.bl.ex.com A 127.0.0.8
# www.multihop.bl.ex.com A 127.0.0.8
# bl.ex.com A 127.0.0.8
# www.bl.ex.com A 127.0.0.8
Note that $NS and $SOA values apply to the base zone only, regardless of their placement in the file, unlike the $TTL values and $n substitutions, which may be both global and local for a given (sub-)dataset.
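The layout of a combined data file (a common section followed by $DATASET sections, each attached to one or more subzones) can be sketched in Python. This only illustrates the file structure described above and is not how rbldnsd loads the file; the names split_combined and zones_for are hypothetical, and comment handling is omitted.

```python
def split_combined(lines):
    """Split a combined data file into a common section and nested datasets."""
    common, datasets = [], []
    current = common
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # skip empty lines
        if fields[0] == "$DATASET":
            dataset_type, _, name = fields[1].partition(":")
            current = []
            datasets.append({
                "type": dataset_type,
                "name": name or None,  # the ":name" part is optional
                "subzones": fields[2:],
                "lines": current,
            })
        else:
            current.append(raw.strip())
    return common, datasets


def zones_for(dataset, base_zone):
    """Expand subzone names relative to the base zone; "@" is the base zone."""
    return [base_zone if sub == "@" else f"{sub}.{base_zone}"
            for sub in dataset["subzones"]]


sample = [
    "$NS 1w ns1.ex.com ns2.ex.com",
    "$DATASET ip4set:proxy proxies @",
    ":2:Open proxy, see http://bl.ex.com/proxy/$",
    "127.0.0.2",
    "$DATASET ip4set:multihop-relay multihop",
    "127.0.9.12",
]
common, datasets = split_combined(sample)
for ds in datasets:
    print(ds["type"], ds["name"], zones_for(ds, "bl.ex.com"))
```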
In all zone file types except generic, A values and TXT templates are specified as follows:
:127.0.0.2:Blacklisted: http://example.com/bl?$
If a line starts with a colon, it specifies default A and TXT for all subsequent entries in this dataset. Similar format is used to specify values for individual records, with the A value (enclosed by colons) being optional:
127.0.0.2 :127.0.0.2:Blacklisted: http://example.com/bl?$
or, without specific A value:
127.0.0.2 Blacklisted: http://example.com/bl?$
The two parts of a line, delimited by the second colon, specify the A and TXT record values. Both are optional. By default (either if no default line is specified, or there is no IP address within that line), rbldnsd will return 127.0.0.2 as the A record. The 127.0.0 prefix of the A value may be omitted, so the above example may be simplified to:
:2:Blacklisted: http://example.com/bl?$
There is no default TXT value, so rbldnsd will not return anything for TXT queries if TXT isn't specified.
When an A value is specified for a given entry but the TXT template is omitted, there are two cases that are interpreted differently, depending on whether there is a second colon (:) after the A value. If there is no second colon, the default TXT value for this scope will be used. In contrast, when the second colon is present, no TXT record will be generated at all. All possible cases are outlined in the following example:
# default A value and TXT template
:127.0.0.2:IP address $ is listed
# 127.0.0.4 will use default A and TXT
127.0.0.4
# 127.0.0.5 will use specific A and default TXT
127.0.0.5 :5
# 127.0.0.6 will use specific A and no TXT
127.0.0.6 :6:
# 127.0.0.7 will use default A and specific TXT
127.0.0.7 IP address $ running an open relay
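The four cases can also be expressed as a small Python function. This is a sketch of the rules as described in this section, not rbldnsd's own parser; the name parse_value is hypothetical, and treating any A value without a dot as the last octet of 127.0.0.x is an assumption.

```python
def parse_value(value, default_a="127.0.0.2", default_txt=None):
    """Return (A value, TXT template or None) for the text after an entry."""
    value = value.strip()
    if not value:
        return default_a, default_txt      # nothing given: both defaults
    if not value.startswith(":"):
        return default_a, value            # TXT template only
    a_part, second_colon, txt = value[1:].partition(":")
    a_part = a_part.strip()
    if not a_part:
        a_value = default_a
    elif "." not in a_part:
        a_value = "127.0.0." + a_part      # assumption: short form is last octet
    else:
        a_value = a_part
    if not second_colon:
        return a_value, default_txt        # ":5"  -> default TXT
    return a_value, (txt.strip() or None)  # ":6:" -> no TXT at all


default_a, default_txt = parse_value(":127.0.0.2:IP address $ is listed")
for text in ("", ":5", ":6:", "IP address $ running an open relay"):
    print(repr(text), parse_value(text, default_a, default_txt))
```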
In a TXT template, references to substitution variables are replaced with the values of those variables. In particular, a single dollar sign ($) is replaced by the listed entry (the IP address in question for IP-based datasets and the domain name for domain-based datasets). $n-style constructs, where n is a single digit, are replaced by the substitution variable $n defined for this dataset in the current scope (see section "Special Entries" above). To specify a dollar sign as-is, use $$.
For example, the following lines:
$1 See http://www.example.com/bl
$2 for details
127.0.0.2 $1/spammer/$ $2
127.0.0.3 $1/relay/$ $2
127.0.0.4 This spammer wants some $$$$. $1/$
will result in the following text to be generated:
See http://www.example.com/bl/spammer/127.0.0.2 for details
See http://www.example.com/bl/relay/127.0.0.3 for details
This spammer wants some $$. See http://www.example.com/bl/127.0.0.4
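The substitution rules ($, $n and $$) can be reproduced with a short Python sketch. It is not rbldnsd code; the function name expand_txt is hypothetical, and expanding an undefined $n to an empty string is an assumption.

```python
import re


def expand_txt(template, entry, variables):
    """Expand $, $n and $$ in a TXT template for one listed entry."""
    def replace(match):
        token = match.group(1)
        if token == "$":
            return "$"                     # $$ -> literal dollar sign
        if token == "":
            return entry                   # $  -> the listed entry
        return variables.get(token, "")    # $n -> variable (assumed "" if unset)
    return re.sub(r"\$(\$|[0-9]|)", replace, template)


variables = {"1": "See http://www.example.com/bl", "2": "for details"}
print(expand_txt("$1/spammer/$ $2", "127.0.0.2", variables))
print(expand_txt("$1/relay/$ $2", "127.0.0.3", variables))
print(expand_txt("This spammer wants some $$$$. $1/$", "127.0.0.4", variables))
```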
If the "base template" ($= variable) is defined, this template is used for expansion, instead of the one specified for an entry being queried. Inside the base template, the $= construct is substituted with the text given for individual entries. In order to stop usage of the base template $= for a single record, start it with = (which will be omitted from the resulting TXT value). For example,
$= See http://www.example.com/bl?$= ($) for details
127.0.0.2 r123
127.0.0.3
127.0.0.4 =See other blocklists for details about $
produces the following TXT records:
See http://www.example.com/bl?r123 (127.0.0.2) for details
See http://www.example.com/bl?127.0.0.3 (127.0.0.3) for details
See other blocklists for details about 127.0.0.4
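The base template behaviour in this example can be mimicked with a simplified Python sketch. It handles only the bare $ substitution (not $n or $$), and the rule that an entry without text gets the entry itself in place of $= is inferred from the 127.0.0.3 line of the example rather than stated explicitly; the function name apply_base is hypothetical.

```python
def apply_base(base, text, entry):
    """Expand one entry's TXT using a base template (bare $ only)."""
    if text.startswith("="):
        template = text[1:]                # leading "=" bypasses the base
    else:
        # Inferred from the example: empty text falls back to the entry ($).
        template = base.replace("$=", text or "$")
    return template.replace("$", entry)


base = "See http://www.example.com/bl?$= ($) for details"
print(apply_base(base, "r123", "127.0.0.2"))
print(apply_base(base, "", "127.0.0.3"))
print(apply_base(base, "=See other blocklists for details about $", "127.0.0.4"))
```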
This is not a real dataset, although the syntax and usage are the same as with other datasets. Instead of defining which records exist in a given zone and which do not, the acl dataset specifies which client hosts (peers) are allowed to query the given zone. The dataset specifies a set of IPv4 and/or IPv6 CIDR ranges (with exactly the same syntax as understood by the ip4trie and ip6trie datasets), together with action specifiers. When a query is made from a listed IP address (not for the IP address), the specified action changes the rules used to construct the reply. Possible actions and their meanings are:
Only one ACL dataset can be specified for a given zone, and each zone must have at least one non-acl dataset. It is also possible to specify one global ACL dataset, by specifying an empty zone name (which is not allowed for other dataset types), like
rbldnsd ... :acl:filename...
For this dataset type, only a few $-style specials are recognized. In particular, $SOA and $NS keywords are not allowed. When rbldnsd performs $ substitution in the TXT template returned from ACL dataset, it will use client IP address to substitute for a single $ character, instead of the IP address or domain name found in the original query.
Rbldnsd handles the following signals:
Some unsorted usage notes follow.
When creating a data file for rbldnsd (and for anything else, this is general advice), it is a good idea to create the data in a temporary file and rename the temporary file when all is done. Never try to write to the main file directly: it is possible that at the same time rbldnsd will try to read it and will get incomplete data as a result. The same applies to copying data using the cp(1) utility and similar (including scp(1)), which copy over existing data. Even if you're sure no one is reading the data while you're copying or generating it, imagine what will happen if you are not able to complete the process for whatever reason (interrupt, filesystem full, endless number of other reasons...). In most cases it is better to keep older but correct data than to leave incomplete/corrupt data in place.
Right:
scp remote:data target.tmp && mv target.tmp target
Wrong:
scp remote:data target
Right:
./generate.pl > target.tmp && mv target.tmp target
Wrong:
./generate.pl > target
From this point of view, rsync(1) command seems to be safe, as it always creates temporary file and renames it to the destination only when all is ok (but note the --partial option, which is good for downloading something but may be wrong to transfer data files -- usually you don't want partial files to be loaded). In contrast, scp(1) command is not safe, as it performs direct copying. You may still use scp(1) in a safe manner, as shown in the example above.
Also try to eliminate the case where two (or more) processes perform data copying/generation at the same time to the same destination. When your data is generated by a cron job, use file locking (create a separate lock file (which should never be removed) and flock/fcntl it in exclusive mode without waiting, exiting if the lock fails) before attempting any other file manipulation.
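Both recommendations (write to a temporary file and rename it, and take a non-blocking exclusive lock first) can be combined in a generator script. The following Python sketch shows one way to do it on a Unix system; the function name publish, the lock file path and the 0644 file mode are assumptions for illustration.

```python
import fcntl
import os
import tempfile


def publish(target, data, lock_path):
    """Atomically replace target with data; return False if the lock is busy."""
    # The lock file is created once and never removed.
    with open(lock_path, "a") as lock:
        try:
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # another generator is running: give up, do not wait
        directory = os.path.dirname(os.path.abspath(target))
        # Temporary file in the same directory, so the rename stays atomic.
        fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
        try:
            with os.fdopen(fd, "w") as out:
                out.write(data)
                out.flush()
                os.fsync(out.fileno())
            os.chmod(tmp, 0o644)  # assumed mode; mkstemp creates 0600 files
            os.replace(tmp, target)
        except BaseException:
            os.unlink(tmp)  # keep the old, correct data in place
            raise
        return True


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    ok = publish(os.path.join(workdir, "target"), "127.0.0.2\n",
                 os.path.join(workdir, "target.lock"))
    print("published" if ok else "lock busy, skipped")
```

The lock is released when the lock file is closed at the end of the with block, so a crashed generator never leaves a stale lock behind.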
All keys specified in dataset files are always relative to the zone base DN. In contrast, all the values (NS and SOA records, MX records in generic dataset) are absolute. This is different from BIND behaviour, where a trailing dot indicates whether a DN is absolute or relative. Trailing dots in domain names are ignored by rbldnsd.
Several zones may be served by rbldnsd, every zone may consist of several datasets. There are numerous ways to combine several data files into several zones. For example, suppose you have a list of dialup ranges in file named `dialups', and a list of spammer's ip addresses in file named `spammers', and want to serve 3 zones with rbldnsd: dialups.bl.ex.com, spam.bl.ex.com and bl.ex.com which is a combination of the two. There are two ways to do this:
rbldnsd options... \
dialups.bl.ex.com:ip4set:dialups \
spam.bl.ex.com:ip4set:spammers \
bl.ex.com:ip4set:dialups,spammers
or:
rbldnsd options... \
dialups.bl.ex.com:ip4set:dialups \
spam.bl.ex.com:ip4set:spammers \
bl.ex.com:ip4set:dialups \
bl.ex.com:ip4set:spammers
(note you should specify combined bl.ex.com zone after all its subzones in a command line, or else subzones will not be consulted at all).
In the first form, there will be 3 independent data sets, and every record will be stored 2 times in memory, but only one search in internal data structures will be needed to resolve queries for aggregate bl.ex.com. In second form, there will be only 2 data sets, every record will be stored only once (both datasets will be reused), but 2 searches will be performed by rbldnsd to answer queries against aggregate zone (but difference in speed is almost unnoticeable). Note that when aggregating several data files into one dataset, an exclusion entry in one file becomes exclusion entry in the whole dataset (which may be a problem when aggregating dialups, where exclusions are common, with open relays/proxies, where exclusions are rare if at all used).
A similar effect may be achieved by using the combined dataset type, sometimes more easily. The combined dataset results in every nested dataset being used independently, as in the second form above.
The combined dataset requires rbldnsd to be the authoritative nameserver for the whole base zone. Most importantly, one may specify SOA and NS records for the base zone only. So DNSBLs which do not use a common subzone for the data cannot use this dataset. An example is the DSBL.org DNSBL, where each of the list.dsbl.org, multihop.dsbl.org and unconfirmed.dsbl.org zones is a separate, independent zone with a different set of nameservers. But for DSBL.org, where each dataset is really independent and used only once (there's no (sub)zone that is a combination of other zones), the combined dataset isn't necessary. In contrast, for SORBS.net zones, where several subzones are used and the main zone is a combination of several subzones, the combined dataset is the way to go.
When you have several nameservers for your zone, set them all up in a similar way. Namely, if one is set up using the combined dataset, all the rest should be too, or else DNS metadata will be broken. This is because metadata (SOA and NS) records returned by nameservers using combined and other datasets will have different origins. With the combined dataset, rbldnsd returns NS and SOA records for the base zone, not for any subzone defined inside the dataset. Consider the above example with the dialups.bl.ex.com, spam.bl.ex.com and aggregate bl.ex.com zones, and two nameservers: the first is set up in either of the ways described above (using individual datasets for each of the 3 zones), and the second is set up for the whole bl.ex.com zone using the combined dataset. In this case, for queries against dialups.bl.ex.com, the first nameserver will return NS records like
dialups.bl.ex.com. IN NS a.ns.ex.com.
while second will always use base zone, and NS records will look like
bl.ex.com. IN NS a.ns.ex.com.
All authoritative nameservers for a zone must have consistent metadata records. The only way to achieve this is to use a similar configuration (combined or not) on all nameservers. Keep this in mind when using other software for a nameserver.
The generic dataset type is very rudimentary. Its purpose is to complement all the other types to form a complete nameserver that may answer A, TXT and MX queries. This is useful mostly to define A records for HTTP access (relays.bl.example.com A, www.bl.example.com A just in case), and maybe descriptive texts as a TXT record.
Since rbldnsd only searches one, most closely matching (sub)zone for every request, one cannot specify a single e.g. generic dataset in the form
proxies TXT list of open proxies
www.proxies A 127.0.0.8
relays TXT list of open relays
www.relays A 127.0.0.9
for several (sub)zones, each of which is represented as a zone too (either on the command line or as a combined dataset). Instead, several generic datasets should be specified, a separate one for every (sub)zone. If the data for every subzone is the same, the same single dataset may be used, but it should be specified for every zone it should apply to (see the combined dataset usage example above).
Most of the bugs outlined in this section aren't really bugs, but are present due to the non-standardized and thus unknown expected behaviour of a nameserver that serves a DNSBL zone. rbldnsd matches BIND runtime behaviour where appropriate, but not always.
rbldnsd lowercases some domain names (the ones that are lookup keys, e.g. in `generic' and `dnset' datasets) when loading, to speed up lookup operations. This isn't a problem in most cases.
There is no TCP mode. If a resource record does not fit in a UDP packet (512 bytes), it will be silently ignored. For most usages, this isn't a problem, because there should be only a few RRs in an answer, and because one record is usually sufficient to decide whether a given entry is "listed" or not. rbldnsd isn't a full-featured nameserver, after all.
rbldnsd will not always return a list of nameserver records in the AUTHORITY section of every positive answer: NS records will be provided (if given) only if there is room for them in a single UDP packet. If the records do not fit, the AUTHORITY section will be empty.
rbldnsd does not allow AXFR operations. For DNSBLs, AXFR is the stupidest yet common thing to do - use rsync for zone transfers instead. This isn't a bug in rbldnsd itself, but in the common practice of using AXFR and the like to transfer huge zones in a format which isn't suitable for such a task. Perhaps in the future, if there is some real demand, I'll implement AXFR "server" support (so that rbldnsd will be able to act as master for BIND nameservers, but not as secondary), but the note remains: use rsync.
rbldnsd truncates all TXT records to at most 255 bytes. DNS specs allow longer TXTs, but long TXTs are something that should be avoided as much as possible - a TXT record is used as an SMTP rejection string. Note that a DNS UDP packet is limited to 512 bytes. rbldnsd will log a warning when such truncation occurs.
This manpage corresponds to rbldnsd version 0.998.
The rbldnsd daemon was written by Michael Tokarev <mjt+rbldnsd@corpit.ru>, based on ideas by Dan Bernstein and his djbdns package, with excellent contributions by Geoffrey T. Dairiki <dairiki@dairiki.org>.
Mostly GPL, with some code licensed under 3-clause BSD license.
Dec 2015 |