xtables-addons(8) | v3.13 (2020-11-20) | xtables-addons(8) |
Xtables-addons — additional extensions for iptables, ip6tables, etc.
The ACCOUNT target is a high-performance accounting system for large local networks. It allows per-IP accounting in whole prefixes of IPv4 addresses with a size of up to /8 without the need to add an individual accounting rule for each IP address.
ACCOUNT is designed to be queried for data every second or at least every ten seconds. It is written as a kernel module to handle high bandwidths without packet loss.
The largest possible subnet has 24 host bits, for example the 10.0.0.0/8 network. ACCOUNT uses fixed internal data structures, which speeds up the processing of each packet. Furthermore, accounting data for one complete 192.168.1.X/24 network takes 4 KB of memory. Memory for 16 or 24 bit networks is only allocated when needed.
To optimize the kernel<->userspace data transfer a bit more, the kernel module only transfers information about IPs whose src/dst packet counter is not 0. This saves precious kernel time.
There is no /proc interface as it would be too slow for continuous access. The read-and-flush query operation is the fastest, as no internal data snapshot needs to be created and copied for all data. Use the "read" operation without flush only for debugging purposes!
Usage:
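ACCOUNT takes two mandatory parameters, both of which appear in the example further down: --addr network/netmask (the subnet to account for) and --tname name (the name of the table in which the accounting data is stored).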
The subnet 0.0.0.0/0 is a special case: all data are then stored in the src_bytes and src_packets structure of slot "0". This is useful if you want to account the overall traffic to/from your internet provider.
The data can be queried using the userspace libxt_ACCOUNT_cl library or with the iptaccount(8) tool, the reference implementation that shows how to use this library.
Here is an example of use:
iptables -A FORWARD -j ACCOUNT --addr 0.0.0.0/0 --tname all_outgoing
iptables -A FORWARD -j ACCOUNT --addr 192.168.1.0/24 --tname sales
This creates two tables called "all_outgoing" and "sales" which can be queried using the userspace library/iptaccount tool.
Note that this target is non-terminating — the packet destined to it will continue traversing the chain in which it has been used.
Also note that once a table has been defined for a specific CIDR address/netmask block, it can be referenced multiple times using -j ACCOUNT, provided that both the original table name and address/netmask block are specified.
For more information go to http://www.intra2net.com/en/developer/ipt_ACCOUNT/
Causes confusion on the other end by doing odd things with incoming packets. CHAOS will randomly reply (or not) with one of its configurable subtargets:
The randomness factor of not replying vs. replying can be set during load-time of the xt_CHAOS module or during runtime in /sys/module/xt_CHAOS/parameters.
See http://inai.de/projects/chaostables/ for more information about CHAOS, DELUDE and lscan.
The DELUDE target will reply to a SYN packet with SYN-ACK, and to all other packets with an RST. This will terminate the connection much like REJECT, but network scanners doing TCP half-open discovery can be spoofed to make them believe the port is open rather than closed/filtered.
In conjunction with ebtables, DHCPMAC can be used to completely change all MAC addresses from and to a VMware-based virtual machine. This is needed because VMware does not allow setting a non-VMware MAC address before an operating system is booted (after which the MAC can be changed with `ip link set eth0 address aa:bb..`).
EXAMPLE, replacing all addresses that carry one of VMware's assigned vendor IDs (00:50:56) with something else:
iptables -t mangle -A FORWARD -p udp --dport 67 -m physdev --physdev-in vmnet1 -m dhcpmac --mac 00:50:56:00:00:00/24 -j DHCPMAC --set-mac ab:cd:ef:00:00:00/24
iptables -t mangle -A FORWARD -p udp --dport 68 -m physdev --physdev-out vmnet1 -m dhcpmac --mac ab:cd:ef:00:00:00/24 -j DHCPMAC --set-mac 00:50:56:00:00:00/24
(This assumes there is a bridge interface that has vmnet1 as a port. You will also need to add appropriate ebtables rules to change the MAC address of the Ethernet headers.)
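The effect of the /24 mask in the two rules above can be sketched in plain shell. This is only an illustration of the address arithmetic, assuming the mask is a prefix length that selects how many leading bits are replaced; the function name set_mac_prefix is hypothetical and the sketch only handles masks that are a multiple of 8.

```shell
set_mac_prefix() {
    # $1 = original MAC, $2 = replacement MAC, $3 = prefix length in bits
    # (assumption: only multiples of 8 are handled in this sketch)
    octets=$(( $3 / 8 ))
    if [ "$octets" -le 0 ]; then printf '%s\n' "$1"; return 0; fi
    if [ "$octets" -ge 6 ]; then printf '%s\n' "$2"; return 0; fi
    # Leading octets come from the replacement, the rest from the original.
    head=$(printf '%s' "$2" | cut -d: -f1-"$octets")
    tail=$(printf '%s' "$1" | cut -d: -f$((octets + 1))-6)
    printf '%s:%s\n' "$head" "$tail"
}

set_mac_prefix 00:50:56:12:34:56 ab:cd:ef:00:00:00 24
```

With a 24-bit prefix only the vendor part changes, so the device-specific lower three octets survive the rewrite in both directions.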
The DNETMAP target allows dynamic two-way 1:1 mapping of IPv4 subnets. A single rule can map a private subnet to a shorter public subnet, creating and maintaining unambiguous private-public IP address bindings. The second rule can be used to map new flows to a private subnet according to maintained bindings. The target allows efficient public IPv4 space usage and unambiguous NAT at the same time.
The target can be used only in the nat table in POSTROUTING or OUTPUT chains for SNAT, and in PREROUTING for DNAT. Only flows directed to bound addresses will be DNATed. The packet continues chain traversal if there is no free postnat address to be assigned to the prenat address. The default binding TTL is 10 minutes and can be changed using the default_ttl module option. The default address hash size is 256 and can be changed using the hash_size module option.
* /proc interface
The module creates the following entries for each new specified subnet:
The following write operations are supported via the procfs interface:
Note! Entries are removed if the last iptables rule for a specific prefix is deleted unless the persistent flag is set.
* Logging
The module logs binding add/timeout events to klog. This behaviour can be disabled using the disable_log module parameter.
* Examples
1. Map subnet 192.168.0.0/24 to subnets 20.0.0.0/26. SNAT only:
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix 20.0.0.0/26
Active hosts from the 192.168.0.0/24 subnet are mapped to 20.0.0.0/26. If a packet from a not yet bound prenat address hits the rule and there are no free or timed-out (TTL<0) entries in prefix 20.0.0.0/26, then a notice is logged to klog and chain traversal continues. If a packet from an already-bound prenat address hits the rule, the binding's TTL value is reset to default_ttl and SNAT is performed.
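To see why the postnat prefix can run out, it helps to compare the sizes of the two prefixes. The helper below is a small sketch of that arithmetic only; the name prefix_size is hypothetical, and it assumes every address in a prefix counts as usable.

```shell
prefix_size() {
    # $1 = IPv4 prefix length (0..32); prints the number of addresses covered
    echo $(( 1 << (32 - $1) ))
}

echo "prenat  /24: $(prefix_size 24) addresses"
echo "postnat /26: $(prefix_size 26) addresses"
```

Under that assumption, at most 64 of the 256 prenat hosts can hold a binding at the same time; further hosts fall through the rule until an entry times out.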
2. Use of --reuse and --ttl switches, multiple rule interaction:
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix 20.0.0.0/26 --reuse --ttl 200
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix 30.0.0.0/26
Active hosts from 192.168.0.0/24 subnet are mapped to 20.0.0.0/26 with TTL = 200 seconds. If there are no free addresses in first prefix, the next one (30.0.0.0/26) is used with the default TTL. It is important to note that the first rule SNATs all flows whose source address is already actively bound (TTL>0) to ANY prefix. The --reuse parameter makes this functionality work even for inactive (TTL<0) entries.
If both subnets are exhausted, then chain traversal continues.
3. Map 192.168.0.0/24 to subnets 20.0.0.0/26 in a bidirectional way:
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix 20.0.0.0/26
iptables -t nat -A PREROUTING -j DNETMAP
If the host 192.168.0.10 generates some traffic, it gets bound to the first free address in the subnet, 20.0.0.0. Now, any traffic directed to 20.0.0.0 gets DNATed to 192.168.0.10 as long as there is an active (TTL>0) binding. There is no need to specify the --prefix parameter in a PREROUTING rule, because without it the rule DNATs traffic to all active prefixes. You can specify a prefix if you want DNAT to work for that prefix only.
4. Map 192.168.0.0/24 to subnets 20.0.0.0/26 with static assignments only:
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix 20.0.0.0/26 --static
echo "+192.168.0.10:20.0.0.1" >/proc/net/xt_DNETMAP/20.0.0.0_26
echo "+192.168.0.11:20.0.0.2" >/proc/net/xt_DNETMAP/20.0.0.0_26
echo "+192.168.0.51:20.0.0.3" >/proc/net/xt_DNETMAP/20.0.0.0_26
This configuration will allow only preconfigured static bindings to work due to the static rule option. Without this flag, dynamic bindings would be created using non-static entries.
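The procfs file names used above follow a visible pattern: the prefix with its slash replaced by an underscore. The helpers below are a sketch that only builds the strings shown in this example; the function names are hypothetical, and nothing is written to /proc unless you redirect the output yourself as root with the module loaded.

```shell
dnetmap_proc_path() {
    # $1 = postnat prefix such as 20.0.0.0/26
    printf '/proc/net/xt_DNETMAP/%s\n' "$(printf '%s' "$1" | tr '/' '_')"
}

dnetmap_static_line() {
    # $1 = prenat address, $2 = postnat address
    printf '+%s:%s\n' "$1" "$2"
}

# Dry run: show what would be written and where.
echo "$(dnetmap_static_line 192.168.0.10 20.0.0.1) -> $(dnetmap_proc_path 20.0.0.0/26)"
```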
5. Persistent prefix:
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix 20.0.0.0/26 --persistent
or
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j DNETMAP --prefix 20.0.0.0/26
echo "+persistent" >/proc/net/xt_DNETMAP/20.0.0.0_26
Now, we can check the persistent flag of the prefix:
cat /proc/net/xt_DNETMAP/20.0.0.0_26
0 0 64 0 persistent
Flush the iptables nat table and see that the prefix still exists:
iptables -F -t nat
ls -l /proc/net/xt_DNETMAP
-rw-r--r-- 1 root root 0 06-10 09:01 20.0.0.0_26
-rw-r--r-- 1 root root 0 06-10 09:01 20.0.0.0_26_stat
The ECHO target will send back all packets it receives. It serves as an example of an Xtables target.
ECHO takes no options.
Allows you to mark a received packet based on its IP address. This can replace many mangle/mark entries with only one if you use a firewall-based classifier.
This target is to be used inside the mangle table.
The order of IP address bytes is reversed to meet "human order of bytes": 192.168.0.1 is 0xc0a80001. At first the "AND" operation is performed, then "OR".
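The arithmetic described above can be reproduced in plain shell. This is a sketch of the calculation only, not of the kernel code: the function name ipmark_value is hypothetical, and the masks 0xffff and 0x10000 are assumed values chosen so that the result lines up with a 1:0502 style queue number.

```shell
ipmark_value() {
    # $1 = dotted-quad IPv4 address, $2 = AND mask, $3 = OR mask
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    # "Human order of bytes": 192.168.0.1 becomes 0xc0a80001.
    addr=$(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
    # AND is applied first, then OR.
    printf '0x%x\n' $(( (addr & $2) | $3 ))
}

ipmark_value 192.168.0.1 0xffffffff 0
ipmark_value 192.168.5.2 0xffff 0x10000
```

The second call keeps the lower 16 bits of the address (0x0502) and sets bit 16, which gives 0x10502.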
Examples:
We create a queue for each user, with a queue number that corresponds to the IP address of the user, e.g.: all packets going to/from 192.168.5.2 are directed to the 1:0502 queue, 192.168.5.12 -> 1:050c etc.
We have one classifier rule:
Earlier we had many rules just like below:
Using IPMARK target we can replace all the mangle/mark rules with only one:
On routers with hundreds of users there should be a significant decrease in load (e.g. by half).
(IPv6 example) If the source address is of the form 2001:db8:45:1d:20d:93ff:fe9b:e443 and the resulting mark should be 0x93ff, then a right-shift of 16 is needed first:
The LOGMARK target will log packet and connection marks to syslog.
The PROTO target modifies the protocol number in IP packet header.
For IPv4 packets, the Protocol field is modified and the checksum is re-calculated.
For IPv6 packets, the scenario can be more complex due to the introduction of the extension headers mechanism. By default, the PROTO target will scan the IPv6 packet, finding the last extension header and modify its Next-header field. Normally, the following headers will be seen as an extension header: NEXTHDR_HOP, NEXTHDR_ROUTING, NEXTHDR_FRAGMENT, NEXTHDR_AUTH, NEXTHDR_DEST.
For fragmented packets, only the first fragment is processed and other fragments are not touched.
The SYSRQ target allows sysrq to be triggered remotely on the local machine over the network. This can be useful when vital parts of the machine hang, for example an oops in a filesystem causing locks not to be released and processes to get stuck as a result. (If still possible, use /proc/sysrq-trigger.) Even when processes are stuck, interrupts are likely to be still processed, and as such, sysrq can be triggered through incoming network packets.
The xt_SYSRQ implementation uses a salted hash and a sequence number to prevent network sniffers from either guessing the password or replaying earlier requests. The initial sequence number comes from the time of day, so you will have a small window of vulnerability should time go backwards at a reboot. However, the file /sys/module/xt_SYSRQ/parameters/seqno can be used to both query and update the current sequence number. Also, you should limit who can issue commands using -s and/or -m mac, and check that the destination is correct using -d (to protect against potential broadcast packets), noting that this still does not protect against MAC/IP spoofing:
You should also limit the rate at which connections can be received to limit the CPU time taken by illegal requests, for example:
This extension does not take any options. The -p udp options are required.
The SYSRQ password can be changed through /sys/module/xt_SYSRQ/parameters/password, for example:
The module will not respond to sysrq requests until a password has been set.
Alternatively, the password may be specified at modprobe time, but this is insecure as people can possibly see it through ps(1). You can use an option line in e.g. /etc/modprobe.d/xt_sysrq if it is properly guarded, that is, only readable by root.
The hash algorithm can also be specified as a module option, for example, to use SHA-256 instead of the default SHA-1:
The xt_SYSRQ module is normally silent unless a successful request is received, but the debug module parameter can be used to find exactly why a seemingly correct request is not being processed.
To trigger SYSRQ from a remote host, just use socat:
sysrq_key="s" # the SysRq key(s)
password="password"
seqno="$(date +%s)"
salt="$(dd bs=12 count=1 if=/dev/urandom 2>/dev/null |
    openssl enc -base64)"
ipaddr="2001:0db8:0000:0000:0000:ff00:0042:8329"
req="$sysrq_key,$seqno,$salt"
req="$req,$(echo -n "$req,$ipaddr,$password" | sha1sum | cut -c1-40)"
# socat needs square brackets around an IPv6 literal; omit them for IPv4
echo "$req" | socat stdin udp-sendto:[$ipaddr]:9
See the Linux docs for possible sysrq keys. Important ones are: re(b)oot, power(o)ff, (s)ync filesystems, (u)mount and remount readonly. More than one sysrq key can be used at once, but bear in mind that, for example, a sync may not complete before a subsequent reboot or poweroff.
An IPv4 address should have no leading zeros, an IPv6 address should be in the full expanded form (as shown above). The debug option will cause output to be emitted in the same form.
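Since the address is part of the hashed string, an IPv4 address written with leading zeros would produce a hash that does not match. The following sketch strips leading zeros from each octet before the address is used; the function name normalize_ipv4 is hypothetical, and expanding IPv6 addresses to their full form is not covered here.

```shell
normalize_ipv4() {
    # $1 = dotted-quad IPv4 address, possibly with leading zeros per octet
    # Drop zeros that precede another digit at the start of each octet.
    printf '%s\n' "$1" | sed -E 's/(^|\.)0+([0-9])/\1\2/g'
}

normalize_ipv4 010.001.000.005
```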
The hashing scheme should be enough to prevent mis-use of SYSRQ in many environments, but it is not perfect: take reasonable precautions to protect your machines.
Captures and holds incoming TCP connections using no local per-connection resources.
TARPIT only works at the TCP level, and is totally application agnostic. This module will answer a TCP request and play along like a listening server, but aside from sending an ACK or RST, no data is sent. Incoming packets are ignored and dropped. The attacker will terminate the session eventually. This module allows the initial packets of an attack to be captured by other software for inspection. In most cases this is sufficient to determine the nature of the attack.
This offers similar functionality to LaBrea <http://www.hackbusters.net/LaBrea/> but does not require dedicated hardware or IPs. Any TCP port that you would normally DROP or REJECT can instead become a tarpit.
To tarpit connections to TCP port 80 destined for the current machine:
To significantly slow down Code Red/Nimda-style scans of unused address space, forward unused ip addresses to a Linux box not acting as a router (e.g. "ip route 10.0.0.0 255.0.0.0 ip.of.linux.box" on a Cisco), enable IP forwarding on the Linux box, and add:
NOTE: If you use the conntrack module while you are using TARPIT, you should also disable connection tracking for the affected packets, or the kernel will unnecessarily allocate resources for each TARPITted connection. To TARPIT incoming connections to the standard IRC port while using conntrack, you could:
This matches if a specific condition variable is (un)set.
This module matches a rate limit based on a fuzzy logic controller (FLC).
Match a packet by its source or destination country.
The extra files you will need are the binary database files. They are generated from a country-subnet database with the geoip_build_db.pl tool that is shipped with the source package, and which should be available in compiled packages in /usr/lib(exec)/xtables-addons/. The first command retrieves CSV files from MaxMind, while the other two build packed bisectable range files:
mkdir -p /usr/share/xt_geoip
cd /tmp
$path/to/xt_geoip_dl
$path/to/xt_geoip_build -D /usr/share/xt_geoip GeoIP*.csv
The shared library is hardcoded to look in these paths, so use them.
This module matches packets based on grsecurity RBAC status.
Allows you to check interface states. First, an interface needs to be selected for comparison. Exactly one option of the following three must be specified:
Following that, one can select the interface properties to check for:
This module matches certain packets in P2P flows. It is not designed to match all packets belonging to a P2P connection — use IPP2P together with CONNMARK for this purpose.
Use it together with -p tcp or -p udp to search these protocols only or without -p switch to search packets of both protocols.
IPP2P provides the following options, of which one or more may be specified on the command line:
Note that ipp2p may not (and often, does not) identify all packets that are exchanged as a result of running filesharing programs.
There is more information on http://ipp2p.org/ , but it has not been updated since September 2006, and the syntax there is different from the ipp2p.c provided in Xtables-addons; most importantly, the --ipp2p flag was removed due to its ambiguity to match "all known" protocols.
The "ipv4options" module allows matching against a set of IPv4 header options.
Known symbol names (and their number):
1 — nop
2 — security — RFC 1108
3 — lsrr — Loose Source Routing, RFC 791
4 — timestamp — RFC 781, 791
7 — record-route — RFC 791
9 — ssrr — Strict Source Routing, RFC 791
11 — mtu-probe — RFC 1063
12 — mtu-reply — RFC 1063
18 — traceroute — RFC 1393
20 — router-alert — RFC 2113
Examples:
Match packets that have both Timestamp and NOP: -m ipv4options --flags nop,timestamp
~ that have either of Timestamp or NOP, or both: --flags nop,timestamp --any
~ that have Timestamp and no NOP: --flags '!nop,timestamp'
~ that have either no NOP or a timestamp (or both conditions): --flags '!nop,timestamp' --any
This module matches the length of a packet against a specific value or range of values.
If no --layer* option is given, --layer3 is assumed by default. Note that using --layer5 may not match a packet if it is not one of the recognized types (currently TCP, UDP, UDPLite, ICMP, AH and ESP) or which has no 5th layer.
Detects simple low-level scan attempts based upon the packet's contents. (This is different from other implementations, which also try to match the rate of new connections.) Note that an attempt is only discovered after it has been carried out, but this information can be used in conjunction with other rules to block the remote host's future connections. So this match module will match on the (probably) last packet the remote side will send to your machine.
NOTE: Some clients (Windows XP for example) may do what looks like a SYN scan, so be advised to carefully use xt_lscan in conjunction with blocking rules, as it may lock out your very own internal network.
Attempt to detect TCP and UDP port scans. This match was derived from Solar Designer's scanlogd.
The "quota2" match implements a named counter which can be increased or decreased on a per-match basis. Available modes are packet counting or byte counting. The value of the counter can be read and reset through procfs, thereby making this match a minimalist accounting tool.
When counting down from the initial quota, the counter will stop at 0 and the match will return false, just like the original "quota" match. In growing (upcounting) mode, it will always return true.
Because counters in quota2 can be shared, you can combine them for various purposes, for example, a bytebucket filter that only lets as much traffic go out as has come in:
-A INPUT -p tcp --dport 6881 -m quota2 --name bt --grow
-A OUTPUT -p tcp --sport 6881 -m quota2 --name bt
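The interplay of the two rules can be modelled with a single shared variable. This is a simplified simulation of the byte-counting behaviour described above, not the module's code; the function names are hypothetical, and pinning the counter at 0 when a packet does not fit is an assumption of this sketch.

```shell
bucket=0   # shared counter "bt", in bytes

bucket_grow() {
    # Models the --grow rule: add the packet length and always match.
    bucket=$((bucket + $1))
    return 0
}

bucket_take() {
    # Models the counting-down rule: match only while enough quota is left.
    if [ "$bucket" -ge "$1" ]; then
        bucket=$((bucket - $1))
        return 0
    fi
    bucket=0   # assumption: the counter stops at 0 once a packet does not fit
    return 1
}

bucket_grow 1500                          # 1500 bytes came in
bucket_take 1000 && echo "out: allowed"   # 500 bytes of credit remain
bucket_take 1000 || echo "out: no match"  # not enough credit left
```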
The pknock match implements so-called "port knocking", a stealthy system for network authentication: a client sends packets to selected ports in a specific sequence (= simple mode, see example 1 below), or an HMAC payload to a single port (= complex mode, see example 2 below), to a target machine that has pknock rule(s) installed. The target machine then decides whether to unblock or block (again) the pknock-protected port(s). This can be used, for instance, to avoid brute force attacks on ssh or ftp services.
Example prerequisites:
Example 1 (TCP mode, manual closing of opened port not possible):
The rule will allow TCP port 22 for the attempting IP address after the successful reception of TCP SYN packets to ports 4002, 4001 and 4004, in this order (a.k.a. port knocking). Port numbers in the connect sequence must follow the exact specification; no other ports may be "knocked" in between. The rule is named 'SSH', and a file of the same name for tracking port knocking states will be created in /proc/net/xt_pknock. Successive port knocks must occur with a delay of at most 10 seconds. Port 22 (from the example) will be automatically closed again 60 minutes after it was allowed.
Example 2 (UDP mode — non-replayable and non-spoofable, manual closing of opened port possible, secure, also called "SPA" = Secure Port Authorization):
The first rule will create an "ALLOWED" record in /proc/net/xt_pknock/FTP after the successful reception of a UDP packet to port 4000. The packet payload must be constructed as an HMAC256 using "foo" as a key. The HMAC content is the particular client's IP address as a 32-bit network byteorder quantity, plus the number of minutes since the Unix epoch, also as a 32-bit value. (This is known as Simple Packet Authorization, also called "SPA".) In such case, any subsequent attempt to connect to port 21 from the client's IP address will cause such packets to be accepted in the second rule.
Similarly, upon reception of a UDP packet constructed the same way, but with the key "bar", the first rule will remove a previously installed "ALLOWED" state record from /proc/net/xt_pknock/FTP, which means that the second rule will stop matching for subsequent connection attempts to port 21. In case no close-secret packet is received within 4 hours, the first rule will remove the "ALLOWED" record from /proc/net/xt_pknock/FTP itself.
Things worth noting:
General:
Specifying --autoclose 0 means that no automatic close will be performed at all.
xt_pknock is capable of sending information about successful matches via a netlink socket to userspace, should you need to implement your own way of receiving and handling portknock notifications.
TCP mode:
This mode is not immune against eavesdropping, spoofing and replaying of the port knock sequence by someone else (but its use may still be sufficient for scenarios where these factors are not critical, such as bare shielding of the SSH port from brute-force attacks). If you need protection against them, you should use UDP mode.
It is always wise to specify three or more ports that are not monotonically increasing or decreasing with a small stepsize (e.g. 1024,1025,1026) to avoid accidentally triggering the rule by a portscan.
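The advice above can be turned into a quick sanity check for a planned knock sequence. The sketch below rejects sequences with fewer than three ports and sequences that only ever step up, or only ever step down, in small increments; the function name knock_seq_ok and the default of 16 for what counts as a "small" step are assumptions made for illustration.

```shell
knock_seq_ok() {
    # $1 = comma-separated port list
    # $2 = largest step still treated as "small" (assumed default: 16)
    max_step=${2:-16}
    count=0 prev= inc=1 dec=1 small=1
    old_ifs=$IFS; IFS=,
    for port in $1; do
        count=$((count + 1))
        if [ -n "$prev" ]; then
            step=$((port - prev))
            [ "$step" -gt 0 ] || inc=0       # not strictly increasing
            [ "$step" -lt 0 ] || dec=0       # not strictly decreasing
            abs=${step#-}
            [ "$abs" -le "$max_step" ] || small=0
        fi
        prev=$port
    done
    IFS=$old_ifs
    [ "$count" -ge 3 ] || return 1
    # Reject only sequences that are monotonic and use small steps throughout.
    if [ "$small" -eq 1 ] && { [ "$inc" -eq 1 ] || [ "$dec" -eq 1 ]; }; then
        return 1
    fi
    return 0
}

knock_seq_ok 4002,4001,4004 && echo "4002,4001,4004: ok"
knock_seq_ok 1024,1025,1026 || echo "1024,1025,1026: a sequential portscan would trigger this"
```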
Specifying the inter-knock timeout with --time is mandatory in TCP mode, to avoid permanent denial of service by clogging up the peer knock-state tracking table that xt_pknock internally keeps, should there be a DDoS on the first-in-row knock port from more hostile IP addresses than the actual size of this table (defaults to 16, can be changed via the "peer_hasht_ents" module parameter). It is also wise to use as short a time as possible (1 second) for --time for this very reason. You may also consider increasing the size of the peer knock-state tracking table. Using --strict also helps, as it requires the knock sequence to be exact. This means that if the hostile client sends more knocks to the same port, xt_pknock will mark such an attempt as a failed knock sequence and will forget it immediately. To completely thwart this kind of DDoS, knock-ports would need to have an additional rate-limit protection. Or you may consider using UDP mode.
UDP mode:
This mode is immune against eavesdropping, replaying and spoofing attacks. It is also immune against DDoS attack on the knockport.
For this mode to work, the clock difference on the client and on the server must be below 1 minute. Synchronizing time on both ends by means of NTP or rdate is strongly suggested.
There is a rate limiter built into xt_pknock which blocks any subsequent open attempt in UDP mode should the request arrive within less than one minute since the first successful open. This is intentional; it thwarts possible spoofing attacks.
Because the payload value of a UDP knock packet is influenced by the client's IP address, UDP mode cannot be used across NAT.
For sending UDP "SPA" packets, you may use either knock.sh or knock-orig.sh. These may be found in doc/pknock/util.
iptables(8), ip6tables(8), iptables-extensions(8), iptaccount(8)
For developers, the book "Writing Netfilter modules" at http://inai.de/documents/Netfilter_Modules.pdf provides detailed information on how to write such modules/extensions.