NETGRAPH(4) | Device Drivers Manual | NETGRAPH(4) |
netgraph
— graph
based kernel networking subsystem
The netgraph
system provides a uniform and
modular system for the implementation of kernel objects which perform
various networking functions. The objects, known as nodes,
can be arranged into arbitrarily complicated graphs. Nodes have
hooks which are used to connect two nodes together,
forming the edges in the graph. Nodes communicate along the edges to process
data, implement protocols, etc.
The aim of netgraph
is to supplement
rather than replace the existing kernel networking infrastructure. It
provides:
The most fundamental concept in netgraph
is that of a
node. All
nodes implement a number of predefined methods which allow them to interact
with other nodes in a well defined manner.
Each node has a type, which is a static property of the node determined at node creation time. A node's type is described by a unique ASCII type name. The type implies what the node does and how it may be connected to other nodes.
In object-oriented language, types are classes, and nodes are instances of their respective class. All node types are subclasses of the generic node type, and hence inherit certain common functionality and capabilities (e.g., the ability to have an ASCII name).
Nodes may be assigned a globally unique ASCII name which can be
used to refer to the node. The name must not contain the characters
‘.
’ or
‘:
’, and is limited to
NG_NODESIZ
characters (including the terminating
NUL
character).
Each node instance has a unique ID number which is expressed as a 32-bit hexadecimal value. This value may be used to refer to a node when there is no ASCII name assigned to it.
Nodes are connected to other nodes by connecting a pair of hooks, one from each node. Data flows bidirectionally between nodes along connected pairs of hooks. A node may have as many hooks as it needs, and may assign whatever meaning it wants to a hook.
Hooks have these properties:
.
’ or
‘:
’, and is limited to
NG_HOOKSIZ
characters (including the terminating
NUL
character).A node may decide to assign special meaning to some hooks. For example, connecting to the hook named debug might trigger the node to start sending debugging information to that hook.
Two types of information flow between nodes: data messages and
control messages. Data messages are passed in mbuf
chains along the edges in the graph, one edge at a time. The first
mbuf in a chain must have the
M_PKTHDR
flag set. Each node decides how to handle
data received through one of its hooks.
Along with data, nodes can also receive control messages. There
are generic and type-specific control messages. Control messages have a
common header format, followed by type-specific data, and are binary
structures for efficiency. However, node types may also support conversion
of the type-specific data between binary and ASCII formats, for debugging
and human interface purposes (see the
NGM_ASCII2BINARY
and
NGM_BINARY2ASCII
generic control messages below).
Nodes are not required to support these conversions.
There are three ways to address a control message. If there is a sequence of edges connecting the two nodes, the message may be “source routed” by specifying the corresponding sequence of ASCII hook names as the destination address for the message (relative addressing). If the destination is adjacent to the source, then the source node may simply specify (as a pointer in the code) the hook across which the message should be sent. Otherwise, the recipient node's global ASCII name (or equivalent ID-based name) is used as the destination address for the message (absolute addressing). The two types of ASCII addressing may be combined, by specifying an absolute start node and a sequence of hooks. Only the ASCII addressing modes are available to control programs outside the kernel; use of direct pointers is limited to kernel modules.
Messages often represent commands that are followed by a reply message in the reverse direction. To facilitate this, the recipient of a control message is supplied with a “return address” that is suitable for addressing a reply.
Each control message contains a 32-bit value, called a “typecookie”, indicating the type of the message, i.e. how to interpret it. Typically each type defines a unique typecookie for the messages that it understands. However, a node may choose to recognize and implement more than one type of messages.
If a message is delivered to an address that implies that it arrived at that node through a particular hook (as opposed to having been directly addressed using its ID or global name) then that hook is identified to the receiving node. This allows a message to be re-routed or passed on, should a node decide that this is required, in much the same way that data packets are passed around between nodes. A set of standard messages for flow control and link management purposes are defined by the base system that are usually passed around in this manner. Flow control message would usually travel in the opposite direction to the data to which they pertain.
In order to minimize latency, most
netgraph
operations are functional. That is, data
and control messages are delivered by making function calls rather than by
using queues and mailboxes. For example, if node A wishes to send a data
mbuf to neighboring node B, it calls the generic
netgraph
data delivery function. This function in
turn locates node B and calls B's “receive data” method. There
are exceptions to this.
Each node has an input queue, and some operations can be considered to be writers in that they alter the state of the node. Obviously, in an SMP world it would be bad if the state of a node were changed while another data packet were transiting the node. For this purpose, the input queue implements a reader/writer semantic so that when there is a writer in the node, all other requests are queued, and while there are readers, a writer, and any following packets are queued. In the case where there is no reason to queue the data, the input method is called directly, as mentioned above.
A node may declare that all requests should be considered as
writers, or that requests coming in over a particular hook should be
considered to be a writer, or even that packets leaving or entering across a
particular hook should always be queued, rather than delivered directly
(often useful for interrupt routines who want to get back to the hardware
quickly). By default, all control message packets are considered to be
writers unless specifically declared to be a reader in their definition.
(See NGM_READONLY
in
<netgraph/ng_message.h>
.)
While this mode of operation results in good performance, it has a few implications for node developers:
Netgraph
provides internal synchronization between
nodes. Data always enters a “graph” at an edge
node. An edge node is a node that interfaces between
netgraph
and some other part of the system.
Examples of “edge nodes” include device drivers, the
socket, ether,
tty, and ksocket node type. In
these edge nodes, the calling thread directly executes
code in the node, and from that code calls upon the
netgraph
framework to deliver data across some
edge in the graph. From an execution point of view, the calling thread
will execute the netgraph
framework methods, and
if it can acquire a lock to do so, the input methods of the next node.
This continues until either the data is discarded or queued for some
device or system entity, or the thread is unable to acquire a lock on the
next node. In that case, the data is queued for the node, and execution
rewinds back to the original calling entity. The queued data will be
picked up and processed by either the current holder of the lock when they
have completed their operations, or by a special
netgraph
thread that is activated when there are
such items queued.So far, these issues have not proven problematical in practice.
A node may have a hidden interaction with other components of the
kernel outside of the netgraph
subsystem, such as
device hardware, kernel protocol stacks, etc. In fact, one of the benefits
of netgraph
is the ability to join disparate kernel
networking entities together in a consistent communication framework.
An example is the socket node type which is
both a netgraph
node and a
socket(2) in the protocol family
PF_NETGRAPH
. Socket nodes allow user processes to
participate in netgraph
. Other nodes communicate
with socket nodes using the usual methods, and the node hides the fact that
it is also passing information to and from a cooperating user process.
Another example is a device driver that presents a node interface to the hardware.
Nodes are notified of the following actions via function calls to the following node methods, and may accept or reject that action (by returning the appropriate error code):
NG_NODE_REVIVE
()
macro to signal the generic code that the shutdown is aborted. In the case
where the shutdown is started by the node itself due to hardware removal
or unloading (via
ng_rmnode_self
()),
it should set the NGF_REALLY_DIE
flag to signal to
its own shutdown method that it is not to persist.Two other methods are also supported by all nodes:
netgraph
queueable request
item, usually referred to as an item, is
received by this function. The item contains a pointer to an
mbuf.
The node is notified on which hook the item has
arrived, and can use this information in its processing decision. The
receiving node must always
NG_FREE_M
()
the mbuf chain on completion or error, or pass it
on to another node (or kernel module) which will then be responsible for
freeing it. Similarly, the item must be freed if it is
not to be passed on to another node, by using the
NG_FREE_ITEM
()
macro. If the item still holds references to mbufs
at the time of freeing then they will also be appropriately freed.
Therefore, if there is any chance that the mbuf
will be changed or freed separately from the item, it is very important
that it be retrieved using the
NGI_GET_M
()
macro that also removes the reference within the item. (Or multiple
frees of the same object will occur.)
If it is only required to examine the contents of
the mbufs, then it is possible to use the
NGI_M
()
macro to both read and rewrite mbuf pointer inside
the item.
If developer needs to pass any meta information along with the mbuf chain, he should use mbuf_tags(9) framework.
netgraph
specific
meta-data format is obsoleted now.The receiving node may decide to defer the data by
queueing it in the netgraph
NETISR system (see
below). It achieves this by setting the HK_QUEUE
flag in the flags word of the hook on which that data will arrive. The
infrastructure will respect that bit and queue the data for delivery at
a later time, rather than deliver it directly. A node may decide to set
the bit on the
peer node, so
that its own output packets are queued.
The node may elect to nominate a
different receive data function for data received on a particular hook,
to simplify coding. It uses the
NG_HOOK_SET_RCVDATA
(hook,
fn) macro to do this. The function receives the
same arguments in every way other than it will receive all (and only)
packets from that hook.
NGI_MSG
()
macro, or completely extracted from the item using the
NGI_GET_MSG
()
which also removes the reference within the item. If the item still holds
a reference to the message when it is freed (using the
NG_FREE_ITEM
() macro), then the message will also
be freed appropriately. If the reference has been removed, the node must
free the message itself using the
NG_FREE_MSG
()
macro. A return address is always supplied, giving the address of the node
that originated the message so a reply message can be sent anytime later.
The return address is retrieved from the item using the
NGI_RETADDR
()
macro and is of type ng_ID_t. All control messages
and replies are allocated with the malloc(9) type
M_NETGRAPH_MSG
, however it is more convenient to
use the
NG_MKMESSAGE
()
and
NG_MKRESPONSE
()
macros to allocate and fill out a message. Messages must be freed using
the NG_FREE_MSG
() macro.
If the message was delivered via a specific hook, that hook will also be made known, which allows the use of such things as flow-control messages, and status change messages, where the node may want to forward the message out another hook to that on which it arrived.
The node may elect to nominate a
different receive message function for messages received on a particular
hook, to simplify coding. It uses the
NG_HOOK_SET_RCVMSG
(hook,
fn) macro to do this. The function receives the
same arguments in every way other than it will receive all (and only)
messages from that hook.
Much use has been made of reference counts, so that nodes being freed of all references are automatically freed, and this behaviour has been tested and debugged to present a consistent and trustworthy framework for the “type module” writer to use.
The netgraph
framework provides an
unambiguous and simple to use method of specifically addressing any single
node in the graph. The naming of a node is independent of its type, in that
another node, or external component need not know anything about the node's
type in order to address it so as to send it a generic message type. Node
and hook names should be chosen so as to make addresses meaningful.
Addresses are either absolute or relative. An absolute address begins with a node name or ID, followed by a colon, followed by a sequence of hook names separated by periods. This addresses the node reached by starting at the named node and following the specified sequence of hooks. A relative address includes only the sequence of hook names, implicitly starting hook traversal at the local node.
There are a couple of special possibilities for the node name. The
name ‘.
’ (referred to as
‘.:
’) always refers to the local node.
Also, nodes that have no global name may be addressed by their ID numbers,
by enclosing the hexadecimal representation of the ID number within the
square brackets. Here are some examples of valid
netgraph
addresses:
.: [3f]: foo: .:hook1 foo:hook1.hook2 [d80]:hook1
The following set of nodes might be created for a site with a single physical frame relay line having two active logical DLCI channels, with RFC 1490 frames on DLCI 16 and PPP frames over DLCI 20:
[type SYNC ] [type FRAME] [type RFC1490] [ "Frame1" ](uplink)<-->(data)[<un-named>](dlci16)<-->(mux)[<un-named> ] [ A ] [ B ](dlci20)<---+ [ C ] | | [ type PPP ] +>(mux)[<un-named>] [ D ]
One could always send a control message to node C from anywhere by
using the name “Frame1:uplink.dlci16
”.
In this case, node C would also be notified that the message reached it via
its hook mux. Similarly,
“Frame1:uplink.dlci20
” could reliably
be used to reach node D, and node A could refer to node B as
“.:uplink
”, or simply
“uplink
”. Conversely, B can refer to A
as “data
”. The address
“mux.data
” could be used by both nodes
C and D to address a message to node A.
Note that this is only for control messages. In each of these cases, where a relative addressing mode is used, the recipient is notified of the hook on which the message arrived, as well as the originating node. This allows the option of hop-by-hop distribution of messages and state information. Data messages are only routed one hop at a time, by specifying the departing hook, with each node making the next routing decision. So when B receives a frame on hook data, it decodes the frame relay header to determine the DLCI, and then forwards the unwrapped frame to either C or D.
In a similar way, flow control messages may be routed
in the reverse direction to outgoing data. For example a “buffer
nearly full” message from
“Frame1:
” would be passed to node B
which might decide to send similar messages to both nodes C and D. The nodes
would use direct hook
pointer addressing to route the messages. The message may have
travelled from “Frame1:
” to B as a
synchronous reply, saving time and cycles.
Structures are defined in
<netgraph/netgraph.h>
(for
kernel structures only of interest to nodes) and
<netgraph/ng_message.h>
(for
message definitions also of interest to user programs).
The two basic object types that are of interest to node authors are nodes and hooks. These two objects have the following properties that are also of interest to the node writers.
typedef
to declare their pointers, and should
never actually declare the structure.
typedef struct ng_node *node_p;
The following properties are associated with a node, and can be accessed in the following manner:
NG_NODE_IS_VALID
(node)
macro will return this state. Eventually it should be almost
impossible for code to run in an invalid node but at this time that
work has not been completed.NG_NODE_ID
(node).NUL
terminated
string. If there is a value in here, it is the name of the node.
if (NG_NODE_NAME(node)[0] != '\0') ... if (strcmp(NG_NODE_NAME(node), "fred") == 0) ...
NG_NODE_SET_PRIVATE
(node,
value) and
NG_NODE_PRIVATE
(node)
set and retrieve this property, respectively.NG_NODE_NUMHOOKS
(node)
macro is used to retrieve this value.NG_NODE_FOREACH_HOOK
(node,
fn, arg,
rethook) where fn is a
function that will be called for each hook with the form
fn
(hook,
arg) and returning 0 to terminate the search. If
the search is terminated, then rethook will be
set to the hook at which the search was terminated.typedef
to declare their hook pointers.
typedef struct ng_hook *hook_p;
The following properties are associated with a hook, and can be accessed in the following manner:
NG_HOOK_SET_PRIVATE
(hook,
value) and
NG_HOOK_PRIVATE
(hook)
set and retrieve this property, respectively.NG_HOOK_NODE
(hook)
finds the associated node.NG_HOOK_PEER
(hook)
macro finds the peer.NG_HOOK_REF
(hook)
and
NG_HOOK_UNREF
(hook)
macros increment and decrement the hook reference count accordingly.
After decrement you should always assume the hook has been freed
unless you have another reference still valid.NG_HOOK_SET_RCVDATA
(hook,
fn) and
NG_HOOK_SET_RCVMSG
(hook,
fn) macros can be used to set override methods
that will be used in preference to the generic receive data and
receive message functions. To unset these, use the macros to set them
to NULL
. They will only be used for data and
messages received on the hook on which they are set.The maintenance of the names, reference
counts, and linked list of hooks for each node is handled automatically
by the netgraph
subsystem. Typically a node's
private info contains a back-pointer to the node or hook structure,
which counts as a new reference that must be included in the reference
count for the node. When the node constructor is called, there is
already a reference for this calculated in, so that when the node is
destroyed, it should remember to do a
NG_NODE_UNREF
()
on the node.
From a hook you can obtain the corresponding node, and from a node, it is possible to traverse all the active hooks.
A current example of how to define a node can always be seen in src/sys/netgraph/ng_sample.c and should be used as a starting point for new node writers.
Control messages have the following structure:
#define NG_CMDSTRSIZ 32 /* Max command string (including null) */ struct ng_mesg { struct ng_msghdr { u_char version; /* Must equal NG_VERSION */ u_char spare; /* Pad to 4 bytes */ uint16_t spare2; uint32_t arglen; /* Length of cmd/resp data */ uint32_t cmd; /* Command identifier */ uint32_t flags; /* Message status flags */ uint32_t token; /* Reply should have the same token */ uint32_t typecookie; /* Node type understanding this message */ u_char cmdstr[NG_CMDSTRSIZ]; /* cmd string + */ } header; char data[]; /* placeholder for actual data */ }; #define NG_ABI_VERSION 12 /* Netgraph kernel ABI version */ #define NG_VERSION 8 /* Netgraph message version */ #define NGF_ORIG 0x00000000 /* The msg is the original request */ #define NGF_RESP 0x00000001 /* The message is a response */
Control messages have the fixed header shown above, followed by a variable length data section which depends on the type cookie and the command. Each field is explained below:
netgraph
message
protocol itself. The current version is
NG_VERSION
.EINVAL
.
Each type should have an include file that defines
the commands, argument format, and cookie for its own messages. The
typecookie ensures that the same header file was included by both sender
and receiver; when an incompatible change in the header file is made,
the typecookie
must be changed.
The de-facto method for generating unique type cookies is to take the
seconds from the Epoch at the time the header file is written (i.e., the
output of “date
-u
+%s
”).
There is a predefined typecookie
NGM_GENERIC_COOKIE
for the
generic node type, and a corresponding set of
generic messages which all nodes understand. The handling of these
messages is automatic.
Some modules may choose to implement messages from more than one of the header files and thus recognize more than one type cookie.
Control messages are in binary format for efficiency. However, for debugging and human interface purposes, and if the node type supports it, control messages may be converted to and from an equivalent ASCII form. The ASCII form is similar to the binary form, with two exceptions:
NUL
-terminated
ASCII string version of the message arguments.In general, the arguments field of a control message can be any
arbitrary C data type. Netgraph
includes parsing
routines to support some pre-defined datatypes in ASCII with this simple
syntax:
=
’)
preceding it. Whenever an element does not have an explicit index, the
index is implicitly the previous element's index plus one.Each node type may define its own arbitrary types by providing the necessary routines to parse and unparse. ASCII forms defined for a specific node type are documented in the corresponding man page.
There are a number of standard predefined messages that will work
for any node, as they are supported directly by the framework itself. These
are defined in
<netgraph/ng_message.h>
along with the basic layout of messages and other similar information.
NGM_CONNECT
NGM_MKPEER
NGM_SHUTDOWN
NGM_NAME
NGM_MKPEER
method. Such nodes can only be
addressed relatively or by their ID number.NGM_RMHOOK
NGM_NODEINFO
NGM_LISTHOOKS
NGM_NODEINFO
, but in addition includes an array of
fields describing each link, and the description for the node at the far
end of that link.NGM_LISTNAMES
NGM_NODEINFO
) where each entry of the array
describes a named node. All named nodes will be described.NGM_LISTNODES
NGM_LISTNAMES
except that all
nodes are listed regardless of whether they have a name or not.NGM_LISTTYPES
netgraph
types.NGM_TEXT_STATUS
NG_TEXTRESPONSE
bytes in length (presently 1024). This can be used to return general
status information in human readable form.NGM_BINARY2ASCII
NGM_BINARY2ASCII
message itself. If
successful, the reply will contain the same control message in ASCII form.
A node will typically only know how to translate messages that it itself
understands, so the target node of the
NGM_BINARY2ASCII
is often the same node that would
actually receive that message.NGM_ASCII2BINARY
NGM_BINARY2ASCII
. The entire
control message to be converted, in ASCII form, is contained in the
arguments section of the NGM_ASCII2BINARY
and need
only have the flags, cmdstr,
and arglen header fields filled in, plus the
NUL
-terminated string version of the arguments in
the arguments field. If successful, the reply contains the binary version
of the control message.In addition to the control messages that affect nodes with respect
to the graph, there are also a number of
flow
control messages defined. At present these are
not handled
automatically by the system, so nodes need to handle them if they are going
to be used in a graph utilising flow control, and will be in the likely path
of these messages. The default action of a node that does not understand
these messages should be to pass them onto the next node. Hopefully some
helper functions will assist in this eventually. These messages are also
defined in
<netgraph/ng_message.h>
and
have a separate cookie NG_FLOW_COOKIE
to help
identify them. They will not be covered in depth here.
The base netgraph
code may either be
statically compiled into the kernel or else loaded dynamically as a KLD via
kldload(8). In the former case, include
options NETGRAPH
in your kernel configuration file. You may also include selected node types in the kernel compilation, for example:
options NETGRAPH
options NETGRAPH_SOCKET
options NETGRAPH_ECHO
Once the netgraph
subsystem is loaded,
individual node types may be loaded at any time as KLD modules via
kldload(8). Moreover, netgraph
knows how to automatically do this; when a request to create a new node of
unknown type type is made,
netgraph
will attempt to load the KLD module
ng_⟨type⟩.ko.
Types can also be installed at boot time, as certain device
drivers may want to export each instance of the device as a
netgraph
node.
In general, new types can be installed at any time
from within the kernel by calling
ng_newtype
(),
supplying a pointer to the type's struct ng_type
structure.
The
NETGRAPH_INIT
()
macro automates this process by using a linker set.
Several node types currently exist. Each is fully documented in its own man page:
PF_NETGRAPH
. The new sockets protocols are
NG_DATA
and NG_CONTROL
,
both of type SOCK_DGRAM
. Typically one of each is
associated with a socket node. When both sockets have closed, the node
will shut down. The NG_DATA
socket is used for
sending and receiving data, while the NG_CONTROL
socket is used for sending and receiving control messages. Data and
control messages are passed using the sendto(2) and
recvfrom(2) system calls, using a struct
sockaddr_ng socket address.netgraph
node. It has a
programmable “hotkey” character.ng0
”,
“ng1
”, etc.netgraph
. The net/mpd5
port can use these modules to make a very low latency high capacity PPP
system. It also supports PPTP VPNs using the PPTP node.netgraph
system for further
processing. This allows such things as UDP tunnels to be almost trivially
implemented from the command line.Refer to the section at the end of this man page for more nodes types.
Whether a named node exists can be checked by trying to send a
control message to it (e.g., NGM_NODEINFO
). If it
does not exist, ENOENT
will be returned.
All data messages are mbuf chains with the
M_PKTHDR
flag set.
Nodes are responsible for freeing what they allocate. There are three exceptions:
NG_SEND_MSG_*
()
family macros are freed by the recipient. As in the case above, the
addresses associated with the message are freed by whatever allocated them
so the recipient should copy them if it wants to keep that
information.netgraph
item. The
item must be freed using
NG_FREE_ITEM
(item)
or passed on to another node.<netgraph/netgraph.h>
netgraph
nodes.<netgraph/ng_message.h>
netgraph
messages.<netgraph/ng_socket.h>
netgraph
socket type nodes.<netgraph/ng_>
⟨type⟩.hnetgraph
type nodes, including the type cookie
definition.netgraph
subsystem loadable KLD module.netgraph
node. Use this as a starting
point for new node types.There is a library for supporting user-mode programs that wish to
interact with the netgraph
system. See
netgraph(3) for details.
Two user-mode support programs, ngctl(8) and nghook(8), are available to assist manual configuration and debugging.
There are a few useful techniques for debugging new node types. First, implementing new node types in user-mode first makes debugging easier. The tee node type is also useful for debugging, especially in conjunction with ngctl(8) and nghook(8).
Also look in /usr/share/examples/netgraph
for solutions to several common networking problems, solved using
netgraph
.
socket(2), netgraph(3), ng_async(4), ng_atm(4), ng_atmllc(4), ng_bluetooth(4), ng_bpf(4), ng_bridge(4), ng_bt3c(4), ng_btsocket(4), ng_car(4), ng_cisco(4), ng_device(4), ng_echo(4), ng_eiface(4), ng_etf(4), ng_ether(4), ng_frame_relay(4), ng_gif(4), ng_gif_demux(4), ng_h4(4), ng_hci(4), ng_hole(4), ng_hub(4), ng_iface(4), ng_ip_input(4), ng_ipfw(4), ng_ksocket(4), ng_l2cap(4), ng_l2tp(4), ng_lmi(4), ng_mppc(4), ng_nat(4), ng_netflow(4), ng_one2many(4), ng_patch(4), ng_ppp(4), ng_pppoe(4), ng_pptpgre(4), ng_rfc1490(4), ng_socket(4), ng_split(4), ng_sppp(4), ng_sscfu(4), ng_sscop(4), ng_tee(4), ng_tty(4), ng_ubt(4), ng_UI(4), ng_uni(4), ng_vjc(4), ng_vlan(4), ngctl(8), nghook(8)
The netgraph
system was designed and first
implemented at Whistle Communications, Inc. in a version of
FreeBSD 2.2 customized for the Whistle InterJet. It
first made its debut in the main tree in FreeBSD
3.4.
Julian Elischer <julian@FreeBSD.org>, with contributions by Archie Cobbs <archie@FreeBSD.org>.
November 25, 2013 | Debian |