xymond - Master network daemon for a Xymon server
xymond is the core daemon in the Xymon Monitor. It is designed to
handle monitoring of a large number of hosts, with a strong focus on being a
high-speed, low-overhead implementation of a Big Brother compatible
server.
To achieve this, xymond stores all information about the state of
the monitored systems in memory, instead of storing it in the host
filesystem. A number of plugins can be enabled to enhance the basic
operation; e.g. a set of plugins is provided to implement persistent
storage in a way that is compatible with the Big Brother daemon. However,
even with these plugins enabled, xymond still performs much faster than the
standard bbd daemon.
xymond is normally started and controlled by the
xymonlaunch(8) tool, and the command used to invoke xymond should
therefore be in the tasks.cfg file.
- --hosts=FILENAME
- Specifies the path to the Xymon hosts.cfg file. This is used to check if
incoming status messages refer to known hosts; depending on the
"--ghosts" option, messages for unknown hosts may be dropped. If
this option is omitted, the default path used is set by the HOSTSCFG
environment variable.
- --checkpoint-file=FILENAME
- At regular intervals, xymond will dump all of its internal state to this
check-point file. It is also dumped when xymond terminates, or when it
receives a SIGUSR1 signal.
- --checkpoint-interval=N
- Specifies the interval (in seconds) between dumps to the check-point file.
The default is 900 seconds (15 minutes).
- --restart=FILENAME
- Specifies an existing file containing a previously generated xymond
checkpoint. When starting up, xymond will restore its internal state from
the information in this file. You can use the same filename for
"--checkpoint-file" and "--restart".
- --ghosts={allow|drop|log|match}
- How to handle status messages from unknown hosts. The "allow"
setting accepts all status messages, regardless of whether the host is
known in the hosts.cfg file or not. "drop" silently ignores
reports from unknown hosts. "log" works like drop, but logs the
event in the xymond output file. "match" will try to match the
name of the unknown host reporting with the known names by ignoring any
domain-names - if a match is found, then a temporary client alias is
automatically generated. The default is "log".
- --no-purple
- Prevent status messages from going purple when they are no longer valid.
Unlike the standard bbd daemon, purple-handling is done by xymond.
- --merge-clientlocal
- The client-local.cfg(5) file contains client configuration entries, which
are selected by matching a client against its hostname, its classname, or the
name of the OS the client is running. By default xymond will return one
entry from the file to the client, looking for a hostname, classname or OS
match (in that order). This option causes xymond to merge all matching
entries together into one and return all of it to the client.
- --listen=IP[:PORT]
- Specifies the IP-address and port where xymond will listen for incoming
connections. By default, xymond listens on IP 0.0.0.0 (i.e. all
IP-addresses available on the host) and port 1984.
- --lqueue=NUMBER
- Specifies the listen-queue for incoming connections. You don't need to
tune this unless you have a very busy xymond daemon.
- --no-bfq
- Tells xymond to NOT use the local message queue interface for receiving
status-updates from xymond_client and xymonnet.
- --daemon
- xymond is normally started by xymonlaunch(8). If you do not want to
use xymonlaunch, you can start xymond with this option; it will then
detach from the terminal and continue running as a background task.
- --timeout=N
- Set the timeout used for incoming connections. If a status has not been
received within N seconds after the connection was accepted, then the
connection is dropped and any status message is discarded. Default: 10
seconds.
- --flap-count=N
- Track the N latest status-changes for flap-detection. See also the
--flap-seconds option. To disable flap-checks globally, set N
to zero. To disable for a specific host, you must use the
"noflap" option in hosts.cfg(5). Default: 5
- --flap-seconds=N
- If a status changes more than flap-count times in N seconds or
less, then it is considered to be flapping. In that case, the status is
locked at the most severe level until the flapping stops. The history
information is not updated after the flapping is detected. NOTE: If
this is set higher than the default value, you should also use the
--flap-count option to ensure that enough status-changes are stored
for flap detection to work. The flap-count setting should be at least
(N/300)-1, e.g. if you set flap-seconds to 3600 (1 hour), then flap-count
should be at least (3600/300)-1, i.e. 11. Default: 1800 seconds (30
minutes).
- --delay-red=N
- --delay-yellow=N
- Sets the delay before a red/yellow status causes a change in the web page
display. This is usually controlled on a per-host basis via the delayred
and delayyellow settings in hosts.cfg(5), but these options
allow you to set a default value for the delays. The value N is in
minutes. Default: 0 minutes (no delay). Note: Since most tests only
execute once every 5 minutes, it will usually not make sense to set N to
anything but a multiple of 5.
- --env=FILENAME
- Loads the content of FILENAME as environment settings before starting
xymond. This is mostly used when running as a stand-alone daemon; if
xymond is started by xymonlaunch, the environment settings are controlled
by the xymonlaunch tasks.cfg file.
- --pidfile=FILENAME
- xymond writes the process-ID it is running with to this file. This is for
use in automated startup scripts. The default file is
$XYMONSERVERLOGS/xymond.pid.
- --log=FILENAME
- Redirect all output from xymond to FILENAME.
- --store-clientlogs[=[!]COLUMN]
- Determines which status columns can cause a client message to be broadcast
to the CLICHG channel. By default, no client messages are pushed to the
CLICHG channel. If this option is specified with no parameter list, all
status columns that go into an alert state will trigger the client data to
be sent to the CLICHG channel. If a parameter list is added to this
option, only those status columns listed in the list will cause the client
data to be sent to the CLICHG channel. Several column names can be listed,
separated by commas. If all columns are given as "!COLUMNNAME",
then all status columns except those listed will cause the client data to
be sent.
- --status-senders=IP[/MASK][,IP/MASK]
- Controls which hosts may send "status", "combo",
"config" and "query" commands to xymond.
By default, any host can send status-updates. If this option
is used, then status-updates are accepted only if they are sent by one
of the IP-addresses listed here, or if they are sent from the IP-address
of the host that the updates pertain to (this is to allow Xymon clients
to send in their own status updates, without having to list all clients
here). So typically you will need to list your servers running network
tests here.
The format of this option is a list of IP-addresses,
optionally with a network mask in the form of the number of bits. E.g.
if you want to accept status-updates from the host 172.16.10.2, you
would use
--status-senders=172.16.10.2
whereas if you want to accept status updates from both 172.16.10.2 and
from all of the hosts on the 10.0.2.* network (a 24-bit IP network), you
would use
--status-senders=172.16.10.2,10.0.2.0/24
- --maint-senders=IP[/MASK][,IP/MASK]
- Controls which hosts may send maintenance commands to xymond. Maintenance
commands are the "enable", "disable", "ack"
and "notes" commands. The format of this option is the same as for the
--status-senders option. It is strongly recommended that you use this to
restrict access to these commands, so that monitoring of a host cannot be
disabled by a rogue user - e.g. to hide a system compromise from the
monitoring system.
Note: If messages are sent through a proxy, the
IP-address restrictions are of little use, since the messages will
appear to originate from the proxy server address. It is therefore
strongly recommended that you do NOT include the address of a server
running xymonproxy in the list of allowed addresses.
- --www-senders=IP[/MASK][,IP/MASK]
- Controls which hosts may send commands to retrieve the state of xymond.
These are the "xymondlog", "xymondboard" and
"xymondxboard" commands, which are used by xymongen(1)
and combostatus(1) to retrieve the state of the Xymon system so
they can generate the Xymon webpages.
Note: If messages are sent through a proxy, the
IP-address restrictions are of little use, since the messages will
appear to originate from the proxy server address. It is therefore
strongly recommended that you do NOT include the address of a server
running xymonproxy in the list of allowed addresses.
- --admin-senders=IP[/MASK][,IP/MASK]
- Controls which hosts may send administrative commands to xymond. These
commands are the "drop" and "rename" commands. Access
to these should be restricted, since they provide an un-authenticated
means of completely disabling monitoring of a host, and can be used to
remove all traces of e.g. a system compromise from the Xymon monitor.
Note: If messages are sent through a proxy, the
IP-address restrictions are of little use, since the messages will
appear to originate from the proxy server address. It is therefore
strongly recommended that you do NOT include the address of a server
running xymonproxy in the list of allowed addresses.
- --no-download
- Disable the "download" command which can be used by clients to
pull files from the Xymon server. Use of this command may be seen as a
security risk since it allows file downloads.
- --ack-each-color
- By default, sending an ACK for a yellow status stops alerts from being
sent while the status remains yellow or red. A status change from yellow
to red will not re-enable alerts - the ACK covers all non-green statuses.
With this option, an ACK is valid only for the color of the status when
the ACK was sent. So an ACK for a yellow status is ignored if the status
later changes to red, but an ACK for a red status covers both yellow and
red.
Note: An ACK for a red status will clear any existing yellow acks. This
means that a long-lived ack for yellow is lost when you send a short-lived
ack for red. Hence alerts will restart when the red ack expires, even if
the status by then has changed to yellow.
- --ack-log=FILENAME
- Log acknowledgements created on the Critical Systems page to FILENAME. NB,
acknowledgements created by the Acknowledge Alert CGI are automatically
written to acknowledge.log in the Xymon server log directory.
Acknowledgements from the Critical Systems page can be directed to the same log.
- --debug
- Enable debugging output.
- --dbghost=HOSTNAME
- For troubleshooting problems with a specific host, it may be useful to
track the exact communications from a single host. This option causes
xymond to dump all traffic from a single host to the file
"/tmp/xymond.dbg".
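The address-list format shared by the --status-senders, --maint-senders, --www-senders and --admin-senders options can be illustrated with a short Python sketch. This is not xymond's own code: the function names parse_senders and sender_allowed are hypothetical, Python's standard ipaddress module stands in for whatever matching xymond performs, and an entry without a mask is assumed to mean a single host. The optional host_ip argument models the --status-senders rule that a host may always report its own status.

```python
import ipaddress

def parse_senders(option_value):
    """Parse an 'IP[/MASK][,IP/MASK]' string into a list of networks."""
    networks = []
    for item in option_value.split(","):
        item = item.strip()
        if item:
            # A bare address becomes a single-host network (/32 for IPv4).
            networks.append(ipaddress.ip_network(item, strict=False))
    return networks

def sender_allowed(sender_ip, networks, host_ip=None):
    """Return True if sender_ip may send an update.

    An empty list models the default, where any host may send.
    host_ip is the address of the host the update is about (hypothetical
    parameter); a host is allowed to report on itself.
    """
    if not networks:
        return True
    addr = ipaddress.ip_address(sender_ip)
    if host_ip is not None and addr == ipaddress.ip_address(host_ip):
        return True
    return any(addr in net for net in networks)

# The second example value from the --status-senders description.
acl = parse_senders("172.16.10.2,10.0.2.0/24")
print(sender_allowed("10.0.2.77", acl))
print(sender_allowed("192.168.1.5", acl))
print(sender_allowed("192.168.1.5", acl, host_ip="192.168.1.5"))
```

With this list, a network tester at 10.0.2.77 is accepted, an unrelated host at 192.168.1.5 is rejected, and the same host is accepted when the update concerns its own address.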
When a status arrives, xymond matches the old and new color
against the "alert" colors (from the "ALERTCOLORS"
setting) and the "OK" colors (from the "OKCOLORS"
setting). Each of the old and new colors falls into one of three categories:
OK: The color is one of the "OK" colors (e.g.
"green").
ALERT: The color is one of the "alert" colors
(e.g. "red").
UNDECIDED: The color is neither an "alert" color
nor an "OK" color (e.g. "yellow").
If the new status shows an ALERT state, then a message to the
xymond_alert(8) module is triggered. This may be a repeat of a
previous alert, but xymond_alert(8) will handle that internally, and
only send alert messages at the interval configured in
alerts.cfg(5).
If the status goes from a not-OK state (ALERT or UNDECIDED) to OK,
and there is a record of having been in an ALERT state previously, then a
recovery message is triggered.
The use of the OK, ALERT and UNDECIDED states makes it possible to
avoid being flooded with alerts when a status flip-flops between e.g. yellow
and red, or green and yellow.
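The three-way classification and the alert/recovery rules can be sketched in Python as follows. This is an illustration of the rules as described here, not xymond's implementation: the names classify and next_action, and the was_alerting flag that stands in for xymond's record of a previous ALERT state, are hypothetical. The color sets reuse the examples given above (red as alert, green as OK, yellow as undecided); real installations take them from ALERTCOLORS and OKCOLORS.

```python
def classify(color, alert_colors, ok_colors):
    """Map a status color to 'ALERT', 'OK' or 'UNDECIDED'."""
    if color in alert_colors:
        return "ALERT"
    if color in ok_colors:
        return "OK"
    return "UNDECIDED"

def next_action(new_color, was_alerting, alert_colors, ok_colors):
    """Return (action, was_alerting) for an incoming status.

    action is 'alert', 'recovery' or None. was_alerting records whether
    an ALERT state has been seen since the last recovery.
    """
    state = classify(new_color, alert_colors, ok_colors)
    if state == "ALERT":
        # May be a repeat; xymond_alert is described as filtering those.
        return "alert", True
    if state == "OK" and was_alerting:
        return "recovery", False
    # UNDECIDED, or OK with no earlier ALERT to recover from.
    return None, was_alerting

ALERT_COLORS = {"red"}    # illustrative values only
OK_COLORS = {"green"}

alerting = False
for color in ["green", "yellow", "red", "yellow", "red", "green", "green"]:
    action, alerting = next_action(color, alerting, ALERT_COLORS, OK_COLORS)
    print(color, action)
```

In this run the yellow statuses trigger nothing in either direction, which is the flood-avoidance effect described above: only the two red statuses produce alert messages, and a single recovery message is produced when the status first returns to green.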
A lot of functionality in the Xymon server is delegated to
"worker modules" that are fed various events from xymond via a
"channel". Programs access a channel using IPC mechanisms -
specifically, shared memory and semaphores - or by using an instance of the
xymond_channel(8) intermediate program. xymond_channel enables access
to a channel via a simple file I/O interface.
A skeleton program for hooking into a xymond channel is provided
as part of Xymon in the xymond_sample(8) program.
The following channels are provided by xymond:
status: This channel is fed the contents of all incoming
"status" and "summary" messages.
stachg: This channel is fed information about tests that
change status, i.e. the color of the status-log changes.
page: This channel is fed information about tests where the
color changes between an alert color and a non-alert color. It also receives
information about "ack" messages.
data: This channel is fed information about all
"data" messages.
notes: This channel is fed information about all
"notes" messages.
enadis: This channel is fed information about hosts or tests
that are being disabled or enabled.
client: This channel is fed the contents of the client
messages sent by Xymon clients installed on the monitored servers.
clichg: This channel is fed the contents of a host's client
messages whenever a status for that host goes red, yellow or purple.
Information about the data stream passed on these channels is in
the Xymon source tree; see the "xymond/new-daemon.txt" file.
- SIGHUP
- Re-read the hosts.cfg configuration file.
- SIGUSR1
- Force an immediate dump of the checkpoint file.
Timeouts of incoming connections are not strictly enforced. The
check for a timeout only triggers during the normal network handling loop,
so a connection that should time out after N seconds may persist until some
activity happens on another (unrelated) connection.
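The effect of checking timeouts only inside the network handling loop can be shown with a small simulation. This is a simplified model under stated assumptions, not xymond's event loop: the functions expire and simulate are hypothetical, time is a plain number of seconds, and the loop is assumed to run only at the listed wakeup times.

```python
def expire(connections, now, timeout):
    """Drop connections older than timeout seconds; return the dropped ids."""
    dropped = [cid for cid, accepted in connections.items()
               if now - accepted > timeout]
    for cid in dropped:
        del connections[cid]
    return dropped

def simulate(accept_times, wakeups, timeout):
    """Return {connection id: time at which it was actually dropped}.

    accept_times maps a connection id to the time it was accepted.
    wakeups lists the times at which the network loop runs, i.e. when
    some connection shows activity.
    """
    connections = {}
    dropped_at = {}
    pending = sorted(accept_times.items(), key=lambda kv: kv[1])
    for now in sorted(wakeups):
        # Register connections accepted up to this wakeup.
        while pending and pending[0][1] <= now:
            cid, accepted = pending.pop(0)
            connections[cid] = accepted
        # The timeout check happens only here, inside the loop.
        for cid in expire(connections, now, timeout):
            dropped_at[cid] = now
    return dropped_at

# Connection "a" is accepted at t=0 with a 10-second timeout, but the
# loop only runs at t=0, t=4 and t=25.
print(simulate({"a": 0}, wakeups=[0, 4, 25], timeout=10))
```

In this model connection "a" is not dropped at t=10; it survives until the next wakeup at t=25, which mirrors the behaviour described above.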
If ghost-handling is enabled via the "--ghosts" option,
the hosts.cfg file is read to determine the names of all known hosts.
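The "match" mode of the "--ghosts" option compares the name of an unknown reporting host with the known names while ignoring domain names. A minimal Python sketch of that comparison is shown below. The function names short_name and match_ghost are hypothetical, and two details are assumptions that this manual does not state: the comparison is case-insensitive, and an ambiguous match (more than one known host with the same short name) is treated as no match.

```python
def short_name(hostname):
    """Return the hostname with any domain part removed, lowercased."""
    return hostname.split(".", 1)[0].lower()

def match_ghost(reported, known_hosts):
    """Return the known host that 'reported' should be aliased to, or None."""
    if reported in known_hosts:
        return reported  # already a known host, not a ghost
    wanted = short_name(reported)
    candidates = [h for h in known_hosts if short_name(h) == wanted]
    # Assumption: an ambiguous match is treated as no match.
    return candidates[0] if len(candidates) == 1 else None

# Host names as they might be listed in hosts.cfg (illustrative values).
known = ["www.example.com", "db1.example.com"]
print(match_ghost("www", known))
print(match_ghost("db1.internal.example.net", known))
print(match_ghost("mail", known))
```

The first two reports would be aliased to www.example.com and db1.example.com respectively, while the report from "mail" has no counterpart and remains a ghost.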