COLLECTD-THRESHOLD(5) | collectd | COLLECTD-THRESHOLD(5) |
collectd-threshold - Documentation of collectd's Threshold plugin
  LoadPlugin "threshold"
  <Plugin "threshold">
    <Type "foo">
      WarningMin 0.00
      WarningMax 1000.00
      FailureMin 0.00
      FailureMax 1200.00
      Invert false
      Instance "bar"
    </Type>
  </Plugin>
Starting with version 4.3.0 collectd has support for monitoring. By that we mean that the values are not only stored or sent somewhere, but that they are judged and, if a problem is recognized, acted upon. The only action the Threshold plugin takes itself is to generate and dispatch a notification. Other plugins can register to receive notifications and perform appropriate further actions.
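As an illustration of a plugin that receives notifications, the fragment below sketches how the exec plugin could be configured to run a script for every notification. The user name and the script path are hypothetical; see collectd-exec(5) for the authoritative description of NotificationExec.

```
# Sketch only: run a handler script for each dispatched notification.
# "nobody" and the script path are placeholders, not defaults.
LoadPlugin "exec"
<Plugin "exec">
  NotificationExec "nobody" "/usr/local/bin/handle-notification.sh"
</Plugin>
```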
Since systems and what you expect them to do differ a lot, you can configure thresholds for your values freely. This gives you a lot of flexibility but also a lot of responsibility.
Every time a value is out of range, a notification is dispatched. This means that the idle percentage of your CPU needs to be less than the configured threshold only once for a notification to be generated. There is no moving average or similar smoothing, at least not yet.
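A minimal sketch of the CPU case just described, reusing the "cpu" type and "idle" instance from this page. The limits 20 and 10 are illustrative values, not recommendations. With this fragment, a single reading below 20 is enough to produce a warning, and a single reading below 10 is enough to produce a failure notification.

```
LoadPlugin "threshold"
<Plugin "threshold">
  <Type "cpu">
    Instance "idle"
    # Illustrative limits: warn below 20, fail below 10.
    WarningMin 20
    FailureMin 10
  </Type>
</Plugin>
```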
Also, all values that match a threshold are considered to be relevant or "interesting". As a consequence, collectd will issue a notification if they are not received for Timeout iterations. The Timeout configuration option is explained in section "GLOBAL OPTIONS" in collectd.conf(5). If, for example, Timeout is set to "2" (the default) and some host sends its CPU statistics to the server every 60 seconds, a notification will be dispatched after about 120 seconds. It may take a little longer because the timeout is checked only once each Interval on the server.
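The timeout arithmetic above can be sketched as a server-side configuration. Interval and Timeout are the global options from collectd.conf(5). The host name and the limit are placeholders taken from the example on this page.

```
# Global options on the server (see collectd.conf(5)).
Interval 60
Timeout 2

LoadPlugin "threshold"
<Plugin "threshold">
  <Host "hostname">
    # Because this block matches the value, the value is "interesting".
    # If "hostname" reports every 60 seconds and then stops, a missing-value
    # notification follows after about 2 * 60 = 120 seconds.
    <Type "cpu">
      Instance "idle"
      FailureMin 10
    </Type>
  </Host>
</Plugin>
```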
When a value comes within range again or is received after it was missing, an "OKAY-notification" is dispatched.
Here is a configuration example to get you started. Read below for more information.
  LoadPlugin "threshold"
  <Plugin "threshold">
    <Type "foo">
      WarningMin 0.00
      WarningMax 1000.00
      FailureMin 0.00
      FailureMax 1200.00
      Invert false
      Instance "bar"
    </Type>

    <Plugin "interface">
      Instance "eth0"
      <Type "if_octets">
        FailureMax 10000000
        DataSource "rx"
      </Type>
    </Plugin>

    <Host "hostname">
      <Type "cpu">
        Instance "idle"
        FailureMin 10
      </Type>

      <Plugin "memory">
        <Type "memory">
          Instance "cached"
          WarningMin 100000000
        </Type>
      </Plugin>

      <Type "load">
        DataSource "midterm"
        FailureMax 4
        Hits 3
        Hysteresis 3
      </Type>
    </Host>
  </Plugin>
There are basically two types of configuration statements: The "Host", "Plugin", and "Type" blocks select the value for which a threshold should be configured. The "Plugin" and "Type" blocks may be specified further using the "Instance" option. You can combine the blocks by nesting them, though they must be nested in the above order, i.e. "Host" may contain both "Plugin" and "Type" blocks, "Plugin" may only contain "Type" blocks, and "Type" may not contain other blocks. If multiple blocks apply to the same value, the most specific block is used.
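The nesting order and the "most specific block" rule can be illustrated with the following sketch. Both limits are made-up values. For "if_octets" values from the "eth0" instance of the "interface" plugin on host "hostname", the nested block is the more specific match and would be the one used; values from all other hosts and interfaces fall back to the generic block.

```
<Plugin "threshold">
  # Generic block: matches "if_octets" from any host and any plugin.
  <Type "if_octets">
    FailureMax 10000000
  </Type>

  # More specific block: Host > Plugin > Type, nested in the required order.
  <Host "hostname">
    <Plugin "interface">
      Instance "eth0"
      <Type "if_octets">
        FailureMax 50000000
      </Type>
    </Plugin>
  </Host>
</Plugin>
```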
The other statements specify the threshold to configure. They must be included in a "Type" block. Currently the following statements are recognized:
Normally, all data sources are checked against a configured threshold. If this is undesirable, or if you want to specify different limits for each data source, you can use the DataSource option to have a threshold apply only to one data source.
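As a sketch of per-data-source limits, the fragment below configures separate thresholds for the "rx" and "tx" data sources of "if_octets". The "rx" block is taken from the example on this page. The "tx" block and its limit are assumptions added for illustration.

```
<Plugin "threshold">
  <Plugin "interface">
    Instance "eth0"
    # Limit for received traffic only.
    <Type "if_octets">
      DataSource "rx"
      FailureMax 10000000
    </Type>
    # Separate, illustrative limit for transmitted traffic.
    <Type "if_octets">
      DataSource "tx"
      FailureMax 5000000
    </Type>
  </Plugin>
</Plugin>
```

Without the DataSource lines, a single "if_octets" block would check both data sources against the same limits.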
This applies to missing values, too: if set to true, a notification about a missing value is generated once every Interval seconds. If set to false, only one such notification is generated until the value appears again.
Florian Forster <octo at collectd.org>
2023-02-20 | 5.12.0.git |