pmmgr - pcp daemon manager
pmmgr [-v] [-c config-directory]
[-p polling-interval] [-l log-file]
pmmgr manages a collection of PCP daemons for a set of
discovered local and remote hosts running the Performance Metrics Collection
Daemon (PMCD), according to zero or more configuration directories. It keeps
a matching set of pmie, pmlogger, pmrep and other
daemons running, and their archives/logs merged/rotated. It provides an
alternative to the default pmlogger_daily and pmie_daily
scripts that administer pmlogger and pmie "farms" using
cron(8).
pmmgr is largely self-configuring and perseveres despite
most run-time errors. pmmgr runs in the foreground until interrupted.
When signaled, it will stop its running daemons before exiting.
Each poll interval, pmmgr computes a list of possible
targets for a pmcd search. This list is assembled from several
configuration files, and may include explicitly listed specifications, hosts
discovered through several different mechanisms, and/or individual
containers running within them. Once the list is assembled, pmmgr
attempts to make a brief pmNewContext connection to each target, in
order to check for the existence of an actual running pmcd instance,
and to extract a hostid. The hostid is treated as a unique identifier for
the instance, so that redundant connection paths to the same server can be
filtered out. Once the final list of live pmcd instances is
identified, along with their unique hostids, pmmgr ensures that any
requested pcp client daemons are started (or restarted) for them. If any
pmcd instance disappears from the list, its pcp client daemons are
stopped. This entire cycle repeats every poll interval.
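The hostid-based filtering step can be pictured with a small shell sketch. The candidate list below is entirely hypothetical, and keeping the first path seen for each hostid is an assumption made for illustration; pmmgr performs this step internally and may choose any confirmed path.

```shell
#!/bin/sh
# Hypothetical probe results, one "<hostid> <connection path>" pair per line.
candidates='alpha local:
alpha 192.168.1.10
beta 192.168.1.11
alpha alpha.example.com'

# Keep a single connection path per hostid (here: the first one seen).
live=$(printf '%s\n' "$candidates" | awk '!seen[$1]++')
printf '%s\n' "$live"
```

Three different paths lead to the hostid alpha here, yet only one entry for it survives, so only one set of client daemons would be started for that server.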
A description of the command line options specific to pmmgr
follows:
- -c
- directory adds a given configuration directory to pmmgr.
pmmgr can supervise multiple different configurations at the same
time, so this option may be repeated. Errors in the configuration may be
noted to standard error, but pmmgr will fill in missing information
with built-in defaults. The default directory is
$PCP_SYSCONF_DIR/pmmgr.
- -p
- polling-interval sets the host-discovery polling interval to the
given number of seconds. The default is 60. Daemons for a particular
target host will be restarted no more frequently than this interval. There
may be a short-lived thread inside pmmgr for startup and shutdown
of each daemon for each target host.
- -l
- log-file redirects standard output and error to the given log file,
which is created anew.
- -v
- adds more verbose tracing to standard output.
A pmmgr configuration identifies which hosts should be
monitored, which daemons should be maintained for them, and what options
those daemons should be run with. pmmgr uses a small number of files
in a configuration directory, instead of lines in a single text file. The
individual files carry zero or more lines of 100% pure configuration text,
and no comments. (If desired, a configuration may be commented upon within
other files, such as a free-form README.)
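As a minimal sketch of this file-per-item layout, the following shell fragment builds a scratch configuration directory. The temporary location and the README wording are assumptions chosen for illustration; the item names and values are the defaults described on this page.

```shell
#!/bin/sh
# Hypothetical scratch location; a real setup might use $PCP_SYSCONF_DIR/pmmgr.
confdir=$(mktemp -d)

# One file per configuration item, holding plain configuration text only.
printf '%s\n' 'local:' > "$confdir/target-host"
printf '%s\n' 'pmcd.hostname' > "$confdir/hostid-metrics"

# Commentary goes into a separate free-form file, not into the items.
printf '%s\n' 'Scratch pmmgr configuration for testing.' > "$confdir/README"

# Print (rather than run) the command that would supervise this directory,
# since pmmgr stays in the foreground until interrupted.
cmd="pmmgr -v -c $confdir -p 60"
echo "$cmd"
```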
Some of the configuration files are forked into pairs: per-hostid
and common. This permits numerous almost-identical-configuration targets to
be managed from the same configuration directory. For these files, marked
with * below, pmmgr will concatenate a per-hostid file
(if it exists) and a common file (if it exists) in order to form the
complete configuration item.
For example, for pmie configuration for target hostid
foo, pmmgr will search files named pmie.foo then
pmie. For single-line configuration items, the first file and line
found will "win"; for multi-line configuration items, they all
"win".
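The lookup order can be imitated in a few lines of shell. This is only a sketch of the rule described above, not pmmgr's own code; the lookup function, the hostids, and the placeholder file contents are all hypothetical.

```shell
#!/bin/sh
# Scratch directory with a per-hostid file and a common file (placeholder text).
confdir=$(mktemp -d)
printf '%s\n' 'OPTIONS-FOR-FOO' > "$confdir/pmie.foo"   # applies to hostid foo only
printf '%s\n' 'OPTIONS-FOR-ALL' > "$confdir/pmie"       # applies to every hostid

# Concatenate the per-hostid file (if present), then the common file (if present).
lookup() {
    item=$1
    hostid=$2
    for f in "$confdir/$item.$hostid" "$confdir/$item"; do
        [ -f "$f" ] && cat "$f"
    done
    return 0
}

all_lines=$(lookup pmie foo)                 # multi-line items use every line
first_line=$(lookup pmie foo | head -n 1)    # single-line items use the first line
echo "$first_line"
```

For a hostid without its own file, such as a hypothetical bar, the same function yields only the common file's contents.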
This set of configuration files identifies where pmmgr
should search for pmcd instances, how to uniquely identify them, and
where state such as log files should be kept for each. Ideally, a persistent
and unique hostid string is computed for each potential target pmcd from
specified metric values. This hostid is also used as a subdirectory name for
locating daemon data. The rare empty hostid is mapped to "-".
- hostid-static
- This file contains one or more lines, each specifying a static string
that should be used as a distinct hostid for the same target
pmcd. Treatment of the distinct hostids may be customized using
per-hostid configuration files. Specifying values in this file overrides
the hostid-metrics file specified below. It should be noted that
using this option will cause all target pmcds to be assigned the same set
of hostids. Thus, this is useful in monitoring single hosts or if each
monitored host has its own configuration directory.
- hostid-metrics
- This file contains one or more lines of metric specifications in the
format accepted by pmParseMetricSpec. Metrics without instance
specifiers mean all instances of that metric. These are used to generate
the unique hostid string for each pmcd server that
pmmgr discovers. Upon discovery, all the metrics/instances named
are queried, string values fetched, and normalized/concatenated into a
single hyphenated printable string. The default is the single metric
pmcd.hostname, which is sufficient if all the hosts discovered have
unique hostname(2). If they don't, you should add other pcp metric
specifications to set them apart at your site. The more you add, the
longer the hostid string, but the more likely that accidental duplication
is prevented.
However, it may be desirable for a hostid to also be
persistent, so that if the target host goes offline and later
returns, the new hostid matches the previous one, because then old and new
histories can be joined. This argues against using metrics whose values vary
from boot to boot.
Some candidate metrics to consider:
network.interface.hw_addr,
network.interface.inet_addr["eth0"],
network.interface.ipv6_addr, kernel.uname.nodename
- log-directory
- This file contains the path of a directory beneath which the per-hostid
subdirectories are to be created by pmmgr. If it is not a full
path, it is implicitly relative to the configuration directory itself. The
default is $PCP_LOG_DIR/pmmgr/.
- target-host
- This file contains one or more lines containing pmcd host
specifications, as described on the PCPIntro(1) manual page. The
default is to target pmcd at local:.
- target-discovery
- This file contains one or more lines containing specifications for the
pmDiscoverServices PMAPI call, each of which may map onto a
fluctuating set of local or remote pmcd servers. Each poll interval,
pmmgr will attempt to rerun discovery with all of the given
specifications. Again, it is not a problem if more than one specification
matches the same actual pmcd: one confirmed access path is arbitrarily
selected. The default is to do no discovery. Consider including
avahi,timeout=5 to rely on pmcd self-announcements on the local
network (searching for up to five seconds each time). Consider including
probe=192.168.1.0/24 to quickly scan the given IP address
range.
- subtarget-containers
- If this file exists, pmmgr will scan each host that is found for
running containers. For each running container, it will create independent
subtargets for running requested daemons. The hostid string for these
subtargets is the host's hostid string, followed by a double-hyphen, then
the full unique container instance-name string.
- target-threads
- This file contains a limit on the number of concurrent threads that
analyze potential target pmcds for their hostids and/or containers. The
default is a few dozen threads per CPU core, if known. Set this to
zero if remote pmcds should be analyzed sequentially. A small number of
threads is not a good idea if any potential target pmcds are unreachable,
since $PMCD_CONNECT_TIMEOUT may be several seconds long each.
- log-subdirectory-gc
- This file may contain a time interval specification as per the
PCPIntro(1) manual page. All subdirectories of the log-directory
are presumed to contain data for pmmgr-monitored servers. Those
that have not been modified in at least that long, and are not associated
with a currently monitored target, are deleted entirely. This value should be
longer than the longest interval that pmmgr normally recreates
archives (such as due to pmmgr restarts, and pmlogmerge
intervals). The default value is 90days.
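A target-selection setup using the files above might be sketched as follows. The scratch directory is an assumption; the values written are the examples and defaults quoted in the descriptions above, and the address range would need adjusting for a real network.

```shell
#!/bin/sh
# Hypothetical scratch configuration directory.
confdir=$(mktemp -d)

# Always monitor the local pmcd.
printf '%s\n' 'local:' > "$confdir/target-host"

# Rerun both discovery mechanisms every poll interval.
printf '%s\n' 'avahi,timeout=5' 'probe=192.168.1.0/24' > "$confdir/target-discovery"

# Build the hostid from two metrics to reduce accidental duplicates.
printf '%s\n' 'pmcd.hostname' 'kernel.uname.nodename' > "$confdir/hostid-metrics"

# The mere existence of this file enables per-container subtargets.
: > "$confdir/subtarget-containers"

# Delete stale per-hostid log subdirectories after this long.
printf '%s\n' '90days' > "$confdir/log-subdirectory-gc"

ls "$confdir"
```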
This group of configuration options controls a pmlogger
daemon for each host. This may include generating its configuration, and
managing its archives.
- pmlogger*
- If and only if this file exists, pmmgr will maintain a
pmlogger daemon for each targeted host. This file contains one line
of additional space-separated options for the pmlogger daemon.
(pmmgr already adds -h, -H, -f, -r, -l, and perhaps -c.) The
default is to maintain no pmlogger (and no other configuration in
this section is processed).
- pmlogger-timefmt*
- Specify a time format to use in the archive-* name for pmlogger
generated archives. The default is "%Y%m%d.%H%M%S". Expected to
be in strftime(3) format.
- pmlogconf*
- If and only if this file exists, pmmgr will run pmlogconf to
generate a configuration file for each target pmcd. The file
contains one line of space-separated additional options for the
pmlogconf program. pmlogconf's generated output file will be
stored under the log-directory/hostid subdirectory. (pmmgr already
adds -c, -r, and -h.) The default is no pmlogconf, so instead, the
pmlogger file above should probably contain a -c option, to specify
a fixed pmlogger configuration.
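A sketch of enabling pmlogger with a pmlogconf-generated configuration follows. It assumes that an empty file is enough to switch an item on with no extra options; the scratch directory is likewise hypothetical. The last step only previews what an archive name built from the time format could look like.

```shell
#!/bin/sh
# Hypothetical scratch configuration directory.
confdir=$(mktemp -d)

# Assumption: an empty file enables the daemon without additional options.
: > "$confdir/pmlogger"
: > "$confdir/pmlogconf"

# Time format for archive names, in strftime(3) syntax (the documented default).
printf '%s\n' '%Y%m%d.%H%M%S' > "$confdir/pmlogger-timefmt"

# Preview an archive-* name by expanding the format with date(1).
timefmt=$(cat "$confdir/pmlogger-timefmt")
sample=$(date "+archive-$timefmt")
echo "$sample"
```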
Default pmlogger configurations can collect tens of
megabytes of data per day (possibly split into multiple archives), per
target host. If your disk space is less than infinite, or archive-splitting
unwieldy, this should be managed. In the default, unmanaged case, the system
administrator is responsible for managing the individual archive-*
files from the per-host logging subdirectories. pmmgr offers several
other options, each representing different performance and usability
tradeoffs.
This style of archive log management regularly creates a single
merged archive from prior archives for each target host, in effect lopping
off old data and appending the new. A single merged archive can be
relatively large (defaults to approximately 100-400 MB per host), and puts a
corresponding I/O load on storage, but is most convenient for a detailed
long-timeframe analysis. Once pmlogger is restarted, it always
creates a new archive, so in the steady state, there will be one merged
archive of recent history, and one current archive being written-to by
pmlogger.
- pmlogmerge*
- If this file exists, pmmgr will run pmlogextract to
periodically merge together preexisting log archives for each target pmcd
into a single large one. Then, the preexisting log archives are deleted
(including any prior merged ones). This configuration file may contain a
time interval specification as per the PCPIntro(1) manual page,
representing the period after which pmlogger should be temporarily
stopped, and archives merged. It represents the maximum amount of time
that the merged archive lags the present time. The default is
24hours.
- pmlogmerge-granular*
- If this file also exists, pmmgr will merge only a subset of
preexisting log archives into the new one, instead of all of them, so as
to approximate a granular, aligned set of merged archives. The subset
chosen corresponds to the previous time interval specified by the
pmlogmerge control file. The default is no granularity.
- pmlogcheck-corrupt-gc*
- Before archives are considered for merging, they are processed through
pmlogcheck to check for corruption. In the unlikely case of a
problem, such archives are renamed out of the way (named
"corrupt-*"), and retained up to a limited time. This file
specifies how long. If this file exists, the time interval it contains
is the maximum age. The default is 90days. To store corrupt
archives indefinitely, set this to a large quantity like
"99999weeks".
- pmlogmerge-rewrite*
- If this file exists, pmmgr will run pmlogrewrite -i (plus
any other options listed in this file) on each input archive before
merging it. This will naturally require more disk I/O. The default is
no rewriting.
- pmlogmerge-retain*
- pmmgr reduces/deletes any original-resolution archives after a time
period specified by this file, as measured by the file mtime. The period
will also be passed to pmlogextract as a negative parameter to
-S. The default is 14days. To store archives indefinitely,
set this to a large quantity like "99999weeks".
- pmlogreduce*
- If this file exists, then prior to removing archives that expire past the
pmlogmerge-retain period, they are processed with
pmlogreduce to create reduced archives (named reduced-*). If
the file contains space-separated options, they are passed onto
pmlogreduce. (By default, pmlogreduce down-samples to a 600-second
interval.)
- pmlogreduce-retain*
- If this file exists, then reduced archives (identified by the
reduced-* pattern) are deleted after a time period specified by
this file, as measured from the file mtime. Since this time is likely that
of the pmlogreduce run, the total retention time will be approximately the
pmlogmerge-retain time plus the pmlogreduce-retain time. The
default is 90days. To store reduced archives indefinitely, set this
to a large quantity like "99999weeks".
- disk-full-threshold
- If this file exists, then pmmgr will track the disk space available
where pmlogger archives are kept. If that partition fills up past the
configured percentage, pmmgr will linearly reduce the duration logs
are kept via the disk-full-retention variable.
Values must be greater than zero, and expressed either as a value
between 0 and 1, or as a decimal value between 1 and 100.
- disk-full-retention
- If expressed, this variable scales the rate at which logs are culled when
disk-full-threshold has been surpassed. A lower percentage will
cull logs more quickly (in favour of preserving disk space), while a
higher percentage will opt to retain more pcp archives.
Normalized Full Threshold | Full Retention | Final Retention Factor
1                         | 0.0            | 0.0
0.75                      | 0.0            | 0.25
0.5                       | 0.0            | 0.5
0.0                       | 0.0            | 1.0
1                         | 0.5            | 0.5
0.75                      | 0.5            | 0.625
0.5                       | 0.5            | 0.75
0.0                       | 0.5            | 1.0
1                         | 1.0            | 1.0
0.75                      | 1.0            | 1.0
0.5                       | 1.0            | 1.0
0.0                       | 1.0            | 1.0
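Every row of the table is consistent with the relation factor = 1 - threshold * (1 - retention). That formula is inferred from the tabulated values rather than stated anywhere on this page, so the sketch below should be read as an illustration of the table and not as pmmgr's actual implementation; the function name is hypothetical.

```shell
#!/bin/sh
# Reproduce the table: $1 = normalized full threshold, $2 = full retention.
retention_factor() {
    awk -v full="$1" -v keep="$2" 'BEGIN { print 1 - full * (1 - keep) }'
}

retention_factor 0.75 0.5
retention_factor 0.75 0.0
retention_factor 0.5 0.5
```

The three calls correspond to the table rows with final retention factors 0.625, 0.25 and 0.75.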
This group of configuration options controls a pmie daemon
for each host. This may include generating a custom configuration.
- pmie*
- If and only if this file exists, pmmgr will maintain a pmie
daemon for each targeted pmcd. This file contains one line of
additional space-separated options for the pmie daemon.
(pmmgr already adds -h, -f, -l, and perhaps -c.) The default is to
maintain no pmie (and no other configuration in this section is
processed).
- pmieconf*
- If and only if this file exists, pmmgr will run pmieconf to
generate a configuration file for each target pmcd. The file
contains one line of space-separated additional options for the
pmieconf program. pmieconf's generated output file will be
stored under the log-directory/hostid subdirectory. (pmmgr already
adds -F, -c, and -f.) The default is no pmieconf, so instead, the
pmie file above should probably contain a -c option, to specify a
fixed pmie configuration.
pmmgr may be used to invoke arbitrary PCP client programs
for each target pmcd. This can enable automated invocation of
reporting or relaying tools, such as pmrep, pcp2graphite or
pcp2influxdb without needing a specialized system service.
- monitor*
- If this file exists, then for each line in this file, a new background
process will be invoked. (It is restarted if it exits.) The line specifies
the beginning of the command line (including the program name);
pmmgr appends a -h HOSTSPEC, and arranges to collect the standard
output and standard error into separate monitor-NN.out and
monitor-NN.err files under the log directory. Error messages in
the latter are transcribed to pmmgr's own logs.
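As a sketch, a monitor file with one pmrep line could be prepared like this. The pmrep arguments (a 60-second interval and the kernel.all.load metric), the scratch directory, and the local: host specification are assumptions for illustration. The loop only prints the command line that would result once -h HOSTSPEC is appended; it does not start anything.

```shell
#!/bin/sh
# Hypothetical scratch configuration directory.
confdir=$(mktemp -d)

# One line per background process; the line is the start of the command.
printf '%s\n' 'pmrep -t 60 kernel.all.load' > "$confdir/monitor"

# Illustrate the completed command line for a hypothetical target.
hostspec='local:'
cmds=$(while IFS= read -r line; do
    printf '%s -h %s\n' "$line" "$hostspec"
done < "$confdir/monitor")
echo "$cmds"
```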
- $PCP_SYSCONF_DIR/pmmgr/
- default configuration directory
- $PCP_LOG_DIR/pmmgr/
- default logging directory
Environment variables with the prefix PCP_ are used to
parametrize the file and directory names used by PCP. On each installation,
the file /etc/pcp.conf contains the local values for these variables.
The $PCP_CONF variable may be used to specify an alternative
configuration file, as described in pcp.conf(5).
PCPIntro(1), cron(8), pmcd(1),
pmlogconf(1), pmlogger(1), pmlogger_daily(1),
pmieconf(1), pmie(1), pmie_daily(1), pmrep(1),
pcp2graphite(1), pcp2influxdb(1), pmlogreduce(1),
pcp.conf(5) and pcp.env(5).