PMCD(1) | General Commands Manual | PMCD(1) |
pmcd - performance metrics collector daemon
pmcd [-AfQSv] [-c config] [-C dirname] [-H hostname] [-i ipaddress] [-l logfile] [-L bytes] [-M certname] [-[n|N] pmnsfile] [-p port[,port ...]] [-P passfile] [-q timeout] [-s sockname] [-T traceflag] [-t timeout] [-U username] [-x file]
pmcd is the collector used by the Performance Co-Pilot (see PCPIntro(1)) to gather performance metrics on a system. As a rule, there must be an instance of pmcd running on a system for any performance metrics to be available to the PCP.
pmcd accepts connections from client applications running either on the same machine or remotely and provides them with metrics and other related information from the machine that pmcd is executing on. pmcd delegates most of this request servicing to a collection of Performance Metrics Domain Agents (or just agents), where each agent is responsible for a particular group of metrics, known as the domain of the agent. For example the postgresql agent is responsible for reporting information relating to the PostgreSQL database, such as the transaction and query counts, indexing and replication statistics, and so on.
The agents may be processes started by pmcd, independent processes or Dynamic Shared Objects (DSOs, see dlopen(3)) attached to pmcd's address space. The configuration section below describes how connections to agents are specified.
The options to pmcd are as follows.
Once pmcd is running, the timeout may be dynamically modified by storing an integer value (the timeout in seconds) into the metric pmcd.control.timeout via pmstore(1).
By default, event tracing is buffered using a circular buffer that is overwritten as new events are recorded. The default buffer size holds the last 20 events, although this number may be overridden by using pmstore(1) to modify the metric pmcd.control.tracebufs.
Similarly once pmcd is running, the event tracing control may be dynamically modified by storing 1 (enable) or 0 (disable) into the metrics pmcd.control.traceconn, pmcd.control.tracepdu and pmcd.control.tracenobuf. These metrics map to the bit fields associated with the traceflag argument for the -T option.
When operating in buffered mode, the event trace buffer will be dumped whenever an agent connection is terminated by pmcd, or when any value is stored into the metric pmcd.control.dumptrace via pmstore(1).
In unbuffered mode, every event will be reported when it occurs.
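The difference between the buffered and unbuffered modes can be illustrated with a small model. This is a sketch only, not pmcd's implementation: the TraceBuffer class and its method names are hypothetical, and whether a dump also empties the buffer is not stated above, so the sketch leaves the buffer intact.

```python
from collections import deque


class TraceBuffer:
    """Toy model of buffered versus unbuffered event tracing."""

    def __init__(self, size=20, buffered=True):
        self.buffered = buffered
        # A deque with maxlen discards the oldest entry when full,
        # which gives the circular over-write behaviour.
        self.events = deque(maxlen=size)
        self.reported = []  # events that have been written out

    def record(self, event):
        if self.buffered:
            self.events.append(event)
        else:
            # Unbuffered mode: report every event as it occurs.
            self.reported.append(event)

    def dump(self):
        """Report the buffered events, oldest first."""
        out = list(self.events)
        self.reported.extend(out)
        return out
```

With the default size of 20, recording 25 events and then dumping reports only the last 20 of them.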
If a PDU exchange with an agent times out, the agent has violated the requirement that it delivers metrics with little or no delay. This is deemed a protocol failure and the agent is disconnected from pmcd. Any subsequent requests for information from the agent will fail with a status indicating that there is no agent to provide it.
It is possible to specify access control to pmcd based on users, groups and hosts. This allows one to prevent users, groups of users, and certain hosts from accessing the metrics provided by pmcd and is described in more detail in the Section on ACCESS CONTROL below.
On startup pmcd looks for a configuration file named $PCP_PMCDCONF_PATH. This file specifies which agents cover which performance metrics domains and how pmcd should make contact with the agents. An optional section specifying access controls may follow the agent configuration data.
Warning: pmcd is usually started as part of the boot sequence and runs initially as root. The configuration file may contain shell commands to create agents, which will be executed by root. To prevent security breaches the configuration file should be writable only by root. The use of absolute path names is also recommended.
The case of the reserved words in the configuration file is unimportant, but elsewhere, the case is preserved.
Blank lines and comments are permitted (even encouraged) in the configuration file. A comment begins with a ``#'' character and finishes at the end of the line. A line may be continued by ensuring that the last character on the line is a ``\'' (backslash). A comment on a continued line ends at the end of the continued line. Spaces may be included in lexical elements by enclosing the entire element in double quotes. A double quote preceded by a backslash is always a literal double quote. A ``#'' in double quotes or preceded by a backslash is treated literally rather than as a comment delimiter. Lexical elements and separators are described further in the following sections.
Each line of the agent configuration section of the configuration file contains details of how to connect pmcd to one of its agents and specifies which metrics domain the agent deals with. An agent may be attached as a DSO, or via a socket, or a pair of pipes.
Each line of the agent configuration section of the configuration file must be either an agent specification, a comment, or a blank line. Lexical elements are separated by whitespace characters; however, a single agent specification may not be broken across lines unless a \ (backslash) is used to continue the line.
Each agent specification must start with a textual label (string) followed by an integer in the range 1 to 510. The label is a tag used to refer to the agent and the integer specifies the domain for which the agent supplies data. This domain identifier corresponds to the domain portion of the PMIDs handled by the agent. Each agent must have a unique label and domain identifier.
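The label and domain rules above can be checked mechanically before a configuration file is installed. The following is a minimal sketch and not pmcd's parser: the function name check_agent_headers and its input, a list of (label, domain) tuples taken from the first two fields of each agent line, are hypothetical.

```python
def check_agent_headers(specs):
    """Check (label, domain) pairs against the rules in the text.

    Returns a list of error strings; an empty list means that no
    problem was found.
    """
    errors = []
    seen_labels = set()
    seen_domains = set()
    for label, domain in specs:
        # The domain identifier must lie in the range 1 to 510.
        if not 1 <= domain <= 510:
            errors.append(f"{label}: domain {domain} outside 1..510")
        # Labels and domain identifiers must each be unique.
        if label in seen_labels:
            errors.append(f"{label}: duplicate label")
        if domain in seen_domains:
            errors.append(f"{label}: duplicate domain {domain}")
        seen_labels.add(label)
        seen_domains.add(domain)
    return errors
```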
For DSO agents a line of the form:
should appear. Where,
For agents providing socket connections, a line of the form
should appear. Where,
For agents interacting with the pmcd via stdin/stdout, a line of the form:
should appear. Where,
The access control section of the configuration file is optional, but if present it must follow the agent configuration data. The case of reserved words is ignored, but elsewhere case is preserved. Lexical elements in the access control section are separated by whitespace or the special delimiter characters: square brackets (``['' and ``]''), braces (``{'' and ``}''), colon (``:''), semicolon (``;'') and comma (``,''). The special characters are not treated as special in the agent configuration section. Lexical elements may be quoted (double quotes) as necessary.
The access control section of the file must start with a line of the form:
[access]
In addition to (or instead of) the access section in the pmcd configuration file, access control specifications are also read from a file having the same name as the pmcd configuration file, but with '.access' appended to the name. This optional file must not contain the [access] keyword.
Leading and trailing whitespace may appear around and within the brackets and the case of the access keyword is ignored. No other text may appear on the line except a trailing comment.
Following this line, the remainder of the configuration file should contain lines that allow or disallow operations from particular hosts or groups of hosts.
There are two kinds of operations that occur via pmcd:
Access to pmcd can be granted in three ways: by user, by group of users, or at a host level. In the last case, all users on a host are granted the same level of access, unless the user or group access control mechanism is also in use.
User names and group names will be verified using the local /etc/passwd and /etc/group files (or an alternative directory service), using the getpwent(3) and getgrent(3) routines.
Hosts may be identified by name, IP address, IPv6 address or by the special host specifications ``"unix:"'' or ``"local:"''. ``"unix:"'' refers to pmcd's Unix domain socket, on supported platforms. ``"local:"'' is equivalent to specifying ``"unix:"'' and ``localhost''.
Wildcards may also be specified by ending the host identifier with the single wildcard character ``*'' as the last-given component of an address. The wildcard ``".*"'' refers to all inet (IPv4) addresses. The wildcard ``":*"'' refers to all IPv6 addresses. If an IPv6 wildcard contains a ``::'' component, then the final ``*'' refers to the final 16 bits of the address only, otherwise it refers to the remaining unspecified bits of the address.
The wildcard ``*'' refers to all users, groups or host addresses, including ``"unix:"''. Names of users, groups or hosts may not be wildcarded.
The following are all valid host identifiers:
boing localhost giggle.melbourne.sgi.com 129.127.112.2 129.127.114.* 129.* .* fe80::223:14ff:feaf:b62c fe80::223:14ff:feaf:* fe80:* :* "unix:" "local:" *
The following are not valid host identifiers:
*.melbourne 129.127.*.* 129.*.114.9 129.127* fe80::223:14ff:*:* fe80::223:14ff:*:b62c fe80*
The first example is not allowed because only (numeric) IP addresses may contain a wildcard. The second and fifth examples are not valid because there is more than one wildcard character. The third and sixth contain an embedded wildcard; the fourth and seventh have a wildcard character that is not the last component of the address (the last components are 127* and fe80* respectively).
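The wildcard rules can be summarised as a small syntactic check. The sketch below is illustrative only and is not the code pmcd uses: the function name is_valid_host_identifier is hypothetical, names and complete addresses are accepted without being resolved, and the numeric checks on address components are a simplification.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def is_valid_host_identifier(ident):
    """Return True if ident follows the wildcard rules described above."""
    ident = ident.strip('"')  # special forms are written in double quotes
    if ident in ("*", "unix:", "local:"):
        return True
    if "*" not in ident:
        return True  # plain name or full address, not resolved here
    # Only one wildcard is allowed and it must come last.
    if ident.count("*") != 1 or not ident.endswith("*"):
        return False
    prefix = ident[:-1]
    # The wildcard must be a whole component, so a separator precedes it.
    if not prefix or prefix[-1] not in ".:":
        return False
    separator, head = prefix[-1], prefix[:-1]
    if head == "":
        return True  # ".*" (all IPv4) or ":*" (all IPv6)
    if separator == ".":
        # Only numeric (IPv4) components may precede the wildcard.
        parts = head.split(".")
        return len(parts) <= 3 and all(
            p.isdigit() and int(p) <= 255 for p in parts
        )
    # IPv6: groups of up to four hex digits; "::" yields an empty group.
    groups = head.split(":")
    return len(groups) <= 7 and all(
        len(g) <= 4 and set(g) <= HEX_DIGITS for g in groups
    )
```

Applied to the two lists above, the function accepts every identifier in the first list and rejects every identifier in the second.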
The name localhost is given special treatment to make the behavior of host wildcarding consistent. Rather than being 127.0.0.1 and ::1, it is mapped to the primary inet and IPv6 addresses associated with the name of the host on which pmcd is running. Beware of this when running pmcd on multi-homed hosts.
Access for users, groups or hosts is allowed or disallowed by specifying statements of the form:
Either plural or singular forms of users, groups, and hosts keywords are allowed. If this keyword is omitted, a default of hosts will be used. This behaviour is for backward compatibility only; it is preferable to be explicit.
Where no specific allow or disallow statement applies to an operation, the default is to allow the operation from all users, groups and hosts. In the trivial case when there is no access control section in the configuration file, all operations from all users, groups, and hosts are permitted.
If a new connection to pmcd is attempted by a user, group or host that is not permitted to perform any operations, the connection will be closed immediately after an error response PM_ERR_PERMISSION has been sent to the client attempting the connection.
Statements with the same level of wildcarding specifying identical hosts may not contradict each other. For example if a host named clank had an IP address of 129.127.112.2, specifying the following two rules would be erroneous:
allow host clank : fetch, store;
disallow host 129.127.112.2 : all except fetch;
because they both refer to the same host, but disagree as to whether the fetch operation is permitted from that host.
Statements containing more specific host specifications override less specific ones according to the level of wildcarding. For example a rule of the form
allow host clank : all;
overrides
disallow host 129.127.112.* : all except fetch;
because the former contains a specific host name (equivalent to a fully specified IP address), whereas the latter has a wildcard. In turn, the latter would override
disallow host * : all;
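The override order among the three rules above can be modelled by ranking each host pattern by how many address components it fixes. This is a sketch under stated assumptions, not pmcd's algorithm: it handles IPv4 patterns only, assumes host names have already been resolved (clank to 129.127.112.2, as in the earlier example), and the names specificity, matches and effective_rule are hypothetical.

```python
def specificity(pattern):
    """Rank a host pattern; a higher value is more specific."""
    if pattern == "*":
        return -1  # matches everything, including "unix:"
    # Count the fixed components: 4 for a full address, 0 for ".*".
    fixed = [p for p in pattern.split(".") if p not in ("", "*")]
    return len(fixed)


def matches(pattern, addr):
    """Return True if the IPv4 address addr is covered by pattern."""
    if pattern in ("*", ".*"):
        return True
    if pattern.endswith(".*"):
        return addr.startswith(pattern[:-1])
    return pattern == addr


def effective_rule(rules, addr):
    """Return the most specific (pattern, action) pair matching addr.

    None means no statement applies, in which case the operation
    is allowed by default.
    """
    candidates = [r for r in rules if matches(r[0], addr)]
    return max(candidates, key=lambda r: specificity(r[0]), default=None)
```

Given the three rules above, a lookup for 129.127.112.2 selects the rule for clank, another address in 129.127.112.* selects the wildcard rule, and any other address falls through to the rule for *.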
It is possible to limit the number of connections from a user, group or host to pmcd. This may be done by adding a clause of the form
to the operations list of an allow statement. Such a clause may not be used in a disallow statement. Here, n is the maximum number of connections that will be accepted from the user, group or host matching the identifier(s) used in the statement.
An access control statement with a list of user, group or host identifiers is equivalent to a set of access control statements, with each specifying one of the identifiers in the list and all with the same access controls (both permissions and connection limits). A group should be used if you want users to contribute to a shared connection limit. A wildcard should be used if you want hosts to contribute to a shared connection limit.
When a new client requests a connection, and pmcd has determined that the client has permission to connect, it searches the matching list of access control statements for the most specific match containing a connection limit. For brevity, this will be called the limiting statement. If there is no limiting statement, the client is granted a connection. If there is a limiting statement and the number of pmcd clients with user ID, group ID, or IP addresses that match the identifier in the limiting statement is less than the connection limit in the statement, the connection is allowed. Otherwise the connection limit has been reached and the client is refused a connection.
Group access controls and the wildcarding in host identifiers mean that once pmcd actually accepts a connection from a client, the connection may contribute to the current connection count of more than one access control statement - the client's host may match more than one access control statement, and similarly the user ID may be in more than one group. This may be significant for subsequent connection requests.
Note that pmcd enters a mode where it runs effectively with a higher-level of security as soon as a user or group access control section is added to the configuration. In this mode only authenticated connections are allowed - either from a SASL authenticated connection, or a Unix domain socket (which implicitly passes client credentials). This is the same mode that is entered explicitly using the -S option. Assuming permission is allowed, one can determine whether pmcd is running in this mode by querying the value of the pmcd.feature.creds_required metric.
Note also that because most specific match semantics are used when checking the connection limit, for the host-based access control case, priority is given to clients with more specific host identifiers. It is also possible to exceed connection limits in some situations. Consider the following:
This says that only 2 client connections at a time are permitted for all hosts other than "clank", which is permitted 5. If a client from host "boing" is the first to connect to pmcd, its connection is checked against the second statement (that is the most specific match with a connection limit). As there are no other clients, the connection is accepted and contributes towards the limit for only the second statement above. If the next client connects from "clank", its connection is checked against the limit for the first statement. There are no other connections from "clank", so the connection is accepted. Once this connection is accepted, it counts towards both statements' limits because "clank" matches the host identifier in both statements. Remember that the decision to accept a new connection is made using only the most specific matching access control statement with a connection limit. Now, the connection limit for the second statement has been reached. Any connections from hosts other than "clank" will be refused.
If instead, pmcd with no clients saw three successive connections arrive from "boing", the first two would be accepted and the third refused. After that, if a connection was requested from "clank" it would be accepted. It matches the first statement, which is more specific than the second, so the connection limit in the first is used to determine that the client has the right to connect. Now there are 3 connections contributing to the second statement's connection limit. Even though the connection limit for the second statement has been exceeded, the earlier connections from "boing" are maintained. The connection limit is only checked at the time a client attempts a connection rather than being re-evaluated every time a new client connects to pmcd.
This gentle scheme is designed to allow reasonable limits to be imposed on a first come first served basis, with specific exceptions.
As illustrated by the example above, a client's connection is honored once it has been accepted. However, pmcd reconfiguration (see the next section) re-evaluates all the connection counts and will cause client connections to be dropped where connection limits have been exceeded.
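The two connection sequences described in this section can be replayed with a small simulation. It is a sketch only: the LimitTracker class is hypothetical, it knows just two kinds of host identifier (an exact name and the wildcard *), and it models a limit of 5 for "clank" and 2 for all hosts, as in the discussion above.

```python
class LimitTracker:
    """Toy model of connection limits with most-specific-match checks."""

    def __init__(self, limits):
        # Maps a host identifier ("*" or an exact name) to its limit.
        self.limits = dict(limits)
        self.clients = []  # hosts of the currently accepted clients

    def count(self, pattern):
        """Number of accepted clients that match pattern."""
        return sum(1 for h in self.clients if pattern in ("*", h))

    def connect(self, host):
        # The limiting statement is the most specific match that
        # carries a limit: an exact name takes priority over "*".
        if host in self.limits:
            limiting = host
        elif "*" in self.limits:
            limiting = "*"
        else:
            limiting = None
        if limiting is not None and self.count(limiting) >= self.limits[limiting]:
            return False  # limit reached, connection refused
        # Once accepted, the client counts towards every statement it
        # matches, because count() is evaluated per pattern.
        self.clients.append(host)
        return True
```

Starting from an empty tracker, a connection from "boing" and then one from "clank" are both accepted, after which a further connection from "boing" is refused. Starting instead with three connections from "boing", the third is refused, a later connection from "clank" is still accepted, and the count for * then stands at 3.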
If the configuration file has been changed or if an agent is not responding because it has terminated or the PMNS has been changed, pmcd may be reconfigured by sending it a SIGHUP, as in
# pmsignal -a -s HUP pmcd
When pmcd receives a SIGHUP, it checks the configuration file for changes. If the file has been modified, it is reparsed and the contents become the new configuration. If there are errors in the configuration file, the existing configuration is retained and the contents of the file are ignored. Errors are reported in the pmcd log file.
It also checks the PMNS file and any labels files for changes. If any of these files have been modified, then the PMNS and/or context labels are reloaded. Use of tail(1) on the log file is recommended while reconfiguring pmcd.
If the configuration for an agent has changed (any parameter except the agent's label is different), the agent is restarted. Agents whose configurations do not change are not restarted. Any existing agents not present in the new configuration are terminated. Any deceased agents that are still listed are restarted.
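The restart rules above amount to a comparison between the old and new agent tables. The sketch below is an illustration and not pmcd's code: the function plan_reconfigure is hypothetical, agents are matched across configurations by domain number (an assumption, since the text does not say how they are matched), the label is excluded from the compared parameters, and the "start" action for agents that appear only in the new configuration is implied rather than stated.

```python
def plan_reconfigure(old, new, alive):
    """Decide what happens to each agent on reconfiguration.

    old, new: dicts mapping domain number to a tuple of the agent's
              parameters, excluding its label.
    alive:    set of domain numbers whose agent is still running.
    Returns a dict mapping domain number to an action string.
    """
    actions = {}
    for domain, params in old.items():
        if domain not in new:
            actions[domain] = "terminate"  # no longer listed
        elif new[domain] != params:
            actions[domain] = "restart"    # configuration changed
        elif domain not in alive:
            actions[domain] = "restart"    # deceased but still listed
        else:
            actions[domain] = "keep"       # unchanged and running
    for domain in new:
        if domain not in old:
            actions[domain] = "start"      # assumed: newly listed agent
    return actions
```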
Sometimes it is necessary to restart an agent that is still running, but malfunctioning. Simply stop the agent (e.g. using SIGTERM from pmsignal(1)), then send pmcd a SIGHUP, which will cause the agent to be restarted.
Normally, pmcd is started automatically at boot time and stopped when the system is being brought down. Under certain circumstances it is necessary to start or stop pmcd manually. To do this one must become superuser and type
# $PCP_RC_DIR/pmcd start
to start pmcd, or
# $PCP_RC_DIR/pmcd stop
to stop pmcd. Starting pmcd when it is already running is the same as stopping it and then starting it again.
Sometimes it may be necessary to restart pmcd during another phase of the boot process. Time-consuming parts of the boot process are often put into the background to allow the system to become available sooner (e.g. mounting huge databases). If an agent run by pmcd requires such a task to complete before it can run properly, it is necessary to restart or reconfigure pmcd after the task completes. Consider, for example, the case of mounting a database in the background while booting. If the PMDA which provides the metrics about the database cannot function until the database is mounted and available but pmcd is started before the database is ready, the PMDA will fail (however pmcd will still service requests for metrics from other domains). If the database is initialized by running a shell script, adding a line to the end of the script to reconfigure pmcd (by sending it a SIGHUP) will restart the PMDA (if it exited because it couldn't connect to the database). If the PMDA didn't exit in such a situation it would be necessary to restart pmcd because if the PMDA was still running pmcd would not restart it.
Normally pmcd listens for client connections on TCP/IP port number 44321 (registered at http://www.iana.org/). Either the environment variable PMCD_PORT or the -p command line option may be used to specify alternative port number(s) when pmcd is started; in each case, the specification is a comma-separated list of one or more numerical port numbers. Should both methods be used or multiple -p options appear on the command line, pmcd will listen on the union of the set of ports specified via all -p options and the PMCD_PORT environment variable. If non-default ports are used with pmcd care should be taken to ensure that PMCD_PORT is also set in the environment of any client application that will connect to pmcd, or that the extended host specification syntax is used (see PCPIntro(1) for details).
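The way the -p options and PMCD_PORT combine can be sketched as a set union. This is illustrative only: the function listen_ports is hypothetical, and pmcd's own validation and error reporting for malformed port lists are not modelled.

```python
DEFAULT_PORT = 44321


def listen_ports(p_options, env_value=None):
    """Compute the ports to listen on.

    p_options: list of the strings given to each -p option.
    env_value: value of PMCD_PORT, or None if it is not set.
    """
    specs = list(p_options)
    if env_value:
        specs.append(env_value)
    ports = set()
    for spec in specs:
        # Each specification is a comma-separated list of port numbers.
        for field in spec.split(","):
            ports.add(int(field))
    # With nothing specified, only the default port is used.
    return sorted(ports) if ports else [DEFAULT_PORT]
```

For example, two -p options of 44321,4321 and 5000 combined with PMCD_PORT set to 4321 yield ports 4321, 5000 and 44321.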
In addition to the PCP environment variables described in the PCP ENVIRONMENT section below, the PMCD_PORT variable is also recognised as the TCP/IP port for incoming connections (default 44321), and the PMCD_SOCKET variable is also recognised as the path to be used for the Unix domain socket.
If set to the value 1, the PMCD_LOCAL environment variable will cause pmcd to run in a localhost-only mode of operation, where it binds only to the loopback interface. The pmcd.feature.local metric can be queried to determine if pmcd is running in this mode.
The PMCD_MAXPENDING variable can be set to indicate the maximum length to which the queue of pending client connections may grow.
The PMCD_ROOT_AGENT variable controls whether pmcd or pmdaroot (when available) starts subsequent PMDAs. When set to a non-zero value, pmcd will opt to have pmdaroot start and stop PMDAs.
The PMCD_RESTART_AGENTS variable determines the behaviour of pmcd in the presence of child PMDAs that have been observed to exit (this is a typical response in the presence of very large, usually domain-induced, PDU latencies). When set to a non-zero value, pmcd will attempt to restart such PMDAs once every minute. When set to zero, it uses the original behaviour of just logging the failure.
Environment variables with the prefix PCP_ are used to parameterize the file and directory names used by PCP. On each installation, the file /etc/pcp.conf contains the local values for these variables. The $PCP_CONF variable may be used to specify an alternative configuration file, as described in pcp.conf(5).
If pmcd is already running, the message "Error: OpenRequestSocket bind: Address may already be in use" will appear. This may also appear if pmcd was shut down with an outstanding request from a client. In this case, a request socket has been left in the TIME_WAIT state and until the system closes it down (after some timeout period) it will not be possible to run pmcd.
In addition to the standard PCP debugging flags, see pmdbg(1), pmcd currently uses the options: appl0 for tracing I/O and termination of agents, appl1 for tracing access control and appl2 for tracing the configuration file scanner and parser.
pmcd does not explicitly terminate its children (agents), it only closes their pipes. If an agent never checks for a closed pipe it may not terminate.
The configuration file parser will only read lines of less than 1200 characters. This is intended to prevent accidents with binary files.
The timeouts controlled by the -t option apply to IPC between pmcd and the PMDAs it spawns. This is independent of settings of the environment variables PMCD_CONNECT_TIMEOUT and PMCD_REQUEST_TIMEOUT (see PCPIntro(1)) which may be used respectively to control timeouts for client applications trying to connect to pmcd and trying to receive information from pmcd.
PCPIntro(1), pmdbg(1), pmerr(1), pmgenmap(1), pminfo(1), pmrep(1), pmstat(1), pmstore(1), pmval(1), getpwent(3), getgrent(3), pcp.conf(5), and pcp.env(5).
PCP | Performance Co-Pilot |