Monitoring overview
The aim of this part of the documentation is to explain the Ceph monitoring stack and the meaning of the main Ceph metrics.
With a good understanding of the Ceph monitoring stack and metrics, users can create customized monitoring tools such as Prometheus queries, Grafana dashboards, or scripts.
Ceph Monitoring stack
Ceph provides a default monitoring stack which is installed by cephadm and explained in the Monitoring Services section of the cephadm documentation.
Ceph metrics
The main source of Ceph metrics is the set of performance counters exposed by each Ceph daemon. These perf counters are native Ceph monitoring data.
Performance counters are transformed into standard Prometheus metrics by the Ceph exporter daemon. This daemon runs on every Ceph cluster host and exposes a metrics endpoint where all the performance counters exposed by the Ceph daemons running on that host are published as Prometheus metrics.
In addition to the Ceph exporter, there is another agent that exposes Ceph metrics: the Prometheus manager module, which exposes metrics related to the whole cluster, essentially metrics that are not produced by individual Ceph daemons.
The main source for obtaining Ceph metrics is the metrics endpoint exposed by the cluster Prometheus server. Ceph can provide you with the Prometheus endpoint where you can obtain the complete list of metrics (coming from the Ceph exporter daemons and the Prometheus manager module) and execute queries.
Use the following command to obtain the Prometheus server endpoint in your cluster:
Example:
# ceph orch ps --service_name prometheus
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
prometheus.cephtest-node-00 cephtest-node-00.cephlab.com *:9095 running (103m) 50s ago 5w 142M - 2.33.4 514e6a882f6e efe3cbc2e521
With this information you can connect to http://cephtest-node-00.cephlab.com:9095 to access the Prometheus server interface.
The complete list of metrics (with help text) for your cluster is available at:
http://cephtest-node-00.cephlab.com:9095/api/v1/targets/metadata
Note that the main tool that allows users to observe and monitor a Ceph cluster is the Ceph dashboard. It provides graphs that represent the most important cluster and service metrics. Most of the examples in this document are extracted from the dashboard graphs or extrapolated from the metrics exposed by the Ceph dashboard.
Performance metrics
Main metrics used to measure Ceph cluster performance:
All metrics have the following labels:
ceph_daemon
: identifier of the OSD daemon generating the metric
instance
: the IP address of the Ceph exporter instance exposing the metric.
job
: Prometheus scrape job
Example:
ceph_osd_op_r{ceph_daemon="osd.0", instance="192.168.122.7:9283", job="ceph"} = 73981
Cluster I/O (throughput):
Use ceph_osd_op_r_out_bytes and ceph_osd_op_w_in_bytes to obtain the cluster throughput generated by clients.
Example:
Writes (B/s):
sum(irate(ceph_osd_op_w_in_bytes[1m]))
Reads (B/s):
sum(irate(ceph_osd_op_r_out_bytes[1m]))
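These two rates can also be added together to chart the total client throughput of the cluster (a minimal sketch based on the same counters):
Total (B/s):
sum(irate(ceph_osd_op_r_out_bytes[1m])) + sum(irate(ceph_osd_op_w_in_bytes[1m]))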
Cluster I/O (operations):
Use ceph_osd_op_r and ceph_osd_op_w to obtain the number of operations generated by clients.
Example:
Writes (ops/s):
sum(irate(ceph_osd_op_w[1m]))
Reads (ops/s):
sum(irate(ceph_osd_op_r[1m]))
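As with throughput, the read and write operation rates can be summed to obtain total client IOPS (same counters, combined):
Total (ops/s):
sum(irate(ceph_osd_op_r[1m])) + sum(irate(ceph_osd_op_w[1m]))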
Latency:
Use ceph_osd_op_latency_sum, which represents the delay before an OSD data transfer begins following a client instruction for the transfer.
Example:
sum(irate(ceph_osd_op_latency_sum[1m]))
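The sum of latencies alone grows with the number of operations. To estimate an average latency per operation, it can be divided by the matching count counter; a sketch assuming ceph_osd_op_latency_count is exposed alongside the sum (as the per-operation-type latency counters shown later are):
sum(irate(ceph_osd_op_latency_sum[1m])) / sum(irate(ceph_osd_op_latency_count[1m]))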
OSD performance
The cluster performance metrics explained above are based on OSD metrics. By selecting the right labels, we can obtain the same performance information for a single OSD that was explained for the cluster:
Example:
OSD 0 read latency
irate(ceph_osd_op_r_latency_sum{ceph_daemon=~"osd.0"}[1m]) / on (ceph_daemon) irate(ceph_osd_op_r_latency_count[1m])
OSD 0 write IOPS
irate(ceph_osd_op_w{ceph_daemon=~"osd.0"}[1m])
OSD 0 write throughput (B/s)
irate(ceph_osd_op_w_in_bytes{ceph_daemon=~"osd.0"}[1m])
OSD.0 total raw capacity available
ceph_osd_stat_bytes{ceph_daemon="osd.0", instance="cephtest-node-00.cephlab.com:9283", job="ceph"} = 536451481
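The same per-OSD expressions can be wrapped in topk to spot outliers across the cluster. A hedged sketch that lists the five OSDs with the highest average read latency:
topk(5, irate(ceph_osd_op_r_latency_sum[1m]) / on (ceph_daemon) irate(ceph_osd_op_r_latency_count[1m]))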
Physical disk performance:
By combining Prometheus node_exporter metrics with Ceph metrics, we can obtain information about the performance of the physical disks used by OSDs.
Example:
Read latency of device used by OSD 0:
label_replace(irate(node_disk_read_time_seconds_total[1m]) / irate(node_disk_reads_completed_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")
Write latency of device used by OSD 0
label_replace(irate(node_disk_write_time_seconds_total[1m]) / irate(node_disk_writes_completed_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")
IOPS (device used by OSD.0)
reads:
label_replace(irate(node_disk_reads_completed_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")
writes:
label_replace(irate(node_disk_writes_completed_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")
Throughput (device used by OSD.0)
reads:
label_replace(irate(node_disk_read_bytes_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")
writes:
label_replace(irate(node_disk_written_bytes_total[1m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")
Physical Device Utilization (%) for OSD.0 in the last 5 minutes
label_replace(irate(node_disk_io_time_seconds_total[5m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*")
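The expression above returns a fraction between 0 and 1 (seconds of I/O time per second of wall-clock time). A minimal variation that scales it to a percentage, as the title suggests:
100 * (label_replace(irate(node_disk_io_time_seconds_total[5m]), "instance", "$1", "instance", "([^:.]*).*") and on (instance, device) label_replace(label_replace(ceph_disk_occupation_human{ceph_daemon=~"osd.0"}, "device", "$1", "device", "/dev/(.*)"), "instance", "$1", "instance", "([^:.]*).*"))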
Pool metrics
These metrics have the following labels:
instance
: the IP address of the Ceph exporter daemon producing the metric.
pool_id
: identifier of the pool
job
: Prometheus scrape job
ceph_pool_metadata
: Information about the pool. It can be used together with other metrics to provide more contextual information in queries and graphs. Apart from the three common labels, this metric provides the following extra labels:
compression_mode
: compression used in the pool (lz4, snappy, zlib, zstd, none). Example: compression_mode="none"
description
: brief description of the pool type (replica: number of replicas, or Erasure code: EC profile). Example: description="replica:3"
name
: name of the pool. Example: name=".mgr"
type
: type of pool (replicated/erasure code). Example: type="replicated"
ceph_pool_bytes_used
: Total raw capacity consumed by user data and associated overheads per pool (metadata + redundancy)
ceph_pool_stored
: Total CLIENT data stored in the pool
ceph_pool_compress_under_bytes
: Data eligible to be compressed in the pool
ceph_pool_compress_bytes_used
: Data compressed in the pool
ceph_pool_rd
: CLIENT read operations per pool (reads per second)
ceph_pool_rd_bytes
: CLIENT read operations in bytes per pool
ceph_pool_wr
: CLIENT write operations per pool (writes per second)
ceph_pool_wr_bytes
: CLIENT write operations in bytes per pool
Useful queries:
Total raw capacity available in the cluster:
sum(ceph_osd_stat_bytes)
Total raw capacity consumed in the cluster (including metadata + redundancy):
sum(ceph_pool_bytes_used)
Total of CLIENT data stored in the cluster:
sum(ceph_pool_stored)
Compression savings:
sum(ceph_pool_compress_under_bytes - ceph_pool_compress_bytes_used)
CLIENT IOPS for a pool (testrbdpool)
reads: irate(ceph_pool_rd[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}
writes: irate(ceph_pool_wr[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}
CLIENT Throughput for a pool
reads: irate(ceph_pool_rd_bytes[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}
writes: irate(ceph_pool_wr_bytes[1m]) * on(pool_id) group_left(instance,name) ceph_pool_metadata{name=~"testrbdpool"}
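Building on the capacity queries above, the fraction of raw capacity currently consumed can be derived by combining two of the metrics already listed (a minimal sketch):
Raw capacity used (ratio):
sum(ceph_pool_bytes_used) / sum(ceph_osd_stat_bytes)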
Object metrics
These metrics have the following labels:
instance
: the IP address of the Ceph exporter daemon providing the metric
instance_id
: identifier of the RGW daemon
job
: Prometheus scrape job
Example:
ceph_rgw_req{instance="192.168.122.7:9283", instance_id="154247", job="ceph"} = 12345
Generic metrics
ceph_rgw_metadata
: Provides generic information about the RGW daemon. It can be used together with other metrics to provide more contextual information in queries and graphs. Apart from the three common labels, this metric provides the following extra labels:
ceph_daemon
: Name of the Ceph daemon. Example: ceph_daemon="rgw.rgwtest.cephtest-node-00.sxizyq"
ceph_version
: Version of the Ceph daemon. Example: ceph_version="ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)"
hostname
: Name of the host where the daemon runs. Example: hostname="cephtest-node-00.cephlab.com"
ceph_rgw_req
: Total number of requests for the daemon (GET + PUT + DELETE). Useful to detect bottlenecks and optimize load distribution.
ceph_rgw_qlen
: RGW operations queue length for the daemon. Useful to detect bottlenecks and optimize load distribution.
ceph_rgw_failed_req
: Aborted requests. Useful to detect daemon errors.
Useful queries
The average of get latencies:
rate(ceph_rgw_get_initial_lat_sum[30s]) / rate(ceph_rgw_get_initial_lat_count[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata
The average of put latencies:
rate(ceph_rgw_put_initial_lat_sum[30s]) / rate(ceph_rgw_put_initial_lat_count[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata
Total requests per second:
rate(ceph_rgw_req[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata
Total number of "other" operations (LIST, DELETE)
rate(ceph_rgw_req[30s]) - (rate(ceph_rgw_get[30s]) + rate(ceph_rgw_put[30s]))
Bandwidth consumed by GET operations
sum(rate(ceph_rgw_get_b[30s]))
Bandwidth consumed by PUT operations
sum(rate(ceph_rgw_put_b[30s]))
Bandwidth consumed by RGW instance (PUTs + GETs)
sum by (instance_id) (rate(ceph_rgw_get_b[30s]) + rate(ceph_rgw_put_b[30s])) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata
HTTP errors:
rate(ceph_rgw_failed_req[30s])
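The failed-request rate is easier to interpret relative to total load. A hedged sketch of the per-daemon error ratio, reusing the join pattern shown above:
rate(ceph_rgw_failed_req[30s]) / rate(ceph_rgw_req[30s]) * on (instance_id) group_left (ceph_daemon) ceph_rgw_metadata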
Filesystem metrics
These metrics have the following labels:
ceph_daemon
: The name of the MDS daemon
instance
: the IP address (and port) of the Ceph exporter daemon exposing the metric
job
: Prometheus scrape job
Example:
ceph_mds_request{ceph_daemon="mds.test.cephtest-node-00.hmhsoh", instance="192.168.122.7:9283", job="ceph"} = 1452
Main metrics
ceph_mds_metadata
: Provides general information about the MDS daemon. It can be used together with other metrics to provide more contextual information in queries and graphs. It provides the following extra labels:
ceph_version
: MDS daemon Ceph version
fs_id
: filesystem cluster id
hostname
: Host name where the MDS daemon runs
public_addr
: Public address where the MDS daemon runs
rank
: Rank of the MDS daemon
Example:
ceph_mds_metadata{ceph_daemon="mds.test.cephtest-node-00.hmhsoh", ceph_version="ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)", fs_id="-1", hostname="cephtest-node-00.cephlab.com", instance="cephtest-node-00.cephlab.com:9283", job="ceph", public_addr="192.168.122.145:6801/118896446", rank="-1"}
ceph_mds_request
: Total number of requests for the MDS daemon
ceph_mds_reply_latency_sum
: Reply latency total
ceph_mds_reply_latency_count
: Reply latency count
ceph_mds_server_handle_client_request
: Number of client requests
ceph_mds_sessions_session_count
: Session count
ceph_mds_sessions_total_load
: Total load
ceph_mds_sessions_sessions_open
: Sessions currently open
ceph_mds_sessions_sessions_stale
: Sessions currently stale
ceph_objecter_op_r
: Number of read operations
ceph_objecter_op_w
: Number of write operations
ceph_mds_root_rbytes
: Total number of bytes managed by the daemon
ceph_mds_root_rfiles
: Total number of files managed by the daemon
Useful queries:
Total MDS daemons read workload:
sum(rate(ceph_objecter_op_r[1m]))
Total MDS daemons write workload:
sum(rate(ceph_objecter_op_w[1m]))
MDS daemon read workload: (daemon name is "mdstest")
sum(rate(ceph_objecter_op_r{ceph_daemon=~"mdstest"}[1m]))
MDS daemon write workload: (daemon name is "mdstest")
sum(rate(ceph_objecter_op_w{ceph_daemon=~"mdstest"}[1m]))
The average of reply latencies:
rate(ceph_mds_reply_latency_sum[30s]) / rate(ceph_mds_reply_latency_count[30s])
Total requests per second:
rate(ceph_mds_request[30s]) * on (instance) group_right (ceph_daemon) ceph_mds_metadata
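The session counters listed above can also be charted per daemon; for example, client sessions currently open on each MDS (a sketch assuming the counter carries the ceph_daemon label, like the other MDS performance counters):
sum by (ceph_daemon) (ceph_mds_sessions_sessions_open)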
Block metrics
By default, metrics for individual RBD images are not available, in order to provide the best performance of the Prometheus manager module.
To produce metrics for RBD images, the manager option mgr/prometheus/rbd_stats_pools must be configured properly. For more information, see Ceph Health Checks.
These metrics have the following labels:
image
: Name of the image which produces the metric value.
instance
: Node where the RBD metric is produced (it points to the Ceph exporter daemon).
job
: Name of the Prometheus scrape job.
pool
: Image pool name.
Example:
ceph_rbd_read_bytes{image="test2", instance="cephtest-node-00.cephlab.com:9283", job="ceph", pool="testrbdpool"}
Main metrics
ceph_rbd_read_bytes
: RBD image bytes read
ceph_rbd_read_latency_count
: RBD image reads latency count
ceph_rbd_read_latency_sum
: RBD image reads latency total
ceph_rbd_read_ops
: RBD image reads count
ceph_rbd_write_bytes
: RBD image bytes written
ceph_rbd_write_latency_count
: RBD image writes latency count
ceph_rbd_write_latency_sum
: RBD image writes latency total
ceph_rbd_write_ops
: RBD image writes count
Useful queries
The average of read latencies:
rate(ceph_rbd_read_latency_sum[30s]) / rate(ceph_rbd_read_latency_count[30s])
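The write path can be inspected with the matching write counters. A hedged sketch of the average write latency and the write IOPS per image, using the example pool from the pool metrics section:
Average write latency:
rate(ceph_rbd_write_latency_sum[30s]) / rate(ceph_rbd_write_latency_count[30s])
Write IOPS per image (testrbdpool):
rate(ceph_rbd_write_ops{pool="testrbdpool"}[30s])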