Monitoring
We provide a bundle overlay that simplifies deploying Prometheus, Prometheus node_exporter, and slurm-exporter to monitor both the cluster and each individual node.
Prometheus node_exporter
The subordinate charm prometheus-node-exporter can be used to export machine
metrics to a Prometheus instance. To monitor all nodes in the cluster, first
deploy the prometheus-node-exporter application and then relate it to the
nodes to be monitored:
$ juju deploy prometheus-node-exporter
$ juju relate prometheus-node-exporter slurmd
$ juju relate prometheus-node-exporter slurmctld
$ juju relate prometheus-node-exporter slurmdbd
By default, this charm exposes all metrics on the /metrics endpoint, using
port 9100.
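To spot-check that a unit is serving metrics, you can fetch its endpoint
directly. The following is a minimal sketch, not part of the charm: the helper
names metrics_url and show_metrics are hypothetical, it assumes curl is
available on the machine you run it from, and the host argument is a
placeholder for the address of a monitored unit (for example, as reported by
`juju status`).

```shell
# Hypothetical helper: build the metrics URL for a host and port.
# The port defaults to 9100, the node_exporter port described above.
metrics_url() {
  local host="$1"
  local port="${2:-9100}"
  printf 'http://%s:%s/metrics\n' "$host" "$port"
}

# Hypothetical helper: fetch the endpoint and print the first 20 lines.
# --fail makes curl return an error on HTTP failures instead of printing the body.
show_metrics() {
  curl --silent --show-error --fail "$(metrics_url "$1" "$2")" | head -n 20
}
```

For example, `show_metrics 10.0.0.5` queries port 9100 on the placeholder
address 10.0.0.5, and `show_metrics 10.0.0.5 9120` queries a different port on
the same host.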
The prometheus-node-exporter charm can be related to the prometheus2 charm to
automatically scrape all units. Deploy Prometheus and relate it to
prometheus-node-exporter to use this functionality:
$ juju deploy prometheus2
$ juju relate prometheus-node-exporter:prometheus prometheus2:scrape
Please refer to these charms' documentation for configuration details.
Prometheus Slurm exporter
The subordinate charm slurm-exporter exports metrics about Slurm, such as the
state of nodes, jobs, partitions, accounts, the scheduler, CPUs, and GPUs. To
monitor the cluster, deploy the application and relate it to the slurmrestd
charm:
$ juju deploy slurm-exporter
$ juju relate slurm-exporter slurmrestd
Note
We recommend deploying slurm-exporter on the slurmrestd node, although it can
also be deployed on other nodes.
By default, this charm exposes all metrics on the /metrics endpoint, using
port 9120.
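When inspecting a scrape by hand, it can help to keep only the Slurm series and
drop the "# HELP" and "# TYPE" comment lines of the Prometheus text format. A
minimal sketch, assuming the exported metric names begin with the prefix
slurm_ (check a real scrape to confirm); filter_metrics is a hypothetical
helper, not something the charm provides.

```shell
# Hypothetical helper: read Prometheus text-format output on stdin and print
# only lines that start with the given literal prefix. Comment lines begin
# with "#", so they never match a metric-name prefix and are dropped.
filter_metrics() {
  # index() returns the 1-based position of the prefix in the line;
  # == 1 means the line starts with it (no regex interpretation).
  awk -v prefix="$1" 'index($0, prefix) == 1'
}
```

For example, `curl -s http://<slurmrestd-address>:9120/metrics | filter_metrics slurm_`,
where `<slurmrestd-address>` is a placeholder for the address of the unit
running slurm-exporter.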
The slurm-exporter charm can be related to the prometheus2 charm to
automatically scrape its metrics. Deploy Prometheus (skip this step if it is
already deployed) and relate it to slurm-exporter to use this functionality:
$ juju deploy prometheus2
$ juju relate slurm-exporter prometheus2:scrape
Please refer to these charms' documentation for configuration details.
You can use Grafana Dashboard 4323 to visualize the metrics exported by
slurm-exporter.