.. _monitoring:

==========
Monitoring
==========

We provide a bundle overlay to simplify deploying Prometheus, Prometheus
node_exporter, and slurm-exporter to monitor the cluster and each individual
node.

Prometheus node_exporter
========================

The subordinate charm ``prometheus-node-exporter`` can be used to export
machine metrics to a Prometheus instance.

To monitor all nodes in the cluster, first deploy the
``prometheus-node-exporter`` application and then relate it to the nodes to be
monitored:

.. code-block:: bash

   $ juju deploy prometheus-node-exporter
   $ juju relate prometheus-node-exporter slurmd
   $ juju relate prometheus-node-exporter slurmctld
   $ juju relate prometheus-node-exporter slurmdbd

By default, this charm exposes all metrics at the ``/metrics`` endpoint on
port ``9100``.

The ``prometheus-node-exporter`` charm can be related to the ``prometheus2``
charm so that all units are scraped automatically. Deploy Prometheus and
relate it to the node exporter to enable this:

.. code-block:: bash

   $ juju deploy prometheus2
   $ juju relate prometheus-node-exporter:prometheus prometheus2:scrape

Please refer to these charms' documentation for configuration details.

Prometheus Slurm exporter
=========================

The subordinate charm ``slurm-exporter`` exports metrics about Slurm, such as
the state of nodes, jobs, partitions, accounts, the scheduler, CPUs, and GPUs.

To monitor the cluster, deploy the application and relate it to
``slurmrestd``:

.. code-block:: bash

   $ juju deploy slurm-exporter
   $ juju relate slurm-exporter slurmrestd

.. note::

   We recommend deploying ``slurm-exporter`` on the ``slurmrestd`` node,
   although it can also be deployed on other nodes.

By default, this charm exposes all metrics at the ``/metrics`` endpoint on
port ``9120``.

The ``slurm-exporter`` charm can be related to the ``prometheus2`` charm so
that its metrics are scraped automatically.
Deploy Prometheus and relate it to ``slurm-exporter`` to enable this:

.. code-block:: bash

   $ juju deploy prometheus2
   $ juju relate slurm-exporter prometheus2:scrape

Please refer to these charms' documentation for configuration details.

You can use the Grafana Dashboard 4323 to visualize the metrics exported by
``slurm-exporter``.
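
As a quick sanity check that an exporter is serving metrics on the ports
mentioned above, the sketch below counts the samples in a Prometheus
text-format response. The helper name ``count_samples`` and the address
placeholders are illustrative assumptions and are not provided by the charms.

.. code-block:: bash

   #!/usr/bin/env bash
   # Count the metric samples in Prometheus text-format output read from stdin.
   # Lines starting with '#' are HELP/TYPE comments; blank lines carry no data.
   count_samples() {
       grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
   }

   # Hypothetical usage against live exporters (replace the placeholders with
   # the addresses of your own units):
   #   curl -s http://<node-address>:9100/metrics | count_samples        # node_exporter
   #   curl -s http://<slurmrestd-address>:9120/metrics | count_samples  # slurm-exporter

A count of zero, or a failed connection, suggests that the exporter is not
running or that the port is not reachable from where the check is run.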