.. _accounting-profiling:

========================
Accounting And Profiling
========================

Slurm can collect accounting and profiling information about jobs and job
steps. The sections below explain how to set up accounting for each plugin.
See also the `official documentation about accounting `_.

Slurmdbd
========

.. TODO

``slurmdbd`` is used for job accounting.

.. _elasticsearch-accounting:

ElasticSearch plugin
=====================

The `Slurm Elastic Search Plugin `_ stores accounting data for finished jobs.
To enable this plugin automatically, relate the ``slurmctld`` charm with the
`Elastic Search Charm `_:

.. code-block:: bash

   $ juju relate slurmctld elasticsearch

``slurmctld`` creates a new index named after the cluster. Documents in this
index have the type ``jobcomp``.

Data saved
----------

For example, to retrieve all the documents saved in the ``osd-cluster`` index
from the Elastic Search server at ``10.220.130.6``:

.. code-block:: bash

   $ curl -XGET 'http://10.220.130.6:9200/osd-cluster/_search?pretty=true&q=*:*'
   {
     "took" : 2,
     "timed_out" : false,
     "_shards" : {
       "total" : 5,
       "successful" : 5,
       "skipped" : 0,
       "failed" : 0
     },
     "hits" : {
       "total" : 2,
       "max_score" : 1.0,
       "hits" : [
         {
           "_index" : "osd-cluster",
           "_type" : "jobcomp",
           "_id" : "bnhJ3n0BaChVi1VR3viV",
           "_score" : 1.0,
           "_source" : {
             "jobid" : 24,
             "username" : "john",
             "user_id" : 1200,
             "groupname" : "john",
             "group_id" : 1200,
             "@start" : "2021-12-21T18:37:02",
             "@end" : "2021-12-21T18:37:02",
             "elapsed" : 0,
             "partition" : "osd-slurmd",
             "alloc_node" : "juju-f48c73-285",
             "nodes" : "juju-f48c73-286",
             "total_cpus" : 16,
             "total_nodes" : 1,
             "derived_ec" : "0:0",
             "exit_code" : "0:0",
             "state" : "COMPLETED",
             "cpu_hours" : 0.0,
             "pack_job_id" : 0,
             "pack_job_offset" : 0,
             "het_job_id" : 0,
             "het_job_offset" : 0,
             "@submit" : "2021-12-21T18:37:02",
             "@eligible" : "2021-12-21T18:37:02",
             "queue_wait" : 0,
             "work_dir" : "/home/john/project/foo",
             "cluster" : "osd-cluster",
             "qos" : "normal",
             "ntasks" : 0,
             "ntasks_per_node" : 0,
             "ntasks_per_tres" : 0,
             "cpus_per_task" : 1,
             "job_name" : "hostname",
             "tres_req" : "cpu=1,mem=15921M,node=1,billing=1",
             "tres_alloc" : "cpu=16,node=1,billing=16",
             "account" : "john",
             "parent_accounts" : "/users/user"
           }
         },
         {
           "_index" : "osd-cluster",
           "_type" : "jobcomp",
           "_id" : "b3hJ3n0BaChVi1VR3vi0",
           "_score" : 1.0,
           "_source" : {
             "jobid" : 25,
             "username" : "root",
             "user_id" : 0,
             "groupname" : "root",
             "group_id" : 0,
             "@start" : "2021-12-21T18:37:25",
             "@end" : "2021-12-21T18:37:25",
             "elapsed" : 0,
             "partition" : "osd-slurmd",
             "alloc_node" : "juju-f48c73-285",
             "nodes" : "juju-f48c73-286",
             "total_cpus" : 16,
             "total_nodes" : 1,
             "derived_ec" : "0:0",
             "exit_code" : "0:0",
             "state" : "COMPLETED",
             "cpu_hours" : 0.0,
             "pack_job_id" : 0,
             "pack_job_offset" : 0,
             "het_job_id" : 0,
             "het_job_offset" : 0,
             "@submit" : "2021-12-21T18:37:25",
             "@eligible" : "2021-12-21T18:37:25",
             "queue_wait" : 0,
             "work_dir" : "/root",
             "cluster" : "osd-cluster",
             "qos" : "normal",
             "ntasks" : 0,
             "ntasks_per_node" : 0,
             "ntasks_per_tres" : 0,
             "cpus_per_task" : 1,
             "job_name" : "hostname",
             "tres_req" : "cpu=1,mem=15921M,node=1,billing=1",
             "tres_alloc" : "cpu=16,node=1,billing=16",
             "account" : "root",
             "parent_accounts" : "/root/root"
           }
         }
       ]
     }
   }

.. _influxdb-profiling:

InfluxDB profiling plugin
=========================

Slurm provides a profile-gathering plugin that collects metrics and sends them
to `InfluxDB `_. OSD encapsulates the configuration of this plugin in a *Juju
relation* between the ``slurmctld`` and ``influxdb`` charms. A basic setup
involves the following steps:

1. Deploy `InfluxDB charm `_.
2. Relate ``slurmctld`` and ``influxdb``.
3. [optional] Configure the accounting frequency.

The Juju commands to accomplish these steps are:

.. code-block:: bash

   $ juju deploy influxdb
   $ juju relate slurmctld influxdb
   $ juju config slurmctld acct-gather-frequency="task=30"

In this scenario, ``slurmctld`` sets up everything needed to collect and save
the metrics. This includes creating a user and a database in InfluxDB. The
username is ``slurm`` and the password is generated at random. The name of the
database is the name of the cluster, as set in the ``slurmctld`` configuration
option ``cluster-name``.

Data saved
----------

Slurm collects profiling metrics at the frequency specified in the
``slurmctld`` configuration option ``acct-gather-frequency``. The following
field keys are saved for the tasks:

``CPUFrequency``
   CPU Frequency at time of sample. Field type: ``float``.

``CPUTime``
   Seconds of CPU time used during the sample. Field type: ``float``.

``CPUUtilization``
   CPU Utilization during the interval. Field type: ``float``.

``RSS``
   Value of RSS at time of sample. Field type: ``float``.

``VMSize``
   Value of VM Size at time of sample. Field type: ``float``.

``Pages``
   Pages used in sample. Field type: ``float``.

``ReadMB``
   Number of megabytes read from local disk. Field type: ``float``.

``WriteMB``
   Number of megabytes written to local disk. Field type: ``float``.

Accessing the data
------------------

The ``slurmctld`` charm provides a convenient Juju Action that exports the
InfluxDB parameters needed to set up a Grafana Data Source:

.. code-block:: bash

   $ juju run-action slurmctld/leader influxdb-info --wait
   unit-slurmctld-13:
     UnitId: slurmctld/13
     id: "573"
     results:
       influxdb: '{''ingress'': ''10.220.130.30'', ''port'': ''8086'', ''user'': ''slurm'', ''password'': ''LeCZSef2IzyOp3GAnYNC'', ''database'': ''osd-cluster'', ''retention_policy'': ''autogen''}'
     status: completed
     timing:
       completed: 2021-07-20 13:00:35 +0000 UTC
       enqueued: 2021-07-20 13:00:31 +0000 UTC
       started: 2021-07-20 13:00:34 +0000 UTC
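
The ``influxdb`` result shown above is a single string that looks like a Python
dictionary literal. The following is a minimal sketch of how it could be parsed
and turned into a query URL. It assumes the server exposes the InfluxDB 1.x
HTTP API (``/query`` endpoint) over plain ``http``. The helper names
``parse_influxdb_info`` and ``build_query_url`` are hypothetical and are not
part of the charm.

.. code-block:: python

   import ast
   from urllib.parse import urlencode


   def parse_influxdb_info(raw: str) -> dict:
       """Parse the ``influxdb`` result string printed by the Juju Action."""
       # literal_eval only evaluates Python literals, so it is safe to use
       # on text copied from the action output.
       info = ast.literal_eval(raw)
       if not isinstance(info, dict):
           raise ValueError("expected a dictionary-like string")
       return info


   def build_query_url(info: dict, query: str) -> str:
       """Build a URL for the InfluxDB 1.x ``/query`` endpoint (assumed)."""
       params = urlencode({"db": info["database"], "q": query})
       return f"http://{info['ingress']}:{info['port']}/query?{params}"


   # Value copied from the example output above, after YAML unquoting.
   raw = (
       "{'ingress': '10.220.130.30', 'port': '8086', 'user': 'slurm', "
       "'password': 'LeCZSef2IzyOp3GAnYNC', 'database': 'osd-cluster', "
       "'retention_policy': 'autogen'}"
   )

   info = parse_influxdb_info(raw)
   url = build_query_url(info, "SHOW MEASUREMENTS")
   print(url)
   # http://10.220.130.30:8086/query?db=osd-cluster&q=SHOW+MEASUREMENTS

The resulting URL can be requested with any HTTP client, passing the ``user``
and ``password`` values as HTTP basic authentication credentials. The same
values map onto the host, database, user, and password fields of a Grafana
InfluxDB Data Source.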