slurmctld#
The central management charm.
Configurations#
To change a configuration for this charm, use the Juju command:
$ juju config slurmctld configuration=value
custom-slurm-repo#
Use a custom repository for Slurm installation.
This can be set to your organization's local mirror or cache of packages, in which case it supersedes the Omnivector repositories. Alternatively, it can be used to track a testing Slurm version, e.g. by setting it to ppa:omnivector/osd-testing (on Ubuntu) or https://omnivector-solutions.github.io/repo/centos7/stable/$basearch (on CentOS).
Note
The configuration custom-slurm-repo must be set before deploying the units. Changing this value after deploying the units will not reinstall Slurm.
type: string
default-value: empty
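Because the repository has to be chosen before any unit exists, one way to do this is to pass the option at deploy time. The following is a minimal sketch, assuming the charm is deployable under the name slurmctld and using the testing PPA mentioned above; adjust both to your environment.

```shell
# Repository to install Slurm from (the Ubuntu testing PPA named above).
repo="ppa:omnivector/osd-testing"
echo "Deploying slurmctld with custom-slurm-repo=$repo"

# Run the deployment only when the juju client is available here.
# Assumption: the charm can be deployed simply as "slurmctld".
if command -v juju >/dev/null 2>&1; then
    juju deploy slurmctld --config custom-slurm-repo="$repo"
fi
```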
cluster-name#
Name to be recorded in the database for jobs from this cluster.
This matters when a single database records information from multiple Slurm-managed clusters.
type: string
default-value: osd-cluster
default-partition#
Default Slurm partition. This is only used if defined, and must match an existing partition.
type: string
default-value: empty
custom-config#
User-supplied Slurm configuration.
This value supplements the charm-supplied slurm.conf that is used for the Slurm controller and compute nodes.
Example Usage:
$ juju config slurmctld custom-config="FirstJobId=1234"
type: string
default-value: empty
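When more than one parameter is needed, it can be easier to keep the extra lines in a file and pass its content as the value. This is a sketch under the assumption that the charm accepts a multi-line value here; MaxJobCount is only an illustrative second parameter.

```shell
# Write the extra slurm.conf lines to a scratch file.
# FirstJobId is taken from the example above; MaxJobCount is illustrative.
conf_file=$(mktemp)
cat > "$conf_file" <<'EOF'
FirstJobId=1234
MaxJobCount=20000
EOF

# Load the file into a variable, then remove the scratch file.
custom_conf=$(cat "$conf_file")
rm -f "$conf_file"

# Show what will be sent before applying it.
printf '%s\n' "$custom_conf"

# Apply it only when the juju client is available here.
# The double quotes keep the newlines inside the value intact.
if command -v juju >/dev/null 2>&1; then
    juju config slurmctld custom-config="$custom_conf"
fi
```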
proctrack-type#
Identifies the plugin to be used for process tracking on a job step basis.
type: string
default-value: proctrack/cgroup
cgroup-config#
Configuration content for cgroup.conf.
type: string
default-value: CgroupAutomount=yes\nConstrainCores=yes\n
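The \n sequences in the default stand for line breaks in cgroup.conf. A hedged sketch of extending the default with one more line follows; it assumes the charm expects real newlines in the value, and ConstrainRAMSpace is added purely as an illustration.

```shell
# printf expands each \n in the format string into a real line break.
# The first two lines are the charm default; the third is illustrative.
cgroup_conf=$(printf 'CgroupAutomount=yes\nConstrainCores=yes\nConstrainRAMSpace=yes')

# Show the resulting file content, one parameter per line.
printf '%s\n' "$cgroup_conf"

# Apply it only when the juju client is available here.
if command -v juju >/dev/null 2>&1; then
    juju config slurmctld cgroup-config="$cgroup_conf"
fi
```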
health-check-params#
Extra parameters for NHC command.
This option customizes how NHC is called. For example, to send an e-mail to an admin when NHC detects an error, set this value to -M admin@domain.com.
type: string
default-value: empty
health-check-interval#
Interval in seconds between executions of the Health Check.
type: int
default-value: 600
health-check-state#
Only run the Health Check on nodes in this state.
type: string
default-value: ANY,CYCLE
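The three health check options can be set in a single juju config call. The values below are a sketch, not recommendations: the e-mail address is the placeholder used above, and the 300 second interval and IDLE state are assumptions chosen for illustration.

```shell
# Extra NHC parameters: mail the admin when a check fails.
nhc_params="-M admin@domain.com"
# Run the check every 5 minutes instead of the default 600 seconds.
interval=300
# Restrict the check to idle nodes (illustrative choice).
state="IDLE"

echo "health-check-params=$nhc_params interval=$interval state=$state"

# Apply all three options at once when the juju client is available here.
if command -v juju >/dev/null 2>&1; then
    juju config slurmctld \
        health-check-params="$nhc_params" \
        health-check-interval="$interval" \
        health-check-state="$state"
fi
```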
acct-gather-frequency#
Accounting and profiling sampling intervals for the acct_gather plugins.
Note
A value of 0 disables the periodic sampling. In this case, the accounting information is collected when the job terminates.
Example Usage:
$ juju config slurmctld acct-gather-frequency="task=30,network=30"
type: string
default-value: task=30
acct-gather-custom#
User-supplied acct_gather.conf configuration.
This value supplements the charm-supplied acct_gather.conf file that is used for configuring the acct_gather plugins.
type: string
default-value: empty
tls-key#
A TLS server private key (.key file) to be used.
type: string
default-value: empty
tls-cert#
A TLS server certificate (.crt file) to be used.
type: string
default-value: empty
tls-ca-cert#
A CA certificate (.crt file) to be used for verification of TLS
certificates. Supply a CA certificate only when using a custom CA
that is not already installed on the nodes.
type: string
default-value: empty
Actions#
To run an action for this charm, use the Juju run-action command:
$ juju run-action slurmctld/leader action-name [parameters=value]
show-current-config#
Display the currently used slurm.conf.
Note
This file exists only in the slurmctld charm and is automatically distributed to all compute nodes by Slurm.
Example Usage:
$ juju run-action slurmctld/leader show-current-config --format=json --wait | jq .[].results.slurm.conf | xargs -I % -0 python3 -c 'print(%)'
drain#
Drain specified nodes.
Example Usage:
$ juju run-action slurmctld/leader drain nodename=node-[1,2] reason="Updating kernel"
Parameters:
nodename: The nodes to drain, using the Slurm format, e.g. node-[1,2].
type: string
reason: Reason to drain the nodes.
type: string
resume#
Resume specified nodes.
Note
Newly added nodes remain in the down state until they are configured with the node-configured action.
Example Usage:
$ juju run-action slurmctld/leader resume nodename=node-[1,2]
Parameters:
nodename: The nodes to resume, using the Slurm format, e.g. node-[1,2].
type: string
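The drain and resume actions are typically used as a pair around a maintenance window. The following sketch reuses the node list and reason from the examples above; the maintenance step itself is left as a placeholder.

```shell
# Node list in Slurm format and the reason recorded for the drain.
nodes="node-[1,2]"
reason="Updating kernel"
echo "Maintenance on $nodes: $reason"

# Run the actions only when the juju client is available here.
if command -v juju >/dev/null 2>&1; then
    # Stop new jobs from being scheduled on the nodes.
    juju run-action slurmctld/leader drain nodename="$nodes" reason="$reason" --wait

    # ... perform the maintenance on the drained nodes here ...

    # Return the nodes to service.
    juju run-action slurmctld/leader resume nodename="$nodes" --wait
fi
```

Quoting "$nodes" is deliberate: an unquoted node-[1,2] is a shell glob pattern and would be rewritten if a file named node-1 or node-2 happened to exist in the current directory.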
influxdb-info#
Get InfluxDB info.
This action returns the host, port, username, password, database, and retention policy for InfluxDB.
etcd-get-root-password#
Get the password for the etcd root account.
etcd-get-slurmd-password#
Get the password for the etcd slurmd account.
etcd-create-munge-account#
Create a new etcd account to be able to query the munge key.
Parameters:
user: Desired username.
type: string
password: Desired account password.
type: string
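An example invocation, following the same pattern as the other actions, might look like the sketch below. The username munge-reader is hypothetical, and the password is generated locally from /dev/urandom rather than typed by hand.

```shell
# Hypothetical account name; choose your own.
user="munge-reader"

# Generate 16 random bytes and hex-encode them into a 32-character password.
password=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
echo "Creating etcd account '$user' with a ${#password}-character password"

# Create the account only when the juju client is available here.
if command -v juju >/dev/null 2>&1; then
    juju run-action slurmctld/leader etcd-create-munge-account \
        user="$user" password="$password" --wait
fi
```

Note that a password passed as a command-line argument may be visible in the process list while the command runs.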