Changelog#

This file keeps track of all notable changes to the Slurm Charms.

Unreleased#

  • changed license to Apache-2.0

  • removed the action in slurmd to install InfiniBand drivers

  • removed the action in slurmd to install Nvidia drivers

  • removed the action in slurmd to install Singularity

  • removed the action in slurmd to install MPI (mpich)

1.0.0 - 2022-10-17#

0.10.1 - 2022-10-04#

  • added action in slurmd to install MPI (mpich)

0.10.0 - 2022-09-15#

  • updated NHC to version 1.4.3

  • changed how NHC is installed to be via Juju resources

  • added action in slurmd to install singularity

0.9.2 - 2022-07-04#

  • added slurmctld configuration options to use TLS certificates for etcd

0.9.1 - 2022-06-02#

  • added munge key to etcd database

  • added action in slurmctld to create etcd user to query mungekey

  • added action in slurmctld to get etcd root password

  • added action in slurmd to get etcd slurmd password

  • added set-node-inventory slurmd action

  • updated default Mellanox Infiniband drivers to version 5.4

  • fix get-node-inventory action

  • fix influxdb-info slurmctld action

  • fix etcd not being configured after an upgrade

0.9.0 - 2022-04-12#

  • fix: do not install Nvidia repository by default

  • add support for custom NHC parameters

  • improve checks when nodes need to be rebooted

  • add Nvidia GPU support: install drivers via an action on compute nodes

  • update operator framework to 1.3.0

0.8.5 - 2022-01-13#

  • add support for ElasticSearch Slurm addon

  • pin Operator Framework to 1.2.0

0.8.4 - 2021-12-10#

  • use the cluster name as the database name for InfluxDB

  • improve logrotate profiles for Slurm and NHC logs

  • add epilog-prolog relation

0.8.3 - 2021-11-05#

  • add labels to Fluentbit logs: cluster-name, partition-name, hostname, service

  • add user-group relation to create user and group on the system

  • fix typos in unit status

0.8.2 - 2021-10-13#

  • fix changing infiniband repos on Ubuntu

  • fix fluentbit parser for slurm logs

  • fix creation of user and group when those already exist

0.8.1 - 2021-10-07#

  • use Omnivector Solutions' RPM repository to install Slurm

  • add Fluentbit relation to forward logs from all the Slurm Charms

0.8.0 - 2021-09-13#

  • fix slurmd crashing when etcd in slurmctld is down/not-started/unreachable

  • allow installing Slurm from different repositories on Ubuntu

  • use Omnivector's PPA for installing Slurm on Ubuntu

0.7.0 - 2021-08-09#

  • fix: reduce number of events when a slurmd unit is added/removed

  • fix #111: install bullseye gpg keys from files

  • add support for Ubuntu partitions on CentOS deploys

  • add influxdb-info action to slurmctld

  • add grafana relation for slurmctld

  • add influxdb relation for accounting and profiling of jobs with the SLURM InfluxDB plugin

  • add acct gather configuration options

  • handle update-status hook to constantly give feedback about charms' status

  • add .jujuignore to reduce size of final charm files

  • fix cache of partition_name in slurmd charm

  • improve description of charm's state in juju status

  • fix race condition on relation data exchange between slurmctld and slurmdbd

  • change default partition name from juju-compute-random to osd-appname

  • fix slurmctld crashing when removing a slurmd application

0.6.6 - 2021-07-01#

  • fix checks for munge when the code can't run subprocess as another UID

0.6.5 - 2021-06-28#

  • fix race condition on inventory synchronization between slurmd and slurmctld before starting slurmd

  • fix breakage in charms when the slurmd leader is removed

  • remove unused code in slurmd peer relation

  • fix slurmd action to query infiniband version

  • add slurmrestd to CentOS7

  • fix slurmrestd Systemd environment variables to enable jwt auth

0.6.4 - 2021-06-11#

  • improve checks on Systemd commands to start/restart daemons, as to guarantee correct Charm initialization

  • check for munge keys before starting Slurm daemons, to avoid a race condition

  • fix possible breakage in Slurm configuration due to spaces in partition names

  • improve description of configurations and actions.

  • fix slurmrestd version in Juju status breaking the line.

0.6.3 - 2021-06-02#

  • fix systemd command to restart munge

  • improve slurmdbd logs when restarting munge

0.6.2 - 2021-06-02#

  • handle charm upgrade

  • improve juju status' message to display failure when munge/slurm does not start correctly

  • fix slurmd initialization when slurm.conf is present

  • enable munge system service, so it always start when the machine boots

0.6.1 - 2021-05-31#

  • changed charm's status message to have consistent capitalization.

  • fixed initialization order of the charms, to ensure database and controller start before compute nodes and REST server.

0.6.0 - 2021-05-28#