OSD Architecture
The Omnivector Slurm Distribution is built on a suite of automations called "charms". Charms are the operational components that describe the lifecycle of a Slurm cluster. A full Slurm deployment comes in the form of multiple charms, one for each component of Slurm. A "bundle" is a YAML file where multiple charms can be defined. We use bundles to describe the interconnectivity and configuration of groups of charms.
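As a rough illustration of how a bundle ties charms together, the sketch below models a bundle as a Python mapping (a real bundle would be a YAML file) and checks that every relation points at an application the bundle declares. The key names (applications, charm, num_units, relations) follow the common Juju bundle layout and are an assumption here. The charm names, unit counts, and endpoint names are hypothetical placeholders, not the contents of any published slurm-bundle.

```python
# Hypothetical bundle, modeled as a Python mapping for illustration only.
# Charm names, unit counts, and endpoint names are assumptions.
bundle = {
    "applications": {
        "slurmctld": {"charm": "slurmctld", "num_units": 1},
        "slurmd": {"charm": "slurmd", "num_units": 2},
        "slurmdbd": {"charm": "slurmdbd", "num_units": 1},
    },
    "relations": [
        ["slurmctld:slurmd", "slurmd:slurmd"],
        ["slurmctld:slurmdbd", "slurmdbd:slurmdbd"],
    ],
}


def undefined_applications(bundle):
    """Return sorted application names used in relations but never declared."""
    declared = set(bundle.get("applications", {}))
    referenced = set()
    for relation in bundle.get("relations", []):
        for endpoint in relation:
            # An endpoint is written "application:endpoint" or just "application".
            referenced.add(endpoint.split(":", 1)[0])
    return sorted(referenced - declared)


print(undefined_applications(bundle))  # prints []
```

A check like this catches a relation that names a charm the bundle never defines, which is the kind of interconnectivity mistake a bundle is meant to make visible.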
OSD provisions Slurm to operate in configless mode. In this mode, the slurmctld process distributes the slurm.conf file to the nodes running slurmd.
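For readers unfamiliar with configless mode, the following minimal sketch shows the two pieces that upstream Slurm uses for it: the SlurmctldParameters=enable_configless setting in the controller's slurm.conf, and the --conf-server option that tells slurmd where to fetch its configuration. How the charms wire this up is not shown here; the helper names and the slurmctld-0 hostname are hypothetical, and 6817 is assumed as the slurmctld port because it is the Slurm default.

```python
def controller_conf_line():
    """Return the slurm.conf setting that lets slurmctld serve config files."""
    return "SlurmctldParameters=enable_configless"


def slurmd_command(controller_host, port=6817):
    """Build a slurmd command line that pulls its config from slurmctld.

    controller_host is a placeholder; 6817 is the default slurmctld port.
    """
    return ["slurmd", "--conf-server", f"{controller_host}:{port}"]


print(controller_conf_line())
# prints: SlurmctldParameters=enable_configless
print(" ".join(slurmd_command("slurmctld-0")))
# prints: slurmd --conf-server slurmctld-0:6817
```

With this arrangement a compute node needs no local copy of slurm.conf; it only needs to know how to reach the controller.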
Slurm Charms#
The slurm-charms are the components that encapsulate the operational know-how and automation needed to facilitate the lifecycle of a Slurm cluster.
Slurm Bundles#
The slurm-bundles define the base Slurm deployment configurations for different clouds and operating systems.
OSD Components#
The Omnivector Slurm Distribution supports the following charm components as part of the Slurm-core offering:
Additionally, we require the Node Health Check (NHC) with a minimal configuration and checks to ensure the slurm and munge processes are active. The cluster administrator must provide the tar.gz for nhc. It is possible, and recommended, for the cluster administrator to extend these checks. See the NHC section for details on how to configure it.
The easiest way to install Infiniband drivers on the compute nodes is to use the charm-supplied actions for Infiniband management. See the Infiniband section for more details on Infiniband driver lifecycle operations.