Rok Disk Manager¶
When you install Arrikto Enterprise Kubeflow, you also deploy Rok Disk Manager (RDM), a component that runs on all nodes of your Kubernetes cluster and prepares the underlying storage for Rok.
This guide describes the behavior of Rok Disk Manager based on its default configuration for each one of the supported cloud platforms (AWS, Azure, Google Cloud). This guide also contains commands that you can run to inspect the state of the storage resources that Rok Disk Manager creates in your Kubernetes cluster and, thus, gain a deeper understanding of how it operates internally.
Here is what you will need to follow along:
- A configured management environment.
- An existing Kubernetes cluster.
- An existing Rok deployment.
Contact Arrikto
Making changes to the default configuration of RDM is an advanced operation that will affect your cluster data. If, for any reason, you wish to modify the default configuration of RDM, you should first coordinate with Arrikto.
Overview¶
Introduction¶
Rok Disk Manager runs as a DaemonSet on each one of your cluster nodes, decides on which disks to manage, and configures them for Rok. In doing so, RDM periodically applies a Python-like script that contains a declarative disk configuration.
For each one of the supported cloud platforms, RDM runs a slightly different script that depends on the underlying infrastructure.
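For example, assuming the DaemonSet is named rok-disk-manager and lives in the rok-system namespace (as the commands later in this guide suggest), you can list it and its Pods from your management environment:
root@rok-tools:~# kubectl get daemonset -n rok-system rok-disk-manager
root@rok-tools:~# kubectl get pods -n rok-system -o wide | grep rok-disk-manager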
Selecting Disks¶
Rok Disk Manager consults the disk management script to select which of the available disks it will manage on every node. Rok will exclusively use these disks to provision volumes and take snapshots on Kubernetes.
To this end, RDM retrieves both
- fast, ephemeral disks that are bound to the node (for example, local NVMe SSDs), and
- slower, persistent ones that do not depend on the lifetime of the node (for example, EBS volumes, Google Persistent Disks, etc.).
Note
Working with both ephemeral and persistent disks is a prerequisite for Rok to run on heterogeneous clusters, that is, clusters whose nodes have disks of different types attached to them.
It is important to note that RDM does not manage disks that Rok should not use at all, that is, disks that contain critical system data. These disks are:
- The root disk, which contains the filesystem of the node.
- Any disks attached to the node as a result of volume provisioning on Kubernetes. For example, requesting a PersistentVolumeClaim with the default storage class of the cloud platform leads to the creation of a PersistentVolume that is backed by a persistent disk (for example, an EBS volume, an Azure data disk, or a Google Persistent Disk). Ultimately, this persistent disk appears at a well-known location inside the filesystem of the node.
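For example, running lsblk directly on a node (via SSH, for instance; the prompt below is a placeholder) makes it easy to spot the root disk, that is, the disk whose partition is mounted at /, as well as any disks backing Kubernetes PersistentVolumes. RDM leaves all of these untouched:
root@node:~# lsblk -o NAME,SIZE,TYPE,MOUNTPOINT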
The default RDM configuration comes with different disk requirements and decisions for each one of the supported cloud platforms, due to the heterogeneity of the underlying infrastructure. Choose one of the following options to inspect the requirements and decisions for your preferred platform:
AWS (EKS):
- Local NVMe SSDs for Rok must appear under /dev/disk/by-id/nvme-Amazon_EC2_NVMe_Instance_Storage* on each cluster node that supports local NVMe storage.
- Extra EBS volumes for Rok must appear under /dev/sd[f-p] on each cluster node. The admin of the EKS cluster must follow the official Amazon recommendations on naming storage devices when they add extra EBS volumes for Rok (see the example below).
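As an illustration only, attaching an existing EBS volume to a node at one of the recommended device names could look like the following AWS CLI call; the volume and instance IDs are hypothetical placeholders:
root@rok-tools:~# aws ec2 attach-volume \
    --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 \
    --device /dev/sdf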
Azure (AKS):
- Local NVMe SSDs for Rok must appear under /dev/disk/by-id/nvme-Microsoft_NVMe_Direct_Disk* on each cluster node that supports local NVMe storage.
- Extra data disks for Rok must appear under /dev/disk/azure/scsi[0-3]/lun6[0-3] on each cluster node. The admin of the AKS cluster must configure Azure to assign extra data disks to well-known SCSI controllers with IDs 0-3 and add them with LUNs 60-63 on each cluster node (see the example below).
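As a rough, hypothetical sketch (resource names are placeholders and exact flags may differ across Azure CLI versions), attaching an existing managed disk to a node VM at LUN 63 could look like this; for node pools backed by Virtual Machine Scale Sets, the analogous az vmss disk attach command applies:
root@rok-tools:~# az vm disk attach \
    --resource-group my-resource-group \
    --vm-name my-aks-node \
    --name rok-data-disk \
    --lun 63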
Google Cloud (GKE):
- One or more local SSDs for Rok must appear under /dev/disk/by-id/google-local-nvme-ssd-* (NVMe interface) or /dev/disk/by-id/google-local-ssd-* (SCSI interface) on each cluster node. GKE supports adding local SSDs on all instance types. The admin of the GKE cluster must specify the interface (NVMe or SCSI) of the local SSDs, based on performance requirements (see the example below).
- RDM will ignore all Persistent Disks that are attached to each cluster node, that is, RDM will not manage and Rok will not use any Persistent Disk.
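As a hypothetical example (cluster and node pool names are placeholders), creating a GKE node pool with one SCSI local SSD per node could look like this; local NVMe SSDs are requested with a different, newer gcloud flag:
root@rok-tools:~# gcloud container node-pools create rok-pool \
    --cluster my-cluster \
    --machine-type n1-standard-8 \
    --local-ssd-count 1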
Step-by-Step Analysis¶
In this section, we will go through the disk management script that RDM applies, in chunks, grouping semantically related commands together. We will explain the rationale behind each group and provide commands that you can run to view the state of the storage resources that RDM creates on each of your cluster nodes.
Note
Follow Along: The easiest way to inspect the storage resources that RDM creates and manages in every node of your Kubernetes cluster is to start from a rok-tools management environment and exec into a running RDM Pod:
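For example, assuming the Pods of the rok-disk-manager DaemonSet run in the rok-system namespace, pick one of them and exec into it (the Pod name below is just an example; the container may provide /bin/bash or only /bin/sh):
root@rok-tools:~# kubectl get pods -n rok-system -o wide | grep rok-disk-manager
root@rok-tools:~# kubectl exec -it -n rok-system rok-disk-manager-nxp2v -- /bin/bash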
The disk management script that RDM applies in your cluster performs the following core operations that are common across all supported cloud platforms:
- Select Disks for Rok
- Assemble RAID Array
- Allocate Rok Snapshot Space
- Format Rok Snapshot Space
Note
Follow Along: Here is how you can retrieve the disk management script that RDM currently applies in your Kubernetes cluster:
Inspect the ConfigMap to retrieve the Rok Disk Manager script:
root@rok-tools:~# kubectl get cm -n rok-system disk-script -o jsonpath="{.data.disk-script}"
nvme = get_disks(devices="/dev/disk/by-id/google-local-nvme-ssd-*");
scsi = get_disks(devices="/dev/disk/by-id/google-local-ssd-*");
md = raid("/dev/md/rok-disk-manager:rok", bdevs=nvme + scsi, level=0);
rokpv = pv(md);
rokvg = vg("rokvg", pvs=rokpv);
fiskslv_size = min(200 * GiB, 0.3 * rokvg.size);
fiskslv = lv(rokvg, "rok-fisks", size=fiskslv_size);
filesystem = fs(fiskslv, "ext4");
mountpoint = mount(filesystem, "/mnt/data", persistent=False);
_ = dir("/mnt/data/rok");
The following sections show the relevant parts of the default disk management script for each one of the supported cloud platforms.
Select Disks for Rok¶
Rok Disk Manager uses pattern matching to decide on which disks it will manage on each node, given the disk requirements that we described above. Below, you can view the exact pattern that RDM uses for each one of the supported cloud platforms:
Note
To discover EBS volumes, RDM searches under /dev/sd[f-p]. To discover local NVMe SSDs in a consistent manner, RDM always works with persistent disk identifiers that Amazon creates under /dev/disk/by-id/.
Rok uses all local NVMe SSDs on the node by default:
ssds = get_disks(devices="/dev/disk/by-id/nvme-Amazon_EC2_NVMe_Instance_Storage*");
Rok uses all persistent disks (EBS volumes) under /dev/sd[f-p] on the node by default:
ebs = get_disks(devices="/dev/sd[f-p]");
Note
Follow Along: Let’s assume an EKS cluster with m5d.4xlarge instances, each having 2 x 300 GB local NVMe SSD. Here is how you can verify that 2 x 279.4 GiB local NVMe SSD are attached to your EKS cluster node:
List all local NVMe SSDs:
root@rok-disk-manager-nxp2v:/# ls -lah /dev/disk/by-id/nvme-Amazon_EC2_NVMe_Instance_Storage*
lrwxrwxrwx 1 root root 13 Dec 7 12:31 /dev/disk/by-id/nvme-Amazon_EC2_NVMe_Instance_Storage_AWS1B36EF6A69359BFC0 -> ../../nvme2n1
lrwxrwxrwx 1 root root 13 Dec 7 12:31 /dev/disk/by-id/nvme-Amazon_EC2_NVMe_Instance_Storage_AWS1B36EF6A69359BFC0-ns-1 -> ../../nvme2n1
lrwxrwxrwx 1 root root 13 Dec 7 12:31 /dev/disk/by-id/nvme-Amazon_EC2_NVMe_Instance_Storage_AWS275C52DAB795EAA4A -> ../../nvme1n1
lrwxrwxrwx 1 root root 13 Dec 7 12:31 /dev/disk/by-id/nvme-Amazon_EC2_NVMe_Instance_Storage_AWS275C52DAB795EAA4A-ns-1 -> ../../nvme1n1
List all block devices:
root@rok-disk-manager-nxp2v:/# lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme1n1 259:0    0 279.4G  0 disk
...
nvme2n1 259:1    0 279.4G  0 disk
...
Note
To distinguish between Azure data disks and local SSDs in a consistent manner, RDM always works with persistent disk identifiers that Microsoft creates under /dev/disk/by-id/.
Rok uses all local NVMe SSDs on the node by default:
ssds = get_disks(devices="/dev/disk/by-id/nvme-Microsoft_NVMe_Direct_Disk*");
Rok uses all persistent disks (Azure data disks) with LUNs 60-63 under all well-known SCSI controllers on the node by default:
data_disks = get_disks(devices="/dev/disk/azure/scsi[0-3]/lun6[0-3]");
Note
Follow Along: Let’s assume an AKS cluster with Standard_L8s_v2 instances, each having 1 x 100 GiB data disk and no local NVMe SSDs. Here is how you can verify that 1 x 100 GiB data disk is attached to your AKS cluster node:
List all persistent disks with LUNs 60-63 under all well-known SCSI controllers:
root@rok-disk-manager-vv2t4:/# ls -lah /dev/disk/azure/scsi*/lun6*
lrwxrwxrwx 1 root root 12 Dec 15 11:07 /dev/disk/azure/scsi1/lun63 -> ../../../sda
List all block devices:
root@rok-disk-manager-vv2t4:/# lsblk
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda    8:0    0  100G  0 disk
...
Note
To distinguish between NVMe and SCSI local SSDs in a consistent manner, RDM always works with persistent disk identifiers that Google creates under /dev/disk/by-id/.
Rok uses all local NVMe SSDs on the node by default:
nvme = get_disks(devices="/dev/disk/by-id/google-local-nvme-ssd-*");
Rok uses all local SCSI SSDs on the node by default:
scsi = get_disks(devices="/dev/disk/by-id/google-local-ssd-*");
Note
Follow Along: Let’s assume a GKE cluster with n1-standard-8 instances, each having 1 x 375 GiB local NVMe SSD. Here is how you can verify that 1 x 375 GiB local NVMe SSD is attached to your GKE cluster node:
List all local NVMe SSDs:
root@rok-disk-manager-265n2:/# ls -lah /dev/disk/by-id/google-local-nvme-ssd-*
lrwxrwxrwx 1 root root 13 Dec 14 11:09 /dev/disk/by-id/google-local-nvme-ssd-0 -> ../../nvme0n1
List all block devices:
root@rok-disk-manager-265n2:/# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
...
nvme0n1 259:0    0 375G  0 disk
Note
lsblk expresses the size of devices in gibibytes (GiB).
Assemble RAID Array¶
Rok Disk Manager assembles the previously selected extra disks for Rok into a RAID0 (data striping) array to boost performance.
Important
RDM requires that the extra disks for Rok be of the same size. Using disks of unequal size to assemble the RAID array will cause errors or result in wasted storage space.
Choose one of the following options to inspect this configuration on your preferred cloud platform:
Note
Follow Along: Let’s, again, assume an EKS cluster with m5d.4xlarge instances, each having 2 x 300 GB local NVMe SSD. Here is how you can verify that a RAID0 device with a size of 558.6 GiB appears at /dev/md0 in your EKS cluster node:
List the /dev/md0 block device and verify that its type is raid0:
root@rok-disk-manager-nxp2v:/# lsblk /dev/md0
NAME MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
md0    9:0   0  558.6G  0 raid0
...
Note
Follow Along: Let’s, again, assume an AKS cluster with Standard_L8s_v2 instances, each having 1 x 100 GiB data disk, that is, no local NVMe SSD. Here is how you can verify that a RAID0 device with a size of 100 GiB appears at /dev/md/rok-disk-manager:rok and points to an underlying md* device in your AKS cluster node:
List the /dev/md/rok-disk-manager:rok block device and verify that its type is raid0:
root@rok-disk-manager-vv2t4:/# lsblk /dev/md/rok-disk-manager\:rok
NAME  MAJ:MIN RM SIZE RO TYPE  MOUNTPOINT
md127   9:127  0 100G  0 raid0
...
Note
Follow Along: Let’s, again, assume a GKE cluster with n1-standard-8 instances, each having 1 x 375 GiB local NVMe SSD. Here is how you can verify that a RAID0 device with a size of 374.9 GiB appears at /dev/md/rok-disk-manager:rok and points to an underlying md* device in your GKE cluster node:
List the /dev/md/rok-disk-manager:rok block device and verify that its type is raid0:
root@rok-disk-manager-265n2:/# lsblk /dev/md/rok-disk-manager:rok
NAME  MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
md127   9:127  0 374.9G  0 raid0
...
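Regardless of platform, you can also inspect the state and member disks of the RAID array from inside the RDM Pod by reading /proc/mdstat, a standard Linux kernel interface that is not specific to RDM:
root@rok-disk-manager-265n2:/# cat /proc/mdstat
The output reports the active md device, its personality (raid0), and the devices it was assembled from.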
Allocate Rok Snapshot Space¶
When taking snapshots of PersistentVolumeClaims, Rok needs some space to maintain transient data before uploading snapshots to the object storage service. The size of this storage space depends on the snapshot frequency and whether the disk to be snapshotted is mostly read or written.
Important
RDM preallocates space for Rok to store transient snapshot data. This space takes up part of the total storage available on each node, which means that only the remaining space is left as raw storage for Rok to provision volumes on Kubernetes.
Also, when a Rok snapshot operation is active, Rok allocates additional space from the total available storage to store live snapshot data. The size of this space defaults to 10 GiB and is immediately reclaimed once the Rok snapshot operation finishes.
RDM leverages the Logical Volume Manager (LVM) framework to create and manage logical volume entities atop the previously assembled RAID0 array, via a volume group. Choose one of the following options to inspect this configuration on your preferred cloud platform:
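For example, in the default GKE script shown earlier, RDM turns the RAID array into an LVM physical volume and creates the rokvg volume group on top of it; the other platforms follow the same pattern on their respective md device:
rokpv = pv(md);
rokvg = vg("rokvg", pvs=rokpv);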
RDM has to determine a proper size for the Rok snapshot space based on the characteristics of each environment, that is, the number of extra disks for Rok, the size of extra disks for Rok, the total amount of storage needed by running applications, etc. Therefore, RDM first uses a heuristic to calculate the size of the logical volume that will serve as the Rok snapshot space and, then, it creates the logical volume under the existing volume group:
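In the default GKE script shown earlier, for example, the heuristic caps the snapshot space at 200 GiB or at 30% of the volume group size, whichever is smaller, and creates the rok-fisks logical volume accordingly:
fiskslv_size = min(200 * GiB, 0.3 * rokvg.size);
fiskslv = lv(rokvg, "rok-fisks", size=fiskslv_size);
The sizes in the follow-along examples below are consistent with this rule: 0.3 x 558.6 GiB ≈ 167.6 GiB on EKS, 0.3 x 100 GiB = 30 GiB on AKS, and 0.3 x 374.9 GiB ≈ 112.5 GiB on GKE, all below the 200 GiB cap.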
Choose one of the following options to inspect this configuration on your preferred cloud platform:
Note
Follow Along: Let’s, again, assume an EKS cluster with m5d.4xlarge instances, each having 2 x 300 GB local NVMe SSD. Here is how you can verify that a logical volume for transient Rok snapshot data exists in your EKS cluster node and that its size is 167.6 GiB:
List the /dev/md0 block device and verify that a device of type lvm exists under it, mounted under /mnt/data:
root@rok-disk-manager-nxp2v:/# lsblk /dev/md0
NAME               MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
md0                  9:0    0 558.6G  0 raid0
`-rokvg-rok--fisks 253:0    0 167.6G  0 lvm   /mnt/data
Note
Follow Along: Let’s, again, assume an AKS cluster with Standard_L8s_v2 instances, each having 1 x 100 GiB data disk, that is, no local NVMe SSD. Here is how you can verify that a logical volume for transient Rok snapshot data exists in your AKS cluster node and that its size is 30 GiB:
List the /dev/md/rok-disk-manager:rok block device and verify that a device of type lvm exists under it, mounted under /mnt/data:
root@rok-disk-manager-vv2t4:/# lsblk /dev/md/rok-disk-manager:rok
NAME               MAJ:MIN RM SIZE RO TYPE  MOUNTPOINT
md127                9:127  0 100G  0 raid0
`-rokvg-rok--fisks 253:0    0  30G  0 lvm   /mnt/data
Note
Follow Along: Let’s, again, assume a GKE cluster with n1-standard-8 instances, each having 1 x 375 GiB local NVMe SSD. Here is how you can verify that a logical volume for transient Rok snapshot data exists in your GKE cluster node and that its size is 112.5 GiB:
List the /dev/md/rok-disk-manager:rok block device and verify that a device of type lvm exists under it, mounted under /mnt/data:
root@rok-disk-manager-265n2:/# lsblk /dev/md/rok-disk-manager:rok
NAME               MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
md127                9:127  0 374.9G  0 raid0
`-rokvg-rok--fisks 253:0    0 112.5G  0 lvm   /mnt/data
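If the LVM userspace tools happen to be available inside the RDM container (an assumption; the examples in this guide rely only on lsblk), you can also inspect the rokvg volume group and the rok-fisks logical volume directly:
root@rok-disk-manager-265n2:/# vgs rokvg
root@rok-disk-manager-265n2:/# lvs rokvg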
Format Rok Snapshot Space¶
After creating the necessary logical volume entities, Rok Disk Manager needs to make the space allocated for transient snapshot data available to Rok, that is, format the previously created logical volume and mount it to the location where Rok is configured to find it.
In this regard, RDM chooses to format the logical volume using ext4 as the filesystem and mount it under /mnt/data. Finally, it creates the /mnt/data/rok/ subdirectory, which is the default data path the Rok file daemon uses.
Note
Follow Along: Here is how you can verify that Rok is able to access the space allocated for transient snapshot data under /mnt/data and that the Rok file daemon has successfully adopted the /mnt/data/rok/ subdirectory:
Verify that the logical volume mounted under /mnt/data is properly formatted:
root@rok-disk-manager-nxp2v:/# ls -lah /mnt/data/
total 24K
drwxr-xr-x 4 root root 4.0K Dec 7 12:31 .
drwxr-xr-x 3 root root   18 Dec 7 12:31 ..
drwx------ 2 root root  16K Dec 7 12:31 lost+found
drwxr-xr-x 3 root root 4.0K Dec 7 12:35 rok
Verify that the /mnt/data/rok subdirectory exists and that Rok has successfully adopted it:
root@rok-disk-manager-nxp2v:/# ls -lah /mnt/data/rok/
total 12K
drwxr-xr-x 3 root root 4.0K Dec 7 12:35 .
drwxr-xr-x 4 root root 4.0K Dec 7 12:31 ..
-rwxr-xr-x 1 root root    0 Dec 7 12:35 .APP_FISKS
drwxr-xr-x 5 root root 4.0K Dec 7 12:35 filed
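To additionally confirm the filesystem type and mount source, you can query the mount table with findmnt (a standard util-linux tool; its presence inside the RDM container is an assumption):
root@rok-disk-manager-nxp2v:/# findmnt /mnt/data
The FSTYPE column should report ext4 and the SOURCE column should point to the rok-fisks logical volume (/dev/mapper/rokvg-rok--fisks).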
Summary¶
In this guide, you gained insight into how Rok Disk Manager works on different cloud platforms, how it prepares disks for Rok, and how to inspect the underlying storage resources that RDM creates in every node of your Kubernetes cluster.
What’s Next¶
To learn more about EKF and its components, check out the rest of our user guides.