Free Node Space Monitoring

One of the most important areas to monitor in your EKF cluster is the availability of free space per node. This is essential to prevent hosts from running out of space, e.g., when workloads aggressively consume storage or fail to garbage collect old artifacts.

What You’ll Need

Important

Before proceeding, ensure that you have been granted proper rights to access the Rok Monitoring Stack UI. Currently, access to the Rok Monitoring Stack is allowed only to admin users.

Metrics

The Prometheus Node Exporter exposes metrics related to the available filesystem space per node under the node_filesystem_ prefix.

The table below lists the Prometheus Node Exporter metrics that help you monitor the free space per physical node:

Name                          Description                                             Type
node_filesystem_avail_bytes   Filesystem space available to non-root users in bytes   Gauge
node_filesystem_size_bytes    Filesystem size in bytes                                Gauge
node_filesystem_free_bytes    Filesystem free space in bytes                          Gauge
node_filesystem_files         Filesystem total file nodes                             Gauge
node_filesystem_files_free    Filesystem total free file nodes                        Gauge
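
As a rough illustration of how these gauges combine, the sketch below derives the percentage of available space and of free file nodes for a single filesystem. The PromQL string, the fstype filter, and the helper names percent and summarize are assumptions made for this example; they are not part of the Rok Monitoring Stack.

```python
# Hypothetical filter that skips pseudo filesystems; adjust it for your nodes.
FS_FILTER = '{fstype!~"tmpfs|overlay"}'

# PromQL expression built only from the metrics listed above.
# It yields the percentage of space available to non-root users per mountpoint.
AVAIL_PERCENT_QUERY = (
    f"100 * node_filesystem_avail_bytes{FS_FILTER}"
    f" / node_filesystem_size_bytes{FS_FILTER}"
)


def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage, or 0.0 when whole is not positive."""
    if whole <= 0:
        return 0.0
    return 100.0 * part / whole


def summarize(avail_bytes: float, size_bytes: float,
              files_free: float, files: float) -> dict:
    """Combine the four gauge values of one filesystem into two percentages."""
    return {
        "space_avail_pct": percent(avail_bytes, size_bytes),
        "inodes_free_pct": percent(files_free, files),
    }


if __name__ == "__main__":
    gib = 2 ** 30
    # Example values only: a 100 GiB filesystem with 25 GiB available.
    print(AVAIL_PERCENT_QUERY)
    print(summarize(25 * gib, 100 * gib, 900_000, 1_000_000))
```

Watching both percentages matters because a filesystem can run out of file nodes while it still has free bytes.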

View Grafana Dashboards

The Rok Monitoring Stack provides the following dashboards to visualize free space per node:

  • Node Exporter: a full-fledged dashboard that queries and visualizes the majority of the metrics that the Prometheus Node Exporter collects.
  • Nodes: a stripped-down version of the Node Exporter dashboard that queries and visualizes a subset of the most important metrics that the Prometheus Node Exporter collects.
  • USE Method / Node: a concise dashboard focused on node utilization, saturation, and errors.
  • USE Method / Cluster: a concise dashboard focused on cluster utilization, saturation, and errors.

Note

USE Method stands for Utilization, Saturation, and Errors Method. It is a methodology for analyzing the performance of a system.

Note

The Rok Monitoring Stack places Grafana dashboards for individual EKF components under the EKF folder.

  1. Visit the Kubeflow central dashboard with your browser at

    https://<FQDN>

    Replace <FQDN> with the value of your domain. For example:

    https://arrikto-cluster.apps.example.com
  2. If prompted, log in using your credentials:

    ../../../_images/kubeflow-login.png
  3. Select Metrics from the left side bar to navigate to Grafana:

    ../../../_images/kubeflow-dashboard-metrics.png
  4. In the left side bar, hover your cursor over the Dashboards entry and then click Manage to navigate to the Grafana Dashboards page:

    ../../../_images/grafana-dashboard-manage.png

    Note

    In the Grafana Dashboards page you can search, view, and select dashboards.

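  5. Choose one of the following options, based on your needs and preferences. Each option covers one dashboard (Node Exporter, Nodes, USE Method / Node, or USE Method / Cluster) and its steps start again from 1: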

    1. Go to the EKF folder and select the Node Exporter dashboard:

      ../../../_images/selection.png
    2. Set the Host dashboard variable by selecting one of the cluster nodes from the dropdown menu:

      ../../../_images/select-host.png
    3. View the Root FS Used and RootFS Total panels in the Quick CPU / Mem / Disk row to inspect the used and total filesystem space for the selected cluster node:

      ../../../_images/cpu-mem-disk-quick.png
    4. View the Disk Space Used Basic panel in the Basic CPU / Mem / Net / Disk row to inspect the percentage of disk space each mountpoint takes up for the selected cluster node:

      ../../../_images/cpu-mem-net-disk-basic.png
    5. View the Disk Space Used panel in the CPU / Mem / Net / Disk row to inspect the amount of disk space each mountpoint takes up for the selected cluster node:

      ../../../_images/cpu-mem-net-disk.png
    6. View the File System space available and File Nodes Free panels in the Storage Filesystem row to inspect how much available disk space and how many free file nodes each mountpoint has for the selected cluster node:

      ../../../_images/storage-fs.png
    1. Go to the EKF folder and select the Nodes dashboard:

      ../../../_images/selection1.png
    2. Set the instance dashboard variable by selecting one of the cluster nodes from the dropdown menu:

      ../../../_images/select-instance.png
    3. View the Disk Space Usage panel to inspect the used and available space for the selected node:

      ../../../_images/disk-space-usage.png
    1. Go to the EKF folder and select the USE Method / Node dashboard:

      ../../../_images/node-selection.png
    2. Set the instance dashboard variable by selecting one of the cluster nodes from the dropdown menu:

      ../../../_images/node-select-instance.png
    3. View the Disk Space Utilisation panel to inspect the used space for the selected cluster node:

      ../../../_images/node-disk-space-utilisation.png
    1. Go to the EKF folder and select the USE Method / Cluster dashboard:

      ../../../_images/cluster-selection.png
    2. View the Disk Space Utilisation panel to inspect the used space for all cluster nodes:

      ../../../_images/cluster-disk-space-utilisation.png
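
The figures shown in these panels can also be checked outside Grafana. The sketch below assumes that the Prometheus instance behind the dashboards is reachable over the standard Prometheus HTTP API (the /api/v1/query endpoint) and that you have already fetched an instant-query response as JSON. The base URL, the sample payload, and the helper names build_query_url and low_space_mounts are hypothetical and only illustrate the idea.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def low_space_mounts(response: dict, threshold_pct: float) -> list:
    """Return (instance, mountpoint, value) tuples below threshold_pct.

    Expects the JSON body of an instant query whose samples hold the
    percentage of available space, sorted from lowest to highest value.
    """
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    flagged = []
    for sample in response["data"]["result"]:
        labels = sample["metric"]
        # Instant-vector samples are [timestamp, "value-as-string"] pairs.
        value = float(sample["value"][1])
        if value < threshold_pct:
            flagged.append(
                (labels.get("instance", "?"), labels.get("mountpoint", "?"), value)
            )
    return sorted(flagged, key=lambda item: item[2])


if __name__ == "__main__":
    # Made-up payload in the shape of an instant-query response.
    sample_response = {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"instance": "node-a:9100", "mountpoint": "/"},
                 "value": [1700000000, "8.5"]},
                {"metric": {"instance": "node-b:9100", "mountpoint": "/"},
                 "value": [1700000000, "42.0"]},
            ],
        },
    }
    print(build_query_url("http://prometheus.example:9090", "up"))
    print(low_space_mounts(sample_response, threshold_pct=15.0))
```

The threshold of 15 percent is an arbitrary example value; choose one that matches how quickly your workloads consume storage.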

Summary

In this guide, you learned how to monitor the available space of your EKF cluster nodes with the Rok Monitoring Stack.

What’s Next

The next step is to learn how to monitor Rok etcd and view the Rok etcd Grafana dashboard.