Physical Node Monitoring

Learning how to monitor the physical nodes of your Kubernetes cluster is critical when running Arrikto EKF in production. Monitoring your physical nodes lets you validate that EKF performs as expected and helps you detect and troubleshoot issues in a timely manner.

Inspecting the performance and status of your physical nodes is key to keeping all the components of your EKF installation healthy and functional. The Rok Monitoring Stack increases system observability by collecting and visualizing both hardware and OS metrics from your physical nodes. This helps you maintain high levels of performance and availability.

Introduction

The Rok Monitoring Stack uses the Prometheus Node Exporter to collect machine metrics and serve them at the /metrics HTTP endpoint. By default, the Prometheus Node Exporter enables a large variety of collectors that cover different areas of the underlying operating system and hardware, such as:

  1. Machine Specs
  2. CPU
  3. Memory
  4. Disk
  5. Filesystem
  6. Network
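To make the /metrics endpoint more concrete, the following is a minimal Python sketch that fetches the Node Exporter output and parses it into a dictionary of series and values. It assumes the exporter is reachable at http://localhost:9100/metrics (9100 is the Node Exporter's conventional port, but the address in your EKF cluster may differ), and the helper names `parse_metrics` and `fetch_metrics` are hypothetical. The parser is deliberately simplified: it skips `# HELP`/`# TYPE` comments and does not handle label values that contain spaces.

```python
import urllib.request

# Hypothetical address; adjust the host and port to your deployment.
NODE_EXPORTER_URL = "http://localhost:9100/metrics"


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition output into {series: value}.

    Simplified sketch: skips comment and blank lines, ignores optional
    timestamps, and does not handle label values containing spaces.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE metadata
        parts = line.split()
        # parts[0] is the metric name with optional {labels}; parts[1] is the value.
        samples[parts[0]] = float(parts[1])
    return samples


def fetch_metrics(url: str = NODE_EXPORTER_URL) -> dict[str, float]:
    """Download and parse the current metrics from a Node Exporter instance."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, `fetch_metrics()["node_load1"]` would return the node's one-minute load average, assuming the exporter's load collector is enabled.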

The metrics that the Prometheus Node Exporter exposes can be used for real-time monitoring, debugging, and performance testing. The Prometheus Node Exporter does not persist its metrics on its own; its metrics are reset whenever it restarts.
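Because counters restart from zero when the exporter restarts, anything that computes how much a counter grew must account for resets. Prometheus treats any decrease in a counter's value as a reset. The following hypothetical `increase_with_resets` helper is a simplified Python sketch of that idea; unlike Prometheus's own `increase()` function, it applies no extrapolation to the edges of the time window.

```python
def increase_with_resets(samples: list[float]) -> float:
    """Total growth of a counter across consecutive samples, tolerating resets.

    A decrease between two samples is interpreted as a counter reset
    (for example, after an exporter restart), so the new value is counted
    as growth since zero.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev  # normal monotonic growth
        else:
            total += curr  # reset: counter restarted from zero
    return total
```

For example, the samples `[10, 15, 20, 3, 8]` yield a total increase of 18, since the drop from 20 to 3 is treated as a restart.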

To persist these metrics on Kubernetes, the Rok Monitoring Stack creates a ServiceMonitor custom resource in the namespace where Rok is deployed. This resource configures Rok Prometheus to periodically pull metrics from the Prometheus Node Exporter and store them in its time-series database.
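For illustration, the following Python sketch builds a ServiceMonitor manifest of the general shape described above, using the `monitoring.coreos.com/v1` API of the Prometheus Operator. The resource name, namespace, selector labels, port name, and scrape interval are all hypothetical placeholders; the ServiceMonitor that the Rok Monitoring Stack actually creates may differ.

```python
import json


def node_exporter_service_monitor(namespace: str) -> dict:
    """Return a hypothetical ServiceMonitor manifest for the Node Exporter.

    All names, labels, and values below are illustrative placeholders,
    not the manifest that the Rok Monitoring Stack ships.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": "node-exporter", "namespace": namespace},
        "spec": {
            # Select the Service that fronts the Node Exporter pods.
            "selector": {"matchLabels": {"app.kubernetes.io/name": "node-exporter"}},
            # Scrape the Service's "metrics" port at /metrics every 30 seconds.
            "endpoints": [{"port": "metrics", "path": "/metrics", "interval": "30s"}],
        },
    }


if __name__ == "__main__":
    # Emit the manifest as JSON, which kubectl also accepts.
    print(json.dumps(node_exporter_service_monitor("rok"), indent=2))
```

The `interval` field controls how often Prometheus scrapes the exporter; a shorter interval gives finer-grained data at the cost of more storage.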

Note

By default, Rok Prometheus retains metrics for three days.

Metrics

The Prometheus Node Exporter exposes metrics under the following prefixes:

  1. Go application metrics, under the go_ prefix
  2. Prometheus metric handler metrics, under the promhttp_ prefix
  3. Node metrics, under the node_ prefix, e.g., node_cpu_, node_disk_, node_filesystem_, etc.
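As a rough way to explore these prefixes, the following Python sketch counts metric names by the three prefixes listed above and groups node_ metrics by their subsystem (for example, node_cpu_seconds_total falls under cpu). The helper names are hypothetical, and any name outside the three listed prefixes is placed in an "other" bucket.

```python
from collections import Counter, defaultdict

# Prefixes listed in the documentation; anything else is counted as "other".
PREFIXES = ("go_", "promhttp_", "node_")


def count_by_prefix(names: list[str]) -> Counter:
    """Count metric names per top-level prefix."""
    counts: Counter = Counter()
    for name in names:
        prefix = next((p for p in PREFIXES if name.startswith(p)), "other")
        counts[prefix] += 1
    return counts


def group_node_metrics(names: list[str]) -> dict[str, list[str]]:
    """Group node_* metric names by subsystem, e.g. node_cpu_... -> 'cpu'."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for name in names:
        if not name.startswith("node_"):
            continue
        subsystem = name[len("node_"):].split("_", 1)[0]
        groups[subsystem].append(name)
    return dict(groups)
```

Fed with the metric names returned by the exporter, these helpers give a quick overview of which collectors are active on a node.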

Rok Prometheus collects and stores all metrics exposed by the Prometheus Node Exporter, while Rok Grafana provides a wide variety of dashboards that query for and visualize metrics collected from physical nodes.
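Grafana dashboards ultimately issue PromQL queries against Prometheus. The following Python sketch runs a similar query directly through the Prometheus HTTP API endpoint `/api/v1/query`, computing per-node CPU utilization from the node_cpu_seconds_total counter. The Prometheus address is a hypothetical placeholder for wherever Rok Prometheus is reachable in your cluster, and the query is an illustrative example, not necessarily one used by the Rok Grafana dashboards.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address of Rok Prometheus; adjust to your deployment.
PROMETHEUS_URL = "http://localhost:9090"

# Per-node CPU utilization (%) over the last 5 minutes: 100% minus the
# average share of time the CPUs spent in the "idle" mode.
CPU_UTILIZATION = (
    '100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))'
)


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def instant_query(promql: str, base_url: str = PROMETHEUS_URL) -> list:
    """Run an instant query and return the list of result series."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        payload = json.load(resp)
    # Successful responses look like {"status": "success", "data": {"result": [...]}}.
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]
```

Each entry returned by `instant_query(CPU_UTILIZATION)` carries the node's `instance` label and its current utilization value.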

Guides

Below you can find a list of dedicated guides that explain in detail how you can monitor different areas of your physical EKF cluster nodes both at the hardware and the OS level: