News

This section describes the new features introduced in each EKF release.

Version 2.0.2 (Aurora)

(Released Fri, 31 Mar 2023)

The EKF 2.0.2 release is here, shipping with OSS Kubeflow 1.5, bringing major stability improvements, and introducing a table that lists all EKF components and their versions.

Check out a more detailed overview of the new features below.

Ease of Use

EKF Components Versions

You can now find all the Enterprise Kubeflow components in a single place, along with their versions. Check out this page to see what’s included in this version.

Bug Fixes

Stability

We have fixed several bugs across the board to improve the stability of EKF. Check out our Changelog to see all the bug fixes and improvements included in this version.

Version 2.0.1 (Aurora)

(Released Mon, 19 Dec 2022)

The EKF 2.0.1 release is here, shipping with OSS Kubeflow 1.5 and introducing major stability improvements.

The most important updates concern support for Kubernetes 1.23 on all major public clouds, that is, AWS, Google Cloud, and Azure.

Check out a more detailed overview of the new features below.

Latest and Greatest

Support for Kubernetes 1.23

You can now use EKF running on Kubernetes 1.23 on all major cloud providers, that is, AWS, Google Cloud, and Azure, so that you get all new Kubernetes 1.23 features and guaranteed Kubernetes support.

Bug Fixes

Stability

Various bug fixes across the board to improve the stability of EKF.

Security

Remove Critical CVEs

We are continuously improving EKF security by scanning Rok and Kubeflow images for CVEs (Common Vulnerabilities and Exposures) and fixing any issues before we ship images to production. You can rest assured that you are using the latest packages with up-to-date security fixes, keeping your work safe from known vulnerabilities.

Ease of Use

Arrikto Skin

Arrikto EKF now comes with a brand new Arrikto skin that is easier on the eyes and matches the unique Arrikto branding.

Version 2.0 (Aurora)

(Released Mon, 24 Oct 2022)

The EKF 2.0 release is here, shipping with OSS Kubeflow 1.5 and introducing major features.

The most important updates concern serving, security, and ease of use. EKF now uses KServe for model serving, an upgrade from KFServing. Moreover, you can now serve your ML models to the outside world in a path-based approach, improving the way you manage external endpoints. We are continuously improving the security of our platform by removing CVEs from EKF images, and by securing network traffic from/to Rok Pods with Istio mTLS. Last but not least, we have significantly improved the ease of use of our platform, especially when it comes to the notebooks, volumes, and models user experience.

Check out a more detailed overview of the new features below.

Serving

Path-based Serving

Querying models that are exposed outside of the cluster is now much easier, since you no longer have to specify host headers. We call this feature path-based serving, and it makes managing the external endpoints behind your models a much better experience. You can switch between host-based and path-based serving of your models at any time.
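
For illustration, here is a minimal sketch of the difference from a Python client. The hostnames, namespace, and model name are hypothetical; adapt them to your own deployment and check the serving docs for the exact URL scheme.

    import requests

    TOKEN = "<access token>"  # issued by your Identity Provider

    # Host-based serving: the request must carry the model's virtual Host header.
    requests.post(
        "https://ekf.example.com/v1/models/my-model:predict",
        headers={
            "Host": "my-model.demo.example.com",
            "Authorization": f"Bearer {TOKEN}",
        },
        json={"instances": [[1.0, 2.0, 3.0]]},
    )

    # Path-based serving: the namespace and model are encoded in the path,
    # so no Host header is needed.
    requests.post(
        "https://ekf.example.com/serving/demo/my-model/v1/models/my-model:predict",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"instances": [[1.0, 2.0, 3.0]]},
    )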

Migration to KServe

You can now use KServe 0.8, which replaces KFServing (0.6) from the previous version of EKF. Despite the name change, this is still the same software. If you want to know more about it, head to KServe.

You can easily migrate your KFServing models over to the new version with our automated migration scripts.

Serving APIs (Technical Preview)

You can now serve your models directly from your notebooks or pipelines, without writing a single line of YAML or worrying about serializing and storing your model. Kale provides a suite of APIs that do the heavy lifting for you, allowing you to focus on what matters: building a model that performs great. For more information check out the Kale Serving APIs docs.
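
As a taste of the workflow, here is a minimal sketch that trains a scikit-learn model and serves it with a single call. The serve entry point follows the Kale Serving APIs docs mentioned above; treat the exact signature and return value as assumptions and consult those docs for details.

    import json

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    from kale.common.serveutils import serve  # entry point per the Kale docs

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=200).fit(X, y)

    # serve() takes care of serializing the model, snapshotting the volume,
    # and creating the InferenceService -- no YAML involved.
    served = serve(model)
    print(served.predict(json.dumps({"instances": X[:2].tolist()})))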

Portability

EKF Cluster Migration

You can now migrate all EKF resources from a source Arrikto EKF cluster of version 1.3 or later to a destination Arrikto EKF cluster of version 1.5 or later. The migration supports MLMD (Machine Learning metadata), notebooks, pipelines, models, experiments, Kubeflow profiles, and PVCs backed by the Rok storage class. We provide the necessary tools so that you can back up and restore these resources to the destination cluster in an automated way. Check out our migration documentation to see how you can do that.

Security

Additional Authentication Methods with External Identity Providers

You can now configure AuthService to authenticate external clients with opaque access tokens you create using your preferred Identity Provider (EKF can integrate with Azure AD, PingID, Okta, Google, GitLab, and Cognito). The Opaque access token authentication method authenticates each client request based on the opaque access token that the external Identity Provider granted for the client. Follow this guide to configure AuthService to use this authentication method.

Migrate Rok Images to Debian Buster

We are continuously improving EKF security: we have upgraded our Rok images to use Buster (Debian 10) as the base Debian image. This drastically reduces CVEs (Common Vulnerabilities and Exposures) and makes the security scanning of our software even faster.

Remove Critical CVEs

We are continuously improving EKF security, as we have a smooth process of scanning Rok and Kubeflow images for CVEs (Common Vulnerabilities and Exposures) and fixing any issues before images ship to production. You can rest assured that you are using the latest packages with up-to-date security fixes, keeping your work safe from known vulnerabilities.

Network Mesh

We have extended the number of workloads of our platform that are part of the Istio network mesh. Now even more Pods rely on secure and encrypted mTLS network traffic. In this release, this applies to all Rok Pods. This extends the list of EKF components (KFP, KServe, and now Rok) that use an Istio sidecar with Istio mTLS to secure network traffic. With this you get enhanced security as network traffic gets encrypted, and you are able to define advanced authorization policies.

Restrict Privileged Containers Execution

We have made an upstream contribution to Kyverno to ensure that EKF users are not able to create privileged containers from inside a Jupyter notebook. We have made sure that with the appropriate policies, users won’t be able to do so, even if Kyverno is down. Read this blog post to find out more.

Kubeflow Backends Protection

We have secured Kubeflow backends against in-cluster requests using the kubeflow-userid header, and we have improved intra-cluster authorization policies.

ML Workflow

Restore Notebook from a Rok Snapshot

Recreating a notebook from a pipeline step snapshot is now easier than ever. Just click Restore Notebook from the Pipelines UI and within a few seconds you are ready to go with your newly restored notebook!

High Availability

Etcd Replication

We now support running Rok etcd in a distributed manner; follow this guide to migrate to the new, redundant scheme. We provide a smooth process for scaling the number of etcd members up or down; check out these guides to find out how. If etcd loses a Pod or PVC, the etcd cluster recovers automatically. We also include instructions on best practices for your etcd cluster.

Ease of Use

New Notebook Wizard

You will find a revamped notebook wizard that lets you create a new working environment in just a few seconds. You will also come across a dedicated and simplified wizard for restoring a notebook from a Rok snapshot. Check it out by clicking on the Notebooks button on the sidebar.

Sorting and Filtering

You can now sort and filter your notebooks, volumes, models, and HP tuning experiments, with our brand new responsive table. Check them out throughout the EKF UI.

Notebooks and Volumes Details Pages

You can now inspect settings and configuration details about your Notebooks and Volumes. Each one of them has its own page with a lot of useful information. Head over to the Notebooks or Volumes pages from the sidebar and click on the notebook or volume that you want to inspect.

Generate Manifests for EKF deployment

You can now seamlessly generate the manifests for your EKF deployment in an automated manner, without having to apply them to your cluster, using the extended capabilities of our installation tool, rok-deploy. You can generate the manifests in an interactive or non-interactive fashion, depending on your preference. Follow this guide to check out the instructions.

Non-interactive EKF installation

You can now install EKF in a non-interactive way using the extended capabilities of our installation tool, rok-deploy. This means that the installation process of EKF is now continuous and it won’t be interrupted waiting for your input.

Retention Policies

We have introduced a way for admins to set default retention policies for existing and new buckets. This way, we provide a mechanism so that every new bucket inherits the default retention policy. Admins can also update the default retention policy anytime and enforce it across users and buckets in a fine-grained way. Users can edit the policy on a per-bucket basis.

Latest and Greatest

Support for Kubernetes 1.22

You can now use EKF running on Kubernetes 1.22 on AWS EKS and Google Cloud GKE. Therefore, you can get all new Kubernetes 1.22 features and guaranteed Kubernetes support.

EKF with Kubeflow 1.5

EKF now comes with OSS Kubeflow 1.5. This way you can have all the Kubeflow 1.5 features in your enterprise version. These include (but are not limited to) elastic training, notebook monitoring and culling, as well as simplified operations. The Unified Training Operator now supports all the most popular frameworks: TensorFlow, PyTorch, MXNet, XGBoost, and MPI. The default executor for Kubeflow Pipelines (KFP) is now Emissary, which allows KFP to support newer Kubernetes versions. The user interface (UI) has been streamlined so web apps are consistent with each other and with the KFP dashboard. Find out more about Kubeflow 1.5 in the official Kubeflow 1.5 Release Notes and the Kubeflow roadmap.

Note that our distribution of Enterprise Kubeflow supports many more features and upgrades built on top of the OSS ones, for example, support for Kubernetes 1.22 on all major platforms and KServe 0.8.

Rok with Etcd 3.5.4

We have upgraded Rok to use etcd 3.5.4, the latest version. You can rest assured that Rok receives fixes for all etcd-related CVEs, along with performance optimizations and other benefits.

Infra Optimization/Enablement

Kiwi (Technical Preview)

Running GPU workloads in the Cloud is very expensive and inefficient due to the bursty nature of Machine Learning jobs. Kiwi is our new Kubernetes virtual GPU scheduling system that allows you and your peers to share GPU devices. You can run multiple training jobs on the same GPU to minimize experimentation time and reduce resource contention and costs. Head over to Kiwi to read more about how Kiwi works. This project is in its early phases, but we cannot wait for you to try it out.

Bug Fixes

Stability

Various bug fixes across the board to improve the stability of EKF.

Improve how Kale Uses MLMD

We have improved how Kale uses MLMD to pass information between steps. This prevents runtime errors when the Autoscaler deletes pipeline Pods.

Kale Retry Strategy Parameters

Kale used to ignore the retry strategy parameters you passed to the @step decorator. Now it respects them!
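
For example, a step declared with a retry strategy is now retried as configured. The decorator parameters below (retry_count, retry_interval, retry_factor) follow the Kale SDK docs; double-check the names against your Kale version.

    from kale.sdk import pipeline, step

    @step(name="flaky-download",
          retry_count=3,        # retry the step up to 3 times
          retry_interval="30",  # wait 30 seconds before the first retry
          retry_factor=2)       # then back off exponentially
    def download(url: str) -> str:
        import urllib.request
        return urllib.request.urlopen(url).read().decode()

    @pipeline(name="retry-demo", experiment="demo")
    def my_pipeline(url: str = "https://example.com"):
        download(url)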

Kale Logging

Kale is now much better at reporting error messages and logging what it is doing.

No Status in Notebook CR

We have fixed a bug in Kubeflow where the Notebook Controller failed to add a .status in the Notebook custom resources (CRs). This resulted in the web app never showing the notebook as ready, even though the underlying Pod was ready. We have contributed this fix to OSS Kubeflow.

PVC Viewer Culling

A crucial feature of the PVCViewer Controller is to perform culling on the idle PVCViewer Pods. This is also needed for scaling a cluster in and out. We have fixed a bug where this controller did not perform culling due to a dependency on a Prometheus instance that was discontinued in newer Istio versions. Now the culling functionality does not depend on Prometheus at all since it retrieves the necessary information from an endpoint exposed by Istio.

Known Issues

Switch to Emissary Argo Executor

You can now use the Emissary executor alongside the existing Argo workflow executors in EKF (PNS and Docker). Note that pipeline steps fail under Emissary if you do not explicitly define a command and args for the container.
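
To stay on the safe side, define the command and args explicitly in your components instead of relying on the image's entrypoint. A minimal KFP v1 sketch with illustrative names:

    import kfp.dsl as dsl

    @dsl.pipeline(name="emissary-safe")
    def pipe():
        dsl.ContainerOp(
            name="echo",
            image="bash:5.1",
            command=["bash", "-c"],   # explicit command: required under Emissary
            arguments=["echo hello"],
        )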

Version 1.5.3 (Ultramarine)

(Released Wed, 03 Aug 2022)

Bug Fixes

Various bug fixes across the board to improve the stability of EKF.

Version 1.5.2 (Ultramarine)

(Released Tue, 28 Jun 2022)

Security

Remove Critical CVEs

Now our software is more secure than ever, as we have upgraded a lot of packages to fix critical CVEs (Common Vulnerabilities and Exposures). You can rest assured that you have the latest security fixes, keeping your work safe from known vulnerabilities.

Improvements/Bugfixes

Take Into Account Type Hints in Kale Conditional Statements

We have fixed a bug in the Kale local execution of pipelines where, in the case of a conditional statement, Kale did not take the types of the conditional's operands into consideration and treated them as strings by default. This led the comparison to fail or produce wrong results. You can now continue using type hints as usual, assured that Kale will always produce the correct results for your conditional statements.
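
For example, in the sketch below the step output is annotated as int, so the conditional now compares numbers instead of strings; the names are illustrative.

    from kale.sdk import pipeline, step

    @step(name="count-errors")
    def count_errors() -> int:
        return 10

    @step(name="alert")
    def alert():
        print("too many errors")

    @pipeline(name="conditional-demo", experiment="demo")
    def my_pipeline():
        errors = count_errors()
        # As strings, "10" > "9" is False; with the int type hint the
        # comparison is numeric and evaluates to True, as intended.
        if errors > 9:
            alert()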

Accept Hundreds of New Connections on Rok Controller

We have improved the Rok root controller to accept hundreds of new connections arriving within a short time window more efficiently. This allows you to seamlessly use Rok on hundreds of nodes in your Kubernetes cluster.

Fix Race on Garbage Collection Tasks

We have fixed a bug where running multiple garbage collection (GC) tasks concurrently in different nodes led most of these tasks to fail.

Fix Kernel Bug Leading to Failed Snapshots

We have fixed an upstream kernel bug in the module used for changed block tracking which could lead to the corruption of the CBT (changed block tracking) volume metadata and the failure of Rok snapshots. Follow the relevant guide in our upgrade procedure to upgrade the kernel module that Rok uses to track changed blocks.

Fix Bug in Rok CSI Driver Volume Provisioning Response

We have fixed a bug in our Rok CSI driver, where it didn’t set a field in the response to volume provisioning requests. This field is required by the external-provisioner sidecar, and, when missing, results in the sidecar failing requests to create volumes using a VolumeSnapshot as source.

Version 1.5.1 (Ultramarine)

(Released Thu, 02 Jun 2022)

Bug Fixes

We have fixed a critical bug in the way we reload tokens in Rok that would cause snapshots to fail after some time. This broke the Rok snapshotting functionality, causing pipeline runs to go into an error state.

Version 1.5 (Ultramarine)

(Released Fri, 27 May 2022)

Security

Configure AuthService Audiences

Depending on your EKF deployment, you may need to authenticate certain clients with specific audiences. To that end, we have introduced a guide that describes how to configure the audiences that AuthService accepts for authentication. The AuthService authenticators that currently check the audiences of a token are the Kubernetes and JWT (JSON Web Token) access token authenticators.

Improvements

Configure Authentication Methods

Now you can configure AuthService to use the authentication methods of your preference. For this purpose, we have introduced a new guide that will walk you through enabling or disabling each authenticator based on your needs. AuthService will only use active authenticators to validate application requests and thus the authentication overhead will be reduced significantly.

Version 1.5-rc4 (Ultramarine)

(Released Wed, 25 May 2022)

Security

Remove Critical CVEs

Now our software is more secure than ever, as we have a smooth process of scanning Rok and Kubeflow images for CVEs (Common Vulnerabilities and Exposures) and fixing any issues before images ship to production. You can rest assured that you are using the latest packages with up-to-date security fixes, keeping your work safe from known vulnerabilities.

Improvement/Bugfix

Support Bound Service Account Tokens on Kubernetes 1.21

On Amazon EKS, Kubernetes 1.21 rotates service account tokens every hour, so applications should reload these tokens before they expire. We have updated the Kubernetes API client libraries of EKF components to periodically reload the token and keep communicating with Kubernetes seamlessly.
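
The underlying pattern is worth knowing if you maintain in-cluster clients of your own: re-read the projected token from disk instead of caching it at startup. A self-contained sketch against the Kubernetes API:

    from pathlib import Path

    import requests

    TOKEN = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")
    CA = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"

    def list_pods(namespace: str) -> dict:
        # Read the token on every call: bound tokens rotate hourly, and a
        # fresh read always picks up the latest one.
        headers = {"Authorization": f"Bearer {TOKEN.read_text()}"}
        url = f"https://kubernetes.default.svc/api/v1/namespaces/{namespace}/pods"
        return requests.get(url, headers=headers, verify=CA).json()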

Delete Rok Accounts Properly

We have fixed a bug in the Rok API where deleting a Rok account could lead to leaving stale resources behind.

Version 1.5-rc3 (Ultramarine)

(Released Mon, 09 May 2022)

Serving

End-to-End Network Performance Evaluation for Serving

We are actively working on evaluating and improving performance for accessing models from outside the cluster. Read this guide to find out how Serving works in Arrikto EKF and how you can fine-tune it for better performance. The documentation includes a complete analysis of the request path for both internal and external clients, a breakdown of the overhead of each involved component, and a testbed so that you can reproduce the evaluation.

Security

Authenticate External Applications with Identity Providers

You can now allow external applications to start ML pipelines with full authentication/authorization support. We have extended AuthService to authenticate external clients with JSON Web Token (JWT) access tokens you create using your preferred Identity Provider (EKF can integrate with Azure AD, PingID, Okta, Google, GitLab, and Cognito). The JWT access token authentication method authenticates each client request locally, based on the signature of that access token that the external Identity Provider granted for the client.
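
For instance, an external application can start a run with the standard KFP client by passing along the token it obtained from the Identity Provider; the host URL and pipeline package below are hypothetical.

    import kfp

    token = "<JWT access token from your Identity Provider>"

    client = kfp.Client(
        host="https://ekf.example.com/pipeline",  # hypothetical EKF endpoint
        existing_token=token,                     # sent as a Bearer token
    )

    # Start a run from a compiled pipeline definition.
    run = client.create_run_from_pipeline_package("train.yaml", arguments={})
    print(run.run_id)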

Portability

Support Rok Registry on Google Cloud

Arrikto EKF now supports Rok Registry on Google Cloud GKE. You can discover and share notebooks, models, artifacts, and datasets with your colleagues and collaborators. Rok Registry is also available on EKF clusters running on AWS EKS.

Improvement/Bugfix

Mitigate CoreDNS Flooding Caused by Knative Inference Services

Knative creates ExternalName services for each inference service to redirect traffic to the local Istio IngressGateway. For each such service, all Istio sidecars end up trying to resolve the service DNS name every five seconds. On EKS, this produced five DNS queries, one of which got forwarded to Amazon nameservers outside the cluster. For a cluster with many Pods and inference services, such requests would flood CoreDNS and probably make it unresponsive. We have worked around this Knative issue and now let Knative use an FQDN for this type of service. This cuts DNS queries down to 20% of their previous volume and prevents related DNS queries from leaking outside the cluster.

Version 1.5-rc2 (Ultramarine)

(Released Fri, 15 Apr 2022)

Serving

Enable AuthService Cache

Now you can serve your models even faster, as we have introduced a caching mechanism for AuthService. Enabling the AuthService caching mechanism makes authentication to your models API faster since it removes the overhead of communicating with the Kubernetes API server or your OIDC provider.

Integrations

Integrate EKF with AWS Cognito

We have integrated AWS Cognito with EKF for identity and access management. This extends the list of identity providers that we support (PingID, Okta, Google, GitLab, Azure AD).

Latest and Greatest

Support for Kubernetes 1.21

We have made sure that EKF runs on Kubernetes 1.21 on AWS EKS and Google Cloud GKE so that you get all new Kubernetes 1.21 features and guaranteed Kubernetes support.

Version 1.5-rc1 (Ultramarine)

(Released Thu, 31 Mar 2022)

The EKF 1.5 release is here, shipping with OSS Kubeflow 1.4 and introducing major features. The most important ones concern portability, serving, and security. You can now lift and shift an existing EKF cluster to a different location. You can expose an ML model to the outside world with complete authentication and authorization support. We are continuously improving the security of our platform by securing network traffic with Istio mTLS and protecting Kubeflow components with Kubernetes RBAC. Last but not least, we have enabled the Kubernetes scheduler to schedule Pods intelligently, taking into account storage capacity among other resources, even when using only local volumes.

Check out a more detailed overview of the new features below.

Portability

Lift and Shift an Arrikto EKF Cluster

You are now able to migrate your whole cluster along with all its data. You can seamlessly move clusters along with workloads for migration or backup purposes. Check out how to do that in these Arrikto user guides.

Serving

Access to KFServing from Outside the Cluster

You can now expose your trained models to the outside world and get predictions from outside your cluster by making authenticated requests. This way, you can collaborate with your colleagues and iterate faster, continuously improving your models and testing against a production-like environment. It also lets you integrate your external applications with your prediction endpoints faster and easier than ever.

End-to-End Network Performance Evaluation for Serving

We are actively working on evaluating and improving performance for accessing models from outside the cluster.

Security

Secure Traffic of Served Model with Istio

We have enabled Kale to serve models with an Istio sidecar. With this you can get enhanced security for the served models, as network traffic gets encrypted.

Usage of Istio Sidecar with mTLS

We have configured Istio to run as a sidecar on certain Pods to secure traffic with mTLS and apply authorization policies. In this release, this applies to Pods running KFP and KFServing applications. With this you get enhanced security as network traffic gets encrypted. You are now able to define advanced authorization policies.

Security Report

We have created a document regarding the Arrikto security best practices. Now you get a single, authoritative source for all security best practices on Kubernetes across all major public clouds.

Namespaced Pipeline Definitions

You can now upload and view pipeline definitions that are private to your own namespace. All the pipelines that you upload from your notebook will be private by default, but you can still choose to upload shared pipelines - they will be visible to everyone (see here for more information). Check out the new Private and Shared tabs in the Kubeflow Pipelines UI.

Note

You cannot upload private pipelines via the KFP Client yet. You can upload shared pipelines though. All the pipelines you upload from the Kale SDK or the Kale UI will be private by default. If you want to upload a shared Kale pipeline, use the --shared-pipeline CLI argument. You will soon be able to upload shared pipelines from the Kale UI as well.

Remove Contributors UI from Central Dashboard

We have removed the Contributors tab from the Kubeflow central dashboard, as it didn’t agree with our GitOps-oriented process for setting up user profiles and Kubernetes RBAC authorization. Instead, check out how you can share a namespace with other users in EKF. In a subsequent release, we plan on recreating the Contributors UI in a Kubernetes-native way.

Infra Optimization/Enablement

Capacity Tracking for Local Volumes in Kubernetes

We have enabled tracking of storage capacity so that the Kubernetes scheduler can place Pods onto nodes that have access to enough local storage capacity. With this, you can focus on building ML pipelines without worrying about unexpected failures caused by Pod scheduling: the Kubernetes scheduler now makes smarter decisions when assigning Pods to nodes, giving you fewer reasons to worry about the smooth operation of your Kubernetes cluster.

View Resources Across All Namespaces in EKF Dashboard

You are now able to view PVCs and TensorBoard servers in all namespaces you have access to. You get a holistic view of your resources in the cluster allowing you to make smart decisions for optimizing your infrastructure, including cutting down cost by deleting unused resources.

From your dashboard, click on the namespace selector on the top left and click on All namespaces.

Fine-Grained Control for Kale Pipelines

You now have full control over the deploy configuration of your Kale pipelines, both from the Kale UI and from the Kale SDK. This includes setting per-step labels, annotations, limits, requests, node selector, node affinity, and tolerations. Taking advantage of the flexibility and scalability of your cluster has never been easier: you can intelligently optimize resource allocation, reducing cost and maximizing efficiency. Head over to our guide for JupyterLab and the guide for the Kale SDK for more information.
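
As an illustration, per-step configuration in the SDK looks roughly like the sketch below. The decorator keywords mirror the capabilities listed above, but their exact names are assumptions; consult the linked guides for the authoritative signature.

    from kale.sdk import pipeline, step

    @step(name="train",
          labels={"team": "ml"},                 # per-step labels (assumed keyword)
          annotations={"owner": "alice"},        # per-step annotations (assumed keyword)
          limits={"cpu": "4", "memory": "8Gi"})  # resource limits (assumed keyword)
    def train():
        print("training")

    @pipeline(name="config-demo", experiment="demo")
    def my_pipeline():
        train()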

Monitoring Dashboard

We have created a dashboard to monitor critical system metrics, based on the open-source kube-prometheus project. We have integrated it with Istio for easy authentication and authorization of users, and exposed Grafana in the Kubeflow central dashboard. We now provide you with insight on physical nodes and Kubernetes, based on the default kube-prometheus metrics, as well as additional custom metrics on physical nodes, etcd, Redis, and Rok. With these you get a holistic view of your Kubernetes and Rok cluster and their status, and you get to make smart decisions to optimize your infrastructure. We will keep adding more systems and improving the monitoring of the whole platform in consecutive releases.

ML Workflow

Pipeline Looping with Kale

You can now write looping statements in Kale SDK pipelines. By writing a simple Python for-loop, you can generate hundreds of pipeline steps without writing any KFP DSL or producing any Docker images. This enables you to perform data preprocessing on several datasets without building complicated scripts or submitting separate jobs.
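
A minimal sketch of the idea, with illustrative names:

    from kale.sdk import pipeline, step

    @step(name="preprocess")
    def preprocess(dataset: str):
        print(f"preprocessing {dataset}")

    @pipeline(name="loop-demo", experiment="demo")
    def my_pipeline():
        # A plain Python for-loop: Kale unrolls it into one pipeline step
        # per dataset, with no KFP DSL or custom Docker images involved.
        for dataset in ["train.csv", "eval.csv", "test.csv"]:
            preprocess(dataset)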

Kale Step Functions with Keyword Arguments

You can now call step functions with keyword arguments in the Kale SDK. This allows you to write Python code with fewer restrictions when creating ML pipelines via the Kale SDK. Look at how to create a Kale SDK pipeline for more information.

Support Type Hints for Step Definitions in the Kale SDK

Kale now supports type annotations in step definitions. You can (and should!) produce pipeline components that are explicit about their inputs and outputs using the Kale SDK.

Existing pipelines that use step functions without type annotations will still work. You will notice a few deprecation warnings, as we will not support untyped steps in the next release. Please make sure to update your code to match the new typing system. Head over to the data typing guide to learn more about how to write type hints, and find out the limitations and corner cases.

Important

Breaking Change

There is one corner case that will make your existing step definitions fail. We used to ignore default values in step function definitions. Now these default values are valid, but you need to add the corresponding type hints; otherwise, Kale will assume a generic file-based input and throw an error.

So, if you have pipeline steps with default values like:

    @step(name="no-type-hints")
    def foo(a, b=42):
        pass

Make sure to type them properly:

    @step(name="no-type-hints")
    def foo(a: str, b: int = 42):
        pass

Distributed Training with Kale and PyTorch

Run PyTorch distributed training jobs using Kale pipelines. You can now define and experiment with distributed training jobs using a single Python API, thus reducing the time to orchestrate and run complex and heavy distributed jobs to minutes. Stay tuned for more frameworks coming soon!

Ease of Use

Kale Images with GPU Support

You can now self-serve Jupyter Kale images with GPU support. They come with all the libraries required for GPU acceleration, so you don't have to spend time on additional setup and configuration. Enjoy a pre-packaged GPU-enabled TensorFlow image or install PyTorch with a single command. Follow this guide to check out the instructions.

Kale-powered VSCode Notebook Image

We have created a Kale-powered VSCode image to be deployed using JWA so you can now use Kale alongside VSCode with the click of a button in a self-service way.

Automation

Automate the Log Gathering Process

We have automated the log gathering process for your EKF cluster. This way, you can gather logs in an automated manner for easier debugging and better, faster support.

Enhance Deployment Automation Process (AWS)

We have improved our automated EKF installation tool for easier and smoother deployments, so you now have a faster, easier, reproducible, automated, and declarative way to install EKF on AWS.

Integrations

Integrate EKF with Azure AD

We have integrated Azure AD with EKF for identity and access management. This extends the list of identity providers that we support (PingID, Okta, Google, GitLab).

Latest and Greatest

Support for Kubernetes 1.20

We have made sure that EKF runs on Kubernetes 1.20 so that you get all new Kubernetes 1.20 features and guaranteed Kubernetes support on all major cloud providers.

Upgrade CSI Snapshot CRDs from v1alpha1 to v1beta1

We have upgraded our version of Snapshot CRDs from v1alpha1 to v1beta1 to comply with the version that is available in the public clouds. Thus, you can now get all new Kubernetes 1.20 features and guaranteed Kubernetes support on all major cloud providers.

Ship Kale Notebook Images with Python 3.8

We now ship Kale Docker images for creating notebook servers with Python 3.8. Thus, you no longer need to create, maintain, and support new Docker images yourself every time a new Python version comes out.

Support for JupyterLab 3.X

The JupyterLab images now ship with the new major version, JupyterLab 3.

Improvement/Bugfix

Configurable Rok Registry URL

We have made the default Rok Registry URL configurable, allowing admins to configure it when they are publishing or subscribing to a bucket. This makes it easier for Data Scientists and ML Engineers to publish and subscribe to Rok Registry buckets.

Support the KFP Caching Mechanism

We have improved EKF caching to support both the Arrikto and the native KFP caching mechanisms.

Version 1.4.4 (Titanium)

(Released Tue, 07 Jun 2022)

New features

  • Update the base Ubuntu Bionic image MiniKF uses on GCP to v20220419 and on AWS to v20220411.
  • Improve how Rok handles hundreds of simultaneous, new connections from individual nodes.

Bug Fixes

  • Fix MiniKF breakage caused by git fixing CVE-2022-24765.
  • Migrate legacy RWX volumes to new Rok releases.
  • Update the APT key MiniKF uses for Nvidia APT repositories.
  • Reduce the time new Rok pods need to join the cluster.
  • Reduce memory consumption in RokE pods.

Version 1.4.3 (Titanium)

(Released Mon, 21 Mar 2022)

New features

  • Add support for templated resources in Profile Controller.

Bug Fixes

  • Fix accounting for tolerations when the Rok Operator scales a Rok cluster.
  • Fix Cluster Autoscaler to sanitize node label on template nodes.

Version 1.4.2 (Titanium)

(Released Tue, 01 Mar 2022)

Bug Fixes

  • Fix an issue where the Rok etcd Pod had a memory request of 40GiB.

Version 1.4.1 (Titanium)

(Released Fri, 25 Feb 2022)

New features

  • Add argument for the number of workers to the task-gc command of the Rok API management tool.
  • Upgrade to a CentralDashboard image that exposes the EKF version in the dashboard’s sidebar.

Bug Fixes

  • Drastically decrease the time rok-operator takes to reconcile Rok cluster members.
  • Increase Rok Operator’s verbosity during the reconciliation loop.
  • Reduce the number of Rok Thrower updates to Etcd, for version stats that have not changed.
  • Fix an issue that prevented several command line tools from working inside a Kubernetes Pod due to a conflict with Kubernetes service environment variables.
  • Fix an issue where the tool performing garbage collection of Rok API tasks failed due to a missing environment variable.
  • Set a memory limit to Rok’s etcd.
  • Rename Rok’s Node Exporter cluster-scoped RBAC resources to avoid conflicts with Knative.
  • Fix an issue with parsing the revision number in Rok versions.
  • Fix calculating short Rok versions when the commit hash is missing.

Version 1.4 (Titanium)

(Released Mon, 14 Feb 2022)

New features

  • Introduce feature documentation about our GitOps process.

Bug Fixes

  • Fix typos in MiniKF central dashboard cards.
  • Fix a bug in the Rok API notebook driver where it failed to retrieve suggestions if a notebook with missing volumes existed in the namespace.
  • Update Kale commit to use its release branch containing bug fixes.
  • Fix regressions in JWA regarding form inputs not respecting the ConfigMap, backwards-compatibility for volume mount paths and PVC names, and the default size of volumes.
  • Fix backwards-compatibility issue in KFP UI regarding MLMetadata artifact name display.
  • Update the Kale images in EKF with respect to its commit update.
  • Fix a bug in the cleanup code of Rok CSI’s NodeStageVolume where it failed to remove the node from the staged list for RWX volumes.
  • rok-csi: Allow nodes on GKE to reach the Rok Access Server Pods.
  • Support kernel version 4.14.252-195.481.amzn2.x86_64 for node groups on EKS.
  • Support AMI release 1.19.15-20220112 [kernel version 5.4.162-86.275.amzn2.x86_64] for node groups on EKS.
  • Support AMI release 1.18.20-20220112 [kernel version 4.14.256-197.484.amzn2.x86_64] for node groups on EKS.
  • Replace deprecated GKE cluster version 1.19.12-gke.2100 with 1.19.16-gke.1500.
  • Remove stale home directories from MiniKF on GCP.
  • Update the CentralDashboard image to support the new dashboard content.

Known Issues

  • The Python Kubernetes client and kubectl prioritize in-cluster configs and kubeconfigs in opposite order. This means that, when both configs are present, the clients will use different configs. This affects the objectstorage task of rok-deploy. The workaround is to unset the KUBERNETES_SERVICE_HOST env var.
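
If you hit this from Python code rather than a shell, the equivalent of the unset workaround is a sketch like the following:

    import os

    # Hide the in-cluster service env var so that client code which
    # auto-detects its environment falls back to the kubeconfig,
    # matching kubectl's behavior.
    os.environ.pop("KUBERNETES_SERVICE_HOST", None)

    from kubernetes import config

    config.load_kube_config()  # now resolves ~/.kube/config, like kubectl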

Version 1.4-rc8 (Titanium)

(Released Fri, 07 Jan 2022)

New features

  • Suppress questions for AWS_ACCOUNT_ID and AWS_DEFAULT_REGION in task envvars-aws.
  • Extend the fast-forward section of the “Set Up Cloud Environment for AWS” guide to set up the environment context.
  • Enable the ‘Kubernetes’ path on ‘rok-deploy’.
  • Automate the missing Verify steps for AWS in the “Authorize Access to Object Storage” guide.
  • Implement a verification loop to improve the user experience when an error occurs in a verify section.
  • Support AMI releases 1.18.20-20211109, 1.18.20-20211117 and 1.18.20-20211206 [kernel version 4.14.252-195.483.amzn2.x86_64] for node groups on EKS.
  • Support AMI release 1.19.13-20211009 [kernel version 5.4.149-73.259.amzn2.x86_64] for node groups on EKS.
  • Support AMI releases 1.19.15-20211117 and 1.19.15-20211206 [kernel version 5.4.156-83.273.amzn2.x86_64] for node groups on EKS.
  • Enable automatic Rok snapshot policies for Kubeflow system PVCs.
  • Add user guide for Rok Disk Manager.
  • Automate “Gather Logs for Troubleshooting”.

Bug Fixes

  • Remove unnecessary env vars AWS_ACCOUNT and AWS_IAM_USER from AWS docs.
  • Fix Knative Serving to avoid potential Istio misconfiguration due to conflicting ports between Istio Gateways.
  • Fix a bug where the Rok file chooser did not highlight the selected file.
  • Fix bug where the fingerprint check emitted wrong result in the “Configure Git” guide.
  • Fix bug where we trusted GitHub SSH keys without checking their fingerprints.
  • Fix a Rok API bug when adding a Rok Registry token to a Rok account from within a Kubernetes cluster.
  • Fix the volume unpinning logic of rok-csi to make forward progress, even if some volumes fail to unpin.
  • Update scale-in guides to point to the Arrikto provided Cluster Autoscaler.
  • Extend the global_filter option in the host’s LVM configuration to ignore devices created by CSI/LVMd.
  • Relocate misplaced panels in Rok’s Grafana dashboard.
  • Fix Cluster Autoscaler to not ignore PV affinities when scaling out.
  • Fix Cluster Autoscaler to disable scaling in unready nodes.
  • Reduce the logging output of the Rok S3 daemon under normal operation.
  • Drop support for the potentially vulnerable PMML predictor of KFServing.
  • rok-csi: Don’t set force-remount flag when recovering an unused RWX volume.
  • Periodically collect stale volumes in rok-csi.
  • Upgrade NodeJS in KFP UI and Centraldashboard to eliminate CVEs.
  • Upgrade the S3Proxy image in Azure deployments to include the latest version of log4j.
  • Fix typo when checking if snapshot is full in LVMd.
  • Fix era_invalidate failures resulting in failed CSI snapshots.

Version 1.4-rc7 (Titanium)

(Released Thu, 02 Dec 2021)

Bug Fixes

  • Fix a broken import in the rok-gw-client Debian package.

Version 1.4-rc6 (Titanium)

(Released Mon, 29 Nov 2021)

New features

  • Add instructions to present a notebook programmatically.
  • Introduce a user guide about updating the Rok wheels inside a notebook server.
  • Add an operations guide about how to create a privileged notebook server.
  • Introduce a script to snapshot all notebooks in a Kubernetes cluster and publish them to a Rok Registry.
  • Introduce a script to restore all notebooks in the buckets of a Rok Registry user with a given prefix.

Bug Fixes

  • Increase the initial delay for the liveness probe of etcd to handle slow startups.

Version 1.4-rc5 (Titanium)

(Released Fri, 19 Nov 2021)

New features

  • Introduce an operations guide about recovering RWX volumes after a node failure.
  • Add an operations guide on how to issue Rok Registry tokens.
  • Extend the docs with instructions to retrieve the logs of a Rok API task via the Rok UI.

Bug Fixes

  • Fix typo in the “Gather Logs for Troubleshooting” guide.
  • Refactor rok-deploy to use client-side apply for the Rok Monitoring Stack.
  • rok-csi: Don’t try to record events on non-existing resources.

Version 1.4-rc4 (Titanium)

(Released Tue, 16 Nov 2021)

New features

  • Use CloudFormation in the “Create Hosted Zone” guide.
  • Support using existing hosted zones in the “Create Hosted Zone” guide.
  • Use CloudFormation in the “Create IAM Role for ExternalDNS” guide.
  • Use CloudFormation in the “Create IAM Role for AWS Load Balancer Controller” guide.
  • Support AMI releases 1.18.20-20211001, 1.18.20-20211003 and 1.18.20-20211004 [kernel version 4.14.246-187.474.amzn2.x86_64] for node groups on EKS.
  • Support AMI releases 1.18.20-20211008 and 1.18.20-20211013 [kernel version 4.14.248-189.473.amzn2.x86_64] for node groups on EKS.
  • Support AMI releases 1.19.13-20211001, 1.19.13-20211003 [kernel version 5.4.144-69.257.amzn2.x86_64] for node groups on EKS.
  • Support AMI releases 1.19.13-20211004, 1.19.14-20211008 and 1.19.14-20211013 [kernel version 5.4.149-73.259.amzn2.x86_64] for node groups on EKS.
  • Re-introduce the updated CentralDashboard main view. Now the first page shows specialized cards with Arrikto content.
  • Rebase our EKF manifests on top of the upstream stable 1.4 release.
  • Add instructions to snapshot a notebook using the Rok UI, command line and Rok Python client.
  • Use CloudFormation in the “Create ACM Certificate” guide.

Bug Fixes

  • Fix a rendering bug in rok-deploy for the “Create IAM Role for Cluster Autoscaler” task.
  • Fix LIOd waiting forever for the TCM loop device to appear.
  • Use simulate-principal-policy to verify permissions of IAM role for ExternalDNS.
  • Use simulate-principal-policy to verify permissions of IAM role for AWS Load Balancer Controller.
  • Use simulate-principal-policy to verify permissions of IAM role for EKS Cluster and EKS Node IAM Role guides.
  • The PVCViewer Controller now sets annotations on the Pods to allow the Cluster Autoscaler to delete them when scaling down a node group.
  • The CentralDashboard now gracefully handles the case where RST can't be fetched, as is the case in air-gapped environments.
  • Fix a bug when creating snapshots of notebooks with emptyDir volumes.
  • Fix a bug in the NodeUnstageVolume Rok CSI method that could result in stale (not deactivated) volumes.
  • Work around kernel bugs that could result in rok-csi not being able to deactivate and delete a RWX volume.

Version 1.4-rc3 (Titanium)

(Released Thu, 21 Oct 2021)

New features

  • Introduce a PyTorch distributed example for Kale.
  • Support the new ML Notebook driver in Kale.
  • Improve the Rok driver for Jupyter Notebooks to handle Notebook CRs instead of Pods.
  • Update manifests with Kale images which support the recent changes.
  • Support deploying Rok monitoring stack in air-gapped environments.

Bug Fixes

  • Use edit commands in Deploy Autoscaler guide instead of kustomize edit.
  • Fix KFP UI not finding the MLMD executions of cached steps.
  • Fix the KFP UI logic of retrieving archived step logs.
  • Show logs for cached steps in KFP UI.
  • Fix a KFP client credentials initialization bug in Kale images.
  • Update the Test Rok section of the installation docs to deploy an application in the user’s rather than the default namespace, so it is compatible with the task authentication changes introduced in Rok 1.4.
  • Add toleration to ensure that RWX volumes work on GPU dedicated nodes.
  • Revert: Update CentralDashboard to use specialized cards with content crafted for Minikf/EKF.
  • Patch knative-serving Deployments and set the safe-to-evict annotation to true.
  • Minify the Node Exporter Grafana dashboard JSON definition to avoid server-side applying the Rok Monitoring Stack.

Version 1.4-rc2 (Titanium)

(Released Fri, 15 Oct 2021)

New features

  • Use responsive tables in the VWA/TWA.
  • Show resources across all namespaces in the EKF dashboard.

Bug Fixes

  • Fix a bug in the Notebook Controller where it would override all existing annotations when setting the last-activity annotation.
  • Fix a typo in the JWA configmap where accessModes was wrongly spelled accessmodes.
  • Add a missing permission to JWA to be able to patch PVCs.

Version 1.4-rc1 (Titanium)

(Released Wed, 13 Oct 2021)

New features

  • Add Verify and Troubleshooting sections in the AKS docs to ensure that managed identities are enabled on AKS clusters.
  • Introduce arrikto-admin admonition in docs.
  • Introduce fast-forward admonition in docs.
  • Introduce custom design in nested lists in docs.
  • Improve the cleanup instructions of Rok, breaking them into multiple documents and improving the structure of each document.
  • Extend our docs with instructions to clean up a Rok installation on Azure.
  • Support installing Rok on Azure using only the Azure CLI.
  • Introduce persistent state for toggles and admonitions.
  • Add account management in Rok deployments.
  • Remove the section about draining CSI nodes from the upgrade instructions.
  • Add Verify section for Azure in the “Authorize Access to Object Storage” guide.
  • Introduce persistent state for tabs.
  • Automate the “Clone GitOps Repository” guide.
  • Automate the “Configure Access to Arrikto’s Private Registry” guide.
  • Extend literalinclude directive in docs.
  • Support adding EBS volumes to managed node groups.
  • Automate the “Create Cloud Identity” guide.
  • Automate the “Authorize Access to Object Storage” guide.
  • Automate the “Grant Rok Access to Private Docker Registry” guide.
  • Add styles for the :guilabel: role in docs.
  • Add instructions for logging in to EKF via the Okta Provider.
  • Automate granting access to Rok and Kubeflow Pipelines to user namespaces using skel resources.
  • Update Kale images to work with KF 1.4.
  • Introduce user guides for the Kale integration with the Kubeflow PyTorch Operator.
  • Support disabling automatic Profile creation upon login.
  • Automate the “Configure Git” guide.
  • Automate the “Configure AWS CLI” guide.
  • Automate the “Create VPC” guide for AWS.
  • Automate the “Configure Subnets” guide for AWS.
  • Introduce user guides for Rok.
  • Introduce user guides for the Kale support for pipeline conditionals, and the use of volumes for data passing.
  • Support unpinning of RWX volumes in rok-csi.
  • Add Verify section for AWS in the “Authorize Access to Object Storage” guide.
  • Upgrade Kubeflow to version 1.4.
  • Introduce user guides for the Kale support for Kubernetes metadata and spec configuration of pipeline steps.
  • Extend user guides with Kale-KFServing integration docs.
  • Extend rok-do to build the access server image that Rok CSI uses to provide RWX volumes on Kubernetes.
  • Make Rok API tasks impersonate the rok-task-runner service account in their namespace, instead of the last user that created or updated them.
  • Introduce a Kubernetes controller for Rok policies.
  • Introduce an operations guide for setting a culling policy for your Notebook Controller.
  • Introduce user guides for Kale container-based steps.
  • Patch Cluster Autoscaler to support scale-in operations in clusters running Rok.
  • Extend rok-deploy to support server-side applying resources to Kubernetes.
  • Always deploy Rok Monitoring Stack on Kubernetes using rok-deploy.
  • Upgrade Kale to support numerous new features and fix bugs.
  • Add support for tolerations in the RokCluster CR.
  • Automate the “Create EKS Cluster IAM Role” guide.
  • Automate the “Create EKS Node IAM Role” guide.
  • Automate the “Create EKS Cluster” guide.
  • Automate the “Enable IAM Roles for Kubernetes Service Accounts” guide.
  • Automate the “Access EKS Cluster” guide.
  • Automate the “Create EKS Node Group” guide.
  • Introduce an Ops guide to create a default snapshot policy for notebooks.
  • Introduce an Ops guide to create a snapshot policy for Kubeflow PVCs.
  • Automate the “Set Up Users for Rok” guide.
  • Automate the “Deploy Rok Components” guide.
  • Automate the “Set Up Rok Storage Class” guide.
  • Automate the “Install Kubeflow” guide.
  • Automate the “Integrate Rok with Kubeflow Dashboard” guide.
  • Introduce an operations guide to share a namespace with a group inherited from the OIDC provider.
  • Automate the “Create Hosted Zone” guide.
  • Automate the “Create IAM Role for ExternalDNS” guide.
  • Automate the “Deploy ExternalDNS” guide.
  • Support hiding the first paragraph of admonitions on demand.
  • Automate the “Create ACM Certificate” guide.
  • Automate the “Deploy cert-manager” guide.
  • Automate the “Create IAM Role for AWS Load Balancer Controller” guide.
  • Automate the “Deploy AWS Load Balancer Controller” guide.
  • Automate the “Deploy NGINX Ingress Controller” guide.
  • Automate the “Expose Istio” guide.
  • Support running rok-k8s-reboot in air-gapped environments.
  • Automate the “Deploy Cluster Autoscaler on AWS” guide.
  • Replace manual manifest edits with j2 commands in the “Authorize Access to Object Storage” guide.
  • Support mounting existing PVCs in JWA.
  • Allow starting notebooks that use GPUs in a cluster that doesn't have any GPU nodes.
  • Update CentralDashboard to use specialized cards with content crafted for Minikf/EKF.
  • Extend the culler mechanism of the Notebook Controller by monitoring Notebooks for idleness.

Bug Fixes

  • Improve anchor links scroll behavior in docs.
  • Extend the Azure docs to add tags in storage accounts.
  • Fix a bug in presentation policy suggestions in Rok UI.
  • Improve the indentation of code examples in the docs.
  • Handle all image references of KFServing in air-gapped deployments.
  • Handle deleted resource types in rok-deploy --delete.
  • Support RWX volumes in air-gapped deployments.
  • Improve toggle formatting in docs.
  • Improve the display of task logs for tasks with large numbers of log lines in Chromium browsers.
  • Improve handling of unknown labels in navigation buttons in the docs.
  • Improve nested lists styles in docs.
  • Improve explicit numbering in doc’s numbered lists.
  • Improve the style of tabs inside admonitions.
  • Use responsive tables in the JWA.
  • Patch AuthService to properly handle the revocation of the refresh and access tokens.
  • Update KFP-Cache to work with newer Argo.

Version 1.3.1 (Sapphire)

(Released Wed, 13 Oct 2021)

New features

  • Extend Istio to support regular expressions in Authorization Policies.

Version 1.3 (Sapphire)

(Released Mon, 20 Sep 2021)

New features

  • Upgrade Kale to fix local execution bugs and introduce some new features.
  • Support AMI releases 1.18.20-20210826 and 1.18.20-20210830 [kernel version 4.14.243-185.433.amzn2.x86_64] for node groups on EKS.
  • Support AMI releases 1.19.13-20210826 [kernel version 5.4.129-63.229.amzn2.x86_64] and 1.19.13-20210830 [kernel version 5.4.141-67.229.amzn2.x86_64] for node groups on EKS.

Bug Fixes

  • Fix port conflict between Rok’s and Knative’s monitoring stack.
  • Fix a bug where Rok Operator mishandled the trusted_CA_certs configvar during cluster upgrades.

Version 1.3-rc8 (Sapphire)

(Released Thu, 26 Aug 2021)

New features

  • Support containerd as a container runtime for Kubernetes, by configuring Argo to use the PNS executor.
  • Update the “Scale-in Kubernetes Cluster” documentation and remove the single node group, single Availability Zone requirement.
  • Support Ubuntu Bionic kernels 5.4.0-1049-azure and 5.4.0-1051-azure for AKS node pools.
  • Support AMI release 1.18.9-20210722 [kernel version 4.14.238-182.422.amzn2.x86_64] for managed node groups on EKS.
  • Support AMI release 1.18.20-20210813 [kernel version 4.14.241-184.433.amzn2.x86_64] for managed node groups on EKS.
  • Support AMI release 1.19.13-20210813 [kernel version 5.4.129-63.229.amzn2.x86_64] for managed node groups on EKS.
  • Support Ubuntu bionic kernel 5.4.0-1044-gke for GKE.
  • Support Kubernetes version 1.19.

Bug Fixes

  • Restore “clone an existing Rok snapshot” functionality in VWA.
  • Set the failurePolicy to Fail in the MutatingWebhookConfiguration for the admission-webhook controller.

Version 1.3-rc7 (Sapphire)

(Released Fri, 20 Aug 2021)

New features

  • Restructure the “Deploy Rok Registry” guide.

Bug Fixes

  • Fix a bug in rok-kf-prune which resulted in it removing resources cert-manager uses for leader election.

Version 1.3-rc6 (Sapphire)

(Released Wed, 11 Aug 2021)

New features

  • Introduce the Kale - Katib integration docs.
  • Expose the Rok sync daemon using Classic Load Balancers on EKS.
  • Add a guide on how to configure a Rok cluster to sync data with other peers.

Version 1.3-rc5 (Sapphire)

(Released Tue, 27 Jul 2021)

Bug Fixes

  • Set imagePullPolicy to IfNotPresent in Istio manifests.

Version 1.3-rc4 (Sapphire)

(Released Mon, 26 Jul 2021)

Bug Fixes

  • Upgrade Kale due to bug fixes.
  • Update the Rok 1.3 upgrade guide to check for Kubernetes version 1.17 or 1.18.

Version 1.3-rc3 (Sapphire)

(Released Thu, 22 Jul 2021)

New features

  • Improve the Kale SDK docs.
  • Enable the AutoML-related features of Kale.
  • Extend rok-notebook-upgrade script to support label selectors.
  • Extend rok-notebook-upgrade script to remove PodDefaults from notebooks.
  • Extend rok-notebook-upgrade script to add PodDefaults to notebooks.
  • Remove the AGPL-licensed libjbig2dec0 package from rok-tools.

Version 1.3-rc2 (Sapphire)

(Released Tue, 13 Jul 2021)

New features

  • Support AMI releases 1.17.12-20210628 and 1.18.9-20210628 [kernel version 4.14.232-177.418.amzn2.x86_64] for managed node groups on EKS.

Version 1.3-rc1 (Sapphire)

(Released Mon, 12 Jul 2021)

New features

  • Support RDM on Google Cloud.
  • Support creating a GKE cluster.
  • Expose services on Google Cloud.
  • Add instructions for logging in to EKF via the Google Identity Provider.
  • Rename the --aws-region argument of the Rok S3 daemon to --region.
  • Introduce the --authentication-scheme argument to the Rok S3 daemon, which controls the authentication scheme used when accessing the S3 service.
  • Introduce the --gcp-access-token argument to the Rok S3 daemon to pass the OAuth2 token when using the GCP authentication scheme.
  • Introduce the --gcp-project-id argument to the Rok S3 daemon to pass the Google project ID to use when accessing Google Cloud Storage.
  • Extend the Rok Operator to support deploying Rok using Workload Identities on GKE.
  • Support deploying Rok in GKE using Workload Identities.
  • Add instructions to deploy Rok using a Workload Identity on GKE.
  • Prevent GKE from forcing v1beta1 CSI snapshot CRDs.
  • Use high performance storage for Rok external services on GKE.
  • Support running nginx-ingress-controller in security-wise strict environments where privilege escalation is not allowed.
  • Introduce nav-buttons directive in docs.
  • Support air gapped deployments on AWS.
  • Support seamless upgrades for rok-image-patch.
  • Extend the Rok common download helper to automatically encode the downloaded content using the encoding found in the HTTP headers of the response.
  • Implement a Python 3 client for the MiniKF DDNS API.
  • Upgrade cert-manager to version 1.3.1.
  • Support Ubuntu Bionic kernel 5.4.0-1048-azure for AKS node pools.
  • Extend rok-kf-rebase to handle commits made with rok-image-patch, mainly used in airgapped installations.
  • Drop support for Kubernetes 1.16.
  • Support multiple node groups and Availability Zones in rok-k8s-drain tool.
  • Add instructions for logging in to EKF via the PingID Identity Provider.

Bug Fixes

  • Improve ordered list styles in docs.
  • Improve toggle directive’s nested functionality in docs.
  • Improve list design in docs.
  • Improve numbering in nested ordered lists in docs.
  • Make all documentation’s headers black.
  • Improve code-block's design in docs.
  • Specify the S3 bucket prefix when deploying Rok on Azure.
  • Ensure TCP keepalives are enabled in requests performed by the Rok API to the Kubernetes API server.
  • Use a predictable and unique storage account name on Azure.
  • Use a predictable and unique Managed Identity name on Azure.
  • Improve nested lists style in docs.

Version 1.2.2 (Ruby)

(Released Tue, 27 Jul 2021)

Bug Fixes

  • Set imagePullPolicy to IfNotPresent in Istio manifests.

Version 1.2.1 (Ruby)

(Released Mon, 26 Jul 2021)

Bug Fixes

  • Update the Rok 1.2 upgrade guide to check for Kubernetes version 1.17 or 1.18.
  • Restart Kubeflow conversion webhooks during the upgrade from 1.1 to 1.2.

Version 1.2 (Ruby)

(Released Mon, 12 Jul 2021)

New features

  • Add upgrade instructions for NGINX Ingress Controller.
  • Support AMI releases 1.17.12-20210526, 1.17.12-20210621, 1.18.9-20210526 and 1.18.9-20210621 [kernel version 4.14.232-176.381.amzn2.x86_64] for managed node groups on EKS.

Bug Fixes

  • Increase the buffer size that NGINX Ingress Controller allocates for reading HTTP response headers, so that it doesn’t fail when the Rok UI returns large headers.

Version 1.2-rc2 (Ruby)

(Released Wed, 04 Jun 2021)

New features

  • Introduce script to protect Arrikto EKF Pods from OOM conditions and CPU starvation.

Bug Fixes

  • Fix a bug in rok-k8s-drain where it did not properly handle expired credentials and failed with an Unauthorized error after a period of time.

Version 1.2-rc1 (Ruby)

(Released Wed, 02 Jun 2021)

New features

  • Natively integrate Rok with Prometheus by serving basic Rok metrics at /metrics using Prometheus’s text-based format.
  • Introduce a Grafana dashboard to visualize Rok metrics extracted from Prometheus’s TSDB.
  • Separate Istio deployment from Rok and Rok Registry in rok-deploy.
  • Introduce “Arrikto” and “air gapped” custom admonitions in docs.
  • Ensure all libs3 library calls and S3 actions performed by the Rok S3 daemon are accurately recorded in the logs.
  • Use Kubernetes 1.17 for EKS clusters.
  • Introduce a test PU to easily test individual target PUs.
  • Add a Rok cluster config variable to allow connecting to an S3 service without verifying its SSL certificate.
  • Introduce fields for external OIDC providers in Rok and Rok Registry manifests.
  • Add support for authentication via external OIDC providers in Rok Fort.
  • Extend Rok Registry UI to initialize/finalize OIDC cycles.
  • Increase documentation’s content width.
  • Change ordered list design in docs.
  • Remove depth limitation from doc’s menu.
  • Update our docs with instructions on how to edit Registry-related images.
  • Extend Rok Operator to upgrade cluster config variables that are not specified under .spec.configVars, but are provided by the users as fields in the CR’s spec.
  • Implement an etcd backend for the Dynamic DNS API for MiniKF.
  • Introduce a Dynamic DNS API for MiniKF, that will serve names to MiniKF AWS instances under the domain minikf.arrikto.ai.
  • Introduce arrikto-dev, arrikto-contact and air-gapped admonition directives in docs.
  • Allow long links to wrap in docs.
  • Upgrade Istio to 1.9.0.
  • Make Rok Disk Manager work on Azure.
  • Upgrade Linux kernel in MiniKF to 5.4.104-0504104-generic.
  • Declaratively manage IAM roles needed to create an EKS cluster with AWS CloudFormation stacks.
  • Rename the --assume-no-versioning command line argument of the Rok S3 daemon to --no-validate-versioning, and make it skip validation of the S3 bucket versioning status when provided, regardless of whether versioning is used by the daemon.
  • Remove the --no-versioning argument from the Rok S3 daemon and automatically enable versioning when the IFC library is enabled via the --enable-ifc argument.
  • Update to Enterprise Kubeflow 1.3 manifests.
  • Enable TCP keepalives globally in Istio.
  • Allow admonitions to be toggled in the docs.
  • Introduce foldable admonitions in the docs.
  • Introduce tabs in the docs.
  • Automatically produce the list of images used by MiniKF.
  • Redesign MiniKF’s landing page for Vagrant.
  • Use our own nginx-ingress-controller kustomization instead of Minikube’s ingress addon.
  • Use manifests to deploy Istio Ingress instead of applying a formatted string value.
  • Produce a smaller Vagrant box for MiniKF by excluding non-critical images from the pre-pull list.
  • Extend rok-version to generate a valid SemVer for MiniKF.
  • Implement an authentication backend for the Dynamic DNS API for MiniKF AWS instances.
  • Support Ubuntu Bionic kernel 5.4.0-1040-azure for AKS node pools.
  • Support Ubuntu Xenial kernel 4.15.0-1108-azure for AKS node pools.
  • Support Ubuntu Xenial kernel 4.15.0-1109-azure for AKS node pools.
  • Support Ubuntu Xenial kernel 4.15.0-1111-azure for AKS node pools.
  • Disable Azure’s Admissions Enforcer for Istio.
  • Support RDM on Azure.
  • Enable TCP keepalives in rok-kubernetes Python module.
  • Install Azure CLI in rok-tools.
  • Introduce manifests to deploy S3Proxy on AKS.
  • Extend the docs with instructions to deploy Rok over S3Proxy on Azure cloud.
  • Introduce the rok-kf-rebase CLI tool to help with manifests rebase.
  • Introduce the rok-kf-prune CLI tool to help with resource pruning during upgrades.
  • Add upgrade instructions for EKF 1.3.
  • Add a maintenance guide with instructions on how to add an internal GitHub repository as a backup GitOps remote.
  • Add a maintenance guide with instructions on how to set up cluster-wide access to a Docker Registry.
  • Introduce script to scale-in a Kubernetes cluster.
  • Upgrade to Istio 1.9.5.
  • Use Kubernetes 1.18 for EKS clusters.
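
The /metrics endpoint mentioned above speaks Prometheus’s text-based exposition format. A minimal sketch of serving such metrics with the public prometheus_client package follows; the metric name is hypothetical and does not reflect Rok’s actual metrics:

    import time
    from prometheus_client import Counter, start_http_server

    # Hypothetical counter, exposed in text format as e.g.:
    #   rok_snapshots_total 1.0
    SNAPSHOTS = Counter("rok_snapshots", "Total snapshots taken.")

    start_http_server(9090)  # serves text-format metrics at :9090/metrics
    SNAPSHOTS.inc()
    while True:
        time.sleep(60)  # keep the process alive so Prometheus can scrape it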

Bug Fixes

  • Fix a bug where the Rok S3 daemon would not verify the SSL certificate of the S3 service it connected to.
  • Prevent updating browser’s history in docs when scrolling.
  • Fix a bug in Registry UI that was showing the “Sign In” form when there’s a single Social provider.
  • Fix a bug that prevented setting cluster config variables to values that contain braces.
  • Fix an incompatibility issue in Rok APIs that caused Prometheus metrics to be registered more than once in Python 3.
  • Fix a Python 3 compatibility bug in the Rok etcd3 client.
  • Fix Go runtime issue that made CSI sidecars crash because of hitting max locked memory limits by upgrading Linux kernel in MiniKF.
  • Fix a bug where the Rok S3 daemon would attempt to validate the versioning status of S3 buckets regardless of the value of the --assume-no-versioning flag.
  • Fix a bug where the Rok S3 daemon would fail to be deployed over an S3 API that does not support versioning-related API calls, due to listing versions during its initialization.
  • Fix a bug where custom admonitions did not support multiple CSS classes.
  • Fix a bug where a user couldn’t register a new Rok Registry from the settings page in the UI.
  • Fix email symbols handling in Rok Registry links in the UI.
  • Fix a bug causing the login page of the Rok UI to fail with a NullInjectorError exception.
  • Fix a bug that resulted in an incorrect suggested file name in Dataset snapshot policies.
  • Fix a bug where after changing the file name of a snapshot policy, the Rok UI would still display the default value.
  • Improve the copy button so that it implements exactly the same behavior as manually selecting and copying text.
  • Retry Kubernetes watch() operations on ProtocolError exceptions (see the sketch after this list).
  • Improve copy behavior for secondary prompts in doc’s code blocks.
  • Improve highlighting of prompts in doc’s code blocks.
  • Improve text color for command’s output in doc’s code blocks.
  • Improve copy behavior in doc’s code blocks with command’s outputs.
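
The watch() retry mentioned above follows a common pattern with the kubernetes Python client; a minimal sketch, assuming kubeconfig (or in-cluster) access:

    import urllib3
    from kubernetes import client, config, watch

    config.load_kube_config()  # or config.load_incluster_config()
    v1 = client.CoreV1Api()

    while True:
        try:
            for event in watch.Watch().stream(v1.list_namespaced_pod,
                                              namespace="default"):
                print(event["type"], event["object"].metadata.name)
        except urllib3.exceptions.ProtocolError:
            # The connection was dropped mid-stream; re-establish the
            # watch instead of letting the exception propagate.
            continue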

Version 1.1.1 (Quartz)

(Released Thu, 22 Jul 2021)

Bug Fixes

  • Update the Rok 1.1 upgrade guide to check for Kubernetes version 1.17.

Version 1.1 (Quartz)

(Released Wed, 02 Jun 2021)

New features

  • Simplify Rok’s cleanup on Kubernetes by adding the --delete mode to rok-deploy.
  • Improve the periodic rule of Rok API version retention policies to retain the latest instead of the earliest version in each interval.
  • Allow easily retrieving the members of a group in the Rok API via a distinct API call, instead of them being part of the files list API call.
  • Extend the files list API call of the Rok API to support including deleted files in the response.
  • Include the number of versions of each object in the files list API call of the Rok API.
  • Support pagination in the files list API call of the Rok API.
  • Introduce pagination in the files list page of the Rok UI.
  • Replace the coarse-grained authorization which was applied by the Rok API to provide namespace isolation with fine-grained authorization tests for each API call, ensuring the user is authorized to perform the specific action they requested.
  • Only allow authenticating via a token in the Rok client and CLI.
  • Drop the GW_ part of all environment variables used by the Rok client. For example, rename ROK_GW_TOKEN to ROK_TOKEN.
  • Use the Authorization: Bearer <token> header instead of the X-Auth-Token: <token> header for authentication in the Rok API and client (see the sketch after this list).
  • Support using more than one authentication backend simultaneously in the Rok API.
  • Support authentication via Kubernetes tokens in the Rok API.
  • Retrieve the CSRF token from the X-XSRF-Token header in the Rok API.
  • Introduce more fine-grained ClusterRoles for users and administrators to provide access to the Rok API.
  • Restrict access to individual Rok API services via RBAC rules.
  • Use a distinct call to list group members in the versions list page of the Rok UI.
  • Rename the Rok CLI from rok-gw to rok.
  • Automatically reload tokens before every request in the Rok client if they have been provided via a file.
  • Add EKS manifests for Rok Registry.
  • Extend Rok’s build version with the source branch of the release.
  • Introduce a script to upgrade the image of all notebooks in a cluster.
  • csi: Add ignore-cbt VolumeSnapshot annotation to disable the CBT functionality for a specific snapshot.
  • csi: Add reset-cbt PVC annotation to disable the CBT functionality for the next snapshot of the PVC.
  • Introduce a script to perform a rolling reboot of a Kubernetes cluster.
  • Introduce a script to reset the CBT data of all Rok PVCs.
  • Extend rok-deploy to deploy Rok Registry clusters and split the deployment process into three steps: Deploy, Generate manifests, Apply manifests.
  • Include the user’s AWS account ID in the default S3 bucket name prefix.
  • Introduce a script to list all images in a kustomization tree.
  • Introduce a script to mirror Docker images to private registries.
  • Support mirroring images to ECR registries.
  • Support Docker-in-Docker from within rok-tools.
  • Remove the v prefix from Rok version and related artifacts.
  • Support deploying Rok over pre-existing, empty S3 buckets.
  • Support patching kustomization to use mirrored images.
  • Use AWS Load Balancer Controller instead of old ALB Ingress Controller.
  • Implement a new footer design in the docs.
  • Make the header and sidebar of the docs always visible.
  • Highlight the active menu item in the Rok docs.
  • Ensure that the active menu link in the docs sidebar is always visible when a user navigates to a new page.
  • Upgrade Font Awesome version in docs.
  • Improve the appearance of admonitions in the docs.
  • Make Rok operator handle long Rok versions in upgrade jobs, by truncating them to 63 characters.
  • Support setting kubelet’s dockerconfig.json for accessing private registries without using imagePullSecrets on all pods.
  • Support using classic ELB instead of ALB to expose NGINX.
  • Support terminating TLS on NGINX.
  • Support using self-signed certificates with ELB in front of NGINX.
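
Put together, the token-only authentication, the renamed ROK_TOKEN variable, and the Authorization: Bearer header amount to the following when talking to the Rok API with a raw HTTP client; the endpoint URL is hypothetical:

    import os
    import requests

    token = os.environ["ROK_TOKEN"]  # formerly ROK_GW_TOKEN
    resp = requests.get(
        "https://rok.example.com/api/buckets",           # hypothetical endpoint
        headers={"Authorization": "Bearer %s" % token},  # was X-Auth-Token
    )
    resp.raise_for_status()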

Bug Fixes

  • Improve the copy button so that it implements exactly the same behavior as manually selecting and copying text.
  • Improve copy behavior for secondary prompts in doc’s code blocks.
  • Improve text color for command’s output in doc’s code blocks.
  • Improve copy behavior in doc’s code blocks with command’s outputs.

Version 1.1-rc8 (Quartz)

(Released Thu, 27 May 2021)

New features

  • Mark Rok and RokCSI Pods as critical, to avoid OOM kills and evictions.

Version 1.1-rc7 (Quartz)

(Released Mon, 24 May 2021)

New features

  • Support AMI releases 1.16.15-20210329 and 1.16.15-20210414 [kernel version 4.14.225-169.362.amzn2.x86_64] for managed node groups on EKS.
  • Support AMI release 1.16.15-20210501 [kernel version 4.14.231-173.360.amzn2.x86_64] for managed node groups on EKS.
  • Support AMI releases 1.16.15-20210504, 1.16.15-20210512 and 1.16.15-20210518 [kernel version 4.14.231-173.361.amzn2.x86_64] for managed node groups on EKS.

Version 1.1-rc6 (Quartz)

(Released Fri, 02 Apr 2021)

New features

  • Support AMI release 1.16.15-20210322 [kernel version 4.14.225-168.357.amzn2.x86_64] for managed node groups on EKS.
  • Support serving multiple versions of the docs.

Version 1.1-rc5 (Quartz)

(Released Thu, 18 Mar 2021)

Bug Fixes

  • Fix rok-lio bug that causes rok-csi to misdetect whether a Fisk is exposed as a block device.
  • Fix race in the pre-clone verification step of LVMd that could lead to errors, such as failures to unexport the origin Fisk, I/O errors, and stale TCMU handlers.
  • Fix a bug in the KFP API server causing it to crash if the default parameter values are empty.

Version 1.1-rc4 (Quartz)

(Released Fri, 12 Mar 2021)

New features

  • Support AMI release 1.16.15-20210310 [kernel version 4.14.219-164.354.amzn2.x86_64] for managed node groups on EKS.

Version 1.1-rc3 (Quartz)

(Released Thu, 11 Mar 2021)

Version 1.1-rc2 (Quartz)

(Released Thu, 04 Mar 2021)

Version 1.1-rc1 (Quartz)

(Released Tue, 23 Feb 2021)

Bug Fixes

  • Remove a workaround that automatically added the Kubeflow-UserID header in all Rok client requests performed inside a Kubernetes cluster.
  • Fix a bug where Rok API tasks created using a Kubernetes token failed with a 403 error when trying to access the Kubernetes API.
  • Fix a security issue where Rok CSI would allow registering a VolumeSnapshot in a different Rok account than the snapshot’s namespace.
  • Fix a security issue where Rok CSI would allow creating a PVC from a Rok URL in a different Rok account than the PVC’s namespace.
  • Remove support for the rok/origin-fisk and rok/origin-fisk-group annotations from Rok CSI, which violated namespace isolation by allowing users to register any fisk into their account.
  • Fix 2 CRITICAL dm-era bugs which can result in corrupted Rok snapshots.
  • Fix a bug where the Rok etcd library would sometimes report an incorrect number of retries in its logs.
  • Fix a bug where the Rok DLM CLI would incorrectly log warnings about all other DLM clients being missing when requested to retrieve information for one of them.
  • Fix a bug in the Python bindings of the Rok DLM library that resulted in the rok-dlm CLI not breaking stale locks after a pod restart.
  • Omit the -rok-rok suffix from the name of the CF stack and related IAM resources needed to grant Rok full access to S3 buckets.
  • Restore rok-do’s execution time to normal by removing unneeded sleep() calls when logging to rok-do’s frontend.
  • Fix a bug where the Rok S3 daemon would attempt to assume an AWS role using the AWS STS endpoint of an incorrect region.
  • Fix a bug where the Rok S3 daemon would attempt to enable versioning in S3 buckets regardless of whether it was already enabled.
  • Fix inline markup wrapping in documentation on smaller screens.
  • Fix a bug where certain code blocks in the docs could not be selected.
  • Use the correct Registry base URL in the Rok UI during the Rok registration process.

Version 1.0 (Platinum)

(Released Mon, 18 Jan 2021)

New features

  • Extend Rok’s provisioning tool for Kubernetes with the --apply mode to avoid questions, skip regeneration of manifests, and only apply specified Kustomize packages.
  • The home directory of user root inside the rok-tools container is now persisted to retain personal keys and settings across restarts or upgrades.
  • Introduce the rok-cluster-admin ClusterRole for Rok cluster administrators on Kubernetes.
  • Add support for building reproducible rok-kmod images with rok-do.
  • Rok’s provisioning tool for Kubernetes now installs Istio 1.5.7 instead of 1.3.1.
  • Improve the style of all links in the Rok UI.
  • Display the number of versions in the object list of Rok UI.
  • Introduce search support for buckets and objects in Rok UI.
  • Display and edit bucket descriptions in Rok UI.
  • do: Add support for task caching.
  • Introduce group delete for objects and versions in Rok UI.
  • Add rok-do tasks that build bootstrap images for Ubuntu, CentOS and Amazon Linux.
  • Allow full access to the Rok API from within the roke and rok-operator pods.
  • Make Rok Python bindings compatible with Python 3.
  • Add support for Jupyter notebook and dataset snapshot policies in the Rok API.
  • Run local garbage collection more frequently to avoid out-of-space errors in systems with lots of snapshots.
  • Add support for garbage collecting stale resources for rok-csi/rok-lvmd.
  • Remove the “escalate” permission from the Rok Operator/Cluster pods.
  • Handle transient disconnections in a less intrusive way in Rok UI.
  • Introduce a user guide for snapshot and retention policies.
  • Allow using non-default VPC in the EKS installation guide.
  • Allow easily retrieving the members of a group in the Rok API.
  • Increase system observability by adding a monitoring stack alongside Rok on Kubernetes, based on Prometheus and Grafana.
  • Configure Prometheus and Grafana to collect and visualize metrics from Rok’s etcd cluster.
  • Configure Prometheus and Grafana to collect and visualize metrics from Rok’s Redis server.
  • Add public document with description and deployment steps for Rok’s monitoring stack on Kubernetes.
  • Use Kubernetes 1.16 for EKS clusters.
  • Enhance the validation section of the Kubeflow integration guide.
  • Support force-updates with rok-deploy.
  • Enable Rok CSI to migrate PVs from cordoned nodes.
  • Update to Kubeflow 1.1 manifests that are directly kustomize-buildable and thus stop using kfctl.
  • Provide a simple way to run a single program instance across the Rok cluster.
  • Ensure that master-specific commands running in the Rok cluster’s master node do not leave behind stale locks.
  • Allow garbage collecting Rok API tasks based on their status.
  • Enable automatic garbage collection of Rok API tasks in the Rok cluster.
  • Upgrade Kubernetes to 1.16.15 in MiniKF.
  • Upgrade Minikube to 1.11.0 in MiniKF.
  • Support MiniKF on AWS even in air gapped environments.
  • Use the j2 CLI to render Jinja templates instead of using envsubst and environment variables (see the sketch after this list).
  • Use Cluster Autoscaler to support scale-in/scale-out with minimum manual interaction and without losing any data on Rok.
  • Increase observability into Rok operator’s decisions by emitting more events on the primary CR during cluster scaling.
  • RDM: Support setting the size of an LV based on the size of the corresponding VG.
  • Revamp Rok Disk Manager to always request LVs with size that is a multiple of the block size, i.e. 512.
  • Introduce the min/max methods in Rok Disk Manager.
  • Modify Rok Disk Manager’s script to search for extra EBS volumes as well.
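
On the switch from envsubst to Jinja templates (via the j2 CLI): Jinja renders from a structured context instead of raw environment variables, which allows defaults, filters, and conditionals. A rough Python equivalent using the public jinja2 package, with hypothetical variable names:

    from jinja2 import Template

    # envsubst can only substitute flat environment variables; Jinja
    # supports structured data, defaults, and filters.
    tmpl = Template("bucket: {{ prefix }}-{{ account_id }}")
    print(tmpl.render(prefix="rok", account_id="123456789012"))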

Version 1.0-rc6 (Platinum)

(Released Thu, 07 Jan 2021)

Bug Fixes

  • Fix a bug where the Rok StatefulSet driver would create a group resource with the wrong order for the registered disks.
  • Fix a bug where the Rok StatefulSet driver would not sort the Pod names correctly, placing pod-10 before pod-2 inside the generated group resource.

Version 1.0-rc5 (Platinum)

(Released Tue, 05 Jan 2021)

Version 1.0-rc4 (Platinum)

(Released Mon, 21 Dec 2020)

Bug Fixes

  • Fix a bug where rok-deploy modified the kustomization file for Istio, removing some useful resources/transformers.

Version 1.0-rc3 (Platinum)

(Released Thu, 17 Dec 2020)

Version 1.0-rc2 (Platinum)

(Released Thu, 03 Dec 2020)

Bug Fixes

  • Fix a cache invalidation problem that caused Rok Operator to report a stale software version in chained cluster upgrades.
  • Fix an issue where the Rok Gateway task daemon was not able to submit tasks for execution in a timely manner when a large number of policies were active in the Rok API.

Version 1.0-rc1 (Platinum)

(Released Wed, 25 Nov 2020)

Bug Fixes

  • Fix a bug where the account selector in the Rok UI sometimes displayed the incorrect account.
  • Do not display a logout button when logging out is not possible in the Rok UI.
  • Fix a bug where Rok API drivers would use the account instead of the user to perform authorization checks for tasks.
  • Fix a bug where the Rok UI would sometimes raise an undefined variable exception after logging in.
  • Fix a bug where the Rok UI would ignore the namespace selected via the Kubeflow dashboard selector.
  • Fix a bug where the Rok UI would not render correctly in a Kubeflow environment.
  • Fix a bug where the messages of Kubernetes errors would not be visible in Rok task logs.
  • Fix a bug where the Rok client would fail to retrieve the user’s ID when using static authentication.
  • Fix a bug where Rok CSI would fail to hydrate a PVC or auto-register a VolumeSnapshot when the Rok API was using AuthService authentication.
  • Correctly display the account name instead of the user ID in Rok CLI.
  • Fix a bug where users were able to access resources for non-existent Kubernetes namespaces.
  • Fix a bug where users were able to access resources created for a no longer existing Kubernetes namespace after a new namespace with that name was created.
  • Fix a bug where the Rok file chooser would be displayed under the Kubeflow dashboard, and as a result be unable to select files.
  • Fix a bug where Rok CSI would freeze and stop serving requests during the initialization of the Rok client.
  • Fix Chrome’s bouncy behavior when using the copy button in docs.
  • Fix a bug where Rok CSI would sometimes use the incorrect Rok API version when restoring a volume from the Rok URL of a group.
  • Fix an issue where Rok deployment would sometimes fail due to S3 temporarily reporting that a bucket does not exist shortly after its creation.
  • Remove dependency on lib2to3 when importing rok_common.sysutils.
  • Enable log output in the console of Rok and Rok Registry UIs.
  • Fix a bug where the auto-snapshot feature of Rok CSI failed with a 403 Forbidden error because Rok CSI nodes were not allowed to access the Rok API.
  • Fix a bug where the Rok Composer could deadlock while deleting a fisk when GC was running.
  • Reduce the log output of the Rok election daemon while the cluster is idle.
  • Preserve query parameters when the namespace changes in Rok UI.
  • Fix a bug where the Rok Operator could not properly scale a Rok cluster licensed for N nodes, when the cluster was scaled down to N-1 nodes and then back to N, with the Nth node changing.
  • Allow LVMd to recover from an interrupted snapshot.
  • Fix a bug where the UI was showing the wrong object count when deleting objects.
  • Fix a bug in the Rok UI where navigating to a version failed the first time in a Kubeflow environment.
  • Fine-tune the update strategy for rok-disk-manager and rok-kmod DaemonSets so that they can be upgraded in parallel.
  • Fix an issue where Rok API components could fail to list tasks in etcd if their total size exceeded a certain threshold, preventing new tasks from being executed.
  • Fix EKS deployment guide not to use all subnets for external load balancers.
  • Fix GitLab integration guide not to disable firewall.
  • Make the task-gc management command more efficient for large numbers of tasks.
  • Fix a performance issue when protecting large numbers of objects against logging.
  • Fix a performance issue when listing keys recursively in the Rok etcd v2 emulation client.
  • Fix a bug where the locks of task-gc would not be cleaned up after a restart, preventing the gc of Rok API tasks.
  • Increase the minimum size of Data disk on GCP to avoid failures during provisioning MiniKF.
  • Fix a bug where if the Rok master node was permanently removed, other nodes did not attempt to become master.
  • Fix a bug where a failed rok-election-ctl command could prevent clustering from electing a new master.
  • Reduce the amount of time that the provisioning script might get stuck waiting for APT in an air gapped environment.
  • Fix a bug where tcmu_handler could time out while waiting for a connection with Rok to succeed.

Version 0.15.1 (Onyx)

(Released Fri, 17 Jul 2020)

Bug Fixes

  • Make the building of docs depend on version-specific manifests.
  • lvmd: Fix bug that could lead to data corruption when snapshotting a filesystem that needs recovery.

Version 0.15 (Onyx)

(Released Thu, 25 Jun 2020)

New features

  • Extend Rok Operator to support Rok updates in Kubernetes, by handling mutations of the RokCluster CR.
  • Support syncing versions with custom chock size.
  • Extend the Rok Operator to be able to deploy Rok Registry clusters.
  • Create Python 3 wheels for several Rok components, mainly the Rok Gateway client.
  • Create separate ConfigMaps for the cluster init Job and the cluster itself, when deploying on Kubernetes.
  • Support restricting the deployment of Rok and Rok CSI Pods on specific Kubernetes nodes, based on the new nodeLabels field of the RokCluster spec.
  • Add support in the Rok Gateway client for JSON/CSV output.
  • Add support in the Rok client to delete multiple fisks.
  • Add a dialog to copy files in the Rok UI.
  • Enable default connection of all PUs to the external OpenStore controller.
  • Revamp the icons of the Rok UI.
  • Switch lvmd over to using transient DM snapshots to avoid the overhead of storing on-disk the snapshot metadata.
  • Update Angular to version 8.2.14.
  • Simplify the initialization of the Rok Gateway Client.
  • Accelerate the loading of Rok’s landing page.
  • Extend Rok Operator to produce events on the cluster CR in various phases of its deployment.
  • Allow users to deploy Rok Registry with Istio.
  • Revamp the policy schedule and login components in the Rok UI.
  • Change RDM’s operating method from parse-all then load-or-save, to parse and load-or-save line by line.
  • lvmd: Avoid copying deleted files when taking a volume snapshot.
  • Extend rok-csi to support the mountOptions field of a StorageClass.
  • Support setting arbitrary device attributes for CSI volumes (e.g., readahead)
  • Introduce a C Redis client library.
  • Disengage Rok Operator from managing Rok’s Storage and VolumeSnapshot classes and delegate this task to kustomize.
  • Support auto-recover from latest snapshot for rok-csi volumes for which the corresponding Kubernetes node has been removed.
  • Add bucket’s icon in Rok UI’s breadcrumb trail.
  • Extend the nexus thread so that it can defer the transmission of a CCB if the underlying transport has run out of resources.
  • Introduce the shared memory transport, a SCSI transport that allows PUs to share a common memory space and thus exchange SCSI commands with fewer data copies. It is expected to offer better I/O performance than the TCP and iSCSI transports.
  • Extend the Controller’s Static Policy to allow choosing transport for a nexus. The transport for a nexus can be either TCP or Shared Memory.
  • Extend the Rok and Rok Registry cluster CRDs with the status subresource, to monitor their health/state at any time.
  • Fix a bug where fisks with names between 126 and 128 characters failed to be created.
  • Migrate libmap’s epoch cache from etcd to Redis.
  • Introduce preliminary support for replicated volumes in lvmd.
  • Extend rok-init’s installation modes with upgrade to coordinate Rok and Rok Registry cluster upgrades.
  • Extend Rok Operator’s business logic to carry out software upgrades of Rok and Rok Registry clusters on Kubernetes.
  • Implement a badge to display files in a uniform manner in the Rok UI.
  • Users can now pass Rok configuration variables as objects, instead of strings, in the Rok and Rok Registry CRs.
  • Allow running rok-do tasks over SSH.
  • Add support for IAM Roles for Service Accounts to the Rok S3 daemon.
  • Extend the Rok AWS library with scripts to purge S3 buckets, authorize a Rok EKS installation to access S3, and attach EBS volumes to EKS cluster nodes.
  • Support upgrading kernel modules with rok-kmod.
  • Introduce a mechanism to mark sections of Sphinx documents as ignored for specific doc builds, e.g., internal ones.
  • Prefer custom built modules in case the kernel already supports them.
  • Support the file: argument prefix in Rok C daemons, to allow passing an argument value from a file (see the sketch after this list).
  • Use the common AWS_* environment variables to provide credentials to the Rok S3 daemon, and drop support for the old ROK_S3_* variables.
  • Make rok-probes library Python3 compatible.
  • Make rok-kmod Python3 compatible.
  • Employ rok-probed to securely probe Rok and Rok Registry external services in Kubernetes initContainers.
  • Support building reproducible rok-kmod images locally.
  • Allow fine-grained control over the sanitization of logs, which enables accurately logging multi-line messages or messages that include otherwise unsafe characters if required.
  • Extend Rok clusters with a UUID, that can be used to identify resources owned by the cluster.
  • Perform fine-grained tests regarding the S3 storage of a Rok cluster during its initialization phase, to catch and report configuration errors as fast as possible.
  • Add initial version of rok-deploy, an interactive CLI tool to declaratively configure and install Rok on Kubernetes.
  • Automatically allow access to Rok API resources to users that have access to Kubeflow resources in the same Kubernetes namespace.
  • Enable the Rok API to run behind Istio using AuthService for authentication.
  • Allow switching the displayed namespace in the Rok UI.
  • Allow selecting the account displayed in the Rok UI via the Kubeflow namespace selector.
  • Configure Sphinx to fail if warnings are emitted when building Rok docs.
  • Extend Rok Operator’s business logic to conditionally update the cluster configuration on etcd based on the spec.configVars field of the cluster CR.
  • Do not store the Rok Cluster’s configuration on etcd; instead, only store it locally in the image.
  • Add instructions to upgrade a Rok Cluster on Kubernetes.
  • Extend the Rok API to allow its clients to interact with any desired account, instead of necessarily the one that matches the UUID of their user.
  • Improve the messaging of network errors in the UI.
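
The file: argument prefix noted above lets a daemon argument take its value from a file, which is convenient for secrets mounted into containers. The daemons implementing it are written in C; the Python sketch below only illustrates the semantics, and the flag shown in the comment is hypothetical:

    def resolve_arg(value):
        """Return the argument value, reading it from a file when the
        value carries the file: prefix."""
        if value.startswith("file:"):
            path = value[len("file:"):]
            with open(path) as f:
                return f.read().strip()
        return value

    # e.g. --s3-secret-key file:/run/secrets/s3-key
    secret = resolve_arg("file:/run/secrets/s3-key")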

Bug Fixes

  • Improve the Rok Thrower’s security, by sanitizing certain messages of other peers.
  • Fix a bug where Target PUs would access fields of a CDB before checking whether its length matches the expected length (inferred from the opcode).
  • Speed up Rok cluster initialization when deploying Rok on Kubernetes.
  • Fix race between dm-clone device removal and ongoing snapshot in LVMd.
  • Remove size restrictions for LDAP IDs in Fort.
  • Check if a user’s LDAP attributes are UTF-8 encoded and warn appropriately.
  • Manually wipe filesystem signatures on new LVs in LVMd to work around lvcreate hangs on Stretch, caused by lvcreate asking for user confirmation before wiping any FS signatures found.
  • Fix an issue where tooltips would sometimes appear outside the edge of the screen in the Rok UI.
  • Fix a bug in the files list page of the Rok UI where some files were not being displayed correctly if the bucket contained groups and standalone files with the same name.
  • Improve termination of Rok Operator on Kubernetes by properly handling the SIGTERM signal.
  • Fix log retention when a cron job (like rok-cluster-gc) fails.
  • Support retries in libredis.
  • Fix creating RAID with 1 drive using RDM.
  • Fix object protection when multiple reclassing is involved.
  • Fix a bug where Rok cluster members would sometimes fail to detect the master has changed after the master reboots (perhaps with a changed IP), ending up with stale information about the master.
  • Fix dm-clone bug which could lead to discarding the wrong blocks, causing data corruption.
  • Add missing overflow check for the total number of regions in dm-clone.
  • Add missing casts to dm-clone to prevent overflows and data corruption.
  • Fix dm-clone’s status output in case the total number of regions is 2^31.
  • Fix a bug where the Rok client failed to output errors in Python 3.
  • Fix a bug where the Rok S3 daemon would retry retrieving security credentials using all supported methods instead of only the one that previously succeeded.
  • Fix a bug where the S3 credentials provided via a Kubernetes secret would be stored unencrypted in the cluster’s etcd.
  • Fix a bug where the Rok S3 daemon would attempt to retrieve credentials from EC2 instance metadata if assuming a role with AWS Web Identity was requested but failed.
  • Fix a bug where the Rok S3 daemon could not start if it had failed to create its S3 buckets in a previous execution.
  • Fix a bug where requests towards the Rok API could fail when Kubernetes authorization was enabled, due to a 401 Unauthorized error when submitting the SubjectAccessReview.
  • Always invalidate the cached cluster configuration manager that Rok Operator uses to interact with the cluster configuration stored in etcd.
  • Fix a bug where the Rok S3 daemon would not log what request is being performed when performing retries.
  • Fix a bug where the Rok cluster would not go into an error state if the security credentials of the Rok S3 daemon became invalid.

Version 0.14.1 (Nephrite)

(Released Fri, 19 Jun 2020)

New features

  • Enable rok-csi to identify the case where a Kubernetes node has been removed, and allow the volume deletion to succeed.
  • Add readiness and liveness probes in RDM.

Bug Fixes

  • Fix a bug where RDM would operate on all available devices, if devices passed to get_disks() did not exist.
  • Fix a bug where the cluster configuration would become corrupted when multiple members were joining the Rok cluster at the same time.
  • Fix bug in the progress reporting of LVMd where some steps appeared as stuck.
  • Handle Kubernetes node removals/renames, by breaking the corresponding Rok member locks.
  • Fix a bug in the Rok Gateway Kubernetes driver causing StatefulSet presentations to fail.
  • Fix rok-liod to work with ec2-utils.
  • Fix a bug in the transport library which could result in a Nexus failing to handle a request with Bad message error.
  • Fix a bug where the Rok S3 daemon would sometimes fail with an InvalidToken error due to incorrect parsing of the session token from the environment.

Version 0.14 (Nephrite)

(Released Mon, 16 Dec 2019)

New features

  • Use etcd v3 as the default store of the Rok Gateway.
  • dm-clone: Hydrate regions in batches to achieve better hydration throughput.
  • Use latest Minikube v1.2.0 for MiniKF.
  • Make rok-csi topology aware.
  • Support Kubernetes v1.14.
  • Escape non-ASCII, non-printable characters from logs, to protect against terminal manipulation and homoglyph attacks via malicious user input (see the sketch after this list).
  • Support using expressions as values for cluster configuration variables.
  • Implement a new design in the bucket info page of the Rok UI.
  • Support deploying Rok with LDAP integration.
  • Allow the Kubernetes Rok operator to scale Rok clusters up and down.
  • Support glob pattern for secrets in rok-csi.
  • Enable user impersonation in Rok Gateway Kubernetes service drivers.
  • Support Debian Stretch.
  • Enable user to configure HTTP for Rok Gateway on RokE.
  • LVMd now automatically removes the dm-clone device when hydration finishes.
  • Extend SCSI INQUIRY command to support retrieving composer parameters.
  • Record access events of Fort tokens against the Fort API.
  • Support resizing LVMd/CSI volumes restored from snapshots.
  • Enable rok-s3d to detect invalid bucket prefix.
  • Support adjusting dm-clone’s hydration parameters in LVMd.
  • Use etcd v3 as the store for libmap’s epochs.
  • Add support for configuring the snapshot chunk size in LVMd.
  • Improve LVMd performance for snapshots on EBS backed devices.
  • Add support for configuring the size of the COW device for thick snapshots.
  • Allow rok-csi to rewrite the URLs provided by the user, so that they work even if the user visits Rok behind a proxy.
  • Add RAID support in RDM.
  • Support exposing extra block device attributes (e.g., model or wwid) in RDM.
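
A minimal sketch of the log-escaping idea described above; the exact character policy Rok applies is not spelled out here, so the pattern below simply escapes everything outside printable ASCII:

    import re

    _UNSAFE = re.compile(r"[^\x20-\x7e]")

    def sanitize(msg):
        """Escape control and non-ASCII characters so that malicious
        input cannot manipulate terminals or spoof look-alike names."""
        return _UNSAFE.sub(
            lambda m: m.group().encode("unicode_escape").decode("ascii"),
            msg)

    print(sanitize("rename to \x1b[2Jpwned"))  # -> rename to \x1b[2Jpwned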

Incompatible/important changes

  • The opcode for the ROK_RESIZE_FISK SCSI command was changed. Since the resize command is currently used only by rok-client and rokfs, this should not cause any problems to existing deployments.
  • The format of the ROK_FISK_CREATE SCSI command was changed.
  • The format of the ROK_FISK_COMPOSE SCSI command was changed.

Bug Fixes

  • Use direct I/O when accessing block devices with LVMd.
  • Fix a bug where hasherd would erroneously delete a not-completed task.
  • Gracefully handle the deletion of the last version of a file in the Rok UI.
  • Fix a bug where the etcd v2 emulation client would fail to create the client prefix.
  • Fix a bug where rok-csi failed to provision volumes where the provided size was not a multiple of blksize (512 bytes).
  • Fix a bug where lvmd kept nodelocal references to a volume’s internal snapshot, that might get lost as nodelocal storage is ephemeral.
  • Fix a bug where lvmd failed to update dm-clone’s origin device.
  • Make logs from C daemons visible to Kubernetes.
  • Fix a bug where rok-operator would remove and then re-add a cluster member after a node reboot.
  • Fix a bug where the ML/Kubernetes Rok GW driver would sometimes fail to complete the snapshot operation.
  • Fix lvmd bug where the size of volume’s underlying LV and the size of its internal snapshot could be different, because LVM automatically rounds up the size of LVs to be a multiple of its physical extent size.
  • Fix a race which could crash tcmu-handler.
  • Fix a race where PVs provisioned by local-static-provisioner could be used before the underlying storage has been mounted.
  • Fix a race where RDM would not call udevadm settle after calling parted, which opens devices for write and can cause partitions to briefly disappear.
  • Fix a bug in RDM where loading a Partition when the partition table is on a device with a symlink as path failed.
  • Fix a bug which could cause stale fisks not being deleted when using rok-csi.
  • Fix a race which could cause pui_map_fisk_get_plu() to fail with EAGAIN.
  • Workaround an issue where the kernel fails to allocate a SCSI loopback device, causing liod to hang.
  • Fix a bug which resulted in the Rok Gateway driver for Kubernetes showing an empty YAML during presentation.
  • Increase etcd’s maximum supported size of incoming and outgoing messages to work around a bug where requests to Rok API buckets exceed the default limits.
  • Increase csi-snapshotter’s timeout to work around the fact that Rok cannot take concurrent snapshots of the same source volume.
  • Fix a bug in csi-snapshotter sidecar which could cause a snapshot operation to fail with VolumeSnapshotContent is missing error.
  • Fix the way Rok Gateway calculates the object stats when object groups are involved.

Version 0.13 (Marble)

(Released Mon, 01 Jul 2019)

New features

  • Introduce a thread pool in C.
  • Redirect the user to the requested page after a successful login in the Gateway UI.
  • Support deploying Rok and Rok Registry on Kubernetes with Helm.
  • Run RokE as a non-privileged container.
  • Add support for account roles in Fort.
  • Introduce Fort services, i.e., applications that can access privileged Fort endpoints, such as the Rok Registry.
  • Add support for Ahead-of-Time (AoT) compilation in the Rok Registry UI and Rok Gateway UI.
  • Rewrite Rok Operator in Python.
  • The embedded Controller now honors the ROK_PHYSICAL_HOST environment variable and reports it upstream.
  • Introduce Rok licensing mechanism that constrains Rok cluster-size.
  • Introduce one-time secrets in Fort, which can be used for various tasks, such as email confirmations and password resets.
  • Add support in the Fort API for confirming an account’s email.
  • Reduce poll interval for external Controller.
  • Add support in the Fort API for resetting a user’s password.
  • Add support for email confirmations to the Rok Registry.
  • Add support for resetting a user’s password in the Rok Registry API.
  • Add support for VolumeSnapshot CRs on rok-csi.
  • Do not allow multiple Fort confirmed accounts with the same email address.
  • Require users to confirm their email addresses before allowing them full access to the Fort API.
  • Use HTTP liveness and readiness probes that query a rok-probed server in the Rok Kubernetes pod.
  • Improve discard performance for dm-clone.
  • Disable by default the endpoint that unconditionally lists the swarm IDs that the Rok Tracker tracks. Also, add a setting in the Rok Tracker that allows enabling it.
  • Add reCAPTCHA protection to some Rok Registry API endpoints (email confirmations and password resets).
  • Improve dm-clone’s overall performance.
  • Add etcd-specific SSL arguments and environment variables, namely --etcd-ssl-cert, --etcd-ssl-key, --etcd-ssl-cacert and ROK_ETCD_SSL_CERT, ROK_ETCD_SSL_KEY, ROK_ETCD_SSL_CACERT respectively (see the sketch after this list).
  • Add support for multiple environment variables to Rok arguments in C.
  • Add breadcrumbs to the Rok UI.
  • Make prefix for S3 bucket names for rok-s3d configurable by the cluster configuration mechanism.
  • Add script to gather logs in MiniKF.
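
The etcd SSL arguments listed above map onto a TLS-secured etcd connection. This is illustrated here with the public python-etcd3 package rather than Rok’s internal client; the endpoint and certificate paths are hypothetical:

    import etcd3

    client = etcd3.client(
        host="etcd.rok.svc", port=2379,     # hypothetical endpoint
        ca_cert="/etc/rok/ssl/ca.pem",      # cf. --etcd-ssl-cacert
        cert_cert="/etc/rok/ssl/cert.pem",  # cf. --etcd-ssl-cert
        cert_key="/etc/rok/ssl/key.pem",    # cf. --etcd-ssl-key
    )
    client.put("/rok/demo", "value")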

Incompatible/important changes

  • The rok-conf tool has been renamed to rok-init.
  • The rok-config-* tools have been renamed to rok-cached-*.
  • The rok-ssh-root-password tool has been removed. Users can use the cluster.ssh.root_password_login cluster configuration variable instead.
  • The rok-dlm tool has been refactored. Users should use the rok-dlm client-break command to break both DLM locks and clients. The --break-unknown and --skip-unknown arguments have been removed in favor of the --force argument, whose semantics have changed for this reason. The old --force behavior is now achieved with the --yes argument.

Bug Fixes

  • Fix typo that caused rok-election-ctl member-list to fail.

Version 0.13-rc1 (Marble)

(Released Fri, 07 Jun 2019)

Bug Fixes

  • Fix dm-clone compilation error when building dm-clone for Linux kernel versions >= 4.17.0.
  • Add copyright notices to all of our .yaml/.yml files, which were erroneously omitted.
  • Fix refcount leak for LU objects.
  • Break stale rok-csi locks.
  • Fix a bug that would erroneously make successful commands appear as failed, if they produced a lot of output or if the server was under load.
  • Fix a buffer overflow bug in libdlm.
  • Improve the algorithm for detecting stale PID files.
  • Fix a bug where a DLM client could be deleted despite holding locks.
  • Use DT_RPATH instead of DT_RUNPATH to work around Debian bug #859732.
  • Fix a bug where rok-csi would not terminate when initialization failed.
  • Fix a bug where tasks could fail to clean up their temporary state after a restart of the Rok Gateway Task Daemon.
  • Fix various memory errors discovered by AddressSanitizer.
  • Fix an issue where the Rok thrower was not accessible from the NodePort service.
  • Fix the alignment of empty list messages in the Rok UI.

Version 0.12 (Lignite)

(Released Mon, 18 Mar 2019)

New features

  • Update the default values of some performance-related thrower options, in order to make the thrower faster by default.
  • Support providing an existing CA certificate when creating a RokE cluster, rather than always creating a self-signed internal CA.
  • Support deploying a RokE cluster which uses an external etcd.
  • Port rok-csi to CSI spec v0.3.0
  • Add support for Kubernetes >= v1.10, and remove the need for custom kubelet.
  • Extend the Rok Python etcd v3 client with the ability to emulate the etcd v2 API, to allow the easy transition of Rok components to etcd v3.
  • Support transactions in the Rok Python etcd v3 client (see the sketch after this list).
  • Allow users to add a new provider from the Rok UI.
  • Add a new page in the Rok UI to display task details.
  • Introduce a settings section in the Gateway UI which contains available providers and user tokens.
  • Introduce a writeback cache for map updates in rok-composerd.
  • Support running hooks when a RokE member is promoted to master.
  • Allow rok-gc to run in parallel on all nodes when running in nodelocal mode.
  • Make RokE appliances create a cluster-wide SSH keypair for the root user.
  • Support setting the root password in RokE appliances by providing the hash of the password in the preseed file.
  • Make RokE disk setup script specify mounts instead of filesystem labels.
  • [EXPERIMENTAL] Introduce hierarchical maps. Only basic operations and garbage collection are currently implemented.
  • Display a footer in the Gateway UI with copyright information and Rok’s build ID.
  • Allow parsing URLs of Rok Gateway object versions via the Rok Gateway API and client.
  • Support gevent in the Rok Python etcd3 client.
  • Strongly associate Rok Gateway buckets with Indexers. Previously, only the Indexer link was stored in the bucket.
  • Introduce a management command in the Rok Gateway to allow the garbage collection of old tasks.
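
Regarding the transaction support mentioned above: an etcd v3 transaction is an atomic compare-and-swap over multiple keys, evaluated server-side. Rok’s client is internal, so the sketch below uses the public python-etcd3 package with hypothetical keys:

    import etcd3

    client = etcd3.client()

    # Atomically: if /rok/leader still holds "node-0", update it;
    # otherwise just read back the current value.
    succeeded, responses = client.transaction(
        compare=[client.transactions.value("/rok/leader") == "node-0"],
        success=[client.transactions.put("/rok/leader", "node-1")],
        failure=[client.transactions.get("/rok/leader")],
    )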

Bug Fixes

  • Fix various memory leaks throughout the code base.
  • Fix syncing of group versions with same members.
  • Fix bulk deletion of Gateway versions.

Version 0.12-rc1 (Lignite)

(Released Thu, 01 Nov 2018)

Bug Fixes

  • Fix a bug where secret service parameters were ignored when providing suggestions for a policy update.
  • Fix handling of nodelocal mode in rok-composer-tool.
  • Support unregistering an Indexer from the Rok Gateway while the thrower is running.
  • Fix a bug in the Rok Gateway where secret service parameters were ignored when providing suggestions for a policy update.
  • Fix a bug in Gateway policies where policy run tasks would display outdated parameters if the policy was updated.
  • Correctly display the filters, register name, retention rules and backup action parameters of policy tasks in the Gateway client.
  • Fix a bug in the Rok Gateway where suggestions could not be provided for variables in the info namespace, e.g., object and version name, if the service driver defined a parameter with the same name.
  • Fix a bug where the Rok Gateway task daemon could leave behind stale tasks in the pending or running state if the connection to the Gateway store was lost during the task’s execution.
  • Fix a bug in the Rok Gateway where the subtasks of failed or interrupted tasks would not be canceled even though the parent task had failed.
  • Fix a bug where security sensitive fields could accidentally be logged by the Rok Gateway or by Rok Gateway service drivers.
  • Fix a bug where the thrower did not update the error reason of a bucket, when a new error, with the same error code as the previous one but a different reason, was encountered.
  • Make the Rok S3 daemon compatible with the IBM S3 service, by introducing the --assume-no-versioning command line option.
  • Fix a bug in the S3 daemon where if a protocol was provided as part of the S3 endpoint it would be included twice in API calls towards the S3 service.
  • Fix a bug where the Rok S3 daemon would not exit if its S3 endpoint and credentials were incorrect, causing the Rok composer to fail when attempting to create OSD partitions.
  • Fix a bug where the LIOd manager failed a list_volumes() request if a loop device disappeared because someone else un-looped it.

Version 0.11.1 (Kryptonite)

(Released Fri, 19 Oct 2018)

Bug Fixes

  • Install the proper Kubernetes version in Appliances.
  • Fix templating error in Gateway configuration file in Appliances.
  • Minor fixes regarding management tools for Rok clusters on AWS.

Version 0.11 (Kryptonite)

(Released Mon, 15 Oct 2018)

New features

  • Rework tcmu-handler to process requests in parallel.
  • Support thick volume pools in lvmd.
  • Add support for customizing the appliance disk setup procedure using preseed file.
  • Use the internal jessie-backports repo for all non-jessie dependencies.
  • Support daemon reload.
  • Support mail notifications in appliances.
  • Use systemctl for managing Rok daemons.
  • Improve the efficiency of subtasks.
  • Support tasks with service-defined actions.
  • Support nested subtasks in the Gateway API and UI.
  • Revamp the Kubernetes Gateway driver to use subtasks.
  • Revamp task, event and policy icons in the Gateway UI.
  • Support subscribing existing bucket in the Gateway UI.
  • Invite or remove bucket collaborators in the Indexer UI.
  • Update collaborator’s permissions in the Indexer UI.
  • Display inactive Roks in the Indexer UI.
  • Implement new design for the OAuth page in the Indexer UI.
  • Introduce a Python Rok client for etcd v3.
  • Revamp the format of version retention policy rules.
  • Allow version retention policies to delete current object versions.
  • Display version retention information for all Gateway objects.
  • Support creating object groups via the Gateway client.
  • Add rok-cluster-aws script to manage a Rok cluster on AWS.
  • Allow tuning rok-gc with cluster configuration.
  • Discard zero CA chocks from fisks.
  • Rename rok-clusterd to rok-electiond.
  • Improve the execution logic of external commands.
  • Add support for Subject Alternative Names (SANs) in X.509 certificates (see the sketch after this list).
  • Add support for templated cluster configuration variables.
  • Support per-host cluster configuration variables.
  • Make rok-conf use the cluster configuration mechanism to configure the appliance.
  • Secure all etcd communications using SSL in appliances.
  • Secure the cluster join procedure using authentication.
  • Add widgets for multiline input.
  • Ensure users can SSH in appliances early to continue the appliance configuration with copy/paste ability.
  • Introduce the rok-cluster tool to manage the Rok cluster.
  • Introduce various config variables that help tweak Rok clusters.
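
Subject Alternative Names are the X.509 extension that lets a single certificate cover several DNS names or IP addresses. A minimal self-signed example with the public cryptography package, using hypothetical host names:

    import datetime
    from cryptography import x509
    from cryptography.x509.oid import NameOID
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa

    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    name = x509.Name(
        [x509.NameAttribute(NameOID.COMMON_NAME, "rok.example.com")])
    cert = (
        x509.CertificateBuilder()
        .subject_name(name)
        .issuer_name(name)  # self-signed, so subject == issuer
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(datetime.datetime.utcnow())
        .not_valid_after(
            datetime.datetime.utcnow() + datetime.timedelta(days=365))
        # The SAN extension carries the alternative names.
        .add_extension(
            x509.SubjectAlternativeName([
                x509.DNSName("rok.example.com"),
                x509.DNSName("registry.example.com"),
            ]),
            critical=False,
        )
        .sign(key, hashes.SHA256())
    )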

Bug Fixes

  • Fix various configuration rendering issues in appliances.
  • Fix backwards compatibility issue with the OpenStore controller.
  • Fix a bug that prevented the thrower from recovering a bucket from an error.
  • Fix a thrower issue, where it would not show the connected peers if the tracker was down.
  • Fix a thrower bug that randomly prevented two peers from connecting, if they were registered with two or more Indexers with the same common name for their CA.
  • Fix cluster GC to work with SSL.
  • Make etcd clients handle transient etcd errors (SSL connect).

Version 0.11-rc1 (Kryptonite)

(Released Tue, 25 Sep 2018)

Bug Fixes

  • Fix dm-clone bug wrt overwrite BIOs, which could lead to data corruption.
  • Include the version hash in all JS and CSS artifacts to fix unwanted cache effects.
  • Fix false request cancellations when responses are slower than the polling interval in the Gateway UI.
  • Fix the retry logic of the Gateway client and prevent timeouts in object uploads.
  • Fix a potential security issue that involved mutable values in Python function definitions.

Incompatible/important changes

  • The format of Rok-specific SCSI management commands has changed, so PUs of this version cannot connect with PUs of previous versions.
  • The format of OpenStore Controller endpoints has changed (see upgrade notes).

Version 0.10.3 (Jade)

(Released Tue, 17 Jul 2018)

New features

  • Adjust tracker announcement frequency in the thrower
  • Visually distinguish inactive Roks in the Indexer UI
  • Make rok-csi use heartbeat for its DLM locks
  • Track progress of volume/snapshot creation in lvmd
  • Report progress in Kubernetes Gateway driver
  • Improve performance of Copy-on-write on rok-composerd

Bug Fixes

  • Make embedded controller (libctrl) retry failed connections
  • Fix rok-composerd to correctly handle namespace aliasing
  • Fix a bug in the task daemon where policy tasks would sometimes not start at their scheduled timestamp
  • Fix a GC bug that could result in data corruption
  • Retry operations when S3 throws unknown errors
  • Fix logrotate script that did not rotate all Rok daemons
  • Fix a thrower bug that would result in rapidly opening/closing HTTP connections to the tracker
  • Fix a thrower issue with regards to unfair job scheduling
  • Fix a thrower bug where important PPSPP messages would get reordered, leading to connection issues
  • Fix a bug where retention policies would not work as expected for versions created by the Rok Thrower
  • Fix a number of UI problems in smaller screens
  • Accelerate stats calculation for versions
  • Handle transient Indexer related errors and update bucket state accordingly
  • Fix CEF shutdown when running workstation with VBox
  • Fix VMware Fusion port forwarding in workstation
  • Display bucket errors properly in UI

Version 0.10.2 (Jade)

(Released Fri, 04 May 2018)

New features

  • Handle user provided CA certificates, SSH keys, APT keys in Appliances
  • Use sparse disks for RokW helper VM
  • Enable SSH on RokW

Version 0.10.1 (Jade)

(Released Thu, 19 Apr 2018)

New features

  • Support suggestions in the Rok Gateway client
  • Add dm-messages to disable/enable hydration in dm-clone
  • Support creation of dm-clone devices with hydration disabled
  • Support RokE deployment on Oracle Cloud Infrastructure with automatic creation/expansion/upgrade of Rok cluster
  • Introduce port-forwarding for RokW helper VM to enable thrower communication between Rok Workstations
  • Use DKMS for our kernel modules
  • Create rok-csi docker image using Debian packages

Bug Fixes

  • Fix a race condition in Rok Gateway that could cause a policy task to be canceled just after it starts execution
  • Fix bug in lvmd’s snapshot workflow that could cause the creation of corrupted volume snapshots
  • Fix GC in RokE clusters

Version 0.10 (Jade)

(Released Tue, 27 Mar 2018)

New features

  • Introduce the Rok S3 daemon
  • Support named OSD partitions
  • Support storing metadata in persistent storage for dm-clone
  • Support Rok snapshots in lvmd
  • Support handling concurrent requests in lvmd
  • Use Gunicorn as lvmd’s HTTP server
  • Support ACL in Indexer buckets
  • Implement CSI plugin for Rok
  • Implement Rok GW driver to register/present Kubernetes StatefulSets
  • Introduce clustered config in Rok Appliances
  • Support Rok Workstation on Mac OS
  • Use CEF browser for Gateway UI in Rok Workstation
  • Introduce IFC library to support non-strict consistency object stores (S3)
  • Support discard requests in dm-clone
  • Support object groups in the Gateway
  • Discard unallocated and zero chocks when creating a volume from a fisk in lvmd
  • Add “Local Filesystem + Amazon S3” Rok storage backend in RokE appliances
  • Support making RokE appliances part of an existing Kubernetes cluster

Known Issues

  • Master failover in a RokE cluster is not working when running in node-local mode or when RokE is deployed on Amazon Web Services (AWS).
  • The CSI Plugin for Rok (rok-csi) will fail to unpublish a volume that is being snapshotted at the same time, due to a bug in Kubernetes error handling.

Incompatible/important changes

  • The LVMd is not upgradable from version 0.9 without deleting its database, since the etcd directory format has changed from the previous version.

Version 0.10-rc1 (Jade)

(Released Tue, 27 Feb 2018)

Bug Fixes

  • Fix hashing in nodelocal mode

Version 0.9 (Iron)

(Released Wed, 14 Feb 2018)

New features

  • Implement resizing fisks
  • Support regular expressions for zoning on auto-exported LUs
  • Implement Docker LVM Volume Plugin with support for changed block tracking
  • Implement Rok GW driver to support smart backups of Docker Volumes
  • Implement dm-clone, a Linux kernel module for live cloning of block devices
  • Support maintenance mode in Rok appliances
  • Use VASA events to discover new PEs

Known Issues

  • The effect of resizing a fisk that is used by a VM will become visible only after restarting the VM.
  • The VASA OpenStore policy does not support setups where the same ProtocolEndpoint is being exported by more than one PUs.
  • If local affinity is not enforced in controller policies and an unclean shutdown of an Appliance takes place, stale locks must be broken manually; otherwise, the I/O of the VMs that get re-connected will be frozen.

Version 0.9-rc3 (Iron)

(Released Mon, 29 Jan 2018)

Bug Fixes

  • Fix Rok Workstation issues with Windows Samba caching
  • Cleanup SSH keys from appliance images and generate new during initial config
  • Stop using dummy Django secrets
  • Prevent master failover ping-pongs in case the new elected master fails to setup everything properly
  • Fix sorting issues in UI

Version 0.9-rc2 (Iron)

(Released Thu, 11 Jan 2018)

Bug Fixes

  • Fix RokW issue with VMDK snapshot chain in VMware Fusion
  • Fix compatibility issue with Django 1.7
  • Fix controller compatibility with PUs of older versions
  • Enable static policy for the Root Controller in RokE appliances
  • Fix RokE appliance to export multiple protocol endpoints
  • Verify that all protocol endpoints are visible per ESXi host when connecting Rok
  • Allow SSH password authentication for root in RokE appliances
  • Fix locking when xstat()ing a map
  • Allow the user to specify the physical host on which the RokE appliance is running

Version 0.9-rc1 (Iron)

(Released Wed, 20 Dec 2017)

Bug Fixes

  • Restarting the Composer no longer needs a rescan in the ESXi host, and will not cause running VMs on the Rok VASA datastore to become invalid.

Version 0.8.1 (Hematite)

(Released Fri, 01 Dec 2017)

Bug Fixes

  • Fix upgrade notes to run the required database migrations
  • Fix controller issues when listening to 0.0.0.0
  • Fix StaticPolicy to work with PUs running on older Rok versions

Version 0.8 (Hematite)

(Released Thu, 30 Nov 2017)

New features

  • Add scheduling to Gateway policies
  • Implement version retention policies in the Gateway
  • Enable Rok Controller to work in a clustered environment
  • Add support for more than one Protocol Endpoints to the Rok VASA provider
  • Implement Virtual Machine and Virtual Disk registration on Rok Workstation
  • Support Gateway migrations
  • Handle temporary etcd failures in Rok base
  • Add support for appliance clustering with Rok Enterprise

Bug Fixes

  • Address Samba client caching in Rok Workstation
  • Use proper image path in Rok Workstation on Linux
  • Add missing desktop entry on Linux
  • Fix VirtualBox thread handling in Rok Workstation
  • Make scheduling resilient to restarts in Gateway
  • Handle early timeouts in policyd

Version 0.8-rc1 (Hematite)

(Released Mon, 27 Nov 2017)

Bug Fixes

  • Fix VirtualBox DNS bug on Windows
  • Fix liod crashing on signal reception (during logrotate)

Version 0.7.2 (Granite)

(Released Mon, 27 Nov 2017)

Bug Fixes

  • Fix a bug in Thrower stats
  • Fix Gateway client password authentication against Fort
  • Make Gateway management commands compatible with Django 1.7
  • Fix registration policy of Synnefo machines

Version 0.7.1 (Granite)

(Released Mon, 13 Nov 2017)

Bug Fixes

  • Fix librok_trpt to handle real-time signals
  • Fix description of intermediate snapshots in Synnefo Gateway driver
  • Fix librok_sg to retry only transient request failures

Version 0.7 (Granite)

(Released Tue, 07 Nov 2017)

New features

  • Encrypt password hashes with secure key in Fort
  • Support service tasks in Gateway Client
  • Report login errors in UI
  • Fix handling of chunked responses in Gateway Client
  • Prevent unmounting the filesystem used by Filed
  • Support Fort authentication in Gateway Client
  • Support Keystone v2.0 authentication in Gateway Client
  • Automatically refresh expiring tokens in Gateway Client
  • Use PostgreSQL instead of SQLite as the recommended database
  • Support mounting RokFS via a systemd mount unit
  • Introduce subtasks for service drivers
  • Run policies as tasks
  • Support presenting VMDK in RokW
  • Integrate UI with Fort
  • Introduce management tool for VASA
  • Update pagination logic in Django Apps
  • Switch to Angular 4.3.6 in UI
  • Introduce management tool for Gateway
  • Introduce command to verify state of maps
  • Split docs into public and internal
  • Move common UI to separate/reusable Angular module
  • Support GC when Ceph is full
  • Add tool to integrate Rok platform with vSphere
  • Generate MSI for RokW using WiX toolset
  • Use multiple/separate disks in Rok Appliances
  • Add support for overlayfs in Appliances
  • Add initramfs scripts to grow partitions in Appliances
  • Introduce migrations for RokVP
  • Integrate Fort with LDAP
  • Support fast recovery from a closed connection in Controller
  • Support per-user auto-configuration during first run in RokW
  • Configure Rok VASA Provider with Rok Enterprise
  • Disallow passwords shorter than 8 characters in Fort
  • Create and configure Rok Indexer appliance
  • Use tasks instead of synchronous service events in Gateway
  • Report peer connection status in stats
  • Extend VirtualBox driver to register VMs and register/present VMDKs
  • Hash the stored token in the Indexer
  • Obtain the Rok VASA Provider Storage Container UUID automatically
  • Implement a virtual scroll component to allow dynamic rendering of list elements
  • Limit outstanding requests per Thrower peer to one
  • Add application level keepalives in Thrower
  • Add sorting arrows on table headers in UI
  • Implement dummy Task Management on iSCSI
  • Extend OpenStack Service Driver to be able to register Nova instances
  • Various user-visible improvements in Rok Appliances
  • Split etcd cluster in Rok Appliances
  • Enhance appliances with configuration management tools
  • Make OpenStack support work with multiple storage backends

Known Issues

  • VASA Provider
    • Upgrading from version 0.6 to 0.7 is supported using the provided migration script, but with the following limitations:
      • Any Virtual Machine that has one or more Virtual Disks located in the Rok VVol datastore, whose VMDK files do not reside inside the VM folder (along with the .vmx file), will fail to power on after the upgrade. The administrator must manually edit the affected VM files and replace the old Storage Container UUID with the new one, as in the sketch below.
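
For reference, a minimal sketch of what this manual edit could look like, written in Python. The datastore path and UUID values are placeholders for each deployment's actual values, and this sketch is not part of the provided migration script:

    # Hypothetical helper, not part of the official migration script:
    # rewrite .vmx files that still reference the old Storage Container UUID.
    import pathlib

    OLD_UUID = "00000000-0000-0000-0000-000000000000"  # placeholder: old UUID
    NEW_UUID = "11111111-1111-1111-1111-111111111111"  # placeholder: new UUID
    DATASTORE = pathlib.Path("/vmfs/volumes/rok-vvol")  # assumed mount point

    for vmx in DATASTORE.rglob("*.vmx"):
        text = vmx.read_text()
        if OLD_UUID in text:
            vmx.write_text(text.replace(OLD_UUID, NEW_UUID))
            print("updated", vmx)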

Version 0.7-rc2 (Granite)

(Released Fri, 27 Oct 2017)

Bug Fixes

  • Fix Thrower handling of forgotten chunks
  • Fix Fort DB migration to handle existing user passwords
  • Fix Controller to continue applying policies even after one fails
  • Fix image handling in rok-data MSI upon upgrades
  • OpenStack support no longer requires a patched version of Nova

Version 0.7-rc1 (Granite)

(Released Thu, 05 Oct 2017)

Bug Fixes

  • Suggest only the accessible datastores in VMware driver
  • Fix concurrent presentations in VMware driver
  • Don’t use smart copy if the VM is already on Rok when registering a VM in the VMware driver
  • Correctly handle timed out requests in Thrower
  • Do not require passwords for any ESXi hosts in VMware VM driver

Version 0.6.2 (Flint)

(Released Thu, 05 Oct 2017)

Bug Fixes

  • Fix a race in Thrower that resulted in high memory consumption

Version 0.6.1 (Flint)

(Released Wed, 20 Sep 2017)

Bug Fixes

  • Fix concurrency issues of Composer

Version 0.6 (Flint)

(Released Mon, 04 Sep 2017)

New features

  • Introduce Fort user system, make Indexer depend on it and integrate Gateway with it
  • Allow administrators to limit the number of parallel compositions in Rok Thrower
  • Add support for multiple trackers and automatic discovery in Rok Thrower
  • Show more peer info in the Gateway UI stats (name, country, connection status)
  • Download chunks only once in Rok Thrower
  • Reduce the memory usage of stream swarms in Rok Thrower
  • Add support for LZ4 compression in Rok Thrower
  • Implement signature verification of bucket updates in Rok Thrower
  • Handle garbage collection while Rok Thrower runs
  • Enhance Gateway client to support service actions
  • Introduce Rok FUSE
  • Improve SSL certificate handling in VASA provider
  • Split Gateway API and backend configuration
  • Support bucket/object names including Unicode characters
  • Support booting instances on AWS
  • Introduce Rok Workstation
  • Support booting instances on VirtualBox/VMware Workstation
  • Support TLS/SSL everywhere
  • Support etcd authentication
  • Introduce OpenStore VASA policy
  • Make Rok transport more robust
  • Rok VASA Provider spawns its own nginx instance
  • Introduce map version v2 and support map migrations
  • Add support for breaking DLM locks automatically
  • Support logrotate in Python daemon, C daemons and gunicorn Apps
  • Support accurate stats in the Gateway [via new “Stats” daemon]
  • Support asynchronous tasks in the Gateway [via new “Task” daemon]
  • Add suggestions in Gateway drivers
  • Introduce suggested buckets in the Gateway
  • Sign and verify swarms in Rok Thrower
  • Support sessions in Rok VASA Provider
  • Support on-demand auto-exported LUs and initiator zoning
  • Introduce OpenStore iSCSI policy
  • Ship Tracker within Indexer
  • Use more user-friendly error handling in the UI
  • Let API serve UI settings
  • Implement sign-up in Indexer
  • Use deterministic PU names
  • Improve performance of policy runs
  • Extend the Gateway VMware drivers to register/present both VMDKs and VMs
  • Implement bucket deletion in Indexer UI
  • Use atomic requests in Fort and Indexer (see the sketch after this list)
  • Add various management commands in Fort
  • Implement password update in Fort
  • Show Rok installations in Indexer UI
  • Make OS suggestions case insensitive in Gateway Services
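
To illustrate the atomic requests item above: in a Django application, enabling ATOMIC_REQUESTS wraps every view in a single database transaction that is committed on success and rolled back on error. The following is a generic Django settings sketch with placeholder values, not the actual Fort or Indexer configuration:

    # Hypothetical Django settings excerpt (all values are placeholders).
    # With ATOMIC_REQUESTS enabled, each HTTP request is served inside one
    # database transaction: committed if the view succeeds, rolled back if
    # it raises an exception.
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql_psycopg2",
            "NAME": "fort",            # placeholder database name
            "USER": "fort",            # placeholder credentials
            "PASSWORD": "changeme",
            "HOST": "localhost",
            "ATOMIC_REQUESTS": True,   # run each request in a transaction
        }
    }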

Known issues

  • Composer
    • The composer has a number of concurrency issues which may cause data loss when a fisk is accessed by more than one composerd instance. This will most likely occur during VM live migrations.
  • Gateway
    • client: The Gateway client does not support asynchronous presentations and registrations via tasks
    • VMware: Cannot take a quiesced snapshot using VSS of a Windows VM that is backed by a disk on the Rok VVol datastore
    • User creation is not supported in Rok Gateway
    • When a user logs out, some requests might be sent after the token has been deleted, resulting in error popups in the UI
  • Thrower
    • Failures due to composer errors or GC are not tested
    • When the thrower restarts, it may not create a swarm for a bucket if it can’t connect to the Indexer
    • Port checking is done every second, which adds a significant load to the tracker
  • Platforms
    • VMware: The Rok VASA provider does not report metrics properly, including Storage Container size and free space
    • VMware: A VM snapshot operation may fail if the Rok backend fails to complete the fisk snapshot operation in time
    • VMware: The Rok VASA Provider does not support SPBM (Storage Policy Based Management) and its derivatives, such as Storage Profiles, Capabilities and compliance checks
    • VMware: The Rok VASA Provider supports only one Protocol Endpoint per Storage Container
    • OpenStack: A patched version of Nova and libvirt is required
  • Controller
    • The iSCSI policy does not have a state of its own, hence fisks exported over iSCSI will not persist over Composer restarts
  • VASA Provider
    • The RokVP is not upgradable from version 0.5 without deleting its database, since the etcd directory format has changed from the previous version.

Version 0.6-rc4 (Flint)

(Released Wed, 09 Aug 2017)

Bug Fixes

  • Handle IO transport failures in SG
  • Re-discover LUs on a re-established nexus
  • Garbage collect stale ports in Controller’s context
  • Fix a use-after-free in iSCSI when nexus closes before CCB completes
  • Fix tasks polling in UI
  • Kill Thrower greenlets in a pool properly
  • Fix race in rokfs service unit
  • Filter expired OAuth tokens in Gateway

Version 0.6-rc3 (Flint)

(Released Tue, 18 Jul 2017)

Bug Fixes

  • ESXi initiators no longer need to be defined in the Rok VASA Provider config
  • Avoid executing tasks more than once in Gateway
  • Handle errors on configuration parsing in RokW
  • Report failed logins in Gateway UI
  • Various fixes in VMware services
  • Correctly negotiate metadata need info between throwers
  • Do not fail when the thrower composes a version and the underlying fisk already exists

Version 0.6-rc2 (Flint)

(Released Mon, 10 Jul 2017)

Bug Fixes

  • Fix bug for Rok folder path on Windows in VMware Workstation service driver
  • Fix typos in AWS service driver
  • Various improvements in Rok VASA Provider (configuration files, snapshot handling)
  • Various fixes in services configuration files
  • Fix Rok GC error handling
  • Handle old buckets with no tasks properly
  • Don’t validate token scope’s ACL in Indexer’s UI

Version 0.6-rc1 (Flint)

(Released Tue, 04 Jul 2017)

Bug Fixes

  • Handle etcd Raft errors properly
  • Close old nexus when a new one has been created in Rok Transport
  • Fix memory leak when joining nexus thread in Rok Transport
  • Export LUs only on target nexuses
  • Create and write OSD objects in filed atomically
  • Fix epochs of COWed chunks
  • Fix a memory leak by freeing sg_bidi in liod
  • Fix memory leak on ioctl() in Rok SG
  • Make OpenStack Gateway driver thread safe

Version 0.5 (Emerald)

(Released Wed, 22 Feb 2017)

New features

  • Implement iSCSI transport for Rok
  • Use LUN addressing methods defined in SAM5
  • Support Logical Unit Conglomerate structures
  • Extend OpenStore to export/unexport LUs
  • Pass VASA 2.0 certification tests for Rok VASA provider (non-VVol)
  • Add support for new FUSE-based filesystem (RokFS)
  • Implement username/password authentication method for the Indexer
  • Implement an OAuth flow for publishing buckets to the Indexer
  • Make Garbage Collector multithreaded
  • Add Synnefo Plankton Driver for Rok
  • Add Gateway Service Driver for Synnefo Volumes
  • Implement command-line client for the Rok Gateway
  • Implement new designs in services forms of Gateway
  • Implement OAuth 2.0
  • Update Angular to version 2.1.0
  • Ship the UI’s static files under the static folder

Version 0.4.5 (Diamond)

(Released Thu, 19 Jan 2017)

Bugfixes

  • Use access=userspace when creating Ganeti disks in the Gateway driver
  • Check if the Ganeti instance already exists before creating it
  • Fix single registration message in the UI
  • Fix critical bug in libtrpt

Version 0.4.4 (Diamond)

(Released Mon, 09 Jan 2017)

New features

  • Support presenting Gateway versions as disks of existing Ganeti instances
  • Introduce driver that creates Ganeti instances with a Gateway version as boot disk
  • Add support for remote initiator PUs in controller
  • Support different Identity API versions in our Django apps

Version 0.4.3 (Diamond)

(Released Fri, 16 Dec 2016)

Bugfixes

  • Remove unused dependency on Django middleware

Version 0.4.2 (Diamond)

(Released Fri, 16 Dec 2016)

New features

  • Update Gateway logo
  • Implement cookie authentication method

Version 0.4.1 (Diamond)

(Released Tue, 10 Oct 2016)

Bugfixes

  • Fix urllib3 compatibility issue in the controller

Version 0.4 (Diamond)

(Released Tue, 25 Oct 2016)

Incompatible/important changes

  • Composer
    • The map format has changed from previous versions. You must recreate all fisks (clones, snapshots).
  • Thrower
    • This version of the thrower cannot connect to throwers of previous versions, due to changes in the on-the-wire protocol.

New features

  • Rok GW
    • Policies: The GW supports user-defined policies.
    • Policies: The policy execution framework is now multithreaded, and can support registration/presentation of multiple objects concurrently.
    • Thrower: The GW now exports thrower stats via the API and the UI.
    • Drivers: Rok now comes with VMware, OpenStack, Ganeti, and Synnefo drivers out of the box.
  • Composer
    • The composer can now garbage-collect live (UUID) maps.
  • Platforms
    • VMware: Rok now includes a VASA provider, and can expose VVols directly to VMware ESXi over iSCSI.
  • Bindings
    • Python: PU bindings for Python are now green and can be used in the context of a greenlet without blocking the whole process.

New dependencies

There are too many dependencies to mention individually. Please see the dependencies of individual Debian packages and the contents of setup.py files throughout the Rok repository for the initial list of dependencies.

We will record individual changes in the set of dependencies in forthcoming versions.
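
In the meantime, one rough way to approximate the Python dependency set is to scan the repository for setup.py files and extract their install_requires lists. The sketch below is a naive illustration, not an official tool; it assumes the requirements are declared as literal lists and skips anything computed dynamically:

    # Naive sketch: approximate the Python dependency set by collecting the
    # install_requires lists declared in setup.py files across the tree.
    import ast
    import pathlib

    for setup_py in pathlib.Path(".").rglob("setup.py"):  # run from repo root
        tree = ast.parse(setup_py.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.keyword) and node.arg == "install_requires":
                try:
                    print(setup_py, ast.literal_eval(node.value))
                except ValueError:
                    print(setup_py, "<non-literal install_requires>")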

Known issues

  • Platforms
    • VMware: The semantics of the current implementation of PrepareToSnapshot / Snapshot deviate from the VASA 2.0 specification and will be fixed in the next version.
    • VMware: The VASA provider does not return proper SOAP Faults on error conditions.
    • VMware: The VASA provider does not report metrics properly, including Storage Container size and free space.
  • Composer
    • The composer may leak memory after long periods of time, potentially due to allocating but not releasing memory in the case of missing fisks during map requests.
  • Thrower
    • There is currently no way to enforce a global limit on the memory usage or TCP connections of thrower.
  • Libraries
    • libpu: On the target side, libpu-managed objects are not garbage-collected when the corresponding nexuses disappear.
    • libctrl: The embedded controller has a use-after-free bug for nexuses returned by trpt_port_getnexuses().
    • libmpath: libmpath will erroneously close a nexus when it fails to submit a request because the nexus is full (returned EAGAIN). Instead of closing the nexus, it should detect that the queue is full and skip it for submission.
    • libmpath: libmpath will not release a nexus from an mpath if the nexus has pending I/O, even when that I/O is not related to said mpath.
  • Bindings
    • Python: The Python bindings should return finer-grained errors as subclasses of PUError. Currently, the GW assumes specific types of errors when a generic PUError exception is raised.

Version 0.3 (Celestite)

Version 0.2-rc1 (Beryl)

Version 0.2 (Beryl)

Version 0.1 (Amethyst)

Version 0.1-rc1 (Amethyst)