Changelog

This file describes code and packaging changes for all Rok releases starting with Rok 0.15. It is mostly of interest to packagers, administrators, and developers.

Version 2.0.2 (Aurora)

(Released Fri, 31 Mar 2023)

New Features

  • Add upgrade doc for the Cluster Autoscaler.
  • Add the Rok Scheduler and the Rok Scheduler Webhook to the Rok Kustomize inflator.
  • Improve performance of rok-cluster member-list operation.
  • Rewrite rok-cronic in Python, improving logging.
  • Extend Rok Operator and RokCluster CRD to allow CSI to work on Kubernetes clusters with custom kubelet root directory.
  • Mount the whole kubelet root directory of the host to the Rok CSI Node container for compatibility with Kubernetes 1.24+.
  • Upgrade kubectl in rok-tools to v1.23.17.
  • Do not explicitly enable Istio sidecar injection for Rok and Rok CSI workloads via Pod annotations.
  • Add command line argument to Rok CSI Controller to explicitly enable Istio sidecar injection for Rok Access Servers.
  • Update the integration guide for Okta.
  • Introduce a table that contains all the components of Enterprise Kubeflow and their versions.
  • Detect and handle cases where a Rok Pod has a stale role=master label.
  • Add support for custom meta tags using server side includes (SSI).
  • Add sphinx-sitemap extension for sitemap generation in docs.
  • Reduce the number of Rok Operator’s outgoing network connections to handle hundreds of Rok API CRs more efficiently.

Bug Fixes

  • Fix Rok Registry Pod occasionally failing to become master after initial deployment.
  • Ensure GC runs successfully in clusters with hundreds of nodes.
  • Amend AuthService OIDC scopes to include offline_access scope when using Okta as an OIDC provider.
  • Fix an issue where the task-gc management command would fail if a task was to be deleted by multiple rules.
  • Work around an AWS bug in the verification of the iam-alb task.
  • Fix an issue where the rok-gw-manage task-gc management command could fail when the etcd server was under heavy load.
  • Fix installing kernel-specific build dependencies for Ubuntu kernels when building the rok-kmod Docker image.
  • Fix Rok Disk Manager to use HostToContainer mount propagation for mounting the host’s root directory.
  • Add the --retries -1 flag to kubectl cp, to fix failures when sending gathered logs.
  • Fix misinterpreting signatures on block devices when creating LVM PVs with rok-disk-manager.
  • Fix the frontend rendering and backend parsing of very large KFP runs.
  • Update the Scale In EKS Cluster guide with instructions to delete the Rok master Pod if it runs on the node to be scaled in.
  • Fix invalid EnvoyFilter for authenticating Serving-related requests.
  • Fix a bug where the embedded OpenStore Controller would constantly retry to connect to a remote OpenStore Controller when using Istio.
  • Use Istio TPROXY mode for Rok etcd and Redis to improve scaling performance.
  • Fix Rok Registry RBAC issue that caused readiness/liveness probes to fail.
  • Automatically restart RokE Pods that are holding stale locks, to clean up their locks.
  • Fix rok-gc leaving stale DLM locks when failing to update current epoch.
  • Increase CPU requests for istiod to avoid starvation under heavy load.
  • Remove unnecessary probing of the etcd endpoint from inside Rok pods.
  • Fix Rok CSI Controller to not submit jobs on unready nodes, as a workaround to prevent the controller from hanging when a node is unready and multiple jobs are directed to that node.
  • Fix a leak of PULU objects in LIOd.
  • Fix a gevent issue that caused the Rok API to increase its CPU utilization over time.
  • Update the rok-do Debian Buster base image to use 20230328 snapshot APT repositories.
  • Fix libmap and hasherd to use callbacks instead of timeouts in etcd watches, which stops etcd connections from closing every second.
  • Prevent Rok Operator from attempting unneeded status updates in Rok API CRs.
  • Reduce the number of Rok Operator outgoing connections to prevent its readiness probe server from failing.

Version 2.0.1 (Aurora)

(Released Mon, 19 Dec 2022)

New Features

  • Support ignoring kustomizations when listing or patching images with rok-image-* tools.
  • Support deploying Falco and Falco Exporter to detect threats or abusive behavior in Kubernetes.
  • Support deploying Promtail to forward application and system logs to a remote logs backend.
  • Deploy Prometheus agent in MiniKF to forward metrics to a remote backend.
  • Support deploying gtoken for automatic GCP token issuing.
  • Upgrade Minikube and Kubernetes in MiniKF to versions 1.27.1 and 1.22.15 respectively.
  • Deploy telemetry stack in MiniKF when in the context of KFaaS.
  • Add a ‘subscription’ field to the KFaaS user object, necessary for integrating KFaaS with Stripe.
  • Expose API in KFaaS for handling incoming events from Stripe.
  • Remove deleted deployments from total count of deployments for a KFaaS user.
  • Allow machine-to-machine communication in the KFaaS app.
  • Introduce guides and DeployTasks for the EBS CSI driver to add support for Kubernetes 1.23.
  • Introduce Google Tag Manager in KFaaS frontend.
  • Modify UX for free trial extension in KFaaS frontend.
  • Implement a new design for the pricing page in KFaaS frontend.
  • Amend KFaaS emails so that they make sense for KFaaS 1.1.
  • Expose setting to allow disabling new deployments for all users in KFaaS.
  • Upgrade Kyverno to v1.8.1.
  • Improve API to expire a KFaaS user’s subscription.
  • Add CLI tool to manage KFaaS.
  • Add rok-do task to create a MiniKF image for KFaaS with all pre-pulled Docker images included.
  • Use Arrikto-specific branding in the KF CentralDashboard.
  • Support enabling Azure Monitor to collect logs and metrics from an AKS cluster.
  • Use a 470.x Nvidia Driver in MiniKF to support K80 GPUs.
  • Update MiniKF GPU tests with K80-specific instructions.
  • Update host’s mdadm config to ignore RAID arrays managed by Rok Disk Manager.
  • Add placeholders for Kubernetes versions in our docs.
  • Add instructions about installing the EBS CSI Driver to the Rok Upgrade guide.
  • Selectively enable users for product subscriptions.
  • Modify Payments to support products with a recurring price and return subscription information for customers.
  • Refactor handling of subscription changes in the KFaaS backend needed for the integration with Payments.
  • Support AMI releases 20221104, 20221112 or newer [kernel version 5.4.219-126.411.amzn2.x86_64 or newer] for node groups on EKS.
  • Support Ubuntu focal kernel 5.4.0-1081-gke for GKE.
  • Support Kubernetes 1.23 on AWS and GKE.
  • Implement KFaaS Authorizer.
  • Extend Rok Operator with CLI options to control reconciliation intervals of Rok GW API resources.
  • Update Rok Scheduler image and manifests for Kubernetes 1.23 clusters.
  • Update Cluster Autoscaler image for Kubernetes 1.23 clusters.
  • Improve our Upgrade docs for EKF 2.0.X.

Bug Fixes

  • Fix PVC garbage collection for distributed training jobs created by Kale.
  • Load the Kubernetes config in the snapshotcontroller task of rok-deploy.
  • Fix race in Rok Operator where it failed to reconcile resources after a successful cluster initialization.
  • Taint AKS system node pool upon creation.
  • Revert the “Start here” button behavior for new users in KFaaS frontend.
  • Fix Istio configuration to allow syncing data between Rok clusters.
  • Fix Tracker to return a proper 204 when checking if thrower port is open.
  • Update the rok-do Debian Buster base image to use 20221107 snapshot APT repositories.
  • Clean up system logs from the MiniKF build phase when packaging it.
  • Include the AuthService theme URL in SKIP_AUTH_URLS only once in MiniKF manifests.
  • Monkey patch the Python Kubernetes client to fix a bug and improve the token refresh logic.
  • Fix Rok CSI Node GC to properly handle snapshots of unpinned volumes.
  • Garbage collect stale snapshots that took place on nodes that no longer exist.
  • Fix compatibility with Azure Storage Accounts that have the Hierarchical Namespaces feature enabled.
  • Pin npm version in CentralDashboard image.
  • Fix regression in the configfs path we bind mount into the csi-node Pods.
  • Fix an issue where the Rok API Task daemon would stop executing tasks if a task was deleted while it was initializing.
  • Fix running logrotate hourly in RokE and Rok Registry Docker images.
  • Fix an issue where Rok gunicorn applications would not rotate their logs properly in response to a SIGUSR1 signal.
  • Fix rok-kserve-migrate to better check whether the migrated inference services have become ready.
  • Accelerate the computation of retention policies and bucket stats for groups with large numbers of versions in the Rok API.
  • Update the rok-do Debian Buster base image to use 20221206 snapshot APT repositories.
  • Forcefully apply the manifests of the EKF monitoring stack so that upgrades to 2.0.X remain seamless.
  • Fix rok-do to install missing commands in Rok Access Server image.
  • Do not deploy the Snapshot Controller on AKS clusters, since it is already predeployed.

Version 2.0 (Aurora)

(Released Mon, 24 Oct 2022)

New Features

  • Introduce “News” section for release 2.0.
  • Support Golang-based images in rok-do.
  • Introduce SubtaskSummaryMixin in rok-do to calculate the summaries of a task’s subtasks.
  • Build Kiwi Device Plugin using rok-do.
  • Build Kiwi Webhook using rok-do.
  • Build all Kiwi images as part of the umbrella Rok Docker image rok-do task.
  • Extend release process to also build and publish Kiwi images.
  • Use centos:stream8 as base image for rok-access-server.
  • Support kernel version 5.4.214-120.368.amzn2.x86_64.
  • Upgrade Istio to 1.14.3.
  • Introduce admin section in KFaaS frontend.

Bug Fixes

  • Use predictable URLs for download files in our docs.
  • Use correct container names when Rok Operator retrieves cluster Job containers.
  • Update the Rok Registry VirtualService to not add an extra leading slash that breaks the API.
  • Update the rok-do Debian Buster base image to use 20221013 snapshot APT repositories.
  • Fix the Rok GW Kubernetes driver to not fail snapshots in case of conflict errors.
  • Revert renaming of Workbenches back to Notebooks.
  • Eliminate CVEs from rok-access-server image.
  • Upgrade Rok Registry etcd to 3.5.
  • Eliminate CVEs from Istio images.
  • Update the rok-do Debian Buster snapshot date to use 20221019 snapshot APT repositories.
  • Do not delete access-server resources in Rok CSI if the csi-node job to create a volume times out.

Version 2.0-rc3 (Aurora)

(Released Mon, 10 Oct 2022)

New Features

  • Update rok-ddns to Debian Buster.
  • Update rok-docs and images that build Rok to Debian Buster.
  • Enable sorting in the JWA’s frontend.

Bug Fixes

  • Verify that the desired CRD versions of the volume snapshot CRDs are served.
  • Fix a Kale extension bug which broke the compilation of new IPYNBs.
  • Allow setting the container in which to execute the command in the Kubernetes Python client.
  • Remove critical CVEs from Katib images.
  • Remove critical CVEs from KFP frontend image.
  • Remove critical CVEs from Dex image.
  • Remove critical CVEs from Argo image.
  • Remove critical CVEs from MWA image.
  • Remove critical CVEs from Kserve images.
  • Update Kale images to include a JupyterLab extension bugfix.
  • Remove critical CVEs from NGINX Ingress controller image.
  • Fix port names of Rok etcd Service before upgrading Rok to ensure Rok Pods with Istio Sidecar can access Rok etcd.
  • Use idle timeout for connections to root controller to handle Pod restarts efficiently.
  • Do not override user changes in MiniKF GitOps repo.

Version 2.0-rc2 (Aurora)

(Released Fri, 30 Sep 2022)

New Features

  • Allow configuring the default retention policy for new Rok API buckets via the RokCluster CR.
  • Restructure rok_csi Python package.
  • Update Rok Scheduler image for Kubernetes 1.21 clusters.
  • Update Rok Scheduler image for Kubernetes 1.22 clusters.
  • Support AMI release 20220824 [kernel version 5.4.209-116.363.amzn2.x86_64] for node groups on EKS.
  • Support AMI release 20220914 [kernel version 5.4.209-116.367.amzn2.x86_64] for node groups on EKS.
  • Support updating the version retention policies of all Rok API buckets.
  • Add proxy support in rok-do.
  • Enable Istio sidecar injection for Rok external services.
  • Rename notebooks to workbenches in the Kubeflow web apps.
  • Use responsive tables in MWA UI.
  • Add sorting support in MWA’s tables.
  • Use responsive tables in KWA UI.
  • Revamp the experiments graph in the KWA.
  • Add sorting support to all KWA’s tables.
  • Update the Cluster Autoscaler image for Kubernetes 1.21 clusters.
  • Update the Cluster Autoscaler image for Kubernetes 1.22 clusters.
  • Add the rok-shutdown CLI to run tasks when Rok appliances terminate.
  • Terminate the Istio proxy sidecar only when rok-init runs as a Kubernetes Job.
  • Delegate the creation and management of the headless Rok Service needed for Istio to Rok Operator.
  • Enable the predictor and transformer components of an InferenceService CR to download objects stored in external locations (e.g., on S3).
  • Add a rok-do task to install and configure APT packages in a single task.
  • Upgrade MiniKF to Ubuntu Focal (20.04).
  • Add a rok-do task to upgrade packages in the MiniKF base images to fix CVEs.
  • Use an Envoy sidecar for culling metrics in the PVCViewer controller.
  • Update Kale images to bring new features and fixes in EKF.
  • Configure Istio peer authentication for Rok components and enable STRICT mTLS where applicable.
  • Define Istio authorization policies for Rok components.
  • Add documentation to clean up the Rok Scheduler and the Rok Scheduler Webhook.

Bug Fixes

  • Disable Istio sidecar injection in rok-k8s-reboot Jobs.
  • Fix an issue where the Rok API would perform namespace UUID validation more than once when authorizing some API calls.
  • Fix an issue where Rok API requests attempting to delete a non-existent account did not fail with 404 Not Found.
  • Fix deadlock with validatingwebhookconfiguration when upgrading KNative Serving.
  • Properly show the runtime information of ISVCs in the MWA.
  • Fix the order in which JWA creates PVCs so that the workspace volume is always first.
  • Update the rok-do Debian Buster base image to use 20220920 snapshot APT repositories.
  • Update sample outputs of kubectl get commands in our docs so that Istio-enabled workloads are consistent.
  • Upgrade AWS Load Balancer Controller to work with IMDSv1 disabled.
  • Fix Kale entrypoint when trying to run predefined steps.
  • Upgrade Istio before upgrading Rok to allow the upgrade Pod to become ready.
  • Fix the upgrade docs for EKF 2.0 to deploy the Kyverno resources before upgrading the Profile controller, because the latter depends on the former.
  • Fix removal of cloud-init in Ubuntu Focal bootstrap cloud images.
  • Fix MiniKF provisioning on Ubuntu Focal by configuring sudo to preserve the HOME environment variable.
  • Configure MiniKF to use the system’s available Python3 instead of hardcoding it to Python 3.6. This fixes support for Ubuntu Focal, where the default Python3 version is 3.8.
  • Remove an unnecessary version restriction in Python setuptools in MiniKF.
  • Prefer apt-get over pip to install packages in MiniKF.
  • Fix MiniKF provisioning on Ubuntu Focal when unusable /dev/md* devices exist.
  • Fix MiniKF provisioning on Ubuntu Focal by removing references to lvmetad, a component that no longer exists in Ubuntu Focal.
  • Fix starting Minikube in MiniKF on Ubuntu Focal by disabling the fs.protected_regular kernel parameter using sysctl.
  • Fix waiting for APT lock in MiniKF on Ubuntu Focal.
  • Fix minikf-insist exiting on Ctrl+C in MiniKF on Ubuntu Focal.
  • Fix automatically provisioning MiniKF on boot on Ubuntu Focal.
  • Fix kernel support in MiniKF on Ubuntu Focal, using Linux 5.4.
  • Disable unattended upgrades in MiniKF.
  • Fix a bug in tcmu-handler that could result in commands failing or, in rare cases, in data corruption when commands are completed out of order.
  • Remove critical CVEs from the KFP API server image.
  • Fix authorization bug in the KFP API server when creating a pipeline via URL.
  • Resolve CVEs in the KFAM image.
  • Upgrade cert-manager to version 1.5.5 to support Kubernetes 1.22.
  • Remove critical CVEs from Kubeflow app images.
  • Remove critical CVEs from Authservice image.
  • Remove critical CVEs from KFP images.
  • Rename Rok Redis ports to include protocol.
  • Deploy Istio before Rok and its components to ensure Istio CRDs are present.
  • Update the rok-do Debian Buster snapshot date to use 20220929 snapshot APT repositories.
  • Ensure JWA won’t crash when the gpu.readOnly property of the ConfigMap is set to true.
  • Ensure MWA properly parses the protocolVersion of ISVCs.

Version 2.0-rc1 (Aurora)

(Released Fri, 16 Sep 2022)

New Features

  • Use Kustomize 4.3.0 inside rok-tools.
  • Add Verify section for GCP in the “Authorize Access to Object Storage” guide.
  • Add Google Analytics to public docs.
  • Introduce the Kale Serve API.
  • Allow building component images in userspace when compiling a pipeline.
  • Introduce Kale’s integration with TFJob to create and run TensorFlow distributed jobs.
  • Add the Kale TF Keras distributed user guides.
  • Deploy Kyverno to allow for adding policies for security.
  • Introduce devguide for configuring GPG.
  • Introduce arrikto-dev-welcome admonitions in install/tools docs.
  • Introduce arrikto-dev-welcome admonitions in install/cluster docs.
  • Introduce arrikto-dev-welcome admonitions in install/rok docs.
  • Introduce arrikto-dev-welcome admonitions in install/expose/eks docs.
  • Introduce arrikto-dev-welcome admonitions in install/vpc/aws docs.
  • Introduce arrikto-dev-welcome admonitions in ops and ops-arr docs.
  • Introduce the “Identify the Public IP Address of Your Cloud Instance” guide.
  • Introduce the “Authorize Inbound Traffic for Your EKS Cluster from Your Management Environment” guide.
  • Add support for EC2 instances in rok-deploy.
  • Introduce libkiwi.
  • Extend libkiwi to correctly report free memory.
  • Support CUDA Runtime API versions >=11.3 for Kiwi.
  • Introduce Kiwi’s anti-thrashing mechanism.
  • Add early release functionality to Kiwi client.
  • Synchronize and throttle GPU kernel launches from Kiwi clients.
  • Track CUDA memory allocations in Kiwi clients and optionally allow internal oversubscription.
  • Introduce KFaaS API schema and enable authorization and authentication in views.
  • Introduce ops guides for managing security policies with Kyverno.
  • Create Debian packages for rok-kiwi-libkiwi and rok-kiwi-scheduler.
  • Add rok-do tasks to build Kiwi images for libkiwi and the scheduler.
  • Introduce Kiwi device plugin.
  • Introduce Kustomizations for Kiwi.
  • Extend AuthService to authenticate with opaque access tokens.
  • Implement the KFaaS APIs to get, create, or list users.
  • Implement the KFaaS APIs to get, set, or delete the message of the day.
  • Create internal documents to describe the process of creating and merging PRs in GitHub.
  • Introduce devguide for configuring Google Workspace.
  • Introduce the Kiwi CLI (kiwictl) to allow configuring the Kiwi scheduler at runtime.
  • Make rok-fort-client and python-keystoneclient optional dependencies of rok-django-lib.
  • Implement the KFaaS APIs to manage deployments.
  • Send emails at various stages of the Kubeflow as a Service user journey.
  • Drop support for AWS 4.* kernels from rok-kmod.
  • Retain a set number of Kubeflow as a Service deployments in error state for debugging purposes.
  • Support receiving resources as JSON, tables or YAML in our K8s client.
  • Introduce the Kubeflow as a Service taskd for scheduling execution of tasks.
  • Introduce a script to gather usage statistics in Kubeflow as a Service.
  • Introduce devguide for configuring SSH.
  • Introduce devguide for configuring VPN.
  • Introduce devguide for configuring DNS.
  • Support providing decrypted configs in git and arriktoreg tasks in rok-deploy.
  • Introduce kfaas Debian package.
  • Introduce task to build the kfaas Docker image.
  • Introduce entrypoint to start the kfaas application using NGINX.
  • Install python3-googleapi in MiniKF explicitly, needed by the rok-gcp package.
  • Introduce the Transformers class.
  • Add Kale Serving API user guides.
  • Update Kale user guides due to the new Kale serve API.
  • Modify file structure for images and examples in docs.
  • Extend the rok-notebook-backup script to capture the YAML of the snapshotted notebook in the metadata of the Rok snapshot.
  • Introduce the deploy Python library which offers a programmatic interface for rok-deploy.
  • Update our guidelines for creating Debian packages from Python packages.
  • Support running ctypesgen on buster.
  • Implement the task-gc Django management command for Kubeflow as a Service.
  • Introduce devguide for configuring Full-Disk Encryption.
  • Update Operations and User Authentication guides to include the opaque access token authenticator that AuthService now supports.
  • Extend the Notebook backup script to capture the YAML of the snapshotted notebook in the metadata of the Rok snapshot.
  • Introduce the rok-backup script to manage the backup procedure of a Kubeflow cluster.
  • Introduce the rok-restore script to manage the restoration procedure of a Kubeflow cluster.
  • Introduce Operations guides to migrate an Arrikto EKF cluster.
  • Change default settings for trial limits in KFaaS.
  • Extend the ServeConfig object of the Kale serve API.
  • Refactor the Kale ServeConfig API user guide.
  • Support building docs on Debian Buster. The final docs after build on Buster might not be correct yet, but the build succeeds.
  • Support building Rok and the Debian packages on Debian Buster.
  • Update rok-kmod to Debian Buster.
  • Update rok-tools to Debian Buster.
  • Update rok-csi to Debian Buster.
  • Support running ci-tests on Debian Buster.
  • Use boto3 paginators for making persistent requests to AWS API.
  • Extend the rok-backup and rok-restore scripts to support the migration of Kubeflow profiles.
  • Introduce the kfaas-users-suspend script to suspend Kubeflow as a Service users.
  • Make the rok-deploy library report progress to the caller.
  • Expose rok-deploy installation info.
  • Prune stale resources from previous releases.
  • Introduce devguides for configuring Git and GitHub.
  • Separate the generate from the apply phase of the AWS installation.
  • Add guide for non-interactive EKF deployments.
  • Use boto3 waiters and retries for AWS commands.
  • Make rok-deploy tasks re-runnable.
  • Update Cluster Autoscaler to take into consideration the not-safe-to-evict annotation for DS Pods.
  • Introduce devguide for configuring Slack.
  • Prompt user to run the automated upgrades in rok-deploy.
  • Automate the “Upgrade Rok” guide.
  • Automate the “Upgrade Kubeflow” guide.
  • Include the Kubernetes YAML in the snapshot metadata of StatefulSets, Pods and PVCs.
  • Include the notebook CR in the snapshot metadata of Notebooks.
  • Increase verbosity of Rok Operator readiness checks.
  • Enable Istio sidecar injection in Rok Jobs.
  • Prevent Cluster Autoscaler from removing nodes on which the Rok etcd, Rok Redis, or Rok master Pods are running.
  • Add utilities to handle and create authentication backends in Django apps.
  • Add Django app for users to subscribe to Arrikto products and handle their billing and subscription info.
  • Introduce Debian package for the arr-payments application code.
  • Add rok-do tasks to build the Arrikto Payments Docker image.
  • Remove all imports from upstream kubernetes.
  • Use persistent socket connections in Kiwi.
  • Make Kiwi clients report their Pod name and namespace.
  • Update Istio to 1.14.1.
  • Upgrade Prettier to version 2.7.1.
  • Introduce KFaaS frontend.
  • Add operations guide for recovering Pods from out of space errors.
  • Add support in Rok CSI for explicitly unpinning volumes using rok/unpin annotation.
  • Support configuring the version retention policy of a bucket via a RokBucketConfiguration Kubernetes custom resource.
  • Introduce an operations guide to modify the default version retention policy for notebook snapshots.
  • Extend Kale serve API to serve custom models.
  • Add user guides for serving custom models.
  • Refactor the Kale Serve API user guide.
  • Restore SKLearn upstream image.
  • Extend Kale serve API to propagate Notebook Server’s PodDefaults to the InferenceService.
  • Use Kubernetes 1.21 in MiniKF.
  • Drop python2 support for rok-aws.
  • Support AMI releases 20220811, 20220802, and 20220725 [kernel version 5.4.204-113.362.amzn2.x86_64] for node groups on EKS.
  • Enable official snapshot of backports APT repository.
  • Update the rok-do Debian Stretch base image to use 20220721 snapshot of backports APT repository.
  • Update the rok-do Debian Buster base image to use 20220721 snapshot of backports APT repository.
  • Introduce changes to the Kale serving user guides to support the new serving API based on KServe.
  • Upgrade API clients to support Kubernetes 1.22.
  • Upgrade ExternalDNS to 0.12.2.
  • Remove dnsmasq from RokE and Rok Registry.
  • Drop python2 support for rok-deploy.
  • Introduce the kiwi-webhook to enable running Kiwi workloads on tainted GPU nodegroups.
  • Upgrade AWS Load Balancer Controller to 2.4.3.
  • Expose models behind a single subdomain under a different prefix using path-based serving.
  • Improve instructions for creating and upgrading self-managed node groups on EKS.
  • Pin versions of Docker CLI and Azure CLI in rok-tools.
  • Organize and prune packages in rok-tools.
  • Introduce Kiwi installation guide.
  • Introduce Kiwi User guide on how to use an Arrikto vGPU.
  • Introduce Kiwi Operations guides on how to manage your Kiwi-enabled GPUs.
  • Extend Rok controller so that Rok Processing Units can accept connections at predefined public ports.
  • Upgrade Rok CustomResourceDefinitions from apiextensions.k8s.io/v1beta1 to apiextensions.k8s.io/v1.
  • Turn recommended packages of rok-common into suggested to make them optional.
  • Update rok-operator to Debian Buster.
  • Make PostgreSQL an optional Rok external service.
  • Migrate Rok probes from init containers to rok-init.
  • Switch to the emissary Argo executor.
  • Upgrade NVIDIA device plugin.
  • Upgrade Rok/Rok Registry PostgreSQL image to 10.22.0.
  • Update rok-disk-manager to Debian Buster.
  • Migrate Rok Fort to Python 3.
  • Update roke to Debian Buster.
  • Migrate from FluentD to Fluent Bit to send EKS cluster logs to Amazon CloudWatch.
  • Enable Istio sidecar injection for Rok Pods.
  • Add headless Service for Rok so that Istio works with port 10000.
  • Explicitly enable the etcd v2 API in the manifests, since it is disabled by default in etcd v3.4 or later.
  • Upgrade Rok etcd to version v3.5.4.
  • Update kiwi to Debian Buster.
  • Update rok-registry to Debian Buster.
  • Make small fixes in Upgrade docs.
  • Enable Istio sidecar injection for Rok CSI Controller Pods.
  • Enable Istio sidecar injection for Rok CSI Node Pods.
  • Upgrade Ingress NGINX Controller to v1.3.0.
  • Upgrade our manifests to support Kubernetes 1.22.
  • Enable Istio sidecar injection for Rok Operator.
  • Update Spark to v1beta2-1.3.2-3.1.1 for K8s 1.22.
  • Configure Dex to run as StatefulSet and use SQLite3 as its storage backend.
  • Upgrade kubectl in rok-tools to 1.22.13.
  • Support Kubernetes 1.22 on EKS.
  • Drop support for Kubernetes 1.19 and 1.20.
  • Support Kubernetes 1.22 on GKE.
  • Add the new Snapshots tab in the KFP UI.
  • Serve arbitrary models stored in an object storage service with Kale.
  • Introduce Kale API support for the LightGBM framework regarding MLMD logging and serving.
  • Create upgrade guides for ALB Controller and EDNS.
  • Add user guide for serving LightGBM models.
  • Add user guide for creating custom ServingRuntimes on KServe.
  • Add user guide for serving PyTorch models.
  • Add user guide for serving Python functions with the Triton Inference Server.

Bug Fixes

  • Fix Kyverno deployment to support scaling up from zero.
  • Fix the section for accessing an EKS cluster from an EC2 instance.
  • Fix error handling in safe_{atoll, atoull}().
  • Fix race condition in Document updates on the Rok etcd v3 store.
  • Fix the rok_etcd3_client to return the etcd revision in CAS/CAD actions.
  • Prevent libkiwi from printing log messages for non-CUDA processes.
  • Set correct permissions for the Kiwi sockets and directory.
  • Fix a bug in rok-deploy where the eks-access task fails due to a missing question.
  • Accurately restore the YAML of the original notebook in the rok-notebook-restore script.
  • Fix PasswordInputQuestion to not use the prompt mechanism of the CLI arguments.
  • Fix references to the onboarding guides in our internal docs.
  • Fix references to the ‘Proxy via Jumphost (SOCKS5)’ guide in our installation guides.
  • Fix dialog package to support large lines in multiline input.
  • Fast-forward deployment tasks that have already run in rok-deploy and remember users’ choice for the verify section.
  • Fix rok-probes Python 2 compatibility.
  • Fix an issue in the Rok Kubernetes client that caused it to hang when waiting for a StatefulSet to scale to zero.
  • Stop using headless services for connecting to Rok external services (etcd and Redis).
  • Use more secure hostPath mounts in Kiwi.
  • Fix a bug where the Kiwi scheduler erroneously thinks that a client is holding the lock after the state transition ON->OFF->ON.
  • Fix regression in the LVM host paths we bind mount into the csi-node Pods.
  • Update the rok-do Debian Buster base image to use 20220821 snapshot APT repositories.
  • Remove unused packages from Rok images to remove CVEs.
  • Fix rok-deploy so that it can apply multiple kustomize packages.
  • Fix bugs in the rok-image-mirror and rok-k8s-drain tools of rok-tools.
  • Forward the original status and message when an error occurs in a local Rok control client.
  • Update the context of the Rok controller on demand when proxied requests that modify state succeed.
  • Update the tutorials section in our docs homepage so that links and descriptions are the latest ones.
  • Update the rok-do Debian Buster base image to use 20220909 snapshot APT repositories.
  • Use a distinct name for the ConfigMap of the Cluster Autoscaler to work around a Kustomize issue with replacing resource references in ClusterRoles.
  • Do not send the Content-Length header in responses with a status code of 1xx or 204.
  • Fix broken question for etcd cluster size in rok-deploy.
  • Always perform quorum reads in etcd v2 API to avoid receiving stale data.
  • Fix a bug where Kale pipeline steps failed to retrieve their ancestor step Pods.
  • Fix failing verification step of the cli-aws task in rok-deploy due to missing default region.
  • Fix undesired field-pruning in our CRDs.
  • Fix a bug where rok-deploy would not commit the decrypted dockerconfig in the fast-forward path of the arriktoreg task.
  • Fix invalid format errors for unencrypted SSH keys in rok-deploy.
  • Track missing rendered envfiles.
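
The Content-Length fix listed above matches the HTTP/1.1 rule (RFC 7230, Section 3.3.2) that a server must not send a Content-Length header in any 1xx or 204 response. The following is a minimal sketch of that rule only. The function name and the dict-based header representation are hypothetical and are not taken from the Rok code base.

```python
from http import HTTPStatus


def strip_forbidden_content_length(status_code, headers):
    """Return a copy of `headers` without Content-Length for 1xx and 204.

    Hypothetical helper for illustration; header names are matched
    case-insensitively because HTTP header names are case-insensitive.
    """
    is_informational = 100 <= status_code < 200
    is_no_content = status_code == HTTPStatus.NO_CONTENT
    if is_informational or is_no_content:
        return {
            name: value
            for name, value in headers.items()
            if name.lower() != "content-length"
        }
    # All other status codes keep their headers untouched.
    return dict(headers)


# Example: the header is dropped for 204 but kept for 200.
print(strip_forbidden_content_length(204, {"Content-Length": "0", "ETag": "abc"}))
print(strip_forbidden_content_length(200, {"Content-Length": "5"}))
```

The helper returns a new dict instead of mutating its input, so a caller can apply it as the last step before serializing a response.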

Version 1.5.3 (Ultramarine)

(Released Wed, 03 Aug 2022)

New Features

  • Extend Rok CSI to reserve enough COW space to allow for at least one snapshot per node to make progress.
  • Add new VolumeSnapshotClass parameter named rok/unpin-snapshot-cow-size to control the COW size used for unpin snapshots.

Bug Fixes

  • Fix Models Web App issue that caused model deletion to fail.
  • Update Kubeflow images to eliminate various CVEs.
  • Update google-cloud-sdk to version 393.0.0.
  • Fix CSI etcd Job handling code to prevent spawning multiple instances of the same job.
  • Fix CSI etcd Job handling code to decrease the timeout in case of stale revision errors, instead of restarting it.
  • Fix CSI to properly clean up the snapshot device stack for the volume if a snapshot fails.
  • Do not parse Rok URLs as Jinja2 templates in Rok CSI.
  • Improve filtering of PVCs when creating Rok snapshots.

Version 1.5.2 (Ultramarine)

(Released Tue, 28 Jun 2022)

New Features

  • Support kernel version 5.4.196-108.356.amzn2.x86_64 for node groups on EKS.
  • Revamp Rok controller to quickly accept hundreds of new connections from individual clients.

Bug Fixes

  • Fix upstream bug in the kernel module used for changed block tracking which could cause Rok snapshots to fail.
  • Update the rok-do Debian base image to use 20220616 snapshot APT repositories.
  • Eliminate critical CVEs from etcd v3.3.27 by using an Arrikto-provided image.
  • Fix CVE-2018-20060.
  • Fix GC race when sealing epochs.
  • Fix GC race when updating current epoch.
  • Fix Kale’s local execution of pipelines with conditional statements.
  • Fix the following Go-related CVEs: CVE-2022-26945, CVE-2022-30321, CVE-2022-30322, CVE-2022-30323.
  • Fix Rok CSI to set the VolumeContentSource field of the CreateVolumeResponse as expected by the external-provisioner sidecar.

Version 1.5.1 (Ultramarine)

(Released Thu, 02 Jun 2022)

New Features

  • Introduce an operations guide on changing the EBS volume type of Rok etcd from gp2 to io1.
  • Introduce feature documentation on scaling a cluster to 300 nodes on EKS.

Bug Fixes

  • Fix rok-kubernetes client bug when reloading service account tokens, which leads to Rok Gateway snapshots failing.
  • Update the rok-do Debian Buster base image to use 20220622 snapshot APT repositories.

Version 1.5 (Ultramarine)

(Released Fri, 27 May 2022)

New Features

  • Reduce the time it takes for a RWX volume to become ready for mounting when used by hundreds of Pods.
  • Update Kale images with security updates, enhancements, and bug fixes.

Bug Fixes

  • Fix Rok CSI bug leading to RWO volumes getting attached to multiple nodes.
  • Drop support for Kubernetes 1.18 in rok-deploy.

Version 1.5-rc4 (Ultramarine)

(Released Wed, 25 May 2022)

New Features

  • Check the status of the RokCluster CR during the installation procedure.
  • Document the supported platforms for Rok Registry.
  • Improve checks for bucket prefix in our docs and add troubleshooting sections.
  • Improve Verify steps in snapshot-controller guide.
  • Improve Kale’s error catching and handling in notebook pipelines.
  • Improve the Kale JupyterLab extension dialogs, make it compatible with the @jupyterlab/git extension, and fix bugs.
  • Extend rok-cluster-gc to support the --controller-connect argument.
  • Support AMI releases 1.19.15-20220429, 1.20.11-20220429 and 1.21.5-20220429 [kernel version 5.4.188-104.359.amzn2.x86_64] for node groups on EKS.
  • Support kernel version 5.4.190-107.353.amzn2.x86_64 for node groups on EKS.
  • Update various Kubeflow components to support bound service account tokens that are enabled on EKS Kubernetes 1.21 by default.
  • Update csi-node-driver-registrar CSI sidecar to version v2.5.1.
  • Update csi-attacher CSI sidecar to version v3.4.0.
  • Support storing the cluster configuration variables file (config.json) compressed on etcd.
  • Improve the performance of Rok etcd under large numbers of updates.
  • Start Rok’s cron service only when the cluster member is ready.
  • Update the instructions on how to enable the AuthService caching mechanism.
  • Add an operations guide on how to configure AuthService authentication methods.
  • Add an operations guide on how to configure the audiences that AuthService accepts.

Bug Fixes

  • Add verification step that ensures the subnets that the EKS node group will use all belong to the same availability zone.
  • Fix hardcoded output of the command that prints the rok-deploy version in Upgrade guides.
  • Update the rok-do Debian base image to use 20220517 snapshot APT repositories.
  • Update Dex to v2.30.3 to support bound service account tokens.
  • Ensure Rok Operator unblocks a drain operation by deleting all CSI Guard Pods in a cordoned node without Rok PVs.
  • Improve copy button functionality in docs.
  • Use the full VCS SemVer when producing the UI version.
  • Prevent removal of volumeattachment objects for volumes that fail to detach by upgrading the external-attacher CSI sidecar to version v3.4.0.
  • Fix Python Kubernetes client to reload tokens.
  • Ensure that Rok Operator calculates the correct CR phase based on the latest observed state.
  • Upgrade Rok etcd to version 3.3.27 which fixes a race condition that could lead to data loss when the server is shutting down.

Known Issues

  • Registering a StatefulSet through the GW driver does not work in Kubernetes 1.21, because we use a stale interface in the Python Kubernetes client.

Version 1.5-rc3 (Ultramarine)

(Released Mon, 09 May 2022)

New Features

  • Add user guide on how to create short-lived tokens to authenticate external clients.
  • Update the introductory user guide regarding how authentication and authorization work in Arrikto EKF.
  • Support client-authentication with JWT access tokens.
  • Auto-detect proxy environment variables in Python Kubernetes client.
  • Update Cluster Autoscaler’s AWS EC2 Instances list.
  • Add user guide on Serving performance.

Bug Fixes

  • Don’t fail volume operations in Rok CSI if the PVC has been deleted.
  • Fix reproducibility of the rok-tasks that produce the Rok Docker images.
  • Fix issue that caused Rok Operator to report wrong CR status upon cluster creation.
  • Introduce Kubernetes resource quotas on Rok Registry to allow assigning the system critical priority classes to System Pods.
  • Add GKE overlays for Registry’s etcd and PostgreSQL external services.
  • Add readiness probe to the Registry operator.
  • Fix a bug where dnsmasq would fail if the member IDs for Rok Registry clusters were too long.
  • Mitigate CoreDNS flooding from Istio sidecars triggered by ExternalName services that Knative creates for Inference Services.

Version 1.5-rc2 (Ultramarine)

(Released Fri, 15 Apr 2022)

New Features

  • Support deleting a Rok account.
  • Introduce a new user-friendly format in the NEWS section.
  • Gather more logs in rok-gather-logs (e.g., the Cluster Autoscaler logs and the /root/.rok/log logs for all Rok Pods).
  • Support Kubernetes 1.21 on EKS.
  • Support Kubernetes 1.21 on GKE.
  • Deploy the Rok Monitoring Stack in MiniKF.
  • Add upgrade instructions for GKE.
  • Add guide to set up maintenance exclusions for GKE clusters.
  • Support kernel version 5.4.186-102.354.amzn2.x86_64 for node groups on EKS.
  • Add a JWT access token authentication method for AuthService, and allow admins to disable one or more authentication methods of AuthService.
  • Change the order of the authentication methods of AuthService.

Bug Fixes

  • Fix verifying the instance identity document in MiniKF on AWS.
  • Remove under construction documents.
  • Fix a cache contention issue that caused Rok Operator to mishandle cluster upgrades.
  • Prevent leaving behind locks without a client in Rok DLM.
  • Handle conflicts in manifests between local and upstream changes during upgrades.
  • Update the rok-do Debian base image to use 20220413 snapshot APT repositories.
  • Don’t fail volume operations in Rok CSI if the PVC has been deleted.

Version 1.5-rc1 (Ultramarine)

(Released Thu, 31 Mar 2022)

New Features

  • Introduce a new section, “Features”, in our docs.
  • Migrate Kale JupyterLab extension from JupyterLab v2.x to v3.x.
  • Add an operations guide on how to extend JWA preselected Configurations.
  • Introduce a Gunicorn application for Rok DDNS.
  • Create Debian packages for rok-ddns and rok-ddns-client.
  • Create a Docker image for Rok DDNS.
  • Validate the AWS MiniKF product code in the DDNS API backend.
  • Support CSI Spec v1.5.0.
  • Introduce optional admonition in docs.
  • Introduce under-construction admonition in docs.
  • Introduce feature documentation for notebook Docker images.
  • Introduce Jupyter Kale images with GPU support.
  • Add instructions for logging in to EKF via Azure AD (Active Directory).
  • Introduce feature documentation about our GitOps process.
  • Introduce a Kale API to log ML Models to MLMD.
  • Introduce feature documentation about the Rok snapshotting functionality.
  • Introduce feature documentation for the DistributedConfig class.
  • Add Configure Default Rok Registry URL operations guide.
  • Add the --list-tasks argument to rok-deploy to list all tasks and get information about them.
  • Add GCP support to Rok DDNS API.
  • Add GCP support to Rok DDNS client.
  • Perform verification of instance’s identity documents during MiniKF provisioning.
  • Update the way we verify roles with AWS-managed attached policies in our guides and in rok-deploy.
  • Use CloudFormation in the “Create EKS Managed Node Group” guide.
  • Add the --run-until TASK argument to deploy2, to allow stopping the installation after an arbitrary task.
  • Add readiness probe to rok-operator.
  • Support providing decrypted configs in configure-git and arriktoreg guides.
  • Document how AuthService performs authentication and authorization.
  • Bump AuthService image and patch the EnvoyFilter to allow the “Auth-Method” header.
  • Introduce Kale SDK loops.
  • Introduce a dialog in the Kale labextension to configure the deploy configuration of every step.
  • Introduce a “Show Logs” button in rok-deploy if the verification of a task fails.
  • Extend Rok csi-controller to return RESOURCE_EXHAUSTED gRPC status when there is not sufficient free space to create a volume.
  • Bump version of Rok VolumeSnapshotClass to v1beta1.
  • Upgrade csi-snapshotter to v3.0.3.
  • Update rok-kubernetes client to use the v1beta1 Volume Snapshots API.
  • Deploy Snapshot Controller along with the v1beta1 CRDs for VolumeSnapshots, VolumeSnapshotClasses, and VolumeSnapshotContents.
  • Use v1beta1 VolumeSnapshot CRDs for volume snapshots throughout Rok.
  • Automate the “Deploy Rok Disk Manager” guide.
  • rok-csi: Track the free storage space of each node as an annotation on the Node API object.
  • Introduce the Kale distributed LR scheduler example.
  • Automate the “Deploy Snapshot Controller” guide.
  • Add user guide for Kale’s typing system.
  • Support accessing Rok from outside the Kubernetes cluster using the Rok Python client and CLI.
  • Add documentation for Kale steps with keyword arguments.
  • Integrate upstream KFP caching mechanism with EKF.
  • Deliver Python3.8 Kale images in EKF.
  • Document security practices across Rok and EKF.
  • lvmd: Parse multi-line lvcreate errors.
  • rok-csi: Track the max storage space of each node as a label on the Node API object.
  • Add the Rok Monitoring Stack to the service mesh of the EKF cluster.
  • Expose Rok’s Grafana publicly at /monitoring/.
  • Support displaying Rok’s Grafana inside Kubeflow’s central dashboard.
  • Document how to allow specific users to access the Rok Monitoring Stack as admins.
  • Bump Rok’s Grafana to 7.4.0 and set the Rok dashboard as the default one.
  • Use IAM role for service account for FluentD.
  • Disable IMDSv1 on EKS managed node groups.
  • Support both custom and auto-detected AWS CLI credentials.
  • Ask user’s confirmation for AWS_ACCOUNT_ID and remove AWS_PROFILE.
  • Improve cards in the CentralDashboard main view.
  • Add a user guide on invoking an InferenceService using an external client.
  • Introduce the Rok Scheduler which enables capacity-aware scheduling of Pods with Rok volumes.
  • Introduce the Rok Scheduler Webhook which mutates Pods to use the Rok Scheduler.
  • Extend the Rok Scheduler Webhook to admit Pods only in selected namespaces.
  • Extend Cluster Autoscaler to take into consideration Rok storage capacity.
  • Extend Cluster Autoscaler to take into consideration Rok storage utilization when scaling in a cluster.
  • Always emit events on the cluster CR when the cluster’s health changes.
  • Support Kubernetes 1.20 on EKS.
  • Use PodDefaults admission webhook that handles certificate renewal.
  • Support AMI release 1.18.20-20220123 [kernel version 4.14.256-197.484.amzn2.x86_64] for node groups on EKS.
  • Support AMI releases 1.19.15-20220123 and 1.20.11-20220123 [kernel version 5.4.172-90.336.amzn2.x86_64] for node groups on EKS.
  • Support kernel version 5.4.176-91.338.amzn2.x86_64 for node groups on EKS.
  • Update MiniKF’s provision script to deploy the Snapshot Controller along with the v1beta1 CRDs for VolumeSnapshots, VolumeSnapshotClasses, and VolumeSnapshotContents.
  • Support retrieving the subtasks of a task in the Rok API.
  • Support retrieving the subtasks of a task in the Rok Python client.
  • Support displaying the subtasks of a task in the Rok CLI.
  • Extend the docs with instructions to retrieve the logs of a Rok API task via the CLI and Python client.
  • Restore support for the KFServing PMML predictor securely and update PyTorch predictor images for security reasons.
  • Enable all-namespaces support in VolumesWebApp and TensorboardWebApp.
  • Enable Istio sidecar injection in KFP steps.
  • Include Kale-powered VSCode images in Jupyter Web App.
  • Support gathering only the latest logs.
  • Support gathering logs from selected nodes.
  • Show elapsed time in rok-gather-logs.
  • Show progress of archiving operations in rok-gather-logs.
  • Add ops guide on cleaning up MinIO.
  • Add support for Rok Scheduler on Kubernetes 1.20.
  • Add support for Arrikto-provided Cluster Autoscaler on Kubernetes 1.20.
  • Drop support for Cluster Autoscaler on Kubernetes 1.18.
  • Add instructions for upgrading Volume Snapshot CRDs to v1beta1.
  • Add toleration for Cluster Autoscaler’s ToBeDeletedByClusterAutoscaler:NoSchedule taint on Rok DaemonSets.
  • Only retain the last 10K successful tasks and 20K tasks total in the Rok API.
  • Add upgrade guide for Cluster Autoscaler and Rok Scheduler.
  • Automate the “Deploy Rok Scheduler” guide.
  • Improve account handling in Rok’s frontend.
  • Allow the KF CentralDashboard to send namespace information to apps opened by other apps, such as the Rok file chooser opened from the JWA.
  • Add user guide about monitoring Rok etcd.
  • Drop support for Kubernetes 1.18.
  • Introduce support for default values in Kale SDK step arguments.
  • Upgrade kubectl in rok-tools to 1.20.15.
  • Support AMI releases 1.19.15-20220309 and 1.20.11-20220309 [kernel version 5.4.181-99.354.amzn2.x86_64] for node groups on EKS.
  • Remove memory and CPU limits from Rok Scheduler Webhook’s deployment and specify requests instead.
  • Specify memory requests on Rok Scheduler deployment.
  • Support Kubernetes 1.20 on GKE and AKS.
  • Support Ubuntu bionic kernel 5.4.0-1059-gke for GKE.
  • Support Ubuntu bionic kernel 5.4.0-1070-azure for Azure.
  • Add user guide about monitoring the free space on EKF cluster nodes.
  • Introduce an operations guide to create custom Kale images.
  • Introduce an operations guide to configure a custom Kale Python image.
  • Add feature docs on creating a notebook with a Kale VSCode image.
  • Introduce script to enable Istio sidecar injection for existing inference services.
  • Fix an issue where the Rok election daemon would delete its DLM client without having cleaned up its locks, which prevented the election of a new master.
  • Introduce Kale CLI arguments to upload a (shared) pipeline.
  • Use Istio sidecar in all inference services.
  • Add AuthService caching mechanism.

Bug Fixes

  • Improve copy functionality for Firefox in docs.
  • Support logging in via a Service Principal in the Azure installation docs.
  • Restructure the Verify section of the “Create ACM Certificate” guide.
  • Improve nested toggle-able admonitions.
  • Protect Tensorboard and PVCViewer controllers from OOM conditions.
  • Restrict the total size of log files in Rok Pods to prevent them from running out of disk space.
  • Extend Rok Scheduler Webhook to log the ‘generateName’ of a Pod.
  • Make the Rok client compatible with recent versions of the Python Kubernetes client.
  • Improve action button positioning in docs.
  • Do not enable auto-registration of volume snapshots on MiniKF by default.
  • Fix AuthService integration with Cognito.
  • Fix a race condition that could cause Rok API task garbage collection to leave behind stale task logs.
  • Clean up subtask directories when deleting Rok API tasks.
  • Fix an issue where running Rok API task garbage collection could cause policies to run earlier than their schedule.
  • Fix csi-provisioner sidecar to remove the volume.kubernetes.io/selected-node annotation from an unbound PVC if the node no longer exists.
  • Do not garbage collect Rok API tasks if the provided retention interval is zero.
  • Work around upstream bug in snapshot-controller which prevents the deletion of volume snapshots.
  • Fix a bug that occurred in the file info page in a Kubeflow environment when a user reloads it.
  • Fix Cluster Autoscaler to sanitize node label on template nodes.
  • Improve the cleanup logic of the Rok election daemon, to ensure the Rok cluster can always elect a new master node.
  • Fix applying a retry strategy for pipeline steps from the Kale SDK.
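
The csi-provisioner fix listed above concerns the volume.kubernetes.io/selected-node annotation on unbound PVCs. As a rough illustration only, the check can be sketched on plain dictionaries; the function name and the dict-based PVC shape are assumptions, and this is not the sidecar's actual (Go) implementation.

```python
SELECTED_NODE = "volume.kubernetes.io/selected-node"


def clear_stale_selected_node(pvc, existing_nodes):
    """Drop the selected-node annotation from an unbound PVC whose node is gone.

    `pvc` is a dict shaped like a PersistentVolumeClaim manifest and
    `existing_nodes` is a set of node names. Returns True if the
    annotation was removed. Hypothetical helper for illustration.
    """
    annotations = pvc.get("metadata", {}).get("annotations") or {}
    node = annotations.get(SELECTED_NODE)
    bound = pvc.get("status", {}).get("phase") == "Bound"
    # Leave the PVC alone if it has no selected node, is already bound,
    # or the selected node still exists.
    if node is None or bound or node in existing_nodes:
        return False
    del annotations[SELECTED_NODE]
    return True


if __name__ == "__main__":
    pvc = {
        "metadata": {"annotations": {SELECTED_NODE: "node-a"}},
        "status": {"phase": "Pending"},
    }
    print(clear_stale_selected_node(pvc, {"node-b"}), pvc["metadata"]["annotations"])
```

Removing the annotation lets the scheduler pick a new node for the Pod that uses the claim, rather than waiting on a node that no longer exists.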

Version 1.4.4 (Titanium)

  • Fix MiniKF breakage caused by git fixing CVE-2022-24765.
  • Pin pip dependencies of the MiniKF provision script.
  • Migrate legacy RWX volumes to new Rok releases.
  • Don’t fail volume operations in Rok CSI if the PVC has been deleted.
  • Update the base Ubuntu Bionic image MiniKF uses on GCP to v20220419 and on AWS to v20220411.
  • Update the APT key MiniKF uses for Nvidia APT repositories.
  • Improve performance of instantiating the ClusterConfig.
  • Improve performance of validating the ClusterConfig.
  • Support batch adding members to a Rok cluster.
  • Support batch removing members from a Rok cluster.
  • Make retrieval of join metadata for multiple cluster members more efficient.
  • Rework Rok Operator to add multiple, new members to the cluster in bulk.
  • Rework Rok Operator to remove multiple members from the cluster in bulk.
  • Add devguide to update base image of MiniKF.
  • Replace EULA with a Terms of Service (ToS).
  • Fix bug that resulted in rok-config consuming lots of memory.
  • Update MiniKF docs for releasing on GCP through Producer Portal.
  • Use authoritative DNS servers to verify that MiniKF FQDN is resolvable on GCP.
  • Allow specifying the Dex (login) and AuthService (logout) theme on MiniKF.
  • Allow specifying the Central Dashboard overlay on MiniKF.
  • Update Dex to include a KFaaS-specific theme.
  • Support KFaaS-specific login/logout themes and Central Dashboard on MiniKF.
  • Remove links from the MiniKF release information that point to MiniKF references in the Kubeflow website.
  • Upgrade containerd in MiniKF to fix pods randomly getting into crash loops.
  • Update EKF tutorial descriptions.

Version 1.4.3 (Titanium)

  • Fix accounting for tolerations when the Rok Operator scales a Rok cluster.
  • Add support for templated resources in Profile Controller.
  • Fix Cluster Autoscaler to sanitize node label on template nodes.
  • Update the copyright date in the frontend.

Version 1.4.2 (Titanium)

  • Introduce a memory request for the Rok etcd Pod.

Version 1.4.1 (Titanium)

  • Get join metadata for a Rok cluster member with a single etcd request.
  • Fix KFP bug when showing logs of failed steps.
  • Increase Rok Operator’s verbosity during the reconciliation loop.
  • Add argument for the number of workers to the task-gc command of the Rok API management tool.
  • Reduce the number of Rok Thrower updates to etcd for version stats that have not changed.
  • Rename the ROK_PORT and ROK_PORTAL environment variables to ROK_PU_PORT and ROK_PU_PORTAL, to avoid a conflict with Kubernetes service environment variables.
  • Fix an issue where the tool performing garbage collection of Rok API tasks failed due to a missing environment variable.
  • Set a memory limit to Rok’s etcd.
  • Rename Rok’s Node Exporter cluster-scoped RBAC resources to avoid conflicts with Knative by adding the ‘rok-’ prefix to their names.
  • Fix an issue with parsing the revision number in Rok versions.
  • Support rendering the Kubeflow manifests, replacing the release-2.0-l0-release-2.0.2 string.
  • Replace Rok 2.0.2 "Aurora" (release - release-2.0) (iliastsi@rok-dev) (GCC 6.3.0) 2023-03-31T13:49:56Z and 2.0.2 when rendering manifests.
  • Fix calculating short Rok versions when the commit hash is missing.
  • Upgrade to a CentralDashboard image that exposes the EKF version in the dashboard’s sidebar.
  • Introduce devguide for installing WSL2.
  • Add WSL2 troubleshooting for multiple NAT.
  • Add instructions for logging in to EKF via AWS Cognito.
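
The ROK_PORT rename above avoids a clash with the environment variables that Kubernetes injects into Pods for each Service. The sketch below lists the standard names generated for a Service; the Service name `rok` and the port number are assumptions, since the changelog does not say which Service caused the conflict.

```python
def service_env_names(service_name, port, protocol="tcp"):
    """Return the env var names Kubernetes injects for one Service port.

    The Service name is upper-cased and dashes become underscores.
    Illustrative helper, not part of Rok.
    """
    prefix = service_name.upper().replace("-", "_")
    port_prefix = f"{prefix}_PORT_{port}_{protocol.upper()}"
    return {
        f"{prefix}_SERVICE_HOST",
        f"{prefix}_SERVICE_PORT",
        f"{prefix}_PORT",  # Docker-links style, e.g. tcp://10.0.0.1:8080
        port_prefix,
        f"{port_prefix}_PROTO",
        f"{port_prefix}_PORT",
        f"{port_prefix}_ADDR",
    }


if __name__ == "__main__":
    injected = service_env_names("rok", 8080)  # assumed Service name and port
    for name in ("ROK_PORT", "ROK_PU_PORT"):
        print(name, "collides" if name in injected else "is safe")
```

Under these assumptions, ROK_PORT would be overwritten with a `tcp://...` URL, while the renamed ROK_PU_PORT falls outside the generated names.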

Version 1.4 (Titanium)

  • bootstrap: Rename --no-check to --validate/--no-validate.
  • Add Verify and Troubleshooting sections in the AKS docs to ensure that managed identities are enabled on AKS clusters.
  • Remove dpkg-dev, apt-utils and bzip2 from our images.
  • Use reproducible base images for Debian, Ubuntu, AmazonLinux, and CentOS.
  • Use a single task to build all bootstrap images.
  • Tag MiniKF image with labels.
  • Fix a bug to prevent a key error in a CloudFormation stack status.
  • Add a Check Kubernetes Version section in our upgrade guides.
  • Restructure the “Configure Access to Arrikto’s Private Registry” guide and add a verify section.
  • Introduce a developer guide for installing and configuring Docker.
  • Introduce arrikto-admin admonition in docs.
  • Introduce fast-forward admonition in docs.
  • Improve nested lists style in docs.
  • Introduce custom design in nested lists in docs.
  • Move the cleanup instructions to the top level of the docs.
  • Split the cleanup instructions into separate documents that clean up apps, the RokCluster, identities, storage, and the Kubernetes cluster itself.
  • Improve the structure of our cleanup documents.
  • Add support for Azure in our cleanup documents.
  • Make the Kubeflow cleanup instructions part of the Rok cleanup guide.
  • Extend the docs with Azure CLI instructions for creating an AKS cluster.
  • Extend the docs with Azure CLI instructions for attaching disks to nodes.
  • Extend the docs with Azure CLI instructions for creating a storage account.
  • Extend the docs with Azure CLI instructions for creating a Managed Identity.
  • Introduce internal ops guide for new customer onboarding.
  • Introduce internal ops guide for Team Member onboarding to AWS.
  • Improve anchor links scroll behavior in docs.
  • Introduce persistent state for toggles and admonition directives.
  • Extend the Azure docs to add tags in storage accounts.
  • Fix autofill suggestions in presentation policies in Rok UI.
  • Add account management in Rok deployments.
  • Update the Kubeflow guide to not deploy AuthService or Dex.
  • rok-csi: Extend GC to handle LIO devices.
  • Remove the section about draining CSI nodes from the upgrade instructions.
  • Add user guide for Kale JupyterLab extension.
  • Add Verify section for Azure in the “Authorize Access to Object Storage” guide.
  • Restructure the ‘Hot-Patch an Arbitrary Image in Your Deployment’ section of the ops docs.
  • Render Jinja2 YAML templates when rendering manifests.
  • Fix regression causing slow Rok CSI reboots.
  • rok-csi: Wait until LIOd has been fully initialized.
  • Introduce persistent state for tab directive.
  • Restructure “Create Kubernetes Cluster on AWS”.
  • Explicitly specify in the Sphinx configuration file the paths included per tag for doc builds.
  • Fix the default CLI help message of True/False questions.
  • Support environment files as a new input source for answering questions.
  • Log question related events at INFO level.
  • Automate the “Clone GitOps Repository” guide.
  • Add a restore mechanism for retrieving answered questions from the deployment context.
  • Add a save mechanism for storing answers to questions in the deployment context.
  • Automate the “Configure Access to Arrikto’s Private Registry” guide.
  • Add a fast-forward admonition for the ‘Configure Access to Arrikto’s Private Registry’ guide.
  • Extend literalinclude directive in docs.
  • Support network-accessible RWX volumes in rok-csi.
  • Support adding EBS volumes to managed node groups.
  • Fix doc builds to retrieve correct Rok version info from a vcs-version file.
  • Restructure the “Create Cloud Identity” guide and improve the verify section.
  • Restructure the “Authorize Access to Object Storage” guide and add a verify section.
  • Restructure the “Grant Rok Access to Private Docker Registry” guide and add a verify section.
  • Automate the “Create Cloud Identity” guide.
  • Automate the “Authorize Access to Object Storage” guide.
  • Automate the “Grant Rok Access to Private Docker Registry” guide.
  • Restructure the “Deploy Kubeflow” section so that it follows structure and writing guidelines.
  • Add styles for the :guilabel: role in docs.
  • Add instructions for logging in to EKF via the Okta Provider.
  • rok-csi: Fix RWX volumes becoming unresponsive after restarting the rok-csi-node Pod.
  • Add the mechanism to save and restore the context of the docs.
  • Restructure the “Set Up Users for Rok” guide and add a verify section.
  • Improve the verify section of the “Deploy Rok Components” guide.
  • Improve literalinclude directive’s output when rendering diffs.
  • Automate granting access to Rok and Kubeflow Pipelines to user namespaces using skel resources.
  • Add design doc for the skel controller.
  • Update Kale images to work with KF 1.4.
  • Introduce user guides for the Kale integration with the Kubeflow PyTorch Operator.
  • Support disabling automatic Profile creation upon login.
  • Automate the “Configure Git” guide.
  • Automate the “Configure AWS CLI” guide.
  • Support patching the Kale Python image to use in manifests with rok-image-patch.
  • Handle all image references of KFServing in air-gapped deployments.
  • Handle deleted resource types in rok-deploy --delete.
  • Add validation checks when restoring the deployment context of a task.
  • Automate the “Set Up Cloud Environment for AWS” guide.
  • Automate the “Create VPC” guide for AWS.
  • Automate the “Configure Subnets” guide for AWS.
  • Fix broken hidden-literalinclude directive.
  • Introduce user guides for Rok.
  • Introduce user guides for the Kale support for pipeline conditionals, and the use of volumes for data passing.
  • Support unpinning of RWX volumes in rok-csi.
  • Add Verify section for AWS in the “Authorize Access to Object Storage” guide.
  • Fix upgrade guide to first apply the new CRD and then the new CR.
  • Introduce user guides for the Kale support for Kubernetes metadata and spec configuration of pipeline steps.
  • Extend user guides with Kale-KFServing integration docs.
  • Use NFSv4 for RWX volumes in rok-csi to support file locking.
  • Add rok/rwx-enable-local-access annotation to disable the local access optimization for RWX volumes in rok-csi.
  • Prune stale resources after upgrading to Kubeflow 1.4.
  • Support filtering Notebooks by image in rok-notebook-upgrade.
  • Extend rok-do to build the access server image that Rok CSI uses to provide RWX volumes on Kubernetes.
  • Make the skel controller ignore the status field of Kubernetes objects.
  • Include version information in all Rok API service driver calls.
  • Make Rok API tasks impersonate the ‘rok-task-runner’ service account in their namespace, instead of the last user that created or updated them.
  • Fix stale references to Dex and AuthService Rok manifests.
  • Introduce a Kubernetes controller for Rok policies.
  • Introduce an operations guide for setting a culling policy for your Notebook Controller.
  • Add ops guide for setting up a backup policy in DLM for the EBS volume that Rok etcd uses.
  • Introduce user guides for Kale container-based steps.
  • Restructure the cluster-autoscaler kustomization package and configure it using j2.
  • Patch Cluster Autoscaler to support scale-in operations in clusters running Rok.
  • Revamp Rok Monitoring Stack to work with Kubernetes 1.19 and 1.20.
  • Extend rok-deploy to support server-side applying resources to Kubernetes.
  • Deploy Rok Monitoring Stack using server-side apply.
  • Always deploy Rok Monitoring Stack on Kubernetes using rok-deploy.
  • Upgrade Kale to support numerous new features and fix bugs.
  • Extend the Rok policy controller to add finalizers to policies it manages.
  • Fix VerifyPasswordInputQuestion to respect question attributes.
  • Use NFSv4.2 and non-privileged NFS ports to prevent using stale conntrack entries after migrating the NFS server of a RWX volume in rok-csi.
  • rok-csi: Don’t mix pods accessing a RWX volume over NFS with pods accessing it locally.
  • rok-csi: Track the nodes where a volume is staged to work around a Kubernetes bug which results in unpublishing in-use volumes.
  • Introduce optional field spec.images.rokAccessServer in the RokCluster CR.
  • Support auto-recovery of RWX volumes in rok-csi.
  • rok-csi: Work around NodeStageVolume Kubernetes bug.
  • Add support for tolerations in the RokCluster CR.
  • Add fast-forward support in deploy2.
  • Automate the “Create EKS Cluster IAM Role” guide.
  • Automate the “Create EKS Node IAM Role” guide.
  • Automate the “Create EKS Cluster” guide.
  • Automate the “Enable IAM Roles for Kubernetes Service Accounts” guide.
  • Automate the “Access EKS Cluster” guide.
  • Automate the “Create EKS Node Group” guide.
  • Introduce an Ops guide to create a default snapshot policy for notebooks.
  • Introduce an Ops guide to create a snapshot policy for Kubeflow PVCs.
  • Improve toggle formatting in docs.
  • Automate the “Set Up Users for Rok” guide.
  • Automate the “Deploy Rok Components” guide.
  • Automate the “Set Up Rok Storage Class” guide.
  • Automate the “Install Kubeflow” guide.
  • rok-csi: Use common labels for Rok Access Server StatefulSet and Service.
  • Add an AuthorizationPolicy client in our Rok Kubernetes clients.
  • rok-csi: Restrict access to Rok Access Server using Istio Authorization Policy.
  • Automate the “Integrate Rok with Kubeflow Dashboard” guide.
  • Improve the Verify section of the “Authorize Access to Object Storage” guide to detect authorization errors if the bucket does not exist.
  • Automate the “Create Hosted Zone” guide.
  • Automate the “Create IAM Role for ExternalDNS” guide.
  • Improve the display of task logs for tasks with large numbers of log lines in Chromium browsers.
  • Improve handling of unknown labels in navigation buttons in the docs.
  • Automate the “Deploy ExternalDNS” guide.
  • Support hiding the first paragraph of admonitions on demand.
  • Automate the “Create ACM Certificate” guide.
  • Improve nested lists styles in docs.
  • Automate the “Deploy cert-manager” guide.
  • Automate the “Create IAM Role for AWS Load Balancer Controller” guide.
  • Automate the “Deploy AWS Load Balancer Controller” guide.
  • Automate the “Deploy NGINX Ingress Controller” guide.
  • Improve explicit numbering in numbered lists in docs.
  • Improve the style of tabs inside admonitions.
  • Automate the “Expose Istio” guide.
  • Support running rok-k8s-reboot in air-gapped environments.
  • Fix EKS_IAM_{CLUSTER, NODE}_ROLE variables in docs and j2 templates.
  • Remove exports from the CF stacks for the EKS cluster/node IAM roles.
  • Provide clearer messages for the fast-forward path in rok-deploy2.
  • Standardize deploy2 logic for rok-deploy.
  • Fix new line omission when rendering jinja templates in deploy2.
  • Fix LOW priority questions in the fast-forward path of deploy2.
  • Automate the “Deploy Cluster Autoscaler on AWS” guide.
  • Split some Rok guides into parent-fork structure.
  • Extend the Rok CSI GC code to check the system state and reconcile the ‘staged’ list accordingly.
  • rok-csi: Fix ControllerPublishVolume leaving behind stale NFS server Pods.
  • Split the AWS VPC guide into 2 guides for VPC creation and subnets configuration.
  • Use edit commands in Deploy Autoscaler guide instead of kustomize edit.
  • Update Kubeflow manifests to fix KFP UI bugs.
  • Extend the Frontend module to support writing a summary to stderr at the end of the execution.
  • LVMd: Fail volume creation if hydration is stuck for more than one minute.
  • Update Kale images to introduce a PyTorch distributed example, support the new ML Notebook driver, and fix a KFP client credentials initialization bug.
  • Update the Test Rok section of the installation docs to deploy an application in the user’s rather than the default namespace, so it is compatible with the task authentication changes introduced in Rok 1.4.
  • rok-csi: Add 30 minute timeout on volume lock acquisition for snapshots.
  • Improve the Rok driver for Jupyter Notebooks to handle Notebook CRs instead of Pods.
  • Add toleration to ensure that RWX volumes work on GPU dedicated nodes.
  • Patch knative-serving Deployments and set the safe-to-evict annotation to true.
  • Support deploying Rok monitoring stack in air-gapped environments.
  • Restructure subnet configuration.
  • Minify the Node Exporter Grafana dashboard JSON definition to avoid server-side applying the Rok Monitoring Stack.
  • Fix a rendering bug in rok-deploy for the “Create IAM Role for Cluster Autoscaler” task.
  • Fix LIOd waiting forever for the TCM loop device to appear.
  • Add save/restore mechanism to manual installation guides.
  • Use simulate-principal-policy to verify permissions of IAM role for ExternalDNS.
  • Use simulate-principal-policy to verify permissions of IAM role for AWS Load Balancer Controller.
  • Use CloudFormation in the “Create IAM Role for ExternalDNS” guide.
  • Use CloudFormation in the “Create IAM Role for AWS Load Balancer Controller” guide.
  • Use simulate-principal-policy to verify permissions of IAM role for EKS Cluster and EKS Node IAM Role guides.
  • Use CloudFormation in the “Create Hosted Zone” guide.
  • Support using existing hosted zones in the “Create Hosted Zone” guide.
  • Update Verify section in the “Create Hosted Zone” guide.
  • Save the names of CloudFormation stacks.
  • Add missing environment variables to some questions in deploy2.
  • Support AMI releases 1.18.20-20211001, 1.18.20-20211003 and 1.18.20-20211004 [kernel version 4.14.246-187.474.amzn2.x86_64] for node groups on EKS.
  • Support AMI releases 1.18.20-20211008 and 1.18.20-20211013 [kernel version 4.14.248-189.473.amzn2.x86_64] for node groups on EKS.
  • Support AMI releases 1.19.13-20211001, 1.19.13-20211003 [kernel version 5.4.144-69.257.amzn2.x86_64] for node groups on EKS.
  • Support AMI releases 1.19.13-20211004, 1.19.14-20211008 and 1.19.14-20211013 [kernel version 5.4.149-73.259.amzn2.x86_64] for node groups on EKS.
  • Add fast-forward admonitions to manual installation guides for AWS.
  • Fix Kale image building breakage for Python versions other than 3.6.
  • Add the --run-from TASK argument to deploy2, to allow starting the installation from an arbitrary task.
  • Remove CloudFormation stack names from the deployment context.
  • Improve the ‘Set Up Rok Storage Class’ guide and introduce a verification step.
  • Update Verify section in the “Deploy Rok Components” guide.
  • Specify trusted CIDRs for both internal and internet-facing ALBs.
  • Restructure the “Configure Git” guide.
  • Add ops guide on how to gather logs for troubleshooting.
  • Restructure the “Clone GitOps Repository” guide.
  • Fix a bug when creating snapshots of notebooks with emptyDir volumes.
  • Add missing export for subnets env var in the f-f section of the “Create EKS Managed Node Group” guide.
  • Fix a bug in the NodeUnstageVolume Rok CSI method that could result in stale (not deactivated) volumes.
  • Work around NFS kernel bugs that could result in leaving behind stale knfsd threads, preventing rok-csi from deactivating and deleting a RWX volume.
  • Improve the Rok Registry installation docs.
  • Add instructions to snapshot a notebook using the Rok UI, command line and Rok Python client.
  • Fix some omissions in the fast-forward sections of our docs.
  • docs: Add operations guide about recovering RWX volumes after node failure.
  • Use CloudFormation in the “Create ACM Certificate” guide.
  • Introduce ops guides related to firewalling.
  • Fix typo in the “Gather Logs for Troubleshooting” guide.
  • Add an operations guide on how to issue Rok Registry tokens.
  • Refactor rok-deploy to use client-side apply for the Rok Monitoring Stack.
  • rok-csi: Don’t try to record events on non-existing resources.
  • Extend the docs with instructions to retrieve the logs of a Rok API task via the Rok UI.
  • Introduce a script to snapshot all notebooks in a Kubernetes cluster and publish them to a Rok Registry.
  • Add instructions to present a notebook programmatically.
  • Introduce an internal ops guide about how to share Rok wheels.
  • Add a configuration directory to rok-do.
  • Increase the initial delay for the liveness probe of etcd to handle slow startups.
  • Add an operations guide about how to create a privileged notebook server.
  • Introduce a script to restore all notebooks in the buckets of a Rok Registry user with a given prefix.
  • Do not back up notebooks that already have a snapshot, by default.
  • In the f-f path of rok-deploy, protect answers that already exist in the envfile so that runtime answers do not override them.
  • Move the helpers required to present a notebook to the rok_gw_client package.
  • Remove unnecessary env vars AWS_ACCOUNT and AWS_IAM_USER from AWS docs.
  • Fix bug where the fingerprint check emitted a wrong result in the “Configure Git” guide.
  • Fix bug where we trusted GitHub SSH keys without checking their fingerprints.
  • Suppress questions for AWS_ACCOUNT_ID and AWS_DEFAULT_REGION in task envvars-aws.
  • Extend the fast-forward section of the “Set Up Cloud Environment for AWS” guide to set up the environment context.
  • Fix Knative Serving to avoid potential Istio misconfiguration due to conflicting ports between Istio Gateways.
  • Fix a bug where the Rok file chooser did not highlight the selected file.
  • Fix subnets formatting in f-f sections of the “Create EKS Cluster” and “Create EKS Managed Node Group” guides.
  • Enable the ‘Kubernetes’ path on ‘rok-deploy’.
  • Automate the missing Verify steps for AWS in the “Authorize Access to Object Storage” guide.
  • Fix a Rok API bug when adding a Rok Registry token to a Rok account from within a Kubernetes cluster.
  • Implement a verification loop to improve the user experience when an error occurs in a verify section.
  • Update the rok-dev guide to expose port 8000 when running rok-dev.
  • Restructure the “Release Process” developer guide.
  • Support specifying SSH arguments in rok-do.
  • Support AMI releases 1.18.20-20211109, 1.18.20-20211117 and 1.18.20-20211206 [kernel version 4.14.252-195.483.amzn2.x86_64] for node groups on EKS.
  • Support AMI release 1.19.13-20211009 [kernel version 5.4.149-73.259.amzn2.x86_64] for node groups on EKS.
  • Support AMI releases 1.19.15-20211117 and 1.19.15-20211206 [kernel version 5.4.156-83.273.amzn2.x86_64] for node groups on EKS.
  • Fix the volume unpinning logic of rok-csi to make forward progress, even if some volumes fail to unpin.
  • Fix rok-do handling paginated results when working with GCP remotes.
  • Fix container ports in minikube overlay of nginx-ingress-controllers.
  • Add rok-do task to create an Ubuntu Bionic GCP image that uses ifupdown instead of netplan for network configuration and has predictable NIC names disabled.
  • Fix MiniKF to handle the rok-access-server Docker image like the rest of Rok images.
  • Do not configure namespace-resources in MiniKF.
  • Use path.Path instead of the deprecated path.path in MiniKF provision.py.
  • Support installing nvidia-460 in Ubuntu Bionic used by MiniKF.
  • Upgrade Minikube and Kubernetes in MiniKF to versions 1.23.2 and 1.19.15 respectively.
  • Support adding extra, user provided IPs to the AuthorizationPolicy of Rok access servers.
  • Introduce optional field spec.rokCSIControllerArgs in the RokCluster CR, and use it in the MiniKube overlay to add extra allowed IPs for Rok Access Servers.
  • Update scale-in guides to point to the Arrikto provided Cluster Autoscaler.
  • Extend the global_filter option in the host’s LVM configuration to ignore devices created by CSI/LVMd.
  • Relocate misplaced panels in Rok’s Grafana dashboard.
  • Fix Cluster Autoscaler to not ignore PV affinities when scaling out.
  • Fix Cluster Autoscaler to disable scaling in unready nodes.
  • Support switching user at the remote in rok-do.
  • Fix calling fsfreeze to snapshot remotes with rok-do.
  • Disable unattended kernel upgrades on MiniKF.
  • Add support for kernel linux-aws version 5.4.0.1060 in rok-kmod, used by MiniKF on AWS.
  • Reduce the logging output of the Rok S3 daemon under normal operation.
  • Drop support for the potentially vulnerable PMML predictor of KFServing.
  • Enable automatic Rok snapshot policies for Kubeflow system PVCs.
  • rok-csi: Don’t set force-remount flag when recovering an unused RWX volume.
  • Fix rok-csi to remove node from volume’s staged list if it fails to stage the volume.
  • Periodically collect stale volumes in rok-csi.
  • Upgrade NodeJS in KFP UI and Centraldashboard to eliminate CVEs.
  • Add user guide for Rok Disk Manager.
  • Upgrade the S3Proxy image in Azure deployments to include the latest version of log4j.
  • Automate “Gather Logs for Troubleshooting”.
  • Use separate kernel source and build directories in do/kmod tasks.
  • Support MiniKF kernel on GCP.
  • Use ruamel.yaml in the rok_kubernetes Python package.
  • Fix Python 2 compatibility for the rok_kubernetes Python package.
  • Fix typo when checking if snapshot is full in LVMd.
  • Fix era_invalidate failures resulting in failed CSI snapshots.
  • Install python3-psutil using APT in MiniKF instead of relying on PyPI.
  • Fix typos in MiniKF central dashboard cards.
  • Fix a bug in the Rok API notebook driver where it failed to retrieve suggestions if a notebook with missing volumes existed in the namespace.
  • Update Kale commit to use its release branch containing bug fixes.
  • Fix regressions in JWA regarding form inputs not respecting the ConfigMap, backwards-compatibility for volume mount paths and PVC names, and the default size of volumes.
  • Fix backwards-compatibility issue in KFP UI regarding MLMetadata artifact name display.
  • Update the Kale images in EKF with respect to its commit update.
  • Catch corner case exceptions in the Kale distributed example.
  • Fix a bug in the cleanup code of Rok CSI’s NodeStageVolume where it failed to remove the node from the staged list for RWX volumes.
  • rok-csi: Allow nodes on GKE to reach the Rok Access Server Pods.
  • Support kernel version 4.14.252-195.481.amzn2.x86_64 for node groups on EKS.
  • Add devguide to scan MiniKF AMIs.
  • Add devguide to release MiniKF on AWS Marketplace.
  • Add devguide to release MiniKF on GCP Marketplace.
  • Support AMI release 1.19.15-20220112 [kernel version 5.4.162-86.275.amzn2.x86_64] for node groups on EKS.
  • Support AMI release 1.18.20-20220112 [kernel version 4.14.256-197.484.amzn2.x86_64] for node groups on EKS.
  • Gather logs from all EKF pods.
  • Add various fixes in the GKE guides.
  • Increase buffer length for GCP access token in the S3 daemon.
  • Fix broken checks for length of GCP access token in the S3 daemon.
  • Compress tarballs more efficiently.
  • Add gnupg to the list of required dependencies of rok-deploy.
  • Remove stale home directories from MiniKF on GCP.
  • Drop support for RHEL kernels.
  • Update the rok-do Debian base image to use 20220201 snapshot APT repositories.
  • Update the CentralDashboard image to one that supports new dashboard content.
  • Introduce new internal MiniKF documentation.
  • Fix the autoscaler deployment in rok-deploy.
  • Introduce feature documentation about our GitOps process.
  • Block project SSH keys when developers deploy MiniKF on GCP.
  • Support kernel version 5.4.181-99.354.amzn2.x86_64 for node groups on EKS.
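
As an illustration of the --run-from TASK argument added to deploy2 in this release, the following is a minimal sketch of how an ordered task list can be resumed from an arbitrary task. Only the --run-from flag name comes from the entry above; the task names, the TASKS list, and the select_tasks and parse_args helpers are hypothetical and do not reflect the actual rok-deploy code.

```python
import argparse

# Hypothetical, ordered task list; the real deploy2 task names may differ.
TASKS = ["clone-repo", "create-cluster", "deploy-rok", "install-kubeflow"]


def select_tasks(tasks, run_from=None):
    """Return the tasks to run, starting at `run_from` (inclusive)."""
    if run_from is None:
        # No starting point given: run everything in order.
        return list(tasks)
    if run_from not in tasks:
        raise ValueError("unknown task: %s" % run_from)
    return list(tasks[tasks.index(run_from):])


def parse_args(argv=None):
    """Parse the (assumed) command line of the sketch."""
    parser = argparse.ArgumentParser(description="Sketch of a task runner")
    parser.add_argument("--run-from", metavar="TASK", default=None,
                        help="start the installation from TASK")
    return parser.parse_args(argv)
```

Under these assumptions, parse_args(["--run-from", "deploy-rok"]).run_from can be passed to select_tasks to skip every task that precedes deploy-rok.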

Version 1.3.1 (Sapphire)

  • Extend Istio to support regular expressions in Authorization Policies.

Version 1.3 (Sapphire)

  • Support RDM on Google Cloud.
  • Enable auto-recovery for Rok on Google Cloud.
  • Configure gcloud inside rok-tools.
  • Set up cloud environment for GCP inside rok-tools.
  • Support creating a GKE cluster.
  • Expose services on GCP.
  • Add instructions for logging in to EKF via the Google Identity Provider.
  • Rename the --aws-region argument of the Rok S3 daemon to --region.
  • Introduce the --authentication-scheme argument to the Rok S3 daemon, which controls the authentication scheme used when accessing the S3 service.
  • Introduce the --gcp-access-token argument to the Rok S3 daemon to pass the OAuth2 token when using the GCP authentication scheme.
  • Extend the Rok S3 daemon to automatically retrieve security credentials from GCP instance metadata when they have not been provided via the environment.
  • Introduce the --gcp-project-id argument to the Rok S3 daemon to pass the Google project ID to use when accessing Google Cloud Storage.
  • Extend the Rok Operator to support deploying Rok using Workload Identities on GKE.
  • Support deploying Rok in GKE using Workload Identities.
  • Add instructions to deploy Rok using a Workload Identity on GKE.
  • Prevent GKE from forcing v1beta1 CSI snapshot CRDs.
  • Use high performance storage for Rok external services on GKE.
  • Introduce Kubernetes resource quotas on Rok to allow assigning the system critical priority classes to System Pods.
  • Improve ordered list styles in docs.
  • Make the deploy overlays of our kustomizations build-able.
  • Support running nginx-ingress-controller in security-wise strict environments where privilege escalation is not allowed.
  • Introduce nav-buttons directive in docs.
  • do: Enrich the labels rok-do attaches to snapshots and remotes.
  • Improve toggle directive’s nested functionality in docs.
  • Improve list design in docs.
  • Support deploying rok-tools inside an EC2 instance.
  • Support air gapped deployments on AWS.
  • Improve numbering in nested ordered lists in docs.
  • Make all documentation’s headers black.
  • Improve code-block’s design in docs.
  • Introduce helper for preserving comments when removing entries from YAML manifests.
  • Temporarily revert changes when running rok-image-patch to support seamless upgrades after first invocation.
  • Extend the Rok common download helper to automatically encode the downloaded content using the encoding found in the HTTP headers of the response.
  • Introduce ec2 specific helpers in rok-aws that fetch an AWS instance’s metadata.
  • Implement a Python 3 client for the MiniKF DDNS API.
  • Introduce a developer guide for bug report workflow.
  • Rename Maintenance section to Operations Guide.
  • Restructure “Configure Rok” and move it to Operations Guide.
  • Upgrade cert-manager to version 1.3.1.
  • Use hex encoding in S3Proxy credentials.
  • Use a predictable and unique storage account name on Azure.
  • Specify the S3 bucket prefix when deploying Rok on Azure.
  • Introduce a Rok Kubernetes client for SubjectAccessReview resources.
  • Introduce a Rok Kubernetes client for TokenReview resources.
  • Use the Rok Kubernetes client in the Kubernetes authorization and authentication backends of the Rok Django library.
  • Restructure “Test Rok”.
  • Restructure the “Expose Services on AWS with ALB” guide.
  • Use a predictable and unique Managed Identity name on Azure.
  • Support Ubuntu Bionic kernel 5.4.0-1048-azure for AKS node pools.
  • Remove rok-conf dependency from RDM.
  • Introduce Kubernetes resource quotas to our manifests to allow assigning the system critical priority classes to System Pods.
  • Mark System Pods of Rok external services as critical to protect from OOM kills and evictions.
  • Add CPU requests to containers of Rok external services to protect them from CPU starvation.
  • Fix rok-image-patch to work with EKF 1.3.
  • Extend rok-kf-rebase to handle commits made with rok-image-patch.
  • Drop support for Kubernetes 1.16.
  • Bump the version of the Kubeflow manifests.
  • Introduce dedicated guide for patching manifests to use mirrored images.
  • Modify the “Switch release channel” document for 1.3.
  • Add Auto Scaling Rok AWS client.
  • Support multiple node groups and Availability Zones in rok-k8s-drain tool.
  • rok-k8s-drain: Recalculate utilization of candidate node before draining it.
  • rok-k8s-drain: Add extra logs while waiting for a node to be removed.
  • Support AMI releases 1.17.12-20210628 and 1.18.9-20210628 [kernel version 4.14.232-177.418.amzn2.x86_64] for managed node groups on EKS.
  • Restructure and enhance the Kale SDK guides.
  • Enable the AutoML-related features of Kale.
  • Increase the HTTP request header limits for the NGINX and Istio proxies, and Rok’s Gunicorn.
  • Restructure and enhance the Mirror Arrikto GitOps repository guide.
  • Extend rok-notebook-upgrade script to support label selectors.
  • Extend rok-notebook-upgrade script to remove PodDefaults from notebooks.
  • Extend rok-notebook-upgrade script to add PodDefaults to notebooks.
  • Remove the AGPL-licensed libjbig2dec0 package from rok-tools.
  • Add instructions for logging in to EKF via the PingID Identity Provider.
  • Upgrade Kale due to bug fixes.
  • Fix an error in the migration script for config version v010300_0002.
  • Update the Rok 1.3 upgrade guide to check for Kubernetes version 1.17 or 1.18.
  • Introduce the Kale - Katib integration user guides.
  • Extend rok-image-list to include the Rok Registry image.
  • Ensure user-enabled Istio patches take effect after running rok-image-patch.
  • Remove some unnecessary dependencies from the Rok Registry image.
  • Move the Rok Registry overlays named registry-* into their own registry/ directory.
  • Enable Rok Trackers to port-check arbitrary hosts.
  • Enable Rok Thrower to specify a user-defined host during port-checking.
  • Enable Rok Thrower to announce a user-defined host to a Rok Tracker.
  • Enable users to change the Rok Tracker configuration from the RokRegistryCluster CR.
  • Make the Rok Tracker trust by default the hosts that Rok Thrower announces.
  • Allow exposing the Rok Thrower using a LoadBalancer Service.
  • Restructure the “Deploy Rok Registry” guide.
  • Handle modify/delete conflicts during rebase.
  • Fix rok-kf-prune to not remove necessary resources for cert-manager leader election.
  • Add a guide on how to configure a Rok cluster to sync data with other peers.
  • Introduce an ops guide for trusting a custom CA.
  • Use Dex as the default OIDC provider for authentication in Rok Registry.
  • Add a user guide on how to register a Rok cluster with a Rok Registry.
  • Add user guides on how to publish and subscribe to a bucket.
  • Support containerd as a container runtime for Kubernetes, by configuring Argo to use the PNS executor.
  • Update the “Scale-in Kubernetes Cluster” documentation and remove the single node group, single Availability Zone requirement.
  • Support Ubuntu Bionic kernels 5.4.0-1049-azure and 5.4.0-1051-azure for AKS node pools.
  • Add repo to detect 5.4 kernel source packages in the Amazon Linux 2 image.
  • Support AMI release 1.18.9-20210722 [kernel version 4.14.238-182.422.amzn2.x86_64] for managed node groups on EKS.
  • Support AMI release 1.18.20-20210813 [kernel version 4.14.241-184.433.amzn2.x86_64] for managed node groups on EKS.
  • Support AMI release 1.19.13-20210813 [kernel version 5.4.129-63.229.amzn2.x86_64] for managed node groups on EKS.
  • Support Ubuntu Bionic kernel 5.4.0-1044-gke for GKE.
  • Upgrade kubectl in rok-tools to 1.18.19.
  • Support Kubernetes version 1.19.
  • Restore “clone an existing Rok snapshot” functionality in VWA.
  • Set the failurePolicy to Fail in the MutatingWebhookConfiguration for the admission-webhook controller.
  • Upgrade Kale to fix local execution bugs and introduce some new features.
  • Set Node Exporter’s port to 9200 to avoid possible conflicts when deploying Rok’s monitoring stack alongside vanilla Prometheus installations.
  • Support AMI releases 1.18.20-20210826 and 1.18.20-20210830 [kernel version 4.14.243-185.433.amzn2.x86_64] for node groups on EKS.
  • Support AMI releases 1.19.13-20210826 [kernel version 5.4.129-63.229.amzn2.x86_64] and 1.19.13-20210830 [kernel version 5.4.141-67.229.amzn2.x86_64] for node groups on EKS.
  • Fix a bug where Rok Operator mishandled the trusted_CA_certs configvar during cluster upgrades.
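
The download helper change in this release (using the encoding found in the HTTP response headers) can be pictured with the following minimal sketch. It assumes the helper turns the raw response bytes into text using the charset parameter of the Content-Type header; the function names and the UTF-8 fallback are hypothetical and are not taken from the Rok code base.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Extract the charset parameter of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Returns the lower-cased charset, or `default` if none is declared.
    return msg.get_content_charset(failobj=default)


def decode_body(body, content_type):
    """Decode raw response bytes using the charset the server declared."""
    return body.decode(charset_from_content_type(content_type))
```

For example, a body served with "text/plain; charset=ISO-8859-1" is decoded as Latin-1, while a header without a charset parameter falls back to the assumed UTF-8 default.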

Version 1.2.2 (Ruby)

  • Set imagePullPolicy to IfNotPresent in Istio manifests.

Version 1.2.1 (Ruby)

  • Update the Rok 1.2 upgrade guide to check for Kubernetes version 1.17 or 1.18.
  • Handle potential conversion webhook misconfiguration during upgrades.

Version 1.2 (Ruby)

  • Introduce a new Django view in Rok GW to serve HTTP GET requests at /metrics and expose Rok metrics in Prometheus’s text-based format.
  • Introduce a Grafana dashboard with multiple rows and panels to visualize Rok metrics, extracted from Prometheus’s TSDB.
  • Set newTag only if necessary when patching images for air gapped deployments.
  • Add python-authlib in the Debian packages to install for CI, RokE and Registry images.
  • Separate Istio deployment from Rok and Rok Registry in rok-deploy.
  • Introduce “Arrikto” and “air gapped” custom admonitions in docs.
  • Include the S3 action performed in the logs of the S3 daemon.
  • Include the names of all libs3 functions called in the logs of the S3 daemon.
  • Truncate the MiniKF image name to conform to the naming restrictions of GCP and AWS.
  • Use Kubernetes 1.17 for EKS clusters.
  • Implement a common button component in the UI.
  • Introduce social login buttons in the UI.
  • Improve the button hover functionality in the UI.
  • Disable GC cron jobs in Rok Registry clusters.
  • Generate the rok-dlm-break service dynamically, based on the type of the appliance.
  • Add a Python script in the rok_pu package for testing individual target PUs.
  • Fix a bug where the Rok S3 daemon would not verify the SSL certificate of the S3 service it connected to.
  • Add a Rok cluster config variable to allow connecting to an S3 service without verifying its SSL certificate.
  • Configure Prometheus to run in multiprocess mode to allow Gunicorn workers to cooperate in order to expose GW metrics.
  • Restructure the ‘Prepare Management Environment’ section of the EKS docs to follow the current documentation guidelines.
  • Add the Prometheus Python client as a dependency to Rok’s Django library.
  • Install the Prometheus Python client in Rok Registry container images.
  • Add settings for external OIDC providers for Rok Fort.
  • Add the ‘SocialUser’ model which holds information about users who authenticate with external OIDC providers in Rok Fort.
  • Add support for authentication via external OIDC providers in Rok Fort.
  • Protect the OIDC endpoints using a state parameter.
  • Add support for the OIDC callback URL in the common UI code.
  • Extend Rok Registry UI to initialize/finalize OIDC cycles.
  • Prevent updating browser’s history in docs when scrolling.
  • Increase documentation’s content width.
  • Change ordered list design in docs.
  • Remove depth limitation from doc’s menu.
  • Update our docs with instructions on how to edit Registry-related images.
  • Fix a bug in Registry UI that was showing the “Sign In” form when there’s a single Social provider.
  • Introduce Python helper to calculate Rok’s build ID and use it from CMake.
  • Introduce Python helper to calculate the version for Rok’s Python packages and use it from CMake.
  • Include Rok Registry in the release procedure.
  • Extend rok-image-mirror to dump list of mirrored images.
  • Skip creating a pending cluster configuration if there are no changes.
  • Fix a bug that prevented setting cluster config variables to values that contain braces.
  • Extend Rok Operator to upgrade cluster config variables that are not specified under .spec.configVars, but are provided by the users as fields in the CR’s spec.
  • Add documentation for configuring external OIDC providers in Rok Fort.
  • Fix an incompatibility issue in Rok APIs that caused Prometheus metrics to be registered more than once in Python 3.
  • Fix a Python 3 compatibility bug in the Rok etcd3 client.
  • Implement an etcd backend for the Dynamic DNS API for MiniKF.
  • Introduce a Django based Dynamic DNS API for MiniKF AWS instances, that will serve names under the minikf.arrikto.ai zone.
  • Introduce arrikto-dev, arrikto-contact and air-gapped admonition directives in docs.
  • Allow long links to wrap in docs.
  • Gracefully exit GC task of rok-do when the working directory is empty.
  • Fix error logs in Rok Registry and Rok Fort due to Prometheus integration.
  • Fix a validation bug for config variables that have already been converted to the proper Python type.
  • Fix a bug in MiniKF’s provision script, where the list of downloaded images was not correctly passed to the ConfigMap of the admission webhook that sets the imagePullPolicy of downloaded images to Never.
  • Change the MiniKF’s admission webhook’s invocation policy, so that it is invoked again if a subsequent webhook (e.g., Istio injection webhook) further changes the Pod.
  • Introduce RDM overlay with a disk-script that works on Azure.
  • Upgrade Linux kernel in MiniKF to 5.4.104-0504104-generic to fix a Go runtime issue that made CSI sidecars crash because of hitting max locked memory limits.
  • Install virtualbox-guest-dkms and nvidia-440 in MiniKF of all supported platforms.
  • Do not attach the AmazonEKSClusterPolicy IAM policy to the EKS cluster IAM role.
  • Declaratively manage IAM roles needed to create an EKS cluster with AWS CloudFormation stacks.
  • Rename the --assume-no-versioning command line argument of the Rok S3 daemon to --no-validate-versioning, and make it skip validation of S3 bucket versioning status when provided, regardless of whether versioning is used by the daemon.
  • Remove the --no-versioning argument from the Rok S3 daemon and automatically enable versioning when the IFC library is enabled via the --enable-ifc argument.
  • Instead of always listing versions to determine if an S3 bucket exists and is empty, only list versions if IFC is enabled, otherwise list objects, to ensure the S3 daemon is compatible with S3 APIs that do not support versioning.
  • Add a note for rebalancing the pods.
  • Update the gcloud SDK in MiniKF, as the currently pinned version was removed from the repo.
  • Enable TCP keepalives globally in Istio.
  • Fix a bug where custom admonitions did not support multiple CSS classes.
  • Introduce toggle directive in docs.
  • Introduce foldable admonitions in docs.
  • Add sphinx-tabs extension for tabbed content in docs.
  • Fix a bug where a user couldn’t register a new Rok Registry from the settings page in the UI.
  • Fix email symbols handling in Rok Registry links in the UI.
  • Update NVIDIA driver and CUDA version in MiniKF to 460 and 11.2 respectively.
  • Mount ~/.docker/ on tmpfs to fix the broken symlink across MiniKF reboots.
  • Extend MiniKF to use rok-image-list and automatically generate the list of images that provision.py needs to pre-pull.
  • Use a newer version of python3-git to work with packed-refs created from newer Git versions. As a result, fix some import issues.
  • Redesign MiniKF’s landing page for Vagrant.
  • Use our own nginx-ingress-controller kustomization instead of Minikube’s ingress addon.
  • Use manifests to deploy Istio Ingress instead of applying a formatted string value.
  • Extend MiniKF to read docker/images-exclude and exclude images mentioned in this file.
  • Fix a bug in Rok UI where it throws a NullInjectorError for the AuthUrl InjectionToken.
  • Fix a bug that resulted in an incorrect suggested file name in Dataset snapshot policies.
  • Fix a bug where after changing the file name of a snapshot policy, the Rok UI would still display the default value.
  • Produce a smaller Vagrant box for MiniKF by excluding non-critical images from the pre-pull list.
  • Extend rok-version to generate a valid SemVer for MiniKF.
  • Fix a bug in MiniKF where it would always try to pull images from index.docker.io even if they exist locally.
  • Add design doc for authentication with external OIDC providers in Rok Fort.
  • Increase the amount of required RAM for MiniKF on VirtualBox from 10GB to 12GB.
  • Exclude extra Docker images from MiniKF on GCP to improve provisioning times.
  • Implement a composite authentication backend for the MiniKF Dynamic DNS API, to allow bearer token authentication for instances and admins.
  • Ensure that no stale containers are left in the final MiniKF image.
  • Update APT cache before installing kernel build dependencies on Ubuntu.
  • Support Ubuntu Bionic kernel 5.4.0-1040-azure for AKS node pools.
  • Support Ubuntu Xenial kernel 4.15.0-1108-azure for AKS node pools.
  • Support Ubuntu Xenial kernel 4.15.0-1109-azure for AKS node pools.
  • Support Ubuntu Xenial kernel 4.15.0-1111-azure for AKS node pools.
  • Extend the rok-tools manifests to support deployment on Azure.
  • Disable Azure’s Admissions Enforcer for Istio.
  • Support RDM on Azure.
  • Retry Kubernetes watch() operations on ProtocolError exceptions.
  • Enable TCP keepalives in rok-kubernetes Python module.
  • Install Azure CLI in rok-tools.
  • Bring rok-deploy up-to-date with the latest instructions for cloning our GitOps repository.
  • Introduce manifests to deploy S3Proxy on AKS.
  • Extend the docs with instructions to deploy Rok over S3Proxy on Azure cloud.
  • Deploy Rok’s external services (etcd/PostgreSQL/Redis) on Azure.
  • Expose services on Azure.
  • Configure Azure CLI inside rok-tools.
  • Set up a cloud environment for Azure inside rok-tools.
  • Support creating an AKS cluster.
  • Introduce the rok-kf-rebase CLI tool to help with manifests rebase.
  • Introduce the rok-kf-prune CLI tool to help with resource pruning during upgrades.
  • Update to Enterprise Kubeflow 1.3 manifests.
  • Add upgrade instructions for EKF 1.3.
  • Remove EKS references from platform-agnostic sections of the docs.
  • Add a maintenance guide with instructions on how to add an internal GitHub repository as a backup GitOps remote.
  • Add a maintenance guide with instructions on how to set up cluster-wide access to a Docker Registry.
  • Add aliases for Kubernetes memory units Ei, Pi, Ti, Gi, Mi, Ki.
  • Introduce script to scale-in a Kubernetes cluster.
  • Improve highlighting of prompts in doc’s code blocks.
  • Update the Debian base image rok-do uses to debian/snapshot:stretch-20210511.
  • Use Kubernetes 1.18 for EKS clusters.
  • Expose services on AWS using Classic Load Balancer.
  • Fix a validation check for emails in our githooks that failed if an email address contained a dot.
  • Add maintenance guide for adding users in Dex.
  • rok-k8s-drain: Fix scale-in script to handle Unauthorized Errors.
  • rok-k8s-drain: Remove K8s configuration confirmation question.
  • rok-k8s-drain: Ask for user input confirmation.
  • rok-k8s-drain: Update log file location.
  • rok-k8s-drain: Fix help argument to work with missing kube config file.
  • rok-k8s-drain: Use AWSRegion Question instead of AWSRegionArgument.
  • Introduce script to protect Arrikto EKF Pods from OOM conditions and CPU starvation.
  • Support rendering the Rok version string in docs.
  • Increase the buffer size that NGINX Ingress Controller allocates for reading HTTP response headers, so that it doesn’t fail when the Rok UI returns large headers.
  • Add upgrade instructions for NGINX Ingress Controller.
  • Fix supported list of kernels.
  • Support AMI releases 1.17.12-20210526, 1.17.12-20210621, 1.18.9-20210526 and 1.18.9-20210621 [kernel version 4.14.232-176.381.amzn2.x86_64] for managed node groups on EKS.
  • Remove check for the AWS CLI credentials file when deploying in EKS.
  • Make the deploy overlays of our kustomizations build-able.
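
The Kubernetes memory units named in the list above (Ki, Mi, Gi, Ti, Pi, Ei) are binary multiples of a byte. The following is a minimal sketch of how such suffixes map to byte counts. It is not Rok's implementation, and the helper name parse_memory is hypothetical.

```python
# Binary (power-of-two) suffixes used by Kubernetes memory quantities.
BINARY_UNITS = {
    "Ki": 2**10,
    "Mi": 2**20,
    "Gi": 2**30,
    "Ti": 2**40,
    "Pi": 2**50,
    "Ei": 2**60,
}


def parse_memory(quantity):
    """Convert a quantity such as "512Mi" to bytes (hypothetical helper)."""
    quantity = quantity.strip()
    for suffix, factor in BINARY_UNITS.items():
        if quantity.endswith(suffix):
            # Strip the suffix and scale the integer part.
            return int(quantity[: -len(suffix)]) * factor
    # No known suffix: treat the value as a plain byte count.
    return int(quantity)


print(parse_memory("512Mi"))
print(parse_memory("1Gi"))
```

The sketch only handles integer values with binary suffixes. Decimal suffixes and fractional quantities are left out on purpose.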

Version 1.1.1 (Quartz)

  • Update the Rok 1.1 upgrade guide to check for Kubernetes version 1.17.

Version 1.1 (Quartz)

  • Make our AWS CloudFormation client, and rok-s3-authorize by extension, idempotent.
  • Improve the periodic rule of Rok API version retention policies to retain the latest instead of the earliest version in each interval.
  • Do not include group members in the files list API call of the Rok API.
  • Extend the files list API call of the Rok API to support including deleted files in the response.
  • Include the number of versions of each object in the files list API call of the Rok API.
  • Support pagination in the files list API call of the Rok API.
  • Extend Rok’s provisioning tool for Kubernetes with the --delete mode to delete specified Kustomize packages.
  • Add a loader to the select all button of the Rok UI.
  • Use pagination in the copy and delete files dialogs of the Rok UI.
  • Use pagination in the files list page of the Rok UI.
  • Remove a backwards compatibility fix for Rok versions v0.10 or earlier, that allowed passing the task ID in place of the bucket name to retrieve a task by ID in the API call to list the tasks of a bucket in the Rok v1 services API.
  • Replace the coarse-grained authorization that the Rok API applied to provide namespace isolation with fine-grained authorization tests for each API call, ensuring the user is authorized to perform the specific action they requested.
  • Remove a workaround that automatically added the Kubeflow-UserID header in all Rok client requests performed inside a Kubernetes cluster.
  • Only allow authenticating via a token in the Rok client and CLI.
  • Drop the GW_ part of all environment variables used by the Rok client. For example, rename ROK_GW_TOKEN to ROK_TOKEN.
  • Use the Authorization: Bearer <token> header instead of the X-Auth-Token: <token> header for authentication in the Rok API and client.
  • Relax a restriction in our githooks that required every introduced Rok config version in our repo to also immediately be the target one.
  • Support using more than one authentication backend simultaneously in the Rok API.
  • Support authentication via Kubernetes tokens in the Rok API.
  • Retrieve the CSRF token from the X-XSRF-Token header in the Rok API.
  • docs: Document how Rok CSI handles auto-registration for VolumeSnapshots
  • Introduce more fine-grained ClusterRoles for users and administrators to provide access to the Rok API.
  • Restrict access to individual Rok API services via RBAC rules.
  • Fix a bug where Rok API tasks created using a Kubernetes token failed to access the Kubernetes API due to using the user ID instead of the username for impersonation.
  • Introduce a design document for the Kubernetes Rok operator.
  • Restrict Rok CSI to only allow registering VolumeSnapshots in the same Rok account as the snapshot’s Kubernetes namespace.
  • Restrict Rok CSI to only allow creating PVCs from a Rok URL in the same Kubernetes namespace as the account of the Rok URL.
  • Remove support for the rok/origin-fisk and rok/origin-fisk-group annotations from Rok CSI, which violated namespace isolation by allowing users to register any fisk into their account.
  • Extend our APT helper to install packages in a batch while retaining progress reports.
  • Remove a 500ms delay from our progress messages in the ‘dialog’ frontend.
  • Use a distinct call to list group members in the versions list page of the Rok UI.
  • Introduce separate tasks to manage different deployments repos.
  • Rename the Rok CLI from rok-gw to rok.
  • Automatically reload tokens before every request in the Rok client if they have been provided using the file: prefix.
  • Extend rok-do to garbage-collect local artifacts.
  • Add design document for Rok CLI questions.
  • Set argparse.SUPPRESS as the global default for CLI args and display the enclosing Question’s default in the CLI arg’s help message.
  • Do not mutate CLI argument defaults via preseed files.
  • Extend rok-version with the --build-tag argument to report the versioned tag of build artifacts.
  • Extend Rok’s build version with the source branch of the release.
  • Add license, build type and git branch information to rok-do tasks that manage manifests, docs and the deployment repositories.
  • Introduce per release open-ended upgrade notes and fold any generic ones into the version-specific ones.
  • Include fixes for upstream dm-era bugs in the rok-kmod images.
  • Introduce a script to upgrade the image of all notebooks in a cluster.
  • Create Rok Registry images with rok-do.
  • Introduce a script to perform a rolling reboot of a Kubernetes cluster.
  • Introduce a script to reset the CBT data of all Rok PVCs.
  • Fix a bug where the Rok etcd library would sometimes report an incorrect number of retries in its logs.
  • Fix a bug where the Rok DLM CLI would incorrectly log warnings about all other DLM clients being missing when requested to retrieve information for one of them.
  • Fix an out of bounds memory access bug in the Python bindings of librok_dlm that resulted in the rok-dlm CLI occasionally segfaulting and leaving behind stale locks after a pod restart.
  • Extend rok-deploy to deploy Rok Registry clusters and split the deployment process into three steps: Deploy, Generate manifests, Apply manifests.
  • Improve the Kubeflow recurring runs upgrade instructions to use the Jobs page and clone old failing runs.
  • Include the user’s AWS account ID in the default S3 bucket name prefix.
  • Omit the -rok-rok suffix from the name of the CF stack and related IAM resources needed to grant Rok full access to S3 buckets.
  • Fix a bug where the modal for entering an authorization code in Rok UI closes unexpectedly.
  • Use UI’s path as a prefix when storing and retrieving localStorage values.
  • Introduce rok-do tasks for building the Rok Documentation with any combination of (builder type, tags).
  • Incorporate the public tag of the Rok Documentation into the logic/content of the docs.
  • Use Debian image snapshots as the base Docker images for rok-do tasks.
  • Add an option that disables the offline warning notification for specific requests in the UI.
  • Remove the v prefix from Rok version and related artifacts.
  • Fix a bug where the Rok S3 daemon would attempt to assume an AWS role using the AWS STS endpoint of an incorrect region.
  • Revamp the Rok S3 daemon bucket versioning validation to first retrieve the versioning, and then if required either update it during formatting or fail with an error during validation.
  • Support deploying Rok over pre-existing, empty S3 buckets
  • Fix a wrong route in Authservice’s SKIP_AUTH_URLS setting.
  • Replace the patchesStrategicMerge and patchesJson6902 fields with the patches one in the kustomization file of monitoring’s deploy overlay.
  • Allow the user to verify if the S3 IAM role exists, instead of making it a strict check in rok-deploy.
  • Prevent the auto-redirect to the Kubeflow dashboard from the OAuth callback page.
  • Highlight the active menu item in the Rok docs.
  • Upgrade Font Awesome version in docs.
  • Improve the appearance of admonitions in the docs.
  • Allow selecting the prompts in all code-blocks except console in the docs.
  • do: Improve the way we clean up and snapshot MiniKFs
  • Loosen the newsworthiness check of our githooks by ensuring that at least one of NEWS.rst, Changelog.rst is updated by a commit that closes a GH issue.
  • Fix a bug in the responses of the OAuth endpoints in the Rok API.
  • Use the correct Registry base URL in the Rok UI during the Rok registration process.
  • Support using classic ELB instead of ALB to expose NGINX.
  • Support terminating TLS on NGINX instead of using an ACM certificate at ALB.
  • Introduce manifests for creating self-signed certificates and expose Rok+EKF with ELB in front of NGINX.
  • Support AMI release 1.16.15-20210310 [kernel version 4.14.219-164.354.amzn2.x86_64] for managed node groups on EKS.
  • Fix rok-lio bug that causes rok-csi to misdetect whether a Fisk is exposed as a block device.
  • Fix race in the pre-clone verification step of LVMd that could lead to errors, such as failures to unexport the origin Fisk, I/O errors, and stale TCMU handlers.
  • Support applying a different set of patches for each supported kernel version in do/kmod tasks.
  • Support AMI release 1.16.15-20210322 [kernel version 4.14.225-168.357.amzn2.x86_64] for managed node groups on EKS.
  • Support serving multiple versions of the docs.
  • Fix rok-do to download the correct kernel source for Ubuntu kernels.
  • Support AMI releases 1.16.15-20210329 and 1.16.15-20210414 [kernel version 4.14.225-169.362.amzn2.x86_64] for managed node groups on EKS.
  • Support AMI release 1.16.15-20210501 [kernel version 4.14.231-173.360.amzn2.x86_64] for managed node groups on EKS.
  • Support AMI releases 1.16.15-20210504, 1.16.15-20210512 and 1.16.15-20210518 [kernel version 4.14.231-173.361.amzn2.x86_64] for managed node groups on EKS.
  • Mark Rok and RokCSI Pods as critical, to avoid OOM kills and evictions.
  • Improve the copy button to behave exactly like manually selecting and copying text.
  • Improve copy behavior for secondary prompts in the docs’ code blocks.
  • Improve the text color of command output in the docs’ code blocks.
  • Improve copy behavior in the docs’ code blocks that include command output.
  • Add CPU requests for RokE and Rok CSI containers to protect them from CPU starvation.
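
Several entries above change how clients authenticate: the ROK_TOKEN environment variable replaces ROK_GW_TOKEN, the Authorization: Bearer <token> header replaces X-Auth-Token, and tokens given with the file: prefix are reloaded before every request. The following is a minimal sketch of that flow, not the Rok client's code. The helper names are hypothetical, and placing a file: value in ROK_TOKEN is an assumption made for illustration.

```python
import os


def load_token(value):
    """Resolve a token value (hypothetical helper).

    Assumption: a "file:" prefix means the token is stored at that path.
    """
    if value.startswith("file:"):
        with open(value[len("file:"):]) as f:
            return f.read().strip()
    return value


def auth_headers(environ=None):
    """Build request headers from ROK_TOKEN (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    token = load_token(environ["ROK_TOKEN"])
    return {"Authorization": "Bearer " + token}


print(auth_headers({"ROK_TOKEN": "abc123"}))
```

Calling auth_headers() before each request re-reads the file, so a rotated token is picked up without restarting the process.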

Version 1.0 (Platinum)

  • Fix a bug where the account selector in the Rok UI sometimes displayed the incorrect account.
  • Do not display a logout button when logging out is not possible in the Rok UI
  • Fix a bug where Rok API drivers would use the account instead of the user to perform authorization checks for tasks.
  • Fix a bug where the Rok UI would sometimes raise an undefined variable exception after logging in.
  • Fix a bug where the Rok UI would ignore the namespace selected via the Kubeflow dashboard selector
  • Fix a bug where the Rok UI would not render correctly in a Kubeflow environment.
  • Fix a bug where Kubernetes exceptions would not be converted to a Unicode string properly, resulting in the messages of Kubernetes errors not being visible in Rok task logs.
  • Fix a bug where the Rok client would fail to retrieve the user’s ID when using static authentication.
  • Remove secrets from the allowed variables in Rok CSI auto-register URLs.
  • Fix a bug where Rok CSI would fail to auto-register a VolumeSnapshot when the Rok API was using AuthService authentication.
  • Fix a bug where Rok CSI would fail to hydrate a PVC when the Rok API was using AuthService authentication.
  • Give Rok CSI a rok-admin ClusterRole to allow it to access all Rok accounts.
  • Extend Rok’s provisioning tool for Kubernetes with the --apply mode to avoid questions, skip regeneration of manifests and only apply specified Kustomize packages.
  • Make rok-do fail by default if a path in the host is needed by a task and it does not exist.
  • Replace CommandNotFoundError with CommandOSError, which is broader and more accurate.
  • Fix the logging of byte strings (and the b’…’ prefix) in the cmdutils module.
  • Persist the home directory of user root inside rok-tools by mounting a Docker volume or Kubernetes PVC at /root.
  • Correctly display the account name instead of the user ID in Rok CLI.
  • Move authorization code from the Rok API views to a dedicated backend.
  • Store the Kubernetes namespace UUID in Rok API accounts and verify it matches the one on Kubernetes with every request to prevent accessing resources on Rok after the namespace has been deleted.
  • Add fine-grained authorization to account metadata updates in the Rok API.
  • Introduce the rok-cluster-admin ClusterRole for Rok cluster administrators on Kubernetes.
  • Prevent auto redirect to KF dashboard when the Rok UI is in chooser mode.
  • Bump the version of Istio that Rok’s provisioning tool for Kubernetes installs to 1.5.7.
  • Remove a late import in Rok’s log formatting code, which could cause a deadlock between the log handler’s lock and the Python module import lock during the initialization of the Rok client by Rok CSI.
  • Improve the style of all links in the Rok UI.
  • Display the number of versions in the object list of the Rok UI
  • Migrate githooks to Python 3.
  • Use Angular’s infinite scroll component in the Rok UI.
  • Implement search support for buckets and objects in the Rok UI.
  • Export the Rok client, its error classes and the helpers responsible for querying Rok URLs at the Rok client’s module level.
  • Introduce a helper to the Rok client to list the members of a group.
  • Fix a bug where Rok CSI would sometimes use the incorrect Rok API version when restoring a volume from the Rok URL of a group.
  • Introduce group delete for objects and versions in Rok UI.
  • Improve messaging in UI’s network errors.
  • Suppress C812 Flake8 error, because it doesn’t offer us much and leads to slightly uglier code.
  • Perform retries when setting the versioning status of an S3 bucket, to work around the fact that the S3 API sometimes returns 404 errors for buckets that have just been created.
  • Suppress E741 Flake8 error, because most monospace fonts already do a good job at showing “l”, “I” and “1” differently.
  • Add a way to lazily evaluate Task attributes in rok-do
  • Introduce rok-dev, a Debian Stretch environment for Arrikto devs.
  • Enable logs in UI’s production builds
  • Fix CRD validation in Istio kustomizations.
  • Provide a ClusterRoleBinding for the rok-admin and rok-cluster-admin ClusterRoles to the rok and rok-operator ServiceAccounts.
  • Fix nondeterministic behavior of githooks regarding Flake8 checks
  • Add support for creating a Docker image with Python 3.5.1 installed.
  • Preserve LC_ALL when running tasks in a remote with rok-do.
  • Build Rok Enterprise Docker images with rok-do
  • Improve rok-dev with support for running rok-do
  • Make Python bindings compatible with Python 3 and ship the corresponding Python 3 packages.
  • Add support for building the Rok Operator Docker image with rok-do.
  • Add support for building the Rok Disk Manager Docker image with rok-do.
  • Add support for building the Rok CSI Docker image with rok-do.
  • Give Rok CSI nodes the rok-admin ClusterRole, to provide them access to all Rok accounts.
  • Reduce configd log spam by rendering config only if member is not up-to-date
  • Improve the Rok API error message when accessing an account for a Kubernetes namespace that does not exist.
  • Fix a bug where the Rok Composer could deadlock while serving simultaneous requests to delete and access a fisk.
  • Support snapshot policies in the Rok GW Jupyter driver.
  • Support snapshot policies in the Rok GW dataset driver.
  • Reduce electiond log spam by watching the master lease without timeout.
  • Preserve query parameters when the namespace changes in Rok UI.
  • Add documentation for cmdutils, as well as a developer guide with examples for some common scenarios.
  • Extend LVMd to report successful snapshot completion.
  • Allow LVMd to recover from an interrupted snapshot.
  • Introduce config variables to setup cron jobs for local/global GC.
  • rok-csi: Add support for garbage collecting LVs and nodelocal fisks owned by LVMd.
  • Remove the “escalate” permission from the Rok Operator/Cluster pods.
  • Fix a bug where the UI was showing the wrong object count when deleting objects.
  • Add a mixin with common helpers for Rok-related tasks in rok-do.
  • Handle existing tags in deployments repo and avoid tagging trunk versions.
  • Handle transient disconnections in a less intrusive way in Rok UI.
  • Introduce a user guide for snapshot and retention policies.
  • Disable msg_delay in text progressbar
  • lvmd: Ensure we delete stale resources under normal operation.
  • rok-csi: Skip GC-ing nodelocal fisks when composer runs in non-nodelocal mode.
  • rok-csi: Improve GC logs.
  • Add a rok-do task to GC old Docker images used by rok-do.
  • Fix a bug where the rok_common.apt Python module would ignore failures to update the APT cache, because apt-get update returns with a 0 exit code.
  • Fine-tune the update strategy for rok-disk-manager and rok-kmod DaemonSets so that they can be upgraded in parallel.
  • Remove the message limits in the Rok etcd v3 client.
  • Add support for building the Rok Tools Docker image with rok-do.
  • Fix running rok-do subtasks as direct goal tasks.
  • Implement API call to retrieve the members of a group in the Rok API.
  • Make the task-gc management command more efficient by avoiding having to protect all parameters of all tasks.
  • Use an LRU cache for the classes dynamically created when protecting objects to fix a performance issue when protecting large numbers of objects. This will also improve performance of task-gc management command.
  • Improve the efficiency of recursive listing in the etcd v2 emulation client by using a node index when formatting the response.
  • rok-csi: Extend GC to unfreeze frozen filesystems and collect stale device mapper devices.
  • Document how we generate Docker images for the Kubernetes CSI Sidecars.
  • Remove any force-cleanup logic from rok-deploy that could purge a non-empty directory specified by the user as their local GitOps repository.
  • Introduce manifests to deploy a monitoring stack alongside Rok on Kubernetes, based on Prometheus and Grafana.
  • Configure Prometheus to periodically scrape and store metrics from Rok’s etcd.
  • Add a dashboard to Grafana to visualize Rok’s etcd metrics.
  • Configure Prometheus to periodically scrape and store metrics from Rok’s Redis.
  • Add a dashboard to Grafana to visualize Rok’s Redis metrics.
  • Add public document with description and deployment steps for Rok’s monitoring stack on Kubernetes.
  • Use Kubernetes 1.16 for EKS clusters.
  • Work around a Mitogen issue where the standard I/O streams in the remote are in non-blocking mode.
  • Update the code for deploying a Rok Registry cluster.
  • rok-csi: Record all logs and progress updates as events on the corresponding Kubernetes object.
  • rok-csi: Allow displaying the subjob progress along with the total progress.
  • rok-gw: Allow displaying the virtual subtask progress along with the total progress.
  • rok-csi: Fail stale VolumeSnapshots after Pod restart
  • do: Warn when a task does not support caching
  • Fix the alignment of task logs in the Rok UI
  • rok-csi: Support migrating PVs from cordoned nodes.
  • do: Create rok-kmod image using Debian packages.
  • Decouple do task NGINXStaticSite from docs
  • do: Support caching in NGINXStaticSite
  • Introduce the run-if-master tool to allow easily running commands on the master node of the Rok cluster.
  • Introduce a helper to acquire an exclusive cluster-wide DLM lock.
  • do: Take the env and entrypoint task attributes into account when caching a task.
  • Introduce a way to uniquely identify a process in a running host, by computing an ID that cannot be reused during the host’s uptime.
  • Extend the run-if-master tool to break all stale DLM locks left behind by the process it executed.
  • Allow garbage collecting Rok API tasks based on their status.
  • Enable automatic garbage collection of Rok API tasks in the Rok cluster.
  • do: Hint to the task that must run when a fromsnap is not found.
  • do: Support adding labels to rok-do snapshots.
  • do: Add support for GCP remotes.
  • Support provisioning MiniKF using the new Kubeflow manifests.
  • Remove Pod deletion logic from Rok Operator; delegate this task to the DaemonSet Controller
  • do: Automate building MiniKF images for GCP.
  • deploy: Improve auto-detection of EKS cluster name to handle clusters created with eksctl.
  • do: Automate building MiniKF images for AWS.
  • Use the j2 CLI to render Jinja templates instead of using envsubst and environment variables.
  • csi: Unpin both used and unused PVCs.
  • csi: Produce events when pinning/unpinning a volume.
  • csi: Automate garbage collecting completed jobs every hour.
  • csi: Do not crash if etcd goes down.
  • Update the Rok operator and systemd units to break locks in the master namespace.
  • Fix an issue where computing the run ID of a process occasionally failed due to a bug when parsing the process stat file.
  • Use fixed size widgets in our dialog based frontend.
  • Fix yld() not to leave open fds behind.
  • electiond: Fix a bug where if the Rok master node was permanently removed, other nodes did not attempt to become master.
  • cluster: Do not lock the master lease just for inspecting it
  • aws: Add CloudFormation support
  • minikf: Reduce timeout limit of APT connections
  • lvmd: Log info that can help us debug filesystem related issues.
  • lvmd: Verify the filesystem state.
  • lvmd: Recover the filesystem journal when activating volumes.
  • csi: Use the same PU object for both CSI and LVMd running in the same process.
  • liod: Set the timeout to infinity for tcmu_handler while it waits for a connection with Rok to succeed.
  • operator: Use the kubernetes.io/hostname Kubernetes node label over the name one to schedule Rok CSI Guard Pods more robustly.
  • manifests: Remove Pod Disruption Budgets for Istio.
  • operator: Take into account unschedulable nodes when calculating which nodes to guard, to avoid an unneeded resource create-delete-recreate cycle.
  • Use the watch helpers provided by the Rok etcd clients when watching for document changes in the Rok API.
  • operator: Emit more events to increase observability into the cluster scaling algorithm
  • Add design document for Rok Disk Manager (RDM)
  • Revamp Rok Disk Manager to always request LVs with size that is a multiple of the block size, i.e. 512.
  • RDM: Hash block devices based on the underlying kernel device, not their path.
  • Fix a bug where rok-deploy modified the kustomization file for Istio, removing some useful resources/transformers.
  • docs: Extend our guides with instructions on how to create a dedicated VPC for the EKS cluster
  • Add missing packages (curl and bsdmainutils) in rok-tools image
  • rok-gw: Fix a bug where the Rok StatefulSet driver would create a group resource with the wrong order for the registered disks.
  • rok-gw: Fix a bug where the Rok StatefulSet driver would not sort the Pod names correctly, placing pod-10 before pod-2 inside the generated group resource.
  • csi: Document how to create a StatefulSet from a Rok group resource using the rok/origin annotation.
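
The StatefulSet driver fix above (pod-10 sorted before pod-2) is the usual result of comparing names as plain strings. The following is a minimal sketch of a number-aware sort key. It illustrates the general technique only and is not the driver's code; the helper name natural_key is hypothetical.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so digits compare numerically."""
    return [
        int(part) if part.isdigit() else part
        for part in re.split(r"(\d+)", name)
    ]


pods = ["pod-10", "pod-2", "pod-1"]

# Plain string comparison puts "pod-10" before "pod-2".
print(sorted(pods))

# The number-aware key orders the ordinals numerically.
print(sorted(pods, key=natural_key))
```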

Version 0.15.1 (Onyx)

  • Move docs out of the CMake build system.
  • Make the building of docs depend on version-specific manifests.

Version 0.15 (Onyx)

  • manifests: Use latest kmod image and kubeflow/manifests
  • Revamp the instructions to test a Rok installation on EKS
  • doc: Use proper mount for Docker
  • doc: Add deploy overlays to EKS guide manual option
  • doc: Update instructions of building the rok-kmod image
  • manifests: Add .cache kfctl folder to gitignore
  • Enhance guides of onboarding and release procedure
  • cli: Store logs under ~/.rok/log
  • operator: Fix bug with stale cluster config
  • Add instructions to configure the Kubernetes namespaces and RBAC rules after installing a Rok cluster in EKS
  • scripts: Fix tag creation in manifests script
  • rok-kmod: Update Dockerfile.local with missing kernel
  • Restore all Rok probes to Python 3, except the one used by the Rok appliance
  • conf: Set master_capable to True on Kubernetes
  • deploy: Provision auth components
  • doc: Treat warnings as errors when building with Makefile
  • Fix an invalid JSON document in the EKS installation docs
  • scripts: Make manifests script adopt existing repos
  • doc: Mention EKF instead of MiniKF
  • doc: Do not copy the results when the user selects text
  • manifests: Use string replacement instead of jinja2 templating
  • kmod: Don’t start a progress bar if there are no modules to install
  • gw: Always display cancel button in services form
  • Static rok and ekf themes
  • doc: Do not copy the results shown in blocks
  • gw: Move namespace selector into its own component
  • doc: Update Kubeflow integration doc
  • Hide and show code blocks in docs
  • Turn our manifests into templates and have bases only refer to proper image tags
  • Introduce a developer guide for the Kubernetes client’s initialization
  • Fix a bug in the Kubernetes Rok API drivers that caused SubjectAccessReview requests to sometimes fail with an unauthorized error
  • doc: AuthService Integration
  • Kubernetes: Configure dockerconfig with rok-deploy
  • Introduce the v2 services and OAuth APIs in Rok, to allow Rok clients to interact with any account instead of only the one matching their user UUID
  • Include CMake>=3.8.2 as a new build dependency since we make use of the COMMAND_EXPAND_LISTS option of add_custom_command.
  • Make AuthService authentication the default in Kubernetes
  • Introduce the AUTHORIZATION_BACKEND setting for the Rok API to control the way requests are authorized
  • Convert all Rok API authentication backend names to lowercase
  • Rename the static-authservice authentication backend to authservice in the Rok API
  • Fix custom fonts in doc
  • Further improve Python 3 compatibility
  • doc: Use example.com in our public docs
  • Make Kubeflow-UserID the default user header when using AuthService authentication in the Rok API
  • doc: Improve doc on Kubeflow’s integration with GitLab
  • Fix services request with namespaces
  • Enhance Kubeflow integration and use ekf overlays in KfDef
  • doc: Fix broken copy button
  • manifests: Move Rok manifest to its proper place
  • Revert Rok probes to Python 2 to work around missing dependencies for the Rok cluster probe
  • Make the Rok etcd3 client compatible with Python 3
  • Automatically allow access to Rok API resources to users that have access to Kubeflow resources in the same Kubernetes namespace
  • doc: Add absolute URL in snippet commands
  • cmake: Separate ctypesgen preprocessor flags
  • Kubernetes: Refactor manifests
  • Fix a bug in the Rok S3 daemon template
  • Build custom dex image
  • Enable the Rok API and UI to run behind Istio with AuthService authentication
  • common: Detect dirty repo and return trunk version
  • Kubernetes: Make Redis probe Python3-compatible
  • etcd: Add Python3 package for v3
  • doc: Extend docs and add integrations
  • Enable building reproducible rok-kmod images locally
  • kmod: Fix typo in Ubuntu PPA Dockerfile
  • rok-tools: Serve Rok’s public docs
  • rok-kmod: Use rok-kmod debian package in rok-kmod’s Dockerfile
  • githooks: Exclude json.in from Copyright check
  • debian: Introduce rok-kmod package
  • rok-kmod: Convert to Python3 and introduce python package
  • doc: Make public docs customer-friendly
  • common: Properly dump to file in current dir
  • Kubernetes: Introduce rok-deploy
  • probes: Make probes library Python3 compatible
  • doc: Change doc’s layout
  • common: Open dump_to_file in text mode by default
  • Mention bootstrapping in the docs
  • Make a number of small fixes to the Rok client to ensure our CI tests pass after transitioning to Python 3
  • ci: Configure locale inside chroot
  • Update the botocore dependency of the Rok AWS library to 1.12.103
  • Integrate Rok with ctypesgen 1.0.2
  • doc: Fix broken copy button image in nested docs
  • Support mass deletion in the Rok UI
  • Revamp the initialization of the Rok S3 daemon to identify deployment errors as soon as possible
  • Introduce formatting and validation to all Rok PUs
  • Correctly include the Rok Tools template in the docs
  • kmod: Build reproducible rok-kmod images
  • doc: Do not copy/link sources in public docs
  • Minor fixes in the Python wheels doc
  • Fix error reporting in Python 3 in the Rok client
  • kmod: Find available custom modules
  • Give modules installed by rok-kmod the highest priority
  • Introduce instructions for EKS
  • Add design document about the formatting and validation of Rok daemons
  • Introduce kustomize overlays for EKS
  • Introduce Rok Tools
  • doc: Make various adjustments to the rok-do guide
  • Avoid retrying all available methods of retrieving security credentials when updating them in the Rok S3 daemon
  • Support reading values from a file in the Rok C argument parser
  • Display bucket descriptions in the Rok UI
  • Prepare for Python 3 packages
  • rokfs: Make ioctl prototype conditional
  • operator: Set/apply cluster config
  • Allow deleting a specific bucket or all buckets of a Rok cluster using the Rok AWS helper scripts
  • Make rok_cluster an optional dependency of rok_aws
  • Add entrypoints for AWS helper scripts
  • Add AWS C++ SDK to rok-do build dependencies
  • cmake: Use -Og on Debug and fix ctypesgen flags
  • Kubernetes: Use rok-probed in initContainers
  • Make the Rok common helpers that convert strings to bytes and Unicode compatible with Python 3
  • doc: Add Rok upgrade guides for Kubernetes
  • Disable Fort signups
  • operator: Cluster-neutral logging
  • scripts: Allow purging multiple S3 buckets at once
  • Add search support in Rok UI
  • rdm: Activate the LVs when loading a VG/LV
  • Check if the source directory exists when adding Python tests in CMake
  • gw: Display the number of versions in objects list
  • gw: Change link style across the Rok UI
  • Update rok-do instructions
  • cmdutils: Add check and log_error to wait()
  • bootstrap: Improve validation
  • Kubernetes: Treat configVars as object
  • cmake: Add non-bootstrapped env as possible failure reason
  • libredis: Implement scanning keys and batch deletions
  • libtasks: Fixes and support for disabling logging to frontend
  • Remove a stale file
  • Add support for the IAM Roles for Service Accounts feature of EKS to the Rok S3 daemon
  • Add a design document explaining in detail the way Rok pods gain access to AWS services when running within an EKS cluster
  • Improve handling of time durations in timeutils
  • Add script to attach an IAM role to the Rok service inside an EKS cluster
  • Add script to purge an S3 bucket
  • rok_args: Do not set dest for Sensitive arg
  • libredis: Fix various bugs
  • gw: Disable group toggle button when group is empty
  • Add bootstrap and get build version with Python
  • gw: Remove created info from task popover
  • dm clone: Fix discard handling and overflow bugs which could cause data corruption
  • operator: Add helpers to get CR info as rok-init metadata
  • Add new tooltip messages
  • Introduce file badge component in Rok UI
  • githooks: Use relative paths for symlinks
  • scripts: Fix a check for enabled githooks
  • Fix various issues related to double reclassing
  • Add guidelines for testing to the Rok documentation
  • Add script to attach EBS volumes to a Kubernetes cluster
  • Add perf tests for libfiber
  • gw: Use bigger icons in services header
  • Introduce new delete dialogs in UI
  • Fix monospace and bold in UI
  • Style changes in the authorizations page of the UI
  • Minor Kubernetes-related fixes
  • libredis: Refactor code and support retries
  • conf: Fix ip_reachable and remove default gateway verification
  • Correctly initialize the Rok 0.15 client in MiniKF
  • Update the MiniKF kustomize templates and wheels
  • blkutils: Add --force for RAID devices with 1 drive
  • Improve reporting of sizes in CLIs
  • config: Factor out DLM lock break
  • Fix and upgrade custom tensorflow images
  • Update the Dockerfile used to produce the notebook image to create the required Python wheels using rok-do
  • Make rok-do less noisy in case of errors
  • conf: Support disabling host header check
  • libredis: Enforce redis scheme
  • scripts: Improve add_signature() to work on rebase
  • Update Rok Kubernetes guides
  • operator: Support cluster upgrades
  • End-to-end building of Python wheels with rok-do
  • operator: Retrieve secrets from CR
  • Add generic helpers to get, list, and retrieve the owners of resources to the Rok Kubernetes client.
  • libmap: Migrate epoch cache to Redis
  • Keep logs in case cronic fails
  • Do not deepcopy service params to increase the performance of service-related API calls
  • operator: Remove hardcoded cluster refs
  • rok-csi: Recover volumes from deleted nodes
  • libredis: Introduce connection pool
  • Add a simple graph implementation to the Rok common module
  • kustomize: Manage Rok Storage/VolumeSnapshot classes
  • trpt: Print message when magic number is invalid
  • rok-init: Add basic support for upgrading clusters
  • operator/kustomize: Add Redis endpoint
  • appliance: Add redis endpoint
  • operator: Fix bug in member removal
  • doc: Update stretch build dependencies
  • python/pu: Check PU status before releasing objects
  • python: Replace select() with poll()
  • Search for ext2/ext3/ext4 libraries in CMake
  • Add a bucket icon in Rok UI’s breadcrumb trail
  • Fix dependency to the PyYAML package in the Rok Kubernetes client
  • kustomize: Introduce Redis
  • Correctly display access tokens which were issued without an application
  • libredis: Introduce a Redis library
  • electiond: Improve detecting master changes
  • operator: Fix typo in postgresql_probe()
  • Specify arbitrary device attributes for CSI volumes
  • Reduce LU-oriented lock contention in the I/O path
  • csi: Start dm-clone monitoring threads after successfully initializing lvmd
  • csi: Fix imports
  • lvmd: Fix imports
  • lvmd: Don’t snapshot discarded blocks
  • Do not retry ENOENT on get_ca()
  • gw: Refactor objects and versions list in UI
  • doc: Fix indentation errors
  • conf: Remove the templates and the render.py from etcd
  • Fix a bug where tasks would never be finalized if they contained a value that cannot be JSON serialized
  • Use common’s document view component in event info page
  • conf: Do not use hostname as member ID fallback
  • conf: Support config annotations
  • Kubernetes: Extend cluster CRDs with status
  • gw: Fix imports in webpack’s dev config
  • Use common form component in Rok UI
  • rdm: Support parsing and applying scripts line-by-line
  • minikf: Some libtask fixes before updating provisioning script
  • Use only scoped imports in UI
  • common: Relax type restriction in format_duration
  • common: Update copyright date in UI
  • lvmd: Add support for replicated volumes
  • Convert utility class in UI
  • Fix a bug when waiting for a failed task in the Rok client
  • Factor out our internal Kubernetes client
  • common: Add Python functions to calculate versions
  • Introduce two new probes to test the readiness of an etcd and PostgreSQL deployment
  • Add options to wait until a readiness probe succeeds or a liveness probe fails
  • conf: Support atomic config apply
  • common: Revamp error service in UI
  • Replace prettytable with printutils
  • Change position strategy in UI
  • libtrpt: Minor performance optimization
  • Fix some minor email issues in Registry
  • Deploy Rok Registry with Istio on Minikube/GKE
  • operator: Rework init container cmds
  • thrower: Work with any lz4 version
  • lvmd: Properly close the data device
  • operator: Graceful termination
  • Make retention policies in the Rok API return accurate information about group members and clean up orphan group members.
  • blkutils: Fix using get_disks() with glob
  • lvmd: Fix progress reporting
  • libtrpt: Fix high completion latencies
  • doc: Update LVMD design document
  • common: Create a password helper
  • composer: Use 1MiB chock size as default
  • Always display snapshot policies section in UI
  • common: Use different connection strategy in tooltip
  • Improve Rok UI’s loading screen
  • common: Do not autodetect if we are in container, be explicit
  • Close the Pyro daemon before stopping the thrower
  • dm-clone: Backport upstream patches
  • libmap: Support batched epoch updates
  • common: Add missing prefix in UI’s HTTP client
  • Add Kubernetes PodSecurityPolicy Integration Design Doc
  • Angular and dependencies upgrade
  • indexer: Remove auth token in some API requests
  • ci: Increase dm-clone region size in lvmd tests
  • blkutils: Fix how we parse mountinfo in get_mountinfo()
  • Disable ASan’s LeakSanitizer for tests
  • Do not define min() and max() macros in C++, since they are already defined as functions.
  • Update the instructions to build a Jupyter notebook
  • Update the Jupyter notebook Dockerfile to include TensorFlow 1.14.0, Python 3 wheels for the Rok client and the latest Kubeflow ml-pipelines Kale, and Kale Jupyterlab plugin.
  • scripts: Allow passing minus tags in rok-buildbot
  • lvmd: Discover volume mount points automatically, instead of providing them explicitly in take_snapshot()
  • Add helper to wait for a task to the Rok client
  • Rename the Rok client and its errors to RokClient and RokClientError respectively
  • Parse credentials and service parameters from a file in the Rok client
  • Parse credentials and service parameters from the environment in the Rok client
  • Provide authentication credentials during initialization in the Rok client
  • lvmd: Support encrypted volumes
  • Fix a bug where Rok API installations would raise internal errors when accessing old delete marker versions due to migration v001400_0002 incorrectly introducing a number of attributes that should only exist in non-delete marker versions
  • Specify the minimum Gevent version that is supported
  • Detect Rok build type and skip lvmd CI tests
  • Add shared memory transport
  • Fix some prettytable dependency issues
  • Improve HTTP response handling in UI
  • minikf: Merge questions with CLI args
  • dlm: Use force_str() on strings passed to C calls
  • Support logging to the frontend from anywhere
  • liod: Rescan SCSI bus periodically
  • Add a role and role binding to MiniKF users to enable Rok API tasks to access Kubeflow resources
  • Add a PodDefault to allow the MiniKF’s default user to access the Rok API from within the Kubeflow namespace
  • operator: Replace Threads with Greenlets
  • lvmd: Use dm-clone only when restoring a volume from a snapshot, not for fresh volumes
  • conf: Fix a bug where an undefined var was referenced
  • csi: Do not start dm-clone monitor threads on controller
  • Fix the MiniKF deployment and its QA process
  • lvmd: Remove mostly unused dyn_params parameter
  • Make cmdutils compatible with Python 3
  • common: Do not import subprocess32 when using Python3
  • roke: Fix dots in member IDs and stopping md devices
  • Refactor lvmd to improve code readability and maintainability
  • operator: Handle nodeSelector and node labels
  • minikf: Track latest wheels
  • githooks: Fix a bug when checking config version
  • gw: Make the Rok gateway UI pass Prettier checks
  • lvmd: Support variable dm-era tracking granularity
  • Add support for building Python 3 wheels for the Rok client
  • Make the Rok client Python 3 compatible
  • Make the rok_common library Python 3 compatible
  • New icons in Rok UI
  • doc: Add copy buttons in all doc’s code blocks
  • pu: connect all PUs to the external controller by default
  • lvmd: Introduce DM snapshots
  • test: Set start_new_session instead of new_session
  • Use HttpClient in Rok Registry UI
  • ci: Fix hashing test to consider duplicate offsets
  • rdm: Add support for RAID arrays
  • rdm: Export attributes of block devices
  • tests: Fix leaks discovered by ASan
  • cmake: Use correct soversion for libetcd3
  • Add JSON and CSV output format to the Rok client
  • rok-do: Introduce rok-do CLI tool
  • doc: Use correct apt files on source install guides
  • gw: Dynamically resize file chooser window
  • Add extra validation checks in Rok Gateway
  • docker: Add missing syslog argument
  • Implement dialog for copying files
  • cmdutils: Support Popen kwargs and remove some shell=True commands
  • operator: Fix an operator regression wrt platforms
  • operator: Uniformly sync resources
  • conf: Improve diffing support and move it under rok_common
  • gw: Add a Django cache to cache the chock size
  • operator: Produce events on cluster CR
  • cli: Add QuestionContext and expose question threshold through args
  • operator: Always refresh cluster driver cache
  • lvmd: Add support for configuring the snapshot chunk size
  • libtasks: Separate null and empty answers and add boolean-type question
  • Switch to Stretch builds
  • Deploy Rok Registry using Rok Operator
  • Remove deprecated Http class from UI
  • Fix rok-init bugs
  • Allow Fort to filter LDAP users by group
  • doc: Extend Sphinx configuration to include versioned manifests from a specified path

Version 0.14.1 (Nephrite)

Version 0.14 (Nephrite)

Version 0.13 (Marble)

Version 0.12 (Lignite)

Version 0.11.1 (Kryptonite)

Version 0.11 (Kryptonite)

Version 0.10.3 (Jade)

Version 0.10.2 (Jade)

Version 0.10.1 (Jade)

Version 0.10 (Jade)

Version 0.9 (Iron)

Version 0.8.1 (Hematite)

Version 0.8 (Hematite)

Version 0.7.2 (Granite)

Version 0.7.1 (Granite)

Version 0.7 (Granite)

Version 0.6.2 (Flint)

Version 0.6.1 (Flint)

Version 0.6 (Flint)

Version 0.5 (Emerald)

Version 0.4.5 (Diamond)

Version 0.4.4 (Diamond)

Version 0.4.3 (Diamond)

Version 0.4.2 (Diamond)

Version 0.4.1 (Diamond)

Version 0.4 (Diamond)

Version 0.3 (Celestite)

Version 0.2 (Beryl)

Version 0.1 (Amethyst)