This file describes code and packaging changes for all Rok releases starting with Rok 0.15. It is mostly of interest to packagers, administrators, and developers.
Version 1.2 (Ruby)
- Introduce a new Django view in Rok GW to serve HTTP GET requests at /metrics and expose Rok metrics in Prometheus’s text-based format.
- Introduce a Grafana dashboard with multiple rows and panels to visualize Rok metrics, extracted from Prometheus’s TSDB.
newTag only if necessary when patching images for airgapped deployments.
python-authlib in the Debian packages to install for CI, RokE and Registry images.
- Separate Istio deployment from Rok and Rok Registry in
- Introduce “Arrikto” and “air gapped” custom admonitions in docs.
- Include the S3 action performed in the logs of the S3 daemon.
- Include the names of all libs3 functions called in the logs of the S3 daemon.
- Truncate the MiniKF image name to conform to the naming restrictions of GCP and AWS.
- Use Kubernetes 1.17 for EKS clusters.
- Implement a common button component in the UI.
- Introduce social login buttons in the UI.
- Improve the button hover functionality in the UI.
- Disable GC cron jobs in Rok Registry clusters.
- Generate the rok-dlm-break service dynamically, based on the type of the appliance.
- Add a Python script in package rok_pu for testing individual target PUs.
- Fix a bug where the Rok S3 daemon would not verify the SSL certificate of the S3 service it connected to.
- Add a Rok cluster config variable to allow connecting to an S3 service without verifying its SSL certificate.
- Configure Prometheus to run in multiprocess mode to allow Gunicorn workers to cooperate in order to expose GW metrics.
- Restructure the ‘Prepare Management Environment’ section of the EKS docs to follow the current documentation guidelines.
- Add the Prometheus Python client as a dependency to Rok’s Django library.
- Install the Prometheus Python client in Rok Registry container images.
- Add settings for external OIDC providers for Rok Fort.
- Add the ‘SocialUser’ model which holds information about users who authenticate with external OIDC providers in Rok Fort.
- Add support for authentication via external OIDC providers in Rok Fort.
- Protect the OIDC endpoints using a state parameter.
- Add support for the OIDC callback URL in the common UI code.
- Extend Rok Registry UI to initialize/finalize OIDC cycles.
- Prevent updating browser’s history in docs when scrolling.
- Increase documentation’s content width.
- Change ordered list design in docs.
- Remove depth limitation from doc’s menu.
- Update our docs with instructions on how to edit Registry-related images.
- Fix a bug in Registry UI that was showing the “Sign In” form when there’s a single Social provider.
- Introduce Python helper to calculate Rok’s build ID and use it from CMake.
- Introduce Python helper to calculate the version for Rok’s Python packages and use it from CMake.
- Include Rok Registry in the release procedure.
rok-image-mirror to dump list of mirrored images.
- Skip creating a pending cluster configuration if there are no changes.
- Fix a bug that prevented setting cluster config variables to values that contain braces.
- Extend Rok Operator to upgrade cluster config variables that are not specified in .spec.configVars, but are provided by the users as fields in the CR’s spec.
- Add documentation for configuring external OIDC providers in Rok Fort.
- Fix an incompatibility issue in Rok APIs that caused Prometheus metrics to be registered more than once in Python 3.
- Fix a Python 3 compatibility bug in the Rok etcd3 client.
- Implement an etcd backend for the Dynamic DNS API for MiniKF.
- Introduce a Django-based Dynamic DNS API for MiniKF AWS instances, that will serve names under the
air-gapped admonition directives in docs.
- Allow long links to wrap in docs.
- Gracefully exit GC task of rok-do when the working directory is empty.
- Fix error logs in Rok Registry and Rok Fort due to Prometheus integration.
- Fix a validation bug for config variables that have already been converted to the proper Python type.
- Fix a bug in MiniKF’s provision script, where the list of downloaded images was not correctly passed to the ConfigMap of the admission webhook that sets the imagePullPolicy of downloaded images to Never.
- Change the invocation policy of MiniKF’s admission webhook, so that it is invoked again if a subsequent webhook (e.g., the Istio injection webhook) further changes the Pod.
- Introduce RDM overlay with a disk-script that works on Azure.
- Upgrade Linux kernel in MiniKF to 5.4.104-0504104-generic to fix a Go runtime issue that made CSI sidecars crash because of hitting max locked memory limits.
nvidia-440 in MiniKF of all supported platforms.
- Do not attach the AmazonEKSClusterPolicy IAM policy to the EKS cluster IAM role.
- Declaratively manage IAM roles needed to create an EKS cluster with AWS CloudFormation stacks.
- Rename the assume-no-versioning command line argument of the Rok S3 daemon to --no-validate-versioning, and make it skip validation of S3 bucket versioning status when provided, regardless of whether versioning is used by the daemon.
- Remove the --no-versioning argument from the Rok S3 daemon and automatically enable versioning when the IFC library is enabled via the
- Instead of always listing versions to determine if an S3 bucket exists and is empty, only list versions if IFC is enabled, otherwise list objects, to ensure the S3 daemon is compatible with S3 APIs that do not support versioning.
- Add a note for rebalancing the pods.
- Update the gcloud SDK in MiniKF, as the currently pinned version was removed from the repo.
- Enable TCP keepalives globally in Istio.
- Fix a bug where custom admonitions did not support multiple CSS classes.
toggle directive in docs.
- Introduce foldable admonitions in docs.
- Add sphinx-tabs extension for tabbed content in docs.
- Fix a bug where a user couldn’t register a new Rok Registry from the settings page in the UI.
- Fix email symbols handling in Rok Registry links in the UI.
- Update NVIDIA driver and CUDA version in MiniKF to 460 and 11.2 respectively.
~/.docker/ on tmpfs to fix the broken symlink across MiniKF reboots.
- Extend MiniKF to use rok-image-list and automatically generate the list of images that provision.py needs to pre-pull.
- Use a newer version of python3-git to work with packed-refs created from newer Git versions. As a result, fix some import issues.
- Redesign MiniKF’s landing page for Vagrant.
- Use our own nginx-ingress-controller kustomization instead of Minikube’s ingress addon.
- Use manifests to deploy Istio Ingress instead of applying a formatted string value.
- Extend MiniKF to read docker/images-exclude and exclude images mentioned in this file.
- Fix a bug in Rok UI where it throws a NullInjectorError for the AuthUrl InjectionToken.
- Fix a bug that resulted in an incorrect suggested file name in Dataset snapshot policies.
- Fix a bug where after changing the file name of a snapshot policy, the Rok UI would still display the default value.
- Produce a smaller Vagrant box for MiniKF by excluding non-critical images from the pre-pull list.
rok-version to generate a valid SemVer for MiniKF.
- Fix a bug in MiniKF where it would always try to pull images from index.docker.io even if they exist locally.
- Add design doc for authentication with external OIDC providers in Rok Fort.
- Increase the amount of required RAM for MiniKF on VirtualBox from 10GB to 12GB.
- Exclude extra Docker images from MiniKF on GCP to improve provisioning times.
- Implement a composite authentication backend for the MiniKF Dynamic DNS API, to allow bearer token authentication for instances and admins.
- Ensure that no stale containers are left in the final MiniKF image.
- Update APT cache before installing kernel build dependencies on Ubuntu.
- Support Ubuntu Bionic kernel 5.4.0-1040-azure for AKS node pools.
- Support Ubuntu Xenial kernel 4.15.0-1108-azure for AKS node pools.
- Support Ubuntu Xenial kernel 4.15.0-1109-azure for AKS node pools.
- Support Ubuntu Xenial kernel 4.15.0-1111-azure for AKS node pools.
- Extend the rok-tools manifests to support deployment on Azure.
- Disable Azure’s Admissions Enforcer for Istio.
- Support RDM on Azure.
- Retry Kubernetes watch() operations on ProtocolError exceptions.
- Enable TCP keepalives in
- Install Azure CLI in
rok-deploy up-to-date with the latest instructions for cloning our GitOps repository.
- Introduce manifests to deploy S3Proxy on AKS.
- Extend the docs with instructions to deploy Rok over S3Proxy on Azure cloud.
- Deploy Rok’s external services (etcd/PostgreSQL/Redis) on Azure.
- Expose services on Azure.
- Configure Azure CLI inside
- Set up a cloud environment for Azure inside
- Support creating an AKS cluster.
- Introduce the rok-kf-rebase CLI tool to help with manifests rebase.
- Introduce the rok-kf-prune CLI tool to help with resource pruning during upgrades.
- Update to Enterprise Kubeflow 1.3 manifests.
- Add upgrade instructions for EKF 1.3.
- Remove EKS references from platform-agnostic sections of the docs.
- Add a maintenance guide with instructions on how to add an internal GitHub repository as a backup GitOps remote.
- Add a maintenance guide with instructions on how to set up cluster-wide access to a Docker Registry.
- Add aliases for Kubernetes memory units Ei, Pi, Ti, Gi, Mi, Ki.
- Introduce script to scale-in a Kubernetes cluster.
- Improve highlighting of prompts in doc’s code blocks.
- Update the Debian base image rok-do uses to debian/snapshot:stretch-20210511.
- Use Kubernetes 1.18 for EKS clusters.
- Expose services on AWS using Classic Load Balancer.
- Add maintenance guide for adding users in dex.
- rok-k8s-drain: Fix scale-in script to handle Unauthorized Errors.
- rok-k8s-drain: Remove K8s configuration confirmation question.
- rok-k8s-drain: Ask for user input confirmation.
- rok-k8s-drain: Update log file location.
- rok-k8s-drain: Fix help argument to work with missing kube config file.
- rok-k8s-drain: Use AWSRegion Question instead of AWSRegionArgument.
- Introduce script to protect Arrikto EKF Pods from OOM conditions and CPU starvation.
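
The entry above on aliases for the Kubernetes memory units (Ei, Pi, Ti, Gi, Mi, Ki) refers to binary suffixes, each a power of 1024. As a minimal sketch only, and not Rok’s actual implementation, the mapping could look like the following. The names BINARY_UNITS and memory_to_bytes are hypothetical, and only integer quantities with the six suffixes listed in that entry are handled.

```python
# Hypothetical helper: convert a Kubernetes-style memory quantity such as
# "512Mi" to a number of bytes. Only the binary suffixes named in the
# changelog entry are handled; a plain integer is treated as bytes.
BINARY_UNITS = {
    "Ki": 1024 ** 1,
    "Mi": 1024 ** 2,
    "Gi": 1024 ** 3,
    "Ti": 1024 ** 4,
    "Pi": 1024 ** 5,
    "Ei": 1024 ** 6,
}


def memory_to_bytes(quantity):
    """Return the byte count for a quantity like '2Gi' or '4096'."""
    quantity = quantity.strip()
    for suffix, factor in BINARY_UNITS.items():
        if quantity.endswith(suffix):
            # Everything before the suffix is the integer magnitude.
            return int(quantity[: -len(suffix)]) * factor
    # No known suffix: interpret the value as a raw byte count.
    return int(quantity)
```

For example, memory_to_bytes("12Gi") multiplies 12 by 1024 ** 3, which is how the 12GB VirtualBox requirement mentioned earlier in this release would be expressed in bytes if it were written as a binary quantity.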
Version 1.1 (Quartz)
- Make our AWS CloudFormation client, and rok-s3-authorize by extension, idempotent.
- Improve the periodic rule of Rok API version retention policies to retain the latest instead of the earliest version in each interval.
- Do not include group members in the files list API call of the Rok API.
- Extend the files list API call of the Rok API to support including deleted files in the response.
- Include the number of versions of each object in the files list API call of the Rok API.
- Support pagination in the files list API call of the Rok API.
- Extend Rok’s provisioning tool for Kubernetes with the --delete mode to delete specified Kustomize packages.
- Add a loader to the select all button of the Rok UI.
- Use pagination in the copy and delete files dialogs of the Rok UI.
- Use pagination in the files list page of the Rok UI.
- Remove a backwards compatibility fix for Rok versions v0.10 or earlier, that allowed passing the task ID in place of the bucket name to retrieve a task by ID in the API call to list the tasks of a bucket in the Rok v1 services API.
- Replace the coarse-grained authorization, which was applied by the Rok API to provide namespace isolation, with fine-grained authorization tests for each API call, ensuring the user is authorized to perform the specific action they requested.
- Remove a workaround that automatically added the Kubeflow-UserID header in all Rok client requests performed inside a Kubernetes cluster.
- Only allow authenticating via a token in the Rok client and CLI.
- Drop the GW_ part of all environment variables used by the Rok client. For example, rename
- Use the Authorization: Bearer <token> header instead of the X-Auth-Token: <token> header for authentication in the Rok API and client.
- Relax a restriction in our githooks that required every introduced Rok config version in our repo to also immediately be the target one.
- Support using more than one authentication backend simultaneously in the Rok API.
- Support authentication via Kubernetes tokens in the Rok API.
- Retrieve the CSRF token from the X-XSRF-Token header in the Rok API.
- docs: Document how Rok CSI handles auto-registration for VolumeSnapshots
- Introduce more fine-grained ClusterRoles for users and administrators to provide access to the Rok API.
- Restrict access to individual Rok API services via RBAC rules.
- Fix a bug where Rok API tasks created using a Kubernetes token failed to access the Kubernetes API due to using the user ID instead of the username for impersonation.
- Introduce a design document for the Kubernetes Rok operator.
- Restrict Rok CSI to only allow registering VolumeSnapshots in the same Rok account as the snapshot’s Kubernetes namespace.
- Restrict Rok CSI to only allow creating PVCs from a Rok URL in the same Kubernetes namespace as the account of the Rok URL.
- Remove support for the rok/origin-fisk-group annotations from Rok CSI, which violated namespace isolation by allowing users to register any fisk into their account.
- Extend our APT helper to install packages in a batch while retaining progress reports.
- Remove a 500ms delay from our progress messages in the ‘dialog’ frontend.
- Use a distinct call to list group members in the versions list page of the Rok UI.
- Introduce separate tasks to manage different deployments repos.
- Rename the Rok CLI from
- Automatically reload tokens before every request in the Rok client if they have been provided using the
- Extend rok-do to garbage-collect local artifacts.
- Add design document for Rok CLI questions.
- Set argparse.SUPPRESS as the global default for CLI args and display the enclosing Question’s default in the CLI arg’s help message.
- Do not mutate CLI argument defaults via preseed files.
- Extend rok-version with the --build-tag argument to report the versioned tag of build artifacts.
- Extend Rok’s build version with the source branch of the release.
- Add license, build type and git branch information to rok-do tasks that manage manifests, docs and the deployment repositories.
- Introduce per release open-ended upgrade notes and fold any generic ones into the version-specific ones.
- Include fixes for upstream dm-era bugs in the rok-kmod images.
- Introduce a script to upgrade the image of all notebooks in a cluster.
- Create Rok Registry images with rok-do.
- Introduce a script to perform a rolling reboot of a Kubernetes cluster.
- Introduce a script to reset the CBT data of all Rok PVCs.
- Fix a bug where the Rok etcd library would sometimes report an incorrect number of retries in its logs.
- Fix a bug where the Rok DLM CLI would incorrectly log warnings about all other DLM clients being missing when requested to retrieve information for one of them.
- Fix an out-of-bounds memory access bug in the Python bindings of librok_dlm that resulted in the rok-dlm CLI occasionally segfaulting and leaving behind stale locks after a pod restart.
rok-deploy to deploy Rok Registry clusters and split the deployment process into three steps: Deploy, Generate manifests, Apply manifests.
- Improve the Kubeflow recurring runs upgrade instructions to use the Jobs page and clone old failing runs.
- Include the user’s AWS account ID in the default S3 bucket name prefix.
- Omit the -rok-rok suffix from the name of the CF stack and related IAM resources needed to grant Rok full access to S3 buckets.
- Fix a bug where the modal for entering an authorization code in Rok UI closes unexpectedly.
- Use UI’s path as a prefix when storing and retrieving localStorage values.
- Introduce rok-do tasks for building the Rok Documentation with any combination of (builder type, tags).
- Incorporate the public tag of the Rok Documentation into the logic/content of the docs.
- Use Debian image snapshots as the base Docker images for rok-do tasks.
- Add an option that disables the offline warning notification for specific requests in the UI.
- Remove the v prefix from Rok version and related artifacts.
- Fix a bug where the Rok S3 daemon would attempt to assume an AWS role using the AWS STS endpoint of an incorrect region.
- Revamp the Rok S3 daemon bucket versioning validation to first retrieve the versioning, and then if required either update it during formatting or fail with an error during validation.
- Support deploying Rok over pre-existing, empty S3 buckets
- Fix a wrong route in Authservice’s
- Replace the JSONPatches6902 fields with the patches one in the kustomization file of monitoring’s deploy overlay.
- Allow the user to verify if the S3 IAM role exists, instead of making it a strict check in
- Prevent the auto-redirect to the Kubeflow dashboard from the OAuth callback page.
- Highlight the active menu item in the Rok docs.
- Upgrade Font Awesome version in docs.
- Improve the appearance of admonitions in the docs.
- Allow selecting the prompts in all code-blocks except console in the docs.
- do: Improve the way we clean up and snapshot MiniKFs
- Loosen the newsworthiness check of our githooks by ensuring that at least one of NEWS.rst, Changelog.rst is updated by a commit that closes a GH issue.
- Fix a bug in the responses of the OAuth endpoints in the Rok API.
- Use the correct Registry base URL in the Rok UI during the Rok registration process.
- Support using classic ELB instead of ALB to expose NGINX.
- Support terminating TLS on NGINX instead of using an ACM certificate at ALB.
- Introduce manifests for creating self-signed certificates and expose Rok+EKF with ELB in front of NGINX.
- Support AMI release 1.16.15-20210310 [kernel version 4.14.219-164.354.amzn2.x86_64] for managed node groups on EKS.
rok-lio bug that causes rok-csi to misdetect whether a Fisk is exposed as a block device.
- Fix a race in the pre-clone verification step of LVMd that could lead to errors, such as failures to unexport the origin Fisk, I/O errors, and stale TCMU handlers.
- Support applying a different set of patches for each supported kernel version in
- Support AMI release 1.16.15-20210322 [kernel version 4.14.225-168.357.amzn2.x86_64] for managed node groups on EKS.
- Support serving multiple versions of the docs.
rok-do to download the correct kernel source for Ubuntu kernels.
- Support AMI releases 1.16.15-20210329 and 1.16.15-20210414 [kernel version 4.14.225-169.362.amzn2.x86_64] for managed node groups on EKS.
- Support AMI release 1.16.15-20210501 [kernel version 4.14.231-173.360.amzn2.x86_64] for managed node groups on EKS.
- Support AMI releases 1.16.15-20210504, 1.16.15-20210512 and 1.16.15-20210518 [kernel version 4.14.231-173.361.amzn2.x86_64] for managed node groups on EKS.
- Mark Rok and RokCSI Pods as critical, to avoid OOM kills and evictions.
- Improve the copy button to behave exactly like manually selecting and copying text.
- Improve copy behavior for secondary prompts in doc’s code blocks.
- Improve text color for command’s output in doc’s code blocks.
- Improve copy behavior in doc’s code blocks with command’s outputs.
- Add CPU requests for RokE and Rok CSI containers to protect them from CPU starvation.
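
The retention policy entry above changes the periodic rule to keep the latest, rather than the earliest, version in each interval. The sketch below illustrates that selection under stated assumptions: versions are identified by numeric timestamps, intervals are fixed-width buckets aligned at zero, and the name retain_latest_per_interval is hypothetical. How the Rok API actually delimits intervals is not described in these notes.

```python
def retain_latest_per_interval(timestamps, interval):
    """Return the latest timestamp found in each interval-sized bucket.

    `timestamps` is an iterable of numbers (e.g. seconds since the epoch)
    and `interval` is the bucket width in the same unit. Both the bucketing
    scheme and the function name are illustrative assumptions.
    """
    latest = {}
    for ts in timestamps:
        bucket = ts // interval
        # Keep the newest version seen so far for this bucket; the previous
        # behavior described in the entry would keep the oldest instead.
        if bucket not in latest or ts > latest[bucket]:
            latest[bucket] = ts
    # Report one retained version per bucket, in chronological order.
    return [latest[bucket] for bucket in sorted(latest)]


kept = retain_latest_per_interval([0, 10, 59, 60, 61, 130], 60)
```

With a 60-unit interval, the timestamps 0, 10 and 59 share a bucket and only 59 is kept, so a user who snapshots frequently retains the most recent state of each period.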
Version 1.0 (Platinum)
- Fix a bug where the account selector in the Rok UI sometimes displayed the incorrect account.
- Do not display a logout button when logging out is not possible in the Rok UI
- Fix a bug where Rok API drivers would use the account instead of the user to perform authorization checks for tasks.
- Fix a bug where the Rok UI would sometimes raise an undefined variable exception after logging in.
- Fix a bug where the Rok UI would ignore the namespace selected via the Kubeflow dashboard selector
- Fix a bug where the Rok UI would not render correctly in a Kubeflow environment.
- Fix a bug where Kubernetes exceptions would not be converted to a Unicode string properly, resulting in the messages of Kubernetes errors not being visible in Rok task logs.
- Fix a bug where the Rok client would fail to retrieve the user’s ID when using static authentication.
- Remove secrets from the allowed variables in Rok CSI auto-register URLs.
- Fix a bug where Rok CSI would fail to auto-register a VolumeSnapshot when the Rok API was using AuthService authentication.
- Fix a bug where Rok CSI would fail to hydrate a PVC when the Rok API was using AuthService authentication.
- Give Rok CSI a rok-admin ClusterRole to allow it to access all Rok accounts.
- Extend Rok’s provisioning tool for Kubernetes with the --apply mode to avoid questions, skip regeneration of manifests and only apply specified Kustomize packages.
- Make rok-do fail by default if a path in the host is needed by a task and it does not exist.
- Replace CommandNotFoundError with CommandOSError, which is broader and more accurate.
- Fix the logging of byte strings (and the b’...’ prefix) in the cmdutils module.
- Persist the home directory of user root inside rok-tools by mounting a Docker volume or Kubernetes PVC at /root.
- Correctly display the account name instead of the user ID in Rok CLI.
- Move authorization code from the Rok API views to a dedicated backend.
- Store the Kubernetes namespace UUID in Rok API accounts and verify it matches the one on Kubernetes with every request to prevent accessing resources on Rok after the namespace has been deleted.
- Add fine-grained authorization to account metadata updates in the Rok API.
- Introduce the rok-cluster-admin ClusterRole for Rok cluster administrators on Kubernetes.
- Prevent auto redirect to KF dashboard when the Rok UI is in chooser mode.
- Bump the version of Istio that Rok’s provisioning tool for Kubernetes installs to 1.5.7.
- Remove a late import in Rok’s log formatting code, which could cause a deadlock between the log handler’s lock and the Python module import lock during the initialization of the Rok client by Rok CSI.
- Improve the style of all links in the Rok UI.
- Display the number of versions in the object list of the Rok UI
- Migrate githooks to Python 3.
- Use Angular’s infinite scroll component in the Rok UI.
- Implement search support for buckets and objects in the Rok UI.
- Export the Rok client, its error classes and the helpers responsible for querying Rok URLs at the Rok client’s module level.
- Introduce a helper to the Rok client to list the members of a group.
- Fix a bug where Rok CSI would sometimes use the incorrect Rok API version when restoring a volume from the Rok URL of a group.
- Introduce group delete for objects and versions in Rok UI.
- Improve messaging in UI’s network errors.
- Suppress C812 Flake8 error, because it doesn’t offer us much and leads to a bit uglier code.
- Perform retries when setting the versioning status of an S3 bucket, to work around the fact that the S3 API sometimes returns 404 errors for buckets that have just been created.
- Suppress E741 Flake8 error, because most monospace fonts already do a good job at showing “l”, “I” and “1” differently.
- Add a way to lazily evaluate Task attributes in rok-do
- Introduce rok-dev, a Debian Stretch environment for Arrikto devs.
- Enable logs in UI’s production builds
- Fix CRD validation in Istio kustomizations.
- Provide a ClusterRoleBinding for the rok-admin and rok-cluster-admin ClusterRoles to the rok and rok-operator ServiceAccounts.
- Fix Githooks random behavior regarding flake8 checks
- Add support for creating a Docker image with Python 3.5.1 installed.
- Preserve LC_ALL when running tasks in a remote with rok-do.
- Build Rok Enterprise Docker images with rok-do
- Improve rok-dev with support for running rok-do
- Make Python bindings compatible with Python 3 and ship the corresponding Python 3 packages.
- Add support for building the Rok Operator Docker image with rok-do.
- Add support for building the Rok Disk Manager Docker image with rok-do.
- Add support for building the Rok CSI Docker image with rok-do.
- Give Rok CSI nodes the rok-admin ClusterRole, to provide them access to all Rok accounts.
- Reduce configd log spam by rendering config only if member is not up-to-date
- Improve the Rok API error message when accessing an account for a Kubernetes namespace that does not exist.
- Fix a bug where the Rok Composer could deadlock while serving simultaneous requests to delete and access a fisk.
- Support snapshot policies in the Rok GW Jupyter driver.
- Support snapshot policies in the Rok GW dataset driver.
- Reduce electiond log spam by watching the master lease without timeout.
- Preserve query parameters when the namespace changes in Rok UI.
- Add documentation for cmdutils, as well as a developer guide with examples for some common scenarios.
- Extend LVMd to report successful snapshot completion.
- Allow LVMd to recover from an interrupted snapshot.
- Introduce config variables to setup cron jobs for local/global GC.
- rok-csi: Add support for garbage collecting LVs and nodelocal fisks owned by LVMd.
- Remove the “escalate” permission from the Rok Operator/Cluster pods.
- Fix a bug where the UI was showing the wrong object count when deleting objects.
- Add a mixin with common helpers for Rok-related tasks in rok-do.
- Handle existing tags in deployments repo and avoid tagging trunk versions.
- Handle transient disconnections in a less intrusive way in Rok UI.
- Introduce a user guide for snapshot and retention policies.
- Disable msg_delay in text progressbar
- lvmd: Ensure we delete stale resources under normal operation.
- rok-csi: Skip GC-ing nodelocal fisks when composer runs in non-nodelocal mode.
- rok-csi: Improve GC logs.
- Add a rok-do task to GC old Docker images used by rok-do.
- Fix a bug where the rok_common.apt Python module would ignore failures to update the APT cache, because apt-get update returns with a 0 exit code.
- Fine-tune the update strategy for rok-disk-manager and rok-kmod DaemonSets so that they can be upgraded in parallel.
- Remove the message limits in the Rok etcd v3 client.
- Add support for building the Rok Tools Docker image with rok-do.
- Fix running rok-do subtasks as direct goal tasks.
- Implement API call to retrieve the members of a group in the Rok API.
- Make the task-gc management command more efficient by avoiding having to protect all parameters of all tasks.
- Use an LRU cache for the classes dynamically created when protecting objects to fix a performance issue when protecting large numbers of objects. This will also improve performance of
- Improve the efficiency of recursive listing in the etcd v2 emulation client by using a node index when formatting the response.
- rok-csi: Extend GC to unfreeze frozen filesystems and collect stale device mapper devices.
- Document how we generate Docker images for the Kubernetes CSI Sidecars.
- Remove any force-cleanup logic from rok-deploy that could purge a non-empty directory specified by the user as their local GitOps repository.
- Introduce manifests to deploy a monitoring stack alongside Rok on Kubernetes, based on Prometheus and Grafana.
- Configure Prometheus to periodically scrape and store metrics from Rok’s etcd.
- Add a dashboard to Grafana to visualize Rok’s etcd metrics.
- Configure Prometheus to periodically scrape and store metrics from Rok’s Redis.
- Add a dashboard to Grafana to visualize Rok’s Redis metrics.
- Add public document with description and deployment steps for Rok’s monitoring stack on Kubernetes.
- Use Kubernetes 1.16 for EKS clusters.
- Work around a Mitogen issue where the standard I/O streams in the remote are in non-blocking mode.
- Update the code for deploying a Rok Registry cluster.
- rok-csi: Record all logs and progress updates as events on the corresponding Kubernetes object.
- rok-csi: Allow displaying the subjob progress along with the total progress.
- rok-gw: Allow displaying the virtual subtask progress along with the total progress.
- rok-csi: Fail stale VolumeSnapshots after Pod restart
- do: Warn when a task does not support caching
- Fix task’s logs alignment in Rok UI
- rok-csi: Support migrating PVs from cordoned nodes.
- do: Create rok-kmod image using Debian packages.
- Decouple do task NGINXStaticSite from docs
- do: Support caching in NGINXStaticSite
- Introduce the run-if-master tool to allow easily running commands on the master node of the Rok cluster.
- Introduce a helper to acquire an exclusive cluster-wide DLM lock.
- do: Take the entrypoint task attributes into account when caching a task.
- Introduce a way to uniquely identify a process in a running host, by computing an ID that cannot be reused during the host’s uptime.
- Extend the run-if-master tool to break all stale DLM locks left behind by the process it executed.
- Allow garbage collecting Rok API tasks based on their status.
- Enable automatic garbage collection of Rok API tasks in the Rok cluster.
- do: Hint to the task that must run when a fromsnap is not found.
- do: Support adding labels to rok-do snapshots.
- do: Add support for GCP remotes.
- Support provisioning MiniKF using the new Kubeflow manifests.
- Remove Pod deletion logic from Rok Operator; delegate this task to the DaemonSet Controller
- do: Automate building MiniKF images for GCP.
- deploy: Improve auto-detection of EKS cluster name to handle clusters created
- do: Automate building MiniKF images for AWS.
- Use the j2 CLI to render Jinja templates instead of using envsubst and environment variables.
- csi: Unpin both used and unused PVCs.
- csi: Produce events when pinning/unpinning a volume.
- csi: Automate garbage collecting completed jobs every hour.
- csi: Do not crash if etcd goes down.
- Update the Rok operator and systemd units to break locks in the master namespace.
- Fix an issue where computing the run ID of a process occasionally failed due to a bug when parsing the process stat file.
- Use fixed size widgets in our dialog based frontend.
- Fix yld() not to leave open fds behind.
- electiond: Fix a bug where if the Rok master node was permanently removed, other nodes did not attempt to become master.
- cluster: Do not lock the master lease just for inspecting it
- aws: Add CloudFormation support
- minikf: Reduce timeout limit of APT connections
- lvmd: Log info that can help us debug filesystem related issues.
- lvmd: Verify the filesystem state.
- lvmd: Recover the filesystem journal when activating volumes.
- csi: Use the same PU object for both CSI and LVMd running on the same process.
- liod: Set the timeout for tcmu_handler to infinity while waiting for a connection with Rok to succeed.
- operator: Use the kubernetes.io/hostname Kubernetes node label over the name one to schedule Rok CSI Guard Pods more robustly.
- manifests: Remove Pod Disruption Budgets for Istio.
- operator: Take into account unschedulable nodes when calculating which nodes to guard to avoid unneeded resource create-delete-recreate cycle.
- Use the watch helpers provided by the Rok etcd clients when watching for document changes in the Rok API.
- operator: Emit more events to increase observability into the cluster scaling algorithm
- Add design document for Rok Disk Manager (RDM)
- Revamp Rok Disk Manager to always request LVs with size that is a multiple of the block size, i.e. 512.
- RDM: Hash block devices based on the underlying kernel device, not their path.
- Fix a bug where rok-deploy modified the kustomization file for Istio, removing some useful resources/transformers.
- docs: Extend our guides with instructions on how to create a dedicated VPC for the EKS cluster
- Add missing packages (curl and bsdmainutils) in rok-tools image
- rok-gw: Fix a bug where the Rok StatefulSet driver would create a group resource with the wrong order for the registered disks.
- rok-gw: Fix a bug where the Rok StatefulSet driver would not sort the Pod names correctly, placing
pod-2 inside the generated group resource.
- csi: Document how to create a StatefulSet from a Rok group resource using the
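
Two entries in this release concern computing a per-process ID that cannot be reused during the host’s uptime, and a bug in parsing the process stat file. These notes do not say how the ID is computed. A common approach on Linux, assumed here purely for illustration, pairs the PID with the process start time (field 22 of /proc/<pid>/stat, as documented in proc(5)). The sketch also shows a typical stat parsing pitfall: the command name in field 2 may contain spaces or parentheses. The names parse_starttime, run_id and SAMPLE are hypothetical.

```python
def parse_starttime(stat_line):
    """Return the start time (field 22) from a /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or parentheses, so a naive split() can shift every later
    # field. Splitting on the LAST ")" avoids that.
    _, _, rest = stat_line.rpartition(")")
    fields = rest.split()
    # fields[0] is field 3 (state), so field N lives at index N - 3.
    return int(fields[22 - 3])


def run_id(pid, stat_line):
    """Hypothetical run ID: the PID paired with the process start time."""
    return "%d-%d" % (pid, parse_starttime(stat_line))


# Synthetic stat line whose command name contains spaces and parentheses.
SAMPLE = (
    "1234 (my (odd) proc) S 1 1234 1234 0 -1 4194304 100 0 0 0 "
    "5 3 0 0 20 0 1 0 987654 1000000 200"
)
```

On a real host the line would be read from /proc/<pid>/stat; the synthetic SAMPLE keeps the sketch independent of the running system.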
Version 0.15.1 (Onyx)
- Move docs out of the CMake build system.
- Make the building of docs depend on version-specific manifests.
Version 0.15 (Onyx)
- manifests: Use latest kmod image and kubeflow/manifests
- Revamp the instructions to test a Rok installation on EKS
- doc: Use proper mount for Docker
- doc: Add deploy overlays to EKS guide manual option
- doc: Update instructions of building the rok-kmod image
- manifests: Add .cache kfctl folder to gitignore
- Enhance the onboarding and release procedure guides
- cli: Store logs under ~/.rok/log
- operator: Fix bug with stale cluster config
- Add instructions to configure the Kubernetes namespaces and RBAC rules after installing a Rok cluster in EKS
- scripts: Fix tag creation in manifests script
- rok-kmod: Update Dockerfile.local with missing kernel
- Restore all Rok probes to Python 3, except the one used by the Rok appliance
- conf: Set master_capable to True on Kubernetes
- deploy: Provision auth components
- doc: Treat warnings as errors when building with Makefile
- Fix an invalid JSON document in the EKS installation docs
- scripts: Make manifests script adopt existing repos
- doc: Mention EKF instead of MiniKF
- doc: Do not copy the results when the user selects text
- manifests: Use string replacement instead of jinja2 templating
- kmod: Don’t start a progress bar if there are no modules to install
- gw: Always display cancel button in services form
- Static rok and ekf themes
- doc: Do not copy the results shown in blocks
- gw: Move namespace selector into its own component
- doc: Update Kubeflow integration doc
- Hide and show code blocks in docs
- Turn our manifests into templates and have bases refer only to proper image tags
- Introduce a developer guide for the Kubernetes client’s initialization
- Fix a bug in the Kubernetes Rok API drivers that caused SubjectAccessReview requests to sometimes fail with an unauthorized error
- doc: AuthService Integration
- Kubernetes: Configure dockerconfig with rok-deploy
- Introduce the v2 services and OAuth APIs in Rok, to allow Rok clients to interact with any account instead of only the one matching their user UUID
- Include CMake>=3.8.2 as a new build dependency since we make use of
- Make AuthService authentication the default in Kubernetes
- Introduce the AUTHORIZATION_BACKEND setting for the Rok API to control the way requests are authorized
- Convert all Rok API authentication backend names to lowercase
- Rename the static-authservice authentication backend to authservice in the Rok API
- Fix custom fonts in doc
- Further improve Python 3 compatibility
- doc: Use example.com in our public docs
- Make Kubeflow-UserID the default user header when using AuthService authentication in the Rok API
- doc: Improve doc on Kubeflow’s integration with GitLab
- Fix services request with namespaces
- Enhance Kubeflow integration and use ekf overlays in KfDef
- doc: Fix broken copy button
- manifests: Move Rok manifest to its proper place
- Revert Rok probes to Python 2 to work around missing dependencies for the Rok cluster probe
- Make the Rok etcd3 client compatible with Python 3
- Automatically allow access to Rok API resources to users that have access to Kubeflow resources in the same Kubernetes namespace
- doc: Add absolute URL in snippet commands
- cmake: Separate ctypesgen preprocessor flags
- Kubernetes: Refactor manifests
- Fix a bug in the Rok S3 daemon template
- Build custom dex image
- Enable the Rok API and UI to run behind Istio with AuthService authentication
- common: Detect dirty repo and return trunk version
- Kubernetes: Make Redis probe Python3-compatible
- etcd: Add Python3 package for v3
- doc: Extend docs and add integrations
- Enable building reproducible rok-kmod images locally
- kmod: Fix typo in Ubuntu PPA Dockerfile
- rok-tools: Serve Rok’s public docs
- rok-kmod: Use rok-kmod debian package in rok-kmod’s Dockerfile
- githooks: Exclude json.in from Copyright check
- debian: Introduce rok-kmod package
- rok-kmod: Convert to Python3 and introduce python package
- doc: Make public docs customer-friendly
- common: Properly dump to file in current dir
- Kubernetes: Introduce rok-deploy
- probes: Make probes library Python3 compatible
- doc: Change doc’s layout
- common: Open dump_to_file in text mode by default
- Mention bootstrapping in the docs
- Make a number of small fixes to the Rok client to ensure our CI tests pass after transitioning to Python 3
- ci: Configure locale inside chroot
- Update the botocore dependency of the Rok AWS library to 1.12.103
- Integrate Rok with ctypesgen 1.0.2
- doc: Fix broken copy button image in nested docs
- Support mass deletion in the Rok UI
- Revamp the initialization of the Rok S3 daemon to identify deployment errors as soon as possible
- Introduce formatting and validation to all Rok PUs
- Correctly include the Rok Tools template in the docs
- kmod: Build reproducible rok-kmod images
- doc: Do not copy/link sources in public docs
- Minor fixes in the Python wheels doc
- Fix error reporting in Python 3 in the Rok client
- kmod: Find available custom modules
- Give modules installed by rok-kmod the highest priority
- Introduce instructions for EKS
- Add design document about the formatting and validation of Rok daemons
- Introduce kustomize overlays for EKS
- Introduce Rok Tools
- doc: Make various adjustments to the rok-do guide
- Avoid retrying all available methods of retrieving security credentials when updating them in the Rok S3 daemon
- Support reading values from a file in the Rok C argument parser
- Display bucket descriptions in the Rok UI
- Prepare for Python 3 packages
- rokfs: Make ioctl prototype conditional
- operator: Set/apply cluster config
- Allow deleting a specific bucket or all buckets of a Rok cluster using the Rok AWS helper scripts
- Make rok_cluster an optional dependency of rok_aws
- Add entrypoints for AWS helper scripts
- Add AWS C++ SDK to rok-do build dependencies
- cmake: Use -Og on Debug and fix ctypesgen flags
- Kubernetes: Use rok-probed in initContainers
- Make the Rok common helpers that convert strings to bytes and Unicode Python 3 compatible
- doc: Add Rok upgrade guides for Kubernetes
- Disable Fort signups
- operator: Cluster-neutral logging
- scripts: Allow purging multiple S3 buckets at once
- Add search support in Rok UI
- rdm: Activate the LVs when loading a VG/LV
- Check if the source directory exists when adding Python tests in CMake
- gw: Display the number of versions in objects list
- gw: Change link style across the Rok UI
- Update rok-do instructions
- cmdutils: Add check and log_error to wait()
- bootstrap: Improve validation
- Kubernetes: Treat configVars as object
- cmake: Add non-bootstrapped env as possible failure reason
- libredis: Implement scanning keys and batch deletions
- libtasks: Fixes and support for disabling logging to frontend
- Remove a stale file
- Add support for the IAM Roles for Service Accounts feature of EKS to the Rok S3 daemon
- Add a design document explaining in detail the way Rok pods gain access to AWS services when running within an EKS cluster
- Improve handling of time durations in timeutils
- Add script to attach an IAM role to the Rok service inside an EKS cluster
- Add script to purge an S3 bucket
- rok_args: Do not set dest for Sensitive arg
- libredis: Fix various bugs
- gw: Disable group toggle button when group is empty
- Add bootstrap and get build version with Python
- gw: Remove created info from task popover
- dm clone: Fix discard handling and overflow bugs which could cause data corruption
- operator: Add helpers to get CR info as rok-init metadata
- Add new tooltip messages
- Introduce file badge component in Rok UI
- githooks: Use relative paths for symlinks
- scripts: Fix a check for enabled githooks
- Fix various issues related to double reclassing
- Add guidelines for testing to the Rok documentation
- Add script to attach EBS volumes to a Kubernetes cluster
- Add perf tests for libfiber
- gw: Use bigger icons in services header
- Introduce new delete dialogs in UI
- Fix monospace and bold in UI
- Style changes in the authorizations page in UI
- Minor Kubernetes-related fixes
- libredis: Refactor code and support retries
- conf: Fix ip_reachable and remove default gateway verification
- Correctly initialize the Rok 0.15 client in MiniKF
- Update the MiniKF kustomize templates and wheels
- blkutils: Add --force for RAID devices with 1 drive
- Improve reporting of sizes in CLIs
- config: Factor out DLM lock break
- Fix and upgrade custom tensorflow images
- Update the Dockerfile used to produce the notebook image to create the required Python wheels using rok-do
- Make rok-do less noisy in case of errors
- conf: Support disabling host header check
- libredis: Enforce redis scheme
- scripts: Improve add_signature() to work on rebase
- Update Rok Kubernetes guides
- operator: Support cluster upgrades
- End-to-end building of Python wheels with rok-do
- operator: Retrieve secrets from CR
- Add generic helpers to get, list, and retrieve the owners of resources to the Rok Kubernetes client.
- libmap: Migrate epoch cache to Redis
- Keep logs in case cronic fails
- Do not deepcopy service params to increase the performance of service-related API calls
- operator: Remove hardcoded cluster refs
- rok-csi: Recover volumes from deleted nodes
- libredis: Introduce connection pool
- Add a simple graph implementation to the Rok common module
- kustomize: Manage Rok Storage/VolumeSnapshot classes
- trpt: Print message when magic number is invalid
- rok-init: Add basic support for upgrading clusters
- operator/kustomize: Add Redis endpoint
- appliance: Add redis endpoint
- operator: Fix bug in member removal
- doc: Update stretch build dependencies
- python/pu: Check PU status before releasing objects
- python: Replace select() with poll()
- Search for ext2/ext3/ext4 libraries in CMake
- Add a bucket icon in Rok UI’s breadcrumb trail
- Fix dependency to the PyYAML package in the Rok Kubernetes client
- kustomize: Introduce Redis
- Correctly display access tokens which were issued without an application
- libredis: Introduce a Redis library
- electiond: Improve detecting master changes
- operator: Fix typo in postgresql_probe()
- Specify arbitrary device attributes for CSI volumes
- Reduce LU-oriented lock contention in the I/O path
- csi: Start dm-clone monitoring threads after successfully initializing lvmd
- csi: Fix imports
- lvmd: Fix imports
- lvmd: Don’t snapshot discarded blocks
- Do not retry ENOENT on get_ca()
- gw: Refactor objects and versions list in UI
- doc: Fix indentation errors
- conf: Remove the templates and the render.py from etcd
- Fix a bug where tasks would never be finalized if they contained a value that cannot be JSON serialized
- Use common’s document view component in event info page
- conf: Do not use hostname as member ID fallback
- conf: Support config annotations
- Kubernetes: Extend cluster CRDs with status
- gw: Fix imports in webpack’s dev config
- Use common form component in Rok UI
- rdm: Support parsing and applying scripts line-by-line
- minikf: Some libtask fixes before updating provisioning script
- Use only scoped imports in UI
- common: Relax type restriction in format_duration
- common: Update copyright date in UI
- lvmd: Add support for replicated volumes
- Convert utility class in UI
- Fix a bug when waiting for a failed task in the Rok client
- Factor out our internal Kubernetes client
- common: Add Python functions to calculate versions
- Introduce two new probes to test the readiness of an etcd and PostgreSQL deployment
- Add options to wait until a readiness probe succeeds or a liveness probe fails
- conf: Support atomic config apply
- common: Revamp error service in UI
- Replace prettytable with printutils
- Change position strategy in UI
- libtrpt: Minor performance optimization
- Fix some minor email issues in Registry
- Deploy Rok Registry with Istio on Minikube/GKE
- operator: Rework init container cmds
- thrower: Work with any lz4 version
- lvmd: Properly close the data device
- operator: Graceful termination
- Make retention policies in the Rok API return accurate information about group members and cleanup orphan group members.
- blkutils: Fix using get_disks() with glob
- lvmd: Fix progress reporting
- libtrpt: Fix high completion latencies
- doc: Update LVMD design document
- common: Create a password helper
- composer: Use 1MiB chock size as default
- Always display snapshot policies section in UI
- common: Use different connection strategy in tooltip
- Improve Rok UI’s loading screen
- common: Do not autodetect if we are in container, be explicit
- Close the Pyro daemon before stopping the thrower
- dm-clone: Backport upstream patches
- libmap: Support batched epoch updates
- common: Add missing prefix in UI’s HTTP client
- Add Kubernetes PodSecurityPolicy Integration Design Doc
- Angular and dependencies upgrade
- indexer: Remove auth token in some API requests
- ci: Increase dm-clone region size in lvmd tests
- blkutils: Fix how we parse mountinfo in get_mountinfo()
- Disable ASan’s LeakSanitizer for tests
- Do not define min() and max() macros in C++, since they are already defined as functions.
- Update the instructions to build a Jupyter notebook
- Update the Jupyter notebook Dockerfile to include Tensorflow 1.14.0, Python 3 wheels for the Rok client, and the latest Kubeflow ml-pipelines, Kale, and Kale Jupyterlab plugin.
- scripts: Allow passing minus tags in rok-buildbot
- lvmd: Discover volume mount points automatically, instead of providing them explicitly in take_snapshot()
- Add helper to wait for a task to the Rok client
- Rename the Rok client and its errors to RokClient and RokClientError respectively
- Parse credentials and service parameters from a file in the Rok client
- Parse credentials and service parameters from the environment in the Rok client
- Provide authentication credentials during initialization in the Rok client
- lvmd: Support encrypted volumes
- Fix a bug where Rok API installations would raise internal errors when accessing old delete marker versions, due to migration v001400_0002 incorrectly introducing a number of attributes that should only exist in non-delete marker versions
- Specify the minimum Gevent version that is supported
- Detect Rok build type and skip lvmd CI tests
- Add shared memory transport
- Fix some prettytable dependency issues
- Improve HTTP response handling in UI
- minikf: Merge questions with CLI args
- dlm: Use force_str() on strings passed to C calls
- Support logging to the frontend from anywhere
- liod: Rescan SCSI bus periodically
- Add a role and role binding to MiniKF users to enable Rok API tasks to access Kubeflow resources
- Add a PodDefault to allow the MiniKF’s default user to access the Rok API from within the Kubeflow namespace
- operator: Replace Threads with Greenlets
- lvmd: Use dm-clone only when restoring a volume from a snapshot, not for fresh volumes
- conf: Fix a bug where an undefined var was referenced
- csi: Do not start dm-clone monitor threads on controller
- Fix the MiniKF deployment and its QA process
- lvmd: Remove mostly unused dyn_params parameter
- Make cmdutils compatible with Python 3
- common: Do not import subprocess32 when using Python3
- roke: Fix dots in member IDs and stopping md devices
- Refactor lvmd to improve code readability and maintainability
- operator: Handle nodeSelector and node labels
- minikf: Track latest wheels
- githooks: Fix a bug when checking config version
- gw: Make the Rok gateway UI pass Prettier checks
- lvmd: Support variable dm-era tracking granularity
- Add support for building Python 3 wheels for the Rok client
- Make the Rok client Python 3 compatible
- Make the rok_common library Python 3 compatible
- New icons in Rok UI
- doc: Add copy buttons in all doc’s code blocks
- pu: Connect all PUs to the external controller by default
- lvmd: Introduce DM snapshots
- test: Set start_new_session instead of new_session
- Use HttpClient in Rok Registry UI
- ci: Fix hashing test to consider duplicate offsets
- rdm: Add support for RAID arrays
- rdm: Export attributes of block devices
- tests: Fix leaks discovered by ASan
- cmake: Use correct soversion for libetcd3
- Add JSON and CSV output format to the Rok client
- rok-do: Introduce rok-do CLI tool
- doc: Use correct apt files on source install guides
- gw: Dynamically resize file chooser window
- Add extra validation checks in Rok Gateway
- docker: Add missing syslog argument
- Implement dialog for copying files
- cmdutils: Support Popen kwargs and remove some shell=True commands
- operator: Fix an operator regression wrt platforms
- operator: Uniformly sync resources
- conf: Improve diffing support and move it under rok_common
- gw: Add a Django cache to cache the chock size
- operator: Produce events on cluster CR
- cli: Add QuestionContext and expose question threshold through args
- operator: Always refresh cluster driver cache
- lvmd: Add support for configuring the snapshot chunk size
- libtasks: Separate null and empty answers and add boolean-type question
- Switch to Stretch builds
- Deploy Rok Registry using Rok Operator
- Remove deprecated Http class from UI
- Fix rok-init bugs
- Allow Fort to filter LDAP users by group
- doc: Extend Sphinx configuration to include versioned manifests from a specified path
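
As an illustration of the "python: Replace select() with poll()" entry above, here is a minimal sketch of waiting on a file descriptor with poll(). The entry does not state its motivation; a common one is that select() cannot handle file descriptors numbered at or above FD_SETSIZE (typically 1024), whereas poll() has no such limit. The helper name wait_readable and its parameters are hypothetical and are not taken from the Rok code base.

```python
import os
import select


def wait_readable(fd, timeout_s=None):
    """Return True if poll() reports fd as readable, hung up, or in error
    within timeout_s seconds; return False on timeout.

    Hypothetical helper for illustration only.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    # poll() expects milliseconds; None means block until an event arrives.
    timeout_ms = None if timeout_s is None else int(timeout_s * 1000)
    events = poller.poll(timeout_ms)
    interesting = select.POLLIN | select.POLLHUP | select.POLLERR
    return any(mask & interesting for _, mask in events)


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    print(wait_readable(read_fd, 0.1))  # nothing has been written yet
    os.write(write_fd, b"x")
    print(wait_readable(read_fd, 0.1))  # one byte is now pending
    os.close(read_fd)
    os.close(write_fd)
```

Unlike select(), which takes its timeout in seconds, poll() takes it in milliseconds, so a conversion like the one above is needed when porting call sites.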