Troubleshooting FAQ

This section contains troubleshooting instructions for problems with Rok deployment or cleanup.

Rok cleanup is stuck

This section describes why Rok cleanup might get stuck and how to fix it.

Reason

The RokCluster Custom Resource (CR) is protected by the rokclustercleanup.arrikto.com finalizer.

Normally, when the CR is deleted, Rok Operator removes this finalizer, allowing the resource to actually be deleted.

However, if Rok Operator is not running when the RokCluster is marked for deletion, the rokclustercleanup.arrikto.com finalizer remains on the CR and the deletion blocks indefinitely. In addition, any attempt to delete the rok namespace (where the RokCluster CR lives by default) will not succeed, because Kubernetes must delete all resources in a namespace before it deletes the namespace itself.
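To confirm that this is the situation you are in, you can inspect the CR's metadata. The following is a minimal sketch: the helper name show_deletion_state is hypothetical and not part of Rok. It prints the deletion timestamp and the remaining finalizers of a resource. A non-empty timestamp together with a remaining finalizer indicates a blocked deletion.

```shell
# Hypothetical helper (not shipped with Rok): print the deletion
# timestamp and the finalizers of a single resource.
# usage: show_deletion_state KIND NAMESPACE NAME
show_deletion_state() {
    kubectl get "$1" -n "$2" "$3" \
        -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
}

# Example (assumes the default namespace and CR name used in this section):
#   show_deletion_state rokcluster rok rok
```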

Workaround

To unblock the deletion, manually remove the rokclustercleanup.arrikto.com finalizer from the RokCluster CR using kubectl:

$ kubectl edit rokcluster -n rok rok
apiVersion: crd.arrikto.com/v1alpha1
kind: RokCluster
metadata:
  ...
  deletionTimestamp: "2020-06-25T07:10:47Z"
  finalizers:
  - rokclustercleanup.arrikto.com   # <-- Remove this line.
  ...
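If you prefer a non-interactive alternative to kubectl edit, the same change can be made with a merge patch. The sketch below assumes jq is installed, and the helper name remove_finalizer is hypothetical. It reads the current finalizers, drops only the named one, and writes the remaining list back, so any other finalizers on the resource are preserved.

```shell
# Hypothetical helper: remove one finalizer from a resource and keep the rest.
# Assumes jq is available on the PATH.
# usage: remove_finalizer KIND NAMESPACE NAME FINALIZER
remove_finalizer() {
    kind=$1 ns=$2 name=$3 fin=$4
    # Build a merge patch whose finalizers list is the current list
    # minus the finalizer to remove.
    patch=$(kubectl get "$kind" -n "$ns" "$name" -o json |
        jq -c --arg f "$fin" \
            '{metadata: {finalizers: ((.metadata.finalizers // []) - [$f])}}')
    kubectl patch "$kind" -n "$ns" "$name" --type=merge -p "$patch"
}

# Example:
#   remove_finalizer rokcluster rok rok rokclustercleanup.arrikto.com
```

A merge patch replaces the whole finalizers list, which is why the sketch computes the new list first instead of sending an empty one.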

Kubeflow cleanup is stuck

This section describes why Kubeflow cleanup might get stuck and how to fix it.

Reason

User namespaces, named kubeflow-XXX, may get stuck in the Terminating phase. If that is the case, list all resources in the namespace to see what is not being deleted:

$ kubectl api-resources --verbs=list --namespaced -o name | \
>     xargs -n1 kubectl get --show-kind --ignore-not-found -n kubeflow-XXX
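If you are not sure which namespaces are affected, you can first list every namespace that is currently in the Terminating phase. This is a minimal sketch that assumes jq is installed. The helper name list_terminating_namespaces is hypothetical.

```shell
# Hypothetical helper: print the names of all namespaces whose
# status.phase is Terminating. Assumes jq is available on the PATH.
list_terminating_namespaces() {
    kubectl get namespaces -o json |
        jq -r '.items[] | select(.status.phase == "Terminating") | .metadata.name'
}

# Example:
#   list_terminating_namespaces
```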

If you find Katib Trial Custom Resources, it is because they are protected by the clean-metrics-in-db finalizer.

Due to a race with respect to resource deletions, trials cannot fulfill their finalizer and, thus, are never deleted.
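To see which Trials are affected, you can list those that are marked for deletion but still carry finalizers. The sketch below assumes jq is installed, and the helper name list_stuck_trials is hypothetical. It prints one tab-separated line per Trial with its namespace, name, and remaining finalizers.

```shell
# Hypothetical helper: list Trials that have a deletionTimestamp set
# but still have at least one finalizer. Assumes jq is on the PATH.
list_stuck_trials() {
    kubectl get trials -A -o json |
        jq -r '.items[]
               | select(.metadata.deletionTimestamp != null)
               | select((.metadata.finalizers // []) | length > 0)
               | "\(.metadata.namespace)\t\(.metadata.name)\t\(.metadata.finalizers | join(","))"'
}

# Example:
#   list_stuck_trials
```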

Workaround

To unblock this, you should patch every Trial and delete its finalizers:

$ kubectl get trials -A -o json | \
>     jq -r '.items[] | .metadata.namespace, .metadata.name' | paste - - | \
>     xargs -r -n2 kubectl patch trial --type=merge -p '{"metadata":{"finalizers":[]}}' --namespace