Troubleshooting FAQ
This section contains various troubleshooting instructions regarding Rok deployment or cleanup.
Rok cleanup is stuck
In this section we describe the reason why Rok cleanup might be stuck and the way to fix it.
Reason
The RokCluster Custom Resource (CR) is protected by the rokclustercleanup.arrikto.com finalizer.
Normally, upon CR deletion, the Rok Operator removes this finalizer, allowing the resource to actually be deleted.
However, if the Rok Operator is not running when the RokCluster is marked for deletion, the rokclustercleanup.arrikto.com finalizer remains on the CR and its deletion blocks indefinitely. In addition, any attempt to delete the rok namespace (where the RokCluster CR lives by default) will not succeed, since Kubernetes must delete all resources in a namespace before it can delete the namespace itself.
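To confirm that this is the situation you are in, you can inspect the CR's metadata. The following is a minimal sketch, not an official procedure: the helper name show_rokcluster_deletion_state is hypothetical, and the CR name rok and namespace rok are assumed defaults that may differ in your deployment.

```shell
# Hypothetical helper: print the deletion timestamp and the finalizers of a
# RokCluster CR. Arguments (both optional, both assumptions): CR name and
# namespace, each defaulting to "rok".
show_rokcluster_deletion_state() {
    kubectl get rokcluster "${1:-rok}" -n "${2:-rok}" \
        -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
}
```

If the first line of output shows a timestamp and the second still lists rokclustercleanup.arrikto.com, the CR is marked for deletion but the finalizer has not been removed.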
Workaround
To get out of this situation, manually remove the rokclustercleanup.arrikto.com finalizer from the RokCluster CR using kubectl.
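One possible way to do this is sketched below. It is an illustration under stated assumptions rather than the exact command from the original page: the helper name remove_rokcluster_finalizers is hypothetical, the CR is assumed to be named rok in the rok namespace, and the merge patch clears the whole finalizer list, which is only appropriate if rokclustercleanup.arrikto.com is the only finalizer on the CR.

```shell
# Hypothetical helper: clear the finalizers of a RokCluster CR so that its
# pending deletion can complete.
# Arguments (both optional, both assumptions): CR name and namespace,
# each defaulting to "rok".
remove_rokcluster_finalizers() {
    # A JSON merge patch that sets metadata.finalizers to null removes
    # every finalizer on the resource, not just the Rok one.
    kubectl patch rokcluster "${1:-rok}" -n "${2:-rok}" \
        --type merge -p '{"metadata":{"finalizers":null}}'
}
```

Calling remove_rokcluster_finalizers with no arguments targets the assumed defaults. Once the finalizer is gone, the pending deletion of the CR, and of the rok namespace if one was requested, should be able to proceed.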
Kubeflow cleanup is stuck
In this section we describe the reason why Kubeflow cleanup might be stuck and the way to fix it.
Reason
User namespaces, kubeflow-XXX, may get stuck in the Terminating phase. If that is the case, list all resources in the namespace to see which ones are not being deleted.
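A minimal sketch of such a listing follows. The helper name list_namespace_resources is hypothetical. It relies on kubectl api-resources to enumerate every namespaced resource type that supports listing, then queries each type in the given namespace.

```shell
# Hypothetical helper: list every remaining resource in a namespace.
# Argument: the namespace to inspect (for example, a user namespace that is
# stuck in Terminating).
list_namespace_resources() {
    # Enumerate all namespaced, listable resource types, one per line.
    kubectl api-resources --verbs=list --namespaced -o name |
    while read -r resource; do
        # Print any instances of this type; stay silent if there are none.
        kubectl get "$resource" --show-kind --ignore-not-found -n "$1"
    done
}
```

Invoke it with the stuck namespace as the only argument. Anything it prints is a resource that still exists and is therefore holding up the namespace deletion.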
If Katib Trial Custom Resources remain, it is because they are protected by the clean-metrics-in-db finalizer.
Due to a race between resource deletions, the Trials cannot fulfill their finalizer and, thus, are never deleted.
Workaround
To unblock this, patch every Trial in the namespace to remove its finalizers.
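The patching can be sketched as a small loop. This is an illustration, not necessarily the command the original page showed: the helper name remove_trial_finalizers is hypothetical, and the merge patch clears all finalizers on each Trial, on the assumption that clean-metrics-in-db is the only one present.

```shell
# Hypothetical helper: clear the finalizers of every Katib Trial in a
# namespace so that their pending deletion can complete.
# Argument: the namespace that is stuck in Terminating.
remove_trial_finalizers() {
    # "-o name" prints one TYPE/NAME reference per line, which
    # "kubectl patch" accepts directly.
    kubectl get trials -n "$1" -o name |
    while read -r trial; do
        kubectl patch "$trial" -n "$1" \
            --type merge -p '{"metadata":{"finalizers":null}}'
    done
}
```

Pass the stuck user namespace as the only argument. Once the Trials are gone, Kubernetes should be able to finish terminating the namespace.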