Invoke Existing InferenceService with External Client

In this section, you will query an existing KServe InferenceService for predictions using an external client.

What You’ll Need


  1. Store the ServiceAccount token you have acquired in a local file, for example, serving.token.

  2. Start a Python3 kernel in your working environment:

    user@local:~$ python3
  3. Prepare the data you will use to query your model and store it in the model_data variable:

    >>> model_data = [[0.24075317948856828, -0.8274159585641723, -0.20902325728602528, 0.38115838488277776, 1.2897527540827456, 0.8862935614189356, 0.5885784044206096, -0.8505204542093001, 2.601683114180395, 0.5655096456315442, 0.6653007744902968, 0.3088330125989638, -0.5805234498047227, -1.7607627591558177, 1.702214944635238, 0.5257689308069179, 0.7533416211045325, -1.2242982362893657, 0.6731813512699584, -0.13845598398377382]]


    This input is tailored to the Serve Model from Notebook example.

  4. Specify the token itself or a URI to the file where you have stored the token:

    >>> token = "file:serving.token"
  5. Specify the URL where the Kubernetes API is exposed for your cluster:

    >>> kubernetes_host = ""
  6. Navigate to the Models UI.

  7. Navigate to the details of the model you want to invoke by clicking on the name of the model.

  8. Specify the name of your model:

    >>> model_name = "test-0-3garl"
  9. Copy the value of the URL external field.

  10. Specify the URL of your model by pasting the value of the URL external field:

    >>> model_url = ""


    Use https instead of http.

  11. Prepare the URL and data for the request:

    >>> import json
    >>> import requests
    >>> data = json.dumps({"instances": model_data})
    >>> predict_url = "%s/v1/models/%s:predict" % (model_url, model_name)


    This assumes that your model supports Data Plane v1.

  12. Initialize the authentication mechanism with ServiceAccount tokens for Kubernetes:

    >>> from rok_kubernetes.auth import ServiceAccountAuth
    >>> auth = ServiceAccountAuth(sa_token=token, kubernetes_host=kubernetes_host, verify=True)
  13. Make an authenticated request and verify it succeeds with <Response [200]>:

    >>> r = requests.post(predict_url, auth=auth, data=data, verify=True)
    >>> print(r)
    <Response [200]>
  14. Inspect the response to get your predictions:

    >>> print(r.json())
    {"predictions": [1]}
  15. You can see the short-lived token with:

    >>> from rok_common.protect import unprotect_obj
    >>> unprotect_obj(auth.short_lived_token)
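
The request-building steps above (the https URL, the Data Plane v1 predict path, and the JSON body) can be sketched as a small standalone helper. This is only an illustration under stated assumptions: the function names build_predict_request and extract_predictions and the example host model.example.com are hypothetical and are not part of KServe or the rok libraries. The sketch makes no network call and performs no authentication.

```python
import json


def build_predict_request(model_url, model_name, instances):
    """Return (predict_url, body) for a Data Plane v1 predict call.

    Hypothetical helper for illustration; not part of any library.
    """
    # The steps above ask for https, so upgrade a pasted http URL.
    if model_url.startswith("http://"):
        model_url = "https://" + model_url[len("http://"):]
    # Drop a trailing slash so the path is not doubled.
    predict_url = "%s/v1/models/%s:predict" % (model_url.rstrip("/"), model_name)
    # Data Plane v1 expects the inputs under the "instances" key.
    body = json.dumps({"instances": instances})
    return predict_url, body


def extract_predictions(response_json):
    """Return the "predictions" list from a decoded v1 response body."""
    return response_json["predictions"]


# Example values are assumptions, chosen only to show the shapes involved.
url, body = build_predict_request(
    "http://model.example.com/", "test-0-3garl", [[0.1, 0.2]]
)
print(url)
print(body)
```

The returned pair can then be passed to the authenticated request in step 13, and the decoded response to extract_predictions in step 14.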


You have successfully invoked a served model using an external client.

What’s Next

Check out the rest of the documentation regarding KServe.