What is KFServing?

TL;DR: KFServing is a novel cloud-native multi-framework model serving tool for serverless inference.

A bit of history

KFServing was born as part of the Kubeflow project, a joint effort between AI/ML industry leaders to standardize machine learning operations on top of Kubernetes. It aims to solve the difficulties of deploying models to production through a “model as data” approach, i.e. by providing an API for inference requests.

What is KFServing?

KFServing abstracts away the complexity of server configuration, networking, health checking, autoscaling of heterogeneous hardware (CPU, GPU, TPU), scaling from zero, and progressive (a.k.a. canary) rollouts. It provides a complete story for production ML serving, covering prediction, pre-processing, post-processing and explainability, in a way that is compatible with various frameworks: TensorFlow, PyTorch, XGBoost, scikit-learn and ONNX.

Check out the KFServing repository on GitHub for a deeper dive.

How does it work?

KFServing uses two well-known cloud-native technologies at its core: Knative and Istio.

Knative is a Kubernetes-based platform to deploy and manage modern serverless workloads. It gives KFServing the following properties:

  • Scaling to and from zero: optimizing costs associated with inference
  • Autoscaling of GPUs and TPUs: reducing latency with specialized hardware, with costs that follow demand
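The “scaling to and from zero” property can be pictured with a simplified sketch of a concurrency-based scaling decision, the kind Knative's autoscaler makes. Divide the number of in-flight requests by a per-replica target, round up, and clamp the result between a minimum and a maximum replica count. The function below is our own illustration, not Knative code. The real autoscaler also averages its metrics over time windows and reacts separately to sudden bursts.

```python
import math


def desired_replicas(observed_concurrency: float,
                     target_concurrency: float,
                     min_replicas: int = 0,
                     max_replicas: int = 10) -> int:
    """Hypothetical, simplified concurrency-based autoscaling decision.

    Aims for roughly `target_concurrency` in-flight requests per replica and
    clamps the result to [min_replicas, max_replicas]. With min_replicas=0
    and no traffic, the service scales down to zero replicas.
    """
    if target_concurrency <= 0:
        raise ValueError("target_concurrency must be positive")
    wanted = math.ceil(observed_concurrency / target_concurrency)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    print(desired_replicas(0, 10))    # no traffic -> 0 replicas (scale to zero)
    print(desired_replicas(35, 10))   # 3.5 replicas' worth -> rounded up to 4
    print(desired_replicas(500, 10))  # capped at max_replicas = 10
```

Keeping a floor of one replica (`min_replicas=1`) trades the cost savings of scaling to zero for avoiding cold-start latency on the first request.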

Istio is a service mesh technology built on the Kubernetes sidecar pattern: a sidecar proxy container is added to every pod and handles all of that pod's network traffic. This enables:

  • Canary rollouts: allowing for safe model updates across users
  • Traffic to the model: routing and ingress management
  • Observability: tracing, monitoring, and logging features for your models
  • Load balancing: HTTP, gRPC, WebSocket, and TCP traffic
  • Security: authentication, authorization, and encryption of service communication at scale

KFServing itself provides a Kubernetes Custom Resource Definition (an object that extends the Kubernetes API) specific to serving machine learning models saved with various frameworks, such as TensorFlow, PyTorch, XGBoost, scikit-learn and ONNX, in production environments.

How can I use it?

To use KFServing, you create a YAML file that defines an “InferenceService”, the common interface KFServing uses for all supported frameworks.

In that file you can specify things like which hardware the model should use, how far it should scale up or down, and which framework the model was saved in.
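As a sketch of such a file, the Python snippet below builds an InferenceService manifest as a plain dictionary and prints it as JSON. kubectl accepts JSON manifests as well as YAML.

The snippet assumes the `serving.kubeflow.org/v1beta1` schema with a TensorFlow predictor. The model name, bucket path and helper function are hypothetical. Check the field names against the KFServing version you run.

```python
import json


def inference_service(name: str, storage_uri: str,
                      min_replicas: int = 0, max_replicas: int = 3,
                      gpus: int = 0) -> dict:
    """Build a hypothetical v1beta1-style InferenceService manifest as a dict."""
    framework_spec = {"storageUri": storage_uri}  # where the saved model lives
    if gpus:
        # Hardware: request GPUs through a standard Kubernetes resource limit.
        framework_spec["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
    return {
        "apiVersion": "serving.kubeflow.org/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "minReplicas": min_replicas,   # 0 leaves room to scale to zero
                "maxReplicas": max_replicas,   # upper bound for autoscaling
                "tensorflow": framework_spec,  # the framework the model was saved in
            }
        },
    }


if __name__ == "__main__":
    manifest = inference_service(
        "flowers-sample", "gs://my-bucket/models/flowers", gpus=1)
    print(json.dumps(manifest, indent=2))
```

Depending on the KFServing version, a TensorFlow predictor running on GPUs may also need a GPU-enabled `runtimeVersion`. Swapping the `tensorflow` key for another framework key changes which model server is used, while the rest of the manifest stays the same.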

Progressive / Canary rollouts

When you define a canary in the InferenceService YAML, KFServing creates a canary endpoint and uses Istio to route the share of traffic you specify to it. The canary is combined with the transformer, explainer and predictor components of the service.
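The sketch below assumes the v1beta1 API; older KFServing releases used separate `default` and `canary` sections instead. In v1beta1, a canary is rolled out by pointing the predictor at the new model and setting `canaryTrafficPercent`, while the previous revision keeps the rest of the traffic.

The sketch applies that change to a manifest dictionary. It then uses a toy weighted coin flip to show what a 10% split means per request. The real split is enforced by the Knative/Istio routing layer, not by code like `route`, and all names and paths here are hypothetical.

```python
import copy
import random

# Hypothetical manifest for the currently deployed model.
base = {
    "apiVersion": "serving.kubeflow.org/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "flowers-sample"},
    "spec": {"predictor": {"tensorflow": {
        "storageUri": "gs://my-bucket/models/flowers-v1"}}},
}


def with_canary(manifest: dict, new_storage_uri: str, percent: int) -> dict:
    """Return a copy of `manifest` that sends `percent`% of traffic to a new model."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    updated = copy.deepcopy(manifest)  # leave the original manifest untouched
    predictor = updated["spec"]["predictor"]
    predictor["canaryTrafficPercent"] = percent
    predictor["tensorflow"]["storageUri"] = new_storage_uri
    return updated


def route(percent: int, rng: random.Random) -> str:
    """Toy weighted routing decision for a single request."""
    return "canary" if rng.random() * 100 < percent else "default"


if __name__ == "__main__":
    canary = with_canary(base, "gs://my-bucket/models/flowers-v2", 10)
    print(canary["spec"]["predictor"]["canaryTrafficPercent"])  # 10
    rng = random.Random(0)
    hits = sum(route(10, rng) == "canary" for _ in range(10_000))
    print(hits)  # roughly 1,000 of 10,000 requests reach the canary
```

If the canary behaves well, raising the percentage step by step (and eventually to 100) completes the progressive rollout. Setting it back to 0 sends all traffic to the previous revision again.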

Pre-processing and post-processing data for inference

KFServing includes the concept of “Transformers”, which let you orchestrate transformations of the data before or after inference.

One use case: your model is good at classifying images of 28×28 pixels, but your new data comes from a high-resolution camera. To meet the model's input requirements, you first have to transform (pre-process) the new data, and only then run inference.

You can accomplish this by building a Docker image with your pre-processing or post-processing steps and adding it to the InferenceService YAML that defines your model serving job.
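The resizing step in that example can be sketched in plain Python. The hypothetical `downsample` function below takes a grayscale image given as a list of pixel rows and shrinks it to 28×28. Each output pixel is the average of the block of source pixels that maps onto it.

In KFServing, logic like this would live in the transformer's pre-processing step. A production transformer would more likely use an imaging library and would also handle decoding and normalization.

```python
from typing import List

Image = List[List[float]]  # rows of grayscale pixel intensities


def downsample(image: Image, size: int = 28) -> Image:
    """Shrink `image` to size x size pixels by block averaging.

    Assumes all rows have the same length and the image is at least
    `size` pixels tall and wide.
    """
    height, width = len(image), len(image[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the target size")
    out = []
    for i in range(size):
        # Source rows that map onto output row i.
        r0, r1 = i * height // size, (i + 1) * height // size
        row = []
        for j in range(size):
            # Source columns that map onto output column j.
            c0, c1 = j * width // size, (j + 1) * width // size
            block = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


if __name__ == "__main__":
    # A fake 112x112 "high-resolution" frame: left half black, right half white.
    frame = [[0.0] * 56 + [255.0] * 56 for _ in range(112)]
    small = downsample(frame)
    print(len(small), len(small[0]))  # 28 28
    print(small[0][0], small[0][27])  # 0.0 255.0
```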
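Here is a minimal sketch of that last step. It assumes the v1beta1 layout, in which `spec.transformer.containers` holds ordinary Kubernetes container specs. The hypothetical helper below attaches your image as the transformer of an existing manifest without modifying the original.

The image name and model path are placeholders. The container name `kfserving-container` follows the convention used in KFServing's examples, but verify it for your version.

```python
import json

# Hypothetical manifest for a model that expects 28x28 images.
base = {
    "apiVersion": "serving.kubeflow.org/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "mnist"},
    "spec": {"predictor": {"tensorflow": {
        "storageUri": "gs://my-bucket/models/mnist"}}},
}


def add_transformer(manifest: dict, image: str) -> dict:
    """Return a copy of `manifest` with a custom pre/post-processing container."""
    spec = dict(manifest["spec"])  # shallow copy so the input stays unchanged
    spec["transformer"] = {
        "containers": [{"name": "kfserving-container", "image": image}]
    }
    return {**manifest, "spec": spec}


if __name__ == "__main__":
    with_resize = add_transformer(base, "registry.example.com/resize-28x28:0.1")
    print(json.dumps(with_resize, indent=2))
```

With a transformer in place, requests reach the transformer first, and it forwards the processed payload to the predictor.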

Model explainability

One optional feature of KFServing, which makes use of Seldon Alibi, is the ability to add an “explainer”: an alternate data plane that provides model explanations in addition to predictions.
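Here is a hedged sketch, assuming the v1beta1 schema. An Alibi explainer is declared next to the predictor with an explainer `type` (for example `AnchorTabular`) and the location of the saved explainer.

Once the service is deployed, explanations are requested from a separate `:explain` endpoint of KFServing's v1 data plane, alongside the usual `:predict`. The helper names, model name and paths below are hypothetical.

```python
import json

# Hypothetical scikit-learn model; only the predictor is defined so far.
base = {
    "apiVersion": "serving.kubeflow.org/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "income"},
    "spec": {"predictor": {"sklearn": {
        "storageUri": "gs://my-bucket/models/income"}}},
}


def add_alibi_explainer(manifest: dict, explainer_type: str,
                        storage_uri: str) -> dict:
    """Return a copy of `manifest` with an Alibi explainer next to the predictor."""
    spec = dict(manifest["spec"])  # shallow copy so the input stays unchanged
    spec["explainer"] = {"alibi": {"type": explainer_type,
                                   "storageUri": storage_uri}}
    return {**manifest, "spec": spec}


def data_plane_path(model_name: str, verb: str) -> str:
    """URL path of a v1 data-plane call: `predict` or `explain`."""
    if verb not in ("predict", "explain"):
        raise ValueError("verb must be 'predict' or 'explain'")
    return f"/v1/models/{model_name}:{verb}"


if __name__ == "__main__":
    manifest = add_alibi_explainer(
        base, "AnchorTabular", "gs://my-bucket/explainers/income")
    print(json.dumps(manifest, indent=2))
    print(data_plane_path("income", "predict"))  # /v1/models/income:predict
    print(data_plane_path("income", "explain"))  # /v1/models/income:explain
```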

Learn more about KFServing and Kubeflow

Canonical provides Kubeflow training for enterprises alongside professional services such as security and support, custom deployments, consulting, and fully managed Kubeflow – read more on Ubuntu’s AI services page.

Try it out now!

If you’re looking for standalone KFServing, deploy the KFServing operator for a low-ops deployment.

If you’re looking for a full-fledged MLOps platform, get the latest Kubeflow packaged in Operators, providing composability and day-0 and day-2 operations for all Kubeflow applications, including KFServing.

Get Kubeflow up and running in 5 minutes
