(Updated!) How to Get Started with Kubeflow

Imagine running a machine learning training and inference stack wherever… on whatever… with little to no configuration needed to flow from one piece of hardware to the next, from a hybrid environment to cloud vendor X…

Mind blown…

Enter Kubeflow.

What is Kubeflow? Think of running TensorFlow jobs at scale in containers, with the same scalability that containers and Kubernetes container orchestration already give you.

“Our goal is to make scaling machine learning (ML) models and deploying them to production as simple as possible, by letting Kubernetes do what it’s great at:

Easy, repeatable, portable deployments on a diverse infrastructure (laptop <-> ML rig <-> training cluster <-> production cluster)

Deploying and managing loosely-coupled microservices

Scaling based on demand” (Kubeflow.org)

What? Okay. Let’s consider what it takes to compose a machine learning training and inference rig:

With Kubernetes, you’re able to abstract away everything from the operating system down when it comes to managing resources:

Operating system and hardware abstracted with Kubernetes and containers.

And with Kubeflow, all you need to worry about is running your model training and, in proper Kubernetes fashion, setting a goal and parameters for those processes to hit while allowing Kubeflow to manage the rest.
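To make "setting a goal and parameters" a bit more concrete, here is a rough sketch of what a declarative training job can look like, written as a plain Python dict and printed as JSON. Treat it as an illustration, not the official way to do it: the helper `make_tfjob`, the job name, and the image `example.com/my-team/mnist:latest` are all made up for this example, and the `apiVersion` string is an assumption that has changed between Kubeflow releases, so check what your cluster actually serves.

```python
import json


def make_tfjob(name, image, workers, ps_replicas=0, api_version="kubeflow.org/v1"):
    """Build a TFJob-style manifest as a dict (hypothetical helper, not a Kubeflow API)."""

    def replica(count):
        # Each replica group says how many pods you want and what they run.
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    "containers": [{"name": "tensorflow", "image": image}]
                }
            },
        }

    specs = {"Worker": replica(workers)}
    if ps_replicas > 0:
        # Parameter servers are optional; only add them when asked for.
        specs["PS"] = replica(ps_replicas)

    return {
        "apiVersion": api_version,  # assumption: varies by Kubeflow release
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {"tfReplicaSpecs": specs},
    }


# Hypothetical job: three workers and one parameter server.
manifest = make_tfjob(
    "mnist-train", "example.com/my-team/mnist:latest", workers=3, ps_replicas=1
)
print(json.dumps(manifest, indent=2))
```

The point is the shape: you state how many workers you want and what they should run, and the scheduling, restarts, and placement are left to the cluster.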

Now, let’s be abundantly clear: Kubeflow as it is now is not necessarily as simple as a one-click solution (a very loaded expression regardless of the technology at hand). As of the last update to this article, Kubeflow has yet to reach its 1.0 enterprise-ready release. But what it does allow you to do is determine parameters for the resources available in your environment and use them to complete the full machine learning development cycle, inference, and model retraining over time. Optimizing how Kubeflow is implemented with your TensorFlow model will vary (on the technical development side), especially if you are juggling GPUs, CPUs, and TPUs.
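As a small illustration of "determining parameters for resources," here is a sketch of the kind of per-container resource stanza Kubernetes understands, built in Python. The helper `container_resources` and the sizes are hypothetical and picked only for the example. `nvidia.com/gpu` is the resource name exposed by the NVIDIA device plugin, so it assumes that plugin is installed on your nodes; TPU resource names are provider-specific and left out here.

```python
import json


def container_resources(cpu, memory, gpus=0):
    """Build a Kubernetes-style `resources` stanza for one container (hypothetical helper)."""
    limits = {"cpu": cpu, "memory": memory}
    if gpus > 0:
        # GPUs are requested under limits; assumes the NVIDIA device plugin is running.
        limits["nvidia.com/gpu"] = gpus
    return {"limits": limits}


# Made-up sizes: one CPU-only worker and one worker with a single GPU.
cpu_worker = container_resources(cpu="4", memory="8Gi")
gpu_worker = container_resources(cpu="4", memory="16Gi", gpus=1)

print(json.dumps({"cpu_worker": cpu_worker, "gpu_worker": gpu_worker}, indent=2))
```

A stanza like this sits inside a container spec, which is where the CPU-versus-GPU juggling tends to happen.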

Here are some quick resources to get you started:

  • Intro to Kubeflow blog post here.
  • The intro to Kubeflow documentation page with a bit more of a detailed layout of the current components that go into Kubeflow.
  • For edification, the building blocks or components of Kubeflow currently under the heaviest development and progress by the open source community include Pipelines, Serving, Training, Jupyter Notebooks, and Fairing (a Python package that enables easy multi-cloud and hybrid implementations).
  • Get your hands dirty right away by running through this Katacoda lab here. You WILL have an ah-ha moment where you see the distribution of an ML job scale across nodes. Then the usefulness of all of the things begins making sense.
  • The Introduction to Kubeflow Codelab: set up the whole damn thing, the training and serving stack, and see how you can connect a Flask server and UI to then serve your model and perform inference!
  • This landing page at Kubeflow.org is a great resource for more advanced/further exploration in the form of blog posts, tutorials, etc.
  • An easy way of installing ksonnet without fighting the war I had to fight on my MacBook.
  • Learn how to perform TensorFlow training jobs here.
  • Learn how to perform TensorFlow serving here.
  • How to get a Jupyter notebook started here.
  • And last but not least, the Kubeflow GitHub repo. Scroll all the way down the README to get step-by-step instructions on how to install ksonnet and the necessary packages to run Kubeflow on your laptop.

For those feeling extra nerdy and enthused at this point about all of the buzzwords coming together in a single technology (containers! Machine Learning Microservices! Is that even a thing?! Keep saying cool words!), stay tuned for a deeper dive into Kubeflow v2.0…