SREcon24 Americas - Optimizing Resilience and Availability by Migrating from JupyterHub to the...

Published on Apr 18, 2024

SREcon24 Americas - Optimizing Resilience and Availability by Migrating from JupyterHub to the Kubeflow Notebook Controller

David Hoover and Alexander Perlman, Capital One

This presentation details our transition from JupyterHub to the Kubeflow Notebook Controller.

JupyterHub was architected in a backend-agnostic way that "supports" Kubernetes but isn't truly Kubernetes-native. As a result, it has significant shortcomings with respect to resilience and high availability. In particular, its core component, the hub API, can only run as a single replica at any given time.

In contrast, the Kubeflow Notebook Controller is built from the ground up to be Kubernetes-native using the operator pattern. There's far less complexity, fewer components, less brittleness, and improved resilience and high availability.
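
For readers unfamiliar with the operator pattern, here is a minimal sketch of the reconcile loop at its core: compare the declared (desired) state with the observed state and take whatever actions close the gap. It is an illustration only, not the Kubeflow Notebook Controller's actual code. The names NotebookSpec, Cluster, and reconcile are hypothetical, and an in-memory dictionary stands in for the Kubernetes API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class NotebookSpec:
    """Desired state for one notebook server (hypothetical shape)."""
    name: str
    image: str


@dataclass
class Cluster:
    """In-memory stand-in for observed cluster state (assumption, not a real API)."""
    running: dict[str, str] = field(default_factory=dict)  # name -> image


def reconcile(desired: list[NotebookSpec], cluster: Cluster) -> list[str]:
    """Drive the observed state toward the desired state; return the actions taken."""
    actions: list[str] = []
    wanted = {spec.name: spec.image for spec in desired}

    # Create missing notebooks and update ones whose image has drifted.
    for name, image in wanted.items():
        if cluster.running.get(name) != image:
            verb = "update" if name in cluster.running else "create"
            cluster.running[name] = image
            actions.append(f"{verb} {name}")

    # Delete notebooks that are no longer declared.
    for name in list(cluster.running):
        if name not in wanted:
            del cluster.running[name]
            actions.append(f"delete {name}")

    return actions
```

Because the function depends only on the declared and observed state, calling it again after it has converged does nothing. That idempotence is what lets a controller restart, or hand over to another replica, without keeping session state of its own.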

As a result, our platform has been able to scale to four times as many users, including ten times as many concurrent executions. Our users are happier and there's less operational overhead for platform engineers. Our journey illustrates how properly leveraging Kubernetes-native architecture confers significant benefits.

View the full SREcon24 Americas program at https://www.usenix.org/conference/sre...
