GitOps for ML: Converting Notebooks to Reproducible Pipelines

A hands-on introduction to MLOps with DVC and CML - Rob de Wit

When and where

Location

Online

About this event

  • 1 hour 30 minutes

Jupyter Notebooks are an invaluable tool for prototyping in machine learning projects. At a certain point, however, running experiments in them becomes messy. As a result, we tend to lose track of which data, code, and parameters resulted in a given model version.

At that point, you will want to move towards a truly reproducible pipeline. This will help you run experiments in a structured manner and find the best-performing models. At Iterative we build open-source tools that help you do just that.

In this hands-on workshop, we’ll take a prototype in a Jupyter Notebook and transform it into a DVC pipeline. We’ll then use that pipeline locally to run and compare a few experiments. Lastly, we’ll explore how CML will allow us to take our model training online. We’ll use it in conjunction with GitHub Actions to trigger our model training every time we push changes to our repository.

We’ll cover the following steps:

  • Brief introduction to DVC: data versioning and reproducibility
  • Parameterize a Jupyter Notebook and transform it into a DVC pipeline
  • Run, track, and compare experiments with DVC
  • Take our model training online with CML
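To make the "parameterize" and "compare experiments" steps concrete, here is a minimal sketch in plain Python. It is not DVC's API and not the workshop's actual code: the function names, the toy data, and the metrics file names are all hypothetical. The idea it illustrates is that a notebook cell becomes a function driven only by an explicit parameter dictionary, and that each run writes its metrics to a file. In a DVC project, the parameters would typically live in params.yaml and the stage would be declared in dvc.yaml.

```python
import json
import random
from pathlib import Path


def train(params: dict) -> dict:
    """Stand-in for a notebook training cell, driven only by `params`."""
    rng = random.Random(params["seed"])  # fixed seed makes the run repeatable
    # Toy data (hypothetical, for illustration only): y = 3x plus a little noise.
    xs = [i / 10 for i in range(50)]
    ys = [3.0 * x + rng.gauss(0, 0.1) for x in xs]

    # Fit a single weight w with plain gradient descent on mean squared error.
    w = 0.0
    for _ in range(params["epochs"]):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= params["lr"] * grad

    mse = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return {"w": w, "mse": mse}


def run_stage(params: dict, out_path: Path) -> dict:
    """Run training and write the metrics to a file a pipeline tool could track."""
    metrics = train(params)
    out_path.write_text(json.dumps(metrics, indent=2))
    return metrics


if __name__ == "__main__":
    # Two hypothetical experiments that differ only in learning rate.
    experiments = {
        "baseline": {"seed": 0, "epochs": 50, "lr": 0.01},
        "higher-lr": {"seed": 0, "epochs": 50, "lr": 0.05},
    }
    results = {
        name: run_stage(p, Path(f"metrics_{name}.json"))
        for name, p in experiments.items()
    }
    best = min(results, key=lambda name: results[name]["mse"])
    print("best experiment:", best)
```

Because every input is explicit (seed, epochs, learning rate), rerunning with the same parameters gives the same metrics. That property is what lets a pipeline tool tie a given model version back to the data, code, and parameters that produced it.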

By the end of this workshop, you’ll be able to convert a notebook into a fully reproducible pipeline and run it externally on a GitHub runner. With these skills, you’ll know how to take a GitOps approach to ML projects.

Prerequisites:

About the speaker:

Rob is a developer advocate at Iterative AI. He has a background in information sciences and experience in data analytics and engineering. Previously he worked in FinTech, and right now he’s learning a whole lot about MLOps. He has first-hand experience with the gap between data scientists and engineers, and is particularly interested in bridging that gap.

DataTalks.Club is the place to talk about data. Join our Slack community!

This event is sponsored by Iterative.ai