GitOps for ML: Converting Notebooks to Reproducible Pipelines
A hands-on introduction to MLOps with DVC and CML - Rob de Wit
When and where
Date and time
Tuesday, February 21 · 7:30 - 9am PST
Location
Online
About this event
Jupyter Notebooks are an invaluable tool for prototyping in machine learning projects. At a certain point, however, running experiments in them becomes messy. As a result, we tend to lose track of which data, code, and parameters resulted in a given model version.
At that point, you will want to move towards a truly reproducible pipeline. This will help you run experiments in a structured manner and find the best-performing models. At Iterative we build open-source tools that help you do just that.
In this hands-on workshop, we’ll take a prototype in a Jupyter Notebook and transform it into a DVC pipeline. We’ll then use that pipeline locally to run and compare a few experiments. Lastly, we’ll explore how CML will allow us to take our model training online. We’ll use it in conjunction with GitHub Actions to trigger our model training every time we push changes to our repository.
We’ll cover the following steps:
- Brief introduction to DVC: data versioning and reproducibility
- Parameterize a Jupyter Notebook and transform it into a DVC pipeline
- Run, track, and compare experiments with DVC
- Take our model training online with CML
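As a rough illustration of the "parameterize" step, here is a minimal sketch of notebook training cells turned into a script that reads its parameters from a file and writes its metrics to a file. It is not the workshop's actual code: the function names, the parameter names (`seed`, `n_estimators`), the file names, and the toy "model" are all hypothetical, and JSON is used only to keep the example within the standard library. DVC itself is not invoked here; the point is only that a pipeline stage needs explicit inputs and outputs it can track.

```python
import json
import random
from pathlib import Path


def train(params: dict) -> dict:
    """Stand-in for the notebook's training cells (hypothetical toy model)."""
    # A seeded generator makes the run repeatable for the same parameters.
    rng = random.Random(params["seed"])
    # Toy score that depends on the parameters plus seeded noise.
    score = 1.0 - 1.0 / (1 + params["n_estimators"]) - rng.uniform(0, 0.05)
    return {"accuracy": round(score, 4)}


def run_stage(params_path: Path, metrics_path: Path) -> dict:
    """Read parameters from a file, train, and write the metrics to a file."""
    params = json.loads(params_path.read_text())
    metrics = train(params)
    metrics_path.write_text(json.dumps(metrics, indent=2))
    return metrics
```

Because every input (the parameter file) and output (the metrics file) is explicit, running the stage twice with the same parameters gives the same metrics, which is the property a reproducible pipeline relies on.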
By the end of this workshop, you’ll be able to convert a notebook into a fully reproducible pipeline and run it externally on a GitHub runner. With these skills, you’ll know how to take a GitOps approach to ML projects.
Prerequisites:
- A GitHub account; Git fundamentals are expected knowledge
- Potentially useful: https://iterative.ai/blog/jupyter-notebook-dvc-pipeline
About the speaker:
Rob is a developer advocate at Iterative AI. He’s got a background in information sciences, and experience in data analytics and engineering. Previously he worked in FinTech, and right now he’s learning a whole lot about MLOps. He’s got first-hand experience with the gap between data scientists and engineers, and is particularly interested in bridging that gap.
DataTalks.Club is the place to talk about data. Join our Slack community!
This event is sponsored by Iterative.ai