Hello everyone, my name is Romain and today we are going to talk about machine learning model deployment, but let's take a road less travelled!
Let's start with a simple question: please raise your hand if you are able to deploy machine learning models from day one. And when I say "deploy", I mean by any means necessary, so that someone other than the data science team itself can make use of them somehow... Okay, as you can see, not so many people can do it. Let's understand why...
A lot of surveys, and my own field experience, bring to light the following major concerns regarding ML models:
- Most models never reach production (some numbers say around 80% of models stay in notebooks)
- Data science and engineering teams face technical difficulties when collaborating and integrating said models
- A lack of user feedback leaves models stuck at their initial stage, never confronted with the real world
- Stakeholders have a hard time understanding the business value brought by said models
- And lastly, privacy concerns are more and more expressed, and rightfully so.
How can we do better and make sure models go as far as they can in the industrialization process and reach their goals?
Here are some ideas to alleviate some of them:
- Deploy simple demos early on and at no cost: they provide quick user feedback and showcase potential business value to stakeholders
- Use standard model interchange formats to ease integration with engineering teams and surface integration issues early on
- Make use of inference at the edge for cost-efficient (no infrastructure needed) and privacy-respecting deployments
Using these simple rules, we can already go a long way in solving some of the issues data teams are experiencing.
TODO: add HTML/CSS/JS + ONNX + WASM logos
Okay, but how do I get there? Here is a proposal to address each of these points:
- Static websites are easy to build and easy to deploy at literally zero cost and with minimal security risk (no backend server involved, except for serving static assets such as HTML/CSS/JS)
- ONNX is the current standard ML model interchange format, providing a programming-language-agnostic representation of a computational graph
- The modern WebAssembly standard provides an efficient computation engine at the edge, most notably in each and every web browser
Let's go for a quick recap on ONNX and WebAssembly before we take them for a spin.
So, ONNX. It stands for "Open Neural Network Exchange", but it applies to any computational graph you can describe with the builtin operators (which is almost anything at this point), not only neural networks. I'm sure most of you know the format, but let me give you a quick recap:
- It provides a common file format between the training and inference stages
- You get loose coupling between training and inference implementations
- It is language and backend agnostic
By that I mean that you can train a model with any ML framework of your choice, export it to ONNX, and get automatic interoperability with any implementation consuming ONNX models, be it on a backend server, in a desktop app, in a mobile app, or even on IoT devices! It is also compatible with accelerated hardware such as GPUs, TPUs, NPUs and custom chips. Such a standard model interchange format is a key enabler in reducing friction when engineering teams try to integrate models into existing systems.
The ecosystem around ONNX is great: for instance, you can use Netron, a client-side web app, to visualize your ONNX models.
What if all we've done manually so far could be performed automatically, simply by providing an ONNX model and some descriptive metadata about its input and output tensors? Let me introduce a new open-source project I have been experimenting with very recently: "modelship".
It is a local-first Python utility application (in the form of a CLI), open-source under the Apache-2 license.
For now it is very basic, but it allows you to generate a static web application with an autogenerated form for inputs, which performs model inference using ONNX Runtime Web.
The easiest way to try it: "uvx modelship" (you can of course "pip install modelship" instead).
Anticipated questions: