Machine Learning in the Browser: Fast Iteration with ONNX & WebAssembly

Romain Clement

PyData Paris 2025
October 1st 2025

🙋 Who is able to deploy ML models from day one?


🤕 Hard truth about ML models

Many surveys and field experience show:

  • Most models never reach production
  • Technical difficulties for integration
  • Lack of user feedback
  • Difficulty to showcase business value
  • Privacy concerns
Romain Clement - PyData Paris 2025

💡 Can we do better?

💰 Deploy simple demos at no cost
⚙️ Use standard model interchange formats
🌍 Inference at the edge


💡 Can we do better?

💰 Deploy simple demos at no cost → Static websites
⚙️ Use standard model interchange formats → ONNX
🌍 Inference at the edge → WebAssembly


⚙️ What is ONNX?

Open Neural Network Exchange

  • Generic ML model representation
  • Common file format
  • Loose coupling between training and inference
  • Language-agnostic
  • Backend-agnostic
  • Interoperability

🤝 ONNX Models

Export models from your favourite framework:

⚠️ Some models or layer types might not be supported by generic operators yet!


🤝 ONNX Models

Using Netron to visualize an ONNX model


🤝 ONNX Runtime

  • C/C++
  • Python
  • ...
  • Web!

🌍 What is WebAssembly?

WASM

  • Portable compilation target
  • Client and server applications
  • Supported by all major browsers (desktop, mobile)
  • Fast, safe and open
  • Privacy

⚠️ Some restrictions may apply, especially regarding available memory (4 GB limit)


🌍 WebAssembly

Well-known uses in the data science ecosystem:


Getting started with ONNX Runtime Web

Housing value estimation demo

✅ Train a regressor with Scikit-Learn
✅ Export it to ONNX
✅ Integrate it into a static website

Source: rclement/pydata-paris-2025-ml


Getting started with 🤗 Transformers.js

Text summarizer demo

✅ Pre-trained lightweight ONNX LLM (Google Gemma 3 270M, ~1 GB)
✅ Integrate it into a static website

Source: rclement/pydata-paris-2025-ml


🚢 Introducing modelship

Local-first utility Python app

Automates the generation of applications from models

OSS Apache-2 licensed

Get started:


🚢 modelship demo

Let's try to bridge the "extra mile"

✅ Build static web app from ONNX model and metadata
✅ Deploy on GitHub Pages using CI/CD

Source: rclement/pydata-paris-2025-modelship-demo


🚢 modelship demo

Try the deployment yourself!

https://rclement.github.io/pydata-paris-2025-modelship-demo/


🚢 Takeaway

Just start shipping!


Romain CLEMENT

Independent consultant at Datalpia

Meetup Python Grenoble co-organizer

🌐 datalpia.com
🌐 romain-clement.net
🔗 linkedin.com/in/romainclement


🙋 Questions?

Thank you! Let's chat!


📚 References


Hello everyone, my name is Romain, and today we are going to talk about machine learning model deployment, but let's take a road less travelled!

Let's start with a simple question: please raise your hand if you are able to deploy machine learning models from day one. And when I say "deploy", I mean by any means necessary, so that someone other than the data science team itself can make use of them somehow... Okay, so as you can see, not so many people can do it. Let's understand why...

A lot of surveys and my own field experience bring to light the following major concerns regarding ML models:
- Most models never reach production (some numbers say that around 80% of models stay in notebooks)
- Data science and engineering teams face technical difficulties when collaborating and integrating those models
- A lack of user feedback leaves models at their initial stage, without confrontation with the real world
- Stakeholders have a hard time understanding the business value brought by those models
- Lastly, privacy concerns are expressed more and more, and rightfully so

How can we do better and make sure models go as far as they can in the industrialization process and reach their goals?

Here are some ideas to alleviate some of them:
- Deploy simple demos at no cost and early on: provide quick user feedback and showcase potential business value to stakeholders
- Use standard model interchange formats to ease the path to integration with engineering teams, and surface integration issues early on
- Make use of inference at the edge for cost-efficient (no infrastructure needed) and privacy-respecting deployments

Using these simple rules, we can already go a long way towards solving some of the issues data teams are experiencing.

TODO: add HTML/CSS/JS + ONNX + WASM logos

Okay, but how do I go from there? Here is a proposal to address each of these points:
- Static websites are easy to build and easy to deploy, at literally zero cost and with virtually zero security risk (no backend server involved except for serving static assets such as HTML/CSS/JS)
- ONNX is the current standard ML model interchange format, providing a programming-language-agnostic representation of a computational graph
- The modern WebAssembly standard provides an efficient computation engine at the edge, most notably in each and every web browser

Let's go for a quick recap on ONNX and WebAssembly before we take them for a spin.

So, ONNX. It stands for "Open Neural Network Exchange", but it applies to any computational graph you can describe with the built-in operators (which is almost anything at this point), not only neural networks. I'm sure most of you know the format, but let me give you a quick recap:
- It provides a common file format between the training and inference stages
- You get loose coupling between training and inference implementations
- It is language- and backend-agnostic: you train a model with any ML framework of your choice, export it to ONNX, and get automatic interoperability with any implementation consuming ONNX models, be it a backend server, a desktop app, a mobile app, or even an IoT device! It is also compatible with accelerated hardware such as GPUs, TPUs, NPUs, and custom chips

Such a standard model interchange format is a key enabler in reducing friction when engineering teams try to integrate models into existing systems.

The ecosystem around ONNX is great. For instance, you can use Netron, a client-side web app, to visualize your ONNX models.

What if everything we've done manually so far could be performed automatically, simply by providing an ONNX model and some descriptive metadata about the input and output tensors? Let me introduce a new open-source project I have been experimenting with very recently: "modelship". It is a local-first Python utility application (in the form of a CLI), open-source under the Apache-2 license. For now it is very basic, but it allows you to generate a static web application with an auto-generated form for inputs, performing model inference using ONNX Runtime Web. The easiest way to try it is "uvx modelship" (you can of course also "pip install modelship").

Anticipated questions: