Elixir and Machine Learning: Q3 2023 roundup
Back in 2021, the Elixir community started an effort to bring Elixir and Machine Learning together. Over the last three months, the community has released new versions of several key projects and announced new libraries that build upon the existing foundation. That’s what we will explore in this blog post.
As we will see, this is a transitional period of our Machine Learning effort. As our Data and Machine Learning foundations become solid and stable, we are now seeing an increased focus on the scalability, integration, and productivity of our tools, many of them guided by production feedback.
Let’s get started!
Nx (Numerical Elixir)
Nx is the project that started it all. It plays a similar role to NumPy within the Elixir community, with support for just-in-time compilation to both CPUs and GPUs. With v0.6, Nx further improves its ability to parallelize and stream data. Let’s start with some context.
The Nx library comes with its own tensor serving abstraction, called
Nx.Serving, allowing developers to serve both neural networks and traditional machine learning models within a few lines of code.
When you are running code on the GPU, you often want to process entries in parallel for performance. Instead of classifying one image, you want to classify 8 at once. Rather than summarizing one text, you want to summarize 16 simultaneously, and so on. To allow this,
Nx.Serving automatically performs batching of requests.
Nx.Serving is also capable of distributing requests across multiple nodes and multiple GPUs with a single line of code change, something we call “Distributed² Serving”.
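To make this concrete, here is a minimal sketch of defining and running a serving. The computation is a stand-in for a real model, and the name `MyApp.Serving` is hypothetical; the calls follow the documented `Nx.Serving` API:

```elixir
# A stand-in computation; a real serving would wrap a neural network.
serving =
  Nx.Serving.new(fn opts ->
    Nx.Defn.jit(&Nx.multiply(&1, 2), opts)
  end)

# Ad-hoc usage: stack inputs into a batch and run it.
batch = Nx.Batch.stack([Nx.tensor([1, 2, 3]), Nx.tensor([4, 5, 6])])
result = Nx.Serving.run(serving, batch)

# In an application, you would instead start the serving under your
# supervision tree, which enables automatic batching of concurrent
# requests (and, with extra options, partitioning across GPUs):
#
#     children = [
#       {Nx.Serving, serving: serving, name: MyApp.Serving, batch_size: 8}
#     ]
#
# and then call it, from any node in the cluster, with:
#
#     Nx.Serving.batched_run(MyApp.Serving, batch)
result
```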
However, the features above are already 5 months old. :) In the last month or so,
Nx.Serving added two notable features.
The first one is batch keys. When working with text, we often need to pad the texts. Imagine you want to summarize different texts: one has 100 characters, another 500, and another 1000. You could always pad every text to the length of the largest one, but ideally you want to batch small texts with other small ones, and large texts with large ones. Batch keys allow you to define separate queues based on the text size. You can see the discussion that led to the implementation of this feature for charts and insights.
We also added streaming support to
Nx.Serving, for both inputs and outputs. When you use ChatGPT, have you noticed how the response is streamed as it arrives? That’s output streaming and is now supported out-of-the-box in Nx. We will see a practical usage of these features when talking about the Bumblebee project down below.
Finally, the other major feature in Nx is auto-vectorization. Remember when I said that, when working with the GPU, we want to process entries in parallel? However, in order to classify or summarize 32 images/texts at once, you must write your code in a way that can handle your input in batches. With Nx v0.6, you can write your code in a way that classifies or manipulates a single image, and we automatically make it work on a batch of images through a process called vectorization (as in, we are converting a scalar into a vector). Not only that, vectorization often allows developers to simplify existing complex code, as shown here and here.
In summary, Nx v0.6 comes with large improvements on writing and deploying numerical code efficiently.
Explorer
The latest versions of Explorer do a tremendous job in the integration department. You can now access
.parquet and other formats directly from S3, URLs, and other sources. In particular, for columnar formats such as
Parquet, you can lazily stream data in and out of an S3 bucket, tailored to your queries.
The latest Explorer also features integration with ADBC, a database connectivity specification based on the Apache Arrow columnar format. This allows you to query databases such as PostgreSQL, SQLite3, Snowflake, and others, and directly load the results into your dataframe. Shout out to Cocoa Xu for implementing the low-level ADBC bindings for Elixir.
Not only that, Explorer provides zero-copy integration with Nx. This means you can load external data into your dataframes and send them to the GPU trivially. The only times the data will be copied is when crossing the boundaries from IO to memory and then from memory to GPU.
In summary, Explorer v0.7 brings elegant querying and efficient data transfers across a huge variety of projects and needs.
Bumblebee
Bumblebee v0.4 brings support for both GPT-NeoX and LLaMA models, including LLaMA 2, as well as built-in text and image embedding servings. It also supports the new
.safetensors format from Hugging Face.
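A minimal sketch of one of the new built-in servings, text embeddings; the model repository here is only an example choice:

```elixir
repo = {:hf, "sentence-transformers/all-MiniLM-L6-v2"}   # example model

{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)

# Build the embedding serving and run it on a sentence.
serving = Bumblebee.Text.text_embedding(model_info, tokenizer)

%{embedding: embedding} = Nx.Serving.run(serving, "Elixir loves machine learning")
Nx.shape(embedding)
```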
Furthermore, Bumblebee builds on top of the latest Nx features to add streaming to several of its text-generation models.
The Whisper model, which provides speech-to-text, benefited the most from these Nx advancements. Originally, Whisper could only transcribe up to 30 seconds of audio, leaving it up to the user to break large files into smaller chunks.
Now, thanks to Jonatan Kłosko’s work, a Whisper serving can automatically split and stream audio chunks, and results are streamed as they arrive, now also including timestamps. Not only that, once a large file is split, its different chunks are processed in parallel, resulting in excellent speech-to-text performance, especially on the GPU. We are working on some exciting demos for Livebook’s upcoming launch week; in the meantime, here is a sneak peek.
Scholar
While deep learning was a major driver behind Nx, Mateusz Słuszniak has been focused on traditional machine learning techniques with the Scholar project (akin to scikit-learn).
Since Scholar is built on top of Nx, all models also run on the GPU and can be deployed using Nx.Serving.
New projects and learning resources
Sean Moriarity has published the much-awaited Machine Learning in Elixir book, which is an excellent way to get started with Machine Learning in Elixir.
Although they were released back in Q2 2023, it is worth calling out Andrés Alejos’ work on EXGBoost (which provides distributed gradient boosting) and Mockingjay. The latter is able to compile decision trees into tensor operations, bringing
Nx.Serving and GPU support to decision trees. Check out his talk at ElixirConf US 2023 to learn more.
Paulo Valente, from DockYard, has released the first version of Rein, a library that brings reinforcement learning tooling to Nx.
Panagiotis Nezis has published Tucan, a high-level plotting library on top of Vega-Lite, similar to
seaborn. The project deserves special highlight for its excellent documentation, which includes plenty of examples and plots.
Finally, two weeks ago, Mark Ericksen released his port of LangChain for Elixir. At their core, LLM agents have to perform tasks and communicate with services. Given the Erlang VM’s roots in telecommunications, Elixir is an excellent platform for carrying these out, efficiently and concurrently. Check out Charlie Holtz’s talk on Building AI Apps with Elixir, which explores these concepts with insightful and entertaining demos.
There is still a lot I have not mentioned, including many other Machine Learning talks at ElixirConf US 2023. We invite you to dig deeper, discover, and learn more!
For the next steps, optimization areas are likely to gain further attention. We want to bring first-class quantization, MLIR support, optimizations to pre-trained models (such as Flash Attention), and more. We also hope to further streamline the experience for fine-tuning existing models in the future.
The future is bright for Elixir and Machine Learning. Enjoy!