Elixir and Machine Learning in 2024 so far: MLIR, Apache Arrow, structured LLM, and more

Back in 2021, the Elixir community started an effort to bring Elixir and Machine Learning together. We have exciting updates to share since our last roundup, including: MLIR support, rich Arrow types, traditional machine learning, structured LLM, and more.

Numerical Elixir (Nx)

Nx is the project that started it all. It plays a role similar to NumPy within the Elixir community, with support for just-in-time compilation to both CPUs and GPUs.
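
For those new to Nx, here is a minimal sketch of what numerical definitions look like (assuming `nx` is added as a dependency; the module and function names are illustrative). Functions written with `defn` are compiled just-in-time for the chosen backend, such as EXLA for CPUs and GPUs:

```elixir
defmodule MyMath do
  import Nx.Defn

  # defn functions operate on tensors and are JIT-compiled
  # by the configured compiler/backend.
  defn softmax(t) do
    exp = Nx.exp(t)
    exp / Nx.sum(exp)
  end
end

MyMath.softmax(Nx.tensor([1.0, 2.0, 3.0]))
```

The same `defn` code runs unchanged across backends; switching from the default binary backend to EXLA is a configuration change, not a rewrite.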

Since we started working on Numerical Elixir, the Machine Learning landscape has changed considerably, and one such change was driven by the introduction of MLIR. With the recent release of Nx v0.7, we have ported our Google XLA (Accelerated Linear Algebra) bindings to MLIR, which will hopefully open up Numerical Elixir to several exciting new possibilities, such as:

  • Support for Metal on Apple Silicon
  • Support for quantization
  • Support for cross-compilation to embedded devices, including mobile (such as Android and iOS)

We thank Paulo Valente and DockYard for driving this effort!


Explorer

Another key project is Explorer, which provides series and dataframes for Elixir. While it plays a role similar to pandas, its biggest inspiration is the Tidyverse's dplyr.

With Explorer v0.8, we now have full compatibility with Arrow numeric types. We also added support for list and struct types, alongside a collection of functions to work with them, such as splitting strings into a list, joining a list of strings into a single string, checking membership, computing lengths, decoding JSON, and much more.
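
As a sketch of the new list operations (function names as I recall them from the Explorer docs; double-check against the `Explorer.Series` reference):

```elixir
alias Explorer.Series

s = Series.from_list(["a,b", "c,d,e"])

# Split each string into a list, producing a series of lists.
lists = Series.split(s, ",")

# Operate on the list series: element counts, joining back,
# and membership checks.
Series.lengths(lists)
Series.join(lists, "-")
Series.member?(lists, "c")
```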

On top of that, we have improved our support for streaming data in and out of S3-compatible storage for several formats, including .parquet. Overall, these changes make Explorer a more fluent tool for data analysts and engineers.
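
Reading and writing S3-hosted Parquet can look roughly like this (the bucket and object names are hypothetical, and credentials are assumed to be available via the usual AWS environment variables):

```elixir
# Read a Parquet file from S3-compatible storage...
df = Explorer.DataFrame.from_parquet!("s3://my-bucket/events.parquet")

# ...and write a transformed copy back.
Explorer.DataFrame.to_parquet!(df, "s3://my-bucket/events-copy.parquet")
```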


Scholar

While deep learning was a major driver behind Nx, Mateusz Słuszniak and Krsto Proroković have been focusing on traditional machine learning techniques with the Scholar project (akin to scikit-learn).

Scholar v0.3 introduces several new features:

  • LargeVis for visualization of large-scale and high-dimensional data in a low-dimensional (typically 2D or 3D) space
  • KDTree and RandomForestTree as algorithms for k-nearest neighbours classification and regression
  • Hierarchical clustering with average, complete, single, and weighted linkage
  • New dimensionality reduction and manifold algorithms, such as TriMap
  • Optimizations and new functions for existing algorithms and metrics
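
A rough sketch of k-nearest neighbours classification with Scholar (the exact option names are from memory and may differ from the released API, so consult the Scholar docs):

```elixir
# Toy dataset: two clusters with labels 0 and 1.
x = Nx.tensor([[1, 2], [2, 3], [8, 8], [9, 9]])
y = Nx.tensor([0, 0, 1, 1])

model =
  Scholar.Neighbors.KNNClassifier.fit(x, y,
    num_classes: 2,
    num_neighbors: 3,
    algorithm: :kd_tree
  )

Scholar.Neighbors.KNNClassifier.predict(model, Nx.tensor([[8, 9]]))
```

Because Scholar is built on Nx, the same model can be compiled and served over CPUs or GPUs like any other Nx computation.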

These features bring Numerical Elixir and its ability to set up distributed model serving, over both CPUs and GPUs, to traditional machine learning algorithms, allowing developers and data practitioners to tackle a wider range of problems within the Elixir ecosystem.

New projects and learning resources

While I have covered the main projects inside the Numerical Elixir organization, the #machinelearning community continues working on and publishing exciting projects.

A special shoutout goes to instructor_ex, from Thomas Millar, which supports structured prompting for LLMs based on Elixir's Ecto data toolkit. Elixir's langchain library has also received updates, including support for more third-party APIs as well as models implemented in Elixir via Bumblebee.
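
The idea behind instructor_ex is to describe the desired LLM output as an Ecto schema, so responses come back validated and typed. A sketch close to the project's own examples (model name and prompt are illustrative):

```elixir
defmodule SpamPrediction do
  use Ecto.Schema

  # The schema doubles as the contract for the LLM's response.
  @primary_key false
  embedded_schema do
    field :class, Ecto.Enum, values: [:spam, :not_spam]
    field :score, :float
  end
end

Instructor.chat_completion(
  model: "gpt-3.5-turbo",
  response_model: SpamPrediction,
  messages: [
    %{role: "user", content: "Classify: 'You won a free cruise!'"}
  ]
)
```

The call returns `{:ok, %SpamPrediction{}}` on success, so downstream code can pattern match on a struct instead of parsing free-form text.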

On the content production and learning side, Andrés Alejos has been publishing a lot of content on Livebook, including a recent article on Livebook being Elixir’s secret weapon.

Finally, don’t forget to check out Sean Moriarity’s Machine Learning in Elixir book for a deep dive into the ecosystem, regardless of whether you are new to Elixir or to machine learning. And if you are not sure what you can gain from machine learning in Elixir, check out Chris Grainger’s talk titled A Year in Production with Machine Learning on the BEAM.

It has been a joy to see the ecosystem grow and tackle a whole new problem space with Elixir. Until next time!