Introduction

What This Is

Welcome to NEXTorch! NEXTorch is an open-source Python/PyTorch package that facilitates experimental design using Bayesian Optimization (BO). It also serves as a library for learning the theory and implementation of BO.

Active learning refers to a machine learning algorithm that “learns” from data, proposes the next experiments or calculations, and improves prediction accuracy with less training data or lower computational cost. BO, a popular active learning framework, is a suite of techniques for the global optimization of expensive functions.
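To make the loop described above concrete, here is a minimal sketch of BO in one dimension, written with only the Python standard library. It is not NEXTorch or BoTorch code. The toy objective, the kernel length scale, the noise level, the candidate grid, and all function names are assumptions chosen for illustration. A Gaussian process surrogate is fitted to the data collected so far, an expected improvement acquisition function proposes the next point, and the "experiment" is run at that point.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar inputs (assumed length scale)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Return lower-triangular L with L L^T = A for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_sub(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(X, y, noise=1e-4):
    """Fit a zero-mean GP surrogate: factor the kernel matrix and solve for K^-1 y."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, y))
    return L, alpha

def predict(X, L, alpha, x_new):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    k_star = [rbf(a, x_new) for a in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf

def bayes_opt(objective, n_iter=8):
    """Run a short BO loop on [0, 1] and return the best input and value found."""
    X = [0.0, 0.5, 1.0]                      # initial design (assumed)
    y = [objective(x) for x in X]
    grid = [i / 200 for i in range(201)]     # candidate points (assumed)
    for _ in range(n_iter):
        L, alpha = fit_gp(X, y)              # 1. learn from the data so far
        best = max(y)
        candidates = [x for x in grid if x not in X]
        x_next = max(                        # 2. propose the next experiment
            candidates,
            key=lambda x: expected_improvement(*predict(X, L, alpha, x), best),
        )
        X.append(x_next)                     # 3. run it and record the result
        y.append(objective(x_next))
    i_best = max(range(len(y)), key=lambda i: y[i])
    return X[i_best], y[i_best]

def toy_objective(x):
    """Hypothetical stand-in for an expensive experiment; maximum at x = 0.7."""
    return -(x - 0.7) ** 2

x_best, y_best = bayes_opt(toy_objective)
print(f"best x = {x_best:.3f}, best y = {y_best:.5f}")
```

In practice the surrogate fitting and acquisition optimization would be delegated to a library; the sketch only shows how the learn, propose, and evaluate steps fit together.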

Prior Works

The machine learning and scientific research communities have developed several software tools for BO (and for kriging, its close variant from geostatistics). Most of them offer Python interfaces. We have curated a list of open-source BO packages and discuss them further in our paper.

Their documentation pages can be found at:

Spearmint and GPyOpt are among the early works that made BO accessible to end users. More recently, some packages, such as BoTorch and GPflowOpt, have been built on popular machine learning frameworks such as PyTorch and TensorFlow to benefit from fast matrix operations, batched computation, and GPU acceleration. BoTorch stands out because it natively supports parallel optimization, Monte Carlo acquisition functions, and advanced cases such as multi-task and multi-objective optimization. Its PyTorch backend also makes it well suited to easy experimentation and fast prototyping.

Why We Built This

Most existing tools, however, are designed for AI researchers or software engineers and often come with a steep learning curve. Their workflows can also be opaque to end users. In some cases, design choices intentionally keep humans out of the optimization loop. These factors make the tools difficult to extend to chemistry or engineering problems, where domain knowledge is essential.

The authors of edbo (a Bayesian reaction optimization package) have made an attempt in this direction. They performed extensive testing and benchmark studies to showcase the effectiveness of the method in a recent Nature paper [1]. However, the software is still based on command-line scripts, and clear documentation is lacking. edbo also has no access to hardware acceleration or the latest state-of-the-art BO methods.

From a practical perspective, we believe a BO tool should be scalable, flexible, and accessible to its end users, i.e., chemists and engineers. Hence, we built NEXTorch, which extends the capabilities of BoTorch, to democratize the use of BO in the chemical sciences.

Why NEXTorch?

NEXTorch is unique for several reasons:

  1. NEXTorch benefits from the modern architecture of BoTorch and the variety of models and functions it offers.

  2. NEXTorch goes beyond BoTorch by connecting to real-world problems through automatic parameter scaling, data type conversions, and visualization capabilities. These features allow human-in-the-loop design, where decisions on the next experiments can be aided by domain knowledge.

  3. NEXTorch is modular in design, which makes it easy to extend to other frameworks. It also serves as a library for learning the theory and implementation of BO.

We believe the ease of use of NEXTorch will benefit the community, including experimentalists with little or no programming background.
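As an illustration of the automatic parameter scaling listed among the features above, the sketch below maps experimental parameters from their physical ranges onto the unit interval and back. This is roughly what lets an optimizer work in a normalized space while the user thinks in real units. The function names `to_unit` and `from_unit`, and the temperature and pressure ranges, are hypothetical and are not taken from the NEXTorch API.

```python
def to_unit(x, ranges):
    """Scale each parameter from its physical range [lo, hi] to [0, 1]."""
    return [(xi - lo) / (hi - lo) for xi, (lo, hi) in zip(x, ranges)]

def from_unit(u, ranges):
    """Map unit-scaled values back to their physical ranges."""
    return [lo + ui * (hi - lo) for ui, (lo, hi) in zip(u, ranges)]

# Hypothetical design space: temperature in K and pressure in bar
ranges = [(300.0, 500.0), (1.0, 10.0)]

point = [400.0, 5.5]                   # an experiment in physical units
scaled = to_unit(point, ranges)        # what an optimizer would work with
restored = from_unit(scaled, ranges)   # what the experimentalist reads back

print(scaled, restored)
```

Keeping both directions of the mapping means a point proposed by the optimizer can always be reported to the user in the original units.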


Reference

[1] Shields, B. J.; Stevens, J.; Li, J.; Parasram, M.; Damani, F.; Alvarado, J. I. M.; Janey, J. M.; Adams, R. P.; Doyle, A. G. Bayesian Reaction Optimization as a Tool for Chemical Synthesis. Nature 2021, 590, 89–96.