AutoEmulate v1.0.0 release
Updates on new features in v1.0.0 release
Release: v1.0.0
🎉 We're excited to announce AutoEmulate's first major release with v1.0.0 🎉
This release sees the package fully integrated into the PyTorch ecosystem; PyTorch is one of the most popular open-source deep learning frameworks and one of the bedrocks of modern machine learning with Python.
This release also contains new features for uncertainty quantification: ensemble-based emulator models, which output predictions with uncertainty, and Bayesian model calibration, which infers, with uncertainty, the parameters likely to have generated observed data.
If you'd like to discuss any of our work on AutoEmulate or AI for Physical Systems, feel free to reach out to us at ai4physics@turing.ac.uk
What's new
PyTorch refactor and package redesign
AutoEmulate was originally built on top of the scikit-learn framework, which is a great library for traditional machine learning tasks. However, as we moved towards more complex models and larger datasets, we found that we needed the flexibility and performance that PyTorch provides.
The new PyTorch backend allows AutoEmulate to take advantage of PyTorch's powerful features, including (but not limited to):
- Leveraging GPU acceleration, making training and inference much faster.
- Enabling automatic differentiation via PyTorch's autograd system.
- Seamlessly integrating AutoEmulate with other tools in the broader ecosystem, enabling end-to-end emulation workflows.
A new set of Emulator models
AutoEmulate v1.0.0 includes a new set of Ensemble models, which combine multiple emulators and quantify uncertainty from the distribution of their outputs. We have also implemented Monte Carlo (MC) dropout, which quantifies uncertainty by running a neural network multiple times with dropout turned on.
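To make the ensemble idea concrete, here is a minimal sketch using only the Python standard library. It is not AutoEmulate's implementation or API: the toy simulator, the bootstrap resampling, and the helper names (`fit_line`, `fit_ensemble`, `predict_with_uncertainty`) are all assumptions made for illustration. Each ensemble member is a simple linear fit, and the spread of the members' predictions serves as the uncertainty estimate.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_ensemble(xs, ys, n_members=20, seed=0):
    """Fit each member on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    members = []
    while len(members) < n_members:
        idx = [rng.randrange(n) for _ in range(n)]
        if len({xs[i] for i in idx}) < 2:
            continue  # skip degenerate resamples with a single distinct input
        members.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return members


def predict_with_uncertainty(members, x):
    """Mean and standard deviation of the members' predictions at x."""
    preds = [a + b * x for a, b in members]
    return statistics.fmean(preds), statistics.stdev(preds)


# Hypothetical toy simulator: y = 2x + 1 with a little Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(11)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]

members = fit_ensemble(xs, ys)
mean_near, std_near = predict_with_uncertainty(members, 0.5)  # inside the data
mean_far, std_far = predict_with_uncertainty(members, 5.0)    # far outside it
print(f"x=0.5: {mean_near:.3f} +/- {std_near:.3f}")
print(f"x=5.0: {mean_far:.3f} +/- {std_far:.3f}")
```

The members agree closely where training data exists and diverge when extrapolating, which is the behaviour that makes the ensemble spread useful as an uncertainty signal.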
Continuing our integration with GPyTorch for Gaussian Processes, we have added a Gaussian Process emulator with correlated outputs.
We have refactored the existing emulator models to use PyTorch where practical. Additionally, we have retained support for several non-PyTorch models that our userbase has found useful. The table below summarises which emulators available in this release are implemented in PyTorch, support multi-output emulation (MO), use auto-differentiation (AD) for training, or provide predictive uncertainty quantification (UQ).
Emulator | PyTorch | MO | AD | UQ |
---|---|---|---|---|
GaussianProcess | ✔️ | ✔️ | ✔️ | ✔️ |
GaussianProcessCorrelated | ✔️ | ✔️ | ✔️ | ✔️ |
EnsembleMLP | ✔️ | ✔️ | ✔️ | ✔️ |
EnsembleMLPDropout | ✔️ | ✔️ | ✔️ | ✔️ |
MLP | ✔️ | ✔️ | ✔️ | ❌ |
PolynomialRegression | ✔️ | ✔️ | ✔️ | ❌ |
RadialBasisFunctions | ✔️ | ✔️ | ❌ | ❌ |
RandomForest | scikit-learn | ✔️ | ❌ | ❌ |
SupportVectorMachine | scikit-learn | ❌ | ❌ | ❌ |
LightGBM | LightGBM | ❌ | ❌ | ❌ |
Bayesian model calibration
Bayesian calibration is a method for estimating which input parameters were most likely to have produced observed data. Its advantage over other calibration methods is that it returns a probability distribution over the input parameters rather than just point estimates. Performing Bayesian calibration requires a fitted emulator and observations of the quantities that the simulator outputs. Check out our new tutorial for more details on how to use this feature.
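The idea can be sketched with a one-parameter grid approximation in plain Python. This is an illustration under stated assumptions, not AutoEmulate's calibration API: the `emulator` function is a hypothetical stand-in for a fitted emulator that returns a predictive mean and standard deviation, the prior is uniform on the grid, and the likelihood is Gaussian with the observation noise and emulator uncertainty added in variance.

```python
import math


def emulator(theta):
    """Hypothetical fitted emulator: returns (predictive mean, predictive std)."""
    return 3.0 * theta + 1.0, 0.1


def grid_posterior(observation, obs_std, grid):
    """Posterior weights on a grid, assuming a uniform prior and Gaussian likelihood."""
    log_weights = []
    for theta in grid:
        mean, emu_std = emulator(theta)
        var = obs_std**2 + emu_std**2  # observation noise plus emulator uncertainty
        log_weights.append(-0.5 * (observation - mean) ** 2 / var - 0.5 * math.log(var))
    peak = max(log_weights)  # subtract the peak before exponentiating, for stability
    weights = [math.exp(lw - peak) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# Candidate parameter values on [0, 2] and a single (made-up) observation.
grid = [2.0 * i / 200 for i in range(201)]
posterior = grid_posterior(observation=4.0, obs_std=0.2, grid=grid)

post_mean = sum(t * p for t, p in zip(grid, posterior))
post_std = math.sqrt(sum((t - post_mean) ** 2 * p for t, p in zip(grid, posterior)))
print(f"posterior mean {post_mean:.3f}, posterior std {post_std:.3f}")
```

The result is a full distribution over the parameter rather than a single best-fit value. A grid only scales to a handful of parameters, so practical workflows typically rely on sampling methods instead.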
Upgraded features
- Simulator in the loop: Some features in AutoEmulate, such as active learning, require the user to provide a simulator that AutoEmulate can run. AutoEmulate now supports easy integration of simulators into the emulation workflow. Users subclass the Simulator base class and define a method that runs their simulation, taking in the input parameters and returning the output variables. See our custom simulations tutorial for more details.
- Active learning is now a core feature of AutoEmulate. Active learning intelligently selects informative simulator evaluations to maximize emulator improvement with minimal computational cost. See our active learning tutorial for more details.
- History matching: We have refactored the history matching implementation into two separate classes. It can now be used as a standalone metric to evaluate emulator predictions as well as part of a simulator-in-the-loop workflow. See our two updated history matching tutorials (1 and 2) for more details.
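For readers new to history matching, the metric it usually relies on is the implausibility: the distance between an observation and the emulator's prediction, scaled by the combined uncertainty. The sketch below uses the standard textbook definition and the conventional cutoff of 3. It is an assumption that this matches AutoEmulate's classes in detail, and the function names and candidate values are hypothetical.

```python
import math


def implausibility(pred_mean, pred_var, observation, obs_var, discrepancy_var=0.0):
    """Standardised distance between an observation and an emulator prediction."""
    total_var = pred_var + obs_var + discrepancy_var
    return abs(observation - pred_mean) / math.sqrt(total_var)


def not_ruled_out(candidates, observation, obs_var, threshold=3.0):
    """Keep the inputs whose implausibility does not exceed the threshold."""
    return [
        x
        for x, pred_mean, pred_var in candidates
        if implausibility(pred_mean, pred_var, observation, obs_var) <= threshold
    ]


# Hypothetical emulator predictions: (input, predictive mean, predictive variance).
candidates = [
    (0.1, 1.0, 0.04),
    (0.5, 2.1, 0.01),
    (0.9, 3.5, 0.09),
]

kept = not_ruled_out(candidates, observation=2.0, obs_var=0.01)
print(kept)
```

Inputs that survive the cutoff are not confirmed as correct; they are simply not yet ruled out, which is why the metric pairs naturally with a loop that runs the simulator again in the remaining region.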
What's next
Looking ahead, we are working on providing support for more complex simulation outputs including spatial and temporal data, which will allow a wider range of applications to benefit from AutoEmulate.