Understanding and analyzing a system’s response to various inputs is central to scientific and engineering R&D. Designing a new steel alloy, for example, requires knowing its response to different loads. Scientific modeling aims to determine these relationships, often through mathematical models, in order to accelerate engineering design. These models may be straightforward linear functions, complex differential equations, or, in modern times, neural networks. In all cases, models are designed, either by hand or computationally, by examining observed data. Of course, the amount of data we can collect is finite. As such, while we can always validate a model’s accuracy on previously observed data, for the model to be useful we must also be confident in its accuracy on unobserved data.
Let’s look at a simple toy problem. Below, we have defined a system with one input and one output, which can be plotted on the x and y axes respectively. Before fitting a neural network (NN) to model this system, we collect data on this simple function between -2 and 2. After fitting the model, we can validate its accuracy on the previously observed data using its training mean squared error (MSE).
But how does the model perform on test data it was not trained on? Well, it depends on the data!
For this simple problem, we can clearly define test data that is similar to the training data, or in-distribution (ID), also collected between -2 and 2, and test data that is out-of-distribution (OOD), collected over a wider range. Model accuracy is excellent on the in-distribution test data, but unsurprisingly much poorer on the out-of-distribution test data. For this toy system it is easy to tell whether a test point is in-distribution or out-of-distribution by defining bounds on the data fed into the model, but real systems with hundreds of inputs and outputs are significantly more complex, and it is infeasible, if not impossible, to determine where to trust model predictions simply by defining dataset bounds.
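The whole toy experiment fits in a few lines of Python. The sketch below is illustrative only: the underlying function (sin 2x), the network size, and the wider ±4 OOD range are assumptions standing in for the setup behind the figures, not the exact configuration.

```python
# Minimal sketch of the toy experiment. The true function, network size,
# and the wider OOD range (here +/-4) are assumptions, not the exact
# setup behind the figures.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * x)                      # stand-in for the toy system

# Training data is collected only between -2 and 2
X_train = rng.uniform(-2, 2, size=(200, 1))
y_train = f(X_train).ravel()

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
model.fit(X_train, y_train)

# In-distribution test data (same range) vs out-of-distribution (wider range)
X_id = rng.uniform(-2, 2, size=(200, 1))
X_ood = rng.uniform(-4, 4, size=(200, 1))

print("train MSE:   ", mean_squared_error(y_train, model.predict(X_train)))
print("ID test MSE: ", mean_squared_error(f(X_id).ravel(), model.predict(X_id)))
print("OOD test MSE:", mean_squared_error(f(X_ood).ravel(), model.predict(X_ood)))
```

On runs like this, the train and in-distribution errors stay small while the out-of-distribution error is typically far larger, mirroring the plots above.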
This is why we need uncertainty-aware models. Uncertainty predictions provide a confidence interval around each model output, which lets us implicitly distinguish in-distribution from out-of-distribution data. If an uncertainty-aware model predicts high uncertainty for certain test data, we identify that data as out-of-distribution.
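There are several generic ways to attach an uncertainty to a neural network prediction; one of the simplest is the spread of a small ensemble. The sketch below uses that approach purely to illustrate the idea on the toy problem (our production models predict their uncertainties directly, as described in the next section), and the 0.1 flagging threshold is a hypothetical cutoff.

```python
# Generic illustration: use the spread of a small ensemble as the model
# uncertainty. Our production models predict uncertainty directly; this is
# just the simplest way to reproduce the ID/OOD behaviour on the toy problem.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * x)
X_train = rng.uniform(-2, 2, size=(200, 1))
y_train = f(X_train).ravel()

# Small ensemble of identically structured NNs with different random seeds
ensemble = [
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=seed)
    .fit(X_train, y_train)
    for seed in range(5)
]

def predict_with_uncertainty(X):
    preds = np.stack([m.predict(X) for m in ensemble])  # (n_models, n_points)
    return preds.mean(axis=0), preds.std(axis=0)        # prediction, uncertainty

X_test = np.linspace(-4, 4, 9).reshape(-1, 1)
mean, sigma = predict_with_uncertainty(X_test)

THRESHOLD = 0.1  # hypothetical cutoff; in practice calibrated on held-out ID data
for x, s in zip(X_test.ravel(), sigma):
    print(f"x = {x:+.1f}   uncertainty = {s:.3f}   {'OOD' if s > THRESHOLD else 'ID'}")
```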
Uncertainty-aware materials modeling
Uncertainty-aware modeling with machine learning is receiving increased attention due to the complex, black-box nature of deep learning models. Probabilistic techniques such as Gaussian process models have been used to build machine learning interatomic potentials (MLIPs) with built-in uncertainty estimates. GAP and FLARE are popular implementations, and proprietary variants have been integrated directly into DFT software such as VASP. However, Gaussian process models do not scale well to large datasets, which has kept them from being widely used.
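A Gaussian process makes the trade-off concrete: its predictive standard deviation comes for free with every prediction, but exact training cost grows cubically with the number of training points. The sketch below is a generic one-dimensional example using scikit-learn, not GAP or FLARE, and the kernel and test points are arbitrary choices.

```python
# Generic 1D Gaussian process sketch (scikit-learn, not GAP or FLARE).
# The predictive standard deviation comes for free with every prediction,
# but exact GP training scales as O(N^3) in the number of training points,
# which is the scaling bottleneck mentioned above.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X_train = rng.uniform(-2, 2, size=(100, 1))
y_train = np.sin(2 * X_train).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6)
gp.fit(X_train, y_train)                   # cost grows as O(N^3)

X_test = np.array([[0.5], [3.5]])          # one ID point, one OOD point
mean, std = gp.predict(X_test, return_std=True)
print("predictions:  ", mean)
print("uncertainties:", std)               # much larger for the OOD point
```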
At Physics Inverted Materials, we overcome this with our innovative graph neural network architecture, which simultaneously predicts energies, forces, and stresses along with their respective uncertainties. With these uncertainties, we can identify high-error samples that would lead to incorrect material properties. In every simulation we run, we predict the uncertainty to ensure our model is ingesting in-distribution data. Any simulation that encounters out-of-distribution data, identified by uncertain energies, forces, or stresses, prompts the model to be retrained on those uncertain data points, as sketched below.
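In schematic form, the loop looks something like this. The function and attribute names (run_simulation_step, compute_dft_labels, retrain, max_uncertainty) and the threshold value are hypothetical placeholders for illustration, not our actual API.

```python
# Schematic of the retraining loop described above. The function and
# attribute names (run_simulation_step, compute_dft_labels, retrain,
# max_uncertainty) and the threshold value are hypothetical placeholders,
# not our actual API.
UNCERTAINTY_THRESHOLD = 0.1  # hypothetical cutoff

def uncertainty_aware_simulation(mlip, structure, n_steps):
    uncertain_structures = []
    for _ in range(n_steps):
        # The MLIP returns energies, forces, stresses and their uncertainties
        prediction = mlip.predict(structure)
        if prediction.max_uncertainty() > UNCERTAINTY_THRESHOLD:
            # Out-of-distribution: keep this structure for retraining
            uncertain_structures.append(structure)
        structure = run_simulation_step(structure, prediction)

    if uncertain_structures:
        # Label the flagged structures (e.g. with DFT) and retrain the MLIP
        new_data = compute_dft_labels(uncertain_structures)
        mlip = retrain(mlip, new_data)
    return mlip
```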
The figure below shows why this is critical. In the left panel, the training parity plot has a very low mean absolute error (MAE) of 0.04. With the current iteration of the MLIP, the uncertainty quantification plot in the center panel predicts high errors for a few of the atoms’ forces, marking this sample structure as uncertain. Importantly, this out-of-distribution sample has an MAE of 0.18 (right panel), much higher than the training MAE, and could have led to incorrect material property predictions. This mirrors what we saw above with the in- and out-of-distribution test sets for the toy problem. But because we can identify out-of-distribution data, we add this sample to the training dataset, retrain the MLIP, and now obtain an MAE of 0.05 with correspondingly low uncertainties, indicating that the previously out-of-distribution data is now in-distribution. Uncertainty-aware simulations let us identify when our models are extrapolating and systematically improve them by adding more data to the training dataset, so that in the next simulation the model is interpolating and highly accurate.
Without knowing when a model is hallucinating, an MLIP can easily be used to predict inaccurate results. Because MLIP errors accumulate during property predictions, the underlying MLIP errors needed to push property-prediction errors above 25% are actually quite small. Below we show R² values for foundation models (FMs) that are impressively high: force R² greater than 0.96. Even these small errors, however, accumulate and lead to incorrect property predictions (see our previous blog posts for more info). Our uncertainty-aware models, by contrast, reach R² values greater than 0.99, allowing us to accurately reproduce DFT results.
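A quick back-of-the-envelope calculation shows why a high R² is not the whole story: since R² = 1 - MSE/Var(y), the residual RMSE equals std(y)·sqrt(1 - R²). The sketch below just evaluates that generic relation; it is an illustration of the scaling, not a re-analysis of the benchmark data in the figure.

```python
# Back-of-the-envelope check: R^2 alone can hide sizable residuals.
# From R^2 = 1 - MSE / Var(y), the residual RMSE is std(y) * sqrt(1 - R^2).
import numpy as np

for r2 in (0.96, 0.99):
    relative_rmse = np.sqrt(1 - r2)
    print(f"R^2 = {r2:.2f} -> residual RMSE = {relative_rmse:.0%} of the force spread")
# R^2 = 0.96 still leaves residuals of about 20% of the spread of the
# reference forces; 0.99 brings that down to 10%. Those residuals then
# compound over the many model evaluations inside a property calculation.
```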
Our state-of-the-art uncertainty-aware MLIPs give us unparalleled confidence in our material property predictions. Uncertainty-aware models enable us to identify in- and out-of-distribution data automatically, which means we know where our models are accurate. If we need to sample outside the current data distribution, we automatically expand our dataset. To this end, our uncertainty-aware models embedded within an active learning paradigm are unique in the machine learning community and are truly generalizable. This paradigm allows us to simulate many materials and material properties at DFT accuracy for the first time.