Emil Annevelink

Scaling simulations to complex materials systems

Machine learning is enabling a new era of digital materials modeling. PHIN’s advancements in data-driven modeling unlock a host of new capabilities for in-silico materials development and address the techno-economic bottlenecks that have prevented adoption of digital tools. Machine learning interatomic potentials (MLIPs) are at the heart of PHIN-atomic, our atomic-scale modeling software, and they resolve what was previously a critical tradeoff between computational cost and accuracy. PHIN-atomic reduces the cost of high-fidelity density functional theory (DFT) calculations 10,000-fold. With this dramatic shift in economics, the true advantage is far less about incremental cost savings and far more about the kinds of simulations that are now economically feasible.


A key drawback of DFT is its poor scaling with system size: in practice, DFT is limited to systems of up to hundreds of atoms. Interatomic potentials, including PHIN’s models, on the other hand, scale to systems of tens or even hundreds of thousands of atoms. This is the feature of MLIPs that makes them so attractive to researchers and scientists: the larger the modeled system, the better it approximates real-world behavior. PHIN’s MLIPs can now perform simulations at DFT accuracy on very large systems that would previously have exceeded the capabilities of even the world’s largest supercomputers, predicting material properties like melting temperature or viscosity from first-principles theory.


Like most ML models, MLIPs need to generalize well to unseen data in order to be useful. Using an MLIP to model a system larger than it was trained on is one of the more challenging generalization problems an MLIP will encounter. PHIN’s key differentiator is the ability to fine-tune bespoke models to specific materials by automatically calculating DFT data wherever MLIP predictions are inaccurate. Unfortunately, DFT calculations are limited to hundreds of atoms due to their high computational cost. To label extrapolative data from very large systems, we’ve developed a technique we call “cropping”, which generates smaller systems that can be labeled with DFT, recovering PHIN-atomic’s ability to ensure the generalization performance of the MLIP.


Cropping relies on a core principle of MLIPs: locality. When data is flagged for labeling during active learning, it means the MLIP is uncertain about its per-atom contribution for one or more atoms in the system, and is consequently unfamiliar with those atoms’ local neighborhoods. What needs to be labeled is therefore not necessarily the entire large structure, but a structure or set of structures that contain those atoms’ local neighborhoods. Cropping produces these smaller structures.
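The locality idea can be sketched in a few lines of code: collect every atom within some cutoff radius of a flagged atom, respecting periodic boundaries. This is only an illustrative sketch (the cutoff value, orthorhombic box, and function name are our assumptions here, not PHIN-atomic’s actual implementation):

```python
import numpy as np

def crop_neighborhood(positions, cell_lengths, center_idx, cutoff=5.0):
    """Collect every atom within `cutoff` (Angstrom) of the flagged atom,
    applying the minimum-image convention for a periodic orthorhombic box.
    Illustrative sketch only; cutoff and box shape are assumptions."""
    deltas = positions - positions[center_idx]
    # Minimum-image convention: wrap displacements into the nearest image
    deltas -= cell_lengths * np.round(deltas / cell_lengths)
    within = np.linalg.norm(deltas, axis=1) <= cutoff
    return positions[within]

# A 10 A box: the atom at x = 9.5 is only 0.5 A from the origin once wrapped
positions = np.array([[0.0, 0, 0], [1.0, 0, 0], [9.5, 0, 0], [5.0, 5.0, 5.0]])
cell = np.array([10.0, 10.0, 10.0])
crop = crop_neighborhood(positions, cell, center_idx=0, cutoff=2.0)
```

Note that the periodic wrap matters: without it, the atom sitting just across the box boundary would be wrongly excluded from the neighborhood.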


Cropping takes its name from the familiar image manipulation tool. Of course, “cropping” 3D atomic structures is not as straightforward. The cropped structures must be calculated with DFT, so careful attention needs to be paid to boundary conditions and atomic positions to ensure each cropped structure is physically reasonable. Unphysical structures can taint the dataset and lead to worse generalization performance. Our algorithm ensures these requirements are met, preserving the local neighborhood and maintaining appropriate interatomic distances, charge balance, and overall energy.
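One of the simplest such sanity checks is rejecting any cropped structure where two atoms end up unphysically close across the new periodic boundary. A minimal sketch, again assuming an orthorhombic box and an illustrative threshold (PHIN’s production checks go further, covering charge balance and energy):

```python
import numpy as np

def is_physically_reasonable(positions, cell_lengths, min_dist=1.5):
    """Return False if any minimum-image pair distance falls below
    `min_dist` (Angstrom). The 1.5 A threshold is an illustrative
    assumption; real checks would be element-dependent."""
    for i in range(len(positions) - 1):
        deltas = positions[i + 1:] - positions[i]
        # Wrap each pair displacement into the nearest periodic image
        deltas -= cell_lengths * np.round(deltas / cell_lengths)
        if np.linalg.norm(deltas, axis=1).min() < min_dist:
            return False
    return True

cell = np.array([10.0, 10.0, 10.0])
ok = np.array([[0.0, 0, 0], [3.0, 0, 0], [0, 3.0, 0]])   # well separated
bad = np.array([[0.0, 0, 0], [0.5, 0, 0]])               # overlapping pair
```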


Let’s look at an example of cropping in use on a silicon system. We start with a model that’s been trained on a small crystalline dataset. Naturally, when we initially evaluate the model on a large, 1000-atom disordered structure, we see poor performance and high atomic force errors. 

We then use the cropping algorithm to generate a few smaller 64-atom cropped structures from another 1000-atom disordered structure, calculate them with DFT, and fine-tune the model on this new data.

Evaluating the fine-tuned model again on the original 1000-atom disordered structure, we can observe a dramatic improvement in accuracy, despite the model’s training set consisting of only 64-atom structures.
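The generation step of the example above can be sketched end to end as a toy workflow: take a large structure, and for each atom flagged by active learning, produce a 64-atom crop ready to be sent to DFT. The function names and the nearest-neighbor crop are our simplifications, not the PHIN-atomic API:

```python
import numpy as np

def crop_nearest(structure, center_idx, size=64):
    """Toy crop: the `size` atoms nearest the flagged atom (ignoring
    periodicity). PHIN's production algorithm additionally handles
    boundary conditions, charge balance, and interatomic distances."""
    dists = np.linalg.norm(structure - structure[center_idx], axis=1)
    return structure[np.argsort(dists)[:size]]

def crops_for_labeling(structure, flagged_indices, size=64):
    """One cropped structure per uncertain atom, ready to label with DFT."""
    return [crop_nearest(structure, i, size) for i in flagged_indices]

# Example: a 1000-atom disordered structure with two atoms flagged
rng = np.random.default_rng(0)
big = rng.uniform(0.0, 20.0, size=(1000, 3))
crops = crops_for_labeling(big, flagged_indices=[3, 42])
# each crop is a 64-atom structure small enough for a DFT calculation
```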



What does this mean in practice? For end-users, as with much of our core technology, everything happens under the hood. Users can now set up a simulation with initial structures of 1,000+ atoms, and PHIN-atomic will automatically fine-tune an MLIP to perform the simulation accurately, just as it would for any other system size. Reach out to info@phinmaterials.com to learn more.
