Machine-learned interatomic potentials (MLIPs) represent a breakthrough in computational chemistry, bridging the gap between quantum mechanical accuracy and classical force field efficiency. Unlike traditional force fields that require explicit bonding topologies and parameter sets, MLIPs learn interactions directly from training data, making them particularly powerful for studying complex chemical systems.
Training an MLIP can take from a few minutes to weeks on multiple GPUs, depending on the complexity of the system and the desired accuracy. For this tutorial, the models you are using are pre-trained to allow you to focus on understanding their application and performance characteristics. The training procedure is included in this repository and you could in principle add new model configurations. The training workflow has been set up using the IPSuite package [3]. The training data is taken from the work of Cheng et al. [4], which provides high-quality ab initio reference data for water systems.
In this tutorial you will be using the ASE (Atomic Simulation Environment) package [5]. Most MLIP packages provide an ASE calculator interface for their direct use within Python. ASE provides a unified interface for working with atomic structures and computational chemistry codes. It represents atomic systems as Python objects and provides calculator interfaces that allow different simulation engines to compute energies and forces using the same API.
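The calculator pattern can be sketched with a toy example (illustration only: `ToyAtoms` and `HarmonicPairCalculator` are made-up stand-ins, not ASE classes). Any object exposing the same methods can be plugged in, which is exactly how MLIP calculators slot into existing ASE workflows:

```python
class HarmonicPairCalculator:
    """Toy calculator: harmonic energy 0.5 * k * sum(|r_i|^2) around the origin."""

    def __init__(self, k=1.0):
        self.k = k

    def get_potential_energy(self, positions):
        return 0.5 * self.k * sum(x * x + y * y + z * z for x, y, z in positions)


class ToyAtoms:
    """Minimal Atoms-like container with a swappable calculator."""

    def __init__(self, positions):
        self.positions = positions
        self.calc = None  # any object with the same interface works here

    def get_potential_energy(self):
        return self.calc.get_potential_energy(self.positions)


atoms = ToyAtoms([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
atoms.calc = HarmonicPairCalculator(k=2.0)
print(atoms.get_potential_energy())  # 0.5 * 2.0 * (0 + 1) = 1.0
```

Swapping `HarmonicPairCalculator` for any other calculator changes the physics without touching the rest of the script; this is the design idea behind ASE's calculator interface.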
The ASE calculator and most MLIP packages use eV and Ångström as units for energy and distance, respectively, as referenced in the ASE documentation. This is important to keep in mind when interfacing with other simulation codes that may use different unit systems.
All units in this part of the tutorial follow the ASE convention.
import matplotlib.pyplot as plt
import numpy as np
import sys
import pint
import znh5md
import zntrack
import logging
import warnings
import datetime
import rdkit2ase
import IPython.display
import tqdm.auto as tqdm
# enable import statements on local Python scripts
sys.path.insert(0, ".")
# Initialize unit registry for proper unit handling
ureg = pint.UnitRegistry()
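The unit bookkeeping matters as soon as results leave the ASE ecosystem, for example when exchanging data with codes that work in kJ/mol and nm. A minimal sketch of the conversion (the example numbers are invented; pint performs the same conversions automatically via the registry above):

```python
# Manual eV / Å -> kJ/mol / nm conversion, using the CODATA factor
# 1 eV = 96.48533... kJ/mol. This is the kind of bookkeeping pint automates.
EV_TO_KJ_PER_MOL = 96.48533212  # kJ/mol per eV
ANGSTROM_TO_NM = 0.1            # nm per Å

energy_ev = -1.25        # example energy in eV (invented value)
force_ev_per_ang = 0.50  # example force component in eV/Å (invented value)

energy_kj_mol = energy_ev * EV_TO_KJ_PER_MOL
force_kj_mol_nm = force_ev_per_ang * EV_TO_KJ_PER_MOL / ANGSTROM_TO_NM
print(f"{energy_kj_mol:.2f} kJ/mol, {force_kj_mol_nm:.1f} kJ/(mol nm)")
```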
This bash command will download the training data set:
!dvc pull
Below is the default configuration for the APAX MLIP training that you'll be using. This YAML file defines key hyperparameters that control both the accuracy and computational efficiency of the model. The hyperparameters define the architecture of the model and how it is trained. Hyperparameters are fixed before training, while the model parameters, e.g. weights and biases, are learned during training.
The target property an MLIP predicts is the potential energy of the entire system $E_\text{system}$ based on the atomic positions and species $S$. To allow for a linear scaling with system size, the total energy is decomposed into local atomic contributions $E_i$. The forces are obtained from the gradients of the energy with respect to atomic positions.
$$ E_\text{system}(S, \theta) = \sum_i^N E_i(\mathbf{G}_i, \theta) $$

Here $\theta$ denotes the model parameters and architecture. The local atomic environment is encoded using an atom-centered descriptor $\mathbf{G}_i$. These descriptors are fixed-length vector representations that encode symmetries of the system, such as invariance under translation, rotation, and permutation of atoms.
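Because the total energy is a plain sum over atomic contributions, it is invariant under permutations of the atom ordering by construction. A toy illustration with a made-up per-atom contribution (smooth pair terms within a cutoff; not a real descriptor-based model):

```python
import math

def toy_atomic_energy(i, positions, r_cut=3.0):
    """Made-up local contribution E_i: smooth pair terms within the cutoff."""
    e = 0.0
    for j, pos_j in enumerate(positions):
        if j == i:
            continue
        r = math.dist(positions[i], pos_j)
        if r < r_cut:
            e += (r - r_cut) ** 2  # vanishes smoothly at r_cut
    return e

def total_energy(positions):
    """E_system as a sum of local atomic contributions."""
    return sum(toy_atomic_energy(i, positions) for i in range(len(positions)))

pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
permuted = [pos[2], pos[0], pos[1]]
# Reordering the atoms reorders the sum but leaves its value unchanged.
assert math.isclose(total_energy(pos), total_energy(permuted))
```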
# Display the APAX configuration file
with open("config/apax.yaml", "r") as f:
    content = f.read()
IPython.display.display(IPython.display.Markdown(f"```yaml\n{content}\n```"))
To understand the impact of different hyperparameters on model performance, we have trained multiple models with varying configurations. Each parameter affects both the accuracy and computational cost of the model. Here are the key parameters we'll explore:
r_max: The maximum cutoff radius for the descriptor. This defines how far the model "sees" around each atom when predicting energy contributions and forces. A small value makes the model fast but can lead to physical inaccuracies or instabilities. A large value can improve accuracy but increases computational cost.
n_basis: The number of basis functions (e.g., equidistant Gaussians) used to expand interatomic distances in the descriptor. This controls the resolution of the radial description of atomic environments. More basis functions provide finer detail but increase computational overhead.
nn: The architecture of the neural network (number and size of hidden layers).
n_radial: The number of radial functions, formed as linear combinations of the basis functions, that enter the descriptor.
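To get a feeling for why r_max dominates the cost, the average number of neighbors inside the cutoff sphere can be estimated from the atom number density (a back-of-the-envelope sketch; the 0.1 atoms/Å³ figure is an approximate value for liquid water):

```python
import math

RHO_ATOMS = 0.10  # approximate atom number density of liquid water, atoms/Å³

def neighbors_in_cutoff(r_max, rho=RHO_ATOMS):
    """Average neighbor count inside a sphere of radius r_max (in Å)."""
    return rho * 4.0 / 3.0 * math.pi * r_max**3

for r in (2.0, 4.0, 6.0):
    print(f"r_max = {r:.1f} Å -> ~{neighbors_in_cutoff(r):.0f} neighbors")
# Tripling the cutoff from 2 Å to 6 Å multiplies the neighbor count
# (and hence the per-atom descriptor cost) by a factor of 27.
```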
The workflow file workflow.py implements the training steps, reading the dataset from the data/ folder. All trained models and their results are stored in the nodes/ directory for analysis and comparison.
You can select from the following pre-trained models to explore how different hyperparameters affect performance:
The feed-forward neural networks' width and depth define the number of parameters and the model's representational capacity:
- nn_16-16_Apax: Smaller network (16 neurons per layer)
- nn_32-32_Apax: Medium network (32 neurons per layer)
- nn_64-64_Apax: Large network (64 neurons per layer)
- nn_128-128_Apax: Larger network (128 neurons per layer)

The maximum radial cutoff defines the spatial extent of local atomic environments:
- r_max_2_Apax: Very short cutoff (2.0 Å)
- r_max_3_Apax: Short cutoff (3.0 Å)
- r_max_4_Apax: Medium cutoff (4.0 Å)
- r_max_5_Apax: Standard cutoff (5.0 Å)
- r_max_5_5_Apax: Extended cutoff (5.5 Å)
- r_max_6_Apax: Long cutoff (6.0 Å)

The GMNN model builds a smooth neighborhood density from a basis set (e.g., equidistant Gaussians or Bessel-like functions), which are linearly combined to form radial basis functions:
- n_basis_4_Apax: Low resolution (4 basis functions)
- n_basis_8_Apax: Medium resolution (8 basis functions)
- n_basis_16_Apax: High resolution (16 basis functions)

The GMNN captures angular information using contractions up to a rotation order. Higher angular resolution allows the model to capture more complex spatial arrangements.
- n_radial_5_Apax: Medium resolution (5)
- n_radial_6_Apax: Higher resolution (6)
- n_radial_7_Apax: Highest resolution (7)

Let's load one of the models (r_max_5_5_Apax) and evaluate its performance.
Our APAX models implement the ASE calculator interface, which enables seamless integration with ESPResSo and other simulation packages.
To load the model, we use the ZnTrack package [6], which provides version control and reproducibility for computational workflows.
ZnTrack is a general-purpose workflow and data management tool. It is built around the concept of nodes, defined as Python class objects, with each node representing a single task within a workflow.
One of the key advantages of ZnTrack is that nodes can be easily reused and packaged for sharing with others.
ZnTrack also serves as the foundation for other packages such as IPSuite, and is used within Apax, both of which we used to set up the training workflow.
import workflow
with warnings.catch_warnings(action="ignore"):
    logging.getLogger("zntrack.project").setLevel(logging.WARNING)
    project = workflow.Workflow()
    project.configure()
    project.train()
logging.getLogger("zntrack.project").setLevel(logging.NOTSET)
# Load a pre-trained model and get its ASE calculator interface
with warnings.catch_warnings(action="ignore"):
    model = zntrack.from_rev("r_max_5_5_Apax")
    calc = model.get_calculator()
print(calc)
To assess the performance of our MLIPs, we evaluate them on a dedicated test dataset. These are atomic configurations that the model has never seen during training, providing an unbiased assessment of the model's generalization ability.
Important considerations for MLIP evaluation:
# Load the test dataset (stored in H5MD format)
test_io = znh5md.IO("data/cosmo_water_test.h5")
test_frames = test_io[:]
The test_frames variable contains a list of ASE Atoms objects, each representing a different atomic configuration from the test set.
Each Atoms object contains:
Later in this tutorial, you'll learn how to create your own atomic configurations for molecular dynamics simulations. For now, familiarize yourself with the dataset by inspecting the number of frames, the number of atoms per frame, and the distributions of energies and force components.
# SOLUTION CELL
print(f"Number of test frames: {len(test_frames)}")
print(f"Number of atoms per frame: {np.mean([len(frame) for frame in test_frames])}")
fig, ax = plt.subplots(1, 2, figsize=(14, 4))
ax[0].hist([frame.get_potential_energy() / 1000. for frame in test_frames], bins=20)
ax[0].set_xlabel("Potential energy / keV")
ax[0].set_ylabel("Counts")
ax[1].hist(
    np.reshape([frame.get_forces() for frame in test_frames], -1),
    bins=200,
    density=True,
)
ax[1].set_xlabel("Force component / eV/Å")
ax[1].set_ylabel("Density")
plt.show()
Now we can use the calc object to apply our MLIP and compute energies and forces for each configuration in the test set.
We'll compare these predictions against the reference DFT values to assess model accuracy.
# Extract reference (true) values from the test dataset
true_energies = [x.get_potential_energy() for x in test_frames]
true_forces = [x.get_forces() for x in test_frames]
# Compute MLIP predictions for all test configurations
pred_energies = []
pred_forces = []
for frame in tqdm.tqdm(test_frames, desc="Computing MLIP predictions"):
frame.calc = calc # Assign the MLIP calculator
pred_energies.append(frame.get_potential_energy())
pred_forces.append(frame.get_forces())
Tasks:

Create correlation plots (pred_* vs. true_*) comparing DFT reference values to MLIP predictions for both energies and forces.

Learning objectives:
# SOLUTION CELL
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
# Helper function to compute Mean Absolute Error
def mae(y_true, y_pred):
    return np.mean(np.abs(np.array(y_true) - np.array(y_pred)))
# Energy correlation plot (normalized per atom)
atoms = np.array([len(frame) for frame in test_frames])
true_energies_norm = np.array(true_energies) / atoms
pred_energies_norm = np.array(pred_energies) / atoms
e_mae = mae(true_energies_norm, pred_energies_norm)
ax[0].scatter(
    true_energies_norm,
    pred_energies_norm,
    marker="x",
    alpha=0.7,
    label=f"MAE: {e_mae:.4f} eV/atom",
)
ax[0].plot(
    [true_energies_norm.min(), true_energies_norm.max()],
    [true_energies_norm.min(), true_energies_norm.max()],
    "k--",
    alpha=0.5,
)
ax[0].set_xlabel("True Energies / eV/atom")
ax[0].set_ylabel("Predicted Energies / eV/atom")
ax[0].set_title("Energy Predictions")
ax[0].legend()
# Force correlation plot
true_forces_flat = np.concatenate([f.flatten() for f in true_forces])
pred_forces_flat = np.concatenate([f.flatten() for f in pred_forces])
f_mae = mae(true_forces_flat, pred_forces_flat)
ax[1].scatter(
    true_forces_flat,
    pred_forces_flat,
    marker="x",
    alpha=0.3,
    label=f"MAE: {f_mae:.4f} eV/Å",
)
ax[1].plot(
    [true_forces_flat.min(), true_forces_flat.max()],
    [true_forces_flat.min(), true_forces_flat.max()],
    "k--",
    alpha=0.5,
)
ax[1].set_xlabel(r"True Forces / eV/Å")
ax[1].set_ylabel(r"Predicted Forces / eV/Å")
ax[1].set_title("Force Predictions")
ax[1].legend()
plt.tight_layout()
plt.show()
Now that you've analyzed the performance of the r_max_5_5_Apax model, compute energy and force MAEs for all available models.
Questions to consider:
# SOLUTION CELL
model_names = [
"nn_16-16_Apax",
"nn_32-32_Apax",
"nn_64-64_Apax",
"nn_128-128_Apax", # NN variations
"r_max_2_Apax",
"r_max_3_Apax",
"r_max_4_Apax",
"r_max_5_Apax",
"r_max_5_5_Apax",
"r_max_6_Apax", # Cutoff variations
"n_basis_4_Apax",
"n_basis_8_Apax",
"n_basis_16_Apax", # Basis function variations
"n_radial_5_Apax",
"n_radial_6_Apax",
"n_radial_7_Apax", # Radial resolution variations
]
def mae(y_true, y_pred):
    """Calculate Mean Absolute Error"""
    return np.mean(np.abs(np.array(y_true) - np.array(y_pred)))
atoms = np.array([len(frame) for frame in test_frames])
results = {}
# Progress bar for model evaluation
pbar = tqdm.tqdm(model_names, desc="Evaluating models")
for name in pbar:
    pbar.set_description(f"Evaluating {name}")
    # Load model and get calculator
    with warnings.catch_warnings(action="ignore"):
        model = zntrack.from_rev(name)
        calc = model.get_calculator()
    # Measure evaluation time
    start = datetime.datetime.now()
    pred_frames = calc.batch_eval(test_frames, batch_size=1, silent=True)
    end = datetime.datetime.now()
    # Extract predictions and compute metrics
    pred_energies = [x.get_potential_energy() for x in pred_frames]
    pred_forces = [x.get_forces() for x in pred_frames]
    # Calculate normalized energy MAE and force MAE
    pred_energies_norm = np.array(pred_energies) / atoms
    e_mae = mae(true_energies_norm, pred_energies_norm)
    pred_forces_flat = np.concatenate([f.flatten() for f in pred_forces])
    f_mae = mae(true_forces_flat, pred_forces_flat)
    # Store results
    results[name] = (e_mae, f_mae, end - start)
# Display results in a formatted table
print("Model Performance Comparison")
print("=" * 70)
print(f"{'Model':<20} {'Energy MAE':<15} {'Force MAE':<15} {'Time / s':<10}")
print(f"{'':20} {'/ eV/atom':<15} {'/ eV/Å':<15} {'':10}")
print("-" * 70)
for name, (e_mae, f_mae, duration) in results.items():
    print(f"{name:<20} {e_mae:<15.4f} {f_mae:<15.4f} {duration.total_seconds():<10.2f}")
print("\nKey Observations:")
print("- Lower MAE values indicate better accuracy")
print("- Time reflects computational cost per evaluation")
print("- Best models balance accuracy and efficiency")
MLIPs differ fundamentally from classical force fields in their setup requirements. Classical force fields require explicit molecular topologies to assign parameters for inter- and intramolecular interactions. MLIPs, however, learn these interactions directly from training data and only require:
While this simplicity is powerful, creating atomic coordinates manually is impractical for most systems. Therefore, we use higher-level representations and automated tools:
We use SMILES (Simplified Molecular-Input Line-Entry System) [7] strings to describe molecular structures.
For water, the SMILES string is simply O (representing the oxygen atom with implicit hydrogens).
The rdkit2ase package provides convenient functions to convert SMILES strings into 3D atomic coordinates.
# Generate water molecule conformations from SMILES
water = rdkit2ase.smiles2conformers("O", numConfs=100)
print(f"Generated water molecule as ASE Atoms object: {water[0]}")
print(f"Chemical species: {water[0].get_chemical_symbols()}")
print(f"Atomic positions / Å:\n{water[0].get_positions()}")
# Visualize the molecular structure using RDKit
rdkit2ase.ase2rdkit(water[0])
To create a realistic liquid water system for MD simulation, we need to pack multiple water molecules into a periodic box at the appropriate density. We use the PACKMOL software package for this purpose, which efficiently packs molecules while avoiding unrealistic overlaps.
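As a sanity check on what PACKMOL has to produce, the cubic box edge for 100 water molecules at 997 kg/m³ follows from the density alone (a back-of-the-envelope estimate, independent of the actual packing):

```python
# Edge length of a cubic box holding 100 water molecules at liquid density.
N = 100
M_WATER = 18.015       # molar mass of water, g/mol
N_A = 6.02214076e23    # Avogadro constant, 1/mol
RHO = 0.997            # density, g/cm³ (997 kg/m³)

mass_g = N * M_WATER / N_A       # total mass in g
volume_cm3 = mass_g / RHO        # volume in cm³
volume_ang3 = volume_cm3 * 1e24  # 1 cm³ = 1e24 Å³
edge_ang = volume_ang3 ** (1 / 3)
print(f"box edge ≈ {edge_ang:.2f} Å")  # roughly 14.4 Å
```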
# Create a liquid water box using PACKMOL
# Parameters: 100 water molecules at liquid density (997 kg/m³)
box = rdkit2ase.pack(
    [water],  # List of molecules to pack
    counts=[100],  # Number of each molecule type
    density=997,  # Density in kg/m³
    packmol="packmol",  # Path to PACKMOL executable, here assumed to be in PATH
)
For our MLIP, this system is represented as:
No explicit bonds, angles, or other topological information is needed – the MLIP learned these interactions from its training data.
# Assign the MLIP calculator to our water box
box.calc = calc
# Compute the potential energy of the entire system
energy = box.get_potential_energy() * ureg.eV
print(f"Potential energy of the water box: {energy:.2f~P}")
# Compute forces on all atoms
forces = box.get_forces() * ureg.eV / ureg.angstrom
print(f"Forces on atoms (first 5 shown): {forces[:5]:.2f~P}")
One feature of the APAX MLIP framework is its ability to estimate the uncertainty of its predictions using ensemble methods. Our models use a "shallow ensemble" approach where the final layers of the neural network are replicated with different random initializations:
ensemble:
  kind: shallow
  n_members: 16
This configuration creates 16 different "members" in the ensemble. The final prediction is the mean across all ensemble members, and the standard deviation provides an estimate of prediction uncertainty.
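The ensemble statistics themselves are elementary: the reported energy is the mean over the members and the uncertainty is their standard deviation. A sketch with invented member predictions (the values below are illustrative, not model output):

```python
import statistics

# Hypothetical energy predictions (eV) from 16 shallow-ensemble members.
member_energies = [-468.9, -469.1, -469.0, -468.8, -469.2, -469.0, -468.9,
                   -469.1, -469.0, -468.9, -469.1, -469.0, -469.2, -468.8,
                   -469.0, -469.1]

prediction = statistics.mean(member_energies)    # reported energy
uncertainty = statistics.stdev(member_energies)  # energy uncertainty estimate
print(f"E = {prediction:.2f} ± {uncertainty:.2f} eV")
```

A large spread between members signals that the configuration lies far from the training distribution and the prediction should not be trusted.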
Applications of uncertainty quantification:
This capability is particularly valuable for long MD simulations where the system may explore configurations far from the training distribution.
# Examine available results from the calculator
print("Available calculator results:")
print(list(box.calc.results.keys()))
# Extract and display prediction uncertainty
energy_uncertainty = box.calc.results["energy_uncertainty"] * ureg.eV
print(f"Energy prediction: {energy:.1f~P} ± {energy_uncertainty:.1f~P}")
With our water system configured and the MLIP calculator assigned, we're ready to integrate with ESPResSo for molecular dynamics simulations. The ASE calculator interface provides a seamless bridge between our APAX models and ESPResSo's simulation engine.
# Save the water system for use in ESPResSo simulations
from ase.io import write
write("water.xyz", box)
print("Water system saved to water.xyz for ESPResSo integration in Part 3")
By completing this tutorial, you have learned:
This tutorial provides the foundation for using MLIPs in ESPResSo simulations. Consider exploring:
[1] Moritz R. Schäfer, Nico Segreto, Fabian Zills, Christian Holm, and Johannes Kästner. Apax: A Flexible and Performant Framework for the Development of Machine-Learned Interatomic Potentials. Journal of Chemical Information and Modeling 2025 65 (15), 8066-8078
[2] V. Zaverkin and Johannes Kästner. Gaussian Moments as Physically Inspired Molecular Descriptors for Accurate and Scalable Machine Learning Potentials. J. Chem. Theory Comput. 2020, 16, 8, 5410–5421
[3] Fabian Zills, Moritz René Schäfer, Nico Segreto, Johannes Kästner, Christian Holm, and Samuel Tovey. Collaboration on Machine-Learned Potentials with IPSuite: A Modular Framework for Learning-on-the-Fly. The Journal of Physical Chemistry B 2024 128 (15), 3662-3676
[4] B. Cheng, E.A. Engel, J. Behler, C. Dellago, & M. Ceriotti. Ab initio thermodynamics of liquid and solid water. Proc. Natl. Acad. Sci. U.S.A. 2019 116 (4) 1110-1115
[5] Ask Hjorth Larsen et al. The atomic simulation environment—a Python library for working with atoms. J. Phys.: Condens. Matter 2017 29 273002
[6] Fabian Zills, Moritz Schäfer, Samuel Tovey, Johannes Kästner, Christian Holm. ZnTrack -- Data as Code. arXiv:2401.10603 [cs.MS] 2024
[7] David Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences 1988, 28 (1), 31–36