Machine Learned Interatomic Potentials (MLIP) with ESPResSo¶

Part 2: Focus on Machine Learned Interatomic Potentials¶

Overview¶

Machine-learned interatomic potentials represent a breakthrough in computational chemistry, bridging the gap between quantum mechanical accuracy and classical force field efficiency. Unlike traditional force fields that require explicit bonding topologies and parameter sets, MLIPs learn interactions directly from training data, making them particularly powerful for studying complex chemical systems.

Training an MLIP can take from a few minutes to weeks on multiple GPUs, depending on the complexity of the system and the desired accuracy. For this tutorial, the models you are using are pre-trained to allow you to focus on understanding their application and performance characteristics. The training procedure is included in this repository and you could in principle add new model configurations. The training workflow has been set up using the IPSuite package [3]. The training data is taken from the work of Cheng et al. [4], which provides high-quality ab initio reference data for water systems.

Note on Units and ASE¶

In this tutorial you will be using the ASE (Atomic Simulation Environment) package [5]. Most MLIP packages provide an ASE calculator interface for their direct use within Python. ASE provides a unified interface for working with atomic structures and computational chemistry codes. It represents atomic systems as Python objects and provides calculator interfaces that allow different simulation engines to compute energies and forces using the same API.

The ASE calculator interface and most MLIP packages use eV and Ångström as units for energy and distance, respectively, as stated in the ASE documentation. This is important to keep in mind when interfacing with other simulation codes that may use different unit systems. All units in this part of the tutorial follow the ASE convention.

Python Libraries¶

In [ ]:
import matplotlib.pyplot as plt
import numpy as np
import sys
import pint
import znh5md
import zntrack
import logging
import warnings
import datetime
import rdkit2ase
import IPython.display
import tqdm.auto as tqdm

# enable import statements on local Python scripts
sys.path.insert(0, ".")

# Initialize unit registry for proper unit handling
ureg = pint.UnitRegistry()

Training Data¶

This shell command downloads the training data set via DVC:

In [ ]:
!dvc pull

APAX Configuration¶

Below is the default configuration for the APAX MLIP training that you'll be using [1]. This YAML file defines key hyperparameters that control both the accuracy and computational efficiency of the model. The hyperparameters define the architecture of the model and how it is trained. Hyperparameters are fixed before training, while the model parameters, e.g. weights and biases, are learned during training.

The target property an MLIP predicts is the potential energy of the entire system $E_\text{system}$ based on the atomic positions and species $S$. To allow for a linear scaling with system size, the total energy is decomposed into local atomic contributions $E_i$. The forces are obtained from the gradients of the energy with respect to atomic positions.

$$ E_\text{system}(S, \theta) = \sum_{i=1}^{N} E_i(\mathbf{G}_i, \theta) $$

Here $\theta$ denotes the model parameters and architecture. The local atomic environment is encoded using an atom-centered descriptor $\mathbf{G}_i$. These descriptors are a fixed-length vector representation, encoding symmetries of the system such as translation, rotation, and permutation of atoms.
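
As a toy illustration of this decomposition (with a hypothetical Lennard-Jones-like stand-in for the learned local energy $E_i$, not the actual GMNN model), the total energy is a sum of per-atom terms, and the forces follow as $\mathbf{F}_i = -\nabla_{\mathbf{r}_i} E_\text{system}$, here approximated by finite differences:

```python
import numpy as np

# Toy stand-in for the learned local energy E_i: each atom contributes a
# term based on its distances to all other atoms (pairs are intentionally
# counted from both sides, since each atom "owns" its own environment).
def total_energy(positions):
    E = 0.0
    for i, r_i in enumerate(positions):
        dists = np.linalg.norm(positions - r_i, axis=1)
        dists = dists[dists > 0]  # exclude self-distance
        E += np.sum((1.0 / dists) ** 12 - (1.0 / dists) ** 6)
    return E

# Forces as the negative gradient of the total energy, approximated by
# central finite differences (an MLIP uses automatic differentiation)
def forces(positions, eps=1e-5):
    F = np.zeros_like(positions)
    for i in range(positions.shape[0]):
        for k in range(3):
            shifted = positions.copy()
            shifted[i, k] += eps
            e_plus = total_energy(shifted)
            shifted[i, k] -= 2 * eps
            e_minus = total_energy(shifted)
            F[i, k] = -(e_plus - e_minus) / (2 * eps)
    return F

pos = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 1.3, 0.0]])
print(f"E_system = {total_energy(pos):.4f}")
print(f"Net force:\n{forces(pos).sum(axis=0)}")
```

Because the total energy depends only on relative positions, the forces sum to (numerically) zero; real MLIPs obtain them analytically as gradients of the network output.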

In [ ]:
# Display the APAX configuration file
with open("config/apax.yaml", "r") as f:
    content = f.read()
IPython.display.display(IPython.display.Markdown(f"```yaml\n{content}\n```"))

Model Hyperparameter Study¶

To understand the impact of different hyperparameters on model performance, we have trained multiple models with varying configurations. Each parameter affects both the accuracy and computational cost of the model. Here are the key parameters we'll explore:

Key Hyperparameters¶

  • r_max: The maximum cutoff radius for the descriptor. This defines how far the model "sees" around each atom when predicting energy contributions and forces. A small value makes the model fast but can lead to physical inaccuracies or instabilities. A large value can improve accuracy but increases computational cost.

  • n_basis: The number of radial basis functions used in the descriptor. This controls the resolution of the radial description of atomic environments. More basis functions provide finer detail but increase computational overhead.

  • nn: The architecture of the neural network (number and size of hidden layers).

  • n_radial: The number of radial functions, each formed as a linear combination of the basis functions. Together with n_basis, this sets the resolution of the radial description.

Training Workflow¶

The workflow file workflow.py implements the following steps:

  1. Load training and validation data from the data/ folder
  2. Iterate over different model parameter configurations
  3. Generate configuration files for APAX training
  4. Configure model training and evaluation pipelines
  5. Execute the complete training workflow

All trained models and their results are stored in the nodes/ directory for analysis and comparison.

Available Pre-trained Models¶

You can select from the following pre-trained models to explore how different hyperparameters affect performance:

Neural Network Architecture Variations¶

The feed-forward neural networks' width and depth define the number of parameters and the model's representational capacity:

  • nn_16-16_Apax: Small network (16 neurons per layer)
  • nn_32-32_Apax: Medium network (32 neurons per layer)
  • nn_64-64_Apax: Large network (64 neurons per layer)
  • nn_128-128_Apax: Largest network (128 neurons per layer)

Cutoff Radius Variations¶

The maximum radial cutoff defines the spatial extent of local atomic environments:

  • r_max_2_Apax: Very short cutoff (2.0 Å)
  • r_max_3_Apax: Short cutoff (3.0 Å)
  • r_max_4_Apax: Medium cutoff (4.0 Å)
  • r_max_5_Apax: Standard cutoff (5.0 Å)
  • r_max_5_5_Apax: Extended cutoff (5.5 Å)
  • r_max_6_Apax: Long cutoff (6.0 Å)

Basis Function Variations¶

The GMNN model [2] builds a smooth neighborhood density from a basis set (e.g., equidistant Gaussians or Bessel-like functions), whose members are linearly combined to form the radial basis functions:

  • n_basis_4_Apax: Low resolution (4 basis functions)
  • n_basis_8_Apax: Medium resolution (8 basis functions)
  • n_basis_16_Apax: High resolution (16 basis functions)

Radial Resolution Variations¶

The number of radial functions sets the resolution with which the GMNN descriptor resolves interatomic distances. These radial functions also enter the tensor contractions that encode angular information, so a higher radial resolution allows the model to capture more complex spatial arrangements.

  • n_radial_5_Apax: Medium radial resolution (5)
  • n_radial_6_Apax: Higher radial resolution (6)
  • n_radial_7_Apax: Highest radial resolution (7)

Loading and Using Pre-trained Models¶

Let's load one of the models (r_max_5_5_Apax) and evaluate its performance.

Our APAX models implement the ASE calculator interface, which enables seamless integration with ESPResSo and other simulation packages.

To load the model, we use the ZnTrack package [6], which provides version control and reproducibility for computational workflows. ZnTrack is a general-purpose workflow and data management tool. It is built around the concept of nodes, defined as Python class objects, with each node representing a single task within a workflow. One key advantage of ZnTrack is that nodes can easily be reused and packaged for sharing with others. ZnTrack also serves as the foundation for other packages, such as IPSuite, and is used within APAX; both of these were used to set up the training workflow.

In [ ]:
import workflow

with warnings.catch_warnings(action="ignore"):
    logging.getLogger("zntrack.project").setLevel(logging.WARNING)
    project = workflow.Workflow()
    project.configure()
    project.train()
    logging.getLogger("zntrack.project").setLevel(logging.NOTSET)
In [ ]:
# Load a pre-trained model and get its ASE calculator interface
with warnings.catch_warnings(action="ignore"):
    model = zntrack.from_rev("r_max_5_5_Apax")
    calc = model.get_calculator()
print(calc)

Model Evaluation on Test Dataset¶

To assess the performance of our MLIPs, we evaluate them on a dedicated test dataset. These are atomic configurations that the model has never seen during training, providing an unbiased assessment of the model's generalization ability.

Important considerations for MLIP evaluation:

  • Static test performance (what we measure here) provides a baseline for model quality
  • The test dataset's quality and diversity directly impact the reliability of these metrics
  • Real-world performance in molecular dynamics simulations may differ from static test metrics
  • Long-timescale stability and transferability often require additional validation beyond static tests
In [ ]:
# Load the test dataset (stored in H5MD format)
test_io = znh5md.IO("data/cosmo_water_test.h5")
test_frames = test_io[:]

The test_frames variable contains a list of ASE Atoms objects, each representing a different atomic configuration from the test set. Each Atoms object contains:

  • Atomic positions and species
  • Periodic boundary conditions and unit cell
  • Reference energies and forces from DFT calculations (stored as calculator results)

Later in this tutorial, you'll learn how to create your own atomic configurations for molecular dynamics simulations. For now, familiarize yourself with the dataset by

  • printing the length of the dataset, and number of atoms per configuration
  • plotting the energy and force distributions.
In [ ]:
# SOLUTION CELL
print(f"Number of test frames: {len(test_frames)}")
print(f"Average number of atoms per frame: {np.mean([len(frame) for frame in test_frames]):.1f}")

fig, ax = plt.subplots(1, 2, figsize=(14, 4))
ax[0].hist([frame.get_potential_energy() / 1000. for frame in test_frames], bins=20)
ax[0].set_xlabel("Potential energy / keV")
ax[0].set_ylabel("Counts")

ax[1].hist(
    np.reshape([frame.get_forces() for frame in test_frames], -1),
    bins=200,
    density=True,
)
ax[1].set_xlabel("Force component / eV/Å")
ax[1].set_ylabel("Density")
plt.show()

Now we can use the calc object to apply our MLIP and compute energies and forces for each configuration in the test set. We'll compare these predictions against the reference DFT values to assess model accuracy.

In [ ]:
# Extract reference (true) values from the test dataset
true_energies = [x.get_potential_energy() for x in test_frames]
true_forces = [x.get_forces() for x in test_frames]

# Compute MLIP predictions for all test configurations
pred_energies = []
pred_forces = []
for frame in tqdm.tqdm(test_frames, desc="Computing MLIP predictions"):
    frame.calc = calc  # Assign the MLIP calculator
    pred_energies.append(frame.get_potential_energy())
    pred_forces.append(frame.get_forces())

Exercise 1: Model Performance Analysis¶

Tasks:

  1. Create correlation plots (pred_* vs. true_*) comparing DFT reference values to MLIP predictions for both energies and forces
  2. Calculate the Mean Absolute Error (MAE) for energies and forces
  3. Note that energy MAE is typically reported per atom to enable comparison across different system sizes

Learning objectives:

  • Understand how to quantify MLIP accuracy using standard metrics
  • Learn to visualize model performance through correlation plots
  • Recognize the importance of normalizing energies by system size
In [ ]:
# SOLUTION CELL
fig, ax = plt.subplots(1, 2, figsize=(12, 5))

# Helper function to compute Mean Absolute Error
def mae(y_true, y_pred):
    return np.mean(np.abs(np.array(y_true) - np.array(y_pred)))

# Energy correlation plot (normalized per atom)
atoms = np.array([len(frame) for frame in test_frames])
true_energies_norm = np.array(true_energies) / atoms
pred_energies_norm = np.array(pred_energies) / atoms
e_mae = mae(true_energies_norm, pred_energies_norm)

ax[0].scatter(
    true_energies_norm,
    pred_energies_norm,
    marker="x",
    alpha=0.7,
    label=f"MAE: {e_mae:.4f} eV/atom",
)
ax[0].plot(
    [true_energies_norm.min(), true_energies_norm.max()],
    [true_energies_norm.min(), true_energies_norm.max()],
    "k--",
    alpha=0.5,
)
ax[0].set_xlabel("True Energies / eV/atom")
ax[0].set_ylabel("Predicted Energies / eV/atom")
ax[0].set_title("Energy Predictions")
ax[0].legend()

# Force correlation plot
true_forces_flat = np.concatenate([f.flatten() for f in true_forces])
pred_forces_flat = np.concatenate([f.flatten() for f in pred_forces])
f_mae = mae(true_forces_flat, pred_forces_flat)

ax[1].scatter(
    true_forces_flat,
    pred_forces_flat,
    marker="x",
    alpha=0.3,
    label=f"MAE: {f_mae:.4f} eV/Å",
)
ax[1].plot(
    [true_forces_flat.min(), true_forces_flat.max()],
    [true_forces_flat.min(), true_forces_flat.max()],
    "k--",
    alpha=0.5,
)
ax[1].set_xlabel(r"True Forces / eV/Å")
ax[1].set_ylabel(r"Predicted Forces / eV/Å")
ax[1].set_title("Force Predictions")
ax[1].legend()

plt.tight_layout()
plt.show()

Exercise 2: Comparative Model Analysis¶

Now that you've analyzed the performance of the r_max_5_5_Apax model, compute energy and force MAEs for all available models.

Questions to consider:

  • Which hyperparameters have the strongest impact on accuracy?
  • Is there a trade-off between accuracy and computational speed?
In [ ]:
# SOLUTION CELL
model_names = [
    "nn_16-16_Apax",
    "nn_32-32_Apax",
    "nn_64-64_Apax",
    "nn_128-128_Apax",  # NN variations
    "r_max_2_Apax",
    "r_max_3_Apax",
    "r_max_4_Apax",
    "r_max_5_Apax",
    "r_max_5_5_Apax",
    "r_max_6_Apax",  # Cutoff variations
    "n_basis_4_Apax",
    "n_basis_8_Apax",
    "n_basis_16_Apax",  # Basis function variations
    "n_radial_5_Apax",
    "n_radial_6_Apax",
    "n_radial_7_Apax",  # Radial resolution variations
]

def mae(y_true, y_pred):
    """Calculate Mean Absolute Error"""
    return np.mean(np.abs(np.array(y_true) - np.array(y_pred)))

# Reference quantities (recomputed here so this cell runs standalone)
atoms = np.array([len(frame) for frame in test_frames])
true_energies_norm = np.array(true_energies) / atoms
true_forces_flat = np.concatenate([f.flatten() for f in true_forces])
results = {}

# Progress bar for model evaluation
pbar = tqdm.tqdm(model_names, desc="Evaluating models")

for name in pbar:
    pbar.set_description(f"Evaluating {name}")

    # Load model and get calculator
    with warnings.catch_warnings(action="ignore"):
        model = zntrack.from_rev(name)
        calc = model.get_calculator()

    # Measure evaluation time
    start = datetime.datetime.now()
    pred_frames = calc.batch_eval(test_frames, batch_size=1, silent=True)
    end = datetime.datetime.now()

    # Extract predictions and compute metrics
    pred_energies = [x.get_potential_energy() for x in pred_frames]
    pred_forces = [x.get_forces() for x in pred_frames]

    # Calculate normalized energy MAE and force MAE
    pred_energies_norm = np.array(pred_energies) / atoms
    e_mae = mae(true_energies_norm, pred_energies_norm)

    pred_forces_flat = np.concatenate([f.flatten() for f in pred_forces])
    f_mae = mae(true_forces_flat, pred_forces_flat)

    # Store results
    results[name] = (e_mae, f_mae, end - start)

# Display results in a formatted table
print("Model Performance Comparison")
print("=" * 70)
print(f"{'Model':<20} {'Energy MAE':<15} {'Force MAE':<15} {'Time / s':<10}")
print(f"{'':20} {'/ eV/atom':<15} {'/ eV/Å':<15} {'':10}")
print("-" * 70)

for name, (e_mae, f_mae, duration) in results.items():
    print(f"{name:<20} {e_mae:<15.4f} {f_mae:<15.4f} {duration.total_seconds():<10.2f}")

print("\nKey Observations:")
print("- Lower MAE values indicate better accuracy")
print("- Time reflects computational cost per evaluation")
print("- Best models balance accuracy and efficiency")

System Setup for Molecular Dynamics¶

MLIPs differ fundamentally from classical force fields in their setup requirements. Classical force fields require explicit molecular topologies to assign parameters for inter- and intramolecular interactions. MLIPs, however, learn these interactions directly from training data and only require:

  • Atomic species (element types)
  • Atomic positions
  • Unit cell vectors (for periodic systems)

While this simplicity is powerful, creating atomic coordinates manually is impractical for most systems. Therefore, we use higher-level representations and automated tools:

SMILES Representation¶

We use SMILES (Simplified Molecular-Input Line-Entry System) [7] strings to describe molecular structures. For water, the SMILES string is simply O (representing the oxygen atom with implicit hydrogens). The rdkit2ase package provides convenient functions to convert SMILES strings into 3D atomic coordinates.

In [ ]:
# Generate water molecule conformations from SMILES
water = rdkit2ase.smiles2conformers("O", numConfs=100)

print(f"Generated water molecule as ASE Atoms object: {water[0]}")
print(f"Chemical species: {water[0].get_chemical_symbols()}")
print(f"Atomic positions / Å:\n{water[0].get_positions()}")

# Visualize the molecular structure using RDKit
rdkit2ase.ase2rdkit(water[0])

Creating Bulk Systems with PACKMOL¶

To create a realistic liquid water system for MD simulation, we need to pack multiple water molecules into a periodic box at the appropriate density. We use the PACKMOL software package for this purpose, which efficiently packs molecules while avoiding unrealistic overlaps.

In [ ]:
# Create a liquid water box using PACKMOL
# Parameters: 100 water molecules at liquid density (997 kg/m³)
box = rdkit2ase.pack(
    [water],  # List of molecules to pack
    counts=[100],  # Number of each molecule type
    density=997,  # Density in kg/m³
    packmol="packmol",  # Path to PACKMOL executable, here assumed to be in PATH
)

For our MLIP, this system is represented as:

  • 300 atoms (100 × 3 atoms per water molecule)
  • XYZ coordinates in 3D space
  • Periodic boundary conditions defined by cell vectors
  • Atomic species labels (O and H)

No explicit bonds, angles, or other topological information is needed – the MLIP learned these interactions from its training data.

In [ ]:
# Assign the MLIP calculator to our water box
box.calc = calc
In [ ]:
# Compute the potential energy of the entire system
energy = box.get_potential_energy() * ureg.eV
print(f"Potential energy of the water box: {energy:.2f~P}")
In [ ]:
# Compute forces on all atoms
forces = box.get_forces() * ureg.eV / ureg.angstrom
print(f"Forces on atoms (first 5 shown): {forces[:5]:.2f~P}")

Uncertainty Quantification with Ensemble Methods¶

One feature of the APAX MLIP framework is its ability to estimate the uncertainty of its predictions using ensemble methods. Our models use a "shallow ensemble" approach where the final layers of the neural network are replicated with different random initializations:

ensemble:
    kind: shallow
    n_members: 16

This configuration creates 16 different "members" in the ensemble. The final prediction is the mean across all ensemble members, and the standard deviation provides an estimate of prediction uncertainty.
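
The mean/standard-deviation combination can be sketched with plain NumPy; the 16 per-member energies below are made-up illustrative numbers, not actual APAX output:

```python
import numpy as np

# Hypothetical per-member energy predictions from a 16-member shallow
# ensemble for one configuration (illustrative values only)
rng = np.random.default_rng(42)
member_energies = -467.0 + 0.05 * rng.standard_normal(16)

energy = member_energies.mean()      # reported prediction
uncertainty = member_energies.std()  # spread across members = uncertainty

print(f"E = {energy:.3f} eV ± {uncertainty:.3f} eV")
```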

Applications of uncertainty quantification:

  • Monitor prediction reliability during MD simulations
  • Stop simulations automatically when uncertainty exceeds thresholds
  • Identify regions of configuration space poorly covered by training data
  • Guide active learning strategies for model improvement

This capability is particularly valuable for long MD simulations where the system may explore configurations far from the training distribution.

In [ ]:
# Examine available results from the calculator
print("Available calculator results:")
print(list(box.calc.results.keys()))
In [ ]:
# Extract and display prediction uncertainty
energy_uncertainty = box.calc.results["energy_uncertainty"] * ureg.eV
print(f"Energy prediction: {energy:.1f~P} ± {energy_uncertainty:.1f~P}")

Preparing for ESPResSo Integration¶

With our water system configured and the MLIP calculator assigned, we're ready to integrate with ESPResSo for molecular dynamics simulations. The ASE calculator interface provides a seamless bridge between our APAX models and ESPResSo's simulation engine.

In [ ]:
# Save the water system for use in ESPResSo simulations
from ase.io import write

write("water.xyz", box)
print("Water system saved to water.xyz for ESPResSo integration in Part 3")

Key Learning Outcomes¶

By completing this tutorial, you have learned:

  1. MLIP Fundamentals: How machine-learned interatomic potentials differ from classical force fields and their advantages for complex chemical systems
  2. Model Evaluation: Methods to assess MLIP accuracy using correlation plots and error metrics
  3. Hyperparameter Effects: How different model parameters (cutoff radius, neural network size, basis functions) affect accuracy and computational cost
  4. System Preparation: Creating realistic molecular systems using SMILES representations and PACKMOL
  5. Uncertainty Quantification: Using ensemble methods to estimate prediction reliability
  6. Integration Workflow: Connecting APAX models with simulation engines through ASE calculators

Next Steps¶

This tutorial provides the foundation for using MLIPs in ESPResSo simulations. Consider exploring:

  • Longer MD simulations to study dynamic properties
  • Temperature and pressure effects on model performance
  • Active learning strategies for model improvement
  • Comparison with other MLIP frameworks (MACE, NequIP, etc.)

References¶

[1] Moritz R. Schäfer, Nico Segreto, Fabian Zills, Christian Holm, and Johannes Kästner. Apax: A Flexible and Performant Framework for the Development of Machine-Learned Interatomic Potentials. Journal of Chemical Information and Modeling 2025 65 (15), 8066-8078
[2] V. Zaverkin and Johannes Kästner. Gaussian Moments as Physically Inspired Molecular Descriptors for Accurate and Scalable Machine Learning Potentials. J. Chem. Theory Comput. 2020, 16, 8, 5410–5421
[3] Fabian Zills, Moritz René Schäfer, Nico Segreto, Johannes Kästner, Christian Holm, and Samuel Tovey. Collaboration on Machine-Learned Potentials with IPSuite: A Modular Framework for Learning-on-the-Fly. The Journal of Physical Chemistry B 2024 128 (15), 3662-3676
[4] B. Cheng, E.A. Engel, J. Behler, C. Dellago, & M. Ceriotti. Ab initio thermodynamics of liquid and solid water. Proc. Natl. Acad. Sci. U.S.A. 2019 116 (4) 1110-1115
[5] Ask Hjorth Larsen et al. The atomic simulation environment—a Python library for working with atoms. J. Phys.: Condens. Matter 2017 29 273002
[6] Fabian Zills, Moritz Schäfer, Samuel Tovey, Johannes Kästner, Christian Holm. ZnTrack -- Data as Code. arXiv:2401.10603 [cs.MS] 2024
[7] David Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences 1988, 28 (1), 31–36