Quantum mechanics gives us the most complete description of the world we can achieve. While the methods involve some degree of approximation, there are numerous approaches to performing quantum mechanical simulations. The most prominent one is probably density functional theory (DFT). Simulations at this level of accuracy are some of the most precise we can perform, allowing us to study complex chemical interactions and even bond-dynamics studies. Unfortunately, they are computationally demanding, and thus limited to small numbers of atoms and short time scales.
Over the past decade, machine learning has been used extensively to address this problem. Beginning in 2008, scientists began using machine-learning models trained on small quantum mechanical simulations to perform large-scale atomistic studies with no loss in accuracy. The models work by computing energies and forces on individual atoms based on the local atomic environment, encoded in a chemical descriptor. Nowadays, this technique enables large-scale simulations with quantum mechanical accuracy (Kozinsky et al.).
There are many different machine-learning models and dedicated open-source Python packages:
Each of these has its own specialization and applications. In their infancy, using a trained machine-learned potential was challenging: it typically required extensive re-training and, therefore, quantum mechanical simulations or complex active-learning routines where one iteratively improves model accuracy during a simulation. However, as of 2024, foundation models now exist. These models are trained on an extensive chemical space and have demonstrated to be stable on many material simulations.
In this tutorial, we will focus on the MACE-MP-0 model. This model has been trained on the materials project dataset and can accurately predict energies and forces for many organic and inorganic systems. The MACE-MP-0 model uses the state-of-the-art MACE equivariant message passing model architecture, more information for which is available in the lecture slides.
Most machine-learned potential framework operations through the Atomic Simulation Environment (ASE) Python package.
ASE aims to simplify small atomistic computations and finds a lot of use in quantum mechanical and single-point calculations.
The majority of ASE takes place within the ase.Atoms
objects, which in turn are built from the ase.Atom
objects (note the missing final 's').
An Atom
object contains information about a single atom in a configuration (an Atoms
object),
including its position, forces, species, and how interactions should be computed between its neighbours.
These interactions are defined using Calculators
, which take a whole Atoms
object and return energies and forces on each atom.
More information can be found in the ASE documentation: https://wiki.fysik.dtu.dk/ase/ase/atoms.html.
As a part of this tutorial, we have extended ESPResSo to interface with ASE in such a way that it allows calculators to apply forces to atoms in a simulation box. In this case, we will focus solely on the machine-learned potentials. However, by the end of the tutorial, the way other calculators could be used should be clear.
The first step in the tutorial is the imports. We will need a few additional libraries during this tutorial, so let's discuss each one and why we will use it.
They can be installed with the following command:
python3 -m pip install -c requirements "mace-torch==0.3.6" "rdkit2ase==0.1.4" "plotly>=5.15.0" tqdm pint rdkit
espressomd
to create our simulation box and run the MD simulationsASEInterface
to allow ESPResSo to communicate with ASEZnDraw
visualizer for interactive visualizationWe import the foundation model mentioned earlier, mace. Specifically, the ASE calculator that is shipped with the model. When you perform this import, you automatically download the trained model parameters.
The next import is rdkit2ase
, a helper package developed at the Institute for Computational Physics. It allows users to interface the popular atomistic analysis tool RDKit with ASE.
We will use it to generate molecular structures later on.
# ESPResSo imports
import espressomd
from espressomd.plugins.ase import ASEInterface
from espressomd.zn import Visualizer
# MACE-MP-0
from mace.calculators import mace_mp
# Simulation box
import rdkit2ase
# Plotting tools
import rdkit.Chem
import rdkit.Chem.Draw
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams.update({'font.size': 18})
# Miscellaneous
import numpy as np
import pint
import tqdm
In the majority of machine-learned inter-atomic potential studies the unit system is eV for energy, Angstrom for distances, and mass in atomic units. This is not due to any limitations on the models, simply what has now become the default units for the models. The machine-learning algorithms themselves don't care what they are trained on, but they can only be trained on one set of units, so it is important to know what units they are. Additionally, times are often on the femtosecond scale. Perhaps not surprisingly, this is also the standard unit system within ASE. We will use pint to manage our units.
Exercise:
pint.UnitRegistry
for the unit conversions throughout the tutorial and name the variable ureg
.Hint:
# SOLUTION CELL
ureg = pint.UnitRegistry()
In this tutorial, we will use Simplified Molecular Input Line Entry System (SMILES) representations and RDKit to generate starting structures.
SMILES strings are text representations of molecules without the hydrogen atoms.
For example, the SMILES strings for methane, ethane, and propane are C
, CC
, and CCC
respectively.
Other elements are represented by their atomic symbol.
Ammonia for example would be N
.
SMILES can also represent cyclic structures, double bonds, and other chemically relevant properties, but we won't get into this today.
RDKit, the tool mentioned earlier, provides powerful utilities for creating 3D structures from these string representations of molecules.
In the first part of the tutorial, we will perform a geometry optimization for water. The chemical formula of water is H2O.
Exercise:
smiles
, we will use it to create a molecular structure.
Remember, you do not include hydrogens in a SMILES representation.# SOLUTION CELL
smiles = "O"
Using the SMILES string, we can create a 2D sketch of the molecule using rdkit:
m = rdkit.Chem.MolFromSmiles(smiles)
rdkit.Chem.Draw.MolToImage(rdkit.Chem.AddHs(m))
We can also convert it into an ase.Atoms
object. For that we can use the function rdkit2ase.smiles2atoms()
.
atoms = rdkit2ase.smiles2atoms(smiles)
atoms
We are almost there in terms of computing energies on this system; we only need a calculator now.
We can now call up the machine-learning model, MACE-MP-0, download its parameters in the process, and attach it to the ase.Atoms
object.
mace_mp_calc = mace_mp()
atoms.calc = mace_mp_calc
To compute energies and forces with this model one can use the member functions of the ase.Atoms
object called get_potential_energy()
and get_forces()
.
Both will be used later in the integration cycles.
Please note that the units depend on how the model was trained, in this case eV and eV per Angstrom.
Now that we know how to compute properties, it is time to plug it into ESPResSo and peform the geometry optimization. We will create a box of arbitrary size, define a simulation time step of 0.5 fs to resolve the movement of the hydrogens in the molecule, and run a geometry optimization.
This would allow the generation of a relaxed structure and it would give us a good starting structure for an MD simulation.
The first step is creating the simulation box and setting the time step. This is also where the first element of compelxity arises. ESPResSo uses simulation units, not real units, when performing simulations. This means you will need to convert the units you get out of ASE, into something that makes sense for ESPResSo. The good news is, you'll get plenty of practice doing so here.
Exercise:
System
class called system
with box length of 16
.0.4
.Hint:
m_as()
method.# SOLUTION CELL
system = espressomd.System(box_l=[16, 16, 16])
system.periodicity = [False, False, False]
system.cell_system.skin = 0.4
system.time_step = (0.5 * ureg.fs).m_as(((1 * ureg.u * ureg.angstrom**2) / ureg.electron_volt) ** 0.5)
The next step is adding atoms to the simulation box.
ESPResSo, as it deals mostly with fictitious particle types, does not require physically realistic particle numbers when it comes to types.
ASE, however, as it only deals with atoms, does.
Because of this, we have to define the mapping between the ESPResSo particle types and the ASE atom types.
This is done by passing a type_mapping dictionary to the ASEInterface
class.
An example for such a type map for a water system would be:
type_mapping = {1: 8, 2: 1}
e.g., particle type 1 in ESPResSo is mapped to Oxygen (atomic number 8) and particle type 2 is mapped to Hydrogen (atomic number 1). In this tutorial, for simplicity, we will use the same particle types as the atomic number in ASE. This means the mapping can simply be created by:
type_mapping = {x: x for x in set(atoms.numbers)}
for atom in atoms:
system.part.add(pos=atom.position, type=atom.number, mass=atom.mass)
system.ase = ASEInterface(type_mapping={x: x for x in set(atoms.numbers)})
We will use the ZnDraw visualizer integrated with ESPResSo to follow the simulation. To see the effect of the geometry optimization, we display the atomic forces. You can either follow the simulation inside this Notebook or open the app in a dedicated window to the side, following the printed URL.
vis = Visualizer(system)
vis.zndraw.config.scene.vectors = "forces"
vis.zndraw.config.scene.vector_scale = 5
vis.zndraw.config.scene.simulation_box = False
vis
We can use the steepest descent algorithm of ESPResSo for the energy minimization.
system.integrator.set_steepest_descent(f_max=(0.1 * ureg.eV / ureg.angstrom).magnitude,
gamma=4,
max_displacement=(0.001 * ureg.angstrom).magnitude)
In practice this means that we use the MACE Model to predict the forces on the atoms, which are then provided to ESPResSo as external forces and ESPResSo takes care of the atom displacements.
energies = []
for _ in tqdm.trange(100, ncols=120):
atoms = system.ase.get()
atoms.calc = mace_mp_calc
system.part.all().ext_force = atoms.get_forces()
energies.append(atoms.get_potential_energy())
vis.zndraw.append(atoms)
system.integrator.run(1)
Exercise:
mp-697111
structure is a periodic ice crystal belonging to the space group Cmc21.# SOLUTION CELL
# Due to the PBC and interactions the energy of the ice crystal is lower than the energy of the isolated water molecule.
# Going beyond the given fmax would be outside the achievable accuracy of the MLIP.
plt.figure(figsize=(10, 6))
plt.plot(energies)
plt.xlabel("Frame")
plt.ylabel("Energy (eV)")
plt.show()
We have seen how we can use the MACE Model to do geometry optimization of an organic molecule in ESPResSo. We will continue with another system to showcase the dynamics for a more advanced system. For that we will start from a cleared system and also remove the frames from the visualizer.
system.part.clear()
system.thermostat.turn_off()
del vis.zndraw[:]
A particularly novel application of machine-learned potentials is the description of chemical reactions. Molecular dynamics engines require special methods to model bond breaking and bond formation, since it entails overcoming energy barriers and dynamically updating the connectivity information of atoms. Methods that aid this process are plentiful but beyond the scope of this tutorial. Therefore, we will examine the proton transfer from sulfuric acid to water, which can be seen without biasing the simulation.
In this part of the tutorial, we generate two different molecular species - water and sulfuric acid - and use packmol to create a suitable starting structure. Packmol takes an example molecule structure and fills a simulation box with it, while taking periodic boundary conditions into account avoiding atom overlap. We will use the Python interface to packmol provided by rdkit2ase.
To see the proton transfer we need two molecules, water and sulfuric acid.
We again make use of the SMILES representations to generate the ase.Atoms
objects,
which are then used to fill the box with 100 water molecules and 1 sulfuric acid molecule.
water = rdkit2ase.smiles2atoms("O")
sulfuric_acid = rdkit2ase.smiles2atoms("OS(=O)(=O)O")
box_of_atoms = rdkit2ase.pack([[water], [sulfuric_acid]], [100, 1], density=1000)
We will repeat the geometry optimization process. Now that we have a bulk structure, we need to set the correct cell vectors, which we also get from the ase structure we just created. We will use a slightly smaller timestep in this study as it will make watching the bond formation later a lot easier.
system.box_l = box_of_atoms.get_cell().diagonal()
system.periodicity = [True, True, True]
system.time_step = (0.25 * ureg.fs).m_as(((1 * ureg.u * ureg.angstrom**2) / ureg.electron_volt) ** 0.5)
system.cell_system.skin = 0.4
system.ase = ASEInterface(type_mapping={x: x for x in set(box_of_atoms.numbers)})
for atom in box_of_atoms:
system.part.add(pos=atom.position, type=atom.number, mass=atom.mass)
system.integrator.set_steepest_descent(f_max=(0.1 * ureg.eV / ureg.angstrom).magnitude,
gamma=4,
max_displacement=(0.001 * ureg.angstrom).magnitude)
We again use ZnDraw and enable showing the box this time. The next code cells will update the window that was created in the water exercise.
vis.zndraw.config.scene.simulation_box = True
vis.zndraw.config.scene.vector_scale = 0
for _ in tqdm.trange(50, ncols=120):
atoms = system.ase.get()
atoms.calc = mace_mp_calc
system.part.all().ext_force = atoms.get_forces()
vis.zndraw.append(atoms)
system.integrator.run(1)
With this minimized structure, we can now run an MD simulation. Let us highlight the two hydrogen atoms from the sulfuric acid in the visualizer. These are the particles we want to follow and will participate in the proton transfer.
vis.zndraw.selection = [305, 306]
To run the MD we will switch to a Velocity Verlet integrator and set the thermostat to 400 K.
system.integrator.set_vv()
system.thermostat.set_langevin(kT=(400 * ureg.K * ureg.boltzmann_constant).m_as("eV"), gamma=2, seed=42)
Conceptually the process for the MD is the same with the geometry optimization. However, this time we only show every 5th frame to the visualizer.
tbar = tqdm.trange(500, ncols=120)
for idx in tbar:
atoms = system.ase.get()
atoms.calc = mace_mp_calc
system.part.all().ext_force = atoms.get_forces()
if idx % 5 == 0:
vis.zndraw.append(atoms)
system.integrator.run(1)
tbar.set_description(f"E_pot: {atoms.get_potential_energy():.3f} eV")
A good indicator to see the protonation events is the distance between the hydrogen and the oxygen atom in the hydroxyl group of the sulfuric acid. We calculate these distances for each frame in the visualization together with the potential energy.
OH_1 = []
OH_2 = []
energies = []
for atoms in vis.zndraw:
OH_1.append(atoms.get_distance(300, 305, mic=True))
OH_2.append(atoms.get_distance(304, 306, mic=True))
plt.figure(figsize=(10, 6))
plt.plot(OH_1, label="OH1")
plt.plot(OH_2, label="OH2")
plt.xlabel("Frame")
plt.ylabel("O$-$H bond distance (Å)")
plt.legend()
plt.show()
Exercise:
In this tutorial, you have created simulations from simple SMILES representations and were able to run geometry optimization and MD simulations with (almost) DFT accuracy. These approaches are not just theoretical, but highly practical. They can be used when a system requires a high degree of accuracy in its calculations, providing you with a reliable toolkit for your computational chemistry simulations.