Systematic Coarse-Graining: Boltzmann Inversion¶

Introduction¶

An important concept in statistical mechanics and molecular modeling is coarse-graining. Coarse-graining is a procedure in which complexity of a system is reduced, leading to an effective description in terms of a smaller number of variables. Due to the reduced number of degrees of freedom, such coarse-grained models can significantly reduce the time and computational resources required to simulate complex systems. Therefore, coarse-grained simulations can also be used to simulate systems currently not accessible to atomistic simulation models. Coarse-graining is a broadly applicable approach that encompasses many different methods, and consequently there exists a plethora of different ways to derive coarse-grained models. In this tutorial, you will learn how to apply a systematic coarse-graining procedure, the Boltzmann inversion method.

In the following, we consider a suspension of charged colloidal particles. Colloidal particles (or colloids for short) are particles of sizes between 1 nanometer and 1 micrometer, which can be suspended in aqueous solution. When colloidal particles carry an electrical charge, their interaction in solution is significantly modified by the large number of small salt ions, which dynamically rearrange in a charge cloud around the colloids that screens the bare Coulomb interactions. However, including those small salt ions explicitly in a computer simulation of a colloidal suspension comes at a huge computational overhead. In the spirit of coarse-grained modeling, in this tutorial we thus want to derive an effective pair potential between the colloids, which models the effect of the salt ions implicitly.

To derive an effective pair potential between the colloidal particles, we first need to carry out simulations of a colloidal suspension including explicit salt ions. In those simulations, we measure the radial distribution function (RDF) $g(r)$ between the colloids. For sufficiently dilute systems, where many-body interactions are negligible, the RDF can be directly related to an effective pair potential $V^\mathrm{eff}(r)$ between the colloids: $$g(r)\propto \exp\left(-\beta V^\mathrm{eff}(r)\right).$$ Here, we use a proportionality sign, because the effective pair potential is only determined up to an irrelevant additive constant, which does not contribute to the force. The idea of Boltzmann inversion is to invert this relation, i.e. extract the effective pair potential as $$V^\mathrm{eff}(r) = -\frac{1}{\beta} \ln\left(g(r)\right).$$ This effective interaction can then be used as a tabulated interaction in a simulation containing only colloids, providing an implicit description of the effects of salt.

Simulation with Explicit Salt¶

Simulation Setup¶

To measure the RDF and extract the effective pair potential, we now set up the simulation model of a colloidal suspension including explicit salt ions in ESPResSo. Let us begin by importing the required Python modules:

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import tqdm.auto as tqdm

import espressomd
from espressomd import electrostatics
import espressomd.observables

plt.rcParams.update({'font.size': 14})
required_features = ["LENNARD_JONES", "ELECTROSTATICS"]
espressomd.assert_features(required_features)
/usr/lib/python3/dist-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

In the simulations, there are three kinds of particles: small anions, small cations and large (positively charged) colloidal particles. All three particle types are represented as spherical particles that carry an electrical charge and additionally interact through a WCA potential. For the colloidal particles, the WCA potential is shifted, corresponding to a larger radius of the particles. To implement this model conveniently, we define dictionaries types and charges. We set the charge of the colloidal particles to +3, the charge of the small cations to +1 and the charge of the anions to -1. All charges are measured in units of the elementary charge.

In [2]:
# Particle types and charges
types = {
        "colloid": 0,
        "cation": 1,
        "anion": 2,
        }

charges = {
        "colloid": +3, 
        "cation": +1,
        "anion": -1,
        }

In the next step, we define our unit system. Since ESPResSo works with reduced units, we need to establish a mapping between the simulation units and the usual SI units. In the following, we take the unit of energy to be the thermal energy $k_{\mathrm{B}}T$ at room temperature and the length scale $\sigma = 3.55\cdot 10^{-10}\,\mathrm{m}$, which roughly corresponds to the size of a solvated ion in aqueous solution. Using these units, we can convert concentration between simulation units (number of particles / sigma**3) and SI units (mol/L) using the factor PREF. We set the Bjerrum length to a value of L_BJERRUM = 2.0, which describes electrostatic interactions in water.

In [3]:
# Units
KT = 1.0
SIGMA = 3.55e-10 # Sigma in SI units
AVO = 6.022e+23 # Avogadro's number
PREF = 1/(10**3 * AVO * SIGMA**3) # Prefactor to mol/L
L_BJERRUM = 2.0

Now, we define the colloid radius and number, as well the concentrations of colloids and salt. The chosen parameters are a good compromise between a colloid density that is still low enough such that colloidal three-body interactions are negligible and a density that is too low to lead to a sufficient sampling of the RDF in a short simulation time. Furthermore, the colloid charge and salt concentration are in regimes where the analytical Debye-Hückel theory is generally accurate, which means that the Debye-Hückel prediction serves as a useful benchmark, as we will see below.

In [4]:
RADIUS_COLLOID = 3.0
N_COLLOID = 50

c_colloid = 0.001
C_COLLOID = c_colloid / PREF # colloid concentration in number of particles / sigma**3

c_salt = 0.01
C_SALT = c_salt / PREF # salt concentration in number of particles / sigma**3

Using the chosen colloid number and concentrations, we can then calculate the box size as well as the number of salt ions and counterions. Note that the number of counterions corresponds to the number of salt ion pairs plus the number of ions required to neutralize the charge on the colloidal particles:

In [5]:
BOX_LENGTH = np.cbrt(N_COLLOID / C_COLLOID)
N_SALT = int(round(C_SALT * BOX_LENGTH ** 3))
N_COUNTERION = N_SALT + charges["colloid"]*N_COLLOID

Finally, we set some parameters for the Langevin thermostat and the numerical integrator.

In [6]:
# Langevin-Thermostat and integrator
GAMMA = 1.0
DT = 0.01
SKIN = 1.2

We are now ready to create an instance of the ESPResSo system class and add the colloidal particles as well as the salt ions

In [7]:
# Create an instance of the ESPResSo system class
system = espressomd.System(box_l=[BOX_LENGTH]*3)
system.time_step = DT
system.cell_system.skin = SKIN
system.setup_type_map(type_list=list(types.values()))
np.random.seed(42)

# Add the colloidal particles
system.part.add(type=[types["colloid"]]*N_COLLOID, pos=np.random.rand(N_COLLOID, 3) * BOX_LENGTH, q = [charges["colloid"]]*N_COLLOID)

# Add the salt particles
if N_SALT>0:
    system.part.add(type=[types["anion"]]*N_COUNTERION, pos=np.random.rand(N_COUNTERION, 3) * BOX_LENGTH, q = [charges["anion"]]*N_COUNTERION)
    system.part.add(type=[types["cation"]]*N_SALT, pos=np.random.rand(N_SALT, 3) * BOX_LENGTH, q = [charges["cation"]]*N_SALT)

In the next step, we add the electrostatics solver. While the chosen accuracy of the P3M solver is rather low, it suffices for the purpose of this tutorial.

In [8]:
p3m_params = {'mesh':[16,16,16],'cao':3}
p3m = electrostatics.P3M(prefactor=L_BJERRUM * KT, accuracy=1e-2, **p3m_params)
system.electrostatics.solver = p3m

At last, we can set up the WCA interactions between all particles. Due to the offset in the WCA interactions involving colloids, the simulation setup requires a bit of care to avoid divergences in the forces, because the particle positions are initalized randomly. Therefore, we initially assign all particles the same short-range WCA interactions, corresponding to a particle diameter of 1.0 sigma:

In [9]:
system.non_bonded_inter[types["anion"], types["anion"]].lennard_jones.set_params(epsilon=1.0, sigma=1.0, cutoff=(2)**(1.0/6.0), shift="auto")
system.non_bonded_inter[types["cation"], types["cation"]].lennard_jones.set_params(epsilon=1.0, sigma=1.0, cutoff=(2)**(1.0/6.0), shift="auto")
system.non_bonded_inter[types["anion"], types["cation"]].lennard_jones.set_params(epsilon=1.0, sigma=1.0, cutoff=(2)**(1.0/6.0), shift="auto")
system.non_bonded_inter[types["anion"], types["colloid"]].lennard_jones.set_params(epsilon=1.0, sigma=1.0, cutoff=(2)**(1.0/6.0), shift="auto")
system.non_bonded_inter[types["cation"], types["colloid"]].lennard_jones.set_params(epsilon=1.0, sigma=1.0, cutoff=(2)**(1.0/6.0), shift="auto")
system.non_bonded_inter[types["colloid"], types["colloid"]].lennard_jones.set_params(epsilon=1.0, sigma=1.0, cutoff=(2)**(1.0/6.0), shift="auto")

Warmup¶

Before we can start the actual simulation, we have to set the colloid radius to the desired size. To achieve this, we increase the particle diameter of the colloids in small steps and run a steepest descent integration after each increase to relax the system and avoid divergences. Once the colloidal particles have the desired size, we run a final steepest descent integration.

Exercise

  • Set up the steepest descent integrator with parameters f_max=0, gamma=1 and max_displacement=0.01.
  • Write loop that slowly increases the colloid radius in steps of delta_r = 0.1 until the final colloid size is reached. After each increase, run 100 integration steps to relax the system.

Hint

  • You can change the parameters of the LJ interaction by invoking lennard_jones.set_params again.
  • You need to change the offset for all LJ interactions involving the colloids. The final offset for colloid-colloid interactions should be 2*RADIUS_COLLOID and that for colloid-ion interactions RADIUS_COLLOID.
In [10]:
# SOLUTION CELL
# Determine the number of steps needed
delta_r = 0.1
steps_needed = int(RADIUS_COLLOID/delta_r)

# CSet steepest descent integrator
system.integrator.set_steepest_descent(f_max=0, gamma=1, max_displacement=0.01)
system.integrator.run(100)

# Gradually increase colloid radius
current_radius = 0
for i in tqdm.trange(steps_needed):
    current_radius += delta_r
    system.non_bonded_inter[types["anion"], types["colloid"]].lennard_jones.set_params(epsilon=1.0, sigma=1.0, cutoff=(2)**(1.0/6.0), shift="auto", offset=current_radius)
    system.non_bonded_inter[types["cation"], types["colloid"]].lennard_jones.set_params(epsilon=1.0, sigma=1.0, cutoff=(2)**(1.0/6.0), shift="auto", offset=current_radius)
    system.non_bonded_inter[types["colloid"], types["colloid"]].lennard_jones.set_params(epsilon=1.0, sigma=1.0, cutoff=(2)**(1.0/6.0), shift="auto", offset=2*current_radius)
    system.integrator.run(100)

system.non_bonded_inter[types["anion"], types["colloid"]].lennard_jones.set_params(epsilon=1.0, sigma=1.0, cutoff=(2)**(1.0/6.0), shift="auto", offset=RADIUS_COLLOID)
system.non_bonded_inter[types["cation"], types["colloid"]].lennard_jones.set_params(epsilon=1.0, sigma=1.0, cutoff=(2)**(1.0/6.0), shift="auto", offset=RADIUS_COLLOID)
system.non_bonded_inter[types["colloid"], types["colloid"]].lennard_jones.set_params(epsilon=1.0, sigma=1.0, cutoff=(2)**(1.0/6.0), shift="auto", offset=2*RADIUS_COLLOID)
system.integrator.run(100)
100%|██████████| 30/30 [00:03<00:00,  9.05it/s]
Out[10]:
100

Now, we turn on the Langevin thermostat and run a short warmup integration to equilibrate the system.

In [11]:
system.integrator.set_vv()
system.thermostat.set_langevin(seed=42, gamma=GAMMA, kT=KT)

print("Warming up the system...")
for i in tqdm.trange(10):
    system.integrator.run(1000)
print("Done.")
Warming up the system...
100%|██████████| 10/10 [00:06<00:00,  1.45it/s]
Done.

Production Run¶

Almost everything is ready now to start the production run. As the last prepration step, we need to set up an observable to measure the RDF. The chosen bin size for the RDF is rather large at 0.5 sigma, but allows for a good estimate of the effective interaction with a comparatively small number of samples. This choice is valid, because except for the short-ranged, strongly repulsive WCA part, the effective potential varies on a much longer length scale than the bin size.

In [12]:
# Parameters for the radial distribution function
BIN_WIDTH = 0.5
R_MIN = 0.0
R_MAX = system.box_l[0] / 2.0
N_BINS = int((R_MAX-R_MIN)/BIN_WIDTH)

Exercise

  • Instantiate a RDF observable
  • Instantiate a MeanVarianceCalculator accumulator to track the RDF over time. Samples should be taken every steps_per_subsample steps.
  • Add the accumulator to the auto_update_accumulators of the system for automatic updates
In [13]:
# SOLUTION CELL
rdf_obs = espressomd.observables.RDF(ids1=system.part.select(type=types["colloid"]).id, min_r=R_MIN, max_r=R_MAX, n_r_bins=N_BINS)
rdf_acc = espressomd.accumulators.MeanVarianceCalculator(obs=rdf_obs, delta_N=100)
system.auto_update_accumulators.add(rdf_acc)

Now, we are ready to perform our production run in order to measure the RDF:

In [14]:
print("Performing the production run...")
for i in tqdm.trange(2000):
    system.integrator.run(100)
print("Done.")
Performing the production run...
100%|██████████| 2000/2000 [02:24<00:00, 13.85it/s]
Done.

Data Analysis and Boltzmann Inversion¶

We can now analyze our simulation results. In the first step, we analyze the RDF obtained from the simulation.

Exercise

  • Get the mean RDF (define rdf) from the accumulator and the histogram bin centers (define rs) from the observable.
  • Plot the RDF. Interpret your result physically.
In [15]:
# SOLUTION CELL
rdf = rdf_acc.mean()
rs = rdf_obs.bin_centers()

plt.plot(rs, rdf)
plt.xlabel(r'distance $r$/$\sigma$')
plt.ylabel('colloid-colloid RDF $g(r)$')
plt.show()

Using the RDF, we can now calculate a numerical approximation of the effective pair potential between colloids.

Exercise

  • Use the Boltzmann inversion method, i.e. the equation $V^\mathrm{eff}(r) = -\frac{1}{\beta} \ln\left(g(r)\right)$ to calculate the effective pair potential using the RDF.
  • Create a plot of the effective pair potential and interpret it.
In [16]:
# SOLUTION CELL
start = np.nonzero(rdf)[0][0]  # skip null values in the rdf
rs = rs[start:]
veff = -np.log(rdf[start:])

plt.plot(rs, veff)
plt.xlabel(r"distance $r/\sigma$")
plt.ylabel(r"effective pair potential $V^{\mathrm{eff}}(r)/kT$")
plt.ylim((-0.3,2.0))
plt.show()

For charged colloids with small electrostatic surface potentials, the effective pair potential can be calculated analytically using the Debye-Hückel theory, which was first applied to colloids by Derjaguin, Landau, Verwey and Overbeek, resulting in the famous DLVO theory. The theory predicts that the effective potential between two colloids of charge $Z$ and radius $R$ is given by $$\beta V^{\mathrm{eff}}(r) = Z^2 \lambda_\text{B} \, \left(\frac{e^{\kappa R}}{1 + \kappa R}\right)^2 \, \frac{e^{-\kappa r}}{r}.$$ Here, $\kappa$ is the salt concentration-dependent Debye screening constant and $\lambda_\text{B}$ is the Bjerrum length. The inverse $\kappa^{-1}$ of the Debye screening constant is called the Debye length and characterizes the range of the effective interaction. For our system, the analytical result for the effective potential is a useful benchmark to test the simulation results against.

Exercise

  • Use the function veff_debye_hueckel defined below to plot a comparison of the Debye-Hückel potential and the effective pair potential obtained from the simulations.
In [17]:
def veff_debye_hueckel(r, q, c_salt, c_colloid, radius_colloid):
    """
    Calculate the effective potential between the colloids according to the Debye-Hückel theory.
    The calculation takes into account both the salt and the additional counterions.

    Args:
        r (float): The distance between the colloids in sigma.
        q (float): The charge of the colloids in elementary charge units.
        c_salt (float): The concentration of salt in the solution in M (mol/L).
        c_colloid (float): The concentration of colloids in the solution in M (mol/L).
        radius_colloid (float): The radius of the colloids in sigma.

    Returns:
        float: The effective potential between the colloids in kT.
    """
    bjerrum_length = 2.0
    lambda_D = (0.304 / (np.sqrt(c_salt) * np.sqrt(1 + q * c_colloid/ (2 * c_salt)))) / (0.355) # Calculate the Debye screening length
    kappa = 1/lambda_D # Calculate the Debye screening parameter

    if r > 2*(radius_colloid+0.5):
        return q**2 * bjerrum_length * (np.exp(kappa * radius_colloid) /(1 + kappa*radius_colloid))**2 * np.exp(-kappa * r) / r
    else:
        return np.nan
In [18]:
# SOLUTION CELL
plt.plot(rs, veff, label="Boltzmann Inversion")
plt.plot(rs, [veff_debye_hueckel(r, charges["colloid"], c_salt, c_colloid, RADIUS_COLLOID) for r in rs], label="Debye-Hückel")
plt.legend()
plt.xlabel(r"distance $r/\sigma$")
plt.ylabel(r"effective pair potential $V^{\mathrm{eff}}(r)/kT$")
plt.ylim((-0.3,2.0))
plt.show()

Simulation using the Developed Coarse-Grained Potential¶

Setup¶

Having obtained the effective potential between the colloidal particles, we can now perform a coarse-grained simulation using this potential. In a first step, we remove all particles from the system and turn off the electrostatics solver and the thermostat:

In [19]:
system.part.clear()
system.electrostatics.clear()
system.thermostat.turn_off()

Now, we add back the relevant particles to the system. In contrast to the simulations we carried out before, we only need to add the colloidal particles, because there are no explicit salt ions present in the simulations. The effect of salt is taken into account implicitly through the use of the effective potential that we have derived using Boltzmann inversion.

In [20]:
# Add the colloidal particles
system.part.add(type=[types["colloid"]]*N_COLLOID, pos=np.random.rand(N_COLLOID, 3) * BOX_LENGTH)
Out[20]:
<espressomd.particle_data.ParticleSlice at 0x79477bed0640>

In principle, we would now have to add only the effective potential. However, because we ran only very short, unconstrained MD simulations, the strongly repulsive part of the effective potential is not properly sampled. This can lead to instabilites that make the simulation crash. As a solution, we will cut off the tabulated interaction for small distances and still retain the WCA interaction to stabilize the system. Note that we set the initial particle diameter to 1.0 again in order to avoid diverging forces:

In [21]:
# Add excluded volume interactions
system.non_bonded_inter[types["colloid"], types["colloid"]].lennard_jones.set_params(epsilon=1.0, sigma=1.0, cutoff=(2)**(1.0/6.0), shift="auto")

The only part that is missing now is the effective potential we calculated in the previous section. To avoid double-counting the repulsive WCA potential we just added, we account for the effective potential only at distances larger than 2*RADIUS_COLLOID+2**(1/6), i.e. the cutoff of the WCA interaction. Furthermore, as plot we generated above shows, the interaction rapidly decays for large distances. Accordingly, we can also introduce an upper cutoff.

Exercise

  • Calculate the effective force from the effective potential using a numerical derivative.
  • Examine your plot of the effective potential and decide on an upper cutoff for the tabulated interaction. Why did you choose this value?
  • Use the effective potential and the effective force to set up a tabulated interaction in ESPResSo. The tabulated interaction should be defined only between a minimum distance of 2*RADIUS_COLLOID+2**(1/6) (the cutoff of the WCA interaction) and the upper cutoff you determined in the previous step.

Hint

  • Use the NumPy function np.gradient to numerically calculate the effective force from the effective potential.
  • Use NumPy array filtering to select the relevant values of the effective potential and effective force for distances that lie between the minimum distance and the upper cutoff.
  • Refer to the the documentation to learn how tabulated interactions can be set up in ESPResSo.
In [22]:
# SOLUTION CELL
# Calculate the effective force
effective_force = -np.gradient(veff, rs)

# Set the range of interaction
cutoff_tabulated = 25
idx = np.nonzero(np.logical_and(2 * RADIUS_COLLOID + 2**(1/6) < rs, rs < cutoff_tabulated))
veff = veff[idx]
effective_force = effective_force[idx]
rs = rs[idx]

# Add the tabulated interaction in ESPResSo
system.non_bonded_inter[types["colloid"], types["colloid"]].tabulated.set_params(min=rs[0], max=rs[-1], energy=veff, force=effective_force)

Like in the simulations with explicit salt, we now slowly increase the colloid radius to avoid divergences.

In [23]:
# Slowly increase the colloid radius to avoid numerical problems
print("Removing overlaps and slowly increasing colloid radius...")
delta_r = 0.1
steps_needed = int(RADIUS_COLLOID/delta_r)

system.integrator.set_steepest_descent(f_max=0, gamma=1, max_displacement=0.01)
system.integrator.run(100)

current_radius = 0
for i in tqdm.trange(steps_needed):
    current_radius += delta_r
    system.non_bonded_inter[types["colloid"], types["colloid"]].lennard_jones.set_params(epsilon=1.0, sigma=1.0, cutoff=2**(1.0/6.0), shift="auto", offset=2*current_radius)
    system.integrator.run(100)

system.non_bonded_inter[types["colloid"], types["colloid"]].lennard_jones.set_params(epsilon=1.0, sigma=1.0, cutoff=2**(1.0/6.0), shift="auto", offset=2*RADIUS_COLLOID)
system.integrator.run(100)
print("Done.")
Removing overlaps and slowly increasing colloid radius...
100%|██████████| 30/30 [00:00<00:00, 275.25it/s]
Done.

Lastly, we add the Langevin thermostat and run a short equilibration:

In [24]:
# Add thermostat
system.integrator.set_vv()
system.thermostat.set_langevin(seed=42, gamma=GAMMA, kT=KT)
print("Added thermostat.")

print("Warming up the system...")
for i in tqdm.trange(10):
    system.integrator.run(1000)
print("Done.")
Added thermostat.
Warming up the system...
100%|██████████| 10/10 [00:00<00:00, 43.31it/s]
Done.

Production Run and Analysis¶

We are now ready to perform our production run. Like in the simulation with explicit salt, we measure the RDF between colloids. Furthermore, the number of MD steps we perform is exactly the same.

In [25]:
rdf_obs = espressomd.observables.RDF(ids1=system.part.select(type=types["colloid"]).id, min_r=R_MIN, max_r=R_MAX, n_r_bins=N_BINS)
rdf_acc = espressomd.accumulators.MeanVarianceCalculator(obs=rdf_obs, delta_N=100)
system.auto_update_accumulators.add(rdf_acc)

# Perform the production run
print("Performing the production run...")
for i in tqdm.trange(2000):
    system.integrator.run(100)
print("Done.")
Performing the production run...
100%|██████████| 2000/2000 [00:04<00:00, 440.96it/s]
Done.

Exercise

  • Compare the required CPU time for the simulation with the coarse-grained potential to the simulation with explicit salt. How much faster is the simulation using the effective potential?

Exercise

  • Plot a comparison of the RDFs obtained from simulations with explicit and implicit salt.
In [26]:
# SOLUTION CELL
rdf_dh = rdf_acc.mean()
rs_dh = rdf_obs.bin_centers()

plt.plot(rs_dh, rdf, label='Explicit salt')
plt.plot(rs_dh, rdf_dh, label='Implicit salt')
plt.legend()
plt.xlabel(r'$r$/$\sigma$')
plt.ylabel('colloid-colloid RDF g(r)')
plt.show()