The constant-pH ensemble method for acid-base reactions

Expected prior knowledge

To work with this tutorial, you should be familiar with the following topics:

  • Setting up and running simulations in ESPResSo - creating particles, interactions, electrostatics
  • Basic knowledge of chemistry, chemical reaction equilibria and specifically acid-base equilibria
  • Thermodynamics, especially chemical potentials and activities
  • Reduced units, as described in the ESPResSo user guide

Introduction

This tutorial introduces the basic features for simulating titratable systems via the constant $\mathrm{pH}$ method. It is one of the methods implemented within the Reaction Ensemble module for simulating systems which involve chemical reaction equilibria. It is a Monte Carlo method designed to model an acid-base ionization reaction at a prescribed value of solution $\mathrm{pH}$.

To start with, consider a homogeneous aqueous solution of a titratable acidic species $\mathrm{HA}$ that can dissociate in a reaction $$\mathrm{HA} \Leftrightarrow \mathrm{A}^- + \mathrm{H}^+ ,$$ characterized by the acidity constant $K_{\mathrm{A}}$, conventionally reported as $\mathrm{p}K_{\mathrm{A}} = -\log_{10} K_{\mathrm{A}}$.

The ionization degree $\alpha$ is defined as the fraction of the ionized form $\mathrm{A}^-$ of the acid, relative to the total amount of acid, given by the sum of the ionized and non-ionized forms. By introducing $N_{\mathrm{A}} = N_{\mathrm{HA}} + N_{\mathrm{A}^-}$ as the total number of titratable groups in solution, we can express the degree of dissociation $\alpha$ as:

$$\alpha = \dfrac{ N_{\mathrm{A}^-} }{N_{\mathrm{HA}} + N_{\mathrm{A}^-}} = \dfrac{ N_{\mathrm{A}^-} }{ N_{\mathrm{A}} }.$$

The ionization degree is one of the key quantities that can be used to describe the acid-base equilibrium. Usually, the goal of the simulation is to predict the value of $\alpha$ under given conditions in a complex system with interactions.

The chemical equilibrium and reaction constant

The equilibrium reaction constant describes the chemical equilibrium of a given reaction. The values of equilibrium constants for various reactions can be found in tables. The equilibrium constant of dissociation of a weak acid is conventionally called the acidity constant, and it is defined as \begin{equation} K_{\mathrm{A}} = \frac{a_{\mathrm{H}^+} a_{\mathrm{A}^-} } {a_{\mathrm{HA}}} \end{equation} where $a_i$ is the activity of species $i$. The activity $a_i$ is related to the chemical potential $\mu_i$ and to the concentration $c_i$ \begin{equation} \mu_i = \mu_i^\mathrm{\ominus} + k_{\mathrm{B}}T \ln a_i \,,\qquad a_i = \frac{c_i \gamma_i}{c^{\ominus}}\,, \end{equation} where $\gamma_i$ is the activity coefficient, and $c^{\ominus}$ is the (arbitrary) reference concentration, often chosen to be the standard concentration, $c^{\ominus} = 1\,\mathrm{mol/L}$, and $\mu_i^\mathrm{\ominus}$ is the chemical potential in the reference state. Note that $K_{\mathrm{A}}$ is a dimensionless quantity. Its numerical value depends on the choice of $c^{\ominus}$ but the thermodynamics is independent of this choice. The activity coefficient, $\gamma_i$, accounts for the effect of interactions on the chemical potential. Therefore, by definition, $\gamma_i=1$ for an ideal system, whereas for an interacting system $\gamma_i$ is a non-trivial function of the interactions. For an ideal system we can rewrite $K_{\mathrm{A}}$ in terms of equilibrium concentrations of the reacting species \begin{equation} K_{\mathrm{A}} \overset{\mathrm{ideal}}{=} \frac{c_{\mathrm{H}^+} c_{\mathrm{A}^-} } {c_{\mathrm{HA}} c^{\ominus}} \end{equation}
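
As a quick numerical illustration of the ideal mass-action law (a side calculation only, not part of the simulation workflow below), we can solve it for a pure aqueous solution of the acid, assuming $c_{\mathrm{H}^+} = c_{\mathrm{A}^-}$ and neglecting the autoprotolysis of water. The acidity constant and acid concentration used here are the same values chosen later in the tutorial.

import numpy as np

pKa = 4.88     # acidity constant of propionic acid, also used later in the tutorial
c_acid = 1e-3  # total acid concentration in mol/L, with the reference concentration 1 mol/L

K_A = 10**(-pKa)
# ideal mass-action law: K_A = x**2 / (c_acid - x), where x = c_H+ = c_A- in mol/L
x = 0.5 * (-K_A + np.sqrt(K_A**2 + 4 * K_A * c_acid))
print(f"c_H+ = {x:.2e} mol/L, pH = {-np.log10(x):.2f}, alpha = {x / c_acid:.3f}")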

In the chemical literature, the ionization degree is usually expressed in terms of the concentrations \begin{equation} \alpha = \frac{N_{\mathrm{A}^-}}{N_{\mathrm{HA}} + N_{\mathrm{A}^-}} = \frac{c_{\mathrm{A}^-}}{c_{\mathrm{HA}}+c_{\mathrm{A}^-}} = \frac{c_{\mathrm{A}^-}}{c_{\mathrm{A}}} \end{equation} where $c_{\mathrm{A}}=c_{\mathrm{HA}}+c_{\mathrm{A}^-}$ is the total (analytical) concentration of titratable acid groups, irrespective of their ionization state. Then, we can characterize the acid-base ionization equilibrium using the ionization degree and $\mathrm{pH}$, defined as \begin{equation} \mathrm{pH} = -\log_{10} a_{\mathrm{H^{+}}} \overset{\mathrm{ideal}}{=} -\log_{10} (c_{\mathrm{H^{+}}} / c^{\ominus}) \end{equation} Note that the second equality is valid only in the ideal case. Substituting for the ionization degree and $\mathrm{pH}$ into the expression for $K_{\mathrm{A}}$ we obtain the Henderson-Hasselbalch equation \begin{equation} \mathrm{pH}-\mathrm{p}K_{\mathrm{A}} = \log_{10} \frac{\alpha}{1-\alpha} \end{equation} One implication of the Henderson-Hasselbalch equation is that at a fixed $\mathrm{pH}$ value the ionization degree of an ideal acid is independent of concentration. Another implication is that the degree of ionization depends only on the difference, $\mathrm{pH}-\mathrm{p}K_{\mathrm{A}}$ but not on the absolute values of $\mathrm{p}K_{\mathrm{A}}$ and $\mathrm{pH}$ alone. Therefore, for an ideal system, the ionization degree $\alpha$ can be obtained from the equation via the following function:

In [1]:
# ionization degree alpha calculated from the Henderson-Hasselbalch equation for an ideal system
def ideal_alpha(pH, pK):
    return 1. / (1 + 10**(pK - pH))
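
For example, this function predicts that one half of the acid groups are ionized when $\mathrm{pH} = \mathrm{p}K_{\mathrm{A}}$, and about 90% of them when the $\mathrm{pH}$ exceeds $\mathrm{p}K_{\mathrm{A}}$ by one unit:

print(ideal_alpha(pH=4.88, pK=4.88))  # 0.5
print(ideal_alpha(pH=5.88, pK=4.88))  # approximately 0.91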

The constant pH method

The constant $\mathrm{pH}$ method [Reed1992] has been designed to simulate an acid-base ionization reaction at a given $\mathrm{pH}$. It assumes that the $\mathrm{pH}$ in the system is constant, independent of the ionization degree. In a real solution, the ionization reaction generates an $\mathrm{H^+}$ ion that immediately reacts with the buffer added to the solution in order to keep the $\mathrm{pH}$ (approximately) constant. Effectively, the net reaction can be written as $$\mathrm{HA} \Leftrightarrow \mathrm{A}^- + \mathrm{B}^+ ,$$ where we use $\mathrm{B}^+$ to denote a generic neutralizing counterion that could be either the $\mathrm{H}^+$ ion, or another cation generated by the reaction of the $\mathrm{H}^+$ ion with the buffer. In the literature, this counterion is often denoted as $\mathrm{H^+}$; however, this may lead to confusion because then the actual number of $\mathrm{H^+}$ ions in the box does not correspond to their concentration implied by the fixed value of the $\mathrm{pH}$.

In ESPResSo, the forward step of the ionization reaction (from left to right) is implemented by changing the chemical identity (particle type) of a randomly selected $\mathrm{HA}$ particle to $\mathrm{A}^-$, and inserting another particle that represents the neutralizing counterion $\mathrm{B}^+$. In the reverse direction (from right to left), the chemical identity (particle type) of a randomly selected $\mathrm{A}^{-}$ is changed to $\mathrm{HA}$, and a randomly selected $\mathrm{B}^+$ is deleted from the simulation box. The probability of proposing the forward reaction step is $P_\mathrm{for}=N_\mathrm{HA}/N_\mathrm{A}$, and the probability of proposing the reverse step is $P_\mathrm{rev}=N_\mathrm{A^{-}}/N_\mathrm{A} = 1 - P_\mathrm{for}$. The trial move is accepted with the acceptance probability

$$ P_{\mathrm{acc}} = \min\left(1, \exp\big(-\beta \Delta E_\mathrm{pot} \pm \ln(10) \cdot (\mathrm{pH} - \mathrm{p}K_{\mathrm{A}})\big) \right)$$

Here $\Delta E_\mathrm{pot}$ is the potential energy change due to the reaction, while $\mathrm{pH} - \mathrm{p}K_\mathrm{A}$ is an input parameter. The sign is $+$ for the forward and $-$ for the reverse direction of the ionization reaction.
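
The following minimal sketch (not the actual ESPResSo implementation) shows how this acceptance criterion could be evaluated. The variables delta_E_pot and forward are hypothetical and used only for this illustration, with delta_E_pot given in units of $k_{\mathrm{B}}T$.

import numpy as np

def constant_pH_acceptance_probability(delta_E_pot, pH, pKa, forward):
    """Metropolis acceptance probability of a constant-pH trial move.

    delta_E_pot -- potential energy change of the trial move, in units of kT (i.e. beta = 1)
    forward     -- True for the ionization move HA -> A- + B+, False for the reverse move
    """
    sign = +1 if forward else -1
    return min(1.0, np.exp(-delta_E_pot + sign * np.log(10) * (pH - pKa)))

# in the ideal case (delta_E_pot = 0) a forward move at pH = pKa is always accepted
print(constant_pH_acceptance_probability(delta_E_pot=0.0, pH=4.88, pKa=4.88, forward=True))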

Simulation setup

First we import all necessary modules including ESPResSo for simulations and others for convenience.

In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
In [3]:
plt.rcParams.update({'font.size': 18})

import numpy as np
import setuptools
import pint  # module for working with units and dimensions
import time
assert setuptools.version.pkg_resources.packaging.specifiers.SpecifierSet('>=0.10.1').contains(pint.__version__), \
  f'pint version {pint.__version__} is too old: several numpy operations can cast away the unit'

import espressomd
espressomd.assert_features(['WCA', 'ELECTROSTATICS'])
import espressomd.electrostatics
import espressomd.reaction_methods
import espressomd.polymer
from espressomd.interactions import HarmonicBond

The package pint is intended to make handling of physical quantities with different units easy. You simply create an instance of pint.UnitRegistry and access its unit definitions and automatic conversions. For more information or a quick introduction please look at the pint-documentation or pint-tutorials.

In [4]:
ureg = pint.UnitRegistry()
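
As a small illustration of how pint works (the specific quantities below are arbitrary examples), a quantity created through the registry carries its unit and can be converted explicitly:

distance = 0.71 * ureg.nm  # a length with an attached unit
print(distance.to('angstrom'))  # automatic unit conversion
thermal_energy = (ureg.k_B * 298 * ureg.K).to('J')
print(thermal_energy)  # thermal energy at 298 K in joules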

The inputs that we need to define our system in the simulation include

  • temperature TEMPERATURE
  • relative permittivity of water WATER_PERMITTIVITY
  • Bjerrum length BJERRUM_LENGTH
  • concentration of the titratable units C_ACID
  • system size (given by the number of titratable units) N_ACID
  • concentration of added salt C_SALT
  • dissociation constant pK
  • pH
  • types of non-bonded interactions we want to use
  • particle types TYPES and charges CHARGES and their mapping to ESPResSo

Set the reduced units of energy and length

We start by setting the desired temperature $T$ using the variable TEMPERATURE, particle size $\sigma$ using the variable PARTICLE_SIZE and the relative permittivity $\varepsilon_{\mathrm{r}}$ of the aqueous medium at this temperature, using the variable WATER_PERMITTIVITY.

Another important length scale is the BJERRUM_LENGTH which has the physical meaning of the distance, at which the interaction energy between two elementary charges is equal to $k_{\mathrm{B}}T$: $$ l_{\mathrm{B}} = \frac{e^2}{4\pi \varepsilon_0 \varepsilon_{\mathrm{r}} k_{\mathrm{B}} T} $$

Next, we use the above values to define the conversion factors between the values of input parameters in the SI units and the reduced units of length and energy that ESPResSo requires as inputs.

In coarse-grained models of aqueous solutions, it is customary to set the size of ions to approximately $\sigma = 0.355\,\mathrm{nm}$, which corresponds to one half of the Bjerrum length in water at room temperature, $l_{\mathrm{B}} = 0.71\,\mathrm{nm}$. We choose the ion size, $\sigma = 0.355\,\mathrm{nm}$, as the unit of length, which we conveniently do by defining a new unit using the pint module. The Bjerrum length then attains a value of about 2.0 in the reduced units. The ratio of the Bjerrum length to other important length scales in the system determines the relative strength of the electrostatic interactions. Furthermore, we choose the thermal energy $k_{\mathrm{B}}T$ as the unit of energy, and the elementary charge $e$ as the unit of charge.

When calculating the Bjerrum length in the script, it is convenient to convert the computed value to $\mathrm{nm}$ using pint, because then it is easier to use in further calculations.

In [5]:
TEMPERATURE = 298 * ureg.K
WATER_PERMITTIVITY = 78.5 # at 298 K
KT = TEMPERATURE * ureg.k_B
PARTICLE_SIZE = 0.355 * ureg.nm
BJERRUM_LENGTH = (ureg.e**2 / (4 * ureg.pi * ureg.eps0 * WATER_PERMITTIVITY * KT)).to('nm')

ureg.define(f'reduced_energy = {TEMPERATURE} * boltzmann_constant')
ureg.define(f'reduced_length = {PARTICLE_SIZE}')
ureg.define(f'reduced_charge = 1*e')

# store the values of some parameters in dimensionless reduced units, that will be later passed to ESPResSo
KT_REDUCED = KT.to('reduced_energy').magnitude
BJERRUM_LENGTH_REDUCED = BJERRUM_LENGTH.to('reduced_length').magnitude
PARTICLE_SIZE_REDUCED = PARTICLE_SIZE.to('reduced_length').magnitude

print("KT = {KT.to('reduced_energy'):4.3f}")
print("PARTICLE_SIZE = {PARTICLE_SIZE.to('reduced_length'):4.3f}")
print("BJERRUM_LENGTH = {BJERRUM_LENGTH.to('reduced_length'):4.3f}")
KT = {KT.to('reduced_energy'):4.3f}
PARTICLE_SIZE = {PARTICLE_SIZE.to('reduced_length'):4.3f}
BJERRUM_LENGTH = {BJERRUM_LENGTH.to('reduced_length'):4.3f}

Set the key physical parameters that uniquely define the system

We want to simulate a solution of polyelectrolyte chains composed of weak acidic groups. The chains are characterized by the number of acidic groups, their $\mathrm{p}K_{\mathrm{A}}$ and their concentration in the solution. The polyelectrolytes are dissolved in an aqueous solution of NaCl, further characterized by the salt concentration and by the $\mathrm{pH}$ value.

To specify the system properties, we need to define the total number of titratable units N_ACID, their concentration C_ACID and the salt concentration C_SALT. We set the dissociation constant of the acid to $\mathrm{p}K_\mathrm{A}=4.88$, which is the acidity constant of propionic acid. The structure of propionic acid resembles the repeating unit of poly(acrylic acid), the most commonly used weak polyacid.

In [6]:
# Parameters that define properties of the system
N_ACID = 20
C_ACID = 1e-3 * ureg.molar
C_SALT = 2 * C_ACID
pKa = 4.88 # acidity constant
pKw = 14.0  # autoprotolysis constant of water

Set the range of parameters that we want to vary

We will simulate multiple $\mathrm{pH}$ values, where the range is determined by the parameters OFFSET and NUM_PHS. The titration curve of the ideal system is symmetric around $\mathrm{pH} = \mathrm{p}K_{\mathrm{A}}$ but the titration curve of the non-ideal system is shifted to higher $\mathrm{pH}$ values. Therefore, we apply the offset asymmetrically.

In [7]:
# Number of pH values that we want to sample
NUM_PHS = 15  # number of pH values
OFFSET = 2.0  # range of pH values to be used = pKa +/- offset

pHmin = pKa - 0.5 * OFFSET  # lowest pH value to be used
pHmax = pKa + 1.5 * OFFSET  # highest pH value to be used
pHs = np.linspace(pHmin, pHmax, NUM_PHS)  # list of pH values

Choose which interactions should be activated

Here we decide what kind of non-bonded interactions we want to use. Setting USE_WCA to True makes the script below create WCA interactions between all particle pairs. Setting USE_ELECTROSTATICS to True turns on electrostatic interactions. Electrostatic interactions between oppositely charged particles always have to be combined with a short-range repulsion, commonly a WCA interaction, to prevent the charges from collapsing onto each other.

To be able to compare our results to the analytical solutions for ideal systems and to obtain results very quickly, we begin with all non-bonded interactions turned off. In the next runs, we will add the steric repulsion and electrostatic interactions to observe their effect on the ionization.

Because this is an interactive tutorial, we want it to complete within a few minutes. Therefore, we choose the Debye-Hückel potential for electrostatics by default. It produces results similar to the full electrostatics calculated by the P3M method, but we gain about a factor of 5 in simulation runtime. In this way, we sacrifice accuracy for speed; such a compromise is definitely not recommended for production simulations.

In [8]:
# Simulate an interacting system with steric repulsion (Warning: it will be slower than without WCA!)
USE_WCA = True
# Simulate an interacting system with electrostatics 
USE_ELECTROSTATICS = False
# By default, we use the Debye-Huckel potential because it allows short runtime of the tutorial
# You can also use the P3M method for full electrostatics (Warning: it will be much slower than DH!)
USE_P3M = False

if USE_ELECTROSTATICS:
    assert USE_WCA, "Use electrostatics only with a short-range repulsive potential. Otherwise, a singularity occurs at zero distance."

Set the number of samples to be collected

For error analysis we specify the number of blocks N_BLOCKS and the desired number of samples per block DESIRED_BLOCK_SIZE. From that we compute the total number of samples that we want to collect, NUM_SAMPLES. Note that the samples collected in our simulations are not truly statistically independent, so the actual number of independent samples is always lower than NUM_SAMPLES; it depends on the properties of the simulated system and needs to be estimated at the end of the simulation. We keep the number of samples rather low, so that the tutorial completes quickly. If you want to obtain data of better statistical quality, we encourage you to increase the number of samples.

In [9]:
N_BLOCKS = 16  # number of blocks to be used in data analysis
DESIRED_BLOCK_SIZE = 10  # desired number of samples per block

PROB_INTEGRATION = 0.6 # Probability of running MD integration after the reaction move. 
# This parameter changes the speed of convergence but not the limiting result
# to which the simulation converges

# number of reaction samples per each pH value
NUM_SAMPLES = int(N_BLOCKS * DESIRED_BLOCK_SIZE)

Calculate the dependent parameters

After setting all system parameters that can be chosen independently, we calculate additional parameters that are uniquely defined by the choices above and will be used as direct inputs for ESPResSo.

From both the specified concentration and number of titratable units we can calculate the box volume BOX_V. With our choice of a cubic simulation box we can subsequently determine the box length BOX_L. Once the box volume is fixed, the chosen salt concentration and the box volume determine the number of additional salt ion pairs N_SALT that should be present in the system.

We also calculate the Debye length using the semi-empirical formula from equation 14.36 in chapter 14 of [Israelachvili2011]: $\lambda_{\mathrm{D}} = \kappa^{-1} = 0.304\,\mathrm{nm} \big/ \sqrt{I/(\mathrm{mol/L})}$

In [10]:
BOX_V = (N_ACID / (ureg.avogadro_constant * C_ACID)).to("nm^3")
BOX_L = np.cbrt(BOX_V)
BOX_L_REDUCED = BOX_L.to('reduced_length').magnitude

KAPPA = np.sqrt(C_SALT.to('mol/L').magnitude) / 0.304 / ureg.nm
KAPPA_REDUCED = KAPPA.to('1/reduced_length').magnitude
print(f"KAPPA = {KAPPA:.3f}")
print(f"KAPPA_REDUCED = {KAPPA_REDUCED:.3f}")
print(f"Debye_length: {1. / KAPPA:.2f} = {(1. / KAPPA).to('reduced_length'):.2f}")

N_SALT = int((C_SALT * BOX_V * ureg.avogadro_constant))
print("Calculated values:")
print(f"BOX_L = {BOX_L:.3g} = {BOX_L.to('reduced_length'):.3g}")
print(f"BOX_V  = {BOX_V:.3g} = {BOX_V.to('reduced_length^3'):.3g}")
print(f"N_SALT = {N_SALT}")
KAPPA = 0.147 / nanometer
KAPPA_REDUCED = 0.052
Debye_length: 6.80 nanometer = 19.15 reduced_length
Calculated values:
BOX_L = 32.1 nanometer = 90.5 reduced_length
BOX_V  = 3.32e+04 nanometer ** 3 = 7.42e+05 reduced_length ** 3
N_SALT = 40

Set the particle types and charges

Finally we have to set the particle types we want to simulate and their mapping to ESPResSo-particle-types, as well as particle charges.

In [11]:
# particle types of different species
TYPES = {
    "HA": 0,
    "A": 1,
    "B": 2,
    "Na": 3,
    "Cl": 4,
}
# particle charges of different species
CHARGES = {
    "HA": (0 * ureg.e).to("reduced_charge").magnitude,
    "A": (-1 * ureg.e).to("reduced_charge").magnitude,
    "B": (+1 * ureg.e).to("reduced_charge").magnitude,
    "Na": (+1 * ureg.e).to("reduced_charge").magnitude,
    "Cl": (-1 * ureg.e).to("reduced_charge").magnitude,
}

Initialize the ESPResSo system

When initializing the ESPResSo system, some required parameters must be set.

  • box_l defines the edge length of the cubic simulation box
  • time_step defines the integration step of the MD integration.
  • skin defines the properties of the cell system that do not affect the results but may affect the speed of the calculation, see user guide for more details.
  • in addition to the system parameters, we seed the numpy random number generator, which we use below to generate random initial particle positions
In [12]:
system = espressomd.System(box_l=[BOX_L_REDUCED] * 3)
system.time_step = 0.01
system.cell_system.skin = 2.0
np.random.seed(seed=10)  # initialize the random number generator in numpy

Set up particles and bonded-interactions

After defining the simulation parameters, we set up the system that we want to simulate. It is a polyelectrolyte chain with some added salt that is used to control the ionic strength of the solution.

First we define the parameters of the bonds connecting the polymer beads and add the bonded interaction type to the system. Then we create the particles. Positions of the bonded particles in a linear polymer can be generated via espressomd.polymer.linear_polymer_positions(); for more details see the corresponding section in the documentation. Finally we add the $\mathrm{B}^+$ ions to the system, followed by the salt ion pairs.

In [13]:
# we need to define bonds before creating polymers
hb = HarmonicBond(k=30, r_0=1.0)
system.bonded_inter.add(hb)

# create the polymer positions
polymers = espressomd.polymer.linear_polymer_positions(n_polymers=1,
                                                       beads_per_chain=N_ACID,
                                                       bond_length=0.9, 
                                                       seed=23)

# add the polymer particles composed of ionizable acid groups, initially in the ionized state
for polymer in polymers:
    prev_particle = None
    for position in polymer:
        p = system.part.add(pos=position, type=TYPES["A"], q=CHARGES["A"])
        if prev_particle:
            p.add_bond((hb, prev_particle))
        prev_particle = p

# add the corresponding number of neutralizing B+ ions
system.part.add(pos=np.random.random((N_ACID, 3)) * BOX_L_REDUCED,
                type=[TYPES["B"]] * N_ACID,
                q=[CHARGES["B"]] * N_ACID)

# add salt ion pairs
system.part.add(pos=np.random.random((N_SALT, 3)) * BOX_L_REDUCED,
                type=[TYPES["Na"]] * N_SALT,
                q=[CHARGES["Na"]] * N_SALT)
system.part.add(pos=np.random.random((N_SALT, 3)) * BOX_L_REDUCED,
                type=[TYPES["Cl"]] * N_SALT,
                q=[CHARGES["Cl"]] * N_SALT)

print(f"The system contains {len(system.part)} particles")
The system contains 120 particles

Set up non-bonded-interactions

If the WCA interaction has been enabled via the USE_WCA flag, we activate it for each possible pair of particle types in the system. Afterwards we relax the initial overlaps using the steepest-descent integrator. Then we add the Langevin thermostat to the system and let the system relax further by running 1000 integration steps of the Langevin dynamics.

After the initial relaxation, we set up the electrostatic interactions between the particles if they have been enabled via the USE_ELECTROSTATICS flag. For electrostatics we can use either the Debye-Hückel (DH) algorithm or the P3M algorithm. The DH algorithm is based on the Debye-Hückel approximation, whose assumptions are not satisfied in our simulated system. However, it runs much faster than the P3M algorithm, and the approximate result closely resembles the correct one. Nevertheless, the DH algorithm should not be used in production simulations; by using it in this tutorial, we sacrifice accuracy for speed. To obtain an accurate result, we can use the P3M algorithm with an accuracy of $10^{-3}$ as an acceptable tradeoff between accuracy and performance. For production runs it might be necessary to use a smaller value of the accuracy parameter, depending on the simulated system.

By default, ESPResSo uses the regular decomposition cell system to speed up the calculation of short-range interactions. However, for a system with a small number of particles and without electrostatics, it runs faster with the n_square cell system. See the user guide for additional details on the cell systems.

In [14]:
if USE_WCA:
    # set the WCA interaction between all particle pairs
    for type_1 in TYPES.values():
        for type_2 in TYPES.values():
            if type_1 >= type_2:
                system.non_bonded_inter[type_1, type_2].wca.set_params(epsilon=1.0, sigma=1.0)

    # relax the overlaps with steepest descent
    system.integrator.set_steepest_descent(f_max=0, gamma=0.1, max_displacement=0.1)
    system.integrator.run(20)
    system.integrator.set_vv()  # to switch back to velocity Verlet

# add thermostat and short integration to let the system relax further
system.thermostat.set_langevin(kT=KT_REDUCED, gamma=1.0, seed=7)
system.integrator.run(steps=1000)

if USE_ELECTROSTATICS:
    COULOMB_PREFACTOR = BJERRUM_LENGTH_REDUCED * KT_REDUCED
    if USE_P3M:
        coulomb = espressomd.electrostatics.P3M(prefactor=COULOMB_PREFACTOR,
                                                accuracy=1e-3)
    else:
        coulomb = espressomd.electrostatics.DH(prefactor=COULOMB_PREFACTOR,
                                               kappa=KAPPA_REDUCED,
                                               r_cut=1. / KAPPA_REDUCED)
        
    system.actors.add(coulomb)
else:
    # this speeds up the simulation of dilute systems with small particle numbers
    system.cell_system.set_n_square()

Set up the constant pH ensemble using the reaction ensemble module

After the particles have been added to the system, we initialize the constant-$\mathrm{pH}$ ensemble from the espressomd.reaction_methods module. The parameters to set are:

  • kT specifies the thermal energy $k_\mathrm{B}T$, which enters the Boltzmann factor used to calculate the acceptance probabilities of the trial moves.
  • exclusion_range specifies the distance around existing particles within which new particles will not be inserted. This ensures that newly inserted particles do not overlap strongly with existing ones, so that the MD integration does not crash. The value of the exclusion range does not affect the limiting result; however, it affects the convergence and the stability of the integration. For interacting systems, the exclusion range should be comparable to the particle size. For non-interacting systems, we can set the exclusion range to $0.0$.

  • seed for the random number generator (RNG) is an arbitrary number that defines the starting point in the RNG sequence. Using the same seed should yield reproducible simulation results if the same RNG implementation is used. When running multiple copies of the same simulation, each of them should use a different seed.

Exercise:

  • Create an instance of espressomd.reaction_methods.ConstantpHEnsemble, store it in the variable RE, and set its parameters as described above.

In [15]:
exclusion_range = PARTICLE_SIZE_REDUCED if USE_WCA else 0.0
RE = espressomd.reaction_methods.ConstantpHEnsemble(
    kT=KT_REDUCED,
    exclusion_range=exclusion_range,
    seed=77,
    constant_pH=2  # temporary value
)
RE.set_non_interacting_type(type=len(TYPES)) # this parameter helps speed up the calculation in an interacting system

The next step is to define the chemical reaction. The order of species in the lists of reactants and products is very important for ESPResSo because it determines which particles are created or deleted in the reaction move. Specifically, the identity of the first species in the list of reactants is changed to the first species in the list of products, the second reactant species is changed to the second product species, and so on. If the reactant list has more species than the product list, then the excess reactant species are deleted from the system. If the product list has more species than the reactant list, then the excess product species are created and randomly placed inside the simulation box. This convention is especially important if some of the species belong to a chain-like molecule, so that they cannot be inserted at an arbitrary position.

Exercise:

  • Use RE.add_reaction() to define the dissociation reaction $\mathrm{HA} \Leftrightarrow \mathrm{A}^- + \mathrm{B}^+$ with the equilibrium constant gamma=10**(-pKa).

Hint: Make sure to place TYPES["HA"] and TYPES["A"] as the first elements in the reactant_types and product_types lists, respectively.

In [16]:
RE.add_reaction(
    gamma=10**(-pKa),
    reactant_types=[TYPES["HA"]],
    product_types=[TYPES["A"], TYPES["B"]],
    default_charges={TYPES["HA"]: CHARGES["HA"],
                     TYPES["A"]: CHARGES["A"],
                     TYPES["B"]: CHARGES["B"]}
)

In the example above, the order of reactants and products ensures that identity of $\mathrm{HA}$ is changed to $\mathrm{A^{-}}$ and vice versa, while $\mathrm{B^{+}}$ is inserted/deleted in the reaction move.

If we had reversed the order of products in our reaction (i.e. changed product_types=[TYPES["A"], TYPES["B"]] to product_types=[TYPES["B"], TYPES["A"]]), then the identity of $\mathrm{HA}$ would be changed to $\mathrm{B^{+}}$, while $\mathrm{A^{-}}$ would be inserted/deleted at a random position in the box. Therefore $\mathrm{B^{+}}$ would become part of the polymer chain whereas $\mathrm{A^{-}}$ would become a free ion.
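
Purely for illustration (do not execute this in addition to the reaction defined above), the problematic ordering discussed in the previous paragraph would look like this:

# WRONG ordering for our system: HA on the chain would be converted to the free-ion type B+,
# and A- would be inserted at a random position instead of remaining part of the chain
RE.add_reaction(
    gamma=10**(-pKa),
    reactant_types=[TYPES["HA"]],
    product_types=[TYPES["B"], TYPES["A"]],  # products in the reversed order
    default_charges={TYPES["HA"]: CHARGES["HA"],
                     TYPES["A"]: CHARGES["A"],
                     TYPES["B"]: CHARGES["B"]}
)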

Run the simulations

Finally, we can perform simulations at different $\mathrm{pH}$ values. To do so in a single loop, we need to re-equilibrate the simulated system each time we change the $\mathrm{pH}$, before collecting the samples.

Exercise:

  • Write a function called equilibrate_reaction() that equilibrates the ionization state of the system at the given $\mathrm{pH}$ value by performing reaction moves via RE.reaction().

Hint: To ensure sufficient equilibration of the reaction, one should use a number of steps, reaction_steps, that is greater than the number of titratable acid groups in the system, N_ACID.

In [17]:
def equilibrate_reaction(reaction_steps=1):
    RE.reaction(reaction_steps=reaction_steps)

After the system has been equilibrated, the integration/sampling loop follows.

Exercise:

  • Write a function called perform_sampling() that implements the sampling loop

  • The function should take the following parameters:

    • an integer type_A that contains the particle type that should be followed when sampling the composition
    • an integer num_samples that determines the number of sampling steps to be performed
    • a numpy array num_As, such that len(num_As) == num_samples which will be used to store the time series of the instantaneous values of $N_{\mathrm{A^{-}}}$
    • an integer reaction_steps that determines the number of reaction moves to be performed in each sampling step
  • The function should perform the following tasks:
    • sample the composition fluctuations using reaction moves of the constant $\mathrm{pH}$ algorithm by calling RE.reaction()
    • sample the configuration space using the Langevin dynamics (LD) integration
    • perform the reaction moves at each sampling step
    • perform the LD integration in each sampling step with the probability PROB_INTEGRATION, and only if the interactions have been activated
    • after each sampling step, save the current number of particles of type $\mathrm{A^-}$ in the array num_As for later analysis

Hint:

  • the value of reaction_steps in each sampling step should be comparable to the number of titratable groups (N_ACID) in the system
  • the number of LD integration steps should be at least 1000
  • the number of particles of a certain type can be obtained via the function espressomd.system.System.number_of_particles()
In [18]:
def perform_sampling(type_A, num_samples, num_As:np.ndarray, reaction_steps, 
                     prob_integration=0.5, integration_steps=1000):
    for i in range(num_samples):
        if USE_WCA and np.random.random() < prob_integration:
            system.integrator.run(integration_steps)
        # we should do at least one reaction attempt per reactive particle
        RE.reaction(reaction_steps=reaction_steps)        
        num_As[i] = system.number_of_particles(type=type_A)

Finally, we have everything together to run our simulations. We set the $\mathrm{pH}$ value in RE.constant_pH and use our equilibrate_reaction() function to equilibrate the system. After that, the sampling is performed with our perform_sampling() function.

In [19]:
# empty numpy array as a placeholder for collecting the data
num_As_at_each_pH = -np.ones((len(pHs), NUM_SAMPLES))  # number of A- species observed at each sample

# run a productive simulation and collect the data
print(f"Simulated pH values: {pHs}")
for ipH, pH in enumerate(pHs):
    print(f"Run pH {pH:.2f} ...")
    RE.constant_pH = pH  # set new pH value
    start_time = time.time()
    equilibrate_reaction(reaction_steps=N_ACID + 1)  # pre-equilibrate the reaction to the new pH value
    perform_sampling(type_A=TYPES["A"],
                     num_samples=NUM_SAMPLES, 
                     num_As=num_As_at_each_pH[ipH, :],
                     reaction_steps=N_ACID + 1,
                     prob_integration=PROB_INTEGRATION)  # perform sampling / run production simulation
    runtime = (time.time() - start_time) * ureg.s
    print(f"runtime: {runtime:.2g};",
          f"measured number of A-: {np.mean(num_As_at_each_pH[ipH]):.2f}", 
          f"(ideal: {N_ACID * ideal_alpha(pH, pKa):.2f})", 
          )
print("\nDONE")
Simulated pH values: [3.88       4.16571429 4.45142857 4.73714286 5.02285714 5.30857143
 5.59428571 5.88       6.16571429 6.45142857 6.73714286 7.02285714
 7.30857143 7.59428571 7.88      ]
Run pH 3.88 ...
runtime: 3 second; measured number of A-: 1.61 (ideal: 1.82)
Run pH 4.17 ...
runtime: 3.3 second; measured number of A-: 3.26 (ideal: 3.24)
Run pH 4.45 ...
runtime: 3.3 second; measured number of A-: 5.21 (ideal: 5.43)
Run pH 4.74 ...
runtime: 3.3 second; measured number of A-: 8.35 (ideal: 8.37)
Run pH 5.02 ...
runtime: 3.9 second; measured number of A-: 11.65 (ideal: 11.63)
Run pH 5.31 ...
runtime: 3.8 second; measured number of A-: 14.67 (ideal: 14.57)
Run pH 5.59 ...
runtime: 3.9 second; measured number of A-: 16.84 (ideal: 16.76)
Run pH 5.88 ...
runtime: 3.9 second; measured number of A-: 18.14 (ideal: 18.18)
Run pH 6.17 ...
runtime: 3.6 second; measured number of A-: 18.94 (ideal: 19.02)
Run pH 6.45 ...
runtime: 3.3 second; measured number of A-: 19.42 (ideal: 19.48)
Run pH 6.74 ...
runtime: 3.2 second; measured number of A-: 19.78 (ideal: 19.73)
Run pH 7.02 ...
runtime: 3.2 second; measured number of A-: 19.86 (ideal: 19.86)
Run pH 7.31 ...
runtime: 3.1 second; measured number of A-: 19.93 (ideal: 19.93)
Run pH 7.59 ...
runtime: 2.8 second; measured number of A-: 19.96 (ideal: 19.96)
Run pH 7.88 ...
runtime: 2.8 second; measured number of A-: 19.98 (ideal: 19.98)

DONE

Results

Now we plot our results and compare them to the analytical results obtained from the Henderson-Hasselbalch equation. Since our simulation is a stochastic process, its outputs are random numbers that may change with each run. However, in the limit of infinite simulation time, they converge to one specific value, which is the ensemble average. For a meaningful comparison of simulation results with the theory, it is necessary to supply estimates of the ensemble averages as well as estimates of their statistical error. Differences between the simulation and theory can be considered significant only if they are greater than the estimated statistical error.

Statistical analysis of the data

The molecular simulation produces a time series of the observables that constitutes a Markov chain: a sequence of realizations of a random process, where the next value in the sequence depends on the preceding one. Therefore, subsequent values are correlated. To estimate the statistical error of the averages determined in the simulation, one needs to correct for these correlations.

Here, we will use a rudimentary way of correcting for correlations, termed the binning method. We refer the reader to specialized literature for a more sophisticated discussion, for example [Janke2002]. The general idea is to group a long sequence of correlated values into a rather small number of blocks, and compute an average for each block. If the blocks are big enough, they can be considered uncorrelated, and one can apply the formula for the standard error of the mean of uncorrelated values. If the number of blocks is too small, they are uncorrelated but the resulting error estimate has a high uncertainty. If the number of blocks is too high, the blocks are too short to be uncorrelated, and the obtained error estimates are systematically lower than the correct value. Therefore, the method works well only if the sample size is much greater than the autocorrelation time, so that it can be divided into a sufficient number of mutually uncorrelated blocks.

In [20]:
# statistical analysis of the results
def block_analyze(input_data, n_blocks=16):
    data = np.asarray(input_data)
    # this number of blocks is recommended by Janke as a reasonable compromise
    # between the conflicting requirements on block size and number of blocks
    block_size = int(data.shape[1] / n_blocks)
    print(f"block_size: {block_size}")
    # initialize the array of per-block averages
    block_average = np.zeros((n_blocks, data.shape[0]))
    # calculate averages per each block
    for block in range(n_blocks):
        block_average[block] = np.average(data[:, block * block_size: (block + 1) * block_size], axis=1)
    # calculate the average and average of the square
    av_data = np.average(data, axis=1)
    av2_data = np.average(data * data, axis=1)
    # calculate the variance of the block averages
    block_var = np.var(block_average, axis=0)
    # calculate standard error of the mean
    err_data = np.sqrt(block_var / (n_blocks - 1))
    # estimate autocorrelation time using the formula given by Janke
    # this assumes that the errors have been correctly estimated
    tau_data = np.zeros(av_data.shape)
    for val in range(av_data.shape[0]):
        if av_data[val] == 0:
            # unphysical value marks a failure to compute tau
            tau_data[val] = -1.0
        else:
            tau_data[val] = 0.5 * block_size * n_blocks / (n_blocks - 1) * block_var[val] \
                / (av2_data[val] - av_data[val] * av_data[val])
    return av_data, err_data, tau_data, block_size

Now, we use the above function to calculate the average number of particles of type $\mathrm{A^-}$ and estimate its statistical error and autocorrelation time. Then we use these values to calculate the degree of ionization $\alpha$ by dividing the number of particles of type $\mathrm{A^-}$ by the number of titratable units N_ACID. Finally, we plot the degree of ionization $\alpha$ as a function of the $\mathrm{pH}$.

In [21]:
# estimate the statistical error and the autocorrelation time using the formula given by Janke
av_num_As, err_num_As, tau, block_size = block_analyze(num_As_at_each_pH, N_BLOCKS)
print(f"av = {av_num_As}")
print(f"err = {err_num_As}")
print(f"tau = {tau}")

# calculate the average ionization degree
av_alpha = av_num_As / N_ACID
err_alpha = err_num_As / N_ACID

# plot the simulation results compared with the ideal titration curve
plt.figure(figsize=(10, 6), dpi=80)
plt.errorbar(pHs - pKa, av_alpha, err_alpha, marker='o', linestyle='none',
             label=r"simulation")
pHs2 = np.linspace(pHmin, pHmax, num=50)
plt.plot(pHs2 - pKa, ideal_alpha(pHs2, pKa), label=r"Henderson-Hasselbalch")
plt.xlabel(r'$\mathrm{pH} - \mathrm{p}K_{\mathrm{A}}$', fontsize=16)
plt.ylabel(r'$\alpha$', fontsize=16)
plt.legend(fontsize=16)
plt.show()
block_size: 10
av = [ 1.6125   3.2625   5.20625  8.35    11.65    14.66875 16.8375  18.1375
 18.9375  19.41875 19.78125 19.8625  19.925   19.9625  19.975  ]
err = [0.0625     0.10363196 0.09893969 0.17199806 0.09036961 0.14222012
 0.12209115 0.10522793 0.06884463 0.04762593 0.0367636  0.02868652
 0.01707825 0.01547848 0.01936492]
tau = [0.21369804 0.38294217 0.23722356 0.46841498 0.14390602 0.45625668
 0.47726846 0.4015393  0.38549113 0.34179482 0.51883786 0.45846935
 0.33633634 0.53102453 0.48484848]

The simulation results for the non-interacting case match very well with the analytical solution of Henderson-Hasselbalch equation. There are only minor deviations, and the estimated errors are small too. This situation will change when we introduce interactions.

It is useful to check whether the estimated errors are consistent with the assumptions that were used to obtain them. To do this, we follow [Janke2002] to estimate the number of uncorrelated samples per block, and check whether each block contains a sufficient number of uncorrelated samples (we choose 10 uncorrelated samples per block as the threshold value).

Intentionally, we made our simulation rather short, so that it may not produce enough uncorrelated samples. We encourage the reader to vary the number of blocks or the number of samples to see how the estimated error changes with these parameters.

In [22]:
# check if the blocks contain enough data for reliable error estimates
print(f"uncorrelated samples per block:\nblock_size/tau = {block_size / tau}")
threshold = 10  # block size should be much greater than the correlation time
if np.any(block_size / tau < threshold):
    print(f"\nWarning: some blocks may contain less than {threshold} uncorrelated samples."
          "\nYour error estimated may be unreliable."
          "\nPlease, check them using a more sophisticated method or run a longer simulation.")
    print(f"? block_size/tau > threshold ? : {block_size / tau > threshold}")
else:
    print(f"\nAll blocks seem to contain more than {threshold} uncorrelated samples."
          "Error estimates should be OK.")
uncorrelated samples per block:
block_size/tau = [46.795      26.1136033  42.15432961 21.34859155 69.48979592 21.91748745
 20.95256813 24.90416275 25.94093407 29.25731917 19.27384393 21.81170886
 29.73214286 18.83152174 20.625     ]

All blocks seem to contain more than 10 uncorrelated samples. Error estimates should be OK.

To look in more detail at the statistical accuracy, it is useful to plot the deviations from the analytical result. This provides another way to check the consistency of the error estimates. In the case of a non-interacting system, the simulation should exactly reproduce the Henderson-Hasselbalch equation. In such a case, about 68% of the results should be within one error bar from the analytical result, whereas about 95% of the results should be within two error bars. Indeed, if you plot the deviations by running the script below, you should observe that most of the results are within one error bar from the analytical solution, a smaller fraction of the results is slightly further than one error bar, and one or two might be about two error bars apart. Again, this situation changes when we activate interactions because the ionization of the interacting system deviates from the Henderson-Hasselbalch equation.

In [23]:
# plot the deviations from the ideal result
plt.figure(figsize=(10, 6), dpi=80)
ylim = np.amax(abs(av_alpha - ideal_alpha(pHs, pKa)))
plt.ylim((-1.5 * ylim, 1.5 * ylim))
plt.errorbar(pHs - pKa, av_alpha - ideal_alpha(pHs, pKa),
             err_alpha, marker='o', linestyle='none', label=r"simulation")
plt.plot(pHs - pKa, 0.0 * ideal_alpha(pHs, pKa), label=r"Henderson-Hasselbalch")
plt.xlabel(r'$\mathrm{pH} - \mathrm{p}K_{\mathrm{A}}$', fontsize=16)
plt.ylabel(r'$\alpha - \alpha_{ideal}$', fontsize=16)
plt.legend(fontsize=16)
plt.show()

The neutralizing ion $\mathrm{B^+}$

Up to now we did not discuss the chemical nature of the neutralizer $\mathrm{B^+}$. It is not obvious how the chemical nature of the $\mathrm{B^+}$ ion should be interpreted in a coarse-grained model, where water molecules and $\mathrm{H^+}$ ions are not represented explicitly. Now, we address this issue using some specific examples.

In the simplest case, if we add an acidic polymer to pure water that has $\mathrm{pH} = 7$, some of the acid groups dissociate and release $\mathrm{H^+}$ ions into the solution. The $\mathrm{pH}$ decreases to a value that depends on $\mathrm{p}K_{\mathrm{A}}$ and on the concentration of ionizable groups. Now, three ionic species are present in the solution: $\mathrm{H^+}$, $\mathrm{A^-}$, and $\mathrm{OH^-}$.
In this case the $\mathrm{B^+}$ ions generated in our implementation of the reaction correspond to the $\mathrm{H^+}$ ions. The $\mathrm{H^+}$ ions act as the counterions, neutralizing both the $\mathrm{A^-}$ and the $\mathrm{OH^-}$ ions. At acidic $\mathrm{pH}$ there are only very few $\mathrm{OH^-}$ ions and nearly all $\mathrm{H^+}$ ions act as neutralizers for the $\mathrm{A^-}$ ions. Therefore, the concentration of $\mathrm{B^+}$ is very close to the concentration of $\mathrm{H^+}$ in the real aqueous solution. Only very few $\mathrm{OH^-}$ ions, and the $\mathrm{H^+}$ ions needed to neutralize them, are missing in the simulation box when compared to the real solution.

To achieve a more acidic $\mathrm{pH}$ (with the same $\mathrm{p}K_{\mathrm{A}}$ and polymer concentration), we need to add an acid to the system. We can do that by adding a strong acid, such as $\mathrm{HCl}$ or $\mathrm{HNO}_3$. We will denote this acid by a generic name $\mathrm{HX}$ to emphasize that in general its anion can be different from the salt anion $\mathrm{Cl^{-}}$. Now, there are 4 ionic species in the solution: $\mathrm{H^+}$, $\mathrm{A^-}$, $\mathrm{OH^-}$, and $\mathrm{X^-}$ ions. By the same argument as before, the $\mathrm{B^+}$ ions again correspond to the $\mathrm{H^+}$ ions. The $\mathrm{H^+}$ ions neutralize the $\mathrm{A^-}$, $\mathrm{OH^-}$, and the $\mathrm{X^-}$ ions. Because the concentration of $\mathrm{X^-}$ is not negligible, the concentration of $\mathrm{B^+}$ in the simulation box differs from the $\mathrm{H^+}$ concentration in the real solution. Now, many more ions are missing in the simulation box, as compared to the real solution: Few $\mathrm{OH^-}$ ions, many $\mathrm{X^-}$ ions, and all the $\mathrm{H^+}$ ions that neutralize them.

To achieve a neutral $\mathrm{pH}$ we need to add some base to the system to neutralize the polymer. In the simplest case we add an alkali metal hydroxide, such as $\mathrm{NaOH}$ or $\mathrm{KOH}$, that we will generically denote as $\mathrm{MOH}$. Now, there are 4 ionic species in the solution: $\mathrm{H^+}$, $\mathrm{A^-}$, $\mathrm{OH^-}$, and $\mathrm{M^+}$. In such a situation, we cannot clearly attribute a specific chemical identity to the $\mathrm{B^+}$ ions. However, only very few $\mathrm{H^+}$ and $\mathrm{OH^-}$ ions are present in the system at $\mathrm{pH} = 7$. Therefore, we can make the approximation that at this $\mathrm{pH}$, all $\mathrm{A^-}$ are neutralized by the $\mathrm{M^+}$ ions, and the $\mathrm{B^+}$ correspond to $\mathrm{M^+}$. Then, the concentration of $\mathrm{B^+}$ also corresponds to the concentration of $\mathrm{M^+}$ ions. Now, again only a few ions are missing in the simulation box, as compared to the real solution: Few $\mathrm{OH^-}$ ions, and few $\mathrm{H^+}$ ions.

To achieve a basic $\mathrm{pH}$ we need to add even more base to the system to neutralize the polymer. Again, there are 4 ionic species in the solution: $\mathrm{H^+}$, $\mathrm{A^-}$, $\mathrm{OH^-}$, and $\mathrm{M^+}$, and we cannot clearly attribute a specific chemical identity to the $\mathrm{B^+}$ ions. Because only very few $\mathrm{H^+}$ ions should be present in the solution, we can make the approximation that at this $\mathrm{pH}$, all $\mathrm{A^-}$ ions are neutralized by the $\mathrm{M^+}$ ions, and therefore $\mathrm{B^+}$ ions in the simulation correspond to $\mathrm{M^+}$ ions in the real solution. Because additional $\mathrm{M^+}$ ions in the real solution neutralize the $\mathrm{OH^-}$ ions, the concentration of $\mathrm{B^+}$ does not correspond to the concentration of $\mathrm{M^+}$ ions. Now, again many ions are missing in the simulation box, as compared to the real solution: Few $\mathrm{H^+}$ ions, many $\mathrm{OH^-}$ ions, and a comparable amount of the $\mathrm{M^+}$ ions.

To further illustrate this subject, we compare the concentration of the neutralizer ion $\mathrm{B^+}$ calculated in the simulation with the expected concentrations of the individual ionic species. At a given $\mathrm{pH}$ and $\mathrm{p}K_{\mathrm{A}}$ we can calculate the expected degree of ionization from the Henderson-Hasselbalch equation. Then we apply the electroneutrality condition $$c_\mathrm{A^-} + c_\mathrm{OH^-} + c_\mathrm{X^-} = c_\mathrm{H^+} + c_\mathrm{M^+}$$ where we use either $c_\mathrm{X^-}=0$ or $c_\mathrm{M^+}=0$ because we only ever add extra acid or extra base, but never both. Adding both would be equivalent to adding extra salt $\mathrm{MX}$. We obtain the concentrations of $\mathrm{OH^-}$ and $\mathrm{H^+}$ from the input $\mathrm{pH}$ value, and substitute them into the electroneutrality equation to obtain $$\alpha c_\mathrm{acid} + \left(10^{-(\mathrm{p}K_{\mathrm{w}} - \mathrm{pH})} - 10^{-\mathrm{pH}}\right) c^{\ominus} = c_\mathrm{M^+} - c_\mathrm{X^-}$$ Depending on whether the left-hand side of this equation is positive or negative, we know whether we should add $\mathrm{M^+}$ or $\mathrm{X^-}$ ions.

In [24]:
# average concentration of B+ is the same as the concentration of A-
av_c_Bplus = av_alpha * C_ACID
err_c_Bplus = err_alpha * C_ACID  # error in the average concentration

full_pH_range = np.linspace(2, 12, 100)
ideal_c_Aminus = ideal_alpha(full_pH_range, pKa) * C_ACID
ideal_c_OH = np.power(10.0, -(pKw - full_pH_range))*ureg.molar
ideal_c_H = np.power(10.0, -full_pH_range)*ureg.molar
# ideal_c_M is calculated from electroneutrality
ideal_c_M = (ideal_c_Aminus + ideal_c_OH - ideal_c_H)

# plot the simulation results compared with the ideal results of the cations
plt.figure(figsize=(10, 6), dpi=80)
plt.errorbar(pHs,
             av_c_Bplus.magnitude,
             err_c_Bplus.magnitude,
             marker='o', c="tab:blue", linestyle='none',
             label=r"measured $c_{\mathrm{B^+}}$", zorder=2)
plt.plot(full_pH_range, 
         ideal_c_H.magnitude, 
         c="tab:green",
         label=r"ideal $c_{\mathrm{H^+}}$", 
         zorder=0)
plt.plot(full_pH_range[np.nonzero(ideal_c_M.magnitude > 0.)], 
         ideal_c_M.magnitude[np.nonzero(ideal_c_M.magnitude > 0.)], 
         c="tab:orange",
         label=r"ideal $c_{\mathrm{M^+}}$", 
         zorder=0)
plt.plot(full_pH_range, 
         ideal_c_Aminus.magnitude, 
         c="tab:blue", 
         ls=(0, (5, 5)),
         label=r"ideal $c_{\mathrm{A^-}}$", 
         zorder=1)
plt.yscale("log")
plt.ylim(5e-6,1e-2)
plt.xlabel('input pH', fontsize=16)
plt.ylabel(r'concentration $c$ $[\mathrm{mol/L}]$', fontsize=16)
plt.legend(fontsize=16)
plt.show()

The plot shows that at intermediate $\mathrm{pH}$ the concentration of $\mathrm{B^+}$ ions is approximately equal to the concentration of $\mathrm{M^+}$ ions. Only at one specific $\mathrm{pH}$ value is the concentration of $\mathrm{B^+}$ ions equal to the concentration of $\mathrm{H^+}$ ions. This is the $\mathrm{pH}$ obtained when dissolving the weak acid $\mathrm{HA}$ in pure water.

In an ideal system, the ions missing in the simulation have no effect on the ionization degree. In an interacting system, the presence of ions in the box affects the properties of other parts of the system. Therefore, in an interacting system this discrepancy is harmless only at intermediate $\mathrm{pH}$. The effect of the small ions on the rest of the system can be estimated from the overall ionic strength. $$ I = \frac{1}{2}\sum_i c_i z_i^2 $$

In [25]:
ideal_c_X = -(ideal_c_Aminus + ideal_c_OH - ideal_c_H)

ideal_ionic_strength = 0.5 * \
    (ideal_c_X + ideal_c_M + ideal_c_H + ideal_c_OH + 2 * C_SALT)
# in constant-pH simulation ideal_c_Aminus = ideal_c_Bplus
cpH_ionic_strength = 0.5 * (ideal_c_Aminus + 2 * C_SALT)
cpH_ionic_strength_measured = 0.5 * (av_c_Bplus + 2 * C_SALT)
cpH_error_ionic_strength_measured = 0.5 * err_c_Bplus

plt.figure(figsize=(10, 6), dpi=80)
plt.errorbar(pHs,
             cpH_ionic_strength_measured.magnitude,
             cpH_error_ionic_strength_measured.magnitude,
             c="tab:blue",
             linestyle='none', marker='o',
             label=r"measured", zorder=3)
plt.plot(full_pH_range,
         cpH_ionic_strength.magnitude,
         c="tab:blue",
         ls=(0, (5, 5)),
         label=r"constant-pH", zorder=2)
plt.plot(full_pH_range,
         ideal_ionic_strength.magnitude,
         c="tab:orange",
         linestyle='-',
         label=r"Henderson-Hasselbalch", zorder=1)

plt.ylim(1.8e-3,3e-3)
plt.xlabel('input pH', fontsize=16)
plt.ylabel(r'Ionic Strength [$\mathrm{mol/L}$]', fontsize=16)
plt.legend(fontsize=16)
plt.show()

We see that the ionic strength in the simulation box significantly deviates from the ionic strength of the real solution only at high or low $\mathrm{pH}$ values. If the $\mathrm{p}K_{\mathrm{A}}$ value is sufficiently large, then the deviation at very low $\mathrm{pH}$ can be neglected, because the polymer is essentially uncharged in the region where the ionic strength is not correctly represented by the constant-$\mathrm{pH}$ simulation. At high $\mathrm{pH}$ the ionic strength does affect the weak acid, because it is then fully charged. The $\mathrm{pH}$ range in which the constant-$\mathrm{pH}$ method uses approximately the right ionic strength depends on the salt concentration, the weak acid concentration and the $\mathrm{p}K_{\mathrm{A}}$ value. See also [Landsgesell2019] for a more detailed discussion of this issue and its consequences.

Suggested problems for further work

  • Try changing the concentration of ionizable species in the non-interacting system. You should observe that it does not affect the obtained titration curve.

  • Try changing the number of samples and the number of particles to see how the estimated error and the number of uncorrelated samples will change. Be aware that if the number of uncorrelated samples is low, the error estimation is too optimistic.

  • Try running the same simulations with steric repulsion and then again with electrostatic interactions, and observe how the ionization equilibrium is affected by the various interactions (a minimal parameter sketch follows this list). Warning: simulations with electrostatics are much slower. If you want to obtain your results more quickly, decrease the number of $\mathrm{pH}$ values.
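
A minimal sketch of the parameter changes for the interacting runs, to be made near the top of the notebook before rerunning it (the specific values are only suggestions):

USE_WCA = True             # turn on steric repulsion
USE_ELECTROSTATICS = True  # turn on electrostatics (Debye-Hueckel by default)
USE_P3M = False            # keep the faster Debye-Hueckel solver for a quick run
NUM_PHS = 8                # fewer pH values to compensate for the slower sampling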

References

[Janke2002] Janke W. Statistical Analysis of Simulations: Data Correlations and Error Estimation, In Quantum Simulations of Complex Many-Body Systems: From Theory to Algorithms, Lecture Notes, J. Grotendorst, D. Marx, A. Muramatsu (Eds.), John von Neumann Institute for Computing, Jülich, NIC Series, Vol. 10, ISBN 3-00-009057-6, pp. 423-445, 2002, link.
[Landsgesell2019] Landsgesell, J.; Nová, L.; Rud, O.; Uhlík, F.; Sean, D.; Hebbeker, P.; Holm, C.; Košovan, P. Simulations of Ionization Equilibria in Weak Polyelectrolyte Solutions and Gels. Soft Matter 2019, 15 (6), 1155–1185, doi:10.1039/C8SM02085J.
[Reed1992] Reed, C. E.; Reed, W. F. Monte Carlo Study of Titration of Linear Polyelectrolytes. The Journal of Chemical Physics 1992, 96 (2), 1609–1620, doi:10.1063/1.462145.
[Smith1994] Smith, W. R.; Triska, B. The Reaction Ensemble Method for the Computer Simulation of Chemical and Phase Equilibria. I. Theory and Basic Examples. The Journal of Chemical Physics 1994, 100 (4), 3019–3027, doi:10.1063/1.466443.
[Israelachvili2011] Israelachvili, J. N. Intermolecular and Surface Forces, Third Edition, 2011, Academic Press, ISBN 978-0-12-391927-4, doi:10.1016/C2011-0-05119-0.