To work with this tutorial, you should be familiar with the following topics:

- Setting up and running simulations in ESPResSo - creating particles, interactions, electrostatics
- Basic knowledge of chemistry, chemical reaction equilibria and specifically acid-base equilibria
- Thermodynamics, especially chemical potentials and activities
- Reduced units, as described in the ESPResSo user guide

This tutorial introduces the basic features for simulating titratable systems via the constant $\mathrm{pH}$ method. It is one of the methods implemented within the Reaction Ensemble module for simulating systems which involve chemical reaction equilibria. It is a Monte Carlo method designed to model an acid-base ionization reaction at a prescribed value of solution $\mathrm{pH}$.

To start with, consider a homogeneous aqueous solution of a titratable acidic species $\mathrm{HA}$ that can dissociate in a reaction $$\mathrm{HA} \Leftrightarrow \mathrm{A}^- + \mathrm{H}^+ ,$$ characterized by the equilibrium constant $\mathrm{p}K_{\mathrm{A}} = -\log_{10} K_{\mathrm{A}}$

The ionization degree $\alpha$ is defined as the fraction of ionized form of the acid $\mathrm{A}$, relative to its total amount, given by the sum of both, ionized and non-ionized form. By introducing $N_{\mathrm{A}} = N_{\mathrm{HA}} + N_{\mathrm{A}^-}$ as the total number of titratable groups in solution, we can express the degree of dissociation $\alpha$ as:

$$\alpha = \dfrac{ N_{\mathrm{A}^-} }{N_{\mathrm{HA}} + N_{\mathrm{A}^-}} = \dfrac{ N_{\mathrm{A}^-} }{ N_{\mathrm{A}} }.$$The ionization degree is one of the key quantities that can be used to describe the acid-base equilibrium. Usually, the goal of the simulation is to predict the value of $\alpha$ under given conditions in a complex system with interactions.

The equilibrium reaction constant describes the chemical equilibrium of a given reaction. The values of equilibrium constants for various reactions can be found in tables. The equilibrium constant of dissociation of a weak acid is conventionally called the acidity constant, and it is defined as $$ K_{\mathrm{A}} = \frac{a_{\mathrm{H}^+} a_{\mathrm{A}^-} } {a_{\mathrm{HA}}} $$ where $a_i$ is the activity of species $i$. The activity $a_i$ is related to the chemical potential $\mu_i$ and to the concentration $c_i$ $$ \mu_i = \mu_i^\mathrm{\ominus} + k_{\mathrm{B}}T \ln a_i \,,\qquad a_i = \frac{c_i \gamma_i}{c^{\ominus}}\,, $$ where $\gamma_i$ is the activity coefficient, and $c^{\ominus}$ is the (arbitrary) reference concentration, often chosen to be the standard concentration, $c^{\ominus} = 1\,\mathrm{mol/L}$, and $\mu_i^\mathrm{\ominus}$ is the chemical potential in the reference state. Note that $K_{\mathrm{A}}$ is a dimensionless quantity. Its numerical value depends on the choice of $c^{\ominus}$ but the thermodynamics is independent of this choice. The activity coefficient, $\gamma_i$, accounts for the effect of interactions on the chemical potential. Therefore, by definition, $\gamma_i=1$ for an ideal system, whereas for an interacting system $\gamma_i$ is a non-trivial function of the interactions. For an ideal system we can rewrite $K_{\mathrm{A}}$ in terms of equilibrium concentrations of the reacting species $$ K_{\mathrm{A}} \overset{\mathrm{ideal}}{=} \frac{c_{\mathrm{H}^+} c_{\mathrm{A}^-} } {c_{\mathrm{HA}} c^{\ominus}} $$

In the chemical literature, the ionization degree is usually expressed in terms of the concentrations $$ \alpha = \frac{N_{\mathrm{A}^-}}{N_{\mathrm{HA}} + N_{\mathrm{A}^-}} = \frac{c_{\mathrm{A}^-}}{c_{\mathrm{HA}}+c_{\mathrm{A}^-}} = \frac{c_{\mathrm{A}^-}}{c_{\mathrm{A}}} $$ where $c_{\mathrm{A}}=c_{\mathrm{HA}}+c_{\mathrm{A}^-}$ is the total (analytical) concentration of titratable acid groups, irrespective of their ionization state. Then, we can characterize the acid-base ionization equilibrium using the ionization degree and $\mathrm{pH}$, defined as $$ \mathrm{pH} = -\log_{10} a_{\mathrm{H^{+}}} \overset{\mathrm{ideal}}{=} -\log_{10} (c_{\mathrm{H^{+}}} / c^{\ominus}) $$ Note that the second equality is valid only in the ideal case. Substituting for the ionization degree and $\mathrm{pH}$ into the expression for $K_{\mathrm{A}}$ we obtain the Henderson-Hasselbalch equation $$ \mathrm{pH}-\mathrm{p}K_{\mathrm{A}} = \log_{10} \frac{\alpha}{1-\alpha} $$ One implication of the Henderson-Hasselbalch equation is that at a fixed $\mathrm{pH}$ value the ionization degree of an ideal acid is independent of concentration. Another implication is that the degree of ionization depends only on the difference, $\mathrm{pH}-\mathrm{p}K_{\mathrm{A}}$ but not on the absolute values of $\mathrm{p}K_{\mathrm{A}}$ and $\mathrm{pH}$ alone. Therefore, for an ideal system, the ionization degree $\alpha$ can be obtained from the equation via the following function:

In [1]:

```
# ionization degree alpha calculated from the Henderson-Hasselbalch equation for an ideal system
def ideal_alpha(pH, pK):
return 1. / (1 + 10**(pK - pH))
```

The constant $\mathrm{pH}$ method [Reed1992] has been designed to simulate an acid-base ionization reaction at a given $\mathrm{pH}$. It assumes that $\mathrm{pH}$ in the system is constant, independent of the ionization degree. In a real solution, the ionization reaction generates an $\mathrm{H^+}$ ion, that immediately reacts with the buffer that has been added to the solution in order to keep the $\mathrm{pH}$ (approximately) constant. Effectively, the net reaction can be written as $$\mathrm{HA} \Leftrightarrow \mathrm{A}^- + \mathrm{B}^+ ,$$ where we use $\mathrm{B}^+$ to denote a generic neutralizing counterion that could be either the $\mathrm{H}^+$ ion, or another cation that is generated by the reaction of the $\mathrm{H}^+$ ion with the buffer. In the literature, this counterion is often denoted as $\mathrm{H^+}$, however, this may lead to confusion because then the actual number of $\mathrm{H^+}$ ions in the box does not correspond to their concentration implied by the fixed value of the $\mathrm{pH}$.

In ESPResSo, the forward step of the ionization reaction (from left to right) is implemented by changing the chemical identity (particle type) of a randomly selected $\mathrm{HA}$ particle to $\mathrm{A}^-$, and inserting another particle that represents a neutralizing counterion. In the reverse direction (from right to left), the chemical identity (particle type) of a randomly selected $\mathrm{A}^{-}$ is changed to $\mathrm{HA}$, and a randomly selected $\mathrm{B}^+$ is deleted from the simulation box. The probability of proposing the forward reaction step is $P_\mathrm{for}=N_\mathrm{HA}/N_\mathrm{A}$, and probability of proposing the reverse step is $P_\mathrm{rev}=N_\mathrm{A^{-}}/N_\mathrm{A} = 1 - P_\mathrm{for}$. The trial move is accepted with the acceptance probability

$$ P_{\mathrm{acc}} = \operatorname{min}\left(1, \exp(-\beta \Delta E_\mathrm{pot} \pm \ln(10) \cdot (\mathrm{pH - p}K_A) ) \right)$$Here $\Delta E_\text{pot}$ is the potential energy change due to the reaction, while $\text{pH - p}K_\mathrm{A}$ is an input parameter. The signs $\pm 1$ correspond to the forward and reverse direction of the ionization reaction, respectively.

First we import all necessary modules including ESPResSo for simulations and others for convenience.

In [2]:

```
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 18})
import numpy as np
import pkg_resources
import pint # module for working with units and dimensions
import time
assert pkg_resources.packaging.specifiers.SpecifierSet('>=0.10.1').contains(pint.__version__), \
f'pint version {pint.__version__} is too old: several numpy operations can cast away the unit'
import espressomd
espressomd.assert_features(['WCA', 'ELECTROSTATICS'])
import espressomd.electrostatics
import espressomd.reaction_methods
import espressomd.polymer
from espressomd.interactions import HarmonicBond
```

`pint.UnitRegistry`

and access its unit definitions and automatic conversions. For more information or a quick introduction please look at the pint-documentation or pint-tutorials.

In [3]:

```
ureg = pint.UnitRegistry()
```

The inputs that we need to define our system in the simulation include

- temperature
`TEMPERATURE`

- relative permittivity of water
`WATER_PERMITTIVITY`

- Bjerrum length
`BJERRUM_LENGTH`

- concentration of the titratable units
`C_ACID`

- system size (given by the number of titratable units)
`N_ACID`

- concentration of added salt
`C_SALT`

- dissociation constant
`pK`

`pH`

- types of non-bonded interactions we want to use
- particle types
`TYPES`

and charges`CHARGES`

and their mapping to espresso

We start by setting the desired temperature $T$ using the variable `TEMPERATURE`

, particle size $\sigma$ using the variable `PARTICLE_SIZE`

and the relative permittivity $\varepsilon_{\mathrm{r}}$ of the aqueous medium at this temperature, using the variable `WATER_PERMITTIVITY`

.

Another important length scale is the `BJERRUM_LENGTH`

which has the physical meaning of the distance, at which the interaction energy between two elementary charges is equal to $k_{\mathrm{B}}T$:
$$ l_{\mathrm{B}} = \frac{e^2}{4\pi \varepsilon_0 \varepsilon_{\mathrm{r}} k_{\mathrm{B}} T} $$

Next, we use the above values to define the conversion factors between the values of input parameters in the SI units and the reduced units of length and energy that ESPResSo requires as inputs.

In coarse-grained models of aqueous solutions, it is customary to set the size of ions to approximately $\sigma = 0.355\,\mathrm{nm}$, that corresponds to one half of Bjerrum length in water at room temperature $l_{\mathrm{B}} =0.71\,\mathrm{nm}$. We choose the ion size, $\sigma = 0.355\,\mathrm{nm}$, as the unit of length, which we conveniently do by defining a new unit using the pint module. Then, the Bjerrum length should attain the value of 2.0 in the reduced units. The ratio of Bjerrum length to other important length scales in the system is an important parameter that determines the relative strength of the electrostatic interactions. Furthermore, we choose the thermal energy $k_{\mathrm{B}}T$ as the unit of energy, and the elementary charge $e$ as the unit of charge.

When calculating the Bjerrum length in the script, it is convenient to convert the computed value to $\mathrm{nm}$ using pint, because then it is easier to use in further calculations.

In [4]:

```
TEMPERATURE = 298 * ureg.K
WATER_PERMITTIVITY = 78.5 # at 298 K
KT = TEMPERATURE * ureg.k_B
PARTICLE_SIZE = 0.355 * ureg.nm
BJERRUM_LENGTH = (ureg.e**2 / (4 * ureg.pi * ureg.eps0 * WATER_PERMITTIVITY * KT)).to('nm')
ureg.define(f'reduced_energy = {TEMPERATURE} * boltzmann_constant')
ureg.define(f'reduced_length = {PARTICLE_SIZE}')
ureg.define(f'reduced_charge = 1*e')
# store the values of some parameters in dimensionless reduced units, that will be later passed to ESPResSo
KT_REDUCED = KT.to('reduced_energy').magnitude
BJERRUM_LENGTH_REDUCED = BJERRUM_LENGTH.to('reduced_length').magnitude
PARTICLE_SIZE_REDUCED = PARTICLE_SIZE.to('reduced_length').magnitude
print("KT = {KT.to('reduced_energy'):4.3f}")
print("PARTICLE_SIZE = {PARTICLE_SIZE.to('reduced_length'):4.3f}")
print("BJERRUM_LENGTH = {BJERRUM_LENGTH.to('reduced_length'):4.3f}")
```

We want to simulate a solution of polyelectrolyte chains composed of weak acidic groups. The chains are characterized by the number of acidic groups, their $\mathrm{p}K_{\mathrm{A}}$ and their concentration in the solution. The polyelectrolytes are dissolved in an aqueous solution of NaCl, further characterized by the salt concentration and by the $\mathrm{pH}$ value.

To specify the system properties, we to define the total number of titratable units `N_ACID`

, their concentration `C_ACID`

and the salt concentration `C_SALT`

. We set the dissociation constant of the acid to $\mathrm{p}K_\mathrm{A}=4.88$, that is the acidity constant of propionic acid. The structure of propionic acid resembles the repeating unit of poly(acrylic acid), the most commonly used weak polyacid.

In [5]:

```
# Parameters that define properties of the system
N_ACID = 20
C_ACID = 1e-3 * ureg.molar
C_SALT = 2 * C_ACID
pKa = 4.88 # acidity constant
pKw = 14.0 # autoprotolysis constant of water
```

We will simulate multiple $\mathrm{pH}$ values, where the range is determined by the parameters `OFFSET`

and `NUM_PHS`

.
The titration curve of the ideal system is symmetric around $\mathrm{pH} = \mathrm{p}K_{\mathrm{A}}$ but the titration curve of the non-ideal system is shifted to higher $\mathrm{pH}$ values. Therefore, we apply the offset asymmetrically.

In [6]:

```
# Number of pH values that we want to sample
NUM_PHS = 15 # number of pH values
OFFSET = 2.0 # range of pH values to be used = pKa +/- offset
pHmin = pKa - 0.5 * OFFSET # lowest pH value to be used
pHmax = pKa + 1.5 * OFFSET # highest pH value to be used
pHs = np.linspace(pHmin, pHmax, NUM_PHS) # list of pH values
```

Here we decide what kind of non-bonded interactions we want to use. By setting `USE_WCA`

to `True`

the script below creates WCA-interactions between all particles. Setting `USE_ELECTROSTATICS`

to `True`

will result in electrostatic interactions being turned on. Using electrostatic interactions and differently charged particles always has to be coupled with a short-range-repulsion-interaction, commonly a WCA-interaction.

To be able to compare our results to the analytical solutions for ideal systems and to obtain results very quickly, we begin with all non-bonded interactions turned off. In the next runs, we will add the steric repulsion and electrostatic interactions to observe their effect on the ionization.

Because it is an interactive tutorial, we want it to be completed within few minutes. Therefore, we choose the Debye-Hückel potential for electrostatics by default. It produces results that are similar to the full electrostatics calculated by the P3M method but we gain about a factor of 5 in the simulation runtime. In this way, we sacrifice accuracy for speed. It's definitely not recommended for production simulations.

In [7]:

```
# Simulate an interacting system with steric repulsion (Warning: it will be slower than without WCA!)
USE_WCA = True
# Simulate an interacting system with electrostatics
USE_ELECTROSTATICS = False
# By default, we use the Debye-Huckel potential because it allows short runtime of the tutorial
# You can also use the P3M method for full electrostatics (Warning: it will be much slower than DH!)
USE_P3M = False
if USE_ELECTROSTATICS:
assert USE_WCA, "Use electrostatics only with a short-range repulsive potential. Otherwise, singularity occurs at zero distance."
```

For the error analysis we specify the number of blocks `N_BLOCKS`

and the desired number of samples per block `DESIRED_BLOCK_SIZE`

. From that we can estimate the total number of statistically independent samples that we want to collect `NUM_SAMPLES`

. Note that samples collected in our simulations are not really statistically independent, so the actual number of independent samples is always lower than `NUM_SAMPLES`

.
Therefore, the actual number of independent samples depends on the properties of the simulated system and needs to be estimated at the end of the simulation.
We keep the number of samples rather low, so that the tutorial completes fast. If you want to obtain data with a better statistical quality, we encourage you to increase the number of samples.

In [8]:

```
N_BLOCKS = 16 # number of block to be used in data analysis
DESIRED_BLOCK_SIZE = 10 # desired number of samples per block
PROB_INTEGRATION = 0.6 # Probability of running MD integration after the reaction move.
# This parameter changes the speed of convergence but not the limiting result
# to which the simulation converges
# number of reaction samples per each pH value
NUM_SAMPLES = int(N_BLOCKS * DESIRED_BLOCK_SIZE)
```

After setting all system parameters that can be chosen independently, we calculate additional parameters that are uniquely defined by the choices above and will be used as direct inputs for ESPResSo.

From both the specified concentration and number of titratable units we can calculate the box volume `BOX_V`

. With our choice of a cubic simulation box we can subsequently determine the box length `BOX_L`

.
Once the box volume is fixed, the chosen salt concentration and the box volume determine the number of additional salt ion pairs `N_SALT`

that should be present in the system.

We also calculate the Debye length using the semi-empirical formula from equation 14.36 in chapter 14 of [Israelachvili2011]: $\lambda_{\mathrm{D}} [\mathrm{nm}] = \kappa^{-1} [\mathrm{nm}] = 0.304/\sqrt{I(\mathrm{[M]})} $

In [9]:

```
BOX_V = (N_ACID / (ureg.avogadro_constant * C_ACID)).to("nm^3")
BOX_L = np.cbrt(BOX_V)
BOX_L_REDUCED = BOX_L.to('reduced_length').magnitude
KAPPA = np.sqrt(C_SALT.to('mol/L').magnitude) / 0.304 / ureg.nm
KAPPA_REDUCED = KAPPA.to('1/reduced_length').magnitude
print(f"KAPPA = {KAPPA:.3f}")
print(f"KAPPA_REDUCED = {KAPPA_REDUCED:.3f}")
print(f"Debye_length: {1. / KAPPA:.2f} = {(1. / KAPPA).to('reduced_length'):.2f}")
N_SALT = int((C_SALT * BOX_V * ureg.avogadro_constant))
print("Calculated values:")
print(f"BOX_L = {BOX_L:.3g} = {BOX_L.to('reduced_length'):.3g}")
print(f"BOX_V = {BOX_V:.3g} = {BOX_V.to('reduced_length^3'):.3g}")
print(f"N_SALT = {N_SALT}")
```

Finally we have to set the particle types we want to simulate and their mapping to ESPResSo-particle-types, as well as particle charges.

In [10]:

```
# particle types of different species
TYPES = {
"HA": 0,
"A": 1,
"B": 2,
"Na": 3,
"Cl": 4,
}
# particle charges of different species
CHARGES = {
"HA": (0 * ureg.e).to("reduced_charge").magnitude,
"A": (-1 * ureg.e).to("reduced_charge").magnitude,
"B": (+1 * ureg.e).to("reduced_charge").magnitude,
"Na": (+1 * ureg.e).to("reduced_charge").magnitude,
"Cl": (-1 * ureg.e).to("reduced_charge").magnitude,
}
```

When initializing the ESPResSo system, some required parameters must be set.

`box_l`

defines the edge length of the cubic simulation box`time_step`

defines the integration step of the MD integration.`skin`

defines the properties of the cell system that do not affect the results but may affect the speed of the calculation, see user guide for more details.`seed`

defines the seed of the random number generator used by the ESPResSo system

In [11]:

```
system = espressomd.System(box_l=[BOX_L_REDUCED] * 3)
system.time_step = 0.01
system.cell_system.skin = 2.0
np.random.seed(seed=10) # initialize the random number generator in numpy
```

After defining the simulation parameters, we set up the system that we want to simulate. It is a polyelectrolyte chain with some added salt that is used to control the ionic strength of the solution.

First we define the parameters of bonds connecting the polymer particles and add the bonded interaction type to the system.
Then we create the particles. Bonded particle positions of a linear polymer can be created via the `espressomd.polymer.linear_polymer_positions`

, for more details see corresponding section in the documentation. Finally we add the $\mathrm{B}^+$-ions to the system, followed by adding the salt-ion pairs to the system.

In [12]:

```
# we need to define bonds before creating polymers
hb = HarmonicBond(k=30, r_0=1.0)
system.bonded_inter.add(hb)
# create the polymer positions
polymers = espressomd.polymer.linear_polymer_positions(n_polymers=1,
beads_per_chain=N_ACID,
bond_length=0.9,
seed=23)
# add the polymer particles composed of ionizable acid groups, initially in the ionized state
for polymer in polymers:
prev_particle = None
for position in polymer:
p = system.part.add(pos=position, type=TYPES["A"], q=CHARGES["A"])
if prev_particle:
p.add_bond((hb, prev_particle))
prev_particle = p
# add the corresponding number of H+ ions
system.part.add(pos=np.random.random((N_ACID, 3)) * BOX_L_REDUCED,
type=[TYPES["B"]] * N_ACID,
q=[CHARGES["B"]] * N_ACID)
# add salt ion pairs
system.part.add(pos=np.random.random((N_SALT, 3)) * BOX_L_REDUCED,
type=[TYPES["Na"]] * N_SALT,
q=[CHARGES["Na"]] * N_SALT)
system.part.add(pos=np.random.random((N_SALT, 3)) * BOX_L_REDUCED,
type=[TYPES["Cl"]] * N_SALT,
q=[CHARGES["Cl"]] * N_SALT)
print(f"The system contains {len(system.part)} particles")
```

The system contains 120 particles

If the WCA interaction has been enabled via the `USE_WCA`

flag, we activate it for each possible pair of particle types in the system. Afterwards we relax the initial overlaps using the steepest-descent integrator. Then we add the Langevin thermostat to the system and let the system relax further by running `1000`

integration steps of the Langevin dynamics.

After the initial relaxation we set the electrostatic interactions between the particles if it has been enabled via the `USE_ELECTROSTATICS`

flag. For electrostatics can use either the Debye-Hückel `DH`

algorithm or the `P3M`

algorithm. The `DH`

algorithm is based on the Debye-Hückel approximation, the assumptions of which are not satisfied in our simulated system. However, it runs much faster than the `P3M`

algorithm, and the approximate result closely resembles the correct one. Therefore, the `DH`

algorithm should not be used in production simulations. By using the `DH`

algorithm in this tutorial, we sacrifice the accuracy for speed.
To obtain an accurate result, we can use the `P3M`

algorithm using `accuracy`

of $10^{-3}$ as an acceptable tradeoff between accuracy and performance. For production runs it might be necessary to use a lower value of `accuracy`

, depending on the simulated system.

By default, ESPResSo uses the regular decomposition cell system to speed up the calculation of short-range interactions. However, for a system with small number of particles and without electrostatics, it runs faster with `n_square`

cell system. See the user guide for additional details on the cell systems.

In [13]:

```
if USE_WCA:
# set the WCA interaction between all particle pairs
for type_1 in TYPES.values():
for type_2 in TYPES.values():
if type_1 >= type_2:
system.non_bonded_inter[type_1, type_2].wca.set_params(epsilon=1.0, sigma=1.0)
# relax the overlaps with steepest descent
system.integrator.set_steepest_descent(f_max=0, gamma=0.1, max_displacement=0.1)
system.integrator.run(20)
system.integrator.set_vv() # to switch back to velocity Verlet
# add thermostat and short integration to let the system relax further
system.thermostat.set_langevin(kT=KT_REDUCED, gamma=1.0, seed=7)
system.integrator.run(steps=1000)
if USE_ELECTROSTATICS:
COULOMB_PREFACTOR=BJERRUM_LENGTH_REDUCED * KT_REDUCED
if USE_P3M:
coulomb = espressomd.electrostatics.P3M(prefactor=COULOMB_PREFACTOR,
accuracy=1e-3)
else:
coulomb = espressomd.electrostatics.DH(prefactor=COULOMB_PREFACTOR,
kappa=KAPPA_REDUCED,
r_cut=1. / KAPPA_REDUCED)
system.electrostatics.solver = coulomb
else:
# this speeds up the simulation of dilute systems with small particle numbers
system.cell_system.set_n_square()
```

After the particles have been added to the system we initialize the `espressomd.reaction_methods`

. The parameters to set are:

`kT`

specifies the $k_\mathrm{B}T$ value which is used as the inverse-temperature in the Boltzmann-factor to calculate the probabilities for the insertion.`exclusion_range`

specifies the distance around particles already existing in the system, where new particles will not be inserted. This ensures that there are no big overlaps with the newly inserted particles, so that MD integration does not crash. The value of the exclusion range does not affect the limiting result, however, it affects the convergence and the stability of the integration. For interacting systems, exclusion range should be comparable to the particle size. For non-interacting systems, we can set the exclusion range to $0.0$.`seed`

for the random number generator (RNG) is an arbitrary number that defines the starting point in the RNG sequence. Using the same seed should yield reproducible simualtion results if the same RNG implementation is used. When running multiple copies of the same simulation, each of them should use a different seed.

**Exercise:**

- Use
`espressomd.reaction_methods.ConstantpHEnsemble`

to create an instance of the reaction-ensemble constant $\mathrm{pH}$-method called`RE`

In [14]:

```
exclusion_range = PARTICLE_SIZE_REDUCED if USE_WCA else 0.0
RE = espressomd.reaction_methods.ConstantpHEnsemble(
kT=KT_REDUCED,
exclusion_range=exclusion_range,
seed=77,
constant_pH=2 # temporary value
)
RE.set_non_interacting_type(type=len(TYPES)) # this parameter helps speed up the calculation in an interacting system
```

**Exercise:**

- Use
`espressomd.reaction_methods.ConstantpHEnsemble.add_reaction()`

to add the reaction; remember to use the variables that were set up above for the reaction constant and the particle types and charges

** Hint:** Make sure to place

`TYPES["HA"]`

and `TYPES["A"]`

as first elements in the `reactant_types`

and `product_types`

lists respectivelyIn [15]:

```
RE.add_reaction(
gamma=10**(-pKa),
reactant_types=[TYPES["HA"]],
product_types=[TYPES["A"], TYPES["B"]],
default_charges={TYPES["HA"]: CHARGES["HA"],
TYPES["A"]: CHARGES["A"],
TYPES["B"]: CHARGES["B"]}
)
```

In the example above, the order of reactants and products ensures that identity of $\mathrm{HA}$ is changed to $\mathrm{A^{-}}$ and vice versa, while $\mathrm{B^{+}}$ is inserted/deleted in the reaction move.

If we had used the reverse the order of products in our reaction (i.e. from `product_types=[TYPES["A"], TYPES["B"]]`

to `product_types=[TYPES["B"], TYPES["A"]]`

), then the identity $\mathrm{HA}$ would be changed to $\mathrm{B^{+}}$, while $\mathrm{A^{-}}$ would be inserted/deleted at a random position in the box. Therefore $\mathrm{B^{+}}$ would become a part of the polymer chain whereas $\mathrm{A^{-}}$ would become a free ion.

Finally, we can perform simulations at different $\mathrm{pH}$ values. To do so in a single loop, we need to equilibrate the simulated system each time when we change the $\mathrm{pH}$, before collecting the samples.

**Exercise:**

- Write a function called
`equilibrate_reaction()`

that performs the equilibration of the $\mathrm{pH}$ value by performing reaction moves in the system by calling`RE.reaction()`

.

** Hint:** To ensure sufficient equilibration of the reaction, one should use a number of steps,

`reaction_steps`

, that is greater than the number of titratable acid groups in the system, `N_ACID`

.In [16]:

```
def equilibrate_reaction(reaction_steps=1):
RE.reaction(steps=reaction_steps)
```

After the system has been equilibrated, the integration/sampling loop follows.

**Exercise:**

Write a function called

`perform_sampling()`

that implements the sampling loopThe function should take the following parameters:

- an integer
`type_A`

that contains the particle type that should be followed when sampling the composition - an integer
`num_samples`

that determines the number of sampling steps to be performed - a numpy array
`num_As`

, such that`len(num_As) == num_samples`

which will be used to store the time series of the instantaneous values of $N_{\mathrm{A^{-}}}$ - a float
`reaction_steps`

that determines the number of reactions to be performed

- an integer

- The function should perform the following tasks:
- sample the composition fluctuations using reaction moves of the constant $\mathrm{pH}$ algorithm by calling
`RE.reaction()`

- sample the configuration space using the Langevin dynamics (LD) integration
- perform the reaction moves at each sampling step
- perform the LD integration in each sampling step with the probability
`PROB_INTEGRATION`

, and only if the interactions have been activated - after each sampling step, save the current number of particles of type $\mathrm{A^-}$ in the array
`num_As`

for later analysis

- sample the composition fluctuations using reaction moves of the constant $\mathrm{pH}$ algorithm by calling

*Hint:*

- the value of
`reaction_steps`

in each sampling step should be comparable to the number of titratable groups (`N_ACID`

) in the system - the number of LD integration steps should be at least
`1000`

- the number of particles of a certain type can be obtained via the function
`espressomd.system.System.number_of_particles()`

In [17]:

```
def perform_sampling(type_A, num_samples, num_As:np.ndarray, reaction_steps,
prob_integration=0.5, integration_steps=1000):
for i in range(num_samples):
if USE_WCA and np.random.random() < prob_integration:
system.integrator.run(integration_steps)
# we should do at least one reaction attempt per reactive particle
RE.reaction(steps=reaction_steps)
num_As[i] = system.number_of_particles(type=type_A)
```

`RE.constant_pH`

and use our `equilibrate_reaction`

function to equilibrate the system. After that the samplings are performed with our `perform_sampling`

function.

In [18]:

```
# empty numpy array as a placeholder for collecting the data
num_As_at_each_pH = -np.ones((len(pHs), NUM_SAMPLES)) # number of A- species observed at each sample
# run a productive simulation and collect the data
print(f"Simulated pH values: {pHs}")
for ipH, pH in enumerate(pHs):
print(f"Run pH {pH:.2f} ...")
RE.constant_pH = pH # set new pH value
start_time = time.time()
equilibrate_reaction(reaction_steps=N_ACID + 1) # pre-equilibrate the reaction to the new pH value
perform_sampling(type_A=TYPES["A"],
num_samples=NUM_SAMPLES,
num_As=num_As_at_each_pH[ipH, :],
reaction_steps=N_ACID + 1,
prob_integration=PROB_INTEGRATION) # perform sampling / run production simulation
runtime = (time.time() - start_time) * ureg.s
print(f"runtime: {runtime:.2g};",
f"measured number of A-: {np.mean(num_As_at_each_pH[ipH]):.2f}",
f"(ideal: {N_ACID * ideal_alpha(pH, pKa):.2f})",
)
print("\nDONE")
```

Now we plot our results and compare them to the analytical results obtained from the Henderson-Hasselbalch equation. Since our simulation is a stochastic process, its outputs are random numbers that may change with each run. However, in the limit of infinite simulation time, they converge to one specific value, that is the ensemble average. For a meaningful comparison of simulation results with the theory, it is necessary to supply the estimates of the ensemble averages as well as the estimates of their statistical error. Differences between the simulation and theory can be considered significant only if they are greater than the estimated statistical error.

The molecular simulation produces a time series of the observables, that constitute a Markov chain. It is a sequence of realizations of a random process, where the next value in the sequence depends on the preceding one. Therefore, the subsequent values are correlated. To estimate statistical error of the averages determined in the simulation, one needs to correct for the correlations.

Here, we will use a rudimentary way of correcting for correlations, termed the binning method. We refer the reader to specialized literature for a more sophisticated discussion, for example [Janke2002]. The general idea is to group a long sequence of correlated values into a rather small number of blocks, and compute an average per each block. If the blocks are big enough, they can be considered uncorrelated, and one can apply the formula for standard error of the mean of uncorrelated values. If the number of blocks is small, then they are uncorrelated but the obtained error estimate has a high uncertainty. If the number of blocks is high, then they are too short to be uncorrelated, and the obtained error estimates are systematically lower than the correct value. Therefore, the method works well only if the sample size is much greater than the autocorrelation time, so that it can be divided into a sufficient number of mutually uncorrelated blocks.

In [19]:

```
# statistical analysis of the results
def block_analyze(input_data, n_blocks=16):
data = np.asarray(input_data)
block = 0
# this number of blocks is recommended by Janke as a reasonable compromise
# between the conflicting requirements on block size and number of blocks
block_size = int(data.shape[1] / n_blocks)
print(f"block_size: {block_size}")
# initialize the array of per-block averages
block_average = np.zeros((n_blocks, data.shape[0]))
# calculate averages per each block
for block in range(n_blocks):
block_average[block] = np.average(data[:, block * block_size: (block + 1) * block_size], axis=1)
# calculate the average and average of the square
av_data = np.average(data, axis=1)
av2_data = np.average(data * data, axis=1)
# calculate the variance of the block averages
block_var = np.var(block_average, axis=0)
# calculate standard error of the mean
err_data = np.sqrt(block_var / (n_blocks - 1))
# estimate autocorrelation time using the formula given by Janke
# this assumes that the errors have been correctly estimated
tau_data = np.zeros(av_data.shape)
for val in range(av_data.shape[0]):
if av_data[val] == 0:
# unphysical value marks a failure to compute tau
tau_data[val] = -1.0
else:
tau_data[val] = 0.5 * block_size * n_blocks / (n_blocks - 1) * block_var[val] \
/ (av2_data[val] - av_data[val] * av_data[val])
return av_data, err_data, tau_data, block_size
```

`N_ACID`

. Finally, we plot the degree of ionization $\alpha$ as a function of the $\mathrm{pH}$.

In [20]:

```
# estimate the statistical error and the autocorrelation time using the formula given by Janke
av_num_As, err_num_As, tau, block_size = block_analyze(num_As_at_each_pH, N_BLOCKS)
print(f"av = {av_num_As}")
print(f"err = {err_num_As}")
print(f"tau = {tau}")
# calculate the average ionization degree
av_alpha = av_num_As / N_ACID
err_alpha = err_num_As / N_ACID
# plot the simulation results compared with the ideal titration curve
plt.figure(figsize=(10, 6), dpi=80)
plt.errorbar(pHs - pKa, av_alpha, err_alpha, marker='o', linestyle='none',
label=r"simulation")
pHs2 = np.linspace(pHmin, pHmax, num=50)
plt.plot(pHs2 - pKa, ideal_alpha(pHs2, pKa), label=r"Henderson-Hasselbalch")
plt.xlabel(r'$\mathrm{pH} - \mathrm{p}K_{\mathrm{A}}$', fontsize=16)
plt.ylabel(r'$\alpha$', fontsize=16)
plt.legend(fontsize=16)
plt.show()
```

The simulation results for the non-interacting case match very well with the analytical solution of Henderson-Hasselbalch equation. There are only minor deviations, and the estimated errors are small too. This situation will change when we introduce interactions.

It is useful to check whether the estimated errors are consistent with the assumptions that were used to obtain them. To do this, we follow [Janke2002] to estimate the number of uncorrelated samples per block, and check whether each block contains a sufficient number of uncorrelated samples (we choose 10 uncorrelated samples per block as the threshold value).

Intentionally, we made our simulation slightly too short, so that it does not produce enough uncorrelated samples. We encourage the reader to vary the number of blocks or the number of samples to see how the estimated error changes with these parameters.

In [21]:

```
# check if the blocks contain enough data for reliable error estimates
print(f"uncorrelated samples per block:\nblock_size/tau = {block_size / tau}")
threshold = 10 # block size should be much greater than the correlation time
if np.any(block_size / tau < threshold):
print(f"\nWarning: some blocks may contain less than {threshold} uncorrelated samples."
"\nYour error estimated may be unreliable."
"\nPlease, check them using a more sophisticated method or run a longer simulation.")
print(f"? block_size/tau > threshold ? : {block_size / tau > threshold}")
else:
print(f"\nAll blocks seem to contain more than {threshold} uncorrelated samples."
"Error estimates should be OK.")
```

In [22]:

```
# plot the deviations from the ideal result
plt.figure(figsize=(10, 6), dpi=80)
ylim = np.amax(abs(av_alpha - ideal_alpha(pHs, pKa)))
plt.ylim((-1.5 * ylim, 1.5 * ylim))
plt.errorbar(pHs - pKa, av_alpha - ideal_alpha(pHs, pKa),
err_alpha, marker='o', linestyle='none', label=r"simulation")
plt.plot(pHs - pKa, 0.0 * ideal_alpha(pHs, pKa), label=r"Henderson-Hasselbalch")
plt.xlabel(r'$\mathrm{pH} - \mathrm{p}K_{\mathrm{A}}$', fontsize=16)
plt.ylabel(r'$\alpha - \alpha_{ideal}$', fontsize=16)
plt.legend(fontsize=16)
plt.show()
```

Up to now we did not discuss the chemical nature the neutralizer $\mathrm{B^+}$. It is not obvious how the chemical nature of the $\mathrm{B^+}$ ion should be interpreted in a coarse-grained model, where water molecules and $\mathrm{H^+}$ ions are not represented explicitly. Now, we address this issue using some specific examples.

In the simplest case, if we add an acidic polymer to pure water that has $\mathrm{pH} = 7$, some of the acid groups dissociate and release $\mathrm{H^+}$ ions into the solution. The $\mathrm{pH}$ decreases to a value that depends on $\mathrm{p}K_{\mathrm{A}}$ and on the concentration of ionizable groups. Now, three ionic species are present in the solution: $\mathrm{H^+}$, $\mathrm{A^-}$, and $\mathrm{OH^-}$.

In this case the $\mathrm{B^+}$ ions generated in our implementation of the reaction correspond to the $\mathrm{H^+}$ ions. The $\mathrm{H^+}$ act as the counter-ions, neutralizing both the $\mathrm{A^-}$ and the $\mathrm{OH^-}$ ions. At acidic $\mathrm{pH}$ there are only very few $\mathrm{OH^-}$ ions and nearly all $\mathrm{H^+}$ ions act as neutralizer for the $\mathrm{A^-}$ ions. Therefore, the concentration of $\mathrm{B^+}$ is very close to the concentration of $\mathrm{H^+}$ in the real aqueous solution. Only very few $\mathrm{OH^-}$ ions, and the $\mathrm{H^+}$ ions needed to neutralize them, are missing in the simulation box, when compared to the real solution.

To achieve a more acidic $\mathrm{pH}$ (with the same $\mathrm{p}K_{\mathrm{A}}$ and polymer concentration), we need to add an acid to the system. We can do that by adding a strong acid, such as $\mathrm{HCl}$ or $\mathrm{HNO}_3$. We will denote this acid by a generic name $\mathrm{HX}$ to emphasize that in general its anion can be different from the salt anion $\mathrm{Cl^{-}}$. Now, there are 4 ionic species in the solution: $\mathrm{H^+}$, $\mathrm{A^-}$, $\mathrm{OH^-}$, and $\mathrm{X^-}$ ions. By the same argument as before, the $\mathrm{B^+}$ ions again correspond to the $\mathrm{H^+}$ ions. The $\mathrm{H^+}$ ions neutralize the $\mathrm{A^-}$, $\mathrm{OH^-}$, and the $\mathrm{X^-}$ ions. Because the concentration of $\mathrm{X^-}$ is not negligible, the concentration of $\mathrm{B^+}$ in the simulation box differs from the $\mathrm{H^+}$ concentration in the real solution. Now, many more ions are missing in the simulation box, as compared to the real solution: Few $\mathrm{OH^-}$ ions, many $\mathrm{X^-}$ ions, and all the $\mathrm{H^+}$ ions that neutralize them.

To achieve a neutral $\mathrm{pH}$ we need to add some base to the system to neutralize the polymer. In the simplest case we add an alkali metal hydroxide, such as $\mathrm{NaOH}$ or $\mathrm{KOH}$, that we will generically denote as $\mathrm{MOH}$. Now, there are 4 ionic species in the solution: $\mathrm{H^+}$, $\mathrm{A^-}$, $\mathrm{OH^-}$, and $\mathrm{M^+}$. In such situation, we can not clearly attribute a specific chemical identity to the $\mathrm{B^+}$ ions. However, only very few $\mathrm{H^+}$ and $\mathrm{OH^-}$ ions are present in the system at $\mathrm{pH} = 7$. Therefore, we can make the approximation that at this $\mathrm{pH}$, all $\mathrm{A^-}$ are neutralized by the $\mathrm{M^+}$ ions, and the $\mathrm{B^+}$ correspond to $\mathrm{M^+}$. Then, the concentration of $\mathrm{B^+}$ also corresponds to the concentration of $\mathrm{M^+}$ ions. Now, again only few ions are missing in the simulation box, as compared to the real solution: Few $\mathrm{OH^-}$ ions, and few $\mathrm{H^+}$ ions.

To achieve a basic $\mathrm{pH}$ we need to add even more base to the system to neutralize the polymer. Again, there are 4 ionic species in the solution: $\mathrm{H^+}$, $\mathrm{A^-}$, $\mathrm{OH^-}$, and $\mathrm{M^+}$ and we can not clearly attribute a specific chemical identity to the $\mathrm{B^+}$ ions. Because only very few $\mathrm{H^+}$ ions should be present in the solution, we can make the approximation that at this $\mathrm{pH}$, all $\mathrm{A^-}$ ions are neutralized by the $\mathrm{M^+}$ ions, and therefore $\mathrm{B^+}$ ions in the simulation correspond to $\mathrm{M^+}$ ions in the real solution. Because additional $\mathrm{M^+}$ ions in the real solution neutralize the $\mathrm{OH^-}$ ions, the concentration of $\mathrm{B^+}$ does not correspond to the concentration of $\mathrm{M^+}$ ions. Now, again many ions are missing in the simulation box, as compared to the real solution: Few $\mathrm{H^+}$ ions, many $\mathrm{OH^-}$ ions, and a comparable amount of the $\mathrm{M^+}$ ions.

To further illustrate this subject, we compare the concentration of the neutralizer ion $\mathrm{B^+}$ calculated in the simulation with the expected number of ions of each species. At a given $\mathrm{pH}$ and $\mathrm{p}K_{\mathrm{A}}$ we can calculate the expected degree of ionization from the Henderson-Hasselbalch equation. Then we apply the electroneutrality condition $$c_\mathrm{A^-} + c_\mathrm{OH^-} + c_\mathrm{X^-} = c_\mathrm{H^+} + c_\mathrm{M^+}$$ where we use either $c_\mathrm{X^-}=0$ or $c_\mathrm{M^+}=0$ because we always only add extra acid or base, but never both. Adding both would be equivalent to adding extra salt $\mathrm{MX}$. We obtain the concentrations of $\mathrm{OH^-}$ and $\mathrm{H^+}$ from the input $\mathrm{pH}$ value, and substitute them to the electroneutrality equation to obtain $$\alpha c_\mathrm{acid} + 10^{-(\mathrm{p}K_{\mathrm{w}} - \mathrm{pH})} + 10^{-\mathrm{pH}} = c_\mathrm{M^+} - c_\mathrm{X^-}$$ Depending on whether the left-hand side of this equation is positive or negative we know whether we should add $\mathrm{M^+}$ or $\mathrm{X^-}$ ions.

In [23]:

```
# average concentration of B+ is the same as the concentration of A-
av_c_Bplus = av_alpha * C_ACID
err_c_Bplus = err_alpha * C_ACID # error in the average concentration
full_pH_range = np.linspace(2, 12, 100)
ideal_c_Aminus = ideal_alpha(full_pH_range, pKa) * C_ACID
ideal_c_OH = np.power(10.0, -(pKw - full_pH_range))*ureg.molar
ideal_c_H = np.power(10.0, -full_pH_range)*ureg.molar
# ideal_c_M is calculated from electroneutrality
ideal_c_M = (ideal_c_Aminus + ideal_c_OH - ideal_c_H)
# plot the simulation results compared with the ideal results of the cations
plt.figure(figsize=(10, 6), dpi=80)
plt.errorbar(pHs,
av_c_Bplus.magnitude,
err_c_Bplus.magnitude,
marker='o', c="tab:blue", linestyle='none',
label=r"measured $c_{\mathrm{B^+}}$", zorder=2)
plt.plot(full_pH_range,
ideal_c_H.magnitude,
c="tab:green",
label=r"ideal $c_{\mathrm{H^+}}$",
zorder=0)
plt.plot(full_pH_range[np.nonzero(ideal_c_M.magnitude > 0.)],
ideal_c_M.magnitude[np.nonzero(ideal_c_M.magnitude > 0.)],
c="tab:orange",
label=r"ideal $c_{\mathrm{M^+}}$",
zorder=0)
plt.plot(full_pH_range,
ideal_c_Aminus.magnitude,
c="tab:blue",
ls=(0, (5, 5)),
label=r"ideal $c_{\mathrm{A^-}}$",
zorder=1)
plt.yscale("log")
plt.ylim(5e-6,1e-2)
plt.xlabel('input pH', fontsize=16)
plt.ylabel(r'concentration $c$ $[\mathrm{mol/L}]$', fontsize=16)
plt.legend(fontsize=16)
plt.show()
```

The plot shows that at intermediate $\mathrm{pH}$ the concentration of $\mathrm{B^+}$ ions is approximately equal to the concentration of $\mathrm{M^+}$ ions. Only at one specific $\mathrm{pH}$ the concentration of $\mathrm{B^+}$ ions is equal to the concentration of $\mathrm{H^+}$ ions. This is the $\mathrm{pH}$ one obtains when dissolving the weak acid $\mathrm{A}$ in pure water.

In an ideal system, the ions missing in the simulation have no effect on the ionization degree. In an interacting system, the presence of ions in the box affects the properties of other parts of the system. Therefore, in an interacting system this discrepancy is harmless only at intermediate $\mathrm{pH}$. The effect of the small ions on the rest of the system can be estimated from the overall the ionic strength. $$ I = \frac{1}{2}\sum_i c_i z_i^2 $$

In [24]:

```
ideal_c_X = -(ideal_c_Aminus + ideal_c_OH - ideal_c_H)
ideal_ionic_strength = 0.5 * \
(ideal_c_X + ideal_c_M + ideal_c_H + ideal_c_OH + 2 * C_SALT)
# in constant-pH simulation ideal_c_Aminus = ideal_c_Bplus
cpH_ionic_strength = 0.5 * (ideal_c_Aminus + 2 * C_SALT)
cpH_ionic_strength_measured = 0.5 * (av_c_Bplus + 2 * C_SALT)
cpH_error_ionic_strength_measured = 0.5 * err_c_Bplus
plt.figure(figsize=(10, 6), dpi=80)
plt.errorbar(pHs,
cpH_ionic_strength_measured.magnitude,
cpH_error_ionic_strength_measured.magnitude,
c="tab:blue",
linestyle='none', marker='o',
label=r"measured", zorder=3)
plt.plot(full_pH_range,
cpH_ionic_strength.magnitude,
c="tab:blue",
ls=(0, (5, 5)),
label=r"constant-pH", zorder=2)
plt.plot(full_pH_range,
ideal_ionic_strength.magnitude,
c="tab:orange",
linestyle='-',
label=r"Henderson-Hasselbalch", zorder=1)
plt.ylim(1.8e-3,3e-3)
plt.xlabel('input pH', fontsize=16)
plt.ylabel(r'Ionic Strength [$\mathrm{mol/L}$]', fontsize=16)
plt.legend(fontsize=16)
plt.show()
```

Try changing the concentration of ionizable species in the non-interacting system. You should observe that it does not affect the obtained titration.

Try changing the number of samples and the number of particles to see how the estimated error and the number of uncorrelated samples will change. Be aware that if the number of uncorrelated samples is low, the error estimation is too optimistic.

Try running the same simulations with steric repulsion and then again with electrostatic interactions. Observe how the ionization equilibrium is affected by various interactions. Warning: simulations with electrostatics are much slower. If you want to obtain your results more quickly, then decrease the number of $\mathrm{pH}$ values.

[Janke2002] Janke W. Statistical Analysis of Simulations: Data Correlations and Error Estimation,
In Quantum Simulations of Complex Many-Body Systems: From Theory to Algorithms, Lecture Notes,
J. Grotendorst, D. Marx, A. Muramatsu (Eds.), John von Neumann Institute for Computing, Jülich,
NIC Series, Vol. 10, ISBN 3-00-009057-6, pp. 423-445, 2002, link.

[Landsgesell2019] Landsgesell, J.; Nová, L.; Rud, O.; Uhlík, F.; Sean, D.; Hebbeker, P.; Holm, C.; Košovan, P. Simulations of Ionization Equilibria in Weak Polyelectrolyte Solutions and Gels. Soft Matter 2019, 15 (6), 1155–1185, doi:10.1039/C8SM02085J.

[Reed1992] Reed, C. E.; Reed, W. F. Monte Carlo Study of Titration of Linear Polyelectrolytes. The Journal of Chemical Physics 1992, 96 (2), 1609–1620, doi:10.1063/1.462145.

[Smith1994] Smith, W. R.; Triska, B. The Reaction Ensemble Method for the Computer Simulation of Chemical and Phase Equilibria. I. Theory and Basic Examples. The Journal of Chemical Physics 1994, 100 (4), 3019–3027, doi:10.1063/1.466443.

[Israelachvili2011] Israelachvili, J. N. Intermolecular and Surface Forces, Third Edition, 2011, Academic Press, ISBN 978-0-12-391927-4, doi:10.1016/C2011-0-05119-0.