Introduction

This tutorial introduces the basic features for simulating titratable systems via the constant pH method. The constant pH method is one of the methods implemented for simulating systems with chemical reactions within the Reaction Ensemble module. It is a Monte Carlo method designed to model an acid-base ionization reaction at a given (fixed) value of solution pH.

We will consider a homogeneous aqueous solution of a titratable acidic species $\mathrm{HA}$ that can dissociate in a reaction, that is characterized by the equilibrium constant $\mathrm{p}K_A=-\log_{10} K_A$ $$\mathrm{HA} \Leftrightarrow \mathrm{A}^- + \mathrm{H}^+$$

If $N_0 = N_{\mathrm{HA}} + N_{\mathrm{A}^-}$ is the number of titratable groups in solution, then we define the degree of dissociation $\alpha$ as:

$$\alpha = \dfrac{N_{\mathrm{A}^-}}{N_0}.$$

This is one of the key quantities that can be used to describe the acid-base equilibrium. Usually, the goal of the simulation is to predict the value of $\alpha$ under given conditions in a complex system with interactions.

The Chemical Equilibrium and Reaction Constant

The equilibrium reaction constant describes the chemical equilibrium of a given reaction. The values of equilibrium constants for various reactions can be found in tables. For the acid-base ionization reaction, the equilibrium constant is conventionally called the acidity constant, and it is defined as \begin{equation} K_A = \frac{a_{\mathrm{H}^+} a_{\mathrm{A}^-} } {a_{\mathrm{HA}}} \end{equation} where $a_i$ is the activity of species $i$. It is related to the chemical potential $\mu_i$ and to the concentration $c_i$ \begin{equation} \mu_i = \mu_i^\mathrm{ref} + k_{\mathrm{B}}T \ln a_i \,,\qquad a_i = \frac{c_i \gamma_i}{c^{\ominus}}\,, \end{equation} where $\gamma_i$ is the activity coefficient, and $c^{\ominus}$ is the (arbitrary) reference concentration, often chosen to be the standard concentration, $c^{\ominus} = 1\,\mathrm{mol/L}$, and $\mu_i^\mathrm{ref}$ is the reference chemical potential. Note that $K$ is a dimensionless quantity but its numerical value depends on the choice of $c^0$. For an ideal system, $\gamma_i=1$ by definition, whereas for an interacting system $\gamma_i$ is a non-trivial function of the interactions. For an ideal system we can rewrite $K$ in terms of equilibrium concentrations \begin{equation} K_A \overset{\mathrm{ideal}}{=} \frac{c_{\mathrm{H}^+} c_{\mathrm{A}^-} } {c_{\mathrm{HA}} c^{\ominus}} \end{equation}

The ionization degree can also be expressed via the ratio of concentrations: \begin{equation} \alpha = \frac{N_{\mathrm{A}^-}}{N_0} = \frac{N_{\mathrm{A}^-}}{N_{\mathrm{HA}} + N_{\mathrm{A}^-}} = \frac{c_{\mathrm{A}^-}}{c_{\mathrm{HA}}+c_{\mathrm{A}^-}} = \frac{c_{\mathrm{A}^-}}{c_{\mathrm{A}}}. \end{equation} where $c_{\mathrm{A}}=c_{\mathrm{HA}}+c_{\mathrm{A}^-}$ is the total concentration of titratable acid groups irrespective of their ionization state. Then, we can characterize the acid-base ionization equilibrium using the ionization degree and pH, defined as \begin{equation} \mathrm{pH} = -\log_{10} a_{\mathrm{H^{+}}} \overset{\mathrm{ideal}}{=} -\log_{10} (c_{\mathrm{H^{+}}} / c^{\ominus}) \end{equation} Substituting for the ionization degree and pH into the expression for $K_A$ we obtain the Henderson-Hasselbalch equation \begin{equation} \mathrm{pH}-\mathrm{p}K_A = \log_{10} \frac{\alpha}{1-\alpha} \end{equation} One result of the Henderson-Hasselbalch equation is that at a fixed pH value the ionization degree of an ideal acid is independent of concentration. Another implication is, that the degree of ionization does not depend on the absolute values of $\mathrm{p}K_A$ and $\mathrm{pH}$, but only on their difference, $\mathrm{pH}-\mathrm{p}K_A$.

Constant pH Method

The constant pH method Reed1992 is designed to simulate an acid-base ionization reaction at a given pH. It assumes that the simulated system is coupled to an implicit reservoir of $\mathrm{H^+}$ ions but exchange of ions with this reservoir is not explicitly simulated. Therefore, the concentration of ions in the simulation box is not equal to the concentration of $\mathrm{H^+}$ ions at the chosen pH. This may lead to artifacts when simulating interacting systems, especially at high of low pH values. Discussion of these artifacts is beyond the scope of this tutorial (see e.g. Landsgesell2019 for further details).

In ESPResSo, the forward step of the ionization reaction (from left to right) is implemented by changing the chemical identity (particle type) of a randomly selected $\mathrm{HA}$ particle to $\mathrm{A}^-$, and inserting another particle that represents a neutralizing counterion. The neutralizing counterion is not necessarily an $\mathrm{H^+}$ ion. Therefore, we give it a generic name $\mathrm{B^+}$. In the reverse direction (from right to left), the chemical identity (particle type) of a randomly selected $\mathrm{A}^{-}$ is changed to $\mathrm{HA}$, and a randomly selected $\mathrm{B}^+$ is deleted from the simulation box. The probability of proposing the forward reaction step is $P_\text{prop}=N_\mathrm{HA}/N_0$, and probability of proposing the reverse step is $P_\text{prop}=N_\mathrm{A}/N_0$. The trial move is accepted with the acceptance probability

$$ P_{\mathrm{acc}} = \operatorname{min}\left(1, \exp(-\beta \Delta E_\mathrm{pot} \pm \ln(10) \cdot (\mathrm{pH - p}K_A) ) \right)$$

Here $\Delta E_\text{pot}$ is the potential energy change due to the reaction, while $\text{pH - p}K$ is an input parameter. The signs $\pm 1$ correspond to the forward and reverse direction of the ionization reaction, respectively.

Setup

The inputs that we need to define our system in the simulation include

From the concentration of titratable units and the number of titratable units we calculate the box length. We create a system with this box size. From the salt concentration we calculate the number of additional salt ion pairs that should be present in the system. We set the dissociation constant of the acid to $\mathrm{p}K_A=4.88$, that is the acidity constant of propionic acid. We choose propionic acid because its structure is closest to the repeating unit of poly(acrylic acid), the most commonly used weak polyacid.

We will simulate multiple pH values, the range of which is determined by the parameters offset and num_pHs.

After defining the simulation parameters, we set up the system that we want to simulate. It is a polyelectrolyte chain with some added salt that is used to control the ionic strength of the solution. For the first run, we set up the system without any steric repulsion and without electrostatic interactions. In the next runs, we will add the steric repulsion and electrostatic interactions to observe their effect on the ionization.

After setting creating the particles we initialize the reaction ensemble by setting the temperature, exclusion radius and seed of the random number generator. We set the temperature to unity, that determines that our reduced unit of energy will be $\varepsilon=1k_{\mathrm{B}}T$. In an interacting system the exclusion radius ensures that particle insertions too close to other particles are not attempted. Such insertions would make the subsequent Langevin dynamics integration unstable. If the particles are not interacting, we can set the exclusion radius to $0.0$. Otherwise, $1.0$ is a good value. We set the seed to a constant value to ensure reproducible results.

The next step is to define the reaction system. The order in which species are written in the lists of reactants and products is very important for ESPResSo. When a reaction move is performed, identity of the first species in the list of reactants is changed to the first species in the list of products, the second reactant species is changed to the second product species, and so on. If the reactant list has more species than the product list, then excess reactant species are deleted from the system. If the product list has more species than the reactant list, then product the excess product species are created and randomly placed inside the simulation box. This convention is especially important if some of the species belong to a chain-like molecule, and cannot be placed at an arbitrary position.

In the example below, the order of reactants and products ensures that identity of $\mathrm{HA}$ is changed to $\mathrm{A^{-}}$ and vice versa, while $\mathrm{H^{+}}$ is inserted/deleted in the reaction move. Reversing the order of products in our reaction (i.e. from product_types=[TYPE_B, TYPE_A] to product_types=[TYPE_A, TYPE_B]), would result in a reaction move, where the identity HA would be changed to $\mathrm{H^{+}}$, while $\mathrm{A^{-}}$ would be inserted/deleted at a random position in the box. We also assign charges to each type because the charge will play an important role later, in simulations with electrostatic interactions.

Next, we perform simulations at different pH values. The system must be equilibrated at each pH before taking samples. Calling RE.reaction(X) attempts in total X reactions (in both backward and forward direction).

Results

Finally we plot our results and compare them to the analytical results obtained from the Henderson-Hasselbalch equation.

Statistical Uncertainty

The molecular simulation produces a sequence of snapshots of the system, that constitute a Markov chain. It is a sequence of realizations of a random process, where the next value in the sequence depends on the preceding one. Therefore, the subsequent values are correlated. To estimate statistical error of the averages determined in the simulation, one needs to correct for the correlations.

Here, we will use a rudimentary way of correcting for correlations, termed the binning method. We refer the reader to specialized literature for a more sophisticated discussion, for example Janke2002. The general idea is to group a long sequence of correlated values into a rather small number of blocks, and compute an average per each block. If the blocks are big enough, they can be considered uncorrelated, and one can apply the formula for standard error of the mean of uncorrelated values. If the number of blocks is small, then they are uncorrelated but the obtained error estimates has a high uncertainty. If the number of blocks is high, then they are too short to be uncorrelated, and the obtained error estimates are systematically lower than the correct value. Therefore, the method works well only if the sample size is much greater than the autocorrelation time, so that it can be divided into a sufficient number of mutually uncorrelated blocks.

In the example below, we use a fixed number of 16 blocks to obtain the error estimates.

The simulation results for the non-interacting case match very well with the analytical solution of Henderson-Hasselbalch equation. There are only minor deviations, and the estimated errors are small too. This situation will change when we introduce interactions.

It is useful to check whether the estimated errors are consistent with the assumptions that were used to obtain them. To do this, we follow Janke2002 to estimate the number of uncorrelated samples per block, and check whether each block contains a sufficient number of uncorrelated samples (we choose 10 uncorrelated samples per block as the threshold value).

Intentionally, we made our simulation slightly too short, so that it does not produce enough uncorrelated samples. We encourage the reader to vary the number of blocks or the number of samples to see how the estimated error changes with these parameters.

To look in more detail at the statistical accuracy, it is useful to plot the deviations from the analytical result. This provides another way to check the consistency of error estimates. About 68% of the results should be within one error bar from the analytical result, whereas about 95% of the results should be within two times the error bar. Indeed, if you plot the deviations by running the script below, you should observe that most of the results are within one error bar from the analytical solution, a smaller fraction of the results is slightly further than one error bar, and one or two might be about two error bars apart. Again, this situation will change when we introduce interactions because the ionization of the interacting system should deviate from the Henderson-Hasselbalch equation.

The Neutralizing Ion $\mathrm{B^+}$

Up to now we did not discuss the chemical nature the neutralizer $\mathrm{B^+}$. The added salt is not relevant in this context, therefore we omit it from the discussion. The simplest case to consider is what happens if you add the acidic polymer to pure water ($\mathrm{pH} = 7$). Some of the acid groups dissociate and release $\mathrm{H^+}$ ions into the solution. The pH decreases to a value that depends on $\mathrm{p}K_{\mathrm{A}}$ and on the concentration of ionizable groups. Now, three ionic species are present in the solution: $\mathrm{H^+}$, $\mathrm{A^-}$, and $\mathrm{OH^-}$. Because the reaction generates only one $\mathrm{B^+}$ ion in the simulation box, we conclude that in this case the $\mathrm{B^+}$ ions correspond to $\mathrm{H^+}$ ions. The $\mathrm{H^+}$ ions neutralize both the $\mathrm{A^-}$ and the $\mathrm{OH^-}$ ions. At acidic pH there are only very few $\mathrm{OH^-}$ ions and nearly all $\mathrm{H^+}$ ions act as a neutralizer for the $\mathrm{A^-}$ ions. Therefore, the concentration of $\mathrm{B^+}$ is very close to the concentration of $\mathrm{H^+}$ in the real aqueous solution. Only very few $\mathrm{OH^-}$ ions, and the $\mathrm{H^+}$ ions needed to neutralize them, are missing in the simulation box, when compared to the real solution.

To achieve a more acidic pH (with the same pK and polymer concentration), we need to add an acid to the system. We can do that by adding a strong acid, such as $\mathrm{HCl}$ or $\mathrm{HNO}_3$. We will denote this acid by a generic name $\mathrm{HX}$ to emphasize that in general its anion can be different from the salt anion $\mathrm{Cl^{-}}$. Now, there are 4 ionic species in the solution: $\mathrm{H^+}$, $\mathrm{A^-}$, $\mathrm{OH^-}$, and $\mathrm{X^-}$ ions. By the same argument as before, we conclude that $\mathrm{B^+}$ ions correspond to $\mathrm{H^+}$ ions. The $\mathrm{H^+}$ ions neutralize the $\mathrm{A^-}$, $\mathrm{OH^-}$, and the $\mathrm{X^-}$ ions. Because the concentration of $\mathrm{X^-}$ is not negligible anymore, the concentration of $\mathrm{B^+}$ in the simulation box differs from the $\mathrm{H^+}$ concentration in the real solution. Now, many more ions are missing in the simulation box, as compared to the real solution: Few $\mathrm{OH^-}$ ions, many $\mathrm{X^-}$ ions, and all the $\mathrm{H^+}$ ions that neutralize them.

To achieve a neutral pH we need to add some base to the system to neutralize the polymer. In the simplest case we add an alkali metal hydroxide, such as $\mathrm{NaOH}$ or $\mathrm{KOH}$, that we will generically denote as $\mathrm{MOH}$. Now, there are 4 ionic species in the solution: $\mathrm{H^+}$, $\mathrm{A^-}$, $\mathrm{OH^-}$, and $\mathrm{M^+}$. In such situation, we can not clearly attribute a specific chemical identity to the $\mathrm{B^+}$ ions. However, only very few $\mathrm{H^+}$ and $\mathrm{OH^-}$ ions are present in the system at $\mathrm{pH} = 7$. Therefore, we can make the approximation that at this pH, all $\mathrm{A^-}$ are neutralized by the $\mathrm{M^+}$ ions, and the $\mathrm{B^+}$ correspond to $\mathrm{M^+}$. Then, the concentration of $\mathrm{B^+}$ also corresponds to the concentration of $\mathrm{M^+}$ ions. Now, again only few ions are missing in the simulation box, as compared to the real solution: Few $\mathrm{OH^-}$ ions, and few $\mathrm{H^+}$ ions.

To achieve a basic pH we need to add even more base to the system to neutralize the polymer. Again, there are 4 ionic species in the solution: $\mathrm{H^+}$, $\mathrm{A^-}$, $\mathrm{OH^-}$, and $\mathrm{M^+}$ and we can not clearly attribute a specific chemical identity to the $\mathrm{B^+}$ ions. Because only very few $\mathrm{H^+}$ ions should be present in the solution, we can make the approximation that at this pH, all $\mathrm{A^-}$ ions are neutralized by the $\mathrm{M^+}$ ions, and therefore $\mathrm{B^+}$ ions in the simulation correspond to $\mathrm{M^+}$ ions in the real solution. Because additional $\mathrm{M^+}$ ions in the real solution neutralize the $\mathrm{OH^-}$ ions, the concentration of $\mathrm{B^+}$ does not correspond to the concentration of $\mathrm{M^+}$ ions. Now, again many ions are missing in the simulation box, as compared to the real solution: Few $\mathrm{H^+}$ ions, many $\mathrm{OH^-}$ ions, and a comparable amount of the $\mathrm{M^+}$ ions.

To further illustrate this subject, we compare the concentration of the neutralizer ion $\mathrm{B^+}$ calculated in the simulation with the expected number of ions of each species. At a given pH and pK we can calculate the expected degree of ionization from the Henderson-Hasselbalch equation. Then we apply the electroneutrality condition $$c_\mathrm{A^-} + c_\mathrm{OH^-} + c_\mathrm{X^-} = c_\mathrm{H^+} + c_\mathrm{M^+}$$ where we use either $c_\mathrm{X^-}=0$ or $c_\mathrm{M^+}=0$ because we always only add extra acid or base, but never both. Adding both would be equivalent to adding extra salt $\mathrm{MX}$. We obtain the concentrations of $\mathrm{OH^-}$ and $\mathrm{H^+}$ from the input pH value, and substitute them to the electroneutrality equation to obtain $$\alpha c_\mathrm{acid} + 10^{-(\mathrm{p}K_\mathrm{w} - \mathrm{pH})} + 10^{-\mathrm{pH}} = c_\mathrm{M^+} - c_\mathrm{X^-}$$ Depending on whether the left-hand side of this equation is positive or negative we know whether we should add $\mathrm{M^+}$ or $\mathrm{X^-}$ ions.

The plot shows that at intermediate pH the concentration of $\mathrm{B^+}$ ions is approximately equal to the concentration of $\mathrm{M^+}$ ions. Only at one specific $\mathrm{pH}$ the concentration of $\mathrm{B^+}$ ions is equal to the concentration of $\mathrm{H^+}$ ions. This is the pH one obtains when dissolving the weak acid $\mathrm{A}$ in pure water.

In an ideal system, the ions missing in the simulation have no effect on the ionization degree. In an interacting system, the presence of ions in the box affects the properties of other parts of the system. Therefore, in an interacting system this discrepancy is harmless only at intermediate pH. The effect of the small ions on the rest of the system can be estimated from the overall the ionic strength. $$ I = \frac{1}{2}\sum_i c_i z_i^2 $$

We see that the ionic strength in the simulation box significantly deviates from the ionic strength of the real solution only at high or low pH value. If the $\mathrm{p}K_{\mathrm{A}}$ value is sufficiently large, then the deviation at very low pH can also be neglected because then the polymer is uncharged in the region where the ionic strength is not correctly represented in the constant-pH simulation. At a high pH the ionic strength will have an effect on the weak acid, because then it is fully charged. The pH range in which the constant-pH method uses approximately the right ionic strength depends on salt concentration, weak acid concentration and the $\mathrm{p}K_{\mathrm{A}}$ value. See also Landsgesell2019 for a more detailed discussion of this issue, and its consequences.

Suggested problems for further work

References

Janke2002 Janke W. Statistical Analysis of Simulations: Data Correlations and Error Estimation, In Quantum Simulations of Complex Many-Body Systems: From Theory to Algorithms, Lecture Notes, J. Grotendorst, D. Marx, A. Muramatsu (Eds.), John von Neumann Institute for Computing, Jülich, NIC Series, Vol. 10, ISBN 3-00-009057-6, pp. 423-445, 2002.

Landsgesell2019 Landsgesell, J.; Nová, L.; Rud, O.; Uhlík, F.; Sean, D.; Hebbeker, P.; Holm, C.; Košovan, P. Simulations of Ionization Equilibria in Weak Polyelectrolyte Solutions and Gels. Soft Matter 2019, 15 (6), 1155–1185.

Reed1992 Reed, C. E.; Reed, W. F. Monte Carlo Study of Titration of Linear Polyelectrolytes. The Journal of Chemical Physics 1992, 96 (2), 1609–1620.

Smith1994 Smith, W. R.; Triska, B. The Reaction Ensemble Method for the Computer Simulation of Chemical and Phase Equilibria. I. Theory and Basic Examples. The Journal of Chemical Physics 1994, 100 (4), 3019–3027.