14. Input and Output¶
14.1. No generic checkpointing¶
One of the most asked-for feature that seems to be missing in is checkpointing, a simple way to tell to store and restore the current state of the simulation, and to be able to write this state to or read it from a file. This would be most useful to be able to restart a simulation from a specific point in time.
Unfortunately, it is impossible to provide a simple command
(checkpoint
), out of two reasons. The main reason is that has no way
to determine what information constitutes the actual state of the
simulation. On the one hand, scripts sometimes use Tcl-variables that
contain essential information about a simulation, the stored values of
an observable that was computed in previous time steps, counters, etc.
These would have to be contained in a checkpoint. However, not all
Tcl-variables are of interest. For example, Tcl has a number of
automatically set variables that contain information about the hostname,
the machine type, etc. These variables should most probably not be
included the simulation state. has no way to distinguish between these
variables. On the other hand, the core has a number of internal
variables, the particle coordinates. While most of these are probably
good candidates for being included into a checkpoint, this is not
necessarily so. For example, when you have particles in your system that
have fixed coordinates, should these be stored in a checkpoint, or not?
If the system contains mostly fixed particles and only very few moving
particles, this would increase the memory size of a checkpoint
needlessly. And what about the interactions in the system, or the bonds?
Should these be stored in a checkpoint, or are they generated by the
script?
Another problem with a generic checkpoint would be the control flow of the script. In principle, the checkpoint would have to store where in the script the checkpointing function was called to be able to return there. All this is even further complicated by the fact that is running in parallel.
Instead, in ESPResSo the user has to specify what information needs to be saved to a file to be able to restore the simulation state. When floating point numbers are stored in text files (the particle positions), there is only a limited precision. Therefore, it is not possible to bitwise reproduce a simulation state using this text files. When you need bitwise reproducibility, you will have to use checkpointing , which stores positions, forces and velocities in binary format.
14.2. (Almost) generic checkpointing in Python¶
Referring to the previous section, generic checkpointing poses difficulties in many ways. Fortunately, the Python checkpointing module presented in this section provides a comfortable workflow for an almost generic checkpointing.
The idea is to let the user initially define which data is of interest for checkpointing and thus solve the above mentioned problem. Once this is done, checkpoints can then be saved simply by calling one save function.
The checkpoint data can then later be restored easily by calling one load function that will automatically process the checkpoint data by setting the user variables and the checkpointed properties in .
In addition, the checkpointing module is also able to catch signals that are invoked for example when the simulation is aborted by the user or by a timeout.
The checkpointing module can be imported with:
from espressomd import checkpointing
[ checkpoint_path= ]
Determines the identifier for a checkpoint. Legal characters for an id are “0-9”, “a-zA-Z”, “-“, “_”.
Specifies the relative or absolute path where the checkpoints are stored.
For example checkpoint = checkpointing.Checkpoint(checkpoint_id="mycheckpoint")
would create the new checkpoint with id “mycheckpoint” and all the
checkpointing data will be stored in the current directory.
After the system and checkpointing user variables are set up they can be registered for checkpointing. Name the string of the object or user variable that should be registered for checkpointing.
To give an example:
myvar = "some variable value"
skin = 0.4
checkpoint.register("myvar")
checkpoint.register("skin")
system = espressomd.System(box_l=[100.0, 100.0, 100.0])
# ... set system properties like time_step here ...
checkpoint.register("system")
system.thermostat.set_langevin(kT=1.0, gamma=1.0)
checkpoint.register("system.thermostat")
# ... set system.non_bonded_inter here ...
checkpoint.register("system.non_bonded_inter")
# ... add particles to the system with system.part.add(...) here ...
checkpoint.register("system.part")
# ... set charges of particles here ...
from espressomd import electrostatics
p3m = electrostatics.P3M(prefactor=1.0, accuracy=1e-2)
system.actors.add(p3m)
checkpoint.register("p3m")
will register the user variables skin
and myvar
, system properties, a
Langevin thermostat, non-bonded interactions, particle properties and a p3m
object for checkpointing. It is important to note that the checkpointing of
ESPResSo will only save basic system properties. This excludes for example the
system thermostat or the particle data. For this reason one has to explicitly
register and for checkpointing.
Analogous to this, objects that have been registered for checkpointing but are
no longer needed in the next checkpoints can be unregistered with checkpoint
unregister var
. A list of all registered object names can be generated with
checkpoint get_registered_objects
. A new checkpoint with a consecutive
index that contains the latest data of the registered objects can then be
created by calling checkpoint save [checkpoint_index]
.
An existing checkpoint can be loaded with checkpoint load
[checkpoint_index]
.
If no is passed the last checkpoint will be loaded. Concerning the procedure of registering objects for checkpointing it is good to know that all registered objects saved in a checkpoint will be automatically re-registered after loading this checkpoint.
In practical implementations it might come in handy to check if there are any
available checkpoints for a given checkpoint id. This can be done with
checkpoint has_checkpoints
which returns a bool value.
As mentioned in the introduction, the checkpointing module also enables
to catch signals in order to save a checkpoint and quit the simulation.
Therefore one has to register the signal which should be caught with
checkpoint register_signal signum=int_number
.
The registered signals are associated with the checkpoint id and will be automatically re-registered when the same checkpoint id is used later.
Following the example above, the next example loads the last checkpoint, restores the state of all checkpointed objects and registers a signal.
import espressomd
from espressomd import checkpointing
import signal
checkpoint = checkpointing.Checkpoint(checkpoint_id="mycheckpoint")
checkpoint.load()
system = espressomd.System(box_l=[100.0, 100.0, 100.0])
system.cell_system.skin = skin
system.actors.add(p3m)
# signal.SIGINT: signal 2, is sent when ctrl+c is pressed
checkpoint.register_signal(signal.SIGINT)
# integrate system until user presses ctrl+c while True:
system.integrator.run(1000)
The above example runs as long as the user interrupts by pressing ctrl+c. In this case a new checkpoint is written and the simulation quits.
It is perhaps surprising that one has to explicitly create System
again.
But this is necessary as not all ESPResSo modules like cellsystem
or
actors
have implementations for checkpointing yet. By calling System
these modules
are created and can be easily initialized with checkpointed user variables
(like skin
) or checkpointed submodules (like p3m
).
14.3. Writing H5MD-files¶
For large amounts of data it’s a good idea to store it in the hdf5 (H5MD is based on hdf5) file format (see https://www.hdfgroup.org/ for details). Currently ESPResSo supports some basic functions for writing simulation data to H5MD files. The implementation is MPI-parallelized and is capable of dealing with varying numbers of particles.
To write data in a hdf5-file according to the H5MD proposal (http://nongnu.org/h5md/), first an object of the class
espressomd.io.writer.h5md.H5md
has to be created and linked to the
respective hdf5-file. This may, for example, look like:
from espressomd.io.writer import h5md
system = espressomd.System(box_l=[100.0, 100.0, 100.0])
# ... add particles here
h5 = h5md.H5md(filename="trajectory.h5", write_pos=True, write_vel=True)
If a file with the given filename exists and has a valid H5MD structures
it will be backed up to a file with suffix “.bak”. This file will be
removed by the close()
method of the class which has to be called at the
end of the simulation to close the file. The current implementation
allows to write the following properties: positions, velocities, forces,
species (ESPResSo types), and masses of the particles. In order to write any property, you
have to set the respective boolean flag as an option to the H5md
class.
Currently available:
write_pos
: particle positionswrite_vel
: particle velocitieswrite_force
: particle forceswrite_species
: particle typeswrite_mass
: particle masseswrite_ordered
: if particles should be written ordered according to their id (implies serial write).
In simulations with varying numbers of particles (MC or reactions), the
size of the dataset will be adapted if the maximum number of particles
increases but will not be decreased. Instead a negative fill value will
be written to the trajectory for the id. If you have a parallel
simulation please keep in mind that the sequence of particles in general
changes from timestep to timestep. Therefore you have to always use the
dataset for the ids to track which position/velocity/force/type/mass
entry belongs to which particle. To write data to the hdf5 file, simply
call the H5md objects espressomd.io.writer.h5md.H5md.write()
method without any arguments.
h5.write()
After the last write call, you have to call the close()
method to remove
the backup file and to close the datasets etc.
H5MD files can be read and modified with the python module h5py (for documentation see h5py). For example all positions stored in the file called “h5mdfile.h5” can be read using
import h5py
h5file = h5py.File("h5mdfile.h5", 'r')
positions = h5file['particles/atoms/position/value']
Further the files can be inspected with the GUI tool hdfview.
14.4. Writing MPI-IO binary files¶
This method outputs binary data in parallel and is, thus, also suitable for large-scale simulations. Generally, H5MD is the preferred method because the data is easier accessible. In contrast to H5MD, the MPI-IO functionality outputs data in a machine-dependent format but has write and read capabilities. The usage is quite simple:
from espressomd.io.mppiio import mpiio
system = espressomd.System()
# ... add particles here
mpiio.write("/tmp/mydata", positions=True, velocities=True, types=True, bonds=True)
Here, /tmp/mydata
is the prefix used for several files. The call will output
particle positions, velocities, types and their bonds to the following files in
folder /tmp
:
mydata.head
mydata.id
mydata.pos
mydata.pref
mydata.type
mydata.vel
mydata.boff
mydata.bond
Depending on the chosen output, not all of these files might be created.
To read these in again, simply call espressomd.io.mpiio.Mpiio.read()
. It has the same signature as
espressomd.io.mpiio.Mpiio.write()
.
There exists a legacy python script in the tools
directory which can convert
MPI-IO data to the now unsupported blockfile format. Check it out if you want
to post-process the data without ESPResSo.
WARNING Do not attempt to read these binary files on a machine with a different architecture!
14.5. Writing VTF files¶
The formats VTF (VTF Trajectory Format), VSF (VTF Structure Format) and VCF (VTF Coordinate Format) are formats for the visualization software VMD: [HDS96]. They are intended to be human-readable and easy to produce automatically and modify.
The format distinguishes between structure blocks that contain the topological information of the system (the system size, particle names, types, radii and bonding information, amongst others), while coordinate blocks (a.k.a. as timestep blocks) contain the coordinates for the particles at a single timestep. For a visualization with VMD, one structure block and at least one coordinate block is required.
Files in the VSF format contain a single structure block, files in the VCF format contain at least one coordinate block, while files in the VTF format contain a single structure block (usually as a header) and an arbitrary number of coordinate blocks (time frames) afterwards, thus allowing to store all information for a whole simulation in a single file. For more details on the format, refer to the VTF homepage (https://github.com/olenz/vtfplugin/wiki).
Creating files in these formats from within is supported by the commands espressomd.io.writer.vtf.writevsf()
and espressomd.io.writer.vtf.writevcf()
, that write a structure and coordinate block (respectively) to the
given file. To create a standalone VTF file, first use writevsf
at the beginning of
the simulation to write the particle definitions as a header, and then writevcf
to generate a timeframe of the simulation state. For example:
A standalone VTF file can simply be
import espressomd
from espressomd.io.writer import vtf
system = espressomd.System(box_l=[100.0, 100.0, 100.0])
fp = open('trajectory.vtf', mode='w+t')
# ... add particles here
# write structure block as header
vtf.writevsf(system, fp)
# write initial positions as coordinate block
vtf.writevcf(system, fp)
# integrate and write the frame
for n in num_steps:
system.integrator.run(100)
vtf.writevcf(system, fp)
fp.close()
The structure definitions in the VTF/VSF formats are incremental, the user can easily add further structure lines to the VTF/VSF file after a structure block has been written to specify further particle properties for visualization.
Note that the ids
of the particles in ESPResSo and VMD may differ. VMD requires
the particle ids to be enumerated continuously without any holes, while
this is not required in ESPResSo. When using writevsf
and writevcf
, the particle ids are
automatically translated into VMD particle ids. The function allows the
user to get the VMD particle id for a given ESPResSo particle id.
One can specify the coordinates of which particles should be written using types
.
If types='all'
is used, all coordinates will be written (in the ordered timestep format).
Otherwise, has to be a list specifying the pids of the particles.
Also note, that these formats can not be used to write trajectories where the number of particles or their types varies between the timesteps. This is a restriction of VMD itself, not of the format.
14.5.1. writevsf
: Writing the topology¶
espressomd.io.writer.vtf.writevsf()
Writes a structure block describing the system’s structure to the given channel, for example:
import espressomd
from espressomd.io.writer import vtf
system = espressomd.System(box_l=[100.0, 100.0, 100.0])
# ... add particles here
fp = open('trajectory.vsf', mode='w+t')
vtf.writevsf(system, fp, types='all')
The output of this command can be used for a standalone VSF file, or at the beginning of a VTF file that contains a trajectory of a whole simulation.
14.5.2. writevcf
: Writing the coordinates¶
espressomd.io.writer.vtf.writevcf()
Writes a coordinate (or timestep) block that contains all coordinates of the system’s particles.
import espressomd
from espressomd.io.writer import vtf
system = espressomd.System(box_l=[100.0, 100.0, 100.0])
# ... add particles here
fp = open('trajectory.vcf', mode='w+t')
vtf.writevcf(system, fp, types='all')
14.5.3. espressomd.io.writer.vtf.vtf_pid_map()
¶
Generates a dictionary which maps ESPResSo particle id
to VTF indices.
This is motivated by the fact that the list of ESPResSo particle id
is allowed to contain holes but VMD
requires increasing and continuous indexing. The ESPResSo id
can be used as key to obtain the VTF index as the value, for example:
import espressomd
from espressomd.io.writer import vtf
system = espressomd.System(box_l=[100.0, 100.0, 100.0])
system.part.add(id=5, pos=[0, 0, 0])
system.part.add(id=3, pos=[0, 0, 0])
vtf_index = vtf.vtf_pid_map(system)
vtf_index[3]
Note that the ESPResSo particles are ordered in increasing order, thus id=3
corresponds to the zeroth VTF index.
14.6. Writing various formats using MDAnalysis¶
If the MDAnalysis package (http://mdanalysis.org) is installed, it is possible to use it to convert frames to any of the supported configuration/trajectory formats, including PDB, GROMACS, GROMOS, CHARMM/NAMD, AMBER, LAMMPS, …)
To use MDAnalysis to write in any of these formats, one has first to prepare a stream from
the ESPResSo particle data using the class espressomd.MDA_ESP
, and then read from it
using MDAnalysis. A simple example is the following:
import espressomd
import MDAnalysis as mda
from espressomd import MDA_ESP
system = espressomd.System(box_l=[100.0, 100.0, 100.0])
# ... add particles here
eos = MDA_ESP.Stream(system) # create the stream
u = mda.Universe(eos.topology, eos.trajectory) # create the MDA universe
# example: write a single frame to PDB
u.atoms.write("system.pdb")
# example: save the trajectory to GROMACS format
from MDAnalysis.coordinates.TRR import TRRWriter
W = TRRWriter("traj.trr", n_atoms=len(system.part)) # open the trajectory file
for i in range(100):
system.integrator.run(1)
u.load_new(eos.trajectory) # load the frame to the MDA universe
W.write_next_timestep(u.trajectory.ts) # append it to the trajectory
For other examples see samples/python/MDAnalysisIntegration.py
14.7. Parsing PDB Files¶
The feature allows the user to parse simple PDB files, a file format introduced by the protein database to encode molecular structures. Together with a topology file (here ) the structure gets interpolated to the grid. For the input you will need to prepare a PDB file with a force field to generate the topology file. Normally the PDB file extension is .pdb
, the topology file extension is .itp
. Obviously the PDB file is placed instead of and the topology file instead of .