GROMACS Trajectory Tutorial#

This tutorial explains in detail the 1k-water-100-frame_groxtc example input file, covering how ChemNetworks reads GROMACS trajectories and reduces a per-atom hydrogen bonding graph to a per-molecule network.

The System#

The trajectory is a box of 1,000 water molecules using a three-point model (3,000 atoms) simulated in GROMACS and stored across two files: a .gro topology file and a .xtc binary coordinate file. The .gro file supplies the molecular structure and atom naming for a single frame, while the .xtc file supplies the coordinates for all 100 frames. Read together as a groxtc trajectory, each frame provides the mol_id, mol_type, atom_id, atom_type,x, y, and z node attributes.

The first lines of the .gro file show the GROMACS atom naming convention, where every water residue (SOL) contains one oxygen (OW) and two distinct hydrogens (HW1, HW2):

SOL
 3000
    1SOL     OW    1   1.656   1.573   2.451  0.5751 -0.1557 -0.9540
    1SOL    HW1    2   1.730   1.516   2.429  0.1127 -1.2115  0.1140
    1SOL    HW2    3   1.599   1.568   2.374  1.1211 -0.1550 -1.3691
    2SOL     OW    4   1.287   2.932   2.616 -0.8504 -0.1453 -0.0601
    ...

The goal is to find every water-water hydrogen bond, build a graph in which each water molecule is a single node and each hydrogen bond is an edge, and then compute graph-theoretic properties (degree, betweenness, connected components) for every molecule.

SETTINGS Block#

[SETTINGS]
TIMESTEPS = 0 -1

TIMESTEPS = 0 -1 analyzes from the first frame through the last available frame, using the default stride of 1. The -1 placeholder for the final timestep avoids hardcoding the frame count, so every frame in the .xtc file is processed.

TRAJECTORY Block#

[TRAJECTORY]
NAME = 1k_water_100_frame_groxtc_example
FILE = production.gro production.xtc
TYPE = groxtc
SCALE = 10
PBC.DIM = read

NAME = 1k_water_100_frame_groxtc_example assigns an identifier that is referenced later by the ADD.NODES.TRAJECTORY command.

FILE = production.gro production.xtc lists two files. The groxtc reader combines them: the .gro file provides the per-atom structure and naming, and the .xtc file provides the coordinates for each frame. The order is significant, with the .gro topology first and the .xtc coordinates second.

TYPE = groxtc selects the combined GROMACS reader. Unlike a standalone .gro file, the groxtc type does not provide velocities even when they are present in the .gro file, because the coordinates are taken from the .xtc frames.

SCALE = 10 multiplies every coordinate by 10. GROMACS stores coordinates in nanometers, but the Z-matrix distance constraints below are written in angstroms. Scaling by 10 converts the trajectory to angstroms so that the dist ranges (such as 1.2 to 2.5) are interpreted in the correct unit.

PBC.DIM = read reads the simulation box dimensions from the trajectory each frame. For a groxtc trajectory, the box vectors are read from the .xtc frames. ChemNetworks then applies the minimum image convention to all distance calculations, correctly handling hydrogen bonds that span a periodic boundary.

ZMATRIX Block#

[ZMATRIX]
NAME = Water

ROW = $H
ROW = $O 1 $HB
ROW = $H 2 $CB 1 $ANG

$H = attr atom_name H
$O = attr atom_name O
$HB = dist 1.2 2.5
$CB = dist 0.0 1.2
$ANG = angle 135 180

This three-row ZMATRIX block describes a single hydrogen bond between two water molecules. Reading the rows:

Row 1 defines a hydrogen atom ($H).
Row 2 defines an oxygen ($O) within $HB (1.2 to 2.5 angstroms) of the row 1 hydrogen. This is the hydrogen bond itself, the acceptor oxygen of one molecule approached by the donor hydrogen of another.
Row 3 defines a second hydrogen ($H) within $CB (0.0 to 1.2 angstroms) of the row 2 oxygen, forming an angle $ANG (135 to 180 degrees) at that oxygen between the covalently bonded hydrogen (row 3) and the hydrogen-bonded hydrogen (row 1). The short $CB distance is the covalent O-H bond, and the wide angle keeps the hydrogen bond close to linear.

The $VAR = ... lines define each variable. attr atom_name H matches nodes whose atom_name attribute equals H. Note that the trajectory itself has no atom_name attribute, only the GROMACS atom_type values OW, HW1, and HW2. The atom_name attribute is created in the GRAPH block below, before the search runs. dist 0.0 1.2 matches pairs of nodes within that distance range, and angle 135 180 matches triplets forming an angle in that range. See Z-matrix Commands for all available commands.

GRAPH Block#

[GRAPH]
NAME = OH
ADD.NODES.TRAJECTORY = 1k_water_100_frame_groxtc_example

ADD.ATTRIBUTE.NODE.NODE = atom_type OW atom_name O
ADD.ATTRIBUTE.NODE.NODE = atom_type HW1 atom_name H
ADD.ATTRIBUTE.NODE.NODE = atom_type HW2 atom_name H

ZMATRIX_SEARCH = Water
ADD.EDGES.ZMATRIX = Water 2 1

ANALYSIS = merge_equal_nodes mol_id atom_type:ignore atom_name:ignore first
ANALYSIS = degree
ANALYSIS = betweenness node
ANALYSIS = connected_components

WRITE.EDGES = ./edg_files/edges
WRITE.NODES = ./nds_files/nodes
WRITE.NODES.ATTRIBUTES = degree betweenness connected_components

WRITE.ZMATRIX = Water ./zmt_files/zmatrices

ADD.NODES.TRAJECTORY = 1k_water_100_frame_groxtc_example populates the graph with all 3,000 atoms from the trajectory defined earlier. Each atom becomes a node carrying its trajectory attributes (mol_id, mol_type, atom_type, atom_id, x, y, z).

Unifying The Hydrogen Names#

ADD.ATTRIBUTE.NODE.NODE = atom_type OW atom_name O
ADD.ATTRIBUTE.NODE.NODE = atom_type HW1 atom_name H
ADD.ATTRIBUTE.NODE.NODE = atom_type HW2 atom_name H

GROMACS distinguishes the two hydrogens in each water molecule as HW1 and HW2, but the Z-matrix needs to treat them identically. Each ADD.ATTRIBUTE.NODE.NODE command sets a new atom_name attribute on every node matching a given atom_type: OW becomes O, while both HW1 and HW2 become H. After these three commands, the Z-matrix variables $O and $H can match against the single atom_name attribute regardless of the original GROMACS naming.

Because the search reads the atom_name attribute, these commands must appear before ZMATRIX_SEARCH. See Command Ordering for the full set of ordering rules.

Building Edges From The Search#

ZMATRIX_SEARCH = Water runs the hydrogen bond template against the graph, recording every set of atoms that satisfies it along with the calculated distances and angle.

ADD.EDGES.ZMATRIX = Water 2 1 creates an edge for every match between the node at row 2 (the acceptor oxygen) and the node at row 1 (the donor hydrogen). At this stage the graph is still per-atom, so each edge connects an oxygen to a hydrogen.

Reducing To A Per-Molecule Network#

ANALYSIS = merge_equal_nodes mol_id atom_type:ignore atom_name:ignore first
ANALYSIS = degree
ANALYSIS = betweenness node
ANALYSIS = connected_components

merge_equal_nodes mol_id ... contracts every node sharing a mol_id value into a single node, so each of the 1,000 water molecules collapses to one node and the per-atom hydrogen bond edges become molecule-to-molecule edges. The attribute combination atom_type:ignore atom_name:ignore first drops the now-ambiguous atom_type and atom_name attributes and keeps the remaining attributes from the first atom of each molecule.

The three remaining analyses operate on this per-molecule network. degree records each molecule's number of hydrogen bonds in the degree node attribute. betweenness node records node betweenness centrality in the betweenness attribute. connected_components labels each molecule with the ID of the hydrogen bonded cluster it belongs to in the connected_components attribute.

Writing Output#

WRITE.EDGES = ./edg_files/edges
WRITE.NODES = ./nds_files/nodes
WRITE.NODES.ATTRIBUTES = degree betweenness connected_components

WRITE.ZMATRIX = Water ./zmt_files/zmatrices

Unlike the Water-LiCl example, the write commands here appear directly under the [GRAPH] block rather than in a [GRAPH.WRITE] sub-block. Both forms are equivalent.

WRITE.EDGES = ./edg_files/edges writes the molecule-to-molecule edge list to ./edg_files/edges.0.edg, ./edg_files/edges.1.edg, and so on. WRITE.NODES = ./nds_files/nodes writes one node file per frame, and WRITE.NODES.ATTRIBUTES = degree betweenness connected_components restricts the node files to the three analysis results.

WRITE.ZMATRIX = Water ./zmt_files/zmatrices writes the raw hydrogen bond search results, including the matched atom indices and the calculated distances and angle for every match at each frame.

Running the Example#

From the example directory:

chemnetworks -i input.cn

for serial execution, or:

mpirun -np 4 chemnetworks -i input.cn

for parallel execution using MPI, which distributes the 100 frames across the available processes. The output is identical regardless of parallelization:

edg_files/
    edges.0.edg ... edges.99.edg
nds_files/
    nodes.0.nds ... nodes.99.nds
zmt_files/
    zmatrices.0.zmt ... zmatrices.99.zmt