ChemNetworks Code Workflow#
The workflow for ChemNetworks has three major phases, executed in order:
- Input file parsing
- Input file validation
- Timestep execution (repeated per frame, distributed via MPI)
Input File Parsing#
ChemNetworks begins execution in src/main.cpp, which initializes MPI, parses command line arguments, and calls
ChemNetworks::run() in src/chemnetworks.hpp.
The input file is read during command line argument parsing. Program_Options (src/io/program_options.hpp) handles
the -i <file> flag by passing the file path to Read_Input_File::parse_file() (src/io/read_input_file.hpp).
File Format#
The input file uses a block-based format. Blocks are denoted by square brackets (e.g., [SETTINGS]), and commands
within a block are key-value pairs separated by =. Lines beginning with # are comments. Empty lines are ignored.
Flattening into Dot-Separated Keys#
During parsing, each command is prepended with its enclosing block name to form a dot-separated key. For example,
TIMESTEPS = 0 -1 inside the [SETTINGS] block becomes the key-value pair SETTINGS.TIMESTEPS = 0 -1. Sub-commands
with dots in their names (e.g., PBC.DIM) are similarly expanded: PBC.DIM = 1 2 3 inside [TRAJECTORY] becomes
TRAJECTORY.PBC.DIM = 1 2 3.
Block headers themselves can also contain dots. For example, [GRAPH.WRITE] sets the current command block to
GRAPH.WRITE, so commands under that header are prepended with the prefix GRAPH.WRITE. rather than the prefix GRAPH.
(each prefix ends in a dot). This means that writing the commands WRITE.EDGES and WRITE.EDGES.ATTRIBUTES.EDGES under a
[GRAPH] header produces the same keys as writing EDGES and EDGES.ATTRIBUTES.EDGES under a [GRAPH.WRITE] header.
Both forms flatten to GRAPH.WRITE.EDGES and GRAPH.WRITE.EDGES.ATTRIBUTES.EDGES. Dot-separated block headers are
a convenience for grouping related commands without repeating a common prefix.
The result is an ordered list of (key, value) string pairs stored in input_file_options. This list preserves the
order of the input file, which is important because blocks like [GRAPH] depend on blocks like [TRAJECTORY] and
[ZMATRIX] being defined first.
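The flattening rules above can be illustrated with a minimal sketch. It is not the actual Read_Input_File::parse_file() implementation: the function names flatten_input and trim are hypothetical, and silently skipping lines without an = is an assumption made only for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r\n");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t\r\n");
    return s.substr(first, last - first + 1);
}

// Flatten block-based input text into ordered (key, value) pairs.
// Illustrative stand-in only; names and error handling are assumptions.
std::vector<std::pair<std::string, std::string>>
flatten_input(const std::string& text) {
    std::vector<std::pair<std::string, std::string>> options;
    std::istringstream in(text);
    std::string line;
    std::string block;  // current block name, e.g. "GRAPH.WRITE"
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#') continue;      // blank line or comment
        if (line.front() == '[' && line.back() == ']') {   // block header
            block = trim(line.substr(1, line.size() - 2));
            continue;
        }
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;             // assumption: skip malformed lines
        const std::string key = trim(line.substr(0, eq));
        const std::string value = trim(line.substr(eq + 1));
        // Prepend the enclosing block name to form the dot-separated key.
        options.emplace_back(block.empty() ? key : block + "." + key, value);
    }
    return options;
}
```

Because the result is a vector rather than a map, the order of the input file is preserved, which matches the ordering requirement described above.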
Building the Component Tree#
Settings::initialize() (src/input_file/settings.hpp) drives the remaining initialization. It calls
add_default_components() to register default components, then add_runtime_components() to process the parsed input
file options.
add_runtime_components() iterates through the input_file_options list and builds the component tree. For each
entry:
- If the command is NAME, a new top-level command block is created using the Component_Factory and added to the execution_order list.
- The command is routed to either Settings itself (for SETTINGS.* commands) or the appropriate command block composite (for TRAJECTORY.*, GRAPH.*, etc.) via add_runtime_component().
The Composite::add_runtime_component() method (src/input_file/composite.hpp) recursively walks the dot-separated
key, creating or locating the appropriate Component at each level. Components are registered with the
Component_Factory as one of three types:
- TYPE_SINGLE: Only one instance allowed per parent (e.g., TRAJECTORY.TYPE)
- TYPE_LIST: Multiple instances allowed, stored in order (e.g., ZMATRIX.ROW)
- TYPE_MAP: Multiple named instances, keyed by value (e.g., GRAPH.ZMATRIX_SEARCH)
Each component that is not a default component is appended to the global execution_order list as it is created.
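The recursive walk over a dot-separated key can be sketched as follows. This is a simplified illustration, not Composite::add_runtime_component() itself: Node, split_key, and add_component are hypothetical names, and the sketch models only single-instance children (it ignores the TYPE_LIST and TYPE_MAP cases, the factory, and execution_order).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified component: a name, an unparsed value, and children.
struct Node {
    std::string name;
    std::string value;
    std::map<std::string, std::unique_ptr<Node>> children;
};

// Split "A.B.C" into {"A", "B", "C"}.
std::vector<std::string> split_key(const std::string& key) {
    std::vector<std::string> parts;
    std::istringstream in(key);
    std::string part;
    while (std::getline(in, part, '.')) parts.push_back(part);
    return parts;
}

// Walk the key one level at a time, creating any missing node, and store the
// raw value string on the node reached by the last key segment.
Node& add_component(Node& parent, const std::vector<std::string>& parts,
                    std::size_t depth, const std::string& value) {
    if (depth == parts.size()) {
        parent.value = value;  // parsed later, during validation
        return parent;
    }
    auto& child = parent.children[parts[depth]];
    if (!child) {
        child = std::make_unique<Node>();
        child->name = parts[depth];
    }
    return add_component(*child, parts, depth + 1, value);
}
```

Calling add_component(root, split_key("TRAJECTORY.PBC.DIM"), 0, "1 2 3") on an empty root would create the TRAJECTORY, PBC, and DIM nodes in turn and leave the value string on DIM.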
Input File Validation#
After the component tree is built, Settings::initialize() calls validate(). Validation proceeds in two stages:
- Settings::validate() calls Composite::validate() on itself, which iterates through all child components in validation_order (the order in which components were registered with the factory). Each component validates its own value string, parsing it into typed member variables.
- Settings::validate() then explicitly validates each TRAJECTORY, ZMATRIX, and GRAPH block by calling validate() on each named command block in their respective cmm maps.
Individual component validation typically follows this pattern:
- Parse the value string (set during add_runtime_component)
- Verify the parsed values are valid (e.g., referenced trajectories exist, numeric ranges are correct)
- Call set_vm() to store the parsed values in the component's type-safe value map (vm)
- Log the parsed result at debug level
For example, SETTINGS_TIMESTEPS parses its value string into initial, final, and stride integers, validates
that initial >= 0, stride > 0, and initial <= final, then stores them via set_vm().
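A minimal sketch of this parse-then-verify pattern is shown below. It is not the SETTINGS_TIMESTEPS implementation: the Timesteps struct and parse_timesteps function are hypothetical, the default stride of 1 when only two values are given is an assumption, and the sketch applies only the three checks stated above. How a final value such as -1 is interpreted is not described in this section, so the sketch does not model it.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical typed result; the real component stores its values via set_vm().
struct Timesteps {
    long initial = 0;
    long final_step = 0;
    long stride = 1;
};

// Parse "initial final [stride]" and apply the three stated checks.
Timesteps parse_timesteps(const std::string& value) {
    std::istringstream in(value);
    Timesteps t;
    if (!(in >> t.initial >> t.final_step)) {
        throw std::invalid_argument("expected at least: initial final");
    }
    long stride = 0;
    if (in >> stride) t.stride = stride;  // assumption: stride is optional, default 1

    if (t.initial < 0) throw std::invalid_argument("initial must be >= 0");
    if (t.stride <= 0) throw std::invalid_argument("stride must be > 0");
    if (t.initial > t.final_step) throw std::invalid_argument("initial must be <= final");
    return t;
}
```

Throwing on the first failed check keeps the sketch short; a real validator might instead collect and log every problem before aborting.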
Timestep Execution#
Once initialization is complete, ChemNetworks::run() iterates through each requested timestep. Timesteps are
distributed across MPI ranks using round-robin assignment.
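Round-robin assignment can be sketched without any MPI calls by passing the rank and communicator size in as plain integers. The function name timesteps_for_rank is hypothetical, and treating the final timestep as inclusive is an assumption; in the real code the rank and size would come from the MPI runtime.

```cpp
#include <cassert>
#include <vector>

// Return the timesteps one rank would process under round-robin assignment:
// the i-th requested timestep (counting from 0) goes to rank (i % size).
// Assumption: final_step is inclusive.
std::vector<long> timesteps_for_rank(long initial, long final_step, long stride,
                                     int rank, int size) {
    std::vector<long> mine;
    long index = 0;  // position of the timestep in the requested sequence
    for (long t = initial; t <= final_step; t += stride, ++index) {
        if (index % size == rank) mine.push_back(t);
    }
    return mine;
}
```

With this scheme each rank's share differs by at most one timestep from any other rank's, and no communication is needed to decide who owns which frame.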
For each timestep, a Timestep object (src/timestep.hpp) is created with a fresh Data container
(src/data/data.hpp). The Data object holds a map of named igraph objects and the current timestep index.
Three-Pass Execution#
The Timestep::execute() method runs three passes over the execution_order list:
Pass 1: Set Objects#
Each component creates its internal command objects and retrieves references to shared resources (trajectories, graphs,
configuration values). For example, GRAPH_ADD_NODES_TRAJECTORY creates its Read_Trajectory, Write_Node_Graph,
and PBC_set command objects during this pass.
Pass 2: Set Listeners#
Components wire up the Observer/Subject relationships between their command objects. For example,
GRAPH_ADD_NODES_TRAJECTORY attaches its Write_Node_Graph observer to its Read_Trajectory subject, so that when
the trajectory reader processes atoms, the graph writer is notified and adds corresponding nodes.
Pass 3: Execute#
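```cpp
// Pass 3: run every component in input-file order for the current timestep.
for (const auto& component : settings->execution_order) {
    component->execute(data);
    // Stop early once any component reports the end of the trajectory.
    if (data->end_of_trajectory) {
        break;
    }
}
```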
Each component executes its core logic for the current timestep. Execution stops early if any component signals
end-of-trajectory. After all components execute, Data::destroy_igraphs() cleans up the igraph objects for that
timestep.
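The three passes can be sketched together as below. This is an illustration under stated assumptions rather than the code of Timestep::execute(): the Component interface, the method names set_objects and set_listeners, the Recorder test component, and the log field on Data are all hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-timestep Data container.
struct Data {
    bool end_of_trajectory = false;
    std::vector<std::string> log;  // records call order, for illustration only
};

// Hypothetical component interface with one method per pass.
struct Component {
    virtual ~Component() = default;
    virtual void set_objects(Data& d) = 0;
    virtual void set_listeners(Data& d) = 0;
    virtual void execute(Data& d) = 0;
};

// Test component that records each call and can signal end-of-trajectory.
struct Recorder : Component {
    std::string name;
    bool ends_trajectory;
    explicit Recorder(std::string n, bool ends = false)
        : name(std::move(n)), ends_trajectory(ends) {}
    void set_objects(Data& d) override { d.log.push_back("objects:" + name); }
    void set_listeners(Data& d) override { d.log.push_back("listeners:" + name); }
    void execute(Data& d) override {
        d.log.push_back("execute:" + name);
        if (ends_trajectory) d.end_of_trajectory = true;
    }
};

// Each pass finishes for every component before the next pass starts.
void run_timestep(const std::vector<std::unique_ptr<Component>>& order, Data& d) {
    for (const auto& c : order) c->set_objects(d);    // pass 1
    for (const auto& c : order) c->set_listeners(d);  // pass 2
    for (const auto& c : order) {                     // pass 3
        c->execute(d);
        if (d.end_of_trajectory) break;
    }
}
```

Running three full loops instead of one combined loop is what guarantees that every object exists before any listener is attached, and that every listener is attached before any component executes.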
Structure Search#
ChemNetworks identifies structures within the trajectory using a recursive algorithm, then creates edges between atoms within the identified structure. Structures are defined by the user through a ZMATRIX block, which specifies atom types and connectivity parameters (bond lengths, angles, dihedrals) as ranges.
The search iterates through the Z-matrix rows recursively, attempting to assign a particle to each row. Because each of the $r$ rows can try up to $n$ candidate particles, the worst-case complexity scales as $O(n^r)$, where $n$ is the number of particles and $r$ is the number of rows in the Z-matrix. When a set of atoms matches all Z-matrix parameters, an edge is created between the selected atoms in the graph (see: GRAPH.ADD.EDGES.ZMATRIX).
Implementation#
The search is implemented in ZMatrix_Search (src/commands/zmatrix_search.hpp). The core algorithm is the
recursively_build_adj_tensor() method, which walks through each row of the Z-matrix and attempts to assign a particle
to that row.
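The row-by-row recursion can be illustrated with a heavily reduced sketch. It is not recursively_build_adj_tensor(): the Point and Row types and the search function are hypothetical, each row carries only a distance range to the particle matched by the previous row (no atom types, angles, or dihedrals), and every particle is tried as a candidate, with no neighbor list.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };

// Hypothetical reduced Z-matrix row: an allowed distance range to the
// particle chosen for the previous row. The range of row 0 is unused.
struct Row { double min_dist, max_dist; };

double dist(const Point& a, const Point& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Try to assign a particle to row chosen.size(); recurse on success.
// A full assignment (one particle per row) is recorded as a match.
void search(const std::vector<Point>& pts, const std::vector<Row>& rows,
            std::vector<int>& chosen, std::vector<std::vector<int>>& matches) {
    const std::size_t row = chosen.size();
    if (row == rows.size()) {
        matches.push_back(chosen);
        return;
    }
    for (int i = 0; i < static_cast<int>(pts.size()); ++i) {
        bool used = false;
        for (int c : chosen) used = used || (c == i);
        if (used) continue;  // a particle may fill only one row
        if (row > 0) {
            const double d = dist(pts[chosen.back()], pts[i]);
            if (d < rows[row].min_dist || d > rows[row].max_dist) continue;
        }
        chosen.push_back(i);
        search(pts, rows, chosen, matches);
        chosen.pop_back();  // backtrack and try the next candidate
    }
}
```

In this sketch a symmetric chain is reported once per direction, and the loop over all particles at every row is what produces the $O(n^r)$ worst case.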
Verlet Neighbor Lists#
Before the recursive search begins, a Verlet neighbor list is constructed by Generate_Verlet_Neighbor_List
(src/commands/generate_verlet_neighbor_list.hpp). The neighbor lists are keyed by command name (e.g., "dist"). For
each Z-matrix command type, the range of acceptable values for that command, across all Z-matrices in the graph block,
is determined. A neighbor list is then constructed for each command, where the command's value for a given node and a
neighbor falls within that acceptable range. Pairs that don't fall within the range are excluded from the neighbor list.
During the recursive search, when the algorithm needs to find candidate particles for a given Z-matrix row, it looks up
the neighbor list for that row's two-body command and iterates only over the neighbors of the previously matched
particle. This avoids iterating over all particles in the system for each row, significantly speeding up the search.
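The two steps described above (widening the acceptable range across all Z-matrices, then keeping only pairs inside it) can be sketched for a single distance command. This is not Generate_Verlet_Neighbor_List: combined_range and build_neighbor_list are hypothetical names, the pair loop is a brute-force $O(n^2)$ scan, and periodic boundaries are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Point { double x, y, z; };

// Widest range covering every per-Z-matrix range for one command type.
// Precondition: ranges is not empty.
std::pair<double, double>
combined_range(const std::vector<std::pair<double, double>>& ranges) {
    double lo = ranges.front().first;
    double hi = ranges.front().second;
    for (const auto& r : ranges) {
        lo = std::min(lo, r.first);
        hi = std::max(hi, r.second);
    }
    return {lo, hi};
}

// For each particle, list the particles whose distance lies in [lo, hi].
std::vector<std::vector<int>>
build_neighbor_list(const std::vector<Point>& pts, double lo, double hi) {
    std::vector<std::vector<int>> neighbors(pts.size());
    for (std::size_t i = 0; i < pts.size(); ++i) {
        for (std::size_t j = i + 1; j < pts.size(); ++j) {
            const double dx = pts[i].x - pts[j].x;
            const double dy = pts[i].y - pts[j].y;
            const double dz = pts[i].z - pts[j].z;
            const double d = std::sqrt(dx * dx + dy * dy + dz * dz);
            if (d < lo || d > hi) continue;  // pair excluded from the list
            neighbors[i].push_back(static_cast<int>(j));
            neighbors[j].push_back(static_cast<int>(i));
        }
    }
    return neighbors;
}
```

During the recursive search, the candidate loop for a row would then run over neighbors[previous] instead of over every particle in the system.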
Periodic Boundaries#
When periodic boundary conditions are enabled, the PBC_set command computes an upper-triangular box matrix from the
cell parameters and stores it as graph-level attributes (Lx, Ly, Lz, xy, xz, yz). The minimum image
convention is then applied at every distance calculation — in the neighbor list construction, the Z-matrix search
commands, and post-search edge attribute calculations — using GraphUtils::minimum_image(). This supports both
orthogonal and triclinic cells without creating additional nodes.
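One common way to apply the minimum image convention with an upper-triangular box is sketched below. It is not the code of GraphUtils::minimum_image(), whose signature is not given here: the function minimum_image_sketch and the Vec3 and Box types are hypothetical, and the mapping of the six attributes onto cell vectors a = (Lx, 0, 0), b = (xy, Ly, 0), c = (xz, yz, Lz) is an assumption. The single rounding step per axis is reliable only for cells that are not strongly skewed.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Assumed layout: a = (Lx, 0, 0), b = (xy, Ly, 0), c = (xz, yz, Lz).
struct Box { double Lx, Ly, Lz, xy, xz, yz; };

// Map a displacement vector d onto its nearest periodic image.
Vec3 minimum_image_sketch(Vec3 d, const Box& box) {
    // Remove multiples of c first: c changes x, y, and z.
    const double nz = std::round(d.z / box.Lz);
    d.x -= nz * box.xz;
    d.y -= nz * box.yz;
    d.z -= nz * box.Lz;
    // Then multiples of b: b changes x and y only.
    const double ny = std::round(d.y / box.Ly);
    d.x -= ny * box.xy;
    d.y -= ny * box.Ly;
    // Finally multiples of a: a changes x only.
    const double nx = std::round(d.x / box.Lx);
    d.x -= nx * box.Lx;
    return d;
}
```

For an orthogonal cell the three tilt factors are zero and the function reduces to the familiar per-axis wrap. Because only the displacement is adjusted, no image nodes need to be added to the graph.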
Observer Notification#
ZMatrix_Search extends ZMatrix_Subject and notifies attached observers at two points:
- notify_zmatrix() is called each time a complete structure match is found. This provides per-match data (the matched node indices and their Z-matrix values) to observers such as the Z-matrix file writer.
- notify_zmatrix_results() is called once after the entire search completes, providing the full set of matched edges and values. This is used by Graph_Add_Edges_ZMatrix to create edges in the graph.
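The two notification points can be sketched with a small Observer/Subject pair. The method names notify_zmatrix and notify_zmatrix_results come from the description above, but their signatures here are assumptions, and ZMatrixObserver, SearchSubject, CountingObserver, on_match, and on_results are hypothetical names. Matches are passed in ready-made instead of being produced by a real search.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Match = std::vector<int>;  // matched node indices (values omitted)

// Hypothetical observer interface with one callback per notification point.
struct ZMatrixObserver {
    virtual ~ZMatrixObserver() = default;
    virtual void on_match(const Match& m) = 0;
    virtual void on_results(const std::vector<Match>& all) = 0;
};

// Hypothetical subject standing in for the search command.
class SearchSubject {
public:
    void attach(ZMatrixObserver* o) { observers_.push_back(o); }

    // Pretend search: "finds" each supplied match in turn.
    void run(const std::vector<Match>& matches) {
        for (const auto& m : matches) {
            found_.push_back(m);
            notify_zmatrix(m);         // once per complete match
        }
        notify_zmatrix_results();      // once, after the search completes
    }

private:
    void notify_zmatrix(const Match& m) {
        for (auto* o : observers_) o->on_match(m);
    }
    void notify_zmatrix_results() {
        for (auto* o : observers_) o->on_results(found_);
    }
    std::vector<ZMatrixObserver*> observers_;
    std::vector<Match> found_;
};

// Example observer that only counts what it receives.
struct CountingObserver : ZMatrixObserver {
    int per_match_calls = 0;
    int result_calls = 0;
    std::size_t total_matches = 0;
    void on_match(const Match&) override { ++per_match_calls; }
    void on_results(const std::vector<Match>& all) override {
        ++result_calls;
        total_matches = all.size();
    }
};
```

An observer that streams output, such as a file writer, would do its work in the per-match callback, while one that needs the complete result set, such as an edge builder, would wait for the final callback.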
Data Flow Summary#
The three-pass structure ensures that all objects are created before any listeners are attached, and all listeners are attached before any component executes. This ordering guarantees that the Observer/Subject wiring is complete before data flows through the system.
A typical data flow for a single timestep looks like:
- GRAPH_ADD_NODES_TRAJECTORY reads the trajectory frame and builds graph nodes
- GRAPH_ADD_ATTRIBUTE_NODE_* components add attributes to the nodes
- GRAPH_ZMATRIX_SEARCH searches for structures using the minimum image convention for PBC
- GRAPH_ADD_EDGES_ZMATRIX creates edges between matched atoms
- GRAPH_ADD_ATTRIBUTE_EDGE_* components add attributes to the edges
- GRAPH_ANALYSIS runs graph analyses (e.g., degree, modularity)
- GRAPH_WRITE_* components write output files