ChemNetworks Code Workflow#

The workflow for ChemNetworks has three major phases, executed in order:

  1. Input file parsing
  2. Input file validation
  3. Timestep execution (repeated per frame, distributed via MPI)

Input File Parsing#

ChemNetworks begins execution in src/main.cpp, which initializes MPI, parses command line arguments, and calls ChemNetworks::run() in src/chemnetworks.hpp.

The input file is read during command line argument parsing. Program_Options (src/io/program_options.hpp) handles the -i <file> flag by passing the file path to Read_Input_File::parse_file() (src/io/read_input_file.hpp).

File Format#

The input file uses a block-based format. Blocks are denoted by square brackets (e.g., [SETTINGS]), and commands within a block are key-value pairs separated by =. Lines beginning with # are comments. Empty lines are ignored.

[SETTINGS]
TIMESTEPS = 0 -1

[TRAJECTORY]
NAME = my_trajectory
FILE = trajectory.xyz
TYPE = xyz

Flattening into Dot-Separated Keys#

During parsing, the name of the enclosing block is prepended to each command to form a dot-separated key. For example, TIMESTEPS = 0 -1 inside the [SETTINGS] block becomes the key-value pair SETTINGS.TIMESTEPS = 0 -1. Sub-commands with dots in their names (e.g., PBC.DIM) are expanded the same way: PBC.DIM = 1 2 3 inside [TRAJECTORY] becomes TRAJECTORY.PBC.DIM = 1 2 3.

Block headers can themselves contain dots. For example, [GRAPH.WRITE] sets the current command block to GRAPH.WRITE, so commands under that header receive the prefix GRAPH.WRITE rather than GRAPH. This means that:

[GRAPH.WRITE]
EDGES = ./edg_files/edges
EDGES.ATTRIBUTES.EDGES = dist

produces the same keys as:

[GRAPH]
WRITE.EDGES = ./edg_files/edges
WRITE.EDGES.ATTRIBUTES.EDGES = dist

Both forms flatten to GRAPH.WRITE.EDGES and GRAPH.WRITE.EDGES.ATTRIBUTES.EDGES. Dot-separated block headers are a convenience for grouping related commands without repeating a common prefix.

The result is an ordered list of (key, value) string pairs stored in input_file_options. This list preserves the order of the input file, which is important because blocks like [GRAPH] depend on blocks like [TRAJECTORY] and [ZMATRIX] being defined first.
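The flattening rules above can be sketched in a few lines. This is a minimal illustration, not the code in Read_Input_File::parse_file(); the names flatten_input and trim are hypothetical, and error handling is reduced to skipping lines that are not key-value pairs.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r\n");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t\r\n");
    return s.substr(first, last - first + 1);
}

// Hypothetical stand-in for the parser: flattens block-based input text into
// an ordered list of (dot-separated key, value) string pairs.
std::vector<std::pair<std::string, std::string>> flatten_input(const std::string& text) {
    std::vector<std::pair<std::string, std::string>> options;
    std::istringstream stream(text);
    std::string line;
    std::string block;  // current block name, e.g. "GRAPH.WRITE"
    while (std::getline(stream, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#') continue;     // blank line or comment
        if (line.front() == '[' && line.back() == ']') {  // block header
            block = trim(line.substr(1, line.size() - 2));
            continue;
        }
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // not a key-value pair; a real parser would report this
        assert(!block.empty() && "command appears before any block header");
        options.emplace_back(block + "." + trim(line.substr(0, eq)),
                             trim(line.substr(eq + 1)));
    }
    return options;
}
```

Because the output is a vector rather than a map, the order of the input file is kept, which is the property the later stages rely on.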

Building the Component Tree#

Settings::initialize() (src/input_file/settings.hpp) drives the remaining initialization. It calls add_default_components() to register default components, then add_runtime_components() to process the parsed input file options.

add_runtime_components() iterates through the input_file_options list and builds the component tree. For each entry:

  1. If the command is NAME, a new top-level command block is created using the Component_Factory and added to the execution_order list.
  2. The command is routed to either Settings itself (for SETTINGS.* commands) or the appropriate command block composite (for TRAJECTORY.*, GRAPH.*, etc.) via add_runtime_component().

The Composite::add_runtime_component() method (src/input_file/composite.hpp) recursively walks the dot-separated key, creating or locating the appropriate Component at each level. Components are registered with the Component_Factory as one of three types:

  • TYPE_SINGLE: Only one instance allowed per parent (e.g., TRAJECTORY.TYPE)
  • TYPE_LIST: Multiple instances allowed, stored in order (e.g., ZMATRIX.ROW)
  • TYPE_MAP: Multiple named instances, keyed by value (e.g., GRAPH.ZMATRIX_SEARCH)

Each component that is not a default component is appended to the global execution_order list as it is created.
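The recursive walk over a dot-separated key can be illustrated with a toy tree. The Node type below is a hypothetical simplification: it does not model the Component_Factory, the TYPE_SINGLE, TYPE_LIST, and TYPE_MAP distinction, or the execution_order list, and only shows how each key segment selects or creates one level of the tree.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical, simplified node; the real Component and Composite classes
// carry more state than this.
struct Node {
    std::string value;  // raw value string, parsed later during validation
    std::map<std::string, std::unique_ptr<Node>> children;
};

// Walk one dot-separated key, creating or locating a child at each level,
// and store the value string on the final node.
void add_runtime_component(Node& parent, const std::string& key, const std::string& value) {
    assert(!key.empty());
    const auto dot = key.find('.');
    const std::string head = key.substr(0, dot);  // segment before the first dot
    auto& child = parent.children[head];
    if (!child) child = std::make_unique<Node>();  // create on first use
    if (dot == std::string::npos) {
        child->value = value;                      // last segment: store the value
    } else {
        add_runtime_component(*child, key.substr(dot + 1), value);
    }
}
```

In this sketch, adding TRAJECTORY.PBC.DIM and then TRAJECTORY.TYPE yields a single TRAJECTORY node with two children, PBC and TYPE.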

Input File Validation#

After the component tree is built, Settings::initialize() calls validate(). Validation proceeds in two stages:

  1. Settings::validate() calls Composite::validate() on itself, which iterates through all child components in validation_order (the order in which components were registered with the factory). Each component validates its own value string, parsing it into typed member variables.

  2. Settings::validate() then explicitly validates each TRAJECTORY, ZMATRIX, and GRAPH block by calling validate() on each named command block in their respective cmm maps.

Individual component validation typically follows this pattern:

  1. Parse the value string (set during add_runtime_component)
  2. Verify the parsed values are valid (e.g., referenced trajectories exist, numeric ranges are correct)
  3. Call set_vm() to store the parsed values in the component's type-safe value map (vm)
  4. Log the parsed result at debug level

For example, SETTINGS_TIMESTEPS parses its value string into initial, final, and stride integers, validates that initial >= 0, stride > 0, and initial <= final, then stores them via set_vm().
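The parse-then-verify pattern for a timestep range might look like the sketch below. It is not the SETTINGS_TIMESTEPS implementation: the Timesteps struct, parse_timesteps, and count are hypothetical names, the default stride of 1 is an assumption, and set_vm() and logging are left out. The sketch applies only the three checks listed above, so it does not model any special meaning that a value such as -1 may have in the real input format.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical value type; final_step stands in for the "final" integer.
struct Timesteps {
    int initial = 0;
    int final_step = 0;
    int stride = 1;
};

// Parse "initial final [stride]" and apply the checks described in the text.
// Returns std::nullopt when parsing or validation fails.
std::optional<Timesteps> parse_timesteps(const std::string& value) {
    std::istringstream in(value);
    Timesteps t;
    if (!(in >> t.initial >> t.final_step)) return std::nullopt;
    if (!(in >> t.stride)) {
        if (!in.eof()) return std::nullopt;  // third token exists but is not an integer
        t.stride = 1;                        // assumption: stride defaults to 1 when omitted
    }
    std::string extra;
    if (in >> extra) return std::nullopt;    // unexpected trailing token
    if (t.initial < 0 || t.stride <= 0 || t.initial > t.final_step) return std::nullopt;
    return t;
}

// Number of timesteps selected by an already validated range.
int count(const Timesteps& t) {
    assert(t.stride > 0 && t.initial <= t.final_step);  // guaranteed by parse_timesteps
    return (t.final_step - t.initial) / t.stride + 1;
}
```

Returning std::optional keeps the failure path explicit; a component in the real code base would more likely log an error and abort initialization.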

Timestep Execution#

Once initialization is complete, ChemNetworks::run() iterates through each requested timestep. Timesteps are distributed across MPI ranks using round-robin assignment.
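Round-robin assignment means that the k-th requested timestep goes to rank k modulo the number of ranks. The sketch below shows that index arithmetic without depending on MPI; timesteps_for_rank is a hypothetical helper, and the exact loop used in ChemNetworks::run() may differ. In an MPI program, rank and size would typically come from MPI_Comm_rank and MPI_Comm_size.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: the timesteps handled by `rank` out of `size` ranks
// when the requested timesteps initial, initial + stride, ... (up to and
// including final_step) are dealt out in round-robin order.
std::vector<int> timesteps_for_rank(int initial, int final_step, int stride,
                                    int rank, int size) {
    assert(size > 0 && rank >= 0 && rank < size && stride > 0);
    std::vector<int> mine;
    const long long step = static_cast<long long>(size) * stride;
    // Start at this rank's first timestep, then skip over the other ranks' turns.
    for (long long t = initial + static_cast<long long>(rank) * stride;
         t <= final_step; t += step) {
        mine.push_back(static_cast<int>(t));
    }
    return mine;
}
```

With timesteps 0 through 9 and three ranks, rank 0 receives 0, 3, 6, 9, rank 1 receives 1, 4, 7, and rank 2 receives 2, 5, 8.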

For each timestep, a Timestep object (src/timestep.hpp) is created with a fresh Data container (src/data/data.hpp). The Data object holds a map of named igraph objects and the current timestep index.

Three-Pass Execution#

The Timestep::execute() method runs three passes over the execution_order list:

Pass 1: Set Objects#

for (const auto& component : settings->execution_order) {
    component->set_objects(data);
}

Each component creates its internal command objects and retrieves references to shared resources (trajectories, graphs, configuration values). For example, GRAPH_ADD_NODES_TRAJECTORY creates its Read_Trajectory, Write_Node_Graph, and PBC_set command objects during this pass.

Pass 2: Set Listeners#

for (const auto& component : settings->execution_order) {
    component->set_listeners(data);
}

Components wire up the Observer/Subject relationships between their command objects. For example, GRAPH_ADD_NODES_TRAJECTORY attaches its Write_Node_Graph observer to its Read_Trajectory subject, so that when the trajectory reader processes atoms, the graph writer is notified and adds corresponding nodes.

Pass 3: Execute#

for (const auto& component : settings->execution_order) {
    component->execute(data);
    if (data->end_of_trajectory) {
        break;
    }
}

Each component executes its core logic for the current timestep. Execution stops early if any component signals end-of-trajectory. After all components execute, Data::destroy_igraphs() cleans up the igraph objects for that timestep.
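To make the call order concrete, the following self-contained sketch runs the three passes over toy components and records each call. Data and Component here are hypothetical, pared-down stand-ins that share only the method names with the real classes, and run_timestep is an illustrative name rather than the body of Timestep::execute().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-timestep Data container.
struct Data {
    bool end_of_trajectory = false;
    std::vector<std::string> log;  // records the call order for illustration
};

// Hypothetical stand-in for a component in execution_order.
struct Component {
    std::string name;
    bool ends_trajectory = false;  // when true, execute() signals end-of-trajectory
    void set_objects(Data& d)   { d.log.push_back("objects:" + name); }
    void set_listeners(Data& d) { d.log.push_back("listeners:" + name); }
    void execute(Data& d) {
        d.log.push_back("execute:" + name);
        if (ends_trajectory) d.end_of_trajectory = true;
    }
};

// Run the three passes in order; the last pass stops early on end-of-trajectory.
void run_timestep(std::vector<Component>& execution_order, Data& data) {
    assert(!data.end_of_trajectory);
    for (auto& c : execution_order) c.set_objects(data);
    for (auto& c : execution_order) c.set_listeners(data);
    for (auto& c : execution_order) {
        c.execute(data);
        if (data.end_of_trajectory) break;
    }
}
```

If the first of two components signals end-of-trajectory, the log contains both set_objects calls, both set_listeners calls, and only the first execute call.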

Z-Matrix Search#

ChemNetworks identifies structures within the trajectory using a recursive algorithm, then creates edges between atoms within the identified structure. Structures are defined by the user through a ZMATRIX block, which specifies atom types and connectivity parameters (bond lengths, angles, dihedrals) as ranges.

The search iterates through the Z-matrix rows recursively, attempting to assign a particle to each row. Because every row can, in the worst case, be tried against every particle, the cost scales as $O(n^r)$, where $n$ is the number of particles and $r$ is the number of rows in the Z-matrix. When a set of atoms matches all Z-matrix parameters, an edge is created between the selected atoms in the graph (see: GRAPH.ADD.EDGES.ZMATRIX).
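A stripped-down version of such a recursive search is sketched below. It is an illustration under strong assumptions, not the ZMatrix_Search code: each hypothetical Row constrains only the particle type and the distance to the particle matched on the previous row, angles and dihedrals are omitted, every particle is tried for every row, and periodic boundaries are ignored.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

struct Particle { std::string type; double x, y, z; };

// Hypothetical distance-only Z-matrix row: the particle assigned to this row
// must have the given type and lie within [min_dist, max_dist] of the particle
// assigned to the previous row. Row 0 ignores the distance range.
struct Row { std::string type; double min_dist, max_dist; };

static double dist(const Particle& a, const Particle& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}

// Try every particle for the next unassigned row; recurse on success and
// record a match once every row has a particle.
static void search(const std::vector<Particle>& particles, const std::vector<Row>& rows,
                   std::vector<int>& chosen, std::vector<std::vector<int>>& matches) {
    const std::size_t row = chosen.size();
    if (row == rows.size()) { matches.push_back(chosen); return; }
    for (int i = 0; i < static_cast<int>(particles.size()); ++i) {
        if (particles[i].type != rows[row].type) continue;
        bool used = false;  // a particle may fill only one row per match
        for (int c : chosen) used = used || (c == i);
        if (used) continue;
        if (row > 0) {
            const double d = dist(particles[chosen.back()], particles[i]);
            if (d < rows[row].min_dist || d > rows[row].max_dist) continue;
        }
        chosen.push_back(i);
        search(particles, rows, chosen, matches);
        chosen.pop_back();  // backtrack and try the next candidate
    }
}

// Returns one vector of particle indices per complete match, in row order.
std::vector<std::vector<int>> find_structures(const std::vector<Particle>& particles,
                                              const std::vector<Row>& rows) {
    assert(!rows.empty());
    std::vector<int> chosen;
    std::vector<std::vector<int>> matches;
    search(particles, rows, chosen, matches);
    return matches;
}
```

The loop over all particles at every recursion level is what produces the $O(n^r)$ worst case.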

Implementation#

The search is implemented in ZMatrix_Search (src/commands/zmatrix_search.hpp). The core algorithm is the recursively_build_adj_tensor() method, which walks through each row of the Z-matrix and attempts to assign a particle to that row.

Verlet Neighbor Lists#

Before the recursive search begins, Generate_Verlet_Neighbor_List (src/commands/generate_verlet_neighbor_list.hpp) constructs Verlet neighbor lists, keyed by command name (e.g., "dist"). For each Z-matrix command type, the range of acceptable values is determined across all Z-matrices in the graph block. A neighbor list is then built for each command: a pair of nodes is included only if the command's value for that pair falls within the acceptable range.

During the recursive search, when the algorithm needs candidate particles for a given Z-matrix row, it looks up the neighbor list for that row's two-body command and iterates only over the neighbors of the previously matched particle. This avoids iterating over all particles in the system for each row, which significantly speeds up the search.
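The idea can be sketched for a single distance command. Everything named here (Point, Range, merge_ranges, build_neighbor_list) is hypothetical; taking the smallest interval that covers every Z-matrix's range is an assumption about how the acceptable range is combined, and the brute-force pair loop ignores periodic boundaries and any cell-list optimization the real code may use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };
struct Range { double lo, hi; };

// Assumption: the acceptable range across all Z-matrices is the smallest
// interval that covers each individual range.
Range merge_ranges(const std::vector<Range>& ranges) {
    assert(!ranges.empty());
    Range merged = ranges.front();
    for (const Range& r : ranges) {
        merged.lo = std::min(merged.lo, r.lo);
        merged.hi = std::max(merged.hi, r.hi);
    }
    return merged;
}

// For each point, list the indices of the points whose distance from it
// falls inside the acceptable range. Pairs outside the range are excluded.
std::vector<std::vector<int>> build_neighbor_list(const std::vector<Point>& pts, Range range) {
    assert(0.0 <= range.lo && range.lo <= range.hi);
    std::vector<std::vector<int>> neighbors(pts.size());
    const double lo2 = range.lo * range.lo;  // compare squared distances
    const double hi2 = range.hi * range.hi;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        for (std::size_t j = i + 1; j < pts.size(); ++j) {
            const double dx = pts[i].x - pts[j].x;
            const double dy = pts[i].y - pts[j].y;
            const double dz = pts[i].z - pts[j].z;
            const double d2 = dx * dx + dy * dy + dz * dz;
            if (d2 < lo2 || d2 > hi2) continue;
            neighbors[i].push_back(static_cast<int>(j));
            neighbors[j].push_back(static_cast<int>(i));
        }
    }
    return neighbors;
}
```

A search that has just matched particle p then only needs to loop over neighbors[p] instead of over every particle in the system.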

Periodic Boundaries#

When periodic boundary conditions are enabled, the PBC_set command computes an upper-triangular box matrix from the cell parameters and stores it as graph-level attributes (Lx, Ly, Lz, xy, xz, yz). The minimum image convention is then applied at every distance calculation — in the neighbor list construction, the Z-matrix search commands, and post-search edge attribute calculations — using GraphUtils::minimum_image(). This supports both orthogonal and triclinic cells without creating additional nodes.
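For reference, one common way to apply the minimum image convention with an upper-triangular box is shown below. This is a sketch of a widely used wrapping scheme, not a transcription of GraphUtils::minimum_image(); Vec3 and Box are hypothetical types, and the box vectors are assumed to be a = (Lx, 0, 0), b = (xy, Ly, 0), and c = (xz, yz, Lz). For strongly tilted cells this scheme may not return the true shortest image.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Hypothetical box type holding the six attributes named in the text.
struct Box { double Lx, Ly, Lz, xy, xz, yz; };

// Wrap a displacement vector d into its nearest periodic image by removing
// whole box vectors, starting with c, then b, then a.
Vec3 minimum_image(Vec3 d, const Box& b) {
    assert(b.Lx > 0.0 && b.Ly > 0.0 && b.Lz > 0.0);
    double n = std::round(d.z / b.Lz);  // whole multiples of c = (xz, yz, Lz)
    d.x -= n * b.xz;
    d.y -= n * b.yz;
    d.z -= n * b.Lz;
    n = std::round(d.y / b.Ly);         // whole multiples of b = (xy, Ly, 0)
    d.x -= n * b.xy;
    d.y -= n * b.Ly;
    n = std::round(d.x / b.Lx);         // whole multiples of a = (Lx, 0, 0)
    d.x -= n * b.Lx;
    return d;
}
```

For an orthogonal cell the tilt factors xy, xz, and yz are zero and the function reduces to the familiar per-axis wrap. Because only the displacement is wrapped, no image copies of particles need to be created.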

Observer Notification#

ZMatrix_Search extends ZMatrix_Subject and notifies attached observers at two points:

  • notify_zmatrix() is called each time a complete structure match is found. This provides per-match data (the matched node indices and their Z-matrix values) to observers such as the Z-matrix file writer.
  • notify_zmatrix_results() is called once after the entire search completes, providing the full set of matched edges and values. This is used by Graph_Add_Edges_ZMatrix to create edges in the graph.
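The two notification points can be illustrated with a small Observer/Subject sketch. Only the names notify_zmatrix and notify_zmatrix_results come from the text above; the observer interface, its on_match and on_results callbacks, and the ToySearch, MatchCounter, and EdgeCollector classes are hypothetical and much simpler than ZMatrix_Subject and its real observers.

```cpp
#include <cassert>
#include <vector>

using Match = std::vector<int>;  // node indices of one matched structure

// Hypothetical observer interface mirroring the two notification points.
struct ZMatrixObserver {
    virtual ~ZMatrixObserver() = default;
    virtual void on_match(const Match&) {}                 // called per match
    virtual void on_results(const std::vector<Match>&) {}  // called once at the end
};

class ZMatrixSubject {
public:
    void attach(ZMatrixObserver* o) {
        assert(o != nullptr);
        observers_.push_back(o);
    }
protected:
    void notify_zmatrix(const Match& m) {
        for (auto* o : observers_) o->on_match(m);
    }
    void notify_zmatrix_results(const std::vector<Match>& all) {
        for (auto* o : observers_) o->on_results(all);
    }
private:
    std::vector<ZMatrixObserver*> observers_;
};

// Toy search that "finds" the matches it is handed.
class ToySearch : public ZMatrixSubject {
public:
    void run(const std::vector<Match>& found) {
        for (const auto& m : found) notify_zmatrix(m);  // once per match
        notify_zmatrix_results(found);                  // once after the search
    }
};

// Plays the role of a per-match consumer such as a file writer.
struct MatchCounter : ZMatrixObserver {
    int matches = 0;
    void on_match(const Match&) override { ++matches; }
};

// Plays the role of a consumer that needs the full result set.
struct EdgeCollector : ZMatrixObserver {
    int calls = 0;
    std::vector<Match> edges;
    void on_results(const std::vector<Match>& all) override { edges = all; ++calls; }
};
```

Splitting the callbacks lets a writer stream each match as it is found, while a graph builder waits for the complete set before touching the graph.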

Data Flow Summary#

The three-pass structure ensures that all objects are created before any listeners are attached, and all listeners are attached before any component executes. This ordering guarantees that the Observer/Subject wiring is complete before data flows through the system.

A typical data flow for a single timestep looks like:

  1. GRAPH_ADD_NODES_TRAJECTORY reads the trajectory frame and builds graph nodes
  2. GRAPH_ADD_ATTRIBUTE_NODE_* components add attributes to the nodes
  3. GRAPH_ZMATRIX_SEARCH searches for structures using the minimum image convention for PBC
  4. GRAPH_ADD_EDGES_ZMATRIX creates edges between matched atoms
  5. GRAPH_ADD_ATTRIBUTE_EDGE_* components add attributes to the edges
  6. GRAPH_ANALYSIS runs graph analyses (e.g., degree, modularity)
  7. GRAPH_WRITE_* components write output files